Thread: On hardcoded type aliases and typmod for user types

On hardcoded type aliases and typmod for user types

From

Martijn van Oosterhout

Date:

31 August 2005, 10:40:51

Hi,

As part of previous discussions about typmod for user type, Tom
mentioned that you would need to make type and function names
equivalent. As it turns out, if you refactor a few rules, you can
actually make it work and manage them seperately. For this to work the
current "col_name_keyword" non-terminal has to be divided into few more
categories. The criterion is mostly whether it is followed by a left
parenthsis.

1. Some can be moved to unreserved words, mostly the types BIT, VARCHAR,
BIGINT etc since they don't need special rules anymore.

2. Words that have special productions (eg CONVERT, COALESCE etc),
these can still only be used as column names, not types or
(user-defined) functions.

3. Words which can be column names functions but not types. These never
appear normally with a parenthesis, so they will be interpreted as a
function if there is one. (eg SETOF, NATIONAL, etc)

4. Words that can be column names and types but not functions. These
are artifacts of the grammer due to the fact that VARYING is
unrestricted. These are BIT, NCHAR, CHAR_P and CHARACTER.

After this, you can have user-functions that take an arbitrary set of
parameters. There'll need to be a bit of code to verify the arguments.
It results in a net gain of 15 keywords for functions and about 30 for
types.

My question is, should users be able to create types schema.int4 and
schema.integer simultaneously. Currently it allows you but it's not
handled very well (\dT doesn't list both). Should this be allowed?
Should aliasing for DEC and DECIMAL -> NUMERIC be done for
user-defined types?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: On hardcoded type aliases and typmod for user types

From

Tom Lane

Date:

31 August 2005, 12:11:12

Martijn van Oosterhout <kleptog@svana.org> writes:
> My question is, should users be able to create types schema.int4 and
> schema.integer simultaneously. Currently it allows you but it's not
> handled very well (\dT doesn't list both). Should this be allowed?
> Should aliasing for DEC and DECIMAL -> NUMERIC be done for
> user-defined types?

IMHO, ideally the aliasing should *only* apply to the built-in types.
The current hack only approximates this (IIRC, the translation happens
for any unqualified type name, independently of one's search path).

One possible approach is to remove the aliasing translation from the
grammar altogether, and add a notion of "alias" entries in pg_type that
would be found through normal lookup and then replaced by the underlying
type by parse analysis rather than by the grammar.  We could not do this
in the existing system because of the need to handle typmods for some of
the aliases ... but maybe it would work given generalized typmod
support.  There are still a few special cases though, like CHARACTER
VARYING, which seems like it more or less has to be wired into the
grammar.

BTW, the proposed refactoring sounds messier to me than does decreeing
type and function names equivalent ...
        regards, tom lane

Re: On hardcoded type aliases and typmod for user types

From

Martijn van Oosterhout

Date:

31 August 2005, 14:59:23

On Wed, Aug 31, 2005 at 11:11:04AM -0400, Tom Lane wrote:
> IMHO, ideally the aliasing should *only* apply to the built-in types.
> The current hack only approximates this (IIRC, the translation happens
> for any unqualified type name, independently of one's search path).
>
> One possible approach is to remove the aliasing translation from the
> grammar altogether, and add a notion of "alias" entries in pg_type that
> would be found through normal lookup and then replaced by the underlying
> type by parse analysis rather than by the grammar.  We could not do this
> in the existing system because of the need to handle typmods for some of
> the aliases ... but maybe it would work given generalized typmod
> support.  There are still a few special cases though, like CHARACTER
> VARYING, which seems like it more or less has to be wired into the
> grammar.

Yeah, I was thinking about alias entries. I was thinking that domains
might already do a lot of the work. But then it's not really aliasing
anymore.

> BTW, the proposed refactoring sounds messier to me than does decreeing
> type and function names equivalent ...

Actually, it's not that bad. The non-terminals relating to types
collapse into about three with only "xxx VARYING" and "DOUBLE
PRECISION" and a few others remaining. The keywords are split into
three sets like I mentioned. Then you make the rules for GenericType
look enough like the expansion of func_expr, that bison can consider
the context after before deciding it's a function or a type.

The changes are not that great. The number of rules (according to the
-v output) goes from 1610 to 1601 and states from 2810 to 2777. OTOH, a
chunk of code moves from gram.y to the adt directory I guess. The
grammar is just the beginning of the work.

Though maybe the point is that we can take the easy way and implement
the slightly more difficult if it turns out the be necessary.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: On hardcoded type aliases and typmod for user types

From

Tom Lane

Date:

31 August 2005, 15:31:14

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Wed, Aug 31, 2005 at 11:11:04AM -0400, Tom Lane wrote:
>> One possible approach is to remove the aliasing translation from the
>> grammar altogether, and add a notion of "alias" entries in pg_type that
>> would be found through normal lookup and then replaced by the underlying
>> type by parse analysis rather than by the grammar.

> Yeah, I was thinking about alias entries. I was thinking that domains
> might already do a lot of the work. But then it's not really aliasing
> anymore.

Right, a domain isn't quite the same thing.  But you could probably use
domains temporarily for prototyping it.

One reason that I think a domain isn't the same thing is that I believe
domains don't have typmods.  Although you could imagine a domain passing
a typmod down to its base type, that's not what the spec expects AFAICS.
You're supposed to writecreate domain mytype as varchar(4);
There's nothing likecreate domain mytype(n) as varchar(n);
in the spec (and no I don't really wish to invent it...)

> Though maybe the point is that we can take the easy way and implement
> the slightly more difficult if it turns out the be necessary.

That seems fair to me.  Now that we have knowledge in the archives about
how to do it the hard way if needed, we can take the easy way until we
run into an actual need for the hard way.

I still like the idea of pushing the aliasing out of the grammar,
though.  Come to think of it, we could probably even handle the
multiple-word stuff that way: let the grammar convert CHARACTER VARYING
to "character varying" and have an alias with that name in the catalog.

One thing you'd need to look at is that format_type is aware of the
special properties of the alias names: at present they never need to be
schema-qualified, but this would no longer be certainly the case with
the aliasing approach.  A possible answer is for format_type to work by
replacing (say) INT4OID with the OID of the alias type that has the
desired spelling, and then use the same TypeIsVisible test as is applied
to any user type.  Another thing that is involved there is not
double-quoting the generated names ... we don't want it to emit
"character varying" but the user-type path would do that.

Hmm... actually there's a bit of an issue here, which is that it's not
clear whether schema qualification makes sense for the multi-word type
names.  For instancepg_catalog.character varying
seems both ugly and syntactically ambiguous.  So maybe we need to stick
to the present solution for the multi-word type names: they are expanded
by the grammar to pre-qualified names, and so you cannot have a user
type selected by such a name, and format_type keeps its current special
case approach to generating them.
        regards, tom lane

Re: On hardcoded type aliases and typmod for user types

From

Martijn van Oosterhout

Date:

31 August 2005, 17:44:14

On Wed, Aug 31, 2005 at 02:25:54PM -0400, Tom Lane wrote:
> I still like the idea of pushing the aliasing out of the grammar,
> though.  Come to think of it, we could probably even handle the
> multiple-word stuff that way: let the grammar convert CHARACTER VARYING
> to "character varying" and have an alias with that name in the catalog.

Currently, a user-defined type, function, table etc is one IDENT, I
don't see any point in changing that. The standard wants "character
varying" and we can support that but we don't need to extend that to
user-defined types. If people really want that they can quote it.
Besides, the character stuff is somewhat special as it has the
"character set" stuff so will need special grammer anyway.

> One thing you'd need to look at is that format_type is aware of the
> special properties of the alias names: at present they never need to be
> schema-qualified, but this would no longer be certainly the case with
> the aliasing approach.  A possible answer is for format_type to work by
> replacing (say) INT4OID with the OID of the alias type that has the
> desired spelling, and then use the same TypeIsVisible test as is applied
> to any user type.  Another thing that is involved there is not
> double-quoting the generated names ... we don't want it to emit
> "character varying" but the user-type path would do that.

I was thinking actually of setting the type searching code to search
pg_catalog before the normal search_path. The types being hardwired
into the grammer essentially implied this so I thought I would avoid
surprises.

This ofcourse would mean that all built-in types would automatically
override user-defined ones, which actually sucks if PostgreSQL keeps
including more types by default. OTOH, then types in pg_catalog never
need to be qualified, making it easy for format_type.

Now i think about it it may not be a good idea, for all its benefits.
Perhaps only doing it for multiword types. Damn special casing.
>
> Hmm... actually there's a bit of an issue here, which is that it's not
> clear whether schema qualification makes sense for the multi-word type
> names.  For instance
>     pg_catalog.character varying

It doesn't work. The current grammer, even now, treats anything schema
qualified as non-special. You can't schema qualify char(4) even if you
want to. Incidently, these typmod changes for user types would make
this work as a side-effect.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: On hardcoded type aliases and typmod for user types

From

Tom Lane

Date:

31 August 2005, 18:18:49

Martijn van Oosterhout <kleptog@svana.org> writes:
> I was thinking actually of setting the type searching code to search
> pg_catalog before the normal search_path. The types being hardwired
> into the grammer essentially implied this so I thought I would avoid
> surprises.

That strikes me as an unnecessary reduction in flexibility.  As long as
we make the hardwired type names translate to qualified names (same as
they do now) we don't have to assume any such thing.

(What I might actually favor doing that for is operators, because the
schema-qualified syntax for operators is so gross.  But I don't see a
need for it for type names.)

>> Hmm... actually there's a bit of an issue here, which is that it's not
>> clear whether schema qualification makes sense for the multi-word type
>> names.  For instance
>> pg_catalog.character varying

> It doesn't work. The current grammer, even now, treats anything schema
> qualified as non-special. You can't schema qualify char(4) even if you
> want to. Incidently, these typmod changes for user types would make
> this work as a side-effect.

Right.  I think thatpg_catalog.varchar(n)
is reasonable and should be accepted, but I'm fine with decreeing thatcharacter varying(n)
will always be special non-schema-qualified syntax (which the grammar
will in fact translate into the former).

The point about character sets is a bit distressing; here we are
designing a new general-purpose mechanism and we can already see
cases it doesn't handle.  Can we fix that?
        regards, tom lane

Re: On hardcoded type aliases and typmod for user types

From

Martijn van Oosterhout

Date:

01 September 2005, 04:26:27

On Wed, Aug 31, 2005 at 05:14:13PM -0400, Tom Lane wrote:
> That strikes me as an unnecessary reduction in flexibility.  As long as
> we make the hardwired type names translate to qualified names (same as
> they do now) we don't have to assume any such thing.

Ack, there's fortunatly only a handful of those.

> The point about character sets is a bit distressing; here we are
> designing a new general-purpose mechanism and we can already see
> cases it doesn't handle.  Can we fix that?

Err, well. My thought was a certain group of type-suffix options would
be permitted (only zero or one at a time), for example:
  WITH TIME ZONE  WITHOUT TIME ZONE  CHARACTER SET xxx

And have the grammer accept these after any type. For example, the type
NUMERIC WITH TIME ZONE would be syntactically valid but the code would
then reject it. You have a issue then because the typmod function
should then be able to return a completely different type because the
system looked up "timestamp" and now the function determines that with
that option, it should actually be "timestamptz".

As for the specific mechanism, well, my options were (in the TYPE
declaration statement:
  TYPMODFUNC = function( intarray [, sometype] ) RETURNS int32 or intarray

This restricts the arguments between the brackets to integers, is this
reasonable? The sometype would be something to handle the suffix
options. (Text pair? option,value). Returning an intarray if a new type
is allowed.
  TYPMODFUNC = function( recordtype [, sometype ] ) RETURNS int32 or intarray

The record type could then indicate what's supported, except you can't
pass a variable number of arguments (for NUMERIC). How about fill up
from the front, leave NULLs for all the unfilled ones. The STRICT flag
could tell if all fields need to be filled (ugh).
  TYPMODFUNC = function( internal [, sometype ] ) RETURNS int32 or intarray

Simply pass the (Node*) from the parser and let the function sort it
out itself. Except now they have to be written in C. Is this
unreasonable, it's called fairly early on, all the issues with no valid
snapshot apply here and you can't defer the evaluation till later.

I'm not sure how to choose, they all handle the current situation fine
but what do we want to allow users to do in the future? Is the SQL
standard likely to come up with SOMETYPE(ident) as a declaration, in
which case we need the second or third options. Ident can be converted
to a constant string for these purposes.

And then there's output to consider, currently timestamp etc have
special cases. But if you're going to allow "CHARACTER SET xxx" to
apply to any type, you need a way to reconstruct the values for output.
Requireing the user the provide an inverse function is one (possibly
unreliable) way. Storing the arguments directly is another. And is one
int32 typmod sufficient? This character set per column has been talked
about for a while, but where was the information going to be stored?

There's several issues to be sorted out yet, I fear.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: On hardcoded type aliases and typmod for user types

From

Hannu Krosing

Date:

01 September 2005, 05:12:36

On N, 2005-09-01 at 09:26 +0200, Martijn van Oosterhout wrote:
> On Wed, Aug 31, 2005 at 05:14:13PM -0400, Tom Lane wrote:
> > That strikes me as an unnecessary reduction in flexibility.  As long as
> > we make the hardwired type names translate to qualified names (same as
> > they do now) we don't have to assume any such thing.
> 
> Ack, there's fortunatly only a handful of those.
> 
> > The point about character sets is a bit distressing; here we are
> > designing a new general-purpose mechanism and we can already see
> > cases it doesn't handle.  Can we fix that?
> 
> Err, well. My thought was a certain group of type-suffix options would
> be permitted (only zero or one at a time), for example:
> 
>    WITH TIME ZONE
>    WITHOUT TIME ZONE
>    CHARACTER SET xxx
> 
> And have the grammer accept these after any type. 

Maybe make the last one "WITH CHARACTER SET xxx" and promote WITH to a
real keyword.

It seems a good idea to have WITH as a real keyword anyway, as at least
ANSI/ISO syntax for recursive queries seem to require it too.

-- 
Hannu Krosing <hannu@skype.net>

Re: On hardcoded type aliases and typmod for user types

From

Dennis Bjorklund

Date:

01 September 2005, 06:18:17

On Thu, 1 Sep 2005, Martijn van Oosterhout wrote:

> Err, well. My thought was a certain group of type-suffix options would
> be permitted (only zero or one at a time), for example:
> 
>    WITH TIME ZONE
>    WITHOUT TIME ZONE
>    CHARACTER SET xxx

String types have 3 modifiers, the length, the charset and the collation. 
The syntax of these are defined by the standard so at least that syntax 
ought to be allowed (even if there are more work to actually do anything 
with charset and collation info).

-- 
/Dennis Björklund

Re: On hardcoded type aliases and typmod for user types

From

Martijn van Oosterhout

Date:

01 September 2005, 06:33:57

On Thu, Sep 01, 2005 at 11:18:04AM +0200, Dennis Bjorklund wrote:
> String types have 3 modifiers, the length, the charset and the collation.
> The syntax of these are defined by the standard so at least that syntax
> ought to be allowed (even if there are more work to actually do anything
> with charset and collation info).

From a quick browse in google it looks like:
  CHAR(4) COLLATE xxx CHARACTER SET yyy

is valid syntax. So I guess that becomes 0 or more modifiers. And a
single int32 is not going to cut it.

More fields (typmod2, typmod3)? Fields explicitly for this purpose
(typcollate and typcharset), array of int32?

Which would make the second argument to the typmod function ARRAY OF
(textpair) or some such. If the function doesn't accept that then this
is a quick indicator that no options are allowed.

Quick thing, should 'mytype' and 'mytype()' be considered the same and
should they default to typmod -1? Currently '()' is not even accepted.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: On hardcoded type aliases and typmod for user types

From

Martijn van Oosterhout

Date:

01 September 2005, 07:47:19

On Thu, Sep 01, 2005 at 11:12:26AM +0300, Hannu Krosing wrote:
> Maybe make the last one "WITH CHARACTER SET xxx" and promote WITH to a
> real keyword.
>
> It seems a good idea to have WITH as a real keyword anyway, as at least
> ANSI/ISO syntax for recursive queries seem to require it too.

Sorry, CHARACTER SET is defined by SQL standard. I don't understand
what it is there for though, I thought the point of UNICODE/UTF-8 was
to get rid of all this crap. I also can't find the bit that explains
what should happen if two strings of different character sets are
concatinated. The only thing I can think this useful for is default
input/output charset, overriding client_encoding, and internally
everything is still UNICODE.

The COLLATE stuff is neat, if we can get it work. Maybe CHARSET is a
roundabout way to specify the COLLATE order?

Incidently, I just downloaded the SQL99 spec and am slightly confused
by some of the things they'd added. Am I the only one?
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: On hardcoded type aliases and typmod for user types

From

Tom Lane

Date:

01 September 2005, 09:50:28

Martijn van Oosterhout <kleptog@svana.org> writes:
>    TYPMODFUNC =3D function( internal [, sometype ] ) RETURNS int32 or intar=
> ray

> Simply pass the (Node*) from the parser and let the function sort it
> out itself. Except now they have to be written in C. Is this
> unreasonable,

Nope.  You're not going to be writing any interesting datatypes without
using C, anyway.  I'd go with this one to minimize flexibility problems.

I'd be inclined to stick with the convention that internally typmod is
a single int32; that is really wired into way too many APIs to consider
changing.  varchar could do something like using 24 bits for the length
and 8 bits for an encoded indication of the charset.
        regards, tom lane

Re: On hardcoded type aliases and typmod for user types

From

Martijn van Oosterhout

Date:

01 September 2005, 13:01:38

On Thu, Sep 01, 2005 at 08:50:27AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > Simply pass the (Node*) from the parser and let the function sort it
> > out itself. Except now they have to be written in C. Is this
> > unreasonable,
>
> Nope.  You're not going to be writing any interesting datatypes without
> using C, anyway.  I'd go with this one to minimize flexibility problems.

Ack

> I'd be inclined to stick with the convention that internally typmod is
> a single int32; that is really wired into way too many APIs to consider
> changing.  varchar could do something like using 24 bits for the length
> and 8 bits for an encoded indication of the charset.

With the unfortunate effect that strings are limited to 16Mb instead of
1Gb. Not sure if people will be happy with that one. For my locale
experiments I used my taggedtypes module to embed the locale into the
data itself, I imagine something similar could be used.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: On hardcoded type aliases and typmod for user types

From

Tom Lane

Date:

01 September 2005, 13:46:27

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Thu, Sep 01, 2005 at 08:50:27AM -0400, Tom Lane wrote:
>> varchar could do something like using 24 bits for the length
>> and 8 bits for an encoded indication of the charset.

> With the unfortunate effect that strings are limited to 16Mb instead of
> 1Gb.

No, only that you can't declare a specific maxlength for a varchar that
exceeds 16Mb ... and guess what, there is already a smaller limit than
that, on the grounds that if a user writes varchar(1000000000) he needs
a cluebat anyway.  I think you might have momentarily confused typmod
with the varlena length word of an individual value.
        regards, tom lane

Re: On hardcoded type aliases and typmod for user types

From

Bruce Momjian

Date:

14 June 2006, 14:56:28

Thread added to TODO.

---------------------------------------------------------------------------

Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> Hi,
> 
> As part of previous discussions about typmod for user type, Tom
> mentioned that you would need to make type and function names
> equivalent. As it turns out, if you refactor a few rules, you can
> actually make it work and manage them seperately. For this to work the
> current "col_name_keyword" non-terminal has to be divided into few more
> categories. The criterion is mostly whether it is followed by a left
> parenthsis.
> 
> 1. Some can be moved to unreserved words, mostly the types BIT, VARCHAR,
> BIGINT etc since they don't need special rules anymore.
> 
> 2. Words that have special productions (eg CONVERT, COALESCE etc),
> these can still only be used as column names, not types or
> (user-defined) functions.
> 
> 3. Words which can be column names functions but not types. These never
> appear normally with a parenthesis, so they will be interpreted as a
> function if there is one. (eg SETOF, NATIONAL, etc)
> 
> 4. Words that can be column names and types but not functions. These
> are artifacts of the grammer due to the fact that VARYING is
> unrestricted. These are BIT, NCHAR, CHAR_P and CHARACTER.
> 
> After this, you can have user-functions that take an arbitrary set of
> parameters. There'll need to be a bit of code to verify the arguments.
> It results in a net gain of 15 keywords for functions and about 30 for
> types.
> 
> My question is, should users be able to create types schema.int4 and
> schema.integer simultaneously. Currently it allows you but it's not
> handled very well (\dT doesn't list both). Should this be allowed?
> Should aliasing for DEC and DECIMAL -> NUMERIC be done for
> user-defined types?
> 
> Have a nice day,
> -- 
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > tool for doing 5% of the work and then sitting around waiting for someone
> > else to do the other 95% so you can sue them.
-- End of PGP section, PGP failed!

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +