Thread: On hardcoded type aliases and typmod for user types
Hi, As part of previous discussions about typmod for user type, Tom mentioned that you would need to make type and function names equivalent. As it turns out, if you refactor a few rules, you can actually make it work and manage them seperately. For this to work the current "col_name_keyword" non-terminal has to be divided into few more categories. The criterion is mostly whether it is followed by a left parenthsis. 1. Some can be moved to unreserved words, mostly the types BIT, VARCHAR, BIGINT etc since they don't need special rules anymore. 2. Words that have special productions (eg CONVERT, COALESCE etc), these can still only be used as column names, not types or (user-defined) functions. 3. Words which can be column names functions but not types. These never appear normally with a parenthesis, so they will be interpreted as a function if there is one. (eg SETOF, NATIONAL, etc) 4. Words that can be column names and types but not functions. These are artifacts of the grammer due to the fact that VARYING is unrestricted. These are BIT, NCHAR, CHAR_P and CHARACTER. After this, you can have user-functions that take an arbitrary set of parameters. There'll need to be a bit of code to verify the arguments. It results in a net gain of 15 keywords for functions and about 30 for types. My question is, should users be able to create types schema.int4 and schema.integer simultaneously. Currently it allows you but it's not handled very well (\dT doesn't list both). Should this be allowed? Should aliasing for DEC and DECIMAL -> NUMERIC be done for user-defined types? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes: > My question is, should users be able to create types schema.int4 and > schema.integer simultaneously. Currently it allows you but it's not > handled very well (\dT doesn't list both). Should this be allowed? > Should aliasing for DEC and DECIMAL -> NUMERIC be done for > user-defined types? IMHO, ideally the aliasing should *only* apply to the built-in types. The current hack only approximates this (IIRC, the translation happens for any unqualified type name, independently of one's search path). One possible approach is to remove the aliasing translation from the grammar altogether, and add a notion of "alias" entries in pg_type that would be found through normal lookup and then replaced by the underlying type by parse analysis rather than by the grammar. We could not do this in the existing system because of the need to handle typmods for some of the aliases ... but maybe it would work given generalized typmod support. There are still a few special cases though, like CHARACTER VARYING, which seems like it more or less has to be wired into the grammar. BTW, the proposed refactoring sounds messier to me than does decreeing type and function names equivalent ... regards, tom lane
On Wed, Aug 31, 2005 at 11:11:04AM -0400, Tom Lane wrote: > IMHO, ideally the aliasing should *only* apply to the built-in types. > The current hack only approximates this (IIRC, the translation happens > for any unqualified type name, independently of one's search path). > > One possible approach is to remove the aliasing translation from the > grammar altogether, and add a notion of "alias" entries in pg_type that > would be found through normal lookup and then replaced by the underlying > type by parse analysis rather than by the grammar. We could not do this > in the existing system because of the need to handle typmods for some of > the aliases ... but maybe it would work given generalized typmod > support. There are still a few special cases though, like CHARACTER > VARYING, which seems like it more or less has to be wired into the > grammar. Yeah, I was thinking about alias entries. I was thinking that domains might already do a lot of the work. But then it's not really aliasing anymore. > BTW, the proposed refactoring sounds messier to me than does decreeing > type and function names equivalent ... Actually, it's not that bad. The non-terminals relating to types collapse into about three with only "xxx VARYING" and "DOUBLE PRECISION" and a few others remaining. The keywords are split into three sets like I mentioned. Then you make the rules for GenericType look enough like the expansion of func_expr, that bison can consider the context after before deciding it's a function or a type. The changes are not that great. The number of rules (according to the -v output) goes from 1610 to 1601 and states from 2810 to 2777. OTOH, a chunk of code moves from gram.y to the adt directory I guess. The grammar is just the beginning of the work. Though maybe the point is that we can take the easy way and implement the slightly more difficult if it turns out the be necessary. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Wed, Aug 31, 2005 at 11:11:04AM -0400, Tom Lane wrote: >> One possible approach is to remove the aliasing translation from the >> grammar altogether, and add a notion of "alias" entries in pg_type that >> would be found through normal lookup and then replaced by the underlying >> type by parse analysis rather than by the grammar. > Yeah, I was thinking about alias entries. I was thinking that domains > might already do a lot of the work. But then it's not really aliasing > anymore. Right, a domain isn't quite the same thing. But you could probably use domains temporarily for prototyping it. One reason that I think a domain isn't the same thing is that I believe domains don't have typmods. Although you could imagine a domain passing a typmod down to its base type, that's not what the spec expects AFAICS. You're supposed to writecreate domain mytype as varchar(4); There's nothing likecreate domain mytype(n) as varchar(n); in the spec (and no I don't really wish to invent it...) > Though maybe the point is that we can take the easy way and implement > the slightly more difficult if it turns out the be necessary. That seems fair to me. Now that we have knowledge in the archives about how to do it the hard way if needed, we can take the easy way until we run into an actual need for the hard way. I still like the idea of pushing the aliasing out of the grammar, though. Come to think of it, we could probably even handle the multiple-word stuff that way: let the grammar convert CHARACTER VARYING to "character varying" and have an alias with that name in the catalog. One thing you'd need to look at is that format_type is aware of the special properties of the alias names: at present they never need to be schema-qualified, but this would no longer be certainly the case with the aliasing approach. A possible answer is for format_type to work by replacing (say) INT4OID with the OID of the alias type that has the desired spelling, and then use the same TypeIsVisible test as is applied to any user type. Another thing that is involved there is not double-quoting the generated names ... we don't want it to emit "character varying" but the user-type path would do that. Hmm... actually there's a bit of an issue here, which is that it's not clear whether schema qualification makes sense for the multi-word type names. For instancepg_catalog.character varying seems both ugly and syntactically ambiguous. So maybe we need to stick to the present solution for the multi-word type names: they are expanded by the grammar to pre-qualified names, and so you cannot have a user type selected by such a name, and format_type keeps its current special case approach to generating them. regards, tom lane
On Wed, Aug 31, 2005 at 02:25:54PM -0400, Tom Lane wrote: > I still like the idea of pushing the aliasing out of the grammar, > though. Come to think of it, we could probably even handle the > multiple-word stuff that way: let the grammar convert CHARACTER VARYING > to "character varying" and have an alias with that name in the catalog. Currently, a user-defined type, function, table etc is one IDENT, I don't see any point in changing that. The standard wants "character varying" and we can support that but we don't need to extend that to user-defined types. If people really want that they can quote it. Besides, the character stuff is somewhat special as it has the "character set" stuff so will need special grammer anyway. > One thing you'd need to look at is that format_type is aware of the > special properties of the alias names: at present they never need to be > schema-qualified, but this would no longer be certainly the case with > the aliasing approach. A possible answer is for format_type to work by > replacing (say) INT4OID with the OID of the alias type that has the > desired spelling, and then use the same TypeIsVisible test as is applied > to any user type. Another thing that is involved there is not > double-quoting the generated names ... we don't want it to emit > "character varying" but the user-type path would do that. I was thinking actually of setting the type searching code to search pg_catalog before the normal search_path. The types being hardwired into the grammer essentially implied this so I thought I would avoid surprises. This ofcourse would mean that all built-in types would automatically override user-defined ones, which actually sucks if PostgreSQL keeps including more types by default. OTOH, then types in pg_catalog never need to be qualified, making it easy for format_type. Now i think about it it may not be a good idea, for all its benefits. Perhaps only doing it for multiword types. Damn special casing. > > Hmm... actually there's a bit of an issue here, which is that it's not > clear whether schema qualification makes sense for the multi-word type > names. For instance > pg_catalog.character varying It doesn't work. The current grammer, even now, treats anything schema qualified as non-special. You can't schema qualify char(4) even if you want to. Incidently, these typmod changes for user types would make this work as a side-effect. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes: > I was thinking actually of setting the type searching code to search > pg_catalog before the normal search_path. The types being hardwired > into the grammer essentially implied this so I thought I would avoid > surprises. That strikes me as an unnecessary reduction in flexibility. As long as we make the hardwired type names translate to qualified names (same as they do now) we don't have to assume any such thing. (What I might actually favor doing that for is operators, because the schema-qualified syntax for operators is so gross. But I don't see a need for it for type names.) >> Hmm... actually there's a bit of an issue here, which is that it's not >> clear whether schema qualification makes sense for the multi-word type >> names. For instance >> pg_catalog.character varying > It doesn't work. The current grammer, even now, treats anything schema > qualified as non-special. You can't schema qualify char(4) even if you > want to. Incidently, these typmod changes for user types would make > this work as a side-effect. Right. I think thatpg_catalog.varchar(n) is reasonable and should be accepted, but I'm fine with decreeing thatcharacter varying(n) will always be special non-schema-qualified syntax (which the grammar will in fact translate into the former). The point about character sets is a bit distressing; here we are designing a new general-purpose mechanism and we can already see cases it doesn't handle. Can we fix that? regards, tom lane
On Wed, Aug 31, 2005 at 05:14:13PM -0400, Tom Lane wrote: > That strikes me as an unnecessary reduction in flexibility. As long as > we make the hardwired type names translate to qualified names (same as > they do now) we don't have to assume any such thing. Ack, there's fortunatly only a handful of those. > The point about character sets is a bit distressing; here we are > designing a new general-purpose mechanism and we can already see > cases it doesn't handle. Can we fix that? Err, well. My thought was a certain group of type-suffix options would be permitted (only zero or one at a time), for example: WITH TIME ZONE WITHOUT TIME ZONE CHARACTER SET xxx And have the grammer accept these after any type. For example, the type NUMERIC WITH TIME ZONE would be syntactically valid but the code would then reject it. You have a issue then because the typmod function should then be able to return a completely different type because the system looked up "timestamp" and now the function determines that with that option, it should actually be "timestamptz". As for the specific mechanism, well, my options were (in the TYPE declaration statement: TYPMODFUNC = function( intarray [, sometype] ) RETURNS int32 or intarray This restricts the arguments between the brackets to integers, is this reasonable? The sometype would be something to handle the suffix options. (Text pair? option,value). Returning an intarray if a new type is allowed. TYPMODFUNC = function( recordtype [, sometype ] ) RETURNS int32 or intarray The record type could then indicate what's supported, except you can't pass a variable number of arguments (for NUMERIC). How about fill up from the front, leave NULLs for all the unfilled ones. The STRICT flag could tell if all fields need to be filled (ugh). TYPMODFUNC = function( internal [, sometype ] ) RETURNS int32 or intarray Simply pass the (Node*) from the parser and let the function sort it out itself. Except now they have to be written in C. Is this unreasonable, it's called fairly early on, all the issues with no valid snapshot apply here and you can't defer the evaluation till later. I'm not sure how to choose, they all handle the current situation fine but what do we want to allow users to do in the future? Is the SQL standard likely to come up with SOMETYPE(ident) as a declaration, in which case we need the second or third options. Ident can be converted to a constant string for these purposes. And then there's output to consider, currently timestamp etc have special cases. But if you're going to allow "CHARACTER SET xxx" to apply to any type, you need a way to reconstruct the values for output. Requireing the user the provide an inverse function is one (possibly unreliable) way. Storing the arguments directly is another. And is one int32 typmod sufficient? This character set per column has been talked about for a while, but where was the information going to be stored? There's several issues to be sorted out yet, I fear. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
On N, 2005-09-01 at 09:26 +0200, Martijn van Oosterhout wrote: > On Wed, Aug 31, 2005 at 05:14:13PM -0400, Tom Lane wrote: > > That strikes me as an unnecessary reduction in flexibility. As long as > > we make the hardwired type names translate to qualified names (same as > > they do now) we don't have to assume any such thing. > > Ack, there's fortunatly only a handful of those. > > > The point about character sets is a bit distressing; here we are > > designing a new general-purpose mechanism and we can already see > > cases it doesn't handle. Can we fix that? > > Err, well. My thought was a certain group of type-suffix options would > be permitted (only zero or one at a time), for example: > > WITH TIME ZONE > WITHOUT TIME ZONE > CHARACTER SET xxx > > And have the grammer accept these after any type. Maybe make the last one "WITH CHARACTER SET xxx" and promote WITH to a real keyword. It seems a good idea to have WITH as a real keyword anyway, as at least ANSI/ISO syntax for recursive queries seem to require it too. -- Hannu Krosing <hannu@skype.net>
On Thu, 1 Sep 2005, Martijn van Oosterhout wrote: > Err, well. My thought was a certain group of type-suffix options would > be permitted (only zero or one at a time), for example: > > WITH TIME ZONE > WITHOUT TIME ZONE > CHARACTER SET xxx String types have 3 modifiers, the length, the charset and the collation. The syntax of these are defined by the standard so at least that syntax ought to be allowed (even if there are more work to actually do anything with charset and collation info). -- /Dennis Björklund
On Thu, Sep 01, 2005 at 11:18:04AM +0200, Dennis Bjorklund wrote: > String types have 3 modifiers, the length, the charset and the collation. > The syntax of these are defined by the standard so at least that syntax > ought to be allowed (even if there are more work to actually do anything > with charset and collation info). From a quick browse in google it looks like: CHAR(4) COLLATE xxx CHARACTER SET yyy is valid syntax. So I guess that becomes 0 or more modifiers. And a single int32 is not going to cut it. More fields (typmod2, typmod3)? Fields explicitly for this purpose (typcollate and typcharset), array of int32? Which would make the second argument to the typmod function ARRAY OF (textpair) or some such. If the function doesn't accept that then this is a quick indicator that no options are allowed. Quick thing, should 'mytype' and 'mytype()' be considered the same and should they default to typmod -1? Currently '()' is not even accepted. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
On Thu, Sep 01, 2005 at 11:12:26AM +0300, Hannu Krosing wrote: > Maybe make the last one "WITH CHARACTER SET xxx" and promote WITH to a > real keyword. > > It seems a good idea to have WITH as a real keyword anyway, as at least > ANSI/ISO syntax for recursive queries seem to require it too. Sorry, CHARACTER SET is defined by SQL standard. I don't understand what it is there for though, I thought the point of UNICODE/UTF-8 was to get rid of all this crap. I also can't find the bit that explains what should happen if two strings of different character sets are concatinated. The only thing I can think this useful for is default input/output charset, overriding client_encoding, and internally everything is still UNICODE. The COLLATE stuff is neat, if we can get it work. Maybe CHARSET is a roundabout way to specify the COLLATE order? Incidently, I just downloaded the SQL99 spec and am slightly confused by some of the things they'd added. Am I the only one? -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes: > TYPMODFUNC =3D function( internal [, sometype ] ) RETURNS int32 or intar= > ray > Simply pass the (Node*) from the parser and let the function sort it > out itself. Except now they have to be written in C. Is this > unreasonable, Nope. You're not going to be writing any interesting datatypes without using C, anyway. I'd go with this one to minimize flexibility problems. I'd be inclined to stick with the convention that internally typmod is a single int32; that is really wired into way too many APIs to consider changing. varchar could do something like using 24 bits for the length and 8 bits for an encoded indication of the charset. regards, tom lane
On Thu, Sep 01, 2005 at 08:50:27AM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > Simply pass the (Node*) from the parser and let the function sort it > > out itself. Except now they have to be written in C. Is this > > unreasonable, > > Nope. You're not going to be writing any interesting datatypes without > using C, anyway. I'd go with this one to minimize flexibility problems. Ack > I'd be inclined to stick with the convention that internally typmod is > a single int32; that is really wired into way too many APIs to consider > changing. varchar could do something like using 24 bits for the length > and 8 bits for an encoded indication of the charset. With the unfortunate effect that strings are limited to 16Mb instead of 1Gb. Not sure if people will be happy with that one. For my locale experiments I used my taggedtypes module to embed the locale into the data itself, I imagine something similar could be used. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Thu, Sep 01, 2005 at 08:50:27AM -0400, Tom Lane wrote: >> varchar could do something like using 24 bits for the length >> and 8 bits for an encoded indication of the charset. > With the unfortunate effect that strings are limited to 16Mb instead of > 1Gb. No, only that you can't declare a specific maxlength for a varchar that exceeds 16Mb ... and guess what, there is already a smaller limit than that, on the grounds that if a user writes varchar(1000000000) he needs a cluebat anyway. I think you might have momentarily confused typmod with the varlena length word of an individual value. regards, tom lane
Thread added to TODO. --------------------------------------------------------------------------- Martijn van Oosterhout wrote: -- Start of PGP signed section. > Hi, > > As part of previous discussions about typmod for user type, Tom > mentioned that you would need to make type and function names > equivalent. As it turns out, if you refactor a few rules, you can > actually make it work and manage them seperately. For this to work the > current "col_name_keyword" non-terminal has to be divided into few more > categories. The criterion is mostly whether it is followed by a left > parenthsis. > > 1. Some can be moved to unreserved words, mostly the types BIT, VARCHAR, > BIGINT etc since they don't need special rules anymore. > > 2. Words that have special productions (eg CONVERT, COALESCE etc), > these can still only be used as column names, not types or > (user-defined) functions. > > 3. Words which can be column names functions but not types. These never > appear normally with a parenthesis, so they will be interpreted as a > function if there is one. (eg SETOF, NATIONAL, etc) > > 4. Words that can be column names and types but not functions. These > are artifacts of the grammer due to the fact that VARYING is > unrestricted. These are BIT, NCHAR, CHAR_P and CHARACTER. > > After this, you can have user-functions that take an arbitrary set of > parameters. There'll need to be a bit of code to verify the arguments. > It results in a net gain of 15 keywords for functions and about 30 for > types. > > My question is, should users be able to create types schema.int4 and > schema.integer simultaneously. Currently it allows you but it's not > handled very well (\dT doesn't list both). Should this be allowed? > Should aliasing for DEC and DECIMAL -> NUMERIC be done for > user-defined types? > > Have a nice day, > -- > Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > > tool for doing 5% of the work and then sitting around waiting for someone > > else to do the other 95% so you can sue them. -- End of PGP section, PGP failed! -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +