Parser translations and schemas - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Parser translations and schemas |
Date | |
Msg-id | 14341.1019667213@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Parser translations and schemas
|
List | pgsql-hackers |
Here's today's tidbit for those who like to argue about nitty little details of behavior ... Presently, the parser (in particular gram.y) has quite a few special transformations for certain type and function names. For example, You write You get char(N) bpchar trim(BOTH foo) btrim(foo) The question for the day is: should these transformations be applied to schema-qualified names? And should the parser force the transformed names to be looked up in the system schema (pg_catalog), or should it allow them to be searched for using the regular namespace search path? I want to make the following proposal: 1. Transformations are applied only to unqualified names. If you write a qualified name then it is treated as a plain-vanilla identifier and looked up in the catalogs without transformation, even if the name component happens to match a name that would be transformed standing alone. 2. If a transformation is applied then the resulting name will always be forced to be looked up in the system schema; ie, the output will effectively be "pg_catalog.something" not just "something". Some examples: You write You get char(N) pg_catalog.bpchar pg_catalog.char pg_catalog.char (not bpchar) real pg_catalog.float4 myschema.real myschema.real (not float4) trim(BOTH foo) pg_catalog.btrim(foo) pg_catalog.trim(BOTH foo) an error (since the special production allowing BOTH won't be used) I have a number of reasons for thinking that this is a reasonable way to go. Point one: transforming qualified names seems to violate the "principle of least surprise". If I write myschema.real I would not expect that to be converted to myschema.float4, especially if I weren't aware that Postgres internally calls REAL "float4". Point two: I don't believe that we need to do it to meet the letter of the SQL spec. AFAICT the spec treats all the names of built-in types and functions as keywords, not as names belonging to a system schema. So special behavior is required for TRIM(foo) but not for DEFINITION_SCHEMA.TRIM(foo). Point three: if we do transform a name, then we are expecting a particular system type or function to be selected, and we ought to ensure that that happens; thus explicitly qualifying the output name seems proper. Again, this seems less surprising than other alternatives. If I have a datatype myschema.float4 that I've put into the search path in front of pg_catalog, I think I'd be surprised to have it get picked when I write REAL. Another reason for doing it this way is that I think it's necessary for reversibility. For example, consider what format_type should put out when it's trying to write a special-cased type name. If it needs to emit numeric(10,4) then it *cannot* stick "pg_catalog." on the front of that --- the result wouldn't parse. (At least not unless we uglify the grammar a whole lot more to allow pg_catalog.FOO everywhere that just FOO currently has a special production.) So we need to make parsing rules that guarantee that numeric(10,4) will be interpreted as pg_catalog.numeric and not something else, regardless of the active search path. On the other hand, a plain user datatype that happens to be named "real" should be accessible as myschema.real without interference from the real->float4 transformation. A corner case that maybe requires more discussion is what about type and function names that are reserved per spec, but which we do not need any special transformation for? For example, the spec thinks that OCTET_LENGTH() is a keyword, but our implementation treats it as an ordinary function name. I feel that the grammar should not prefix "pg_catalog." to any name that it hasn't transformed or treated specially in any way, even if that name is reserved per spec. Note that this will not actually lead to any non-spec-compliant behavior as long as one allows the system to search pg_catalog before any user-provided schemas --- which is in fact the default behavior, as it's currently set up. Another point is that I believe that REAL should be transformed to pg_catalog.float4, but the quoted identifier "real" should not be. This would require a bit of surgery --- presently xlateSqlType is applied to pretty much everything whether it's a quoted identifier or not. But if we allow xlateSqlType to continue to work that way, then user-schema names that happen to match one of its target names are going to behave strangely. I think we will need to get rid of xlateSqlType/xlateSqlFunc and instead implement all the name transformations we want as special productions, so that the target names are shown as keywords in keywords.c. Without this, ruleutils.c will not have a clue that the user type name "real" needs to be quoted to keep it from being transformed. BTW: as the code stands today, gram.y is prefixing "pg_catalog." to system function names but not to type names, which is inconsistent; I hadn't thought carefully enough about these issues when I was hacking the grammar for schematized datatypes. So in any case I have some work to do... Comments, concerns, better ideas? regards, tom lane
pgsql-hackers by date: