Thread: Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Heikki Linnakangas

Date:

16 August 2017, 14:53:33

On 04/04/2017 10:13 AM, Daniel Gustafsson wrote:
>> On 04 Apr 2017, at 05:52, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>
>> Daniel Gustafsson wrote:
>>> Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit
>>> mixed.  Since the option defnames are all lowercased, either via IDENT, keyword
>>> rules or “by hand” with makeString(), using strcmp() is safe (and assumed to be
>>> so in quite a lot of places).
>>>
>>> While it’s not incorrect per se to use pg_strcasecmp(), it has the potential to
>>> hide a DefElem created with a mixed-case defname where it in other places is
>>> expected to be in lowercase, which may lead to subtle bugs.
>>>
>>> The attached patch refactors to use strcmp() consistently for option processing
>>> in the command code as a pre-emptive belts+suspenders move against such subtle
>>> bugs and to make the code more consistent.  Also reorders a few checks to have
>>> all in the same “format” and removes a comment related to the above.
>>>
>>> Tested with randomizing case on options in make check (not included in patch).
>>
>> Does it work correctly in the Turkish locale?
> 
> Yes, running make check with random case defnames under tr_TR works fine.

This no longer works:

postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (    TEMPLATE = pg_catalog.simple,    "STOPWORds" =
english
);
ERROR:  unrecognized simple dictionary parameter: "STOPWORds"

In hindsight, perhaps we should always have been more strict about that 
to begin wtih, but let's not break backwards-compatibility without a 
better reason. I didn't thoroughly check all of the cases here, to see 
if there are more like this.

It'd be nice to have some kind of a rule on when pg_strcasecmp should be 
used and when strcmp() is enough. Currently, by looking at the code, I 
can't tell.

- Heikki

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Tom Lane

Date:

16 August 2017, 18:51:32

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> This no longer works:

> postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (
>      TEMPLATE = pg_catalog.simple,
>      "STOPWORds" = english
> );
> ERROR:  unrecognized simple dictionary parameter: "STOPWORds"

> In hindsight, perhaps we should always have been more strict about that 
> to begin wtih, but let's not break backwards-compatibility without a 
> better reason. I didn't thoroughly check all of the cases here, to see 
> if there are more like this.

You have a point, but I'm not sure that this is such a bad compatibility
break as to be a reason not to change things to be more consistent.

> It'd be nice to have some kind of a rule on when pg_strcasecmp should be 
> used and when strcmp() is enough. Currently, by looking at the code, I 
> can't tell.

My thought is that if we are looking at words that have been through the
parser, then it should *always* be plain strcmp; we should expect that
the parser already did the appropriate case-folding.  If the user
prevented case-folding by double-quoting, I don't have a lot of sympathy
for any complaints about it.  Generally speaking, what we're dealing with
here is things that are logically keywords but we did not wish to make
them real parser keywords.  But in SQL, once you quote a keyword, it's
not a keyword at all anymore.  So I think the argument that quoting
"stopwords" should be legal at all in this context is pretty weak,
and the argument that quoting a weirdly-cased version of it should
work is even weaker.

pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
that somehow came in without going through the parser.
        regards, tom lane

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Robert Haas

Date:

16 August 2017, 19:04:57

On Wed, Aug 16, 2017 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You have a point, but I'm not sure that this is such a bad compatibility
> break as to be a reason not to change things to be more consistent.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

17 August 2017, 12:08:29

> On 16 Aug 2017, at 17:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> This no longer works:
> 
>> postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (
>>     TEMPLATE = pg_catalog.simple,
>>     "STOPWORds" = english
>> );
>> ERROR:  unrecognized simple dictionary parameter: "STOPWORds"
> 
>> In hindsight, perhaps we should always have been more strict about that 
>> to begin wtih, but let's not break backwards-compatibility without a 
>> better reason. I didn't thoroughly check all of the cases here, to see 
>> if there are more like this.
> 
> You have a point, but I'm not sure that this is such a bad compatibility
> break as to be a reason not to change things to be more consistent.

I agree with this, but I admittedly have no idea how common the above case
would be in the wild.

>> It'd be nice to have some kind of a rule on when pg_strcasecmp should be 
>> used and when strcmp() is enough. Currently, by looking at the code, I 
>> can't tell.
> 
> My thought is that if we are looking at words that have been through the
> parser, then it should *always* be plain strcmp; we should expect that
> the parser already did the appropriate case-folding.

+1

> pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
> that somehow came in without going through the parser.

In that case it would be up to the consumer of the data to handle required
case-folding for the expected input, so pg_strcasecmp or strcmp depending on
situation.

cheers ./daniel

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

05 September 2017, 11:34:43

> On 17 Aug 2017, at 11:08, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 16 Aug 2017, at 17:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>>> This no longer works:
>>
>>> postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (
>>>    TEMPLATE = pg_catalog.simple,
>>>    "STOPWORds" = english
>>> );
>>> ERROR:  unrecognized simple dictionary parameter: "STOPWORds"
>>
>>> In hindsight, perhaps we should always have been more strict about that
>>> to begin wtih, but let's not break backwards-compatibility without a
>>> better reason. I didn't thoroughly check all of the cases here, to see
>>> if there are more like this.
>>
>> You have a point, but I'm not sure that this is such a bad compatibility
>> break as to be a reason not to change things to be more consistent.
>
> I agree with this, but I admittedly have no idea how common the above case
> would be in the wild.
>
>>> It'd be nice to have some kind of a rule on when pg_strcasecmp should be
>>> used and when strcmp() is enough. Currently, by looking at the code, I
>>> can't tell.
>>
>> My thought is that if we are looking at words that have been through the
>> parser, then it should *always* be plain strcmp; we should expect that
>> the parser already did the appropriate case-folding.
>
> +1
>
>> pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
>> that somehow came in without going through the parser.
>
> In that case it would be up to the consumer of the data to handle required
> case-folding for the expected input, so pg_strcasecmp or strcmp depending on
> situation.

This patch has been marked “Waiting on Author”, but I’m not sure what the
concensus of this thread came to with regards to quoted keywords and backwards
compatibility.  There seems to be a 2-1 vote for allowing a break, and forcing
all keywords out of the parser to be casefolded. Any other opinions?

cheers ./daniel

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Michael Paquier

Date:

17 November 2017, 05:31:48

On Tue, Sep 5, 2017 at 5:34 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 17 Aug 2017, at 11:08, Daniel Gustafsson <daniel@yesql.se> wrote:
>>> On 16 Aug 2017, at 17:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> My thought is that if we are looking at words that have been through the
>>> parser, then it should *always* be plain strcmp; we should expect that
>>> the parser already did the appropriate case-folding.
>>
>> +1
>>
>>> pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
>>> that somehow came in without going through the parser.
>>
>> In that case it would be up to the consumer of the data to handle required
>> case-folding for the expected input, so pg_strcasecmp or strcmp depending on
>> situation.
>
> This patch has been marked “Waiting on Author”, but I’m not sure what the
> concensus of this thread came to with regards to quoted keywords and backwards
> compatibility.  There seems to be a 2-1 vote for allowing a break, and forcing
> all keywords out of the parser to be casefolded. Any other opinions?

This patch impacts the DDL grammar of aggregates, operators,
collations, text search, views, etc. Still I agree with the purpose of
this thread that it would be nice to get a more consistent behavior
even if it breaks some queries, so +1 for the argument with the
post-parser comparison which should use strcmp.

The patch needs a rebase, and there are a couple of places that need
an extra lookup I think:
$ git grep defname -- *.c | grep strcasecmp | wc -l     39

Switching to "waiting on author" for now.
Thanks,
--
Michael

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

27 November 2017, 18:11:39

> On 17 Nov 2017, at 03:31, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Tue, Sep 5, 2017 at 5:34 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
>>> On 17 Aug 2017, at 11:08, Daniel Gustafsson <daniel@yesql.se> wrote:
>>>> On 16 Aug 2017, at 17:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> My thought is that if we are looking at words that have been through the
>>>> parser, then it should *always* be plain strcmp; we should expect that
>>>> the parser already did the appropriate case-folding.
>>>
>>> +1
>>>
>>>> pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
>>>> that somehow came in without going through the parser.
>>>
>>> In that case it would be up to the consumer of the data to handle required
>>> case-folding for the expected input, so pg_strcasecmp or strcmp depending on
>>> situation.
>>
>> This patch has been marked “Waiting on Author”, but I’m not sure what the
>> concensus of this thread came to with regards to quoted keywords and backwards
>> compatibility.  There seems to be a 2-1 vote for allowing a break, and forcing
>> all keywords out of the parser to be casefolded. Any other opinions?
>
> This patch impacts the DDL grammar of aggregates, operators,
> collations, text search, views, etc. Still I agree with the purpose of
> this thread that it would be nice to get a more consistent behavior
> even if it breaks some queries, so +1 for the argument with the
> post-parser comparison which should use strcmp.

Thanks for reviewing!

> The patch needs a rebase, and there are a couple of places that need
> an extra lookup I think:
> $ git grep defname -- *.c | grep strcasecmp | wc -l
>      39

Rebased and handled a few more places which I had either missed in the last
round, or that had been added in the meantime.  “PARALLEL” in aggregatecmds.c
is intentionally using pg_strcasecmp() due to the old-style syntax which is
still supported. AFAICS this covers all relevant codepaths from the 39 above.

cheers ./daniel

Attachment

defname_strcmp-v2.patch

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Michael Paquier

Date:

28 November 2017, 04:07:15

On Tue, Nov 28, 2017 at 12:11 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
>> The patch needs a rebase, and there are a couple of places that need
>> an extra lookup I think:
>> $ git grep defname -- *.c | grep strcasecmp | wc -l
>>      39
>
> Rebased and handled a few more places which I had either missed in the last
> round, or that had been added in the meantime.  “PARALLEL” in aggregatecmds.c
> is intentionally using pg_strcasecmp() due to the old-style syntax which is
> still supported.

This meritates a comment. Code readers may get confused.

> AFAICS this covers all relevant codepaths from the 39 above.

I was just looking at the tsearch code which uses pg_strcmpcase, and
those are defined with makeDefElem() so you should switch to strcmp in
this case as well, no? If I patch the code myself I would get an error
when double-quoting, making those command more consistent with the
rest of what you are patching here:
create extension unaccent;
alter text search dictionary unaccent (Rules = 'unaccent'); -- ok
alter text search dictionary unaccent (RuLes = 'unaccent'); -- ok
alter text search dictionary unaccent ("Rules" = 'unaccent'); -- error
--
Michael

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Michael Paquier

Date:

29 November 2017, 06:29:33

On Tue, Nov 28, 2017 at 10:07 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I was just looking at the tsearch code which uses pg_strcmpcase, and
> those are defined with makeDefElem() so you should switch to strcmp in
> this case as well, no? If I patch the code myself I would get an error
> when double-quoting, making those command more consistent with the
> rest of what you are patching here:
> create extension unaccent;
> alter text search dictionary unaccent (Rules = 'unaccent'); -- ok
> alter text search dictionary unaccent (RuLes = 'unaccent'); -- ok
> alter text search dictionary unaccent ("Rules" = 'unaccent'); -- error

Daniel, I am waiting for your input on this one, and you did not have
much time to send an update. So I am moving this patch to next CF.
-- 
Michael

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

30 November 2017, 01:40:49

> On 29 Nov 2017, at 04:29, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Tue, Nov 28, 2017 at 10:07 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> I was just looking at the tsearch code which uses pg_strcmpcase, and
>> those are defined with makeDefElem() so you should switch to strcmp in
>> this case as well, no? If I patch the code myself I would get an error
>> when double-quoting, making those command more consistent with the
>> rest of what you are patching here:
>> create extension unaccent;
>> alter text search dictionary unaccent (Rules = 'unaccent'); -- ok
>> alter text search dictionary unaccent (RuLes = 'unaccent'); -- ok
>> alter text search dictionary unaccent ("Rules" = 'unaccent'); -- error
>
> Daniel, I am waiting for your input on this one, and you did not have
> much time to send an update.

Sorry for the slow updates, I managed to catch a nasty fever which turned my
brain closer to a mushroom soup than I’d like.

> So I am moving this patch to next CF.

Good move, thanks!

cheers ./daniel

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Michael Paquier

Date:

30 November 2017, 01:49:19

On Thu, Nov 30, 2017 at 7:40 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 29 Nov 2017, at 04:29, Michael Paquier <michael.paquier@gmail.com> wrote:
>>
>> On Tue, Nov 28, 2017 at 10:07 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> I was just looking at the tsearch code which uses pg_strcmpcase, and
>>> those are defined with makeDefElem() so you should switch to strcmp in
>>> this case as well, no? If I patch the code myself I would get an error
>>> when double-quoting, making those command more consistent with the
>>> rest of what you are patching here:
>>> create extension unaccent;
>>> alter text search dictionary unaccent (Rules = 'unaccent'); -- ok
>>> alter text search dictionary unaccent (RuLes = 'unaccent'); -- ok
>>> alter text search dictionary unaccent ("Rules" = 'unaccent'); -- error
>>
>> Daniel, I am waiting for your input on this one, and you did not have
>> much time to send an update.
>
> Sorry for the slow updates, I managed to catch a nasty fever which turned my
> brain closer to a mushroom soup than I’d like.

No problems. Take care.
--
Michael

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

30 November 2017, 14:40:15

> On 28 Nov 2017, at 02:07, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Tue, Nov 28, 2017 at 12:11 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
>>> The patch needs a rebase, and there are a couple of places that need
>>> an extra lookup I think:
>>> $ git grep defname -- *.c | grep strcasecmp | wc -l
>>>     39
>>
>> Rebased and handled a few more places which I had either missed in the last
>> round, or that had been added in the meantime.  “PARALLEL” in aggregatecmds.c
>> is intentionally using pg_strcasecmp() due to the old-style syntax which is
>> still supported.
>
> This meritates a comment. Code readers may get confused.

Good point, added.  I also sent a separate doc patch for this to -docs the
other day.

>> AFAICS this covers all relevant codepaths from the 39 above.
>
> I was just looking at the tsearch code which uses pg_strcmpcase, and
> those are defined with makeDefElem() so you should switch to strcmp in
> this case as well, no? If I patch the code myself I would get an error
> when double-quoting, making those command more consistent with the
> rest of what you are patching here:
> create extension unaccent;
> alter text search dictionary unaccent (Rules = 'unaccent'); -- ok
> alter text search dictionary unaccent (RuLes = 'unaccent'); -- ok
> alter text search dictionary unaccent ("Rules" = 'unaccent'); — error

For reasons unknown to me I had avoided poking in contrib/.  Attached patch
handles the additional defname comparisons in contrib that are applicable.

The remainder of the pg_strcasecmp() calls in the text search code are
operating on a defelem list created in deserialize_deflist() rather than in the
parser, so I opted for keeping that as is rather than casefolding in the list
generation.

cheers ./daniel

Attachment

defname_strcmp-v3.patch

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Robert Haas

Date:

30 November 2017, 22:14:31

On Thu, Nov 30, 2017 at 6:40 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
> For reasons unknown to me I had avoided poking in contrib/.  Attached patch
> handles the additional defname comparisons in contrib that are applicable.

I am having a bit of trouble understanding why the first hunk in
DefineAggregate() is taking PARALLEL as a special case.  The
documentation gives three separate synopses for CREATE AGGREGATE, but
parallel appears in all of them, and there are other options that do
too, so the comment doesn't really help me understand why it's special
as compared to other, similar options.

I think the changes in DefineView and ATExecSetRelOptions is wrong,
because transformRelOptions() is still using pg_strcasecmp.  With the
patch:

rhaas=# create view v(x) with ("Check_option"="local") as select 1;
CREATE VIEW
rhaas=# create view v(x) with ("check_option"="local") as select 1;
ERROR:  WITH CHECK OPTION is supported only on automatically updatable views
HINT:  Views that do not select from a single table or view are not
automatically updatable.

Oops.

Here are, for the record, examples of SQL that will be generate errors
or warnings with the patch, but not presently, with a note about which
function got changed at the C level to affect the behavior.

transformRelOptions/interpretOidsOption: create table a () with ("OiDs"=true);
DefineAggregate, second hunk: CREATE AGGREGATE avg (float8) (sfunc =
float8_accum, stype = float8[], finalfunc = float8_avg, initcond =
'{0,0,0}', parallel = 'sAfe');
DefineCollation: CREATE COLLATION stunk ("LoCaLeS" = "C");
compute_attributes_with_style: create function one() returns int as
$$select 1$$ language sql with ("isStrict" = 'true');
DefineOperator: create operator @ (procedure = pg_catalog.int4eq,
leftarg = int4, "RIGHTARG" = int4);
DefineType: create type awesome (input = int4in, "OuTpUt" = int4out);
validateWithCheckOption: create table t(a int, b text, unique(a));
create view x with (check_option = 'loCal') as select * from t;

I have not yet managed to figure out what the impact of the contrib
changes, or the text search changes in core, is.  This is partly a
lack of subject matter expertise, but the fact that the functions
being modified in contrib have a grand total of 0 lines of comments
between them does not help.  That's not this patch's fault, nor this
patch's job to fix, but it is a barrier to understanding.  I think it
would be nice to have a complete list of examples of what syntax this
patch is affecting.

I am actually pretty dubious that we want to do this.  I found one bug
already (see above), and I don't think there's any guarantee that it's
the only one, because it's pretty hard to be sure that none of the
remaining calls to pg_strcasecmp are being applied to any of these
values in some other part of the code.  I'm not sure that the backward
compatibility issue is a huge deal, but from my point of view this
carries a significant risk of introducing new bugs, might annoy users
who spell any of these keywords in all caps with surrounding quotation
marks, and really has no clear benefit that I can see.  The
originally-offered justification is that making this consistent would
help us avoid subtle bugs, but it seems at least as likely to CREATE
subtle bugs, and the status quo is AFAICT harming nobody.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Refactoring identifier checks to consistently usestrcmp

From

Stephen Frost

Date:

07 January 2018, 03:17:33

Greetings Michael, Daniel, all,

* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Fri, Dec 1, 2017 at 4:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > I think the changes in DefineView and ATExecSetRelOptions is wrong,
> > because transformRelOptions() is still using pg_strcasecmp.  With the
> > patch:
> >
> > rhaas=# create view v(x) with ("Check_option"="local") as select 1;
> > CREATE VIEW
> > rhaas=# create view v(x) with ("check_option"="local") as select 1;
> > ERROR:  WITH CHECK OPTION is supported only on automatically updatable views
> > HINT:  Views that do not select from a single table or view are not
> > automatically updatable.
> >
> > Oops.
>
> I am getting the feeling that there could be other issues than this
> one... If this patch ever gets integrated, I think that this should
> actually be shaped as two patches:
> - One patch with the set of DDL queries taking advantage of the
> current grammar with pg_strcasecmp.
> - A second patch that does the switch from pg_strcasecmp to strcmp,
> and allows checking which paths of patch 1 are getting changed.
> Patch 1 is the hard part to figure out all the possible patterns that
> could be changed.
>
> > I am actually pretty dubious that we want to do this.  I found one bug
> > already (see above), and I don't think there's any guarantee that it's
> > the only one, because it's pretty hard to be sure that none of the
> > remaining calls to pg_strcasecmp are being applied to any of these
> > values in some other part of the code.  I'm not sure that the backward
> > compatibility issue is a huge deal, but from my point of view this
> > carries a significant risk of introducing new bugs, might annoy users
> > who spell any of these keywords in all caps with surrounding quotation
> > marks, and really has no clear benefit that I can see.  The
> > originally-offered justification is that making this consistent would
> > help us avoid subtle bugs, but it seems at least as likely to CREATE
> > subtle bugs, and the status quo is AFAICT harming nobody.
>
> Changing opinion here ;)
> Yes, I agree that the risk of bugs may not be worth the compatibility
> effort at the end. I still see value in this patch for long-term
> purposes by making the code more consistent though.

Looks like this discussion has progressed to where this patch should
really be marked as Rejected.  Does anyone else want to voice an opinion
regarding it, or perhaps Daniel could post a rebuttal to the concerns
raised here?

Thinking through it, for my own 2c on this, I tend to agree with Tom
that, really, we should be using strcmp() for anything coming out of the
parser and that the backward-compatibility break from that is
acceptable.  I also understand Robert's concern that there may be other
bugs hiding and I wonder if there might be a way to check for them,
though no great ideas spring to mind offhand.  Would be great to hear
your thoughts, Daniel, so leaving this in Waiting on Author for now.

Thanks!

Stephen

Attachment

signature.asc

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Tom Lane

Date:

07 January 2018, 03:38:03

Stephen Frost <sfrost@snowman.net> writes:
> Thinking through it, for my own 2c on this, I tend to agree with Tom
> that, really, we should be using strcmp() for anything coming out of the
> parser and that the backward-compatibility break from that is
> acceptable.  I also understand Robert's concern that there may be other
> bugs hiding and I wonder if there might be a way to check for them,
> though no great ideas spring to mind offhand.  Would be great to hear
> your thoughts, Daniel, so leaving this in Waiting on Author for now.

FWIW, I don't especially agree with Robert's position, because I think
he is ignoring the argument that it's a bug that some things are
case-sensitive and other seemingly similar things are not.

It's definitely concerning that the submitted patch introduced a new bug,
but we have seldom taken the position that bugs in an initial submission
are sufficient grounds for rejecting the entire concept.

ISTM that if this patch gets rid of a large fraction of the uses of
pg_strcasecmp in the backend, which is what I expect it should, then
it might not be out of reach to go through all the surviving ones to
make sure they are not processing strings that originate in the parser.

            regards, tom lane

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Michael Paquier

Date:

07 January 2018, 15:43:28

On Sun, Jan 7, 2018 at 9:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ISTM that if this patch gets rid of a large fraction of the uses of
> pg_strcasecmp in the backend, which is what I expect it should, then
> it might not be out of reach to go through all the surviving ones to
> make sure they are not processing strings that originate in the parser.

Yeah, that's why I think that it is important for this patch to
provide as well tests to make sure that all the code paths are working
as they should with this patch.
-- 
Michael

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Robert Haas

Date:

08 January 2018, 19:27:40

On Sat, Jan 6, 2018 at 7:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's definitely concerning that the submitted patch introduced a new bug,
> but we have seldom taken the position that bugs in an initial submission
> are sufficient grounds for rejecting the entire concept.

Fair point.  I withdraw my categorical -1 vote and replace it with the
statement that the patch hasn't been sufficiently-carefully checked by
the patch author or other reviewers for bugs to consider committing it
-- nor has anyone taken the trouble to list with precision all of the
places where the behavior will change.  I think the latter is
absolutely indispensable, which is why I started to compile such a
list in my previous post.  The former needs to be done as well, of
course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

08 January 2018, 19:44:53

> On 08 Jan 2018, at 17:27, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Jan 6, 2018 at 7:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's definitely concerning that the submitted patch introduced a new bug,
>> but we have seldom taken the position that bugs in an initial submission
>> are sufficient grounds for rejecting the entire concept.
>
> Fair point.  I withdraw my categorical -1 vote and replace it with the
> statement that the patch hasn't been sufficiently-carefully checked by
> the patch author or other reviewers for bugs to consider committing it
> -- nor has anyone taken the trouble to list with precision all of the
> places where the behavior will change.  I think the latter is
> absolutely indispensable, which is why I started to compile such a
> list in my previous post.

Sorry for dropping off the radar, my available time for hacking was severely
limited (well, down to zero really).

Seeing the surface again I’ve started on the complete list and hope to have
something quite soon, and I think (as seems to be the consensus on this thread)
that that list is a prerequisite for the review.

cheers ./daniel

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

11 January 2018, 01:49:47

> On 07 Jan 2018, at 01:17, Stephen Frost <sfrost@snowman.net> wrote:

> Would be great to hear your thoughts, Daniel, so leaving this in Waiting on
> Author for now.

I am still of the opinion that it’s worth going through and ensuring that we
are matching against parser output in a consistent way, if only to lower the
risk of subtle hard to find bugs.  This patch is a first stab, but there are
more string comparisons to consider should we decide to ahead with this patch
(or the general approach in some other fashion than this implementation).

Collating the responses so far with an updated patch, thanks to everyone who
has reviewed this patch!  Sorry being slow to respond, $life hasn’t allowed for
much hacking lately.

> On 30 Nov 2017, at 20:14, Robert Haas <robertmhaas@gmail.com> wrote:

> I am having a bit of trouble understanding why the first hunk in
> DefineAggregate() is taking PARALLEL as a special case.  The
> documentation gives three separate synopses for CREATE AGGREGATE, but
> parallel appears in all of them, and there are other options that do
> too, so the comment doesn't really help me understand why it's special
> as compared to other, similar options.

The reason for the special handling of parallel in the old-style CREATE
AGGREGATE syntax is that it’s parsed with IDENT rather than ColLabel.  AFAICT
that works for all parameters except parallel as it’s an unreserved keyword
since 7aea8e4f2da.  Extending old_aggr_elem to handle PARALLEL separately seems
to work for not requiring the quoting, but I may be missing something as the
parser hacking isn’t my speciality.  The patch has been updated with this (and
the documentation + tests changes to go with it) but it clearly needs a close
eye.

> I think the changes in DefineView and ATExecSetRelOptions is wrong,
> because transformRelOptions() is still using pg_strcasecmp.

Correct, I had missed that reloptions case, sorry about that.  The hunk in
AlterTableGetRelOptionsLockLevel() seems correct to me, and testing didn’t
break anything, but I wonder if there is a case where it would need
pg_strncasecmp?

> On 08 Jan 2018, at 17:27, Robert Haas <robertmhaas@gmail.com> wrote:

> nor has anyone taken the trouble to list with precision all of the
> places where the behavior will change.  I think the latter is
> absolutely indispensable,

I had a look at the available commands in postgres and compiled a list of them
in options.sql based on if they have options, and how those options and matched
(case sensitive of insensitive).  The queries in the file are nonsensical, they
are just meant to show the handling of the options.  The idea was to illustrate
the impact of this proposal by running examples.  Running this file with and
without the patches applied shows the following commands being affected:

    CREATE TABLE
    CREATE TABLE AS
    ALTER TABLE
    CREATE TABLESPACE
    ALTER TABLESPACE
    CREATE VIEW
    ALTER VIEW
    CREATE INDEX
    ALTER INDEX
    CREATE AGGREGATE (new and old syntax)
    CREATE COLLATION
    CREATE FUNCTION
    CREATE OPERATOR
    ALTER OPERATOR
    CREATE TYPE
    CREATE TEXT SEARCH CONFIGURATION
    CREATE TEXT SEARCH DICTIONARY
    ALTER TEXT SEARCH DICTIONARY

The output with the patch is attached as options_patched.out, and the output
from master as options_master.out.  Diffing the two files is rather helpful I
think.

All possible options aren’t excercised, and I hope I didn’t miss any statements
that should’ve been covered.  The options.sql file makes it quite obvious that
we currently have quite a mix of case sensitive and insensitive commands.  Is
this in line with what you were thinking Robert?  I’m definitely up for better
ideas.

One open question from this excercise is how to write a good test for this.  It
can either be made part of the already existing test queries or a separate
suite.  I’m leaning on the latter simply because the case-flipping existing
tests seems like something that can be cleaned up years from now accidentally
because it looks odd.

> On 07 Jan 2018, at 01:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> ISTM that if this patch gets rid of a large fraction of the uses of
> pg_strcasecmp in the backend, which is what I expect it should, then
> it might not be out of reach to go through all the surviving ones to
> make sure they are not processing strings that originate in the parser.

I completely agree, but I have not expanded this patch with this.  One case was
actually instead put back, since I did see a mention in the documentation for
isStrict and isCachable they aren’t case sensitive.  Instead I removed that
section from the documentation in a hope that we some day can make these case
sensitive.

Additionally, while poking at the commands I noticed an inconsistency in
checking for conflicting options in CREATE FUNCTION.  The below statement is
correctly erroring on IMMUTABLE and VOLATILE being conflicting:

    create function int42(cstring) returns int42 AS 'int4in'
    language internal strict immutable volatile;

If you however combine the new and old syntax, the statement works and the WITH
option wins by overwriting any previous value.  The below statement creates an
IMMUTABLE function:

    create function int42(cstring) returns int42 AS 'int4in'
    language internal strict volatile with (isstrict, iscachable);

It’s arguably a pretty silly statement to write, and may very well never have
been seen in production, but I would still expect that query to error out.
Attached volatility.patch fixes it in a hacky way to behave like the former
statement, there is probably a cleaner way but I didn’t want to spend too much
time on it before hearing if others think it’s not worth fixing.

Thanks for looking at this!

cheers ./daniel

Attachment

Re: [HACKERS] Refactoring identifier checks to consistently usestrcmp

From

Michael Paquier

Date:

11 January 2018, 11:01:03

On Wed, Jan 10, 2018 at 11:49:47PM +0100, Daniel Gustafsson wrote:
>> On 08 Jan 2018, at 17:27, Robert Haas <robertmhaas@gmail.com> wrote:
>> nor has anyone taken the trouble to list with precision all of the
>> places where the behavior will change.  I think the latter is
>> absolutely indispensable,
>
> I had a look at the available commands in postgres and compiled a list
> of them in options.sql based on if they have options, and how those
> options and matched (case sensitive of insensitive).  The queries in
> the file are nonsensical, they are just meant to show the handling of
> the options.  The idea was to illustrate the impact of this proposal
> by running examples. Running this file with and without the patches
> applied shows the following commands being affected:
>
> <snip>
>
> The output with the patch is attached as options_patched.out, and the output
> from master as options_master.out.  Diffing the two files is rather helpful I
> think.

Thanks. This is saving me hours of lookups and testing during the
review, as now reviewers just have to map you test series with the code
modified. I can't help to notice that tests for code paths with
contrib modules are missing. This brings up the point: do we want those
tests to be in the patch? I would like to think that a special section
dedicated to option compatibility for each command would be welcome to
track which grammar is supported and which grammar is not supported.

> All possible options aren’t excercised, and I hope I didn’t miss any
> statements that should’ve been covered.  The options.sql file makes it
> quite obvious that we currently have quite a mix of case sensitive and
> insensitive commands.  Is this in line with what you were thinking
> Robert?  I’m definitely up for better ideas.

I would think that one option tested in the series is fine to cover
grounds. Most of those code paths are made of a series of if/elif using
strcmp so all of them should be consistent..

> One open question from this excercise is how to write a good test for
> this.  It can either be made part of the already existing test queries
> or a separate suite.  I’m leaning on the latter simply because the
> case-flipping existing tests seems like something that can be cleaned
> up years from now accidentally because it looks odd.

Adding them into src/test/regress/ sounds like a good plan to me.

> If you however combine the new and old syntax, the statement works and the WITH
> option wins by overwriting any previous value.  The below statement creates an
> IMMUTABLE function:
>
>     create function int42(cstring) returns int42 AS 'int4in'
>     language internal strict volatile with (isstrict, iscachable);

Here is another idea: nuking isstrict and iscachable from CREATE
FUNCTION syntax and forget about them. I would be tempted of the opinion
to do that before the rest.

-old_aggr_elem:  IDENT '=' def_arg
+old_aggr_elem:  PARALLEL '=' def_arg
+               {
+                   $$ = makeDefElem("parallel", (Node *)$3, @1);
+               }
+               | IDENT '=' def_arg
Nit: alphabetical order.

I have spent a couple of hours reviewing all the calls to pg_strcasecmp,
and the only thing I have noticed is in transformRelOptions(), where the
namespace string should be evaluated as well by strcmp, no? On HEAD:
=# create table a1 (a text) with ( "Toast"."Autovacuum_vacuum_cost_limit" = 1);
CREATE TABLE
=# create table a2 (a text) with ( "Toast.Autovacuum_vacuum_cost_limit" = 1);
ERROR:  22023: unrecognized parameter "Toast.Autovacuum_vacuum_cost_limit"
=# create table a3 (a text) with ( "Toast".autovacuum_vacuum_cost_limit = 1);
CREATE TABLE
=# create table a4 (a text) with ( toast."Autovacuum_vacuum_cost_limit" = 1);
CREATE TABLE

With your patch:
=# create table a1 (a text) with ( "Toast"."Autovacuum_vacuum_cost_limit" = 1);
ERROR:  22023: unrecognized parameter "Autovacuum_vacuum_cost_limit"
=# create table a2 (a text) with ( "Toast.Autovacuum_vacuum_cost_limit" = 1);
ERROR:  22023: unrecognized parameter "Toast.Autovacuum_vacuum_cost_limit"
=# create table a3 (a text) with ( "Toast".autovacuum_vacuum_cost_limit = 1);
CREATE TABLE
=# create table a4 (a text) with ( toast."Autovacuum_vacuum_cost_limit" = 1);
ERROR:  22023: unrecognized parameter "Autovacuum_vacuum_cost_limit"
LOCATION:  parseRelOptions, reloptions.c:1103

So per my set of examples, the evaluation of the schema name should fail
when creating relation a3 with your patch applied. As the patch does
things, the experience for the user is not consistent, and the schema
name goes through parser via reloption_elem using makeDefElemExtended,
so this should use strcmp.
--
Michael

Attachment

signature.asc

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

13 January 2018, 01:35:48

> On 11 Jan 2018, at 09:01, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Wed, Jan 10, 2018 at 11:49:47PM +0100, Daniel Gustafsson wrote:
>>> On 08 Jan 2018, at 17:27, Robert Haas <robertmhaas@gmail.com> wrote:
>>> nor has anyone taken the trouble to list with precision all of the
>>> places where the behavior will change.  I think the latter is
>>> absolutely indispensable,
>>
>> I had a look at the available commands in postgres and compiled a list
>> of them in options.sql based on if they have options, and how those
>> options and matched (case sensitive of insensitive).  The queries in
>> the file are nonsensical, they are just meant to show the handling of
>> the options.  The idea was to illustrate the impact of this proposal
>> by running examples. Running this file with and without the patches
>> applied shows the following commands being affected:
>>
>> <snip>
>>
>> The output with the patch is attached as options_patched.out, and the output
>> from master as options_master.out.  Diffing the two files is rather helpful I
>> think.
>
> Thanks. This is saving me hours of lookups and testing during the
> review, as now reviewers just have to map you test series with the code
> modified.

Well, being the one proposing the patch I should be the one spending those
hours and not you.  Sorry for not including in the original submission.

> I can't help to notice that tests for code paths with
> contrib modules are missing. This brings up the point: do we want those
> tests to be in the patch?

I left them out since they are version of ALTER TEXT SEARCH .. rather than new
statements.

> I would like to think that a special section
> dedicated to option compatibility for each command would be welcome to
> track which grammar is supported and which grammar is not supported.

I’m not sure I follow?

>> One open question from this excercise is how to write a good test for
>> this.  It can either be made part of the already existing test queries
>> or a separate suite.  I’m leaning on the latter simply because the
>> case-flipping existing tests seems like something that can be cleaned
>> up years from now accidentally because it looks odd.
>
> Adding them into src/test/regress/ sounds like a good plan to me.

If there is interest in this patch now that the list exists and aids review, I
can turn the list into a proper test that makes a little more sense than the
current list which is rather aimed at helping reviewers.

>> If you however combine the new and old syntax, the statement works and the WITH
>> option wins by overwriting any previous value.  The below statement creates an
>> IMMUTABLE function:
>>
>>    create function int42(cstring) returns int42 AS 'int4in'
>>    language internal strict volatile with (isstrict, iscachable);
>
> Here is another idea: nuking isstrict and iscachable from CREATE
> FUNCTION syntax and forget about them. I would be tempted of the opinion
> to do that before the rest.

Thats certainly an option, I have no idea about the prevalence in real life
production environments to have much an opinion to offer.

>
> -old_aggr_elem:  IDENT '=' def_arg
> +old_aggr_elem:  PARALLEL '=' def_arg
> +               {
> +                   $$ = makeDefElem("parallel", (Node *)$3, @1);
> +               }
> +               | IDENT '=' def_arg
> Nit: alphabetical order.

Fixed.

> I have spent a couple of hours reviewing all the calls to pg_strcasecmp,
> and the only thing I have noticed is in transformRelOptions(), where the
> namespace string should be evaluated as well by strcmp, no?
<snip>
> So per my set of examples, the evaluation of the schema name should fail
> when creating relation a3 with your patch applied. As the patch does
> things, the experience for the user is not consistent, and the schema
> name goes through parser via reloption_elem using makeDefElemExtended,
> so this should use strcmp.

I believe you are entirely correct, the attached v5 patch is updated with this
behavior.  The volatility patch is unchanged by this so didn’t submit a new
version of that one.

Thanks for the review!

cheers ./daniel

Attachment

defname_strcmp-v5.patch

Re: [HACKERS] Refactoring identifier checks to consistently usestrcmp

From

Michael Paquier

Date:

15 January 2018, 04:33:41

On Fri, Jan 12, 2018 at 11:35:48PM +0100, Daniel Gustafsson wrote:
> On 11 Jan 2018, at 09:01, Michael Paquier <michael.paquier@gmail.com> wrote:
>> I would like to think that a special section
>> dedicated to option compatibility for each command would be welcome to
>> track which grammar is supported and which grammar is not supported.
>
> I’m not sure I follow?
>
>>> One open question from this excercise is how to write a good test for
>>> this.  It can either be made part of the already existing test queries
>>> or a separate suite.  I’m leaning on the latter simply because the
>>> case-flipping existing tests seems like something that can be cleaned
>>> up years from now accidentally because it looks odd.
>>
>> Adding them into src/test/regress/ sounds like a good plan to me.
>
> If there is interest in this patch now that the list exists and aids review, I
> can turn the list into a proper test that makes a little more sense than the
> current list which is rather aimed at helping reviewers.

Sorry if my words were hard to catch here. What I mean here is to add in
each command's test file the set of commands which check the
compatibility. There is no need to test all the options in my opinion,
as just testing one option is enoughto show the purpose. So for example
to cover the grounds of DefineAggregate(), you could add a set of
commands in create_aggregate.sql. For DefineCollation(), those can go in
collate.sql, etc.

>> Here is another idea: nuking isstrict and iscachable from CREATE
>> FUNCTION syntax and forget about them. I would be tempted of the opinion
>> to do that before the rest.
>
> Thats certainly an option, I have no idea about the prevalence in real life
> production environments to have much an opinion to offer.

Please let me raise a new thread about this point with a proper
patch. That's rather separate to the work you are doing here, even if
those parameters are using pg_strcasecmp().
--
Michael

Attachment

signature.asc

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

22 January 2018, 14:48:45

> On 15 Jan 2018, at 02:33, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Fri, Jan 12, 2018 at 11:35:48PM +0100, Daniel Gustafsson wrote:
>> On 11 Jan 2018, at 09:01, Michael Paquier <michael.paquier@gmail.com> wrote:

>>>> One open question from this excercise is how to write a good test for
>>>> this.  It can either be made part of the already existing test queries
>>>> or a separate suite.  I’m leaning on the latter simply because the
>>>> case-flipping existing tests seems like something that can be cleaned
>>>> up years from now accidentally because it looks odd.
>>>
>>> Adding them into src/test/regress/ sounds like a good plan to me.
>>
>> If there is interest in this patch now that the list exists and aids review, I
>> can turn the list into a proper test that makes a little more sense than the
>> current list which is rather aimed at helping reviewers.
>
> Sorry if my words were hard to catch here. What I mean here is to add in
> each command's test file the set of commands which check the
> compatibility. There is no need to test all the options in my opinion,
> as just testing one option is enoughto show the purpose. So for example
> to cover the grounds of DefineAggregate(), you could add a set of
> commands in create_aggregate.sql. For DefineCollation(), those can go in
> collate.sql, etc.

Gotcha.  I’ve added some tests to the patch.  The test for CREATE FUNCTION was
omitted for now awaiting the outcome of the discussion around isStrict
isCachable.

Not sure how much they’re worth in "make check” though as it’s quite easy to
add a new option checked with pg_strcasecmp which then isn’t tested.  Still, it
might aid review so definitely worth it.

cheers ./daniel

Attachment

defname_strcmp-v6.patch

Re: [HACKERS] Refactoring identifier checks to consistently usestrcmp

From

Michael Paquier

Date:

23 January 2018, 07:52:04

On Mon, Jan 22, 2018 at 12:48:45PM +0100, Daniel Gustafsson wrote:
> Gotcha.  I’ve added some tests to the patch.  The test for CREATE
> FUNCTION was omitted for now awaiting the outcome of the discussion
> around isStrict and isCachable.

That makes sense.

> Not sure how much they’re worth in "make check” though as it’s quite
> easy to add a new option checked with pg_strcasecmp which then isn’t
> tested. Still, it might aid review so definitely worth it.

Thanks for making those, this eases the review lookups. There are a
couple of code paths that remained untested:
- contrib/unaccent/
- contrib/dict_xsyn/
- contrib/dict_int/
- The combinations of toast reloptions is pretty particular as option
namespaces also go through pg_strcasecmp, so I think that those should
be tested as well (your patch includes a test for fillfactor, which is a
good base by the way).
- check_option for CREATE VIEW and ALTER VIEW SET.
- ALTER TEXT SEARCH CONFIGURATION for copy/parser options.
- CREATE TYPE AS RANGE with DefineRange().

There are as well two things I have spotted on the way:
1) fillRelOptions() can safely use strcmp.
2) indexam.sgml mentions using pg_strcasecmp() for consistency with the
core code when defining amproperty for an index AM. Well, with this
patch I think that for consistency with the core code that would involve
using strcmp instead, extension developers can of course still use
pg_strcasecmp.
Those are changed as well in the attached, which applies on top of your
v6. I have added as well in it the tests I spotted were missing. If this
looks right to you, I am fine to switch this patch as ready for
committer, without considering the issues with isCachable and isStrict
in CREATE FUNCTION of course.
--
Michael

Attachment

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

23 January 2018, 18:22:22

> On 23 Jan 2018, at 05:52, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Mon, Jan 22, 2018 at 12:48:45PM +0100, Daniel Gustafsson wrote:
>
>> Not sure how much they’re worth in "make check” though as it’s quite
>> easy to add a new option checked with pg_strcasecmp which then isn’t
>> tested. Still, it might aid review so definitely worth it.
>
> Thanks for making those, this eases the review lookups. There are a
> couple of code paths that remained untested:
> - contrib/unaccent/
> - contrib/dict_xsyn/
> - contrib/dict_int/
> - The combinations of toast reloptions is pretty particular as option
> namespaces also go through pg_strcasecmp, so I think that those should
> be tested as well (your patch includes a test for fillfactor, which is a
> good base by the way).
> - check_option for CREATE VIEW and ALTER VIEW SET.
> - ALTER TEXT SEARCH CONFIGURATION for copy/parser options.
> - CREATE TYPE AS RANGE with DefineRange().

Thanks, those are good points.

> There are as well two things I have spotted on the way:
> 1) fillRelOptions() can safely use strcmp.

Yes, I believe you’re right.

> 2) indexam.sgml mentions using pg_strcasecmp() for consistency with the
> core code when defining amproperty for an index AM. Well, with this
> patch I think that for consistency with the core code that would involve
> using strcmp instead, extension developers can of course still use
> pg_strcasecmp.

That part I’m less sure about, the propname will not be casefolded by the
parser so pg_strcasecmp() should still be the recommendation here no?

> Those are changed as well in the attached, which applies on top of your
> v6. I have added as well in it the tests I spotted were missing. If this
> looks right to you, I am fine to switch this patch as ready for
> committer, without considering the issues with isCachable and isStrict
> in CREATE FUNCTION of course.

Apart from the amproperty hunk, I’m definately +1 on adding your patch on top
of my v6 one.  Thanks for all your help and review!

cheers ./daniel

Re: [HACKERS] Refactoring identifier checks to consistently usestrcmp

From

Michael Paquier

Date:

24 January 2018, 04:37:49

On Tue, Jan 23, 2018 at 04:22:22PM +0100, Daniel Gustafsson wrote:
> On 23 Jan 2018, at 05:52, Michael Paquier <michael.paquier@gmail.com> wrote:
>> 2) indexam.sgml mentions using pg_strcasecmp() for consistency with the
>> core code when defining amproperty for an index AM. Well, with this
>> patch I think that for consistency with the core code that would involve
>> using strcmp instead, extension developers can of course still use
>> pg_strcasecmp.
>
> That part I’m less sure about, the propname will not be casefolded by the
> parser so pg_strcasecmp() should still be the recommendation here no?

Yes, you are right. I had a brain fade here as all the option names here
go through SQL-callable functions.

>> Those are changed as well in the attached, which applies on top of your
>> v6. I have added as well in it the tests I spotted were missing. If this
>> looks right to you, I am fine to switch this patch as ready for
>> committer, without considering the issues with isCachable and isStrict
>> in CREATE FUNCTION of course.
>
> Apart from the amproperty hunk, I’m definately +1 on adding your patch on top
> of my v6 one.  Thanks for all your help and review!

OK. Could you publish a v7? I will switch the entry as ready for
committer.
--
Michael

Attachment

signature.asc

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

24 January 2018, 11:47:50

> On 24 Jan 2018, at 02:37, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Jan 23, 2018 at 04:22:22PM +0100, Daniel Gustafsson wrote:
>> On 23 Jan 2018, at 05:52, Michael Paquier <michael.paquier@gmail.com> wrote:

>>> Those are changed as well in the attached, which applies on top of your
>>> v6. I have added as well in it the tests I spotted were missing. If this
>>> looks right to you, I am fine to switch this patch as ready for
>>> committer, without considering the issues with isCachable and isStrict
>>> in CREATE FUNCTION of course.
>>
>> Apart from the amproperty hunk, I’m definately +1 on adding your patch on top
>> of my v6 one.  Thanks for all your help and review!
>
> OK. Could you publish a v7? I will switch the entry as ready for
> committer.

Attached is a rebased v7 patch which has your amendments (minus propname) which
passes make check without errors.

The volatility patch is also rebased included, but there the discussion whether
to keep or drop the deprecated syntax first needs to happen (started in your
20180115022748.GB1724@paquier.xyz mail).

cheers ./daniel

Attachment

Re: [HACKERS] Refactoring identifier checks to consistently usestrcmp

From

Michael Paquier

Date:

24 January 2018, 13:10:40

On Wed, Jan 24, 2018 at 09:47:50AM +0100, Daniel Gustafsson wrote:
> Attached is a rebased v7 patch which has your amendments (minus
> propname) which passes make check without errors.

Confirmed. I am switching the status as ready for committer for
volatility-v7.patch then.

> The volatility patch is also rebased included, but there the
> discussion whether to keep or drop the deprecated syntax first needs
> to happen (started in your 20180115022748.GB1724@paquier.xyz mail).

Yes, let's see what happens later for this thread.
--
Michael

Attachment

signature.asc

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Tom Lane

Date:

27 January 2018, 00:32:17

Michael Paquier <michael.paquier@gmail.com> writes:
> On Wed, Jan 24, 2018 at 09:47:50AM +0100, Daniel Gustafsson wrote:
>> Attached is a rebased v7 patch which has your amendments (minus
>> propname) which passes make check without errors.

> Confirmed. I am switching the status as ready for committer for
> volatility-v7.patch then.

Poking through this, I notice that there are two reloptions-related
"pg_strncasecmp" calls that did not get converted to "strncmp":
reloptions.c:804 and reloptions.h:169.

Is that an oversight, or intentional, and if intentional why?

            regards, tom lane

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

27 January 2018, 01:17:42

> On 26 Jan 2018, at 22:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Wed, Jan 24, 2018 at 09:47:50AM +0100, Daniel Gustafsson wrote:
>>> Attached is a rebased v7 patch which has your amendments (minus
>>> propname) which passes make check without errors.
>
>> Confirmed. I am switching the status as ready for committer for
>> volatility-v7.patch then.
>
> Poking through this,

Thanks!

> I notice that there are two reloptions-related
> "pg_strncasecmp" calls that did not get converted to "strncmp":
> reloptions.c:804

The way I read transformRelOptions(), oldOptions is not guaranteed to come from
the parser (though in reality it probably will be).  The namespace isn’t either
but passing an uppercase namespace should never be valid AFAICT, hence the
patch changing it to case sensitive comparison.

> and reloptions.h:169.

Oversight, completely missed that one.

cheers ./daniel

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Tom Lane

Date:

27 January 2018, 01:58:48

Daniel Gustafsson <daniel@yesql.se> writes:
>> On 26 Jan 2018, at 22:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I notice that there are two reloptions-related
>> "pg_strncasecmp" calls that did not get converted to "strncmp":
>> reloptions.c:804

> The way I read transformRelOptions(), oldOptions is not guaranteed to
> come from the parser (though in reality it probably will be).

Well, one response to that is that it should contain values that were
deemed acceptable at some previous time.  If we allowed catalog contents
to be migrated physically across major releases, we'd need to worry about
having mixed-case reloptions in a v11 catalog ... but we don't, so we
won't.  The old reloptions should always be ones that got through
parseRelOptions before, and those now will always have to be exact case.

Another response is that leaving it like this would mean that
transformRelOptions and parseRelOptions have different opinions about
whether two option names are the same or not, which seems to me to be
exactly the same sort of bug hazard that you were on about at the
beginning of this thread.

> The namespace isn’t either
> but passing an uppercase namespace should never be valid AFAICT, hence the
> patch changing it to case sensitive comparison.

Well, it's no more or less valid than an uppercase option name ...

>> and reloptions.h:169.

> Oversight, completely missed that one.

It looks like no core code uses that macro, so it's got no effect
unless some third-party code does ... but I changed it anyway.

            regards, tom lane

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

27 January 2018, 02:16:11

> On 26 Jan 2018, at 23:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>>> On 26 Jan 2018, at 22:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I notice that there are two reloptions-related
>>> "pg_strncasecmp" calls that did not get converted to "strncmp":
>>> reloptions.c:804
>
>> The way I read transformRelOptions(), oldOptions is not guaranteed to
>> come from the parser (though in reality it probably will be).
>
> Well, one response to that is that it should contain values that were
> deemed acceptable at some previous time.  If we allowed catalog contents
> to be migrated physically across major releases, we'd need to worry about
> having mixed-case reloptions in a v11 catalog ... but we don't, so we
> won't.  The old reloptions should always be ones that got through
> parseRelOptions before, and those now will always have to be exact case.

That’s a good point, the reloptions will only ever come from the same major
version.

> Another response is that leaving it like this would mean that
> transformRelOptions and parseRelOptions have different opinions about
> whether two option names are the same or not, which seems to me to be
> exactly the same sort of bug hazard that you were on about at the
> beginning of this thread.

Completely agree.

>> The namespace isn’t either
>> but passing an uppercase namespace should never be valid AFAICT, hence the
>> patch changing it to case sensitive comparison.
>
> Well, it's no more or less valid than an uppercase option name …

Agreed.  Taking a step back and thinking it’s clear that the comparison in the
oldoptions loop should’ve been changed in the patch for it to be consistent
with the original objective.

cheers ./daniel

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Tom Lane

Date:

27 January 2018, 02:36:51

I've pushed this mostly as-is.  I fixed the missed places in reloptions
code as we discussed.  I also took out the parser changes related to
allowing unquoted PARALLEL in old-style CREATE AGGREGATE, because that
is not a goal I consider worthy of adding extra grammar complexity.
We don't document that PARALLEL works there, and there has never been
any expectation that deprecated legacy syntax would grow new options
as needed to have feature parity with the modern syntax.  I don't feel
a need to go out of our way to prevent new options from working in the
old syntax, if they happen to, but I definitely don't want to expend
code on making them do so.

Accordingly, that regression test that expected PARALLEL to work in
the old-style syntax was just ill-considered, and I changed it to use
the new-style syntax instead.

I also trimmed the new regression test cases a bit, as most of them seemed
pretty duplicative.  How many times do we need to verify that the lexer
doesn't downcase quoted identifiers?  I'm not really sure that checking
this behavior from SQL is useful at all, actually.  The interesting
question is whether there are code paths that can reach these string
comparisons with strings that *didn't* get processed as identifiers by the
lexer, and these tests do basically nothing to prove that there aren't.
Still, I think we can push this and wait to see if there are complaints.

            regards, tom lane

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Daniel Gustafsson

Date:

27 January 2018, 23:19:22

> On 27 Jan 2018, at 00:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I've pushed this mostly as-is.

Thanks!

> I also took out the parser changes related to
> allowing unquoted PARALLEL in old-style CREATE AGGREGATE, because that
> is not a goal I consider worthy of adding extra grammar complexity.
> We don't document that PARALLEL works there, and there has never been
> any expectation that deprecated legacy syntax would grow new options
> as needed to have feature parity with the modern syntax.

Ok, didn’t know old syntax wasn’t extended to support new options so I went
after it having run into the regress tests using it.

> I also trimmed the new regression test cases a bit, as most of them seemed
> pretty duplicative.

Makes sense, they were made verbose to aid review but were too chatty for
inclusion.

cheers ./daniel

Re: [HACKERS] Refactoring identifier checks to consistently usestrcmp

From

Michael Paquier

Date:

01 February 2018, 03:56:37

On Fri, Jan 26, 2018 at 05:58:48PM -0500, Tom Lane wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:
>> Oversight, completely missed that one.
>
> It looks like no core code uses that macro, so it's got no effect
> unless some third-party code does ... but I changed it anyway.

Good point.  I missed this one as well.  I have double-checked the core
code and it seems to me that we are clear now.
--
Michael

Attachment

signature.asc

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

From

Robert Haas

Date:

01 February 2018, 22:03:23

On Fri, Jan 26, 2018 at 6:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've pushed this mostly as-is.  I fixed the missed places in reloptions
> code as we discussed.  I also took out the parser changes related to
> allowing unquoted PARALLEL in old-style CREATE AGGREGATE, because that
> is not a goal I consider worthy of adding extra grammar complexity.
> We don't document that PARALLEL works there, and there has never been
> any expectation that deprecated legacy syntax would grow new options
> as needed to have feature parity with the modern syntax.  I don't feel
> a need to go out of our way to prevent new options from working in the
> old syntax, if they happen to, but I definitely don't want to expend
> code on making them do so.
>
> Accordingly, that regression test that expected PARALLEL to work in
> the old-style syntax was just ill-considered, and I changed it to use
> the new-style syntax instead.
>
> I also trimmed the new regression test cases a bit, as most of them seemed
> pretty duplicative.  How many times do we need to verify that the lexer
> doesn't downcase quoted identifiers?  I'm not really sure that checking
> this behavior from SQL is useful at all, actually.  The interesting
> question is whether there are code paths that can reach these string
> comparisons with strings that *didn't* get processed as identifiers by the
> lexer, and these tests do basically nothing to prove that there aren't.
> Still, I think we can push this and wait to see if there are complaints.

I think it's a shame that the commit message didn't document (for the
release notes) exactly which cases just got changed incompatibly.  I
admit that not many people are likely to get bitten by this, but I
still think it's better if we're precise about what might cause a
particular user to be in that set.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Refactoring identifier checks to consistently usestrcmp

From

Michael Paquier

Date:

02 February 2018, 07:13:49

On Thu, Feb 01, 2018 at 02:03:23PM -0500, Robert Haas wrote:
> I think it's a shame that the commit message didn't document (for the
> release notes) exactly which cases just got changed incompatibly.  I
> admit that not many people are likely to get bitten by this, but I
> still think it's better if we're precise about what might cause a
> particular user to be in that set.

v7 posted in [1] of the patch was doing a pretty good job on this side
because it included regression tests for all the code paths involved by
the change.  The final commit has shaved some of them, but here is a
list for reference based on my notes:
CREATE/ALTER TEXT SEARCH DICTIONARY
CREATE TEXT SEARCH TEMPLATE
CREATE TEXT SEARCH PARSER
CREATE/ALTER OPERATOR
CREATE COLLATION
CREATE AGGREGATE
CREATE/ALTER OPERATOR
CREATE/ALTER TABLE
CREATE TYPE
CREATE/ALTER VIEW

[1]: https://www.postgresql.org/message-id/62991614-9673-4276-99CC-6754E7A0572F%40yesql.se
--
Michael

Attachment

signature.asc