Thread: pgindent && weirdness
I just ran pgindent over some patch, and noticed that this hunk ended up in my working tree: diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c index 861a9148ed..fff54062b0 100644 --- a/src/backend/statistics/extended_stats.c +++ b/src/backend/statistics/extended_stats.c @@ -1405,13 +1405,13 @@ examine_opclause_expression(OpExpr *expr, Var **varp, Const **cstp, bool *varonl if (IsA(rightop, RelabelType)) rightop = (Node *) ((RelabelType *) rightop)->arg; - if (IsA(leftop, Var) && IsA(rightop, Const)) + if (IsA(leftop, Var) &&IsA(rightop, Const)) { var = (Var *) leftop; cst = (Const *) rightop; varonleft = true; } - else if (IsA(leftop, Const) && IsA(rightop, Var)) + else if (IsA(leftop, Const) &&IsA(rightop, Var)) { var = (Var *) rightop; cst = (Const *) leftop; This seems a really strange change; this git grep '&&[^([:space:]]' -- *.c shows that we already have a dozen or so occurrences already. (That's ignoring execExprInterp.c's use of computed gotos.) I don't care all that much, but wanted to throw it out in case somebody is specifically interested in studying pgindent's logic, since the last round of changes has yielded excellent results. Thanks, -- Álvaro Herrera PostgreSQL Expert, https://www.2ndQuadrant.com/
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > I just ran pgindent over some patch, and noticed that this hunk ended up > in my working tree: > - if (IsA(leftop, Var) && IsA(rightop, Const)) > + if (IsA(leftop, Var) &&IsA(rightop, Const)) Yeah, it's been doing that for decades. I think the triggering factor is the typedef name (Var, here) preceding the &&. It'd be nice to fix properly, but I've tended to take the path of least resistance by breaking such lines to avoid the ugliness: if (IsA(leftop, Var) && IsA(rightop, Const)) regards, tom lane
On Tue, Jan 14, 2020 at 05:30:21PM -0500, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > I just ran pgindent over some patch, and noticed that this hunk ended up > > in my working tree: > > > - if (IsA(leftop, Var) && IsA(rightop, Const)) > > + if (IsA(leftop, Var) &&IsA(rightop, Const)) > > Yeah, it's been doing that for decades. I think the triggering > factor is the typedef name (Var, here) preceding the &&. > > It'd be nice to fix properly, but I've tended to take the path > of least resistance by breaking such lines to avoid the ugliness: > > if (IsA(leftop, Var) && > IsA(rightop, Const)) In the past I would use a post-processing step after BSD indent to fix up these problems. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Wed, Jan 15, 2020 at 11:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > I just ran pgindent over some patch, and noticed that this hunk ended up > > in my working tree: > > > - if (IsA(leftop, Var) && IsA(rightop, Const)) > > + if (IsA(leftop, Var) &&IsA(rightop, Const)) > > Yeah, it's been doing that for decades. I think the triggering > factor is the typedef name (Var, here) preceding the &&. > > It'd be nice to fix properly, but I've tended to take the path > of least resistance by breaking such lines to avoid the ugliness: > > if (IsA(leftop, Var) && > IsA(rightop, Const)) I am on vacation away from the Internet this week but somehow saw this on my phone and couldn't stop myself from peeking at pg_bsd_ident again. Yeah, "(Var)" (where Var is a known typename) causes it to think that any following operator must be unary. One way to fix that in the cases Alvaro is referring to is to tell override the setting so that && (and likewise ||) are never considered to be unary, though I haven't tested this much and there are surely other ways to achieve this: diff --git a/lexi.c b/lexi.c index d43723c..6de3227 100644 --- a/lexi.c +++ b/lexi.c @@ -655,6 +655,12 @@ stop_lit: unary_delim = state->last_u_d; break; } + + /* && and || are never unary */ + if ((token[0] == '&' && *buf_ptr == '&') || + (token[0] == '|' && *buf_ptr == '|')) + state->last_u_d = false; + while (*(e_token - 1) == *buf_ptr || *buf_ptr == '=') { /* * handle ||, &&, etc, and also things as in int *****i The problem with that is that && sometimes *should* be formatted like a unary operator: when it's part of the nonstandard GCC computed goto syntax.
On Thu, Jan 16, 2020 at 3:59 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Wed, Jan 15, 2020 at 11:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Yeah, it's been doing that for decades. I think the triggering > > factor is the typedef name (Var, here) preceding the &&. Here's a better fix: diff --git a/indent.c b/indent.c index 9faf57a..51a60a6 100644 --- a/indent.c +++ b/indent.c @@ -570,8 +570,11 @@ check_type: ps.in_or_st = false; /* turn off flag for structure decl or * initialization */ } - /* parenthesized type following sizeof or offsetof is not a cast */ - if (ps.keyword == 1 || ps.keyword == 2) + /* + * parenthesized type following sizeof or offsetof is not a cast; + * likewise for function-like macros that take a type + */ + if (ps.keyword == 1 || ps.keyword == 2 || ps.last_token == ident) ps.not_cast_mask |= 1 << ps.p_l_follow; break;
On 2020-Jan-17, Thomas Munro wrote: > On Thu, Jan 16, 2020 at 3:59 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Wed, Jan 15, 2020 at 11:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Yeah, it's been doing that for decades. I think the triggering > > > factor is the typedef name (Var, here) preceding the &&. > > Here's a better fix: This is indeed a very good fix! Several badly formatted sites in our code are improved with this change. Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jan 16, 2020 at 06:13:36PM -0300, Alvaro Herrera wrote: > This is indeed a very good fix! Several badly formatted sites in our > code are improved with this change. Nice find! Could you commit that? I can see many places improved as well, among explain.c, tablecmds.c, typecmds.c, and much more. -- Michael
Attachment
On 2020-Jan-17, Michael Paquier wrote: > On Thu, Jan 16, 2020 at 06:13:36PM -0300, Alvaro Herrera wrote: > > This is indeed a very good fix! Several badly formatted sites in our > > code are improved with this change. > > Nice find! Could you commit that? I can see many places improved as > well, among explain.c, tablecmds.c, typecmds.c, and much more. I think Tom is the only one who can commit that, https://git.postgresql.org/gitweb/?p=pg_bsd_indent.git -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Jan-17, Michael Paquier wrote: >> Nice find! Could you commit that? I can see many places improved as >> well, among explain.c, tablecmds.c, typecmds.c, and much more. > I think Tom is the only one who can commit that, > https://git.postgresql.org/gitweb/?p=pg_bsd_indent.git I don't *think* that repo is locked down that hard --- IIRC, PG committers should have access to it. But I was hoping to hear Piotr's opinion before moving forward. regards, tom lane
On 2020-Jan-16, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > On 2020-Jan-17, Michael Paquier wrote: > >> Nice find! Could you commit that? I can see many places improved as > >> well, among explain.c, tablecmds.c, typecmds.c, and much more. > > > I think Tom is the only one who can commit that, > > https://git.postgresql.org/gitweb/?p=pg_bsd_indent.git > > I don't *think* that repo is locked down that hard --- IIRC, > PG committers should have access to it. But I was hoping to > hear Piotr's opinion before moving forward. FWIW I think this code predates Piotr's involvement, I think; at least, it was already there in the FreeBSD code he imported: https://github.com/pstef/freebsd_indent/commit/55c29a8774923f2d40fef7919b9490f61e57e7bb#diff-85c94ae15198235e2363f96216b9a1b2R565 -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Jan-16, Tom Lane wrote: >> ... But I was hoping to >> hear Piotr's opinion before moving forward. > FWIW I think this code predates Piotr's involvement, I think; at least, > it was already there in the FreeBSD code he imported: > https://github.com/pstef/freebsd_indent/commit/55c29a8774923f2d40fef7919b9490f61e57e7bb#diff-85c94ae15198235e2363f96216b9a1b2R565 The roots of that code are even older than Postgres, I believe, and there may not be anybody left who understands it completely. But Piotr has certainly spent more time looking at it than I have, so I'd still like to hear what he thinks. regards, tom lane
On Fri, Jan 17, 2020 at 3:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > On 2020-Jan-16, Tom Lane wrote: > >> ... But I was hoping to > >> hear Piotr's opinion before moving forward. Me too. Thinking about this again: It's obviously not true that everything that looks like a function call is not a cast. You could have "my_cast(Type)" that expands to "(Type)" or some slightly more useful variant of that, and then "my_cast(Type) -1" would, with this patch applied, be reformatted as "my_cast(Type) - 1" because it'd err on the side of thinking that the expression produces a value and therefore the minus sign must be a binary operator that needs whitespace on both sides, and that'd be wrong. However, it seems to me that macros that expand to raw cast syntax (and I mean just "(Type)", not a complete cast including the value to be cast, like "((Type) (x))") must be rare and unusual things, and I think it's better to err on the side of thinking that function-like macros are values, not casts. That's all the change does, and fortunately the authors of indent showed how to do that with their existing special cases for offsetof and sizeof; I'm just extending that treatment to any identifier. Is there some other case I'm not thinking of that is confused by the change? I'm sure you could contrive something it screws up, but my question is about real code that people would actually write. Piotr, is there an easy way to reindent some large non-PostgreSQL body of code that uses a cousin of this code to see if it gets better or worse with the change?
On 2020-Feb-17, Thomas Munro wrote: > Thinking about this again: It's obviously not true that everything > that looks like a function call is not a cast. You could have > "my_cast(Type)" that expands to "(Type)" or some slightly more useful > variant of that, and then "my_cast(Type) -1" would, with this patch > applied, be reformatted as "my_cast(Type) - 1" because it'd err on the > side of thinking that the expression produces a value and therefore > the minus sign must be a binary operator that needs whitespace on both > sides, and that'd be wrong. However, it seems to me that macros that > expand to raw cast syntax (and I mean just "(Type)", not a complete > cast including the value to be cast, like "((Type) (x))") must be rare > and unusual things, and I think it's better to err on the side of > thinking that function-like macros are values, not casts. That's all > the change does, and fortunately the authors of indent showed how to > do that with their existing special cases for offsetof and sizeof; I'm > just extending that treatment to any identifier. Hmm ... this suggests to me that if you remove these alleged special cases for offsetof and sizeof, the new code handles them correctly anyway. Do you think it's worth giving that a try? Not because removing the special cases would have any value, but rather to see if anything interesting pops up. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Feb 18, 2020 at 4:35 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Hmm ... this suggests to me that if you remove these alleged special > cases for offsetof and sizeof, the new code handles them correctly > anyway. Do you think it's worth giving that a try? Not because > removing the special cases would have any value, but rather to see if > anything interesting pops up. Good thought, since keywords also have last_token == ident so it's redundant to check the keyword. But while testing that I realised that either way we get this wrong: - return (int) *s1 - (int) *s2; + return (int) * s1 - (int) *s2; So I think the right formulation is one that allows offsetof and sizeof to receive not-a-cast treatment, but not any other known keyword: diff --git a/indent.c b/indent.c index 9faf57a..ed6dce2 100644 --- a/indent.c +++ b/indent.c @@ -570,8 +570,15 @@ check_type: ps.in_or_st = false; /* turn off flag for structure decl or * initialization */ } - /* parenthesized type following sizeof or offsetof is not a cast */ - if (ps.keyword == 1 || ps.keyword == 2) + /* + * parenthesized type following sizeof or offsetof is not a + * cast; we also assume the same about similar macros, + * so if there is any other non-keyword identifier immediately + * preceding a type name in parens we won't consider it to be + * a cast + */ + if (ps.last_token == ident && + (ps.keyword == 0 || ps.keyword == 1 || ps.keyword == 2)) ps.not_cast_mask |= 1 << ps.p_l_follow; break; Another problem is that there is one thing in our tree that looks like a non-cast under the new rule, but it actually expands to a type name, so now we get that wrong! (I mean, unpatched indent doesn't really understand it either, it thinks it's a cast, but at least it knows the following * is not a binary operator): - STACK_OF(X509_NAME) *root_cert_list = NULL; + STACK_OF(X509_NAME) * root_cert_list = NULL; That's a macro from an OpenSSL header. Not sure what to do about that.
Thomas Munro <thomas.munro@gmail.com> writes: > Another problem is that there is one thing in our tree that looks like > a non-cast under the new rule, but it actually expands to a type name, > so now we get that wrong! (I mean, unpatched indent doesn't really > understand it either, it thinks it's a cast, but at least it knows the > following * is not a binary operator): > - STACK_OF(X509_NAME) *root_cert_list = NULL; > + STACK_OF(X509_NAME) * root_cert_list = NULL; > That's a macro from an OpenSSL header. Not sure what to do about that. If we get that wrong, but a hundred other places look better, I'm not too fussed about it. regards, tom lane
On Tue, Feb 18, 2020 at 12:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > Another problem is that there is one thing in our tree that looks like > > a non-cast under the new rule, but it actually expands to a type name, > > so now we get that wrong! (I mean, unpatched indent doesn't really > > understand it either, it thinks it's a cast, but at least it knows the > > following * is not a binary operator): > > > - STACK_OF(X509_NAME) *root_cert_list = NULL; > > + STACK_OF(X509_NAME) * root_cert_list = NULL; > > > That's a macro from an OpenSSL header. Not sure what to do about that. > > If we get that wrong, but a hundred other places look better, > I'm not too fussed about it. Here's the patch I propose to commit to pg_bsd_indent, if the repo lets me, and here's the result of running it on the PG tree today. I suppose the principled way to fix that problem with STACK_OF(x) would be to have a user-supplied list of function-like-macros that expand to a type name, but I'm not planning to waste time on that.
Attachment
Thomas Munro <thomas.munro@gmail.com> writes: > Here's the patch I propose to commit to pg_bsd_indent, if the repo > lets me, and here's the result of running it on the PG tree today. +1. I think the repo will let you in, but if not, I can do it. regards, tom lane
On 2020-May-16, Thomas Munro wrote: > Here's the patch I propose to commit to pg_bsd_indent, if the repo > lets me, and here's the result of running it on the PG tree today. Looks good. Of all these changes in PG, only two are of the STACK_OK() nature, and there are 38 places that get better. > I suppose the principled way to fix that problem with STACK_OF(x) > would be to have a user-supplied list of function-like-macros that > expand to a type name, but I'm not planning to waste time on that. +1 on just ignoring that problem as insignificant. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-May-16, Thomas Munro wrote: >> Here's the patch I propose to commit to pg_bsd_indent, if the repo >> lets me, and here's the result of running it on the PG tree today. > Looks good. Of all these changes in PG, only two are of the STACK_OK() > nature, and there are 38 places that get better. It should also be noted that there are a lot of places where we've programmed around this silliness by artificially breaking conditions using IsA() into multiple lines. So the "38 places" is a lowball estimate of how much of a problem this has been. regards, tom lane
On Sat, May 16, 2020 at 10:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > Here's the patch I propose to commit to pg_bsd_indent, if the repo > > lets me, and here's the result of running it on the PG tree today. > > +1. I think the repo will let you in, but if not, I can do it. It seems I cannot. Please go ahead. I'll eventually see if I can get this into FreeBSD's usr.bin/indent. It's possible that that process results in a request to make it optional (some project with a lot of STACK_OF- and no IsA-style macros wouldn't like it), but I don't think it hurts to commit it to our copy like this in the meantime to fix our weird formatting problem.
Thomas Munro <thomas.munro@gmail.com> writes: > On Sat, May 16, 2020 at 10:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> +1. I think the repo will let you in, but if not, I can do it. > It seems I cannot. Please go ahead. [ yawn... ] It's about bedtime here, but I'll take care of it in the morning. Off the critical path, we oughta figure out why the repo wouldn't let you commit. What I was told was it was set up to be writable by all PG committers. > I'll eventually see if I can get this into FreeBSD's usr.bin/indent. +1 to that, too. regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes: > On Sat, May 16, 2020 at 10:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> +1. I think the repo will let you in, but if not, I can do it. > It seems I cannot. Please go ahead. Pushed, and I bumped pg_bsd_indent's version to 2.1.1, and synced our core repo with that. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > It seems I cannot. Please go ahead. > > [ yawn... ] It's about bedtime here, but I'll take care of it in the > morning. > > Off the critical path, we oughta figure out why the repo wouldn't > let you commit. What I was told was it was set up to be writable > by all PG committers. Just happened to see this. Might be I'm not looking at the right thing, but from what I can tell, the repo is set up with only you as having write access. We also don't currently have updating the pg_bsd_indent repo on git.postgresql.org as part of our SOP for adding/removing committers. All of this is fixable, of course. I've CC'd this to sysadmins@ to highlight this issue and possible change to that repo and our SOP. Barring complaints or concerns, based on Tom's comments above, I'll adjust that repo to be 'owned' by pginfra, with all committers having read/write access, and add it to our committer add/remove SOP to update that repo's access list whenever there are changes. I'll plan to do that in a couple of days to allow for any concerns or questions, as this isn't a critical issue at present based on the above comments. Thanks, Stephen
Attachment
On 16/01/2020 03.59, Thomas Munro wrote: > On Wed, Jan 15, 2020 at 11:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>> I just ran pgindent over some patch, and noticed that this hunk ended up >>> in my working tree: >> >>> - if (IsA(leftop, Var) && IsA(rightop, Const)) >>> + if (IsA(leftop, Var) &&IsA(rightop, Const)) >> >> Yeah, it's been doing that for decades. I think the triggering >> factor is the typedef name (Var, here) preceding the &&. >> >> It'd be nice to fix properly, but I've tended to take the path >> of least resistance by breaking such lines to avoid the ugliness: >> >> if (IsA(leftop, Var) && >> IsA(rightop, Const)) > > I am on vacation away from the Internet this week but somehow saw this > on my phone and couldn't stop myself from peeking at pg_bsd_ident > again. Yeah, "(Var)" (where Var is a known typename) causes it to > think that any following operator must be unary. > > One way to fix that in the cases Alvaro is referring to is to tell > override the setting so that && (and likewise ||) are never considered > to be unary, though I haven't tested this much and there are surely > other ways to achieve this: > > diff --git a/lexi.c b/lexi.c > index d43723c..6de3227 100644 > --- a/lexi.c > +++ b/lexi.c > @@ -655,6 +655,12 @@ stop_lit: > unary_delim = state->last_u_d; > break; > } > + > + /* && and || are never unary */ > + if ((token[0] == '&' && *buf_ptr == '&') || > + (token[0] == '|' && *buf_ptr == '|')) > + state->last_u_d = false; > + > while (*(e_token - 1) == *buf_ptr || *buf_ptr == '=') { > /* > * handle ||, &&, etc, and also things as in int *****i > > The problem with that is that && sometimes *should* be formatted like > a unary operator: when it's part of the nonstandard GCC computed goto > syntax. These comments are made in the context of pushing this change or equivalent to FreeBSD repository. I think this is a better approach then the one committed to pg_bsd_indent. It's ubiquitous that the operators are binary, except - as you mentioned - in a nonstandard GCC syntax. The alternative given has more disadvantages, with potential impact on FreeBSD code formatting, which it should support as well as everything else -- to a reasonable extent. sys/kern/ is usually a decent litmus test, but I don't claim it should show anything interesting in this particular case. This change may seem hacky, but it would be far from the worst hack in this program's history or even in its present form. It's actually very much in indent's spirit, which is an attribute I neither support nor condemn. In any case, this change, or equivalent, should be committed to FreeBSD repository together with a test case or two.
Piotr Stefaniak <postgres@piotr-stefaniak.me> <VE1P192MB07504EB33625F9A23F41CCD0F2B70@VE1P192MB0750.EURP192.PROD.OUTLOOK.COM>writes: > On 16/01/2020 03.59, Thomas Munro wrote: >> One way to fix that in the cases Alvaro is referring to is to tell >> override the setting so that && (and likewise ||) are never considered >> to be unary, though I haven't tested this much and there are surely >> other ways to achieve this: > I think this is a better approach then the one committed to > pg_bsd_indent. It's ubiquitous that the operators are binary, except - > as you mentioned - in a nonstandard GCC syntax. The alternative given > has more disadvantages, with potential impact on FreeBSD code > formatting, which it should support as well as everything else -- to a > reasonable extent. sys/kern/ is usually a decent litmus test, but I > don't claim it should show anything interesting in this particular case. I think that the fix we chose is better, at least for our purposes. You can see its effects on our source tree at https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=fa27dd40d5c5f56a1ee837a75c97549e992e32a4 and while certainly most of the diffs are around && or || operators, there are a fair number that are not, such as - dummy_lex.input = unconstify(char *, str) +1; + dummy_lex.input = unconstify(char *, str) + 1; or more interestingly - strncmp(text, "pg_", 3) !=0) + strncmp(text, "pg_", 3) != 0) where the previous misformatting is because "text" is a known typedef name. So it appears to me that this change reduces the misformatting cost of typedef names that chance to match field or variable names, and that's actually quite a big issue for us. We have, ATM, 3574 known typedefs in typedefs.list, a fair number of which are not under our control (since they come from system headers on various platforms). So it's inevitable that there are going to be collisions. In short, I'm inclined to stick with the hack we've got. I'm sad that it will result in further divergence from FreeBSD indent; but it does useful stuff for us, and I don't want to give it up. (That said, I don't see any penalty to carrying both changes; so we'll probably also absorb the &&/|| change at some convenient time.) regards, tom lane