Thread: Add support for AT LOCAL
The Standard defines time zone conversion as follows: <datetime factor> ::= <datetime primary> [ <time zone> ] <time zone> ::= AT <time zone specifier> <time zone specifier> ::= LOCAL | TIME ZONE <interval primary> While looking at something else, I noticed we do not support AT LOCAL. The local time zone is defined as that of *the session*, not the server, which can make this quite interesting in views where the view will automatically adjust to the session's time zone. Patch against 3f1aaaa180 attached. -- Vik Fearing
Attachment
On Mon, 2023-06-05 at 23:13 -0400, Vik Fearing wrote: > The Standard defines time zone conversion as follows: > > <datetime factor> ::= > <datetime primary> [ <time zone> ] > > <time zone> ::= > AT <time zone specifier> > > <time zone specifier> ::= > LOCAL > | TIME ZONE <interval primary> > > > While looking at something else, I noticed we do not support AT LOCAL. > The local time zone is defined as that of *the session*, not the server, > which can make this quite interesting in views where the view will > automatically adjust to the session's time zone. > > Patch against 3f1aaaa180 attached. +1 on the idea; it should be faily trivial, if not very useful. At a quick glance, it looks like you resolve "timezone" at the time the query is parsed. Shouldn't the resolution happen at query execution time? Yours, Laurenz Albe
On 6/6/23 03:56, Laurenz Albe wrote: > On Mon, 2023-06-05 at 23:13 -0400, Vik Fearing wrote: >> The Standard defines time zone conversion as follows: >> >> <datetime factor> ::= >> <datetime primary> [ <time zone> ] >> >> <time zone> ::= >> AT <time zone specifier> >> >> <time zone specifier> ::= >> LOCAL >> | TIME ZONE <interval primary> >> >> >> While looking at something else, I noticed we do not support AT LOCAL. >> The local time zone is defined as that of *the session*, not the server, >> which can make this quite interesting in views where the view will >> automatically adjust to the session's time zone. >> >> Patch against 3f1aaaa180 attached. > > +1 on the idea; it should be faily trivial, if not very useful. Thanks. > At a quick glance, it looks like you resolve "timezone" at the time > the query is parsed. Shouldn't the resolution happen at query > execution time? current_setting(text) is stable, and my tests show that it is calculated at execution time. postgres=# prepare x as values (now() at local); PREPARE postgres=# set timezone to 'UTC'; SET postgres=# execute x; column1 ---------------------------- 2023-06-06 08:23:02.088634 (1 row) postgres=# set timezone to 'Asia/Pyongyang'; SET postgres=# execute x; column1 ---------------------------- 2023-06-06 17:23:14.837219 (1 row) Am I missing something? -- Vik Fearing
On Tue, 2023-06-06 at 04:24 -0400, Vik Fearing wrote: > > At a quick glance, it looks like you resolve "timezone" at the time > > the query is parsed. Shouldn't the resolution happen at query > > execution time? > > current_setting(text) is stable, and my tests show that it is calculated > at execution time. Ah, ok, then sorry for the noise. I misread the code then. Yours, Laurenz Albe
On 2023-Jun-06, Laurenz Albe wrote: > At a quick glance, it looks like you resolve "timezone" at the time > the query is parsed. Shouldn't the resolution happen at query > execution time? Sounds like it -- consider the case where the timestamp value is a partition key and one of the partition boundaries falls in between two timezone offsets for some particular ts value; then you use a prepared query to read from a view defined with AT LOCAL. Partition pruning would need to compute partitions to read from at runtime, not plan time. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On 6/12/23 17:37, Alvaro Herrera wrote: > On 2023-Jun-06, Laurenz Albe wrote: > >> At a quick glance, it looks like you resolve "timezone" at the time >> the query is parsed. Shouldn't the resolution happen at query >> execution time? > > Sounds like it -- consider the case where the timestamp value is a > partition key and one of the partition boundaries falls in between two > timezone offsets for some particular ts value; then you use a prepared > query to read from a view defined with AT LOCAL. Partition pruning > would need to compute partitions to read from at runtime, not plan time. Can you show me an example of that happening with my patch? -- Vik Fearing
> On 6 Jun 2023, at 05:13, Vik Fearing <vik@postgresfriends.org> wrote: > Patch against 3f1aaaa180 attached. This patch fails to compile, the declaration of variables in the switch block needs to be scoped within a { } block. I've fixed this trivial error in the attached v2 and also reflowed the comments which now no longer fit. -- Daniel Gustafsson
Attachment
On 7/3/23 15:42, Daniel Gustafsson wrote: >> On 6 Jun 2023, at 05:13, Vik Fearing <vik@postgresfriends.org> wrote: > >> Patch against 3f1aaaa180 attached. > > This patch fails to compile, the declaration of variables in the switch block > needs to be scoped within a { } block. Interesting. It compiles for me. > I've fixed this trivial error in the > attached v2 and also reflowed the comments which now no longer fit. Thank you. -- Vik Fearing
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed Hello I think this feature can be a useful addition in dealing with time zones. I have applied and tried out the patch, The featureworks as described and seems promising. The problem with compilation failure was probably reported on CirrusCI whencompiled on different platforms. I have run the latest patch on my own Cirrus CI environment and everything checked outfine. Thank you Cary Huang ------------------ Highgo Software Canada www.highgo.ca
On 9/22/23 23:46, cary huang wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: tested, passed > Documentation: tested, passed > > Hello > > I think this feature can be a useful addition in dealing with time zones. I have applied and tried out the patch, The featureworks as described and seems promising. The problem with compilation failure was probably reported on CirrusCI whencompiled on different platforms. I have run the latest patch on my own Cirrus CI environment and everything checked outfine. > > Thank you Thank you for reviewing! -- Vik Fearing
On Sat, Sep 23, 2023 at 12:54:01AM +0200, Vik Fearing wrote: > On 9/22/23 23:46, cary huang wrote: >> I think this feature can be a useful addition in dealing with time >> zones. I have applied and tried out the patch, The feature works as >> described and seems promising. The problem with compilation failure >> was probably reported on CirrusCI when compiled on different >> platforms. I have run the latest patch on my own Cirrus CI environment >> and everything checked out fine. > > Thank you for reviewing! + | a_expr AT LOCAL %prec AT + { + /* Use the value of the session's time zone */ + FuncCall *tz = makeFuncCall(SystemFuncName("current_setting"), + list_make1(makeStringConst("TimeZone", -1)), + COERCE_SQL_SYNTAX, + -1); + $$ = (Node *) makeFuncCall(SystemFuncName("timezone"), + list_make2(tz, $1), + COERCE_SQL_SYNTAX, + @2); As the deparsing code introduced by this patch is showing, this leads to a lot of extra complexity. And, actually, this can be quite expensive as well with these two layers of functions. Note also that in comparison to SQLValueFunctions, COERCE_SQL_SYNTAX does less inlining. So here comes my question: why doesn't this stuff just use one underlying function to do this job? -- Michael
Attachment
On 9/29/23 09:27, Michael Paquier wrote: > On Sat, Sep 23, 2023 at 12:54:01AM +0200, Vik Fearing wrote: >> On 9/22/23 23:46, cary huang wrote: >>> I think this feature can be a useful addition in dealing with time >>> zones. I have applied and tried out the patch, The feature works as >>> described and seems promising. The problem with compilation failure >>> was probably reported on CirrusCI when compiled on different >>> platforms. I have run the latest patch on my own Cirrus CI environment >>> and everything checked out fine. >> >> Thank you for reviewing! > > + | a_expr AT LOCAL %prec AT > + { > + /* Use the value of the session's time zone */ > + FuncCall *tz = makeFuncCall(SystemFuncName("current_setting"), > + list_make1(makeStringConst("TimeZone", -1)), > + COERCE_SQL_SYNTAX, > + -1); > + $$ = (Node *) makeFuncCall(SystemFuncName("timezone"), > + list_make2(tz, $1), > + COERCE_SQL_SYNTAX, > + @2); > > As the deparsing code introduced by this patch is showing, this leads > to a lot of extra complexity. And, actually, this can be quite > expensive as well with these two layers of functions. Note also that > in comparison to SQLValueFunctions, COERCE_SQL_SYNTAX does less > inlining. So here comes my question: why doesn't this stuff just use > one underlying function to do this job? I guess I don't understand the question. Why do you think a single function that repeats what these functions do would be preferable? I am not sure how doing it a different way would be better. -- Vik Fearing
On Tue, Oct 03, 2023 at 02:10:48AM +0200, Vik Fearing wrote: > On 9/29/23 09:27, Michael Paquier wrote: >> As the deparsing code introduced by this patch is showing, this leads >> to a lot of extra complexity. And, actually, this can be quite >> expensive as well with these two layers of functions. Note also that >> in comparison to SQLValueFunctions, COERCE_SQL_SYNTAX does less >> inlining. So here comes my question: why doesn't this stuff just use >> one underlying function to do this job? > > I guess I don't understand the question. Why do you think a single function > that repeats what these functions do would be preferable? I am not sure how > doing it a different way would be better. Leaving aside the ruleutils.c changes introduced by the patch that are quite confusing, having one function in the executor stack is going to be more efficient than two (aka less ExecInitFunc), and this syntax could be used in SQL queries where the operations is repeated a lot. This patch introduces two COERCE_SQL_SYNTAX, meaning that we would do twice the ACL check, twice the function hook, etc, so this could lead to significant differences. I think that we should be careful with the approach taken, and do benchmarks to choose an efficient approach from the start. See for example: https://www.postgresql.org/message-id/ZGHBGE45jKW8FEpe@paquier.xyz -- Michael
Attachment
On 9/29/23 09:27, Michael Paquier wrote: > On Sat, Sep 23, 2023 at 12:54:01AM +0200, Vik Fearing wrote: >> On 9/22/23 23:46, cary huang wrote: >>> I think this feature can be a useful addition in dealing with time >>> zones. I have applied and tried out the patch, The feature works as >>> described and seems promising. The problem with compilation failure >>> was probably reported on CirrusCI when compiled on different >>> platforms. I have run the latest patch on my own Cirrus CI environment >>> and everything checked out fine. >> >> Thank you for reviewing! > > + | a_expr AT LOCAL %prec AT > + { > + /* Use the value of the session's time zone */ > + FuncCall *tz = makeFuncCall(SystemFuncName("current_setting"), > + list_make1(makeStringConst("TimeZone", -1)), > + COERCE_SQL_SYNTAX, > + -1); > + $$ = (Node *) makeFuncCall(SystemFuncName("timezone"), > + list_make2(tz, $1), > + COERCE_SQL_SYNTAX, > + @2); > > As the deparsing code introduced by this patch is showing, this leads > to a lot of extra complexity. And, actually, this can be quite > expensive as well with these two layers of functions. Note also that > in comparison to SQLValueFunctions, COERCE_SQL_SYNTAX does less > inlining. So here comes my question: why doesn't this stuff just use > one underlying function to do this job? Okay. Here is a v3 using that approach. -- Vik Fearing
Attachment
On Wed, Oct 04, 2023 at 03:49:03PM +0100, Vik Fearing wrote: > Okay. Here is a v3 using that approach. You have not posted any numbers to show if there's a difference in performance, so I have run a simple test: PREPARE test AS SELECT TIMESTAMP '1978-07-07 19:38' AT LOCAL; DO $$ BEGIN FOR i IN 1..1000000 LOOP EXECUTE 'EXECUTE test'; END LOOP; END $$; On a medium-ish benchmark machine I have (16 vCPUs, 32GB of memory, -O2, no asserts), this DO block takes in average 4.3s to run with v2, versus 3.6s with v3. So yes, that's faster. I haven't yet finished my review of the patch, still looking at it. -- Michael
Attachment
On 10/6/23 07:05, Michael Paquier wrote: > > I haven't yet finished my review of the patch, still looking at it. I realized that my regression tests are not exercising what I originally intended them to after this change. They are supposed to show that calling the function explicitly or using AT LOCAL is correctly reproduced by ruleutils.c. The attached v4 changes the regression tests (and nothing else). -- Vik Fearing
Attachment
On Sat, Oct 07, 2023 at 02:35:06AM +0100, Vik Fearing wrote: > I realized that my regression tests are not exercising what I originally > intended them to after this change. They are supposed to show that calling > the function explicitly or using AT LOCAL is correctly reproduced by > ruleutils.c. Yes, I don't really see the point in adding more tests for the deparsing of AT TIME ZONE in this context. I would not expect one to call directly timezone() with the grammar in place, but I have no objections either to do that in the view for the regression tests as you are suggesting in v4. The minimal set of changes to test is to make sure that both paths (ts->tstz and tstz->tz) are exercised, and that's what you do. Anyway, upon review, I have a few issues with this patch. First is the documentation that I find too light: - There is no description for AT LOCAL and what kind of result one gets back depending on the input given, while AT TIME ZONE has more details about the arguments that can be used and the results obtained. - The function timezone(zone, timestamp) is mentioned, and I think that we should do the same with timezone(timestamp) for AT LOCAL. Another thing that I have been surprised with is the following, which is inconsistent with AT TIME ZONE because we are lacking one system function: =# select time with time zone '05:34:17-05' at local; ERROR: 42883: function pg_catalog.timezone(time with time zone) does not exist I think that we should include that to have a full set of operations supported, similar to AT TIME ZONE (see \df+ timezone). It looks like this would need one extra timetz_at_local(), which would require a bit more refactoring in date.c so as an equivalent of timetz_zone() could feed on the current session's TimeZone instead. I guess that you could just have an timetz_zone_internal() that optionally takes a timezone provided by the user or gets the current session's Timezone (session_timezone). Am I missing something? I am attaching a v5 that addresses the documentation bits, could you look at the business with date.c? -- Michael
Attachment
On 10/10/23 05:34, Michael Paquier wrote: > I am attaching a v5 that addresses the documentation bits, could you > look at the business with date.c? Here is a v6 which hopefully addresses all of your concerns. -- Vik Fearing
Attachment
On Fri, Oct 13, 2023 at 02:20:59AM +0200, Vik Fearing wrote: > On 10/10/23 05:34, Michael Paquier wrote: > > I am attaching a v5 that addresses the documentation bits, could you > > look at the business with date.c? > > Here is a v6 Thanks for the new version. > which hopefully addresses all of your concerns. Mostly ;) The first thing I did was to extract the doc bits about timezone(zone, time) for AT TIME ZONE from v6 and applied it independently. I have then looked at the rest and it looked mostly OK to me, including the extra description you have added for the fifth example in the docs. I have tweaked a few things: the regression tests to make the views a bit more appealing to the eye, an indentation to not have koel complain and did a catalog bump. Then applied it. -- Michael
Attachment
On 10/13/23 05:07, Michael Paquier wrote: > On Fri, Oct 13, 2023 at 02:20:59AM +0200, Vik Fearing wrote: >> On 10/10/23 05:34, Michael Paquier wrote: >>> I am attaching a v5 that addresses the documentation bits, could you >>> look at the business with date.c? >> >> Here is a v6 > > Thanks for the new version. > >> which hopefully addresses all of your concerns. > > Mostly ;) > > The first thing I did was to extract the doc bits about timezone(zone, > time) for AT TIME ZONE from v6 and applied it independently. > > I have then looked at the rest and it looked mostly OK to me, > including the extra description you have added for the fifth example > in the docs. I have tweaked a few things: the regression tests to > make the views a bit more appealing to the eye, an indentation to not > have koel complain and did a catalog bump. Then applied it. Thank you, Michael君! -- Vik Fearing
On Fri, Oct 13, 2023 at 07:03:20AM +0200, Vik Fearing wrote: > Thank you, Michael君! No pb, ヴィックさん。 -- Michael
Attachment
One of the AIX animals gave a strange result here: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2023-10-15%2011%3A40%3A01 If you ignore the diffs due to change in column width, the interesting change seems to be: - 23:59:00-07 | 06:59:00+00 | 06:59:00+00 | 06:59:00+00 - 23:59:59.99-07 | 06:59:59.99+00 | 06:59:59.99+00 | 06:59:59.99+00 + 23:59:00-07 | 4294966103:4294967295:00+00 | 4294966103:4294967295:00+00 | 4294966103:4294967295:00+00 + 23:59:59.99-07 | 4294966103:00:00.01+00 | 4294966103:00:00.01+00 | 4294966103:00:00.01+00 But the other AIX animal 'sungazer' was OK with it. They're both on the same AIX7.1 host IIRC, both 64 bit builds, but the former is using xlc and the latter gcc. I don't immediately see what would cause that underflow on that old compiler but not elsewhere. I have a shell there (cfarm111) if someone has an idea...
Thomas Munro <thomas.munro@gmail.com> writes: > One of the AIX animals gave a strange result here: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2023-10-15%2011%3A40%3A01 > If you ignore the diffs due to change in column width, the interesting > change seems to be: > - 23:59:00-07 | 06:59:00+00 | 06:59:00+00 | 06:59:00+00 > - 23:59:59.99-07 | 06:59:59.99+00 | 06:59:59.99+00 | 06:59:59.99+00 > + 23:59:00-07 | 4294966103:4294967295:00+00 | > 4294966103:4294967295:00+00 | 4294966103:4294967295:00+00 > + 23:59:59.99-07 | 4294966103:00:00.01+00 | > 4294966103:00:00.01+00 | 4294966103:00:00.01+00 > But the other AIX animal 'sungazer' was OK with it. They're both on > the same AIX7.1 host IIRC, both 64 bit builds, but the former is using > xlc and the latter gcc. I don't immediately see what would cause that > underflow on that old compiler but not elsewhere. I have a shell > there (cfarm111) if someone has an idea... Hmm. Seems like the error has to be creeping in during this part of timetz_zone(): result->time = t->time + (t->zone - tz) * USECS_PER_SEC; while (result->time < INT64CONST(0)) result->time += USECS_PER_DAY; while (result->time >= USECS_PER_DAY) result->time -= USECS_PER_DAY; According to my machine, the initial computation of result->time (for the '23:59:59.99-07' input) yields 111599990000, and then we iterate the second loop once to get 25199990000, which is the right answer. If I force a second iteration to get -61200010000, I get # select '23:59:59.99-07'::timetz at local; timezone ------------------------ 4294967279:00:00.01+00 (1 row) which doesn't quite match hornet's result but it seems suggestively close. Another line of thought is that while the time fields are int64, t->zone and tz are only int32. Multiplying by the INT64CONST USECS_PER_SEC ought to be enough to make the compiler widen the subtraction result to int64, but maybe that's screwing up? I'm tempted to wonder if this helps: - result->time = t->time + (t->zone - tz) * USECS_PER_SEC; + result->time = t->time + (int64) (t->zone - tz) * USECS_PER_SEC; Forcing the wrong thing to happen there doesn't produce a match to hornet's result either, so I don't have a lot of hope for that theory, but it seems like the explanation has to be somewhere here. regards, tom lane
On Mon, Oct 16, 2023 at 10:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm tempted to wonder if this helps: > > - result->time = t->time + (t->zone - tz) * USECS_PER_SEC; > + result->time = t->time + (int64) (t->zone - tz) * USECS_PER_SEC; I wanted to be able to try this and any other theories and managed to build the tip of master on cfarm111 with the same CC and CFLAGS as Noah used, but the problem didn't reproduce! Hmm, I didn't enable any extra options, so now I'm wondering if something in some random header somewhere is involved here... trying again with more stuff turned on...
Another possibly interesting factoid: it appears that before 97957fdba, we had zero regression test coverage of timetz_zone --- and we still have none of timetz_izone, which contains essentially the same code. So if there is a problem here, whether it's ours or the compiler's, it's not hard to see why we didn't notice. regards, tom lane
On Mon, Oct 16, 2023 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Mon, Oct 16, 2023 at 10:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I'm tempted to wonder if this helps: > > > > - result->time = t->time + (t->zone - tz) * USECS_PER_SEC; > > + result->time = t->time + (int64) (t->zone - tz) * USECS_PER_SEC; > > I wanted to be able to try this and any other theories and managed to > build the tip of master on cfarm111 with the same CC and CFLAGS as > Noah used, but the problem didn't reproduce! Hmm, I didn't enable any > extra options, so now I'm wondering if something in some random header > somewhere is involved here... trying again with more stuff turned > on... Oh, I can't use any of the handrolled packages in ~nm due to permissions. I tried enabling perl from /opt/freeware (perl is my usual first guess for who is !@#$ing with the system headers), but the test passes.
On Sun, Oct 15, 2023 at 06:47:04PM -0400, Tom Lane wrote: > Another possibly interesting factoid: it appears that before > 97957fdba, we had zero regression test coverage of timetz_zone --- > and we still have none of timetz_izone, which contains essentially > the same code. So if there is a problem here, whether it's ours or > the compiler's, it's not hard to see why we didn't notice. Right. This one is just a lucky, or say unlucky find. I didn't notice that this path was entirely missing coverage, planting an assertion in the middle of timetz_zone() passes check-world. -- Michael
Attachment
On Mon, Oct 16, 2023 at 11:50:08AM +1300, Thomas Munro wrote: > On Mon, Oct 16, 2023 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote: >> On Mon, Oct 16, 2023 at 10:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I'm tempted to wonder if this helps: >>> >>> - result->time = t->time + (t->zone - tz) * USECS_PER_SEC; >>> + result->time = t->time + (int64) (t->zone - tz) * USECS_PER_SEC; All that should use TZNAME_FIXED_OFFSET as timezone type, and I don't really see why this would overflow.. Perhaps a more aggressive (int64) ((t->zone - (int64) tz) * USECS_PER_SEC) would help? >> I wanted to be able to try this and any other theories and managed to >> build the tip of master on cfarm111 with the same CC and CFLAGS as >> Noah used, but the problem didn't reproduce! Hmm, I didn't enable any >> extra options, so now I'm wondering if something in some random header >> somewhere is involved here... trying again with more stuff turned >> on... > > Oh, I can't use any of the handrolled packages in ~nm due to > permissions. I tried enabling perl from /opt/freeware (perl is my > usual first guess for who is !@#$ing with the system headers), but the > test passes. Another theory would be one of these weird compiler optimization issue from xlc? In recent history, there was 8d2a01ae12cd. -- Michael
Attachment
On Mon, Oct 16, 2023 at 2:58 PM Michael Paquier <michael@paquier.xyz> wrote: > Another theory would be one of these weird compiler optimization issue > from xlc? In recent history, there was 8d2a01ae12cd. Yeah, there are more like that too. xlc 12.1 is dead (like the OS version it shipped with). New versions are available on cfarm if we care about this target. But I am conscious of the cosmic law that if you blame the compiler too soon you can cause the bug to move into your code...
Thomas Munro <thomas.munro@gmail.com> writes: > On Mon, Oct 16, 2023 at 2:58 PM Michael Paquier <michael@paquier.xyz> wrote: >> Another theory would be one of these weird compiler optimization issue >> from xlc? In recent history, there was 8d2a01ae12cd. > Yeah, there are more like that too. xlc 12.1 is dead (like the OS > version it shipped with). New versions are available on cfarm if we > care about this target. But I am conscious of the cosmic law that if > you blame the compiler too soon you can cause the bug to move into > your code... I'm having a hard time not believing that this is a compiler bug. Looking back at 8d2a01ae12cd and its speculation that xlc is overly liberal about reordering code around sequence points ... I wonder if it'd help to do this calculation in a local variable, and only assign the final value to result->time ? But we have to reproduce the problem first. regards, tom lane
On Mon, Oct 16, 2023 at 4:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm having a hard time not believing that this is a compiler bug. > Looking back at 8d2a01ae12cd and its speculation that xlc is overly > liberal about reordering code around sequence points ... I wonder > if it'd help to do this calculation in a local variable, and only > assign the final value to result->time ? But we have to reproduce > the problem first. If that can be shown I would vote for switching to /opt/IBM/xlc/16.1.0 and not changing a single bit of PostgreSQL.
Thomas Munro <thomas.munro@gmail.com> writes: > On Mon, Oct 16, 2023 at 4:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm having a hard time not believing that this is a compiler bug. >> Looking back at 8d2a01ae12cd and its speculation that xlc is overly >> liberal about reordering code around sequence points ... I wonder >> if it'd help to do this calculation in a local variable, and only >> assign the final value to result->time ? But we have to reproduce >> the problem first. > If that can be shown I would vote for switching to /opt/IBM/xlc/16.1.0 > and not changing a single bit of PostgreSQL. If switching to 16.1 removes the failure, I'd agree. It's hard to believe that any significant number of users still care about building PG with xlc 12. regards, tom lane
On Sun, Oct 15, 2023 at 11:30:17PM -0400, Tom Lane wrote: > Thomas Munro <thomas.munro@gmail.com> writes: >> If that can be shown I would vote for switching to /opt/IBM/xlc/16.1.0 >> and not changing a single bit of PostgreSQL. > > If switching to 16.1 removes the failure, I'd agree. It's hard > to believe that any significant number of users still care about > building PG with xlc 12. FWIW, I really wish that we were less conservative here and just drop that rather than waste resources in debugging things. Now, I'm also OK to put this one aside and put a WHERE clause to timetz_local_view to only fetch one value, as the test has the same value as long as we check that AT LOCAL converts the result to UTC for the three expression patterns tested. -- Michael
Attachment
On Sun, Oct 15, 2023 at 11:30:17PM -0400, Tom Lane wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > On Mon, Oct 16, 2023 at 4:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I'm having a hard time not believing that this is a compiler bug. > >> Looking back at 8d2a01ae12cd and its speculation that xlc is overly > >> liberal about reordering code around sequence points ... I wonder > >> if it'd help to do this calculation in a local variable, and only > >> assign the final value to result->time ? But we have to reproduce > >> the problem first. > > > If that can be shown I would vote for switching to /opt/IBM/xlc/16.1.0 > > and not changing a single bit of PostgreSQL. > > If switching to 16.1 removes the failure, I'd agree. It's hard > to believe that any significant number of users still care about > building PG with xlc 12. Works for me. I've started a test run with the xlc version change.
On Sun, Oct 15, 2023 at 09:58:04PM -0700, Noah Misch wrote: > On Sun, Oct 15, 2023 at 11:30:17PM -0400, Tom Lane wrote: > > Thomas Munro <thomas.munro@gmail.com> writes: > > > On Mon, Oct 16, 2023 at 4:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >> I'm having a hard time not believing that this is a compiler bug. > > >> Looking back at 8d2a01ae12cd and its speculation that xlc is overly > > >> liberal about reordering code around sequence points ... I wonder > > >> if it'd help to do this calculation in a local variable, and only > > >> assign the final value to result->time ? But we have to reproduce > > >> the problem first. > > > > > If that can be shown I would vote for switching to /opt/IBM/xlc/16.1.0 > > > and not changing a single bit of PostgreSQL. > > > > If switching to 16.1 removes the failure, I'd agree. It's hard > > to believe that any significant number of users still care about > > building PG with xlc 12. > > Works for me. I've started a test run with the xlc version change. It failed similarly: + 23:59:00-07 | 4294966103:4294967295:00+00 | 4294966103:4294967295:00+00 | 4294966103:4294967295:00+00 + 23:59:59.99-07 | 4294966103:00:00.01+00 | 4294966103:00:00.01+00 | 4294966103:00:00.01+00
Noah Misch <noah@leadboat.com> writes: > On Sun, Oct 15, 2023 at 09:58:04PM -0700, Noah Misch wrote: >> Works for me. I've started a test run with the xlc version change. > It failed similarly: > + 23:59:00-07 | 4294966103:4294967295:00+00 | 4294966103:4294967295:00+00 | 4294966103:4294967295:00+00 > + 23:59:59.99-07 | 4294966103:00:00.01+00 | 4294966103:00:00.01+00 | 4294966103:00:00.01+00 Ugh. So if the failure is robust enough to persist across several major xlc versions, why couldn't Thomas reproduce it? regards, tom lane
On Mon, Oct 16, 2023 at 01:54:23AM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Sun, Oct 15, 2023 at 09:58:04PM -0700, Noah Misch wrote: > >> Works for me. I've started a test run with the xlc version change. > > > It failed similarly: > > > + 23:59:00-07 | 4294966103:4294967295:00+00 | 4294966103:4294967295:00+00 | 4294966103:4294967295:00+00 > > + 23:59:59.99-07 | 4294966103:00:00.01+00 | 4294966103:00:00.01+00 | 4294966103:00:00.01+00 > > Ugh. So if the failure is robust enough to persist across > several major xlc versions, why couldn't Thomas reproduce it? Beats me. hornet wipes its starting environment down to OBJECT_MODE=32_64 PERL5LIB=/home/nm/sw/cpan64/lib/perl5 SPECIES=xlc64 PATH=/usr/bin, then applies all the environment settings seen in buildfarm logs.
On Sun, Oct 15, 2023 at 11:05:10PM -0700, Noah Misch wrote: > On Mon, Oct 16, 2023 at 01:54:23AM -0400, Tom Lane wrote: >> Ugh. So if the failure is robust enough to persist across >> several major xlc versions, why couldn't Thomas reproduce it? > > Beats me. hornet wipes its starting environment down to OBJECT_MODE=32_64 > PERL5LIB=/home/nm/sw/cpan64/lib/perl5 SPECIES=xlc64 PATH=/usr/bin, then > applies all the environment settings seen in buildfarm logs. Perhaps that's a stupid question.. But a server running under this environment fails the two following queries even for older branches, right? select timezone('UTC', '23:59:59.99-07'::timetz); select timezone('UTC', '23:59:00-07'::timetz); -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > Perhaps that's a stupid question.. But a server running under this > environment fails the two following queries even for older branches, > right? > select timezone('UTC', '23:59:59.99-07'::timetz); > select timezone('UTC', '23:59:00-07'::timetz); One would expect, since the AT LOCAL syntax is just sugar for that. I'm mighty tempted though to (a) add coverage for timetz_izone to HEAD, and (b) backpatch the new tests, sans the AT LOCAL case, to the back branches (maybe not v11). regards, tom lane
On Mon, Oct 16, 2023 at 09:54:41AM -0400, Tom Lane wrote: > I'm mighty tempted though to (a) add coverage for timetz_izone > to HEAD, and (b) backpatch the new tests, sans the AT LOCAL case, > to the back branches (maybe not v11). I see that you've already done (a) with 2f04720307. I'd be curious to see what happens for (b), as well, once (a) is processed on hornet once.. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Oct 16, 2023 at 09:54:41AM -0400, Tom Lane wrote: >> I'm mighty tempted though to (a) add coverage for timetz_izone >> to HEAD, and (b) backpatch the new tests, sans the AT LOCAL case, >> to the back branches (maybe not v11). > I see that you've already done (a) with 2f04720307. I'd be curious to > see what happens for (b), as well, once (a) is processed on hornet > once.. Sure enough, timetz_izone has exactly the same behavior [1]. I'd kind of decided that back-patching wouldn't be worth the trouble; do you foresee that it'd teach us anything new? regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2023-10-17%2000%3A07%3A36
Noah Misch <noah@leadboat.com> writes: > On Mon, Oct 16, 2023 at 01:54:23AM -0400, Tom Lane wrote: >> Ugh. So if the failure is robust enough to persist across >> several major xlc versions, why couldn't Thomas reproduce it? > Beats me. hornet wipes its starting environment down to OBJECT_MODE=32_64 > PERL5LIB=/home/nm/sw/cpan64/lib/perl5 SPECIES=xlc64 PATH=/usr/bin, then > applies all the environment settings seen in buildfarm logs. I was able to reproduce the failure on cfarm111 after adopting these settings from hornet's configuration: export OBJECT_MODE=64 export CC='xlc_r -D_LARGE_FILES=1 ' export CFLAGS='-O2 -qmaxmem=33554432 -qsuppress=1500-010:1506-995 -qsuppress=1506-010:1506-416:1506-450:1506-480:1506-481:1506-492:1506-944:1506-1264 -qinfo=all:nocnd:noeff:noext:nogot:noini:noord:nopar:noppc:norea:nouni:nouse -qsuppress=1506-374:1506-419:1506-434:1506-438:1506-451:1506-452:1506-453:1506-495:1506-786' and doing ./configure --enable-cassert --enable-debug --without-icu --prefix=/home/tgl/testversion etc etc. It is absolutely, gold-platedly, a compiler bug, because inserting a debug printf into the loop while (result->time >= USECS_PER_DAY) result->time -= USECS_PER_DAY; makes the failure go away. Unfortunately, I've not yet found another way to make it go away :-(. My upthread idea of using a local variable instead of result->time is no help, and some other random code alterations didn't change the results either. Not sure where we go from here. While I don't have much hesitation about blowing off xlc_r 12.1, it would be sad if their latest toolchain doesn't work either. (I didn't try permuting the code while using the newer compiler, though.) regards, tom lane
On Tue, Oct 17, 2023 at 01:40:18AM -0400, Tom Lane wrote: > makes the failure go away. Unfortunately, I've not yet found another > way to make it go away :-(. My upthread idea of using a local variable > instead of result->time is no help, and some other random code > alterations didn't change the results either. That may be a long shot, but even a modulo? Say in these two code paths: - while (result->time >= USECS_PER_DAY) - result->time -= USECS_PER_DAY; + if (result->time >= USECS_PER_DAY) + result->time %= USECS_PER_DAY; > Not sure where we go from here. While I don't have much hesitation > about blowing off xlc_r 12.1, it would be sad if their latest > toolchain doesn't work either. (I didn't try permuting the code > while using the newer compiler, though.) We've spent a lot of time on that. I'm OK to just give up, trim the values of the view with a qual, and call it a day. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Tue, Oct 17, 2023 at 01:40:18AM -0400, Tom Lane wrote: >> makes the failure go away. Unfortunately, I've not yet found another >> way to make it go away :-(. My upthread idea of using a local variable >> instead of result->time is no help, and some other random code >> alterations didn't change the results either. > That may be a long shot, but even a modulo? Yeah, the same thing occurred to me in the shower this morning, and it does seem to work! We can replace both loops with a %= operator, at least if we're willing to assume C99 division semantics, which seems pretty safe in 2023. Your idea of doing a range check to skip the division in typical cases is a refinement I'd not thought of, but it seems like a good idea for performance. (I see that the negative-starting-point case isn't covered in the current regression tests, so maybe we better add a test for that.) regards, tom lane
I wrote: > Yeah, the same thing occurred to me in the shower this morning, and it > does seem to work! We can replace both loops with a %= operator, at > least if we're willing to assume C99 division semantics, which seems > pretty safe in 2023. Whoops, no: for negative starting values we'd need truncate-towards- minus-infinity division whereas C99 specifies truncate-towards-zero. However, the attached does pass for me on cfarm111 as well as my usual dev machine. Presumably this is a pre-existing bug that also appears in back branches. But in the interests of science I propose that we back-patch only the test case and see which machine(s) fail it before back-patching the code change. regards, tom lane diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c index c4da10d47a..56c7746c11 100644 --- a/src/backend/utils/adt/date.c +++ b/src/backend/utils/adt/date.c @@ -3083,10 +3083,11 @@ timetz_zone(PG_FUNCTION_ARGS) result = (TimeTzADT *) palloc(sizeof(TimeTzADT)); result->time = t->time + (t->zone - tz) * USECS_PER_SEC; + /* C99 modulo has the wrong sign convention for negative input */ while (result->time < INT64CONST(0)) result->time += USECS_PER_DAY; - while (result->time >= USECS_PER_DAY) - result->time -= USECS_PER_DAY; + if (result->time >= USECS_PER_DAY) + result->time %= USECS_PER_DAY; result->zone = tz; @@ -3116,10 +3117,11 @@ timetz_izone(PG_FUNCTION_ARGS) result = (TimeTzADT *) palloc(sizeof(TimeTzADT)); result->time = time->time + (time->zone - tz) * USECS_PER_SEC; + /* C99 modulo has the wrong sign convention for negative input */ while (result->time < INT64CONST(0)) result->time += USECS_PER_DAY; - while (result->time >= USECS_PER_DAY) - result->time -= USECS_PER_DAY; + if (result->time >= USECS_PER_DAY) + result->time %= USECS_PER_DAY; result->zone = tz; diff --git a/src/test/regress/expected/timetz.out b/src/test/regress/expected/timetz.out index 3f8e005ce7..cbab6cfe5d 100644 --- a/src/test/regress/expected/timetz.out +++ b/src/test/regress/expected/timetz.out @@ -304,4 +304,25 @@ TABLE timetz_local_view; 23:59:59.99-07 | 06:59:59.99+00 | 06:59:59.99+00 | 06:59:59.99+00 | 06:59:59.99+00 (12 rows) +SELECT f1 AS dat, + f1 AT TIME ZONE 'UTC+10' AS dat_at_tz, + f1 AT TIME ZONE INTERVAL '-10:00' AS dat_at_int + FROM TIMETZ_TBL + ORDER BY f1; + dat | dat_at_tz | dat_at_int +----------------+----------------+---------------- + 00:01:00-07 | 21:01:00-10 | 21:01:00-10 + 01:00:00-07 | 22:00:00-10 | 22:00:00-10 + 02:03:00-07 | 23:03:00-10 | 23:03:00-10 + 08:08:00-04 | 02:08:00-10 | 02:08:00-10 + 07:07:00-08 | 05:07:00-10 | 05:07:00-10 + 11:59:00-07 | 08:59:00-10 | 08:59:00-10 + 12:00:00-07 | 09:00:00-10 | 09:00:00-10 + 12:01:00-07 | 09:01:00-10 | 09:01:00-10 + 15:36:39-04 | 09:36:39-10 | 09:36:39-10 + 15:36:39-05 | 10:36:39-10 | 10:36:39-10 + 23:59:00-07 | 20:59:00-10 | 20:59:00-10 + 23:59:59.99-07 | 20:59:59.99-10 | 20:59:59.99-10 +(12 rows) + ROLLBACK; diff --git a/src/test/regress/sql/timetz.sql b/src/test/regress/sql/timetz.sql index 33f7f8aafb..d797f478f0 100644 --- a/src/test/regress/sql/timetz.sql +++ b/src/test/regress/sql/timetz.sql @@ -100,4 +100,9 @@ CREATE VIEW timetz_local_view AS ORDER BY f1; SELECT pg_get_viewdef('timetz_local_view', true); TABLE timetz_local_view; +SELECT f1 AS dat, + f1 AT TIME ZONE 'UTC+10' AS dat_at_tz, + f1 AT TIME ZONE INTERVAL '-10:00' AS dat_at_int + FROM TIMETZ_TBL + ORDER BY f1; ROLLBACK;
Hmm, I guess I must have missed some important flag or environment variable when trying to reproduce it, sorry. Given that IBM describes xlc as "legacy" (replaced by xlclang, but still supported for some unspecified period of time for the benefit of people who need C++ ABI compatibility with old code), I wonder how long we plan to support it... Anecdotally, from a time 1-2 decades ago when I used AIX daily, I can report that vast amounts of open source stuff couldn't build with xlc, so gcc was used for pretty much anything that didn't have a C++ ABI requirement. I kinda wonder if a single person in the entire world appreciates that we support this.
Thomas Munro <thomas.munro@gmail.com> writes: > Given that IBM describes xlc as "legacy" (replaced by xlclang, but > still supported for some unspecified period of time for the benefit of > people who need C++ ABI compatibility with old code), I wonder how > long we plan to support it... Should we be testing against xlclang instead? regards, tom lane
On Wed, Oct 18, 2023 at 11:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > > Given that IBM describes xlc as "legacy" (replaced by xlclang, but > > still supported for some unspecified period of time for the benefit of > > people who need C++ ABI compatibility with old code), I wonder how > > long we plan to support it... > > Should we be testing against xlclang instead? I hesitated to suggest it because it's not my animal/time we're talking about but it seems to make more sense. It appears to be IBM's answer to the nothing-builds-with-this-thing phenomenon, since it accepts a lot of GCCisms via Clang's adoption of them. From a quick glance at [1], it lacks the atomics builtins but we have our own assembler magic for POWER. So maybe it'd all just work™. [1] https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=migration-checklist-when-moving-from-xl-based-front-end-clang-based-front-end
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Oct 18, 2023 at 11:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Should we be testing against xlclang instead? > I hesitated to suggest it because it's not my animal/time we're > talking about but it seems to make more sense. It appears to be IBM's > answer to the nothing-builds-with-this-thing phenomenon, since it > accepts a lot of GCCisms via Clang's adoption of them. From a quick > glance at [1], it lacks the atomics builtins but we have our own > assembler magic for POWER. So maybe it'd all just work™. Discounting the Windows animals, it looks like the xlc animals are our only remaining ones that use anything except gcc or clang. That feels uncomfortably like a compiler monoculture to me, so I can understand the reasoning for keeping hornet/mandrill going. Still, maybe we should just accept the fact that gcc/clang have outcompeted everything else in the C compiler universe. It's getting hard to imagine that anyone would bring out some new product that didn't try to be bug-compatible with gcc, for precisely the reason you mention. regards, tom lane
On Tue, Oct 17, 2023 at 12:45:28PM -0400, Tom Lane wrote: > Whoops, no: for negative starting values we'd need truncate-towards- > minus-infinity division whereas C99 specifies truncate-towards-zero. > However, the attached does pass for me on cfarm111 as well as my > usual dev machine. I guess that the following trick could be used for the negative case, with one modulo followed by one extra addition: if (result->time < INT64CONST(0)) { result->time %= USECS_PER_DAY; result->time += USECS_PER_DAY; } > Presumably this is a pre-existing bug that also appears in back > branches. But in the interests of science I propose that we > back-patch only the test case and see which machine(s) fail it > before back-patching the code change. Sure, as you see fit. -- Michael
Attachment
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Oct 18, 2023 at 11:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Should we be testing against xlclang instead? > I hesitated to suggest it because it's not my animal/time we're > talking about but it seems to make more sense. It appears to be IBM's > answer to the nothing-builds-with-this-thing phenomenon, since it > accepts a lot of GCCisms via Clang's adoption of them. From a quick > glance at [1], it lacks the atomics builtins but we have our own > assembler magic for POWER. So maybe it'd all just work™. FWIW, I tried a test build with xlclang 16.1 on cfarm111, and it does seem like it Just Works, modulo a couple of oddities: * <netinet/tcp.h> fails to compile, due to references to struct in6_addr, unless <netinet/in.h> is included first. Most of our references to tcp.h already do that, but not libpq-be.h and fe-protocol3.c. I'm a bit at a loss why we've not seen this with the existing BF animals on this machine, because AFAICS they're all using the same /usr/include tree. * configure recognizes this as gcc but not Clang, which may or may not be fine: ... checking whether we are using the GNU C compiler... yes ... checking whether xlclang is Clang... no ... This doesn't seem to break anything, but it struck me as odd. configure seems to pick a sane set of compiler options anyway. Interestingly, xlclang shows the same failure with the pre-19fa97731 versions of timetz_zone/timetz_izone as plain xlc does. I guess this is not so astonishing since they presumably share the same codegen backend. But maybe somebody ought to file a bug with IBM? regards, tom lane
On Tue, Oct 17, 2023 at 7:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > On Wed, Oct 18, 2023 at 11:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Should we be testing against xlclang instead? > > > I hesitated to suggest it because it's not my animal/time we're > > talking about but it seems to make more sense. It appears to be IBM's > > answer to the nothing-builds-with-this-thing phenomenon, since it > > accepts a lot of GCCisms via Clang's adoption of them. From a quick > > glance at [1], it lacks the atomics builtins but we have our own > > assembler magic for POWER. So maybe it'd all just work™. > > Discounting the Windows animals, it looks like the xlc animals are > our only remaining ones that use anything except gcc or clang. > That feels uncomfortably like a compiler monoculture to me, so > I can understand the reasoning for keeping hornet/mandrill going. > Still, maybe we should just accept the fact that gcc/clang have > outcompeted everything else in the C compiler universe. It's > getting hard to imagine that anyone would bring out some new product > that didn't try to be bug-compatible with gcc, for precisely the > reason you mention. After some research I determined that the release date for xlc 12.1 seems to be June 1, 2012. At that time, clang 3.1 was current and just after, GCC release version 4.7.1 was released. The oldest version of clang that I find in the buildfarm is 3.9, and the oldest version of gcc I find in the buildfarm is 4.6.3. So, somewhat to my surprise, xlc is not the oldest compiler that we're still supporting in the buildfarm. But it is very old, and it seems like that gcc and clang are going to continue to gain ground against gcc and other proprietary compilers for some time to come. I think it's reasonable to ask ourselves whether we really want to go to the trouble of maintaining something that is likely to get so little real-world usage. To be honest, I'm not particularly concerned about the need to adjust compiler and linker options from time to time, even though I know that probably annoys Andres. What does concern me is finding and coding around compiler bugs. 19fa977311b9da9c6c84f0108600e78213751a38 is just ridiculous, IMHO. If an end-of-life compiler for an end-of-life operating system has bugs that mean that C code that's doing nothing more than a bit of arithmetic isn't compiling properly, it's time to pull the plug. Nor is this the first example of working around a bug that only manifests in ancient xlc. I think that, when there was more real diversity in the software ecosystem, testing on a lot of platforms was a good way of finding out whether you'd done something that was in general correct or just something that happened to work on the machine you had in front of you. But hornet and mandrill are not telling us about things we've done that are incorrect in general yet happen to work on gcc and clang. What they seem to be telling us about, in this case and some others, are things that are CORRECT in general yet happen NOT to work on ancient xlc. And that's an important difference, because if we were finding real mistakes for which other platforms were not punishing us, then we could hope that fixing those mistakes would improve compatibility with other, equally niche platforms, potentially including future platforms that haven't come along yet. As it is, it's hard to believe that any work we put into this is going to have any benefit on any system other than ancient AIX. If there are other niche systems out there that have a similar number of bugs, they'll probably be *different* bugs. Sources for release dates: https://www.ibm.com/support/pages/fix-list-xl-c-aix https://releases.llvm.org/ https://gcc.gnu.org/releases.html -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Oct 17, 2023 at 7:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Discounting the Windows animals, it looks like the xlc animals are >> our only remaining ones that use anything except gcc or clang. > After some research I determined that the release date for xlc 12.1 > seems to be June 1, 2012. At that time, clang 3.1 was current and just > after, GCC release version 4.7.1 was released. The oldest version of > clang that I find in the buildfarm is 3.9, and the oldest version of > gcc I find in the buildfarm is 4.6.3. So, somewhat to my surprise, xlc > is not the oldest compiler that we're still supporting in the > buildfarm. But it is very old, and it seems like that gcc and clang > are going to continue to gain ground against gcc and other proprietary > compilers for some time to come. Probably. Independent of that, it's fair to ask why we're still testing against xlc 12.1 and not the considerably-more-recent xlclang, or at least xlc 16.1. (I also wonder why we're still testing AIX 7.1 rather than an OS version that's not EOL.) > What does concern me is finding and coding > around compiler bugs. 19fa977311b9da9c6c84f0108600e78213751a38 is just > ridiculous, IMHO. I would agree, except for the downthread discovery that the bug is still present in current xlc and xlclang. Short of blowing off AIX altogether, it seems like we need to do something about it. regards, tom lane
On Wed, Oct 18, 2023 at 12:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Probably. Independent of that, it's fair to ask why we're still > testing against xlc 12.1 and not the considerably-more-recent xlclang, > or at least xlc 16.1. (I also wonder why we're still testing AIX 7.1 > rather than an OS version that's not EOL.) Well, according to Wikipedia, AIX 7.3 (released in 2021) requires POWER8. AIX 7.2 (released 2015) only requires POWER7, and according to the buildfarm page, this machine is POWER7. So it could possibly be upgraded from 7.1 to 7.2, supposing that it is indeed compatible with that release and that Noah's willing to do it and that there's not an exorbitant fee and so on, but that still leaves you running an OS version that is almost certainly closer to EOL than it is to the original release date. Anything newer would require buying new hardware, or so I guess. Put otherwise, I think the reason we're testing on this AIX rather than anything else is probably that there is exactly 1 person associated with the project who has >0 pieces of hardware that can run AIX, and that person has one, so we're testing on that one. That might be a reason to question whether that particular strain of hardware has a bright future, at least in terms of PostgreSQL support. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Oct 18, 2023 at 12:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Probably. Independent of that, it's fair to ask why we're still >> testing against xlc 12.1 and not the considerably-more-recent xlclang, >> or at least xlc 16.1. (I also wonder why we're still testing AIX 7.1 >> rather than an OS version that's not EOL.) > Well, according to Wikipedia, AIX 7.3 (released in 2021) requires > POWER8. AIX 7.2 (released 2015) only requires POWER7, and according to > the buildfarm page, this machine is POWER7. So it could possibly be > upgraded from 7.1 to 7.2, supposing that it is indeed compatible with > that release and that Noah's willing to do it and that there's not an > exorbitant fee and so on, but that still leaves you running an OS > version that is almost certainly closer to EOL than it is to the > original release date. Anything newer would require buying new > hardware, or so I guess. The machine belongs to OSU (via the gcc compile farm), and I see that they have another one that's POWER8 and is running AIX 7.3 [1]. So in principle the buildfarm animals could just be moved over to that one. Perhaps Noah has some particular attachment to 7.1, but now that that's EOL it seems like we shouldn't be paying so much attention to it. My guess is that it's still there in the compile farm because the gcc people think it's still useful to have access to POWER7 hardware; but I doubt there's enough difference for our purposes to be worth dealing with a dead OS and ancient compiler. regards, tom lane [1] https://portal.cfarm.net/machines/list/
On Wed, Oct 18, 2023 at 04:45:46PM -0400, Tom Lane wrote: > > On Wed, Oct 18, 2023 at 12:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Probably. Independent of that, it's fair to ask why we're still > >> testing against xlc 12.1 and not the considerably-more-recent xlclang, > >> or at least xlc 16.1. (I also wonder why we're still testing AIX 7.1 > >> rather than an OS version that's not EOL.) > The machine belongs to OSU (via the gcc compile farm), and I see > that they have another one that's POWER8 and is running AIX 7.3 [1]. > So in principle the buildfarm animals could just be moved over > to that one. > > Perhaps Noah has some particular attachment to 7.1, but now that that's > EOL it seems like we shouldn't be paying so much attention to it. > My guess is that it's still there in the compile farm because the gcc > people think it's still useful to have access to POWER7 hardware; but > I doubt there's enough difference for our purposes to be worth dealing > with a dead OS and ancient compiler. No particular attachment. From 2019 to 2023-08, hoverfly tested xlc16 on AIX 7.2; its run ended when cfarm119's owner replaced cfarm119 with an AIX 7.3, ibm-clang v17.1.1 machine. Since 2015, hornet and mandrill have tested xlc12 on AIX 7.1. That said, given your finding that later xlc versions have the same code generation bug, the choice of version is a side issue. A migration to ibm-clang wouldn't have prevented this week's xlc-prompted commits. I feel the gravity and longevity of xlc bugs has been out of proportion with the compiler's contribution to PostgreSQL. I would find it reasonable to revoke xlc support in v17+, leaving AIX gcc support in place. The main contribution of AIX has been to find the bug behind commit a1b8aa1. That benefited from the AIX kernel, not from any particular compiler. hornet and mandrill would continue to test v16-. By the way, I once tried to report an xlc bug. Their system was tailored to accept bugs from paid support customers only. I submitted it via some sales inquiry form, just in case, but never heard back.
On Wed, Oct 18, 2023 at 7:33 PM Noah Misch <noah@leadboat.com> wrote: > I feel the gravity and longevity of xlc bugs has been out of proportion with > the compiler's contribution to PostgreSQL. I would find it reasonable to > revoke xlc support in v17+, leaving AIX gcc support in place. +1 for this proposal. I just think this is getting silly. We're saying that we only have access to 1 or 2 AIX machines, and most of us have access to none, and the compiler has serious code generation bugs that are present in both a release 11 years old and also a release current release, meaning they went unfixed for 10 years, and we can't report bugs or get them fixed when we find them, and the use of this particular compiler in the buildfarm isn't finding any issues that matter anywhere else. To be honest, I'm not entirely sure that even AIX gcc support is delivering enough value per unit work to justify keeping it around. But the xlc situation is worse. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Oct 18, 2023 at 7:33 PM Noah Misch <noah@leadboat.com> wrote: >> I feel the gravity and longevity of xlc bugs has been out of proportion with >> the compiler's contribution to PostgreSQL. I would find it reasonable to >> revoke xlc support in v17+, leaving AIX gcc support in place. > +1 for this proposal. WFM, too. regards, tom lane
Hi, On 2023-10-19 10:38:14 -0400, Robert Haas wrote: > On Wed, Oct 18, 2023 at 7:33 PM Noah Misch <noah@leadboat.com> wrote: > > I feel the gravity and longevity of xlc bugs has been out of proportion with > > the compiler's contribution to PostgreSQL. I would find it reasonable to > > revoke xlc support in v17+, leaving AIX gcc support in place. > > +1 for this proposal. I just think this is getting silly. We're saying > that we only have access to 1 or 2 AIX machines, and most of us have > access to none, and the compiler has serious code generation bugs that > are present in both a release 11 years old and also a release current > release, meaning they went unfixed for 10 years, and we can't report > bugs or get them fixed when we find them, and the use of this > particular compiler in the buildfarm isn't finding any issues that > matter anywhere else. +1. > To be honest, I'm not entirely sure that even AIX gcc support is > delivering enough value per unit work to justify keeping it around. > But the xlc situation is worse. Agreed with both. If it were just a platform that didn't need special casing in a bunch of places, it'd be one thing, but it's linkage model is so odd that it makes no sense to keep AIX support around. But I'll take what I can get... Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2023-10-19 10:38:14 -0400, Robert Haas wrote: >> To be honest, I'm not entirely sure that even AIX gcc support is >> delivering enough value per unit work to justify keeping it around. >> But the xlc situation is worse. > Agreed with both. If it were just a platform that didn't need special casing > in a bunch of places, it'd be one thing, but it's linkage model is so odd that > it makes no sense to keep AIX support around. But I'll take what I can get... The other thread recently referred to: https://www.postgresql.org/message-id/flat/20220702183354.a6uhja35wta7agew%40alap3.anarazel.de was mostly about how AIX's choice that alignof(double) < alignof(int64) breaks a whole bunch of assumptions in our code. AFAICS we've done nothing to resolve that, and nobody really wants to deal with it, and there's no good reason to think that fixing it would improve portability to any other platform. So maybe there's an adequate case for just nuking AIX support altogether? I can't recall the last time I saw a report from an actual AIX end user. regards, tom lane