Thread: pgsql: Fix some issues with step generation in partition pruning.
Fix some issues with step generation in partition pruning. In the case of range partitioning, get_steps_using_prefix() assumes that the passed-in prefix list contains at least one clause for each of the partition keys earlier than one specified in the passed-in step_lastkeyno, but the caller (ie, gen_prune_steps_from_opexps()) didn't take it into account, which led to a server crash or incorrect results when the list contained no clauses for such partition keys, as reported in bug #16500 and #16501 from Kobayashi Hisanori. Update the caller to call that function only when the list created there contains at least one clause for each of the earlier partition keys in the case of range partitioning. While at it, fix some other issues: * The list to pass to get_steps_using_prefix() is allowed to contain multiple clauses for the same partition key, as described in the comment for that function, but that function actually assumed that the list contained just a single clause for each of middle partition keys, which led to an assertion failure when the list contained multiple clauses for such partition keys. Update that function to match the comment. * In the case of hash partitioning, partition keys are allowed to be NULL, in which case the list to pass to get_steps_using_prefix() contains no clauses for NULL partition keys, but that function treats that case as like the case of range partitioning, which led to the assertion failure. Update the assertion test to take into account NULL partition keys in the case of hash partitioning. * Fix a typo in a comment in get_steps_using_prefix_recurse(). * gen_partprune_steps() failed to detect self-contradiction from strict-qual clauses and an IS NULL clause for the same partition key in some cases, producing incorrect partition-pruning steps, which led to incorrect results of partition pruning, but didn't cause any user-visible problems fortunately, as the self-contradiction is detected later in the query planning. Update that function to detect the self-contradiction. Per bug #16500 and #16501 from Kobayashi Hisanori. Patch by me, initial diagnosis for the reported issue and review by Dmitry Dolgov. Back-patch to v11, where partition pruning was introduced. Discussion: https://postgr.es/m/16500-d1613f2a78e1e090%40postgresql.org Discussion: https://postgr.es/m/16501-5234a9a0394f6754%40postgresql.org Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/13838740f61fc455aa4196d257efc0b761daba1f Modified Files -------------- src/backend/partitioning/partprune.c | 187 ++++++++++++++++++-------- src/test/regress/expected/partition_prune.out | 92 +++++++++++++ src/test/regress/sql/partition_prune.sql | 71 ++++++++++ 3 files changed, 294 insertions(+), 56 deletions(-)
On Tue, Jul 28, 2020 at 2:01 PM Etsuro Fujita <efujita@postgresql.org> wrote: > src/backend/partitioning/partprune.c | 187 ++++++++++++++++++-------- Hi Fujita-san I wonder if this build farm failure is related? Program terminated with signal 11, Segmentation fault. ... #0 0x59c15c in gen_partprune_steps_internal (context=0x40223d28, clauses=0x0) at partprune.c:1483 1483 for (i = 0; i < pc->keyno; i++) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gaur&dt=2020-08-01%2004%3A12%3A08
Thomas Munro <thomas.munro@gmail.com> writes: > Hi Fujita-san > I wonder if this build farm failure is related? Sure looks that way, doesn't it? I'm just now working to reproduce on gaur's host and poke into it with a debugger. More news in an hour or two (it's slow :-(). regards, tom lane
On Sat, Aug 1, 2020 at 11:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > Hi Fujita-san > > I wonder if this build farm failure is related? Thanks for letting me know! > Sure looks that way, doesn't it? I'm just now working to reproduce > on gaur's host and poke into it with a debugger. More news in an > hour or two (it's slow :-(). Thanks for that! I'm not sure this is related to that failure, but self-reviewing the patch, one thing I noticed I missed is that get_steps_using_prefix() assumes that clauses in the passed-in prefix list are sorted in an ascending order of partkeyidx, but the caller (ie, gen_prune_steps_from_opexps()) doesn't take that into account. I might have broken something. I'm a bit tired today, so I'll look at this more closely tomorrow. Best regards, Etsuro Fujita
Etsuro Fujita <etsuro.fujita@gmail.com> writes: > On Sat, Aug 1, 2020 at 11:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Sure looks that way, doesn't it? I'm just now working to reproduce >> on gaur's host and poke into it with a debugger. More news in an >> hour or two (it's slow :-(). > Thanks for that! I've concluded that this is probably a compiler bug. It doesn't fail at optimization level -O0 or -O2, only -O1; and trying to step through gen_partprune_steps_internal() suggests that the part_scheme local variable is changing value, which it surely should not. Since gaur is running an ancient gcc version, and -O1 is doubtless a pretty under-tested optimization level, bugs there are not so surprising. There wasn't any amazingly good reason to be using -O1 for gaur, so I've switched the animal to use -O2. I expect it'll go back to green in a few hours. regards, tom lane
On Sun, Aug 2, 2020 at 3:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I've concluded that this is probably a compiler bug. It doesn't fail > at optimization level -O0 or -O2, only -O1; and trying to step through > gen_partprune_steps_internal() suggests that the part_scheme local > variable is changing value, which it surely should not. Since gaur > is running an ancient gcc version, and -O1 is doubtless a pretty > under-tested optimization level, bugs there are not so surprising. > > There wasn't any amazingly good reason to be using -O1 for gaur, > so I've switched the animal to use -O2. I expect it'll go back > to green in a few hours. I checked that gaur got back to green. Thanks! Best regards, Etsuro Fujita
On Sat, Aug 1, 2020 at 2:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > There wasn't any amazingly good reason to be using -O1 for gaur, > so I've switched the animal to use -O2. I expect it'll go back > to green in a few hours. While I like having HP-UX buildfarm coverage, the status page says that gaur is running 10.20, which according to noted reliable source Wikipedia, was released in 1996 and went out of support in 2003. So I wonder whether testing that version is really a sensible thing to do at this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > While I like having HP-UX buildfarm coverage, the status page says > that gaur is running 10.20, which according to noted reliable source > Wikipedia, was released in 1996 and went out of support in 2003. So I > wonder whether testing that version is really a sensible thing to do > at this point. Certainly HPUX 10.20 isn't interesting as an OS anymore. I think the point of this animal is mostly to test on HPPA hardware, and the reason HPPA is interesting for us is mostly that it has such strange spinlocks, and that's not OS-dependent. If I still had support coverage on the machine, I'd upgrade to v11, but I don't. (Wikipedia claims that HPUX is going EOL altogether next year, so such a change wouldn't buy much for long anyhow.) I experimented a year or so ago with installing OpenBSD, which seems to be the last major open-source OS with HPPA support. I did get it running, but it takes about twice as long to do "check-world" as HPUX on the same machine, which does not speak well to the amount of effort OpenBSD has put into the port. It already takes gaur near six hours to do one buildfarm run; I don't want it to be twelve. So I'm planning to keep running gaur as-is until (a) the hardware dies or (b) we make a policy decision that breaks support for this hardware. pademelon (same machine, vendor compiler) is just there to keep us honest on C89 support until v11 is EOL. There are other ways to test C89 compliance, perhaps, but as long as this one is handy I don't feel like putting effort into figuring that out. (Besides, we have darn little coverage of not-gcc-derived compilers.) regards, tom lane