Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date:
Msg-id: CAH2-Wz=Ct9fLcN-2SV0gx_iBk87RsAi+Sor3OwR2_5Rg_KO0+g@mail.gmail.com
In response to: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List: pgsql-hackers
On Tue, Nov 7, 2023 at 4:20 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> On Tue, 7 Nov 2023 at 00:03, Peter Geoghegan <pg@bowt.ie> wrote:
> > I should be able to post v6 later this week. My current plan is to commit the other nbtree patch first (the backwards scan "boundary cases" one from the ongoing CF) -- since I saw your review earlier today. I think that you should probably wait for this v6 before starting your review.
>
> Okay, thanks for the update, then I'll wait for v6 to be posted.

On second thought, I'll just post v6 now (there won't be conflicts against the master branch once the other patch is committed anyway). Highlights:

* Major simplifications to the array key state machine, already described by my recent email.

* Added preprocessing of "redundant and contradictory" array elements to _bt_preprocess_array_keys().

With this change, the special preprocessing pass just for array keys ("preprocessing preprocessing") within _bt_preprocess_array_keys() turns this query into a no-op:

select * from tab where a in (180, 345) and a in (230, 300); -- contradictory

Similarly, it can make this query attempt only a single primitive index scan, for "230":

select * from tab where a in (180, 230) and a in (230, 300); -- has redundancies, plus some individual elements contradict each other

This duplicates some of what _bt_preprocess_keys can do already. But _bt_preprocess_keys can only do this stuff at the level of individual array elements/primitive index scans, whereas this works "one level up", allowing preprocessing to see the full picture rather than just the start of one particular primitive index scan. It explicitly works across array keys, saving repeat work inside _bt_preprocess_keys. That could really add up with thousands of array keys and/or multiple SAOPs.

(Note that _bt_preprocess_array_keys already does something like this to deal with SAOP inequalities such as "WHERE my_col >= any (array[1, 2])" -- it's a little surprising that this obvious optimization wasn't part of the original nbtree SAOP patch.)

This reminds me: you might want to try breaking the patch by coming up with adversarial cases, Matthias. The patch needs to be able to deal with absurdly large numbers of array keys reasonably well, because it proposes to normalize passing those to the nbtree code. It's especially important that the patch never takes too much time to do something (e.g., binary searching through array keys) while holding a buffer lock -- even with very silly adversarial queries. So, for example, queries like this one (specifically designed to stress the implementation) *need* to work reasonably well:

with a as (
    select i from generate_series(0, 500000) i
)
select count(*), thousand, tenthous
from tenk1
where thousand = any (array[(select array_agg(i) from a)])
and tenthous = any (array[(select array_agg(i) from a)])
group by thousand, tenthous
order by thousand, tenthous;

(You can run this yourself after the regression tests finish, of course.)

This takes about 130ms on my machine, hardly any of which is spent in the nbtree code with the patch (think tens of microseconds per _bt_readpage call, at most) -- the plan is an index-only scan that gets only 30 buffer hits. On the master branch, it's vastly slower -- 1000025 buffer hits. The query as a whole takes about 3 seconds there.
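[A minimal sketch of a schema under which the two preprocessing examples quoted earlier in this message can be tried. The original message never defines "tab", so the table, index, and data below are assumptions, not part of the original message:

-- Hypothetical schema; "tab" is not defined in the original message
create table tab (a int);
create index tab_a_idx on tab (a);
insert into tab select g % 400 from generate_series(1, 100000) g;
analyze tab;

-- Contradictory arrays: {180, 345} and {230, 300} share no elements,
-- so preprocessing can reduce the scan to a no-op:
select * from tab where a in (180, 345) and a in (230, 300);

-- Redundant arrays: only the shared element 230 survives both IN
-- lists, so a single primitive index scan (for "230") suffices:
select * from tab where a in (180, 230) and a in (230, 300);
]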
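[One way to observe the buffer-hit figures quoted above is EXPLAIN (ANALYZE, BUFFERS). A sketch, assuming the regression test database, which provides tenk1 and its composite index on (thousand, tenthous):

-- Run against the regression database after the tests finish:
explain (analyze, buffers)
with a as (
    select i from generate_series(0, 500000) i
)
select count(*), thousand, tenthous
from tenk1
where thousand = any (array[(select array_agg(i) from a)])
and tenthous = any (array[(select array_agg(i) from a)])
group by thousand, tenthous
order by thousand, tenthous;
-- With the patch: an index-only scan whose "Buffers:" line shows ~30
-- shared hits. On master: ~1000025 hits (per the figures above).
]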
If you have 3 or 4 SAOPs (with a composite index that has as many columns), you can quite easily DOS the master branch, since the planner makes a generic assumption that each of these SAOPs will have only 10 elements. The planner assumes the same thing with the patch applied, with one important difference: it doesn't matter to nbtree. The cost of scanning each index page should be practically independent of the total size of each array, at least past a certain point. Similarly, the maximum cost of an index scan should be approximately fixed: it should be capped at the cost of a full index scan (with the added cost of these relatively expensive quals still capped, too -- still essentially independent of array sizes past some point).

I notice that if I remove the "thousand = any (array[(select array_agg(i) from a)]) and" line from the adversarial query, the resulting query still gets 30 buffer hits with the patch -- though it only takes 90ms this time (it's faster for reasons that likely have less to do with nbtree overheads than you'd think). This is just another way of getting roughly the same full index scan.

That's a completely new way of thinking about nbtree SAOPs from a planner perspective (also from a user's perspective, I suppose). It's important that the planner's new optimistic assumptions about the cost profile of SAOPs (that it can expect reasonable performance/access patterns even with wildly unreasonable/huge/arbitrarily complicated SAOPs) always be met by nbtree -- no repeat index page accesses, no holding a buffer lock for more than (say) a small fraction of 1 millisecond (no matter the complexity of the query), and possibly other things I haven't thought of yet.

If you end up finding a bug in this v6, it'll most likely be a case where nbtree fails to live up to that. This project is as much about robust/predictable performance as anything else -- nbtree needs to be able to cope with practically anything. I suggest that your review start by trying to break the patch along these lines.

--
Peter Geoghegan