[WIP] Better partial index-only scans - Mailing list pgsql-hackers
From | Joshua Yanovski |
---|---|
Subject | [WIP] Better partial index-only scans |
Date | |
Msg-id | CABz-M-GrkvrMc9ni5S0mX53rtZg3=SZNEYrU_A8RigQ2b3MGNA@mail.gmail.com Whole thread Raw |
Responses |
Re: [WIP] Better partial index-only scans
Re: [WIP] Better partial index-only scans |
List | pgsql-hackers |
Proof of concept initial patch for enabling index only scans for partial indices even when an attribute is not in the target list, as long as it is only used in restriction clauses that can be proved by the index predicate. This also works for index quals, though they still can't be used in the target list. However, this patch may be inefficient since it duplicates effort that is currently delayed until after the best plan is chosen. The patch works by basically repeating the logic from create_indexscan_plan in createplan.c that determines which clauses can't be discarded, instead of the current approach, which just assumes that any attributes referenced anywhere in a restriction clause has to be a column in the relevant index. It should build against master and passes make check for me. It also includes a minor fix in the same code in createplan.c to make sure we're explicitly comparing an empty list to NIL, but I can take that out if that's not considered in scope. If this were the final patch I'd probably coalesce the code used in both places into a single function, but since I'm not certain that the implementation in check_index_only won't change substantially I held off on that. Since the original comment suggested that this was not done due to planner performance concerns, I assume the performance of this approach is unacceptable (though I did a few benchmarks and wasn't able to detect a consistent difference--what would be a good test for this?). As such, this is intended as more of a first pass that I can build on, but I wanted to get feedback at this stage on where we can improve (particularly if there were already ideas on how this might be done, as the comment hints). Index only scans cost less than regular index scans so I don't think we can get away with waiting until we've chosen the best plan before we do the work described above. That said, as I see it performance could improve in any combination of five ways: * Improve the performance of determining which clauses can't be discarded (e.g. precompute some information about equivalence classes for index predicates, mess around with the order in which we check the clauses to make it fail faster, switch to real union-find data structures for equivalence classes). * Find a cleverer way of figuring out whether a partial index can be used than just checking which clauses can't be discarded. * Use a simpler heuristic (that doesn't match what use to determine which clauses can be discarded, but still matches more than we do now). * Take advantage of work we do here to speed things up elsewhere (e.g. if this does get chosen as the best plan we don't need to recompute the same information in create_indexscan_plan). * Delay determining whether to use an index scan or index only scan until after cost analysis somehow. I'm not sure exactly what this would entail. Since this is my first real work with the codebase, I'd really appreciate it if people could help me figure out the best approach here (and, more importantly, if one is necessary based on benchmarks). And while this should go without saying, if this patch doesn't actually work then please let me know, since all the above is based on the assumption that what's there is enough :) Thanks, Joshua Yanovski
Attachment
pgsql-hackers by date: