Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] path toward faster partition pruning |
Date | |
Msg-id | CA+TgmoahUxagjeNeJTcJkD0rbk+mHTXROzWcEd+tZ8DuQG83cg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] path toward faster partition pruning (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] path toward faster partition pruning
Re: [HACKERS] path toward faster partition pruning Re: [HACKERS] path toward faster partition pruning |
List | pgsql-hackers |
On Fri, Mar 2, 2018 at 6:21 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Looking at the rough interface sketch in your message, it seems that the > product of whatever steps we end up grouping into phase 1 should be > something that can be put into a Node tree (PartitionPruneContext?), > because we may need to attach it to the plan if some of the needed values > will only be made available during execution. Right. You might also end up with two representations: a Node-tree-style representation that contains all of the information we need, and another, faster form into which it gets converted before use. > Given the patch's implementation, we'll have to make the structure of that > Node tree a bit more complex than a simple List. For one thing, the patch > handles OR clauses by performing pruning separately for each arm and them > combining partitions selected across OR arms using set union. By > "performing pruning" in the last sentence I meant following steps similar > to ones you wrote in your message: > > 1. Segregating pruning clauses into per-partition-key Lists, that is, > generate_partition_clauses() producing a PartitionClauseInfo, > > 2. Removing redundant clauses from each list, that is, > remove_redundant_clauses() to produce lists with just one member per > operator strategy for each partition key, > > 3. Extracting Datum values from the clauses to form equal/min/max tuples > and setting null or not null bits for individual keys, that is, > extract_bounding_datums() producing a PartScanKeyInfo, and > > 4. Finally pruning with those Datum tuples and null/not null info, that > is, get_partitions_for_keys(). > > Steps 2-4 are dependent on clauses providing Datums, which all the clauses > may or may not do. Depending on whether or not, we'll have to defer those > steps to run time. I don't see that there's a real need to perform step 2 at all. I mean, if you have x > $1 and x > $2 in the query, you can just compute the set of partitions for the first clause, compute the set of partitions for the second clause, and then intersect. That doesn't seem obviously worse than deciding which of $1 and $2 is greater and then pruning only based on whichever one is greater in this case. > * What do we encode into the Node tree attached to the plan? Clauses that > haven't gone through steps 2 and 3 (something like PartitionClauseInfo) > or the product of step 3 (something like PartScanKeyInfo)? > > * How do we account for OR clauses? Perhaps by having the aforementioned > Node trees nested inside the top-level one, wherein there will be one > nested node per arm of an OR clause. Suppose we define the notion of a pruning program. A pruning program can use any number of registers, which have integer numbers starting with 0 and counting upward as high as necessary. Each register holds a Bitmapset. The result of a pruning program is the value of register 0 when the program completes. A pruning program consists of a list of steps, each of which is either a PruningBaseStep or a PruningCombineStep. A PruningCombineStep modifies the contents of the target register based on the contents of a source register in one of the following three ways: (1) UNION -- all bits set in source become set in target; (2) INTERSECT -- all bits clear in source become clear in target; (3) DIFFERENCE -- all bits set in source become clear in target. A PruningBaseStep consists of a strategy (equality, less-than, etc.), an output register, and list of expressions -- either as many as there are partition keys, or for range partitioning perhaps fewer; it prunes based on the strategy and the expressions and overwrites the output register with the partitions that would be selected. Example #1. Table is hash-partitioned on a and b. Given a query like SELECT * FROM tab WHERE a = 100 AND b = 233, we create a single-step program: 1. base-step (strategy =, register 0, expressions 100, 233) If there were an equality constraint on one of the two columns, we would not create a pruning program at all, because no pruning is possible. Example #2. Table is list-partitioned on a. Given a query like SELECT * FROM tab WHERE (a = $1 OR a = $2) AND a != $3, we create this program: 1. base-step (strategy =, register 0, expressions $1) 2. base-step (strategy =, register 1, expressions $2) 3. base-step (strategy !=, register 2, expressions $3) 4. combine-step (target-register 0, source-register 1, strategy union) 5. combine-step (target-register 0, source-register 2, strategy difference) (This is unoptimized -- one could do better by reversing steps 3 and 4 and using reusing register 1 instead of needing register 2, but that kind of optimization is probably not too important.) Example #3. Table is range-partitioned on a and b. Given a query like SELECT * FROM tab WHERE (a = 40 AND b > $1) OR (a = $2 AND b = $3), we do this: 1. base-step (strategy >, register 0, expressions 40, $1) 2. base-step (strategy =, register 1, expressions $2, $3) 3. combine-step (target-register 0, source-register 1, strategy union) You might need a few extra gadgets here to make all of this work -- e.g. another base-step strategy to handle ScalarArrayOpExpr; I'm just trying to convey the basic idea here. It's pretty easy to see how to store a program like this as a node tree: just create PruningBaseStep and PruningCombineStep nodes and stick them into a List. At execution time transform the List into an array and loop over it. Or possibly it would be better to have two lists, one of base steps without explicit register numbers, where step N always outputs to register N, and then a second list of combine steps. Then at execution time you could have an array of PruningBaseStep * and an array of PruningCombineStep * instead of a combined array of Node *, which might be quicker to process. But regardless of what you do exactly, I think you should try to come up with some kind of representation that is basically uniform, handling all the things you support in a similar fashion. The current patch has basically separate and somewhat ad-hoc representations for the regular case, the <> case, and the OR case, which I think is not ideal because you end up with more code and a certain amount of repeated logic. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: