Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] path toward faster partition pruning |
Date | |
Msg-id | CA+TgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] path toward faster partition pruning (Jesper Pedersen <jesper.pedersen@redhat.com>) |
Responses |
Re: [HACKERS] path toward faster partition pruning
|
List | pgsql-hackers |
On Tue, Sep 26, 2017 at 9:00 AM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > Could you share your thoughts on the usage of PartitionAppendInfo's > min_datum_idx / max_datum_idx ? Especially in relation to hash partitions. This brings up something that I've kind of been thinking about. There are sort of four cases when it comes to partition pruning: 1. There is exactly one matching partition. For example, this happens when there is an equality constraint on every partition column. 2. There are multiple matching partitions which are consecutive. For example, there is a single level of range partitioning with no default partition and the single partitioning column is constrained by < > <= or >=. 3. There are multiple matching partitions which are not consecutive. This case is probably rare, but it can happen if there is a default partition, if there are list partitions with multiple bounds that are interleaved (e.g. p1 allows (1, 4), p2 allows (2), p3 allows (3, 5), and the query allows values >= 4 and <= 5), if the query involves OR conditions, or if there are multiple levels of partitioning (e.g. partition by a, subpartition by b, put a range constraint on a and an equality constraint on b). 4. There are no matching partitions. One of the goals of this algorithm is to be fast. The obvious way to cater to case (3) is to iterate through all partitions and test whether each one works, returning a Bitmapset, but that is O(n). Admittedly, it might be O(n) with a pretty small constant factor, but it still seems like exactly the sort of thing that we want to avoid given the desire to scale to higher partition counts. I propose that we create a structure that looks like this: struct foo { int min_partition; int max_partition; Bitmapset *extra_partitions; }; This indicates that all partitions from min_partition to max_partition need to be scanned, and in addition any partitions in extra_partitions need to be scanned. Assuming that we only consider cases where all partition keys or a leading subset of the partition keys are constrained, we'll generally be able to get by with just setting min_partition and max_partition, but extra_partitions can be used to handle default partitions and interleaved list bounds. For equality on all partitioning columns, we can do a single bsearch of the bounds to identify the target partition at a given partitioning level, and the same thing works for a single range-bound. If there are two range-bounds (< and > or <= and >= or whatever) we need to bsearch twice. The default partition, if any and if matched, must also be included. When there are multiple levels of partitioning things get a bit more complex -- if someone wants to knock out a partition that breaks up the range, we might need to shrink the main range to cover part of it and kick the other indexes out to extra_partitions. But the good thing is that in common cases with only O(lg n) effort we can return O(1) data that describes what will be scanned. In cases where that's not practical we expend more effort but still prune with maximal effectiveness. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: