Thread: Proposal: partition pruning by secondary attributes
Hello, hackers! Sorry if this have already been discussed. I've had this idea some time ago and then successfully forgot about it until pgconf.ru, where I had a conversation with one of postgres users. His situation could be described as this: they have a table with id, timestamp and some other attributes, which is partitioned by (let's say) timestamp column. In different contexts they may want to filter the table either by id or by timestamp attribute (but not both). In this case pruning will only work for timestamp column. The idea is to store min and max values of secondary attributes (like 'id' in the example above) for each partition somewhere in catalog and use it for partition pruning along with partitioning key. You can think of it as somewhat like BRIN index but for partitions. And it will have similar limitations. For example, we may benefit if secondary attribute values are monotonically increase or decrease, but would be unhelpful if they are scattered, or if table wasn't partitioned by range. I wanted to ask community's opinion would it be worth considering. Thanks! -- Ildar Musin i.musin@postgrespro.ru
Ildar Musin wrote: > The idea is to store min and max values of secondary attributes (like > 'id' in the example above) for each partition somewhere in catalog and > use it for partition pruning along with partitioning key. You can think > of it as somewhat like BRIN index but for partitions. What is the problem with having a BRIN index? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-02-08 14:48:34 -0300, Alvaro Herrera wrote: > Ildar Musin wrote: > > > The idea is to store min and max values of secondary attributes (like > > 'id' in the example above) for each partition somewhere in catalog and > > use it for partition pruning along with partitioning key. You can think > > of it as somewhat like BRIN index but for partitions. > > What is the problem with having a BRIN index? Building plans to scan the individual partitions, lock them, open the relevant files, etc is often going to be significantly more expensive than pruning at plan time. But there also seems to be a number of fairly nasty locking issues with this proposal, leaving the amount of required code aside. Greetings, Andres Freund
On Thu, Feb 8, 2018 at 4:51 PM, Ildar Musin <i.musin@postgrespro.ru> wrote: > > The idea is to store min and max values of secondary attributes (like > 'id' in the example above) for each partition somewhere in catalog and > use it for partition pruning along with partitioning key. Every insertion and update of secondary attribute will touch catalog and update if required. That will increase the contention on catalog. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 08.02.2018 21:01, Andres Freund wrote: > On 2018-02-08 14:48:34 -0300, Alvaro Herrera wrote: >> Ildar Musin wrote: >> >>> The idea is to store min and max values of secondary attributes >>> (like 'id' in the example above) for each partition somewhere in >>> catalog and use it for partition pruning along with partitioning >>> key. You can think of it as somewhat like BRIN index but for >>> partitions. >> >> What is the problem with having a BRIN index? > > Building plans to scan the individual partitions, lock them, open > the relevant files, etc is often going to be significantly more > expensive than pruning at plan time. > > But there also seems to be a number of fairly nasty locking issues > with this proposal, leaving the amount of required code aside. > Sorry, I probably didn't describe it right. I wasn't talking about using brin index for partition pruning or something like that, just used it as a reference to the idea. I'll try to explain it in more detailed way. Let's say we have a table to store some events, which is partitioned by timestamp column: CREATE TABLE events ( id serial, dt timestamp, ... ) PARTITION BY RANGE (dt); In some cases it is queried by 'dt' column and partition pruning is working fine because 'dt' is a partitioning key: EXPLAIN (COSTS OFF) SELECT ... FROM events WHERE dt >= '2018-01-01' AND dt < '2018-02-01'; Append -> Seq Scan on events_0 Filter: ((dt >= '2018-01-01 00:00:00'::timestamp without time zone) AND (dt < '2018-02-01 00:00:00'::timestamp without time zone)) But if we filter the table by 'id' then planner has no other way but to append every partition to the plan. EXPLAIN (COSTS OFF) SELECT * FROM events WHERE id = 123; Append -> Seq Scan on events_0 Filter: (id = 123) -> Seq Scan on events_1 Filter: (id = 123) -> Seq Scan on events_2 Filter: (id = 123) We can see though that values of 'dt' and 'id' both monotonically increase over time and so we can potentially use 'id' column to do partition pruning at plan time too. To do so we need to store min and max values of 'id' column per partition somewhere in catalog and use them to decide which partition should be added to the plan by matching them to the query restrictions. Each time table is updated we must check whether new value exceeds stored min/max values and update those too if needed. This raises few issues. One of them as Ashutosh Bapat mentioned is the need to change catalog very often which could result in high catalog contention. I can't think of comprehensive solution for this problem. But for numerical and datetime types we could shift min or max bounds with margin so that not every update will result in catalog change. The other one is the need to change min/max of partition if rows were removed which is less evil since we can postpone it and do it later (during autovacuum for instance). A new command for ALTER TABLE should also be introduced to specify the column or expression which is not a partition key but can be used for partition pruning as described above. This command would scan each partition, gather min/max values and store them into catalog. What do you think? -- Ildar Musin i.musin@postgrespro.ru
Ildar Musin wrote: > But if we filter the table by 'id' then planner has no other way but to > append every partition to the plan. > > EXPLAIN (COSTS OFF) SELECT * FROM events WHERE id = 123; > Append > -> Seq Scan on events_0 > Filter: (id = 123) > -> Seq Scan on events_1 > Filter: (id = 123) > -> Seq Scan on events_2 > Filter: (id = 123) I think it should be possible to prune at runtime based on a brin index. As Andres says this means we cannot prune at plan time, and you still need to open the relations and indexes to perform pruning, but the contention problem is solved. A pretty crazy related idea is to allow BRIN indexes to be global -- so you create a brin index on the partitioned table in such a way that it doesn't cascade to create local indexes, but instead a single index represents the whole hierarchy. This requires a lot of other changes, but seems to match your design. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services