Home > mailing lists

Re: Proposal: partition pruning by secondary attributes - Mailing list pgsql-hackers

From	Ildar Musin
Subject	Re: Proposal: partition pruning by secondary attributes
Date	February 9, 2018 14:49:29
Msg-id	fa1d43eb-02ea-238b-deca-50149cfb97e0@postgrespro.ru Whole thread Raw
In response to	Re: Proposal: partition pruning by secondary attributes (Andres Freund <andres@anarazel.de>)
Responses	Re: Proposal: partition pruning by secondary attributes
List	pgsql-hackers

Tree view

On 08.02.2018 21:01, Andres Freund wrote:
> On 2018-02-08 14:48:34 -0300, Alvaro Herrera wrote:
>> Ildar Musin wrote:
>>
>>> The idea is to store min and max values of secondary attributes
>>> (like 'id' in the example above) for each partition somewhere in
>>> catalog and use it for partition pruning along with partitioning
>>> key. You can think of it as somewhat like BRIN index but for
>>> partitions.
>>
>> What is the problem with having a BRIN index?
>
> Building plans to scan the individual partitions, lock them, open
> the relevant files, etc is often going to be significantly more
> expensive than pruning at plan time.
>
> But there also seems to be a number of fairly nasty locking issues
> with this proposal, leaving the amount of required code aside.
>

Sorry, I probably didn't describe it right. I wasn't talking about using
brin index for partition pruning or something like that, just used it as
a reference to the idea. I'll try to explain it in more detailed way.

Let's say we have a table to store some events, which is partitioned by
timestamp column:

CREATE TABLE events (
    id serial,
    dt timestamp,
    ...
) PARTITION BY RANGE (dt);

In some cases it is queried by 'dt' column and partition pruning is
working fine because 'dt' is a partitioning key:

EXPLAIN (COSTS OFF) SELECT ... FROM events WHERE dt >= '2018-01-01' AND
dt < '2018-02-01';
  Append
    ->  Seq Scan on events_0
          Filter: ((dt >= '2018-01-01 00:00:00'::timestamp without time
zone) AND (dt < '2018-02-01 00:00:00'::timestamp without time zone))

But if we filter the table by 'id' then planner has no other way but to
append every partition to the plan.

EXPLAIN (COSTS OFF) SELECT * FROM events WHERE id = 123;
  Append
    ->  Seq Scan on events_0
          Filter: (id = 123)
    ->  Seq Scan on events_1
          Filter: (id = 123)
    ->  Seq Scan on events_2
          Filter: (id = 123)

We can see though that values of 'dt' and 'id' both monotonically
increase over time and so we can potentially use 'id' column to do
partition pruning at plan time too. To do so we need to store min and
max values of 'id' column per partition somewhere in catalog and use
them to decide which partition should be added to the plan by matching
them to the query restrictions.

Each time table is updated we must check whether new value exceeds
stored min/max values and update those too if needed. This raises few
issues. One of them as Ashutosh Bapat mentioned is the need to change
catalog very often which could result in high catalog contention. I
can't think of comprehensive solution for this problem. But for
numerical and datetime types we could shift min or max bounds with
margin so that not every update will result in catalog change. The other
one is the need to change min/max of partition if rows were removed
which is less evil since we can postpone it and do it later (during
autovacuum for instance).

A new command for ALTER TABLE should also be introduced to specify the
column or expression which is not a partition key but can be used for
partition pruning as described above. This command would scan each
partition, gather min/max values and store them into catalog.

What do you think?

-- 
Ildar Musin
i.musin@postgrespro.ru

pgsql-hackers by date:

From: Pavel Stehule
Date: 09 February 2018, 14:10:11
Subject: Re: Using scalar function as set-returning: bug or feature?

From: Etsuro Fujita
Date: 09 February 2018, 15:20:03
Subject: Re: non-bulk inserts and tuple routing

Re: Proposal: partition pruning by secondary attributes - Mailing list pgsql-hackers

Previous

Next