Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables |
Date | |
Msg-id | CAFjFpRfS9fW1+9vcbqyPq0WcqxoGN1uPUHGGq_x2eTUorOHu7w@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
|
List | pgsql-hackers |
On Thu, Mar 16, 2017 at 12:30 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Mar 15, 2017 at 8:49 AM, Ashutosh Bapat > <ashutosh.bapat@enterprisedb.com> wrote: >>> Of course, that supposes that 0009 can manage to postpone creating >>> non-sampled child joinrels until create_partition_join_plan(), which >>> it currently doesn't. >> >> Right. We need the child-join's RelOptInfos to estimate sizes, so that >> we could sample the largest ones. So postponing it looks difficult. > > You have a point. > >>> In fact, unless I'm missing something, 0009 >>> hasn't been even slightly adapted to take advantage of the >>> infrastructure in 0001; it doesn't seem to reset the path_cxt or >>> anything. That seems like a fairly major omission. >> >> The path_cxt reset introduced by 0001 recycles memory used by all the >> paths, including paths created for the children. But that happens only >> after all the planning has completed. I thought that's what we >> discussed to be done. We could create a separate path context for >> every top-level child-join. > > I don't think we need to create a new context for each top-level > child-join, but I think we should create a context to be used across > all top-level child-joins and then reset it after planning each one. Sorry, that's what I meant by creating a new context for each top-level child-join. So, we need to copy the required path tree before resetting the context. I am fine doing that but read on. > I thought the whole point here was that NOT doing that caused the > memory usage for partitionwise join to get out of control. Am I > confused? We took a few steps to reduce the memory footprint of partition-wise join in [1] and [2]. According to the numbers reported in [1] and then in [2], if the total memory consumed by a planner is 44MB (memory consumed by paths 150K) for a 5-way non-parition-wise join between tables with 1000 partitions, partition-wise join consumed 192MB which is 4.4 times the non-partitino-wise case. The earlier implementation of blowing away a memory context after each top-level child-join, just got rid of the paths created for that child-join. The total memory consumed by paths created for all the child-joins was about 150MB. Remember that we can not get rid of memory consumed by expressions, RelOptInfos, RestrictInfos etc. since their pointers will be copied into the plan nodes. With changes in 0001, what happens is we accumulate 150MB till the end of the planning and get rid of it after we have created a plan. So, till the plan is created we are consuming approx. 192MB + 150MB = 342MB memory and are getting rid of 150MB memory after we have created the plan. I am not sure whether consuming extra 150MB or for that matter 342MB in a setup with a thousand partitions is "going out of control". (342MB is approx. 7.8 time 44MB; not 1000 times, and not even 10 times). But if you think that we should throw away unused paths after planning every top-level child-join I am fine with it. [1] https://www.postgresql.org/message-id/CAFjFpRcZ_M3-JxoiDkdoPS%2B-9Cok4ux9Si%2B4drcRL-62af%3DjWw@mail.gmail.com [2] https://www.postgresql.org/message-id/CAFjFpRe66z%2Bw9%2BdnAkWGiaB1CU2CUQsLGsqzHzYBoA%3DKJFf%2BPQ%40mail.gmail.com -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: