Re: Determine parallel-safety of partition relations for Inserts - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Determine parallel-safety of partition relations for Inserts |
Date | |
Msg-id | CAExHW5s_xBB4cyN12=1BcvoT5Of-kUWtcDGwZ3UNv1f356trVw@mail.gmail.com Whole thread Raw |
In response to | Determine parallel-safety of partition relations for Inserts (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Determine parallel-safety of partition relations for Inserts
|
List | pgsql-hackers |
On Fri, Jan 15, 2021 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > While reviewing parallel insert [1] (Insert into .... Select) and > parallel copy patches [2], it came to my notice that both the patches > traverse the entire partition hierarchy to determine parallel-safety > of partitioned relations. This is required because before considering > the Insert or Copy can be considered for parallelism, we need to > determine whether it is safe to do so. We need to check for each > partition because any of the partitions can have some parallel-unsafe > index expression, constraint, etc. We do a similar thing for Selects > in standard_planner. > > The plain Select case for partitioned tables was simpler because we > anyway loop through all the partitions in set_append_rel_size() and we > determine parallel-safety of each partition at that time but the same > is not true for Inserts. > > For Inserts, currently, we only open the partition table when we are > about to insert into that partition. During ExecInsert, we find out > the partition matching the partition-key value and then lock if it is > not already locked. In this patch, we need to open each partition at > the planning time to determine its parallel-safety. We don't want to open the partitions where no rows will be inserted. > > This will surely increase planning time but the execution is reduced > to an extent due to parallelism that it won't matter for either of the > cases if we see just total time. For example, see the latest results > for parallel inserts posted by Haiying Tang [3]. There might be an > impact when Selects can't be parallelized due to the small size of the > Select-table but we still have to traverse all the partitions to > determine parallel-safety but not sure how much it is compared to > overall time. I guess we need to find the same but apart from that can > anyone think of a better way to determine parallel-safety of > partitioned relation for Inserts? In case of SELECT we open only those partitions which surive pruning. So those are the ones which will definitely required to be scanned. We perform parallelism checks only on those partitions. The actual check isn't much costly. > > Thoughts? > > Note: I have kept a few people in Cc who are either directly involved > in this work or work regularly in the partitioning related work just > in the hope that might help in moving the discussion forward. Since you brought up comparison between SELECT and INSERT, "pruning" partitions based on the values being INSERTed might help. It should be doable in case of INSERT ... SELECT where we need to prune partitions based on the clauses of SELECT. Doable with some little effort in case of VALUEs and COPY. Second possibility is to open partitions only when the estimated number of rows to be inserted goes beyond a certain value. Third idea is to use something similar to parallel append where individual partitions are scanned sequentially but multiple partitions are scanned in parallel. When a row is inserted into a non-yet-opened partition, allocate one/more backends to insert into partitions which do not allow parallelism, otherwise continue to use a common pool of parallel workers for insertion. This means the same thread performing select may not perform insert. So some complications will be involved. -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: