Thread: INSERT INTO SELECT, Why Parallelism is not selected?
I have just notice that the parallelism is off even for the select part of the query mentioned in the $subject. I see the only reason it is not getting parallel because we block the parallelism if the query type is not SELECT. I don't see any reason for not selecting the parallelism for this query. I have quickly hacked the code to enable the parallelism for this query. I can see there is a significant improvement if we can enable the parallelism in this case. For an experiment, I have just relaxed a couple of checks, maybe if we think that it's good to enable the parallelism for this case we can try to put better checks which are specific for this quey. No parallel: postgres[36635]=# explain analyze insert into t2 select * from t where a < 100; Insert on t2 (cost=0.00..29742.00 rows=100 width=105) (actual time=278.300..278.300 rows=0 loops=1) -> Seq Scan on t (cost=0.00..29742.00 rows=100 width=105) (actual time=0.061..277.330 rows=99 loops=1) Filter: (a < 100) Rows Removed by Filter: 999901 Planning Time: 0.093 ms Execution Time: 278.330 ms (6 rows) With parallel Insert on t2 (cost=1000.00..23460.33 rows=100 width=105) (actual time=108.410..108.410 rows=0 loops=1) -> Gather (cost=1000.00..23460.33 rows=100 width=105) (actual time=0.306..108.973 rows=99 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on t (cost=0.00..22450.33 rows=42 width=105) (actual time=66.396..101.979 rows=33 loops=3) Filter: (a < 100) Rows Removed by Filter: 333300 Planning Time: 0.154 ms Execution Time: 110.158 ms (9 rows) -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Sat, Jul 11, 2020 at 6:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I have just notice that the parallelism is off even for the select > part of the query mentioned in the $subject. I see the only reason it > is not getting parallel because we block the parallelism if the query > type is not SELECT. I don't see any reason for not selecting the > parallelism for this query. I have quickly hacked the code to enable > the parallelism for this query. I can see there is a significant > improvement if we can enable the parallelism in this case. For an > experiment, I have just relaxed a couple of checks, maybe if we think > that it's good to enable the parallelism for this case we can try to > put better checks which are specific for this quey. > +1. I also don't see any problem with this idea considering we will find a better way to enable the parallelism for this case because we can already use parallelism for statements like "create table <tbl_name> as select ...". I think we can do more than this by parallelizing the Insert part of this query as well as we have lifted group locking restrictions related to RelationExtension and Page lock in PG13. It would be really cool to do that unless we see any fundamental problems with it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Jul 13, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jul 11, 2020 at 6:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I have just notice that the parallelism is off even for the select > > part of the query mentioned in the $subject. I see the only reason it > > is not getting parallel because we block the parallelism if the query > > type is not SELECT. I don't see any reason for not selecting the > > parallelism for this query. I have quickly hacked the code to enable > > the parallelism for this query. I can see there is a significant > > improvement if we can enable the parallelism in this case. For an > > experiment, I have just relaxed a couple of checks, maybe if we think > > that it's good to enable the parallelism for this case we can try to > > put better checks which are specific for this quey. > > > > +1. I also don't see any problem with this idea considering we will > find a better way to enable the parallelism for this case because we > can already use parallelism for statements like "create table > <tbl_name> as select ...". Okay, thanks for the feedback. I think we can do more than this by > parallelizing the Insert part of this query as well as we have lifted > group locking restrictions related to RelationExtension and Page lock > in PG13. It would be really cool to do that unless we see any > fundamental problems with it. +1, I think it will be cool to support for insert part here as well as insert part in 'Create Table As Select..' as well. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Sat, Jul 11, 2020 at 6:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have just notice that the parallelism is off even for the select
part of the query mentioned in the $subject. I see the only reason it
is not getting parallel because we block the parallelism if the query
type is not SELECT. I don't see any reason for not selecting the
parallelism for this query. I have quickly hacked the code to enable
the parallelism for this query. I can see there is a significant
improvement if we can enable the parallelism in this case. For an
experiment, I have just relaxed a couple of checks, maybe if we think
that it's good to enable the parallelism for this case we can try to
put better checks which are specific for this quey.
+1 for the idea. For the given example also it shows a good performance
gain and I also don't any reason on restrict the parallel case for INSERT INTO SELECT.
No parallel:
postgres[36635]=# explain analyze insert into t2 select * from t where a < 100;
Insert on t2 (cost=0.00..29742.00 rows=100 width=105) (actual
time=278.300..278.300 rows=0 loops=1)
-> Seq Scan on t (cost=0.00..29742.00 rows=100 width=105) (actual
time=0.061..277.330 rows=99 loops=1)
Filter: (a < 100)
Rows Removed by Filter: 999901
Planning Time: 0.093 ms
Execution Time: 278.330 ms
(6 rows)
With parallel
Insert on t2 (cost=1000.00..23460.33 rows=100 width=105) (actual
time=108.410..108.410 rows=0 loops=1)
-> Gather (cost=1000.00..23460.33 rows=100 width=105) (actual
time=0.306..108.973 rows=99 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t (cost=0.00..22450.33 rows=42
width=105) (actual time=66.396..101.979 rows=33 loops=3)
Filter: (a < 100)
Rows Removed by Filter: 333300
Planning Time: 0.154 ms
Execution Time: 110.158 ms
(9 rows)
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Rushabh Lathia
On Sat, Jul 11, 2020 at 8:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > I have just notice that the parallelism is off even for the select > part of the query mentioned in the $subject. I see the only reason it > is not getting parallel because we block the parallelism if the query > type is not SELECT. I don't see any reason for not selecting the > parallelism for this query. There's a relevant comment near the top of heap_prepare_insert(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jul 15, 2020 at 12:32 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Sat, Jul 11, 2020 at 8:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I have just notice that the parallelism is off even for the select > > part of the query mentioned in the $subject. I see the only reason it > > is not getting parallel because we block the parallelism if the query > > type is not SELECT. I don't see any reason for not selecting the > > parallelism for this query. > > There's a relevant comment near the top of heap_prepare_insert(). > I think that is no longer true after commits 85f6b49c2c and 3ba59ccc89 where we have allowed relation extension and page locks to conflict among group members. We have accordingly changed comments at a few places but forgot to update this one. I will check and see if any other similar comments are there which needs to be updated. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 15, 2020 at 8:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 15, 2020 at 12:32 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Sat, Jul 11, 2020 at 8:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > I have just notice that the parallelism is off even for the select > > > part of the query mentioned in the $subject. I see the only reason it > > > is not getting parallel because we block the parallelism if the query > > > type is not SELECT. I don't see any reason for not selecting the > > > parallelism for this query. > > > > There's a relevant comment near the top of heap_prepare_insert(). > > > > I think that is no longer true after commits 85f6b49c2c and 3ba59ccc89 > where we have allowed relation extension and page locks to conflict > among group members. We have accordingly changed comments at a few > places but forgot to update this one. I will check and see if any > other similar comments are there which needs to be updated. > The attached patch fixes the comments. Let me know if you think I have missed anything or any other comments. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Thu, Jul 16, 2020 at 8:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 15, 2020 at 8:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jul 15, 2020 at 12:32 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > On Sat, Jul 11, 2020 at 8:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I have just notice that the parallelism is off even for the select > > > > part of the query mentioned in the $subject. I see the only reason it > > > > is not getting parallel because we block the parallelism if the query > > > > type is not SELECT. I don't see any reason for not selecting the > > > > parallelism for this query. > > > > > > There's a relevant comment near the top of heap_prepare_insert(). > > > > > > > I think that is no longer true after commits 85f6b49c2c and 3ba59ccc89 > > where we have allowed relation extension and page locks to conflict > > among group members. We have accordingly changed comments at a few > > places but forgot to update this one. I will check and see if any > > other similar comments are there which needs to be updated. > > > > The attached patch fixes the comments. Let me know if you think I > have missed anything or any other comments. Your comments look good to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 15, 2020 at 11:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > The attached patch fixes the comments. Let me know if you think I > have missed anything or any other comments. If it's safe now, why not remove the error check? (Is it safe? Could there be other problems?) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 16, 2020 at 6:43 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Jul 15, 2020 at 11:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > The attached patch fixes the comments. Let me know if you think I > > have missed anything or any other comments. > > If it's safe now, why not remove the error check? > I think it is not safe for all kind of Inserts (see my response later in email), so we need some smarts to identify un-safe inserts before we can open this check. > (Is it safe? Could there be other problems?) > I think we need to be careful of two things: (a) Do we want to enable parallel inserts where tuple locks are involved, forex. in statements like "Insert into primary_tbl Select * from secondary_tbl Where col < 10 For Update;"? In such statements, I don't see any problem because each worker will operate on a separate page and even if the leader already has a lock on the tuple, it will be granted to the worker as it is taken in the same transaction. (b) The insert statements that can generate 'CommandIds' which can happen while insert into tables with foreign keys, see below test: CREATE TABLE primary_tbl(index INTEGER PRIMARY KEY, height real, weight real); insert into primary_tbl values(1, 1.1, 100); insert into primary_tbl values(2, 1.2, 100); insert into primary_tbl values(3, 1.3, 100); CREATE TABLE secondary_tbl(index INTEGER REFERENCES primary_tbl(index), height real, weight real); insert into secondary_tbl values(generate_series(1,3),1.2,200); Here we can't parallelise statements like "insert into secondary_tbl values(generate_series(1,3),1.2,200);" as they will generate 'CommandIds' for each row insert into table with foreign key. The new command id is generated while performing a foreign key check. Now, it is a separate question whether generating a command id for each row insert is required or not but as of now we can't parallelize such statements. Do you have something else in mind? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Jul 17, 2020 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Do you have something else in mind? > I am planning to commit the comments change patch attached in the above email [1] next week sometime (probably Monday or Tuesday) unless you have something more to add? [1] - https://www.postgresql.org/message-id/CAA4eK1%2BRL7c_s%3D%2BTwAE6DJ1MmupbEiGCFLt97US%2BDMm6UxAjTA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Jul 24, 2020 at 7:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, Jul 17, 2020 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Do you have something else in mind? > > I am planning to commit the comments change patch attached in the > above email [1] next week sometime (probably Monday or Tuesday) unless > you have something more to add? Well, I think the comments could be more clear - for the insert case specifically - about which cases you think are and are not safe. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Well, I think the comments could be more clear - for the insert case > specifically - about which cases you think are and are not safe. Yeah, the proposed comment changes don't actually add much. Also please try to avoid inserting non-ASCII into the source code; at least in my mail reader, that attachment has some. regards, tom lane
On Fri, Jul 24, 2020 at 7:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > Well, I think the comments could be more clear - for the insert case > > specifically - about which cases you think are and are not safe. > Okay, I'll update the patch accordingly. > Yeah, the proposed comment changes don't actually add much. Also > please try to avoid inserting non-ASCII into the source code; > at least in my mail reader, that attachment has some. > I don't see any non-ASCII characters in the patch. I have applied and checked (via vi editor) the patch as well but don't see any non-ASCII characters. How can I verify that? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > On Fri, Jul 24, 2020 at 7:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, the proposed comment changes don't actually add much. Also >> please try to avoid inserting non-ASCII into the source code; >> at least in my mail reader, that attachment has some. > I don't see any non-ASCII characters in the patch. I have applied and > checked (via vi editor) the patch as well but don't see any non-ASCII > characters. How can I verify that? They're definitely there: $ od -c 0001-Fix-comments-in-heapam.c.patch ... 0002740 h e \n + \t * l e a d e r c 0002760 a n p e r f o r m t h e i 0003000 n s e r t . 302 240 T h i s r e 0003020 s t r i c t i o n c a n b e 0003040 u p l i f t e d o n c e w 0003060 e \n + \t * a l l o w t h e 0003100 302 240 p l a n n e r t o g e n 0003120 e r a t e p a r a l l e l p 0003140 l a n s f o r i n s e r t s 0003160 . \n \t * / \n \t i f ( I s ... I'm not sure if "git diff --check" would whine about this. regards, tom lane
On Sat, Jul 25, 2020 at 8:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > On Fri, Jul 24, 2020 at 7:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Yeah, the proposed comment changes don't actually add much. Also > >> please try to avoid inserting non-ASCII into the source code; > >> at least in my mail reader, that attachment has some. > > > I don't see any non-ASCII characters in the patch. I have applied and > > checked (via vi editor) the patch as well but don't see any non-ASCII > > characters. How can I verify that? > > They're definitely there: > > $ od -c 0001-Fix-comments-in-heapam.c.patch > ... > 0002740 h e \n + \t * l e a d e r c > 0002760 a n p e r f o r m t h e i > 0003000 n s e r t . 302 240 T h i s r e > 0003020 s t r i c t i o n c a n b e > 0003040 u p l i f t e d o n c e w > 0003060 e \n + \t * a l l o w t h e > 0003100 302 240 p l a n n e r t o g e n > 0003120 e r a t e p a r a l l e l p > 0003140 l a n s f o r i n s e r t s > 0003160 . \n \t * / \n \t i f ( I s > ... > Thanks, I could see that. > I'm not sure if "git diff --check" would whine about this. > No, "git diff --check" doesn't help. I have tried pgindent but that also doesn't help neither was I expecting it to help. I am still not able to figure out how I goofed up this but will spend some more time on this. In the meantime, I have updated the patch to improve the comments as suggested by Robert. Do let me know if you want to edit/add something more? -- With Regards, Amit Kapila.
Attachment
On Sun, Jul 26, 2020 at 4:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jul 25, 2020 at 8:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > No, "git diff --check" doesn't help. I have tried pgindent but that > also doesn't help neither was I expecting it to help. I am still not > able to figure out how I goofed up this but will spend some more time > on this. > I think I have figured out how the patch ended up having . Some editors add it when we use two spaces after a period (.) and I could see that my Gmail client does that for such a pattern. Normally, I use one of MSVC, vi, or NetBeans IDE to develop code/patch but sometimes I check the comments by pasting in Gmail client to find typos or such and then fix them manually. I guess in this case I have used Gmail client to write this comment and then copy/pasted it for the patch. I will be careful not to do this in the future. -- With Regards, Amit Kapila.
On Sun, Jul 26, 2020 at 7:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > No, "git diff --check" doesn't help. I have tried pgindent but that > also doesn't help neither was I expecting it to help. I am still not > able to figure out how I goofed up this but will spend some more time > on this. In the meantime, I have updated the patch to improve the > comments as suggested by Robert. Do let me know if you want to > edit/add something more? I still don't agree with this as proposed. + * For now, we don't allow parallel inserts of any form not even where the + * leader can perform the insert. This restriction can be uplifted once + * we allow the planner to generate parallel plans for inserts. We can If I'm understanding this correctly, this logic is completely backwards. We don't prohibit inserts here because we know the planner can't generate them. We prohibit inserts here because, if the planner somehow did generate them, it wouldn't be safe. You're saying that it's not allowed because we don't try to do it yet, but actually it's not allowed because we want to make sure that we don't accidentally try to do it. That's very different. + * parallelize inserts unless they generate a new commandid (ex. inserts + * into a table having foreign key column) or lock tuples (ex. statements + * like Insert .. Select For Update). I understand the part about generating new command IDs, but not the part about locking tuples. Why would that be a problem? Can it better explained here? Examples in comments are typically introduced with e.g., not ex. + * We should be able to parallelize + * the later case if we can ensure that no two parallel processes can ever + * operate on the same page. I don't know whether this is talking about two processes operating on the same page at the same time, or ever within a single query execution. If it's the former, perhaps we need to explain why that's a concern for parallel query but not otherwise; if it's the latter, that seems impossible to guarantee and imagining that we'll ever be able to do so seems like wishful thinking. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jul 29, 2020 at 7:18 PM Robert Haas <robertmhaas@gmail.com> wrote: > > I still don't agree with this as proposed. > > + * For now, we don't allow parallel inserts of any form not even where the > + * leader can perform the insert. This restriction can be uplifted once > + * we allow the planner to generate parallel plans for inserts. We can > > If I'm understanding this correctly, this logic is completely > backwards. We don't prohibit inserts here because we know the planner > can't generate them. We prohibit inserts here because, if the planner > somehow did generate them, it wouldn't be safe. You're saying that > it's not allowed because we don't try to do it yet, but actually it's > not allowed because we want to make sure that we don't accidentally > try to do it. That's very different. > Right, so how about something like: "To allow parallel inserts, we need to ensure that they are safe to be performed in workers. We have the infrastructure to allow parallel inserts in general except for the case where inserts generate a new commandid (eg. inserts into a table having a foreign key column)." We can extend this for tuple locking if required as per the below discussion. Kindly suggest if you prefer a different wording here. > > + * We should be able to parallelize > + * the later case if we can ensure that no two parallel processes can ever > + * operate on the same page. > > I don't know whether this is talking about two processes operating on > the same page at the same time, or ever within a single query > execution. If it's the former, perhaps we need to explain why that's a > concern for parallel query but not otherwise; > I am talking about the former case and I know that as per current design it is not possible that two worker processes try to operate on the same page but I was trying to be pessimistic so that we can ensure that via some form of Assert. I don't know whether it is important to mention this case or not but for the sake of extra safety, I have mentioned it. -- With Regards, Amit Kapila.
On Thu, Jul 30, 2020 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 29, 2020 at 7:18 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > I still don't agree with this as proposed. > > > > + * For now, we don't allow parallel inserts of any form not even where the > > + * leader can perform the insert. This restriction can be uplifted once > > + * we allow the planner to generate parallel plans for inserts. We can > > > > If I'm understanding this correctly, this logic is completely > > backwards. We don't prohibit inserts here because we know the planner > > can't generate them. We prohibit inserts here because, if the planner > > somehow did generate them, it wouldn't be safe. You're saying that > > it's not allowed because we don't try to do it yet, but actually it's > > not allowed because we want to make sure that we don't accidentally > > try to do it. That's very different. > > > > Right, so how about something like: "To allow parallel inserts, we > need to ensure that they are safe to be performed in workers. We have > the infrastructure to allow parallel inserts in general except for the > case where inserts generate a new commandid (eg. inserts into a table > having a foreign key column)." We can extend this for tuple locking > if required as per the below discussion. Kindly suggest if you prefer > a different wording here. > > > > > + * We should be able to parallelize > > + * the later case if we can ensure that no two parallel processes can ever > > + * operate on the same page. > > > > I don't know whether this is talking about two processes operating on > > the same page at the same time, or ever within a single query > > execution. If it's the former, perhaps we need to explain why that's a > > concern for parallel query but not otherwise; > > > > I am talking about the former case and I know that as per current > design it is not possible that two worker processes try to operate on > the same page but I was trying to be pessimistic so that we can ensure > that via some form of Assert. > I think the two worker processes can operate on the same page for a parallel index scan case but it won't be for same tuple. I am not able to think of any case where we should be worried about tuple locking for Insert's case, so we can probably skip writing anything about it unless someone else can think of such a case. -- With Regards, Amit Kapila.
On Tue, Jul 14, 2020 at 1:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Jul 13, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > I think we can do more than this by > > parallelizing the Insert part of this query as well as we have lifted > > group locking restrictions related to RelationExtension and Page lock > > in PG13. It would be really cool to do that unless we see any > > fundamental problems with it. > > +1, I think it will be cool to support for insert part here as well as > insert part in 'Create Table As Select..' as well. > +1 to parallelize inserts. Currently, ExecInsert() and CTAS use table_tuple_insert(), if we parallelize these parts, each worker will be inserting it's tuples(one tuple at a time) into the same data page, until space is available, if not a new data page can be obtained by any of the worker, others might start inserting into it. This way, will there be lock contention on data pages?. Do we also need to make inserts to use table_multi_insert() (like the way "COPY" uses) instead of table_tuple_insert()? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Tue, Aug 18, 2020 at 1:37 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Jul 14, 2020 at 1:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Mon, Jul 13, 2020 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > I think we can do more than this by > > > parallelizing the Insert part of this query as well as we have lifted > > > group locking restrictions related to RelationExtension and Page lock > > > in PG13. It would be really cool to do that unless we see any > > > fundamental problems with it. > > > > +1, I think it will be cool to support for insert part here as well as > > insert part in 'Create Table As Select..' as well. > > > > +1 to parallelize inserts. Currently, ExecInsert() and CTAS use > table_tuple_insert(), if we parallelize these parts, each worker will > be inserting it's tuples(one tuple at a time) into the same data page, > until space is available, if not a new data page can be obtained by > any of the worker, others might start inserting into it. This way, > will there be lock contention on data pages? > It is possible but we need to check how much that is a bottleneck because that should not be a big part of the operation. And, it won't be any worse than inserts via multiple backends. I think it is important to do that way, otherwise, some of the pages can remain half-empty. Right now, the plan for Insert ... Select is like Insert on <tbl_x> -> Seq Scan on <tbl_y> .... In the above the scan could be index scan as well. What we want is: Gather -> Insert on <tbl_x> -> Seq Scan on <tbl_y> .... >. Do we also need to make > inserts to use table_multi_insert() (like the way "COPY" uses) instead > of table_tuple_insert()? > I am not sure at this stage but if it turns out to be a big problem then we might think of inventing some way to allow individual workers to operate on different pages. I think even without that we should be able to make a big gain as reads, filtering, etc can be done in parallel. -- With Regards, Amit Kapila.
On Thu, Jul 30, 2020 at 6:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jul 30, 2020 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jul 29, 2020 at 7:18 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > I still don't agree with this as proposed. > > > > > > + * For now, we don't allow parallel inserts of any form not even where the > > > + * leader can perform the insert. This restriction can be uplifted once > > > + * we allow the planner to generate parallel plans for inserts. We can > > > > > > If I'm understanding this correctly, this logic is completely > > > backwards. We don't prohibit inserts here because we know the planner > > > can't generate them. We prohibit inserts here because, if the planner > > > somehow did generate them, it wouldn't be safe. You're saying that > > > it's not allowed because we don't try to do it yet, but actually it's > > > not allowed because we want to make sure that we don't accidentally > > > try to do it. That's very different. > > > > > > > Right, so how about something like: "To allow parallel inserts, we > > need to ensure that they are safe to be performed in workers. We have > > the infrastructure to allow parallel inserts in general except for the > > case where inserts generate a new commandid (eg. inserts into a table > > having a foreign key column)." Robert, Dilip, do you see any problem if we change the comment on the above lines? Feel free to suggest if you have something better in mind. > > We can extend this for tuple locking > > if required as per the below discussion. Kindly suggest if you prefer > > a different wording here. > > I feel we can leave this based on the reasoning provided below. > > > > > > + * We should be able to parallelize > > > + * the later case if we can ensure that no two parallel processes can ever > > > + * operate on the same page. > > > > > > I don't know whether this is talking about two processes operating on > > > the same page at the same time, or ever within a single query > > > execution. If it's the former, perhaps we need to explain why that's a > > > concern for parallel query but not otherwise; > > > > > > > I am talking about the former case and I know that as per current > > design it is not possible that two worker processes try to operate on > > the same page but I was trying to be pessimistic so that we can ensure > > that via some form of Assert. > > > > I think the two worker processes can operate on the same page for a > parallel index scan case but it won't be for same tuple. I am not able > to think of any case where we should be worried about tuple locking > for Insert's case, so we can probably skip writing anything about it > unless someone else can think of such a case. > -- With Regards, Amit Kapila.
On Wed, Sep 9, 2020 at 10:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jul 30, 2020 at 6:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Jul 30, 2020 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jul 29, 2020 at 7:18 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > > > I still don't agree with this as proposed. > > > > > > > > + * For now, we don't allow parallel inserts of any form not even where the > > > > + * leader can perform the insert. This restriction can be uplifted once > > > > + * we allow the planner to generate parallel plans for inserts. We can > > > > > > > > If I'm understanding this correctly, this logic is completely > > > > backwards. We don't prohibit inserts here because we know the planner > > > > can't generate them. We prohibit inserts here because, if the planner > > > > somehow did generate them, it wouldn't be safe. You're saying that > > > > it's not allowed because we don't try to do it yet, but actually it's > > > > not allowed because we want to make sure that we don't accidentally > > > > try to do it. That's very different. > > > > > > > > > > Right, so how about something like: "To allow parallel inserts, we > > > need to ensure that they are safe to be performed in workers. We have > > > the infrastructure to allow parallel inserts in general except for the > > > case where inserts generate a new commandid (eg. inserts into a table > > > having a foreign key column)." > > Robert, Dilip, do you see any problem if we change the comment on the > above lines? Feel free to suggest if you have something better in > mind. > Hearing no further comments, I have pushed the changes as discussed above. -- With Regards, Amit Kapila.