Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
From | Luc Vlaming |
---|---|
Subject | Re: Parallel Inserts in CREATE TABLE AS |
Date | |
Msg-id | ef02fe4e-b476-40e6-3fd6-3407caf82fb5@swarm64.com Whole thread Raw |
In response to | Re: Parallel Inserts in CREATE TABLE AS (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Parallel Inserts in CREATE TABLE AS
|
List | pgsql-hackers |
On 04-01-2021 14:32, Bharath Rupireddy wrote: > On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc@swarm64.com > <mailto:luc@swarm64.com>> wrote: > > Sorry it took so long to get back to reviewing this. > > Thanks for the comments. > > > wrt v18-0001....patch: > > > > + /* > > + * If the worker is for parallel insert in CTAS, then > use the proper > > + * dest receiver. > > + */ > > + intoclause = (IntoClause *) stringToNode(intoclausestr); > > + receiver = CreateIntoRelDestReceiver(intoclause); > > + ((DR_intorel *)receiver)->is_parallel_worker = true; > > + ((DR_intorel *)receiver)->object_id = fpes->objectid; > > I would move this into a function called e.g. > > GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in > > createas.c. > > I would then also split up intorel_startup into intorel_leader_startup > > and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set > > self->pub.rStartup to intorel_worker_startup. > > My intention was to not add any new APIs to the dest receiver. I simply > made the changes in intorel_startup, in which for workers it just does > the minimalistic work and exit from it. In the leader most of the table > creation and sanity check is kept untouched. Please have a look at the > v19 patch posted upthread [1]. > Looks much better, really nicely abstracted away in the v20 patch. > > + volatile pg_atomic_uint64 *processed; > > why is it volatile? > > Intention is to always read from the actual memory location. I referred > it from the way pg_atomic_fetch_add_u64_impl, > pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their > u32 counterparts use pass the parameter as volatile pg_atomic_uint64 *ptr. > Okay I had not seen this syntax before for atomics with the volatile keyword but its apparently how the atomics abstraction works in postgresql. > > + if (isctas) > > + { > > + intoclause = ((DR_intorel *) > node->dest)->into; > > + objectid = ((DR_intorel *) > node->dest)->object_id; > > + } > > Given that you extract them each once and then pass them directly into > > the parallel-worker, can't you instead pass in the destreceiver and > > leave that logic to ExecInitParallelPlan? > > That's changed entirely in the v19 patch set posted upthread [1]. Please > have a look. I didn't pass the dest receiver, to keep the API generic, I > passed parallel insert command type and a void * ptr which points to > insertion command because the information we pass to workers depends on > the insertion command (for instance, the information needed by workers > is for CTAS into clause and object id and for Refresh Mat View object id). > > > > > + if > (IS_PARALLEL_CTAS_DEST(gstate->dest) && > > + ((DR_intorel *) > gstate->dest)->into->rel && > > + ((DR_intorel *) > gstate->dest)->into->rel->relname) > > why would rel and relname not be there? if no rows have been inserted? > > because it seems from the intorel_startup function that that would be > > set as soon as startup was done, which i assume (wrongly?) is always > done? > > Actually, that into clause rel variable is always being set in the > gram.y for CTAS, Create Materialized View and SELECT INTO (because > qualified_name non-terminal is not optional). My bad. I just added it as > a sanity check. Actually, it's not required. > > create_as_target: > *qualified_name* opt_column_list table_access_method_clause > OptWith OnCommitOption OptTableSpace > { > $$ = makeNode(IntoClause); > * $$->rel = $1;* > create_mv_target: > *qualified_name* opt_column_list table_access_method_clause > opt_reloptions OptTableSpace > { > $$ = makeNode(IntoClause); > * $$->rel = $1;* > into_clause: > INTO OptTempTableName > { > $$ = makeNode(IntoClause); > * $$->rel = $2;* > > I will change the below code: > + if (GetParallelInsertCmdType(gstate->dest) == > + PARALLEL_INSERT_CMD_CREATE_TABLE_AS && > + ((DR_intorel *) gstate->dest)->into && > + ((DR_intorel *) gstate->dest)->into->rel && > + ((DR_intorel *) gstate->dest)->into->rel->relname) > + { > > to: > + if (GetParallelInsertCmdType(gstate->dest) == > + PARALLEL_INSERT_CMD_CREATE_TABLE_AS) > + { > > I will update this in the next version of the patch set. > Thanks > > > > + * In case if no workers were launched, allow the leader to > insert entire > > + * tuples. > > what does "entire tuples" mean? should it maybe be "all tuples"? > > Yeah, noticed that while working on the v19 patch set. Please have a > look at the v19 patch posted upthread [1]. > > > ================ > > wrt v18-0003....patch: > > > > not sure if it is needed, but i was wondering if we would want more > > tests with multiple gather nodes existing? caused e.g. by using CTE's, > > valid subquery's (like the one test you have, but without the group > > by/having)? > > I'm not sure if we can have CTAS/CMV/SELECT INTO in CTEs like WITH, WITH > RECURSIVE and I don't see that any of the WITH clause processing hits > createas.c functions. So, IMHO, we don't need to add them. Please let me > know if there are any specific use cases you have in mind. > > For instance, I tried to cover Init/Sub Plan and Subquery cases with: > > below case has multiple Gather, Init Plan: > +-- parallel inserts must occur, as there is init plan that gets executed by > +-- each parallel worker > +select explain_pictas( > +'create table parallel_write as select two col1, > + (select two from (select * from tenk2) as tt limit 1) col2 > + from tenk1 where tenk1.four = 3;'); > > below case has Gather, Sub Plan: > +-- parallel inserts must not occur, as there is sub plan that gets > executed by > +-- the Gather node in leader > +select explain_pictas( > +'create table parallel_write as select two col1, > + (select tenk1.two from generate_series(1,1)) col2 > + from tenk1 where tenk1.four = 3;'); > > For multiple Gather node cases, I covered them with the Union All/Append > cases in the 0004 patch. Please have a look. > Right, had not reviewed part 4 yet. My bad. > [1] - > https://www.postgresql.org/message-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk%2Bzmg%40mail.gmail.com > <https://www.postgresql.org/message-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk%2Bzmg%40mail.gmail.com> > > With Regards, > Bharath Rupireddy. > EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com> Kind regards, Luc
pgsql-hackers by date: