Home > mailing lists

Re: Parallel copy - Mailing list pgsql-hackers

From	vignesh C
Subject	Re: Parallel copy
Date	October 27, 2020 13:36:15
Msg-id	CALDaNm1cAONkFDN6K72DSiRpgqNGvwxQL7TjEiHZ58opnp9VoA@mail.gmail.com Whole thread Raw
In response to	Re: Parallel copy (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses	RE: Parallel copy Re: Parallel copy Re: Parallel copy Re: Parallel copy Re: Parallel copy
List	pgsql-hackers

Tree view

Thanks for the comments, please find my thoughts below.
On Wed, Oct 21, 2020 at 3:19 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi Vignesh,
>
> I took a look at the v8 patch set. Here are some comments:
>
> 1. PopulateCommonCstateInfo() -- can we use PopulateCommonCStateInfo()
> or PopulateCopyStateInfo()? And also EstimateCstateSize() --
> EstimateCStateSize(), PopulateCstateCatalogInfo() --
> PopulateCStateCatalogInfo()?
>

Changed as suggested.

> 2. Instead of mentioning numbers like 1024, 64K, 10240 in the
> comments, can we represent them in terms of macros?
> /* It can hold 1024 blocks of 64K data in DSM to be processed by the worker. */
> #define MAX_BLOCKS_COUNT 1024
> /*
>  * It can hold upto 10240 record information for worker to process. RINGSIZE
>

Changed as suggested.

> 3. How about
> "
> Each worker at once will pick the WORKER_CHUNK_COUNT records from the
> DSM data blocks and store them in it's local memory.
> This is to make workers not contend much while getting record
> information from the DSM. Read RINGSIZE comments before
>  changing this value.
> "
> instead of
> /*
>  * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
>  * block to process to avoid lock contention. Read RINGSIZE comments before
>  * changing this value.
>  */
>

Rephrased it.

> 4.  How about one line gap before and after for comments: "Leader
> should operate in the following order:" and "Worker should operate in
> the following order:"
>

Changed it.

> 5. Can we move RAW_BUF_BYTES macro definition to the beginning of the
> copy.h where all the macro are defined?
>

Change was done as part of another commit & we are using as it is. I
preferred it to be as it is.

> 6. I don't think we need the change in toast_internals.c with the
> temporary hack Assert(!(IsParallelWorker() && !currentCommandIdUsed));
> in GetCurrentCommandId()
>

Modified it.

> 7. I think
>     /* Can't perform copy in parallel */
>     if (parallel_workers <= 0)
>         return NULL;
> can be
>     /* Can't perform copy in parallel */
>     if (parallel_workers == 0)
>         return NULL;
> as parallel_workers can never be < 0 since we enter BeginParallelCopy
> only if cstate->nworkers > 0 and also we are not allowed to have
> negative values for max_worker_processes.
>

Modified it.

> 8. Do we want to pfree(cstate->pcdata) in case we failed to start any
> parallel workers, we would have allocated a good
>     else
>         {
>             /*
>              * Reset nworkers to -1 here. This is useful in cases where user
>              * specifies parallel workers, but, no worker is picked up, so go
>              * back to non parallel mode value of nworkers.
>              */
>             cstate->nworkers = -1;
>             *processed = CopyFrom(cstate);    /* copy from file to database */
>         }
>

Added pfree.

> 9. Instead of calling CopyStringToSharedMemory() for each string
> variable, can't we just create a linked list of all the strings that
> need to be copied into shm and call CopyStringToSharedMemory() only
> once? We could avoid 5 function calls?
>

I feel keeping it this way makes the code more readable, and also this
is not in a performance intensive tight loop. I'm  retaining the
change as is unless we feel this will make an impact.

> 10. Similar to above comment: can we fill all the required
> cstate->variables inside the function CopyNodeFromSharedMemory() and
> call it only once? In each worker we could save overhead of 5 function
> calls.
>

same as above.

> 11. Looks like CopyStringFromSharedMemory() and
> CopyNodeFromSharedMemory() do almost the same things except
> stringToNode() and pfree(destptr);. Can we have a generic function
> CopyFromSharedMemory() or something else and handle with flag "bool
> isnode" to differentiate the two use cases?
>

Removed CopyStringFromSharedMemory & used CopyNodeFromSharedMemory
appropriately. CopyNodeFromSharedMemory is renamed to
RestoreNodeFromSharedMemory keep the name consistent.

> 12. Can we move below check to the end in IsParallelCopyAllowed()?
>     /* Check parallel safety of the trigger functions. */
>     if (cstate->rel->trigdesc != NULL &&
>         !CheckRelTrigFunParallelSafety(cstate->rel->trigdesc))
>         return false;
>

Modified.

> 13. CacheLineInfo(): Instead of goto empty_data_line_update; how about
> having this directly inside the if block as it's being used only once?
>

Have removed the goto by using a macro.

> 14. GetWorkerLine(): How about avoiding goto statements and replacing
> the common code with a always static inline function or a macro?
>

Have removed the goto by using a macro.

> 15. UpdateSharedLineInfo(): Below line is misaligned.
>                 lineInfo->first_block = blk_pos;
>         lineInfo->start_offset = offset;
>

Changed it.

> 16. ParallelCopyFrom(): Do we need CHECK_FOR_INTERRUPTS(); at the
> start of  for (;;)?
>

Added it.

> 17. Remove extra lines after #define IsHeaderLine()
> (cstate->header_line && cstate->cur_lineno == 1) in copy.h
>

Modified it.

Attached v9 patches have the fixes for the above comments.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

From: Heikki Linnakangas
Date: 27 October 2020, 13:23:00
Subject: Re: partition routing layering in nodeModifyTable.c

From: Peter Eisentraut
Date: 27 October 2020, 13:45:13
Subject: Re: SQL-standard function body

Re: Parallel copy - Mailing list pgsql-hackers

Attachment

Previous

Next