Re: Parallel copy - Mailing list pgsql-hackers
| From | vignesh C |
|---|---|
| Subject | Re: Parallel copy |
| Date | |
| Msg-id | CALDaNm1cAONkFDN6K72DSiRpgqNGvwxQL7TjEiHZ58opnp9VoA@mail.gmail.com Whole thread Raw |
| In response to | Re: Parallel copy (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
| Responses |
RE: Parallel copy
Re: Parallel copy Re: Parallel copy Re: Parallel copy Re: Parallel copy |
| List | pgsql-hackers |
Thanks for the comments, please find my thoughts below.
On Wed, Oct 21, 2020 at 3:19 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi Vignesh,
>
> I took a look at the v8 patch set. Here are some comments:
>
> 1. PopulateCommonCstateInfo() -- can we use PopulateCommonCStateInfo()
> or PopulateCopyStateInfo()? And also EstimateCstateSize() --
> EstimateCStateSize(), PopulateCstateCatalogInfo() --
> PopulateCStateCatalogInfo()?
>
Changed as suggested.
> 2. Instead of mentioning numbers like 1024, 64K, 10240 in the
> comments, can we represent them in terms of macros?
> /* It can hold 1024 blocks of 64K data in DSM to be processed by the worker. */
> #define MAX_BLOCKS_COUNT 1024
> /*
> * It can hold upto 10240 record information for worker to process. RINGSIZE
>
Changed as suggested.
> 3. How about
> "
> Each worker at once will pick the WORKER_CHUNK_COUNT records from the
> DSM data blocks and store them in it's local memory.
> This is to make workers not contend much while getting record
> information from the DSM. Read RINGSIZE comments before
> changing this value.
> "
> instead of
> /*
> * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
> * block to process to avoid lock contention. Read RINGSIZE comments before
> * changing this value.
> */
>
Rephrased it.
> 4. How about one line gap before and after for comments: "Leader
> should operate in the following order:" and "Worker should operate in
> the following order:"
>
Changed it.
> 5. Can we move RAW_BUF_BYTES macro definition to the beginning of the
> copy.h where all the macro are defined?
>
Change was done as part of another commit & we are using as it is. I
preferred it to be as it is.
> 6. I don't think we need the change in toast_internals.c with the
> temporary hack Assert(!(IsParallelWorker() && !currentCommandIdUsed));
> in GetCurrentCommandId()
>
Modified it.
> 7. I think
> /* Can't perform copy in parallel */
> if (parallel_workers <= 0)
> return NULL;
> can be
> /* Can't perform copy in parallel */
> if (parallel_workers == 0)
> return NULL;
> as parallel_workers can never be < 0 since we enter BeginParallelCopy
> only if cstate->nworkers > 0 and also we are not allowed to have
> negative values for max_worker_processes.
>
Modified it.
> 8. Do we want to pfree(cstate->pcdata) in case we failed to start any
> parallel workers, we would have allocated a good
> else
> {
> /*
> * Reset nworkers to -1 here. This is useful in cases where user
> * specifies parallel workers, but, no worker is picked up, so go
> * back to non parallel mode value of nworkers.
> */
> cstate->nworkers = -1;
> *processed = CopyFrom(cstate); /* copy from file to database */
> }
>
Added pfree.
> 9. Instead of calling CopyStringToSharedMemory() for each string
> variable, can't we just create a linked list of all the strings that
> need to be copied into shm and call CopyStringToSharedMemory() only
> once? We could avoid 5 function calls?
>
I feel keeping it this way makes the code more readable, and also this
is not in a performance intensive tight loop. I'm retaining the
change as is unless we feel this will make an impact.
> 10. Similar to above comment: can we fill all the required
> cstate->variables inside the function CopyNodeFromSharedMemory() and
> call it only once? In each worker we could save overhead of 5 function
> calls.
>
same as above.
> 11. Looks like CopyStringFromSharedMemory() and
> CopyNodeFromSharedMemory() do almost the same things except
> stringToNode() and pfree(destptr);. Can we have a generic function
> CopyFromSharedMemory() or something else and handle with flag "bool
> isnode" to differentiate the two use cases?
>
Removed CopyStringFromSharedMemory & used CopyNodeFromSharedMemory
appropriately. CopyNodeFromSharedMemory is renamed to
RestoreNodeFromSharedMemory keep the name consistent.
> 12. Can we move below check to the end in IsParallelCopyAllowed()?
> /* Check parallel safety of the trigger functions. */
> if (cstate->rel->trigdesc != NULL &&
> !CheckRelTrigFunParallelSafety(cstate->rel->trigdesc))
> return false;
>
Modified.
> 13. CacheLineInfo(): Instead of goto empty_data_line_update; how about
> having this directly inside the if block as it's being used only once?
>
Have removed the goto by using a macro.
> 14. GetWorkerLine(): How about avoiding goto statements and replacing
> the common code with a always static inline function or a macro?
>
Have removed the goto by using a macro.
> 15. UpdateSharedLineInfo(): Below line is misaligned.
> lineInfo->first_block = blk_pos;
> lineInfo->start_offset = offset;
>
Changed it.
> 16. ParallelCopyFrom(): Do we need CHECK_FOR_INTERRUPTS(); at the
> start of for (;;)?
>
Added it.
> 17. Remove extra lines after #define IsHeaderLine()
> (cstate->header_line && cstate->cur_lineno == 1) in copy.h
>
Modified it.
Attached v9 patches have the fixes for the above comments.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: