Re: Parallel copy - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | CALDaNm1cAONkFDN6K72DSiRpgqNGvwxQL7TjEiHZ58opnp9VoA@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel copy (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
RE: Parallel copy
Re: Parallel copy Re: Parallel copy Re: Parallel copy Re: Parallel copy |
List | pgsql-hackers |
Thanks for the comments, please find my thoughts below. On Wed, Oct 21, 2020 at 3:19 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Hi Vignesh, > > I took a look at the v8 patch set. Here are some comments: > > 1. PopulateCommonCstateInfo() -- can we use PopulateCommonCStateInfo() > or PopulateCopyStateInfo()? And also EstimateCstateSize() -- > EstimateCStateSize(), PopulateCstateCatalogInfo() -- > PopulateCStateCatalogInfo()? > Changed as suggested. > 2. Instead of mentioning numbers like 1024, 64K, 10240 in the > comments, can we represent them in terms of macros? > /* It can hold 1024 blocks of 64K data in DSM to be processed by the worker. */ > #define MAX_BLOCKS_COUNT 1024 > /* > * It can hold upto 10240 record information for worker to process. RINGSIZE > Changed as suggested. > 3. How about > " > Each worker at once will pick the WORKER_CHUNK_COUNT records from the > DSM data blocks and store them in it's local memory. > This is to make workers not contend much while getting record > information from the DSM. Read RINGSIZE comments before > changing this value. > " > instead of > /* > * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data > * block to process to avoid lock contention. Read RINGSIZE comments before > * changing this value. > */ > Rephrased it. > 4. How about one line gap before and after for comments: "Leader > should operate in the following order:" and "Worker should operate in > the following order:" > Changed it. > 5. Can we move RAW_BUF_BYTES macro definition to the beginning of the > copy.h where all the macro are defined? > Change was done as part of another commit & we are using as it is. I preferred it to be as it is. > 6. I don't think we need the change in toast_internals.c with the > temporary hack Assert(!(IsParallelWorker() && !currentCommandIdUsed)); > in GetCurrentCommandId() > Modified it. > 7. I think > /* Can't perform copy in parallel */ > if (parallel_workers <= 0) > return NULL; > can be > /* Can't perform copy in parallel */ > if (parallel_workers == 0) > return NULL; > as parallel_workers can never be < 0 since we enter BeginParallelCopy > only if cstate->nworkers > 0 and also we are not allowed to have > negative values for max_worker_processes. > Modified it. > 8. Do we want to pfree(cstate->pcdata) in case we failed to start any > parallel workers, we would have allocated a good > else > { > /* > * Reset nworkers to -1 here. This is useful in cases where user > * specifies parallel workers, but, no worker is picked up, so go > * back to non parallel mode value of nworkers. > */ > cstate->nworkers = -1; > *processed = CopyFrom(cstate); /* copy from file to database */ > } > Added pfree. > 9. Instead of calling CopyStringToSharedMemory() for each string > variable, can't we just create a linked list of all the strings that > need to be copied into shm and call CopyStringToSharedMemory() only > once? We could avoid 5 function calls? > I feel keeping it this way makes the code more readable, and also this is not in a performance intensive tight loop. I'm retaining the change as is unless we feel this will make an impact. > 10. Similar to above comment: can we fill all the required > cstate->variables inside the function CopyNodeFromSharedMemory() and > call it only once? In each worker we could save overhead of 5 function > calls. > same as above. > 11. Looks like CopyStringFromSharedMemory() and > CopyNodeFromSharedMemory() do almost the same things except > stringToNode() and pfree(destptr);. Can we have a generic function > CopyFromSharedMemory() or something else and handle with flag "bool > isnode" to differentiate the two use cases? > Removed CopyStringFromSharedMemory & used CopyNodeFromSharedMemory appropriately. CopyNodeFromSharedMemory is renamed to RestoreNodeFromSharedMemory keep the name consistent. > 12. Can we move below check to the end in IsParallelCopyAllowed()? > /* Check parallel safety of the trigger functions. */ > if (cstate->rel->trigdesc != NULL && > !CheckRelTrigFunParallelSafety(cstate->rel->trigdesc)) > return false; > Modified. > 13. CacheLineInfo(): Instead of goto empty_data_line_update; how about > having this directly inside the if block as it's being used only once? > Have removed the goto by using a macro. > 14. GetWorkerLine(): How about avoiding goto statements and replacing > the common code with a always static inline function or a macro? > Have removed the goto by using a macro. > 15. UpdateSharedLineInfo(): Below line is misaligned. > lineInfo->first_block = blk_pos; > lineInfo->start_offset = offset; > Changed it. > 16. ParallelCopyFrom(): Do we need CHECK_FOR_INTERRUPTS(); at the > start of for (;;)? > Added it. > 17. Remove extra lines after #define IsHeaderLine() > (cstate->header_line && cstate->cur_lineno == 1) in copy.h > Modified it. Attached v9 patches have the fixes for the above comments. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: