Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CALDaNm2GK8+0uicSvP3hKYqREymNus5GoxtwhvfTx7Mx2O=qWA@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Greg Nancarrow <gregn4422@gmail.com>) |
Responses |
Re: Parallel INSERT (INTO ... SELECT ...)
|
List | pgsql-hackers |
>
> See attached patches.
>
Thanks for providing the patches.
I had reviewed v6-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch, please find my comments:
-> commandType is not used, we can remove it.
+ * Prepare for entering parallel mode by assigning a FullTransactionId, to be
+ * included in the transaction state that is serialized in the parallel DSM.
+ */
+void PrepareParallelModeForModify(CmdType commandType)
+{
+ Assert(!IsInParallelMode());
+
+ (void)GetCurrentTransactionId();
+}
-> As we support insertion of data from the workers, this comments "but as of now, only the leader backend writes into a completely new table. In the future, we can extend it to allow workers to write into the table" must be updated accordingly:
+ * modify any data using a CTE, or if this is a cursor operation, or if
+ * GUCs are set to values that don't permit parallelism, or if
+ * parallel-unsafe functions are present in the query tree.
*
- * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
+ * (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and CREATE
* MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
* backend writes into a completely new table. In the future, we can
* extend it to allow workers to write into the table. However, to allow
-> Also should we specify insert as "insert into select"
-> We could include a small writeup of the design may be in the commit message. It will be useful for review.
-> I felt the below two assignment statements can be in the else condition:
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+ /*
+ * Additional parallel-mode safety checks are required in order to
+ * allow an underlying parallel query to be used for a
+ * table-modification command that is supported in parallel-mode.
+ */
+ if (glob->parallelModeOK &&
+ IsModifySupportedInParallelMode(parse->commandType))
+ {
+ glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
+ glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+ }
something like:
/*
* Additional parallel-mode safety checks are required in order to
* allow an underlying parallel query to be used for a
* table-modification command that is supported in parallel-mode.
*/
if (glob->parallelModeOK &&
IsModifySupportedInParallelMode(parse->commandType))
glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
else
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
-> Comments need slight adjustment, maybe you could run pgindent for the modified code.
+ /*
+ * Supported table-modification commands may require additional steps
+ * prior to entering parallel mode, such as assigning a FullTransactionId.
+ */
-> In the below, max_parallel_hazard_test will return true for PROPARALLEL_RESTRICTED also, Is break intentional in that case? As in case of RI_TRIGGER_FK for PROPARALLEL_RESTRICTED we continue.
+ if (max_parallel_hazard_test(func_parallel(trigger->tgfoid), context))
+ break;
+
+ /*
+ * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
+ * the relation, and this would result in creation of new CommandIds
+ * on insert/update/delete and this isn't supported in a parallel
+ * worker (but is safe in the parallel leader).
+ */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_FK)
+ {
+ context->max_hazard = PROPARALLEL_RESTRICTED;
+ /*
+ * As we're looking for the max parallel hazard, we don't break
+ * here; examine any further triggers ...
+ */
+ }
-> Should we switch to non-parallel mode in this case, instead of throwing error?
+ val = SysCacheGetAttr(CONSTROID, tup,
+ Anum_pg_constraint_conbin, &isnull);
+ if (isnull)
+ elog(ERROR, "null conbin for constraint %u", con->oid);
+ conbin = TextDatumGetCString(val);
-> We could include a few tests for this in regression.
-> We might need some documentation update like in parallel-query.html/parallel-plans.html, etc
+ * modify any data using a CTE, or if this is a cursor operation, or if
+ * GUCs are set to values that don't permit parallelism, or if
+ * parallel-unsafe functions are present in the query tree.
*
- * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
+ * (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and CREATE
* MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
* backend writes into a completely new table. In the future, we can
* extend it to allow workers to write into the table. However, to allow
-> Also should we specify insert as "insert into select"
-> We could include a small writeup of the design may be in the commit message. It will be useful for review.
-> I felt the below two assignment statements can be in the else condition:
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+ /*
+ * Additional parallel-mode safety checks are required in order to
+ * allow an underlying parallel query to be used for a
+ * table-modification command that is supported in parallel-mode.
+ */
+ if (glob->parallelModeOK &&
+ IsModifySupportedInParallelMode(parse->commandType))
+ {
+ glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
+ glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+ }
something like:
/*
* Additional parallel-mode safety checks are required in order to
* allow an underlying parallel query to be used for a
* table-modification command that is supported in parallel-mode.
*/
if (glob->parallelModeOK &&
IsModifySupportedInParallelMode(parse->commandType))
glob->maxParallelHazard = MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
else
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
-> Comments need slight adjustment, maybe you could run pgindent for the modified code.
+ /*
+ * Supported table-modification commands may require additional steps
+ * prior to entering parallel mode, such as assigning a FullTransactionId.
+ */
-> In the below, max_parallel_hazard_test will return true for PROPARALLEL_RESTRICTED also, Is break intentional in that case? As in case of RI_TRIGGER_FK for PROPARALLEL_RESTRICTED we continue.
+ if (max_parallel_hazard_test(func_parallel(trigger->tgfoid), context))
+ break;
+
+ /*
+ * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
+ * the relation, and this would result in creation of new CommandIds
+ * on insert/update/delete and this isn't supported in a parallel
+ * worker (but is safe in the parallel leader).
+ */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_FK)
+ {
+ context->max_hazard = PROPARALLEL_RESTRICTED;
+ /*
+ * As we're looking for the max parallel hazard, we don't break
+ * here; examine any further triggers ...
+ */
+ }
-> Should we switch to non-parallel mode in this case, instead of throwing error?
+ val = SysCacheGetAttr(CONSTROID, tup,
+ Anum_pg_constraint_conbin, &isnull);
+ if (isnull)
+ elog(ERROR, "null conbin for constraint %u", con->oid);
+ conbin = TextDatumGetCString(val);
-> We could include a few tests for this in regression.
-> We might need some documentation update like in parallel-query.html/parallel-plans.html, etc
pgsql-hackers by date: