Re: [HACKERS] why not parallel seq scan for slow functions - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] why not parallel seq scan for slow functions |
Date | |
Msg-id | CAFjFpReHQ4xaPYaaW82kdUDWfsUtoKn-1Z_GBLNnOU1a8UR8gg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] why not parallel seq scan for slow functions (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: [HACKERS] why not parallel seq scan for slow functions
Re: [HACKERS] why not parallel seq scan for slow functions |
List | pgsql-hackers |
I happened to look at the patch for something else. But here are some comments. If any of those have been already discussed, please feel free to ignore. I have gone through the thread cursorily, so I might have missed something. In grouping_planner() we call query_planner() first which builds the join tree and creates paths, then calculate the target for scan/join rel which is applied on the topmost scan join rel. I am wondering whether we can reverse this order to calculate the target list of scan/join relation and pass it to standard_join_search() (or the hook) through query_planner(). standard_join_search() would then set this as reltarget of the topmost relation and every path created for it will have that target, applying projection if needed. This way we avoid calling generate_gather_path() at two places. Right now generate_gather_path() seems to be the only thing benefitting from this but what about FDWs and custom paths whose costs may change when targetlist changes. For custom paths I am considering GPU optimization paths. Also this might address Tom's worry, "But having to do that in a normal code path suggests that something's not right about the design ... " Here are some comments on the patch. + /* + * Except for the topmost scan/join rel, consider gathering + * partial paths. We'll do the same for the topmost scan/join This function only works on join relations. Mentioning scan rel is confusing. index 6e842f9..5206da7 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -481,14 +481,21 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, } + * + * Also, if this is the topmost scan/join rel (that is, the only baserel), + * we postpone this until the final scan/join targelist is available (see Mentioning join rel here is confusing since we deal with base relations here. + bms_membership(root->all_baserels) != BMS_SINGLETON) set_tablesample_rel_pathlist() is also using this method to decide whether there are any joins in the query. May be macro-ize this and use that macro at these two places? - * for the specified relation. (Otherwise, add_partial_path might delete a + * for the specified relation. (Otherwise, add_partial_path might delete a Unrelated change? + /* Add projection step if needed */ + if (target && simple_gather_path->pathtarget != target) If the target was copied someplace, this test will fail. Probably we want to check containts of the PathTarget structure? Right now copy_pathtarget() is not called from many places and all those places modify the copied target. So this isn't a problem. But we can't guarantee that in future. Similar comment for gather_merge path creation. + simple_gather_path = apply_projection_to_path(root, + rel, + simple_gather_path, + target); + Why don't we incorporate those changes in create_gather_path() by passing it the projection target instead of rel->reltarget? Similar comment for gather_merge path creation. + /* + * Except for the topmost scan/join rel, consider gathering + * partial paths. We'll do the same for the topmost scan/join rel Mentioning scan rel is confusing here. deallocate tenk1_count; +explain (costs off) select ten, costly_func(ten) from tenk1; verbose output will show that the parallel seq scan's targetlist has costly_func() in it. Isn't that what we want to test here? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: