Re: Problems with plan estimates in postgres_fdw - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: Problems with plan estimates in postgres_fdw |
Date | |
Msg-id | 5C669DFD.30002@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Problems with plan estimates in postgres_fdw (Antonin Houska <ah@cybertec.at>) |
Responses |
Re: Problems with plan estimates in postgres_fdw
|
List | pgsql-hackers |
(2019/02/12 18:03), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> (2019/02/08 2:04), Antonin Houska wrote: >>> Some comments on coding: >>> >>> 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch >>> ----------------------------------------------------------------- >>> >>> * I'm not sure if UPPERREL_ORDERD should to deal with LIMIT at all. Note that >>> grouping_planner() does not call create_limit_path() until it's done with >>> create_ordered_paths(), and create_ordered_paths() is where the FDW is >>> requested to add its paths to UPPERREL_ORDERED relation. >> >> Maybe I'm missing something, but the 0001 patch doesn't deal with LIMIT at >> all. I added the parameter limit_tuples to PgFdwPathExtraData so that we can >> pass limit_tuples=-1, which means no LIMIT, to cost_sort(), though. > > Yes, the limit_tuples field made me confused. But if cost_sort() is the only > reason, why not simply pass -1 in the 0001 patch, and use the limit_tuples > argument only in 0002? I see several other calls of cost_sort() in the core > where -1 is hard-coded. Yeah, that would make it easy to review the patch; will do. >>> Both patches: >>> ------------ >>> >>> One thing that makes me confused when I try to understand details is that the >>> new functions add_foreign_ordered_paths() and add_foreign_final_paths() both >>> pass "input_rel" for the "foreignrel" argument of estimate_path_cost_size(): >>> >>> estimate_path_cost_size(root, input_rel, ...) >>> >>> whereas the existing add_foreign_grouping_paths() passes "grouped_rel": >>> >>> estimate_path_cost_size(root, grouped_rel, ...) >>> >>> Can you please make this consistent? If you do, you'll probably need to >>> reconsider your changes to estimate_path_cost_size(). >> >> This is intended and I think I should add more comments on that. Let me >> explain. These patches modified estimate_path_cost_size() so that we can >> estimate the costs of a foreign path for the UPPERREL_ORDERED or >> UPPERREL_FINAL rel, by adding the costs of the upper-relation processing (ie, >> ORDER BY and/or LIMIT/OFFSET, specified by PgFdwPathExtraData) to the costs of >> implementing the *underlying* relation for the upper relation (ie, scan, join >> or grouping rel, specified by the input_rel). So those functions call >> estimate_path_cost_size() with the input_rel, not the ordered_rel/final_rel, >> along with PgFdwPathExtraData. > > I think the same already happens for the UPPERREL_GROUP_AGG relation: > estimate_path_cost_size() > > ... > else if (IS_UPPER_REL(foreignrel)) > { > ... > ofpinfo = (PgFdwRelationInfo *) fpinfo->outerrel->fdw_private; > > Here "outerrel" actually means input relation from the grouping perspective. > The input relation (i.e. result of scan / join) estimates are retrieved from > "ofpinfo" and the costs of aggregation are added. Why should this be different > for other kinds of upper relations? IIUC, I think there is an oversight in that case: the cost estimation doesn't account for evaluating the final scan/join target updated by apply_scanjoin_target_to_paths(), but I think that is another issue, so I'll start a new thread. >>> And maybe related problem: why should FDW care about the effect of >>> apply_scanjoin_target_to_paths() like you claim to do here? >>> >>> /* >>> * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the >>> * final scan/join relation, the costs obtained from the cache >>> * wouldn't yet contain the eval cost for the final scan/join target, >>> * which would have been updated by apply_scanjoin_target_to_paths(); >>> * add the eval cost now. >>> */ >>> if (fpextra&& !IS_UPPER_REL(foreignrel)) >>> { >>> /* The costs should have been obtained from the cache. */ >>> Assert(fpinfo->rel_startup_cost>= 0&& >>> fpinfo->rel_total_cost>= 0); >>> >>> startup_cost += foreignrel->reltarget->cost.startup; >>> run_cost += foreignrel->reltarget->cost.per_tuple * rows; >>> } >>> >>> I think it should not, whether "foreignrel" means "input_rel" or "output_rel" >>> from the perspective of postgresGetForeignUpperPaths(). >> >> I added this before costing the sort operation below, because 1) the cost of >> evaluating the scan/join target might not be zero (consider the case where >> sort columns are not simple Vars, for example) and 2) the cost of sorting >> takes into account the underlying path's startup/total costs. Maybe I'm >> missing something, though. > > My understanding is that the "input_rel" argument of > postgresGetForeignUpperPaths() should already have been processed by > postgresGetForeignPaths() or postgresGetForeignJoinPaths() (or actually by > postgresGetForeignUpperPaths() called with different "stage" too). Therefore > it should already have the "fdw_private" field initialized. The estimates in > the corresponding PgFdwRelationInfo should already include the reltarget > costs, as set previously by estimate_path_cost_size(). Right, but what I'm saying here is: the costs of evaluating the final scan/join target updated by apply_scanjoin_target_to_paths() wouldn't be yet included in the costs we calculated using local statistics when called from postgresGetForeignPaths() or postgresGetForeignJoinPaths(). Let me explain using an example: SELECT a+b FROM foreign_table LIMIT 10; For this query, the reltarget for the foreign_table would be {a, b} when called from postgresGetForeignPaths() (see build_base_rel_tlists()). The costs of evaluating simple Vars are zeroes, so we wouldn't charge any costs for tlist evaluation when estimating the costs of a basic foreign table scan in that callback routine, but the final scan target, with which the reltarget will be replaced later by apply_scanjoin_target_to_paths(), would be {a+b}, so we need to adjust the costs of the basic foreign table scan, cached in the PgFdwRelationInfo for the input_rel (ie, the foreign_table), so that eval costs for the replaced tlist are included when called from postgresGetForeignUpperPaths() with the UPPERREL_FINAL stage, as the costs of evaluating the expression 'a+b' wouldn't be zeroes. Best regards, Etsuro Fujita
pgsql-hackers by date: