Thread: Tuple count used while costing MergeAppend and that for an append rel
Hi, In create_merge_append_path() we have following code 1331 1332 /* Now we can compute total costs of the MergeAppend */ 1333 cost_merge_append(&pathnode->path, root, 1334 pathkeys, list_length(subpaths), 1335 input_startup_cost, input_total_cost, 1336 rel->tuples); 1337 The last arguemnt to cost_merge_append() is described as 'tuples' is the number of tuples in all the streams For an append relation, set_append_rel_size() sets rel->tuples to the sum of rows output by each child i.e. sum of rel->rows from each child. 1091 rel->rows = parent_rows; 1092 rel->reltarget->width = rint(parent_size / parent_rows); 1093 for (i = 0; i < nattrs; i++) 1094 rel->attr_widths[i] = rint(parent_attrsizes[i] / parent_rows); 1095 1096 /* 1097 * Set "raw tuples" count equal to "rows" for the appendrel; needed 1098 * because some places assume rel->tuples is valid for any baserel. 1099 */ 1100 rel->tuples = parent_rows; AFAIU, There's difference in the tuples and rows members of RelOptInfo. While tuples gives the estimate of number of tuples per pg_class, rows gives the number of rows output by that relation. From that perspective rel->tuples should be set of the sum of child_rel->tuples from each of the children. If we do that the last argument to cost_merge_append() should change from rel->tuples to rel->rows. Does that make sense? Attached patch makes those changes. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: > AFAIU, There's difference in the tuples and rows members of > RelOptInfo. While tuples gives the estimate of number of tuples per > pg_class, rows gives the number of rows output by that relation. From > that perspective rel->tuples should be set of the sum of > child_rel->tuples from each of the children. If we do that the last > argument to cost_merge_append() should change from rel->tuples to > rel->rows. > Does that make sense? Attached patch makes those changes. AFAICS, what you propose to add in set_append_rel_size is pure overhead. There's no conceivable use to computing sum-of-raw-tuple-counts for an appendrel ... or at least, if there is, you didn't say what you expect it would be good for. Normally the difference between rel->tuples and rel->rows corresponds to the selectivity of the rel's restriction clauses. Since an appendrel has no restrictions of its own (they've all been pushed down to the child rels) it doesn't seem unreasonable for it to have tuples equal to rows. While we could also define it as you suggest, I don't see the point of expending extra cycles to do so. I concur that using path->rows not rel->tuples in create_merge_append_path would be an improvement. I think it makes no difference now, but if we were to support parameterized mergeappend paths, rel->rows and rel->tuples would both be wrong for such a path. (I believe the possibility of a parameterized path is the reason why create_append_path computes path->rows the hard way instead of just copying it from rel->rows. The logic in create_merge_append_path was probably just copied from create_append_path; it's overkill today but might not be so forever.) regards, tom lane
> > AFAICS, what you propose to add in set_append_rel_size is pure overhead. > There's no conceivable use to computing sum-of-raw-tuple-counts for an > appendrel ... or at least, if there is, you didn't say what you expect > it would be good for. Normally the difference between rel->tuples and > rel->rows corresponds to the selectivity of the rel's restriction clauses. > Since an appendrel has no restrictions of its own (they've all been pushed > down to the child rels) it doesn't seem unreasonable for it to have tuples > equal to rows. While we could also define it as you suggest, I don't see > the point of expending extra cycles to do so. > I suggested that change, just to keep the computation consistent with the comment in RelOptInfo. I agree that it's a pure overhead for now. If we were to maintain it per the comments, we may be able to find the average selectivity for an appendrel, but again, I do not see any real purpose in doing so. > I concur that using path->rows not rel->tuples in create_merge_append_path > would be an improvement. I think it makes no difference now, but if we > were to support parameterized mergeappend paths, rel->rows and rel->tuples > would both be wrong for such a path. (I believe the possibility of a > parameterized path is the reason why create_append_path computes > path->rows the hard way instead of just copying it from rel->rows. > The logic in create_merge_append_path was probably just copied from > create_append_path; it's overkill today but might not be so forever.) Yes, this code did puzzle me. add_path() does not allow paths to have both parameterization and pathkeys. So, a merge append on parameterized paths does look odd. But then, I thought, that producing a parameterized merge append would get allow merging ordered rows for a given value of parameter. Although we don't do that now, but may be it's useful in the future. Thanks a lot for your explanation. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company