Re: GROUP BY ROLLUP queries on views trigger full table scans (index usage not optimized) - Mailing list pgsql-bugs

From Richard Guo
Subject Re: GROUP BY ROLLUP queries on views trigger full table scans (index usage not optimized)
Date
Msg-id CAMbWs4-99AsnnvfTuGj9C_QHXB9aNhjNyhfxXEcOWeAeq=Og5A@mail.gmail.com
Whole thread Raw
In response to Re: GROUP BY ROLLUP queries on views trigger full table scans (index usage not optimized)  ("Haowu Ge" <gehaowu@bitmoe.com>)
Responses Re: GROUP BY ROLLUP queries on views trigger full table scans (index usage not optimized)
List pgsql-bugs
On Thu, Dec 11, 2025 at 11:32 AM Haowu Ge <gehaowu@bitmoe.com> wrote:
> Based on the "Minimal Reproducible Example" principle,
> I previously created a simplified version of the example,
> which you successfully fixed. However, during actual testing,
> I found that the behavior does not meet expectations.
> Specifically, I observed inconsistent index behavior when using a view compared to querying the underlying table
directly.

This seems to be a separate issue.  The query against the view chooses
different indexes than the query against the underlying table.  This
isn't the index capability issue you initially reported, but rather a
cost estimation issue.

Upon closer look, the issue is that the planner fails to look through
PHVs when looking up statistical data.  Since examine_variable()
relies on strict structural matching, the presence of PHVs prevents
the planner from matching the expression to the table columns.  As a
result, the view-based query falls back to default selectivity
estimates, leading to a poor plan choice.

To fix, we can strip all PHVs when looking up statistical data.  This
is safe during estimation because PHVs are transparent for the purpose
of statistics lookup; they do not alter the value distribution of the
underlying expression.

Regarding implementation, I considered reusing the stripper for index
operands in 0001 but decided to use a dedicated function.  While the
traversal structure is similar, the logic is fundamentally different.
Stripping PHVs for index matching requires strict checks on
phnullingrels and phexpr to ensure execution correctness.  For
statistics lookup, however, we can be permissive and unconditionally
strip all PHVs.  It could be argued that we can combine them into a
single function using flags, but that seems to unnecessarily entangle
two unrelated logics.

There is one plan change in the regression tests with 0002, but that
is precisely because the row count estimates become more accurate with
this patch.  For instance:

on master:
->  Seq Scan on parttbl1 parttbl  (cost=0.00..41.88 rows=13 width=8)
                          (actual time=0.034..0.036 rows=1.00 loops=1)


on patched:
->  Seq Scan on parttbl1 parttbl  (cost=0.00..41.88 rows=1 width=8)
                          (actual time=0.062..0.064 rows=1.00 loops=1)

- Richard

Attachment

pgsql-bugs by date:

Previous
From: Kirill Reshke
Date:
Subject: Re: BUG #19352: SQL Error messages do not include schema name along with table or data object name.
Next
From: PG Bug reporting form
Date:
Subject: BUG #19353: Error XX000 if referencing expanded array in grouping set: variable not found in subplan target list