Thread: Bad selectivity estimate when using a sub query to determine WHERE condition
Bad selectivity estimate when using a sub query to determine WHERE condition
From
Chris Borckholder
Date:
Hi,
I have a large table of immutable events that need to be aggregated regularly to derive statistics. To improve performance, that table is rolled up every 15 minutes, so that online checks can aggregate the rolled-up data and combine it with the latest events created after the last roll up.
To implement this, a query is executed that selects only events created after the time of the last rollup.
That time is determined dynamically from a log table.
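For illustration, a simplified sketch of the relevant tables (everything apart from the created and window_end columns and the two table names in the queries below is a placeholder, not the actual schema):

    -- Large append-only event table; id and amount are hypothetical columns.
    CREATE TABLE exchange.ledger (
        id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        amount  numeric NOT NULL,
        created timestamptz NOT NULL DEFAULT now()
    );
    CREATE INDEX ledger_created_idx ON exchange.ledger (created);

    -- Rollup log: one row per completed 15-minute aggregation window.
    CREATE TABLE exchange.ledger_zerosum_rollup (
        window_start timestamptz NOT NULL,
        window_end   timestamptz NOT NULL
    );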
When using a sub-select or CTE to get the latest roll up time, the query planner fails to recognize that most of the large table would be filtered out by the condition and tries a sequential scan instead of an index scan.
When using the literal value for the WHERE condition, the plan correctly uses an index scan, which is much faster.
I analyzed the involved tables and increased the size of the collected histogram, but the query plan did not improve. Is there a way to help the query planner recognize this in the dynamic case?
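For reference, the per-column statistics target was raised along these lines (the value 1000 is arbitrary):

    ALTER TABLE exchange.ledger ALTER COLUMN created SET STATISTICS 1000;
    ANALYZE exchange.ledger;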
Best Regards
Chris
==== Original query with a CTE to get the timestamp to filter on
EXPLAIN (ANALYZE, BUFFERS)
WITH current_rollup AS (
    SELECT COALESCE(MAX(window_end), '-infinity') AS cutoff
    FROM exchange.ledger_zerosum_rollup
)
SELECT *
FROM exchange.ledger
WHERE created > (SELECT cutoff FROM current_rollup);
==== Query with literal value
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM exchange.ledger
WHERE created > '2020-02-10T08:54:39.857789Z';
Re: Bad selectivity estimate when using a sub query to determine WHERE condition
From
Tom Lane
Date:
Chris Borckholder <chris.borckholder@bitpanda.com> writes:
> When using a sub-select or CTE to get the latest roll up time, the query
> planner fails to recognize that most of the large table would be filtered
> out by the condition and tries a sequential scan instead of an index scan.
> When using the literal value for the WHERE condition, the plan correctly
> uses an index scan, which is much faster.

Yeah, a scalar sub-select is pretty much a black box to the planner.

> EXPLAIN (ANALYZE, BUFFERS)
> WITH current_rollup AS (
>     SELECT COALESCE(MAX(window_end), '-infinity') AS cutoff
>     FROM exchange.ledger_zerosum_rollup
> )
> SELECT *
> FROM exchange.ledger
> WHERE created > (SELECT cutoff FROM current_rollup);

Well, it's not that hard to get rid of that scalar sub-select: since
you're already relying on current_rollup to produce exactly one row,
you could write a plain join instead, something like

    WITH current_rollup AS ...
    SELECT l.*
    FROM exchange.ledger l, current_rollup c
    WHERE l.created > c.cutoff;

Unfortunately I doubt that will improve matters much, since the planner
also knows relatively little about MAX() and nothing about COALESCE,
so it's not going to be able to estimate what comes out of the WITH.

I think you're going to have to cheat a bit. The form of cheating that
comes to mind is to wrap the sub-select in a function that's marked
STABLE:

    create function current_rollup_cutoff() returns timestamp -- or whatever
    stable language sql as $$
        SELECT COALESCE(MAX(window_end), '-infinity') AS cutoff
        FROM exchange.ledger_zerosum_rollup
    $$;

    SELECT *
    FROM exchange.ledger
    WHERE created > current_rollup_cutoff();

I have not actually tried this, but I think that since the function is
marked stable, the planner would test-run it to get an estimated value,
and then produce a plan similar to what you'd get with a literal constant.

Of course, then it's going to run the function once more when the query
is executed for-real, so this approach doubles the cost of getting the
MAX(). That shouldn't be too awful if you have an index on window_end,
though.

If you like living dangerously, you could cheat a LOT and mark the
function immutable, so that its value gets substituted at plan time.
But that will only work for interactive submission of the outer query
--- if the plan gets cached and re-used, you'll have a stale cutoff
value. Personally I wouldn't risk that.

            regards, tom lane
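A self-contained version of the function sketched above might look like
this (the timestamptz return type is an assumption based on the UTC
literal in the original query):

    CREATE FUNCTION current_rollup_cutoff() RETURNS timestamptz
    STABLE LANGUAGE sql AS $$
        SELECT COALESCE(MAX(window_end), '-infinity'::timestamptz)
        FROM exchange.ledger_zerosum_rollup;
    $$;

    -- Since the function is STABLE, the planner can pre-evaluate it to get
    -- a representative value and estimate selectivity as with a literal:
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM exchange.ledger
    WHERE created > current_rollup_cutoff();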
Re: Bad selectivity estimate when using a sub query to determine WHERE condition
From
Justin Pryzby
Date:
On Mon, Feb 10, 2020 at 11:34:01AM +0100, Chris Borckholder wrote:
> I have a large table of immutable events that need to be aggregated
> regularly to derive statistics. To improve performance, that table is
> rolled up every 15 minutes, so that online checks can aggregate the
> rolled-up data and combine it with the latest events created after the
> last roll up.
>
> To implement this, a query is executed that selects only events created
> after the time of the last rollup.
> That time is determined dynamically from a log table.

Perhaps that could be done as an indexed column in the large table,
rather than querying a 2nd log table.
Possibly with a partial index on that column: WHERE unprocessed='t'.

> When using a sub-select or CTE to get the latest roll up time, the query
> planner fails to recognize that most of the large table would be filtered
> out by the condition and tries a sequential scan instead of an index scan.
> When using the literal value for the WHERE condition, the plan correctly
> uses an index scan, which is much faster.
>
> I analyzed the involved tables and increased the size of the collected
> histogram, but the query plan did not improve. Is there a way to help
> the query planner recognize this in the dynamic case?

Also, if you used partitioning with postgres since v11, then I think most
partitions would be excluded:

https://www.postgresql.org/docs/12/release-12.html
|Allow partition elimination during query execution (David Rowley, Beena Emerson)
|Previously, partition elimination only happened at planning time, meaning many joins and prepared queries could not use partition elimination.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=499be013de65242235ebdde06adb08db887f0ea5
https://www.postgresql.org/about/featurematrix/detail/332/

Justin
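A sketch of that partial-index idea, assuming a hypothetical boolean
column named unprocessed that the rollup job would clear (it is not part
of the schema discussed in this thread):

    -- Hypothetical flag marking rows that have not yet been rolled up.
    ALTER TABLE exchange.ledger
        ADD COLUMN unprocessed boolean NOT NULL DEFAULT true;

    -- Small partial index covering only the pending rows.
    CREATE INDEX ledger_unprocessed_idx ON exchange.ledger (created)
        WHERE unprocessed = 't';

    -- The rollup job would then clear the flag as it processes each window:
    -- UPDATE exchange.ledger SET unprocessed = 'f' WHERE created <= ...;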
Re: Bad selectivity estimate when using a sub query to determine WHERE condition
From
Chris Borckholder
Date:
Using a column to mark rolled-up rows might have been a better choice, but unfortunately there are regulatory requirements
that require that table to be immutable. I'm also not sure about the implications w.r.t. autovacuum, which is already a consideration for us due to the sheer size of the table.
I'm planning to partition the table as soon as we finish upgrading to v11.
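For reference, a minimal sketch of what range partitioning by created
could look like (table and partition names are illustrative only):

    -- Partitioned event table (declarative partitioning needs v10+;
    -- executor-time partition pruning needs v11+).
    CREATE TABLE exchange.ledger_partitioned (LIKE exchange.ledger)
        PARTITION BY RANGE (created);

    -- One partition per month, for example:
    CREATE TABLE exchange.ledger_y2020m02
        PARTITION OF exchange.ledger_partitioned
        FOR VALUES FROM ('2020-02-01') TO ('2020-03-01');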
Thanks for your insight!
Best Regards
Chris
Re: Bad selectivity estimate when using a sub query to determine WHERE condition
From
Chris Borckholder
Date:
On Mon, Feb 10, 2020 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Well, it's not that hard to get rid of that scalar sub-select: since
> you're already relying on current_rollup to produce exactly one row,
> you could write a plain join instead, something like
>

Using a join instead of the sub-select already helped.

    EXPLAIN (ANALYZE, BUFFERS)
    WITH current_rollup AS (
        SELECT COALESCE(MAX(window_end), '-infinity') AS cutoff
        FROM exchange.ledger_zerosum_rollup
    )
    SELECT *
    FROM exchange.ledger, current_rollup
    WHERE created > current_rollup.cutoff;

https://explain.depesz.com/s/Zurb

I'm a bit confused, because the row estimate on the index scan for the
ledger table still seems to be way off, but nonetheless the planner now
chooses the index scan. Maybe it has more insight into the result of the
CTE this way? Or does it pick the index scan because it fits well with
the nested loop?

> The form of cheating that comes to mind is to wrap the sub-select
> in a function that's marked STABLE:
>
> create function current_rollup_cutoff() returns timestamp -- or whatever
> stable language sql as $$
>     SELECT COALESCE(MAX(window_end), '-infinity') AS cutoff
>     FROM exchange.ledger_zerosum_rollup
> $$;
>
> SELECT *
> FROM exchange.ledger
> WHERE created > current_rollup_cutoff();
>
> I have not actually tried this, but I think that since the function is
> marked stable, the planner would test-run it to get an estimated value,
> and then produce a plan similar to what you'd get with a literal constant.

The version with a function is even better: the query planner now uses
good estimates and produces a trivial execution plan. I'll go with that
one, as it seems to be the most future-proof approach.

https://explain.depesz.com/s/34m8

Thanks for your insight!

Best Regards
Chris