Thread: Surprising SeqScan of appendRel that can't contribute any rows to the result
Surprising SeqScan of appendRel that can't contribute any rows to the result
From
Dmytro Astapov
Date:
Hi!
OS: Debian, Rock Linux
Postgres versions: 13.6, 15.6, 17.0
Setup:
create table partA(id bigint not null, payload bigint);
insert into partA select s, s from generate_series(1,10000) s;
analyze partA;
create index on partA(id);
create index on partA(payload);
create table partB(id bigint not null);
insert into partB select s from generate_series(1,10000) s;
analyze partB;
create index on partB(id);
create view vw as
select id, payload from partA
union all
select id, NULL as payload from partB;
insert into partA select s, s from generate_series(1,10000) s;
analyze partA;
create index on partA(id);
create index on partA(payload);
create table partB(id bigint not null);
insert into partB select s from generate_series(1,10000) s;
analyze partB;
create index on partB(id);
create view vw as
select id, payload from partA
union all
select id, NULL as payload from partB;
As you can see, we have a view that UNION ALLs two tables with different number of columns. Missing column from partB is stubbed out with a constant NULL.
Now we want to join this view with a small table that has some numbers that we want to find in the `payload` column:
create table some_ids(id bigint not null);
insert into some_ids select s from generate_series(1,2) s;
insert into some_ids select s from generate_series(1,2) s;
analyze some_ids;
explain select * from some_ids i join vw on (vw.payload = i.id);
Surprisingly, this does SeqScan on partB in NestedLoops over some_ids with a filter `some_ids.id = NULL::bigint`:
Nested Loop (cost=0.29..359.16 rows=200 width=24) |
-> Seq Scan on some_ids i (cost=0.00..1.02 rows=2 width=8) |
-> Append (cost=0.29..178.56 rows=51 width=16) |
-> Index Scan using parta_payload_idx on parta (cost=0.29..8.30 rows=1 width=16) |
Index Cond: (payload = i.id) |
-> Seq Scan on partb (cost=0.00..170.00 rows=50 width=16) |
Filter: (i.id = NULL::bigint) |
At the same time `explain select * from vw where payload = 1` correctly skips over partB entirely (the node is eliminated from execution plan), and so does:
explain select * from vw where payload in (1,2);
However, any query that does not use explicit literal values still leads to SeqScan access on partB, such as:
explain select * from vw where payload in (select id from some_ids);
explain select * from vw where payload = ANY(ARRAY(select id from some_ids));
or various forms of joins
Do you know if this is expected/documented, or is this a bug?
Same setup in db-fiddle if you want to give it a quick spin: https://www.db-fiddle.com/f/hNLCR9wou9TYzcLG57q9kj/3 or https://dbfiddle.uk/5o5LQlEB
Best regards, Dmytro
Re: Surprising SeqScan of appendRel that can't contribute any rows to the result
From
David Rowley
Date:
On Wed, 18 Dec 2024 at 12:17, Dmytro Astapov <dastapov@gmail.com> wrote: > Surprisingly, this does SeqScan on partB in NestedLoops over some_ids with a filter `some_ids.id = NULL::bigint`: > > Nested Loop (cost=0.29..359.16 rows=200 width=24) > -> Seq Scan on some_ids i (cost=0.00..1.02 rows=2 width=8) > -> Append (cost=0.29..178.56 rows=51 width=16) > -> Index Scan using parta_payload_idx on parta (cost=0.29..8.30 rows=1 width=16) > Index Cond: (payload = i.id) > -> Seq Scan on partb (cost=0.00..170.00 rows=50 width=16) > Filter: (i.id = NULL::bigint) > > At the same time `explain select * from vw where payload = 1` correctly skips over partB entirely (the node is eliminatedfrom execution plan), and so does: > explain select * from vw where payload in (1,2); > > However, any query that does not use explicit literal values still leads to SeqScan access on partB, such as: > explain select * from vw where payload in (select id from some_ids); > explain select * from vw where payload = ANY(ARRAY(select id from some_ids)); > or various forms of joins > > Do you know if this is expected/documented, or is this a bug? TL;DR is it's not a bug and expected behaviour. We tend not to do much in terms of documentation about which optimisations the query planner does, so it's probably not documented anywhere aside from perhaps the source code. It might be possible for us to eliminate the scan to "partb" for the first of the plans shown above. However, the code that applies in your example case where the planner does manage to eliminate the scan does so using "base" quals, i.e. quals that are pushed down into the scan level. See apply_child_basequals(). For the Nested Loop example, the i.id = NULL::bigint isn't a base qual, so it does not work for that case. When we're building paramerised paths, as per what's used in your Nested Loop example above, we've already done the work to eliminate non-matching union children. We don't really have any concept of "this union child does not match for this specific parameterisation", so we'd need to invent something to do that (which perhaps is just removing or not adding the particular unneeded subpath from the Append pathlist.) For the other cases that depend on the results from subqueries, it's more tricky and in many cases not possible to eliminate the scans during query planner for those cases as the planner does not have information to know what will be returned by the subqueries. There might be something very limited we can do in terms of looking to see if the operator is strict or not so that we at least know that NULLs will never match, but that might be quite a corner case that it might not be worth the complexity to make that work. Someone might need to write it and see how complex it is to implement before we'd know if it was a worthwhile optimisation or not. David
Re: Surprising SeqScan of appendRel that can't contribute any rows to the result
From
Dmytro Astapov
Date:
Thank you for the very detailed answer, much appreciated!
For the benefit of people who might find this in the future via search: so far the best workaround seems to be something along the lines of:
alter table partB add column always_null bigint;
create index on partB(always_null);
And then I change vw so that instead of constant NULL I expose this column instead. Thenat instead of seqscan I get index scan on always_null which (relatively) quickly yields zero rows.
On Wed, 18 Dec 2024, 00:08 David Rowley, <dgrowleyml@gmail.com> wrote:
On Wed, 18 Dec 2024 at 12:17, Dmytro Astapov <dastapov@gmail.com> wrote:
> Surprisingly, this does SeqScan on partB in NestedLoops over some_ids with a filter `some_ids.id = NULL::bigint`:
>
> Nested Loop (cost=0.29..359.16 rows=200 width=24)
> -> Seq Scan on some_ids i (cost=0.00..1.02 rows=2 width=8)
> -> Append (cost=0.29..178.56 rows=51 width=16)
> -> Index Scan using parta_payload_idx on parta (cost=0.29..8.30 rows=1 width=16)
> Index Cond: (payload = i.id)
> -> Seq Scan on partb (cost=0.00..170.00 rows=50 width=16)
> Filter: (i.id = NULL::bigint)
>
> At the same time `explain select * from vw where payload = 1` correctly skips over partB entirely (the node is eliminated from execution plan), and so does:
> explain select * from vw where payload in (1,2);
>
> However, any query that does not use explicit literal values still leads to SeqScan access on partB, such as:
> explain select * from vw where payload in (select id from some_ids);
> explain select * from vw where payload = ANY(ARRAY(select id from some_ids));
> or various forms of joins
>
> Do you know if this is expected/documented, or is this a bug?
TL;DR is it's not a bug and expected behaviour.
We tend not to do much in terms of documentation about which
optimisations the query planner does, so it's probably not documented
anywhere aside from perhaps the source code. It might be possible for
us to eliminate the scan to "partb" for the first of the plans shown
above. However, the code that applies in your example case where the
planner does manage to eliminate the scan does so using "base" quals,
i.e. quals that are pushed down into the scan level. See
apply_child_basequals(). For the Nested Loop example, the i.id =
NULL::bigint isn't a base qual, so it does not work for that case.
When we're building paramerised paths, as per what's used in your
Nested Loop example above, we've already done the work to eliminate
non-matching union children. We don't really have any concept of "this
union child does not match for this specific parameterisation", so
we'd need to invent something to do that (which perhaps is just
removing or not adding the particular unneeded subpath from the Append
pathlist.)
For the other cases that depend on the results from subqueries, it's
more tricky and in many cases not possible to eliminate the scans
during query planner for those cases as the planner does not have
information to know what will be returned by the subqueries. There
might be something very limited we can do in terms of looking to see
if the operator is strict or not so that we at least know that NULLs
will never match, but that might be quite a corner case that it might
not be worth the complexity to make that work. Someone might need to
write it and see how complex it is to implement before we'd know if it
was a worthwhile optimisation or not.
David
Re: Surprising SeqScan of appendRel that can't contribute any rows to the result
From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes: > On Wed, 18 Dec 2024 at 12:17, Dmytro Astapov <dastapov@gmail.com> wrote: >> Surprisingly, this does SeqScan on partB in NestedLoops over some_ids with a filter `some_ids.id = NULL::bigint`: > TL;DR is it's not a bug and expected behaviour. The particular case shown here might be fixable by re-applying eval_const_expressions after we've derived the pushed-down qual. Most of the time that'd be a waste of cycles though, and I'm not sure how often it would help. regards, tom lane