Thread: Improve EXPLAIN output for multicolumn B-Tree Index
Hi,
Regarding the multicolumn B-Tree index, I'm considering
whether we can enhance the EXPLAIN output. There have been
requests for this from our customers.
As the documentation says, we need to use such indexes carefully.
> The exact rule is that equality constraints on leading columns,
> plus any inequality constraints on the first column that does
> not have an equality constraint, will be used to limit the portion
> of the index that is scanned.
https://www.postgresql.org/docs/17/indexes-multicolumn.html
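To make the rule concrete, here is the example the linked documentation page itself walks through (the table and index names are illustrative):

```sql
-- Multicolumn B-Tree index on (a, b, c).
CREATE TABLE t (a int, b int, c int);
CREATE INDEX t_abc_idx ON t (a, b, c);

-- Per the rule above, a = 5 (equality on the leading column) plus
-- b >= 42 (inequality on the first column without an equality) bound
-- the portion of the index that is scanned.  c < 77 cannot narrow the
-- scan; it is only checked against each index entry in that portion.
SELECT * FROM t WHERE a = 5 AND b >= 42 AND c < 77;
```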
However, it's not easy to confirm whether multicolumn indexes are
being used efficiently, because we need to compare the index
definitions and query conditions individually.
For instance, just by looking at the following EXPLAIN result, we
can't determine at a glance whether the index is being used
efficiently. Indeed, the current index definition is not suitable
for the query, so the cost is significantly high.
=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_idx on public.test (cost=0.42..12754.76 rows=1 width=18) (actual time=0.033..54.115 rows=1 loops=1)
   Output: id1, id2, id3, value
   Index Cond: ((test.id1 = 1) AND (test.id3 = 101)) -- Is it efficient or not?
 Planning Time: 0.145 ms
 Execution Time: 54.150 ms
(6 rows)
So, I'd like to improve the output to be more user-friendly.
# Idea
I'm considering adding new information, "Index Bound Cond", which specifies
what quals will be used for the boundary condition of the B-Tree index.
(Since this is just my current idea, I'm open to changing the output.)
Here is an example output.
-- prepare for the test
CREATE TABLE test (id1 int, id2 int, id3 int, value varchar(32));
CREATE INDEX test_idx ON test(id1, id2, id3); -- multicolumn B-Tree index
INSERT INTO test (SELECT i % 2, i, i, 'hello' FROM generate_series(1,1000000) s(i));
ANALYZE;
-- explain
=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = 101;
                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using test_idx on public.test (cost=0.42..8.45 rows=1 width=18) (actual time=0.046..0.047 rows=1 loops=1)
   Output: id1, id2, id3, value
   Index Cond: ((test.id1 = 1) AND (test.id2 = 101))
   Index Bound Cond: ((test.id1 = 1) AND (test.id2 = 101)) -- The B-Tree index is used efficiently.
 Planning Time: 0.124 ms
 Execution Time: 0.076 ms
(6 rows)
=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_idx on public.test (cost=0.42..12754.76 rows=1 width=18) (actual time=0.033..54.115 rows=1 loops=1)
   Output: id1, id2, id3, value
   Index Cond: ((test.id1 = 1) AND (test.id3 = 101))
   Index Bound Cond: (test.id1 = 1) -- The B-Tree index is *not* used efficiently,
                                    -- compared to the previous execution's conditions,
                                    -- because it differs from "Index Cond".
 Planning Time: 0.145 ms
 Execution Time: 54.150 ms
(6 rows)
# PoC patch
The PoC patch makes the following changes:
* Adds a new variable related to bound conditions
to IndexPath, IndexScan, IndexOnlyScan, and BitmapIndexScan
* Adds quals for bound conditions to IndexPath when estimating cost, since
the B-Tree index considers the boundary condition in btcostestimate()
* Adds quals for bound conditions to the output of EXPLAIN
Thank you for reading my suggestion. Please feel free to comment.
* Is this feature useful? Is there a possibility it will be accepted?
* Are there any other ideas for determining if multicolumn indexes are
being used efficiently? Although I considered calculating the efficiency using
pg_statio_all_indexes.idx_blks_read and pg_stat_all_indexes.idx_tup_read,
I believe improving the EXPLAIN output is better because it can be output
per query and it's more user-friendly.
* Is "Index Bound Cond" the proper term? I also considered changing
"Index Cond" to only show quals for the boundary condition and adding
a new term "Index Filter".
* Would it be better to add new interfaces to Index AM? Is there any case
to output the EXPLAIN for each index context? At least, I think it's worth
considering whether it's good for amcostestimate() to modify the
IndexPath directly as the PoC patch does.
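For reference, the statistics-view approach I considered could look roughly like the query below. The view and column names are real, but the ratio is only a crude illustration, and since these counters are cumulative across all queries they can't pinpoint a single inefficient statement:

```sql
-- Crude per-index "tuples read per index block read" ratio from the
-- cumulative statistics views.  A low ratio may hint that scans read
-- many index pages relative to the tuples they return.
SELECT s.schemaname,
       s.indexrelname,
       s.idx_tup_read,
       io.idx_blks_read,
       round(s.idx_tup_read::numeric / NULLIF(io.idx_blks_read, 0), 2)
           AS tup_per_blk
FROM pg_stat_all_indexes s
JOIN pg_statio_all_indexes io USING (indexrelid)
ORDER BY io.idx_blks_read DESC;
```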
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
On Fri, 21 Jun 2024 07:12:25 +0000 <Masahiro.Ikeda@nttdata.com> wrote:

> * Is this feature useful? Is there a possibility it will be accepted?

I think adding such information to EXPLAIN outputs is useful because it
will help users confirm the effect of a multicolumn index on a certain
query and decide whether to leave, drop, or recreate the index, and so on.

> * Are there any other ideas for determining if multicolumn indexes are
>   being used efficiently? Although I considered calculating the efficiency using
>   pg_statio_all_indexes.idx_blks_read and pg_stat_all_indexes.idx_tup_read,
>   I believe improving the EXPLAIN output is better because it can be output
>   per query and it's more user-friendly.

Improving EXPLAIN seems to me a natural way to show information on query
optimization such as index scans.

> * Is "Index Bound Cond" the proper term? I also considered changing
>   "Index Cond" to only show quals for the boundary condition and adding
>   a new term "Index Filter".

"Index Bound Cond" seems unintuitive to me because I could not find a
description explaining what it means in the documentation. I like "Index
Filter", which implies that the index has to be scanned.

> * Would it be better to add new interfaces to Index AM? Is there any case
>   to output the EXPLAIN for each index context? At least, I think it's worth
>   considering whether it's good for amcostestimate() to modify the
>   IndexPath directly as the PoC patch does.

I am not sure it is the best way to modify IndexPath in amcostestimate(),
but I don't have better ideas for now.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>
> I am unable to decide whether reporting the bound quals is just enough to
> decide the efficiency of the index without knowing the difference between
> index tuple selectivity and heap tuple selectivity. The difference seems
> to be a better indicator of index efficiency, whereas the bound quals will
> help debug the inefficiency, if any.
>
> Also, do we want to report bound quals even if they are the same as index
> conditions, or just when they are different?

Thank you for your comment. After receiving it, I thought it would be better
to also report information that would make the difference in selectivity
understandable. One idea I had is to output the number of index tuples
inefficiently extracted, like "Rows Removed by Filter". Users can check the
selectivity and efficiency by looking at that number.

Also, I thought it would be better to change the way bound quals are reported
to align with "Filter". I think it would be better to modify it so that
nothing extra is output when the bound quals are the same as the index
conditions.

In my local PoC patch, I have modified the output as follows. What do you think?

=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = 101;
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_idx on ikedamsh.test (cost=0.42..8.45 rows=1 width=18) (actual time=0.082..0.086 rows=1 loops=1)
   Output: id1, id2, id3, value
   Index Cond: ((test.id1 = 1) AND (test.id2 = 101)) -- If it's efficient, the output won't change.
 Planning Time: 5.088 ms
 Execution Time: 0.162 ms
(5 rows)

=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_idx on ikedamsh.test (cost=0.42..12630.10 rows=1 width=18) (actual time=0.175..279.819 rows=1 loops=1)
   Output: id1, id2, id3, value
   Index Cond: (test.id1 = 1)            -- Changed: show only the bound quals.
   Index Filter: (test.id3 = 101)        -- New: quals that are not used as bound quals.
   Rows Removed by Index Filter: 499999  -- New: output when the ANALYZE option is specified.
 Planning Time: 0.354 ms
 Execution Time: 279.908 ms
(7 rows)

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
> > * Is this feature useful? Is there a possibility it will be accepted?
>
> I think adding such information to EXPLAIN outputs is useful because it
> will help users confirm the effect of a multicolumn index on a certain
> query and decide whether to leave, drop, or recreate the index, and so on.

Thank you for your comments and for empathizing with the utility of the approach.

> > * Are there any other ideas for determining if multicolumn indexes are
> >   being used efficiently? Although I considered calculating the efficiency using
> >   pg_statio_all_indexes.idx_blks_read and pg_stat_all_indexes.idx_tup_read,
> >   I believe improving the EXPLAIN output is better because it can be output
> >   per query and it's more user-friendly.
>
> It seems to me improving EXPLAIN is a natural way to show information on
> query optimization like index scans.

OK, I'll proceed that way.

> > * Is "Index Bound Cond" the proper term? I also considered changing
> >   "Index Cond" to only show quals for the boundary condition and adding
> >   a new term "Index Filter".
>
> "Index Bound Cond" seems not intuitive to me because I could not find a
> description explaining what it means in the documentation. I like "Index
> Filter", which implies the index has to be scanned.

OK, I think you are right. Even now there are things like "Filter" and
"Rows Removed by Filter", so it seems natural to align with them. I described
a new output example in the previous email; what do you think of that?

> > * Would it be better to add new interfaces to Index AM? Is there any case
> >   to output the EXPLAIN for each index context? At least, I think it's worth
> >   considering whether it's good for amcostestimate() to modify the
> >   IndexPath directly as the PoC patch does.
>
> I am not sure it is the best way to modify IndexPath in amcostestimate(),
> but I don't have better ideas for now.

OK, I'll consider what the best way to change it is.

In addition, if we add "Rows Removed by Index Filter", we might need to
consider a method to receive the number of filtered tuples at execution time
from the Index AM.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
On Mon, 24 Jun 2024 at 04:38, <Masahiro.Ikeda@nttdata.com> wrote:
>
> In my local PoC patch, I have modified the output as follows, what do you think?
>
> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = 101;
>                                                         QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------
>  Index Scan using test_idx on ikedamsh.test (cost=0.42..8.45 rows=1 width=18) (actual time=0.082..0.086 rows=1 loops=1)
>    Output: id1, id2, id3, value
>    Index Cond: ((test.id1 = 1) AND (test.id2 = 101)) -- If it's efficient, the output won't change.
>  Planning Time: 5.088 ms
>  Execution Time: 0.162 ms
> (5 rows)
>
> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
>                                                             QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using test_idx on ikedamsh.test (cost=0.42..12630.10 rows=1 width=18) (actual time=0.175..279.819 rows=1 loops=1)
>    Output: id1, id2, id3, value
>    Index Cond: (test.id1 = 1) -- Change the output. Show only the bound quals.
>    Index Filter: (test.id3 = 101) -- New. Output quals which are not used as the bound quals

I think this is too easy to confuse with the pre-existing 'Filter' condition,
which you'll find on indexes with INCLUDE-d columns or filters on non-index
columns.

Furthermore, I think this is probably not helpful (maybe even harmful) for
index types like GIN and BRIN, where index searchkey order is mostly
irrelevant to the index shape and performance.

Finally, does this change the index AM API? Does this add another scankey
argument to ->amrescan?

>    Rows Removed by Index Filter: 499999 -- New. Output when ANALYZE option is specified

Separate from the changes to the Index Cond/Index Filter output, I think this
can be useful output, though I'd probably let the AM specify what kind of
filter data to display. E.g. BRIN may well want to display how many ranges
matched the predicate vs. how many ranges were unsummarized and thus returned;
two conditions which aren't as easy to differentiate but can be important when
debugging query performance.

>  Planning Time: 0.354 ms
>  Execution Time: 279.908 ms
> (7 rows)

Was this a test against the same dataset as the one you'd posted your
measurements of your first patchset with? The execution time seems to have
slowed down quite significantly, so if the test set is the same then this
doesn't bode well for your patchset.

Kind regards,

Matthias van de Meent
+1 for the idea.

On Mon, 24 Jun 2024 at 11:11, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> I think this is too easy to confuse with the pre-existing 'Filter'
> condition, which you'll find on indexes with INCLUDE-d columns or
> filters on non-index columns.

Why not combine them? And call both Filter? In a sense this filtering acts
very similar to INCLUDE-based filtering (for btrees at least). Although I
might be wrong about that, because when I try to confirm the same perf using
the following script I do get quite different timings (maybe you have an idea
what's going on here). But even if it does mean something slightly different
perf-wise, I think using Filter for both is unlikely to confuse anyone.
Since, while allowed, it seems extremely unlikely in practice that someone
will use the same column as part of the indexed columns and as part of the
INCLUDE-d columns (why would you store the same info twice?).

CREATE TABLE test (id1 int, id2 int, id3 int, value varchar(32));
INSERT INTO test (SELECT i % 10, i % 1000, i, 'hello' FROM generate_series(1,1000000) s(i));
vacuum freeze test;

CREATE INDEX test_idx_include ON test(id1, id2) INCLUDE (id3);
ANALYZE test;
EXPLAIN (VERBOSE, ANALYZE, BUFFERS) SELECT id1, id3 FROM test WHERE id1 = 1 AND id3 = 101;

CREATE INDEX test_idx ON test(id1, id2, id3);
ANALYZE test;
EXPLAIN (VERBOSE, ANALYZE, BUFFERS) SELECT id1, id3 FROM test WHERE id1 = 1 AND id3 = 101;

                                 QUERY PLAN
───────────────────────────────────────
 Index Only Scan using test_idx_include on public.test (cost=0.42..3557.09 rows=1 width=8) (actual time=0.708..6.639 rows=1 loops=1)
   Output: id1, id3
   Index Cond: (test.id1 = 1)
   Filter: (test.id3 = 101)
   Rows Removed by Filter: 99999
   Heap Fetches: 0
   Buffers: shared hit=1 read=386
 Query Identifier: 471139784017641093
 Planning:
   Buffers: shared hit=8 read=1
 Planning Time: 0.091 ms
 Execution Time: 6.656 ms
(12 rows)

Time: 7.139 ms

                                 QUERY PLAN
─────────────────────────────────────
 Index Only Scan using test_idx on public.test (cost=0.42..2591.77 rows=1 width=8) (actual time=0.238..2.110 rows=1 loops=1)
   Output: id1, id3
   Index Cond: ((test.id1 = 1) AND (test.id3 = 101))
   Heap Fetches: 0
   Buffers: shared hit=1 read=386
 Query Identifier: 471139784017641093
 Planning:
   Buffers: shared hit=10 read=1
 Planning Time: 0.129 ms
 Execution Time: 2.128 ms
(10 rows)

Time: 2.645 ms
On Mon, 24 Jun 2024 at 11:58, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> +1 for the idea.
>
> On Mon, 24 Jun 2024 at 11:11, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > I think this is too easy to confuse with the pre-existing 'Filter'
> > condition, which you'll find on indexes with INCLUDE-d columns or
> > filters on non-index columns.
>
> Why not combine them? And both call them Filter? In a sense this
> filtering acts very similar to INCLUDE based filtering (for btrees at
> least).

It does not really behave similarly: index scan keys (such as the id3=101
scankey) don't require visibility checks in the btree code, while the Filter
condition _does_ require a visibility check, and delegates the check to the
table AM if the scan isn't Index-Only, or if the VM didn't show all-visible
during the check.

Furthermore, the index could use the scankey to improve the number of keys
to scan using "skip scans": by realising during a forward scan that if
you've reached tuple (1, 2, 3) and are looking for (1, _, 1), you can skip
forward to (1, 3, _) rather than having to go through tuples (1, 2, 4),
(1, 2, 5), ... (1, 2, n). This is not possible for INCLUDE-d columns,
because their datatypes and structure are opaque to the index AM; the AM
cannot assume anything about or do anything with those values.

> Although I might be wrong about that, because when I try to
> confirm the same perf using the following script I do get quite
> different timings (maybe you have an idea what's going on here). But
> even if it does mean something slightly different perf wise, I think
> using Filter for both is unlikely to confuse anyone.

I don't want A to be the plan while showing B' to the user, as the
performance picture for the two may be completely different. And, as I
mentioned upthread, the differences between AMs in the (lack of) meaning of
index column order also make it quite wrong to generally separate prefix
equalities from the rest of the keys.

> Since, while
> allowed, it seems extremely unlikely in practice that someone will use
> the same column as part of the indexed columns and as part of the
> INCLUDE-d columns (why would you store the same info twice).

Yeah, people don't generally include the same index column more than once
in the same index.

> CREATE INDEX test_idx_include ON test(id1, id2) INCLUDE (id3);
> CREATE INDEX test_idx ON test(id1, id2, id3);
>
> QUERY PLAN
> ───────────────────────────────────────
> Index Only Scan using test_idx_include on public.test
[...]
> Time: 7.139 ms
> QUERY PLAN
> ─────────────────────────────────────
> Index Only Scan using test_idx on public.test (cost=0.42..2591.77
[...]
> Time: 2.645 ms

As you can see, there's a huge difference in performance. Putting both
non-bound and "normal" filter clauses in the same Filter clause will make it
more difficult to explain performance issues based on only the explain
output.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
On Mon, 24 Jun 2024 at 13:02, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> It does not really behave similarly: index scan keys (such as the
> id3=101 scankey) don't require visibility checks in the btree code,
> while the Filter condition _does_ require a visibility check, and
> delegates the check to the table AM if the scan isn't Index-Only, or
> if the VM didn't show all-visible during the check.

Any chance you could point me in the right direction for the
code/docs/comment about this? I'd like to learn a bit more about why that is
the case, because I didn't realize visibility checks worked differently for
index scan keys and Filter keys.

> Furthermore, the index could use the scankey to improve the number of
> keys to scan using "skip scans"; by realising during a forward scan
> that if you've reached tuple (1, 2, 3) and looking for (1, _, 1) you
> can skip forward to (1, 3, _), rather than having to go through tuples
> (1, 2, 4), (1, 2, 5), ... (1, 2, n). This is not possible for
> INCLUDE-d columns, because their datatypes and structure are opaque to
> the index AM; the AM cannot assume anything about or do anything with
> those values.

Does Postgres actually support this currently? I thought skip scans were not
available (yet).

> I don't want A to be the plan, while showing B' to the user, as the
> performance picture for the two may be completely different. And, as I
> mentioned upthread, the differences between AMs in the (lack of)
> meaning of index column order also make it quite wrong to generally
> separate prefix equalities from the rest of the keys.

Yeah, that makes sense. These specific explain lines probably only/mostly
make sense for btree. So yeah, we'd want the index AM to be able to add some
stuff to the explain plan.

> As you can see, there's a huge difference in performance. Putting both
> non-bound and "normal" filter clauses in the same Filter clause will
> make it more difficult to explain performance issues based on only the
> explain output.

Fair enough, that's of course the main point of this patch in the first
place: being able to better interpret the explain plan when you don't have
access to the schema. Still, I think Filter is the correct keyword for both,
so how about we make it less confusing by making the current "Filter" more
specific, calling it something like "Non-key Filter" or "INCLUDE Filter",
and then call the other something like "Index Filter" or "Secondary Bound
Filter"?
On Mon, 24 Jun 2024 at 14:42, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> On Mon, 24 Jun 2024 at 13:02, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > It does not really behave similarly: index scan keys (such as the
> > id3=101 scankey) don't require visibility checks in the btree code,
> > while the Filter condition _does_ require a visibility check, and
> > delegates the check to the table AM if the scan isn't Index-Only, or
> > if the VM didn't show all-visible during the check.
>
> Any chance you could point me in the right direction for the
> code/docs/comment about this? I'd like to learn a bit more about why
> that is the case, because I didn't realize visibility checks worked
> differently for index scan keys and Filter keys.

This can be derived by combining how Filter works (it only filters the
returned live tuples) and how Index-Only scans work (return the index tuple,
unless !ALL_VISIBLE, in which case the heap tuple is projected). There have
been several threads more or less recently that also touch this topic and
closely related topics, e.g. [0][1].

> > Furthermore, the index could use the scankey to improve the number of
> > keys to scan using "skip scans"; by realising during a forward scan
> > that if you've reached tuple (1, 2, 3) and looking for (1, _, 1) you
> > can skip forward to (1, 3, _), rather than having to go through tuples
> > (1, 2, 4), (1, 2, 5), ... (1, 2, n). This is not possible for
> > INCLUDE-d columns, because their datatypes and structure are opaque to
> > the index AM; the AM cannot assume anything about or do anything with
> > those values.
>
> Does Postgres actually support this currently? I thought skip scans
> were not available (yet).

Peter Geoghegan has been working on it as a project after PG17's IN()-list
improvements were committed, and I hear he has the basics working but the
further details need fleshing out.

> > As you can see, there's a huge difference in performance. Putting both
> > non-bound and "normal" filter clauses in the same Filter clause will
> > make it more difficult to explain performance issues based on only the
> > explain output.
>
> Fair enough, that's of course the main point of this patch in the
> first place: being able to better interpret the explain plan when you
> don't have access to the schema. Still I think Filter is the correct
> keyword for both, so how about we make it less confusing by making the
> current "Filter" more specific by calling it something like "Non-key
> Filter" or "INCLUDE Filter" and then call the other something like
> "Index Filter" or "Secondary Bound Filter".

I'm not sure how debuggable explain plans are without access to the schema,
especially when VERBOSE isn't configured, so I would be hesitant to accept
that as an argument here.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://www.postgresql.org/message-id/flat/N1xaIrU29uk5YxLyW55MGk5fz9s6V2FNtj54JRaVlFbPixD5z8sJ07Ite5CvbWwik8ZvDG07oSTN-usENLVMq2UAcizVTEd5b-o16ZGDIIU%3D%40yamlcoder.me
[1] https://www.postgresql.org/message-id/flat/cf85f46f-b02f-05b2-5248-5000b894ebab%40enterprisedb.com
> On Mon, 24 Jun 2024 at 04:38, <Masahiro.Ikeda@nttdata.com> wrote: > > > > In my local PoC patch, I have modified the output as follows, what do you think? > > > > =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = > 101; > > QUERY PLAN > > ---------------------------------------------------------------------- > > --------------------------------------------------- > > Index Scan using test_idx on ikedamsh.test (cost=0.42..8.45 rows=1 width=18) > (actual time=0.082..0.086 rows=1 loops=1) > > Output: id1, id2, id3, value > > Index Cond: ((test.id1 = 1) AND (test.id2 = 101)) -- If it’s efficient, the output > won’t change. > > Planning Time: 5.088 ms > > Execution Time: 0.162 ms > > (5 rows) > > > > =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = > 101; > > QUERY PLAN > > ---------------------------------------------------------------------- > > --------------------------------------------------------- > > Index Scan using test_idx on ikedamsh.test (cost=0.42..12630.10 rows=1 > width=18) (actual time=0.175..279.819 rows=1 loops=1) > > Output: id1, id2, id3, value > > Index Cond: (test.id1 = 1) -- Change the output. Show only the > bound quals. > > Index Filter: (test.id3 = 101) -- New. Output quals which are not > used as the bound quals > > I think this is too easy to confuse with the pre-existing 'Filter' > condition, which you'll find on indexes with INCLUDE-d columns or filters on non-index > columns. Thanks for your comment. I forgot the case. > Furthermore, I think this is probably not helpful (maybe even harmful) for index types > like GIN and BRIN, where index searchkey order is mostly irrelevant to the index shape > and performance. Yes, I expected that only B-Tree index support the feature. > Finally, does this change the index AM API? Does this add another scankey argument to > ->amrescan? Yes, I think so. 
But since I'd like to let users know, even without ANALYZE, that such an index scan will happen, I planned to change amcostestimate() for "Index Filter" and amrescan() for "Rows Removed by Index Filter".

> > Rows Removed by Index Filter: 499999  -- New. Output when the ANALYZE option is specified
>
> Separate from the Index Cond/Index Filter output changes, I think this can be useful output, though I'd probably let the AM specify what kind of filter data to display.
> E.g. BRIN may well want to display how many ranges matched the predicate, vs how many ranges were unsummarized and thus returned; two conditions which aren't as easy to differentiate but can be important when debugging query performance.

OK, thanks. I understand that it would be nice if we could customize the output with information specific to other index types like BRIN.

> > Planning Time: 0.354 ms
> > Execution Time: 279.908 ms
> > (7 rows)
>
> Was this a test against the same dataset as the one you'd posted your measurements of your first patchset with? The execution time seems to have slowed down quite significantly, so if the testset is the same then this doesn't bode well for your patchset.

Yes. The execution was slower because the cache hit ratio was very low, since I ran the test right after restarting the machine. I should have added the BUFFERS option.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
> +1 for the idea.

Thanks! I also found the test results you shared interesting.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
> > =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
> >                                                            QUERY PLAN
> > -------------------------------------------------------------------------------------------------------------------------------
> >  Index Scan using test_idx on ikedamsh.test  (cost=0.42..12630.10 rows=1 width=18) (actual time=0.175..279.819 rows=1 loops=1)
> >    Output: id1, id2, id3, value
> >    Index Cond: (test.id1 = 1)                  -- Change the output. Show only the bound quals.
> >    Index Filter: (test.id3 = 101)              -- New. Output quals which are not used as the bound quals.
> >    Rows Removed by Index Filter: 499999        -- New. Output when the ANALYZE option is specified.
> >  Planning Time: 0.354 ms
> >  Execution Time: 279.908 ms
> > (7 rows)
>
> I don't think we want to split these clauses. Index Cond should indicate the conditions applied to the index scan. Bound quals should be listed separately even though they will have an intersection with Index Cond. I am not sure whether Index Filter is the right name, maybe "Index Bound Cond". But I don't know this area well enough to make a final call.

OK, I understand that it's better to only add new output. I think "Index Filter" fits better than "Index Bound Cond" if we introduce "Rows Removed by Index Filter".

> About Rows Removed by Index Filter: it's good to provide a number when ANALYZE is specified, but it would also be better to specify what was estimated. We do that for cost, rows, etc., but doing that elsewhere in the plan output may not have a precedent. I think we should try that and see what others think.

It's interesting! It's an idea that can be applied not only to multi-column indexes, right? I will consider the implementation and discuss it in a new thread. However, I would like to focus on the feature to output information about multi-column indexes first.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
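As background for the "bound quals" being discussed: the documented rule for multicolumn B-Tree indexes is that equality constraints on leading columns, plus any inequality constraint on the first column without an equality constraint, bound the portion of the index that is scanned; any remaining quals must be rechecked as filters. A rough sketch of that classification (illustrative Python, not the planner's actual code; the dict-based qual representation is made up for this example):

```python
# Illustrative sketch (not PostgreSQL code): classify quals on a
# multicolumn B-Tree index into scan-bounding quals and filter quals,
# following the rule quoted from the documentation.

def classify_quals(index_columns, quals):
    """quals maps column name -> operator, '=' or an inequality like '>'."""
    bound = []
    for col in index_columns:
        op = quals.get(col)
        if op is None:
            break                 # gap in the prefix: later columns can't bound
        bound.append(col)
        if op != '=':
            break                 # first inequality still bounds, but ends the prefix
    filters = [c for c in quals if c not in bound]
    return bound, filters

# WHERE id1 = 1 AND id3 = 101 on an index over (id1, id2, id3):
# only id1 bounds the scan; id3 must be rechecked as a filter.
print(classify_quals(['id1', 'id2', 'id3'], {'id1': '=', 'id3': '='}))
```

With the proposed output, the bound list would feed "Index Cond" and the filter list would feed the new "Index Filter" line.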
> On Mon, 24 Jun 2024 at 14:42, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> >
> > On Mon, 24 Jun 2024 at 13:02, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> > > It does not really behave similar: index scan keys (such as the id3=101 scankey) don't require visibility checks in the btree code, while the Filter condition _does_ require a visibility check, and delegates the check to the table AM if the scan isn't Index-Only, or if the VM didn't show all-visible during the check.
> >
> > Any chance you could point me in the right direction for the code/docs/comment about this? I'd like to learn a bit more about why that is the case, because I didn't realize visibility checks worked differently for index scan keys and Filter keys.
>
> This can be derived by combining how Filter works (it only filters the returned live tuples) and how Index-Only scans work (return the index tuple, unless !ALL_VISIBLE, in which case the heap tuple is projected). There have been several threads more or less recently that also touch this topic and closely related topics, e.g. [0][1].

Thanks! I now understand the difference between an INCLUDE-based filter and an index filter.

> > > As you can see, there's a huge difference in performance. Putting both non-bound and "normal" filter clauses in the same Filter clause will make it more difficult to explain performance issues based on only the explain output.
> >
> > Fair enough, that's of course the main point of this patch in the first place: being able to better interpret the explain plan when you don't have access to the schema. Still I think Filter is the correct keyword for both, so how about we make it less confusing by making the current "Filter" more specific by calling it something like "Non-key Filter" or "INCLUDE Filter" and then call the other something like "Index Filter" or "Secondary Bound Filter".
>
> I'm not sure how debuggable explain plans are without access to the schema, especially when VERBOSE isn't configured, so I would be hesitant to accept that as an argument here.

IMHO, it's nice to be able to understand the differences between each Filter even without the VERBOSE option (+1 for Jelte Fennema-Nio's idea). Even without access to the schema, it would be possible to quickly notice that the plan is not as expected, and I believe there are virtually no disadvantages to having multiple "XXX Filter" outputs.

If it's better to output such information only with the VERBOSE option, what do you think about the following idea?
* When the VERBOSE option is not specified, output "Filter" in all cases.
* When the VERBOSE option is specified, output "Non-key Filter", "INCLUDE Filter" and "Index Filter".

In addition, I think it would be good to document the differences between each filter.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
On Fri, Jun 21, 2024 at 3:12 AM <Masahiro.Ikeda@nttdata.com> wrote:
> Regarding the multicolumn B-Tree Index, I'm considering
> if we can enhance the EXPLAIN output. There have been requests
> for this from our customer.

I agree that this is a real problem -- I'm not surprised to hear that your customer asked about it. In the past, we've heard complaints about this problem from Markus Winand, too:

https://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates

As it happens I have been thinking about this problem a lot recently. Specifically the user-facing aspects, what we show in EXPLAIN. It is relevant to my ongoing work on skip scan:

https://commitfest.postgresql.org/48/5081/
https://www.postgresql.org/message-id/flat/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com

Unfortunately, my patch will make the situation more complicated for your patch. I would like to resolve the tension between the two patches, but I'm not sure how to do that.

If you look at the example query that I included in my introductory email on the skip scan thread (the query against the sales_mdam_paper table), you'll see that my patch makes it go much faster. My patch will effectively "convert" nbtree scan keys that would traditionally have to use non-index-bound conditions/filter predicates, into index-bound conditions/access predicates. This all happens at runtime, during nbtree preprocessing (not during planning).

This may mean that your patch's approach of determining which columns/scan keys are in which category (bound vs. non-bound) cannot rely on its current approach of placing each type of clause into one of two categories inside btcostestimate() -- the view that we see from btcostestimate() will be made less authoritative by skip scan. What actually matters is what happens during nbtree preprocessing, inside _bt_preprocess_keys(). Unfortunately, this is even more complicated than it sounds.
It would be a good idea if we moved _bt_preprocess_keys() to plan time, so that btcostestimate() operated off of authoritative information, rather than independently figuring out the same details for the purposes of costing. We've talked about this before [1]. That way your patch could just work off of this authoritative information.

But even that doesn't necessarily fix the problem. Note that the skip scan patch makes _bt_preprocess_keys() *indiscriminately* "convert" *all* scan keys to index bound conditions -- at least where that's possible at all. There are minor implementation restrictions that mean that we can't always do that. But overall, the patch more or less eliminates non-bound index conditions. That is, it'll be rare to non-existent for nbtree to fail to mark *any* scan key as SK_BT_REQFWD/SK_BT_REQBKWD. Technically speaking, non-bound conditions mostly won't exist anymore.

Of course, this doesn't mean that the problem that your patch is solving will actually go away. I fully expect that the skip scan patch will merely make some scan keys "required-by-scan/index bound condition scan keys in name only". Technically they won't be the problematic kind of index condition, but that won't actually be true in any practical sense. Because users (like your customer) will still get full index scans, and be surprised, just like today.

As I explain in my email on the skip scan thread, I believe that the patch's aggressive approach to "converting" scan keys is an advantage. The amount of skipping that actually takes place should be decided dynamically, at runtime. It is a decision that should be made at the level of individual leaf pages (or small groups of leaf pages), not at the level of the whole scan. The distinction between index bound conditions and non-bound conditions becomes much more "squishy", which is mostly (though not entirely) a good thing.

I really don't know what to do about this.
As I said, I agree with the general idea of this patch -- this is definitely a real problem. And, I don't pretend that my skip scan patch will actually define the problem out of existence (except perhaps in those cases that it actually makes it much faster). Maybe we could make a guess (based on statistics) whether or not any skip attributes will leave the lower-order clauses as useful index bound conditions at runtime. But I don't know...that condition is actually a "continuous" condition now -- it is not a strict dichotomy (it is not either/or, but rather a question of degree, perhaps on a scale of 0.0 - 1.0). It's also possible that we should just do something simple, like your patch, even though technically it won't really be accurate in cases where skip scan is used to good effect. Maybe showing the "default working assumption" about how the scan keys/clauses will behave at runtime is actually the right thing to do. Maybe I am just overthinking it. [1] https://www.postgresql.org/message-id/2587523.1647982549@sss.pgh.pa.us -- the final full paragraph mentions moving _bt_preprocess_keys() into the planner -- Peter Geoghegan
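The "conversion" Peter describes -- a qual like "WHERE b = 5" on an index over (a, b) behaving like "WHERE a = ANY(<every value of a>) AND b = 5" -- can be illustrated with a toy model. This is plain Python over a sorted list, not the nbtree implementation: the point is only that the scan descends once per distinct leading value instead of reading every index entry.

```python
# Toy model of the MDAM-style skip scan discussed above: an index on
# (a, b) is modeled as a sorted list, and a scan for b = 5 descends
# once per distinct value of 'a' instead of reading every entry.
# Illustration only, not the nbtree implementation.
import bisect

def skip_scan(index, b_target):
    hits, descents, pos = [], 0, 0
    while pos < len(index):
        a = index[pos][0]               # current leading value
        descents += 1                   # one "descent" per distinct a
        i = bisect.bisect_left(index, (a, b_target), pos)
        if i < len(index) and index[i] == (a, b_target):
            hits.append(index[i])
        # skip past the rest of this 'a' group in one step
        pos = bisect.bisect_right(index, (a, float('inf')), pos)
    return hits, descents

# 3 distinct values of 'a', 100,000 entries each: only 3 descents needed.
index = sorted((a, b) for a in range(3) for b in range(100_000))
hits, descents = skip_scan(index, 5)
print(hits, descents)   # [(0, 5), (1, 5), (2, 5)] 3
```

When the leading column has many distinct values, the same loop degenerates toward one descent per entry, which is the "squishy" boundary between bound and non-bound conditions that the thread is wrestling with.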
On Thu, 27 Jun 2024 at 22:02, Peter Geoghegan <pg@bowt.ie> wrote:
> It's also possible that we should just do something simple, like your
> patch, even though technically it won't really be accurate in cases
> where skip scan is used to good effect. Maybe showing the "default
> working assumption" about how the scan keys/clauses will behave at
> runtime is actually the right thing to do. Maybe I am just
> overthinking it.

IIUC, you're saying that your skip scan will improve the situation Masahiro describes dramatically in some/most cases. But it still won't be as good as a pure index "prefix" scan.

If that's the case then I do think you're overthinking this a bit. Because then you'd still want to see this difference between the prefix-scan keys and the skip-scan keys. I think the main thing that the introduction of the skip scan changes is the name that we should show, e.g. instead of "Non-key Filter" we might want to call it "Skip Scan Cond".

I do think though that in addition to a "Skip Scan Filtered" count for ANALYZE, it would be very nice to also get a "Skip Scan Skipped" count (if that's possible to measure/estimate somehow). This would allow users to determine how effective the skip scan was, i.e. were they able to skip over large swaths of the index? Or did they skip over nothing because the second column of the index (on which there was no filter) was unique within the table?
On Thu, Jun 27, 2024 at 4:46 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote: > On Thu, 27 Jun 2024 at 22:02, Peter Geoghegan <pg@bowt.ie> wrote: > > It's also possible that we should just do something simple, like your > > patch, even though technically it won't really be accurate in cases > > where skip scan is used to good effect. Maybe showing the "default > > working assumption" about how the scan keys/clauses will behave at > > runtime is actually the right thing to do. Maybe I am just > > overthinking it. > > IIUC, you're saying that your skip scan will improve the situation > Masahiro describes dramatically in some/most cases. "Most cases" seems likely to be overstating it. Overall, I doubt that it makes sense to try to generalize like that. The breakdown of the cases that we see in the field right now (whatever it really is) is bound to be strongly influenced by the current capabilities of Postgres. If something is intolerably slow, then it just isn't tolerated. If something works adequately, then users don't usually care why it is so. > But it still won't > be as good as a pure index "prefix" scan. Typically, no, it won't be. But there's really no telling for sure. The access patterns for a composite index on '(a, b)' with a qual "WHERE b = 5" are identical to a qual explicitly written "WHERE a = any(<every possible value in 'a'>) AND b = 5". > If that's the case then I do think you're overthinking this a bit. > Because then you'd still want to see this difference between the > prefix-scan keys and the skip-scan keys. I think the main thing that > the introduction of the skip scan changes is the name that we should > show, e.g. instead of "Non-key Filter" we might want to call it "Skip > Scan Cond" What about cases where we legitimately have to vary our strategy during the same index scan? 
We might very well be able to skip over many leaf pages when scanning through a low cardinality subset of the index (low cardinality in respect of a leading column 'a'). Then we might find that there are long runs on leaf pages where no skipping is possible. I don't expect this to be uncommon. I do expect it to happen when the optimizer wasn't particularly expecting it. Like when a full index scan was the fastest plan anyway. Or when a skip scan wasn't quite as good as expected, but nevertheless turned out to be the fastest plan. > I do think though that in addition to a "Skip Scan Filtered" count for > ANALYZE, it would be very nice to also get a "Skip Scan Skipped" count > (if that's possible to measure/estimate somehow). This would allow > users to determine how effective the skip scan was, i.e. were they > able to skip over large swaths of the index? Or did they skip over > nothing because the second column of the index (on which there was no > filter) was unique within the table Yeah, EXPLAIN ANALYZE should probably be showing something about skipping. That provides us with a way of telling the user what really happened, which could help when EXPLAIN output alone turns out to be quite misleading. In fact, that'd make sense even today, without skip scan (just with the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index scans, the number of primitive scans is hard to predict, and quite indicative of what's really going on with the scan. -- Peter Geoghegan
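The effectiveness question raised in these two emails -- how much did the scan actually skip? -- can be made concrete with the same kind of toy model. The three counters below are roughly what hypothetical "Skip Scan Skipped"-style lines in EXPLAIN ANALYZE might report; this is a sketch under made-up names, not the actual patch:

```python
# Toy counters (not nbtree code) for how effective skipping was:
# a low-cardinality leading column lets the scan skip almost everything,
# while a unique leading column allows no skipping at all.
import bisect

def skip_scan_stats(index, b_target):
    descents = skipped = matched = 0
    pos = 0
    while pos < len(index):
        a = index[pos][0]
        descents += 1
        i = bisect.bisect_left(index, (a, b_target), pos)
        if i < len(index) and index[i] == (a, b_target):
            matched += 1
        nxt = bisect.bisect_right(index, (a, float('inf')), pos)
        skipped += nxt - pos - 1   # entries under this 'a' the scan jumped over
        pos = nxt
    return descents, skipped, matched

low_cardinality = sorted((a, b) for a in range(2) for b in range(1000))
unique_prefix = [(a, 0) for a in range(2000)]
print(skip_scan_stats(low_cardinality, 5))   # (2, 1998, 2): huge swaths skipped
print(skip_scan_stats(unique_prefix, 5))     # (2000, 0, 0): nothing to skip
```

The second case is exactly the "skipped over nothing" scenario: the scan is technically a skip scan, but behaves like a full index scan, which is why a runtime counter would be more trustworthy than anything the plan text alone can promise.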
On Thu, 27 Jun 2024 at 22:02, Peter Geoghegan <pg@bowt.ie> wrote:
> Unfortunately, my patch will make the situation more complicated
> for your patch. I would like to resolve the tension between the
> two patches, but I'm not sure how to do that.

OK. I would like to understand your proposed patch better. I have also registered as a reviewer in the commitfest entry.

On 2024-06-28 07:40, Peter Geoghegan wrote:
> On Thu, Jun 27, 2024 at 4:46 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>> I do think though that in addition to a "Skip Scan Filtered" count for
>> ANALYZE, it would be very nice to also get a "Skip Scan Skipped" count
>> (if that's possible to measure/estimate somehow). This would allow
>> users to determine how effective the skip scan was, i.e. were they
>> able to skip over large swaths of the index? Or did they skip over
>> nothing because the second column of the index (on which there was no
>> filter) was unique within the table
>
> Yeah, EXPLAIN ANALYZE should probably be showing something about
> skipping. That provides us with a way of telling the user what really
> happened, which could help when EXPLAIN output alone turns out to be
> quite misleading.
>
> In fact, that'd make sense even today, without skip scan (just with
> the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index
> scans, the number of primitive scans is hard to predict, and quite
> indicative of what's really going on with the scan.

I agree as well. Although I haven't looked at your patch yet, if it's difficult to know during the planning phase how the scan can be optimized, it's enough for me to just show "Skip Scan Cond (or Non-Key Filter)". This is because users can understand that inefficient index scans *may* occur. If users want more detail, they can execute "EXPLAIN ANALYZE".
This will allow them to understand the execution effectively and determine if there is any room to optimize the plan by looking at the counter of "Skip Scan Filtered (or Skip Scan Skipped)". In terms of the concept of EXPLAIN output, I thought that runtime partition pruning is similar. "EXPLAIN without ANALYZE" only shows the possibilities and "EXPLAIN ANALYZE" shows the actual results. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On Fri, 28 Jun 2024 at 00:41, Peter Geoghegan <pg@bowt.ie> wrote:
> Typically, no, it won't be. But there's really no telling for sure.
> The access patterns for a composite index on '(a, b)' with a qual
> "WHERE b = 5" are identical to a qual explicitly written "WHERE a =
> any(<every possible value in 'a'>) AND b = 5".

Hmm, that's true. But in that case the explain plan gives a clear hint that something like that might be going on, because you'll see:

Index Cond: a = any(<every possible value in 'a'>) AND b = 5

That does make me think of another way, and maybe a more "correct" way, of representing Masahiro's situation than adding a new "Skip Scan Cond" row to the EXPLAIN output. We could explicitly include a comparison to all prefix columns in the Index Cond:

Index Cond: ((test.id1 = 1) AND (test.id2 = ANY) AND (test.id3 = 101))

Or if you want to go with syntactically correct SQL we could do the following:

Index Cond: ((test.id1 = 1) AND ((test.id2 IS NULL) OR (test.id2 IS NOT NULL)) AND (test.id3 = 101))

An additional benefit this provides is that you now know which additional column you should use a more specific filter on to speed up the query. In this case test.id2.

OTOH, maybe it's not a great way, because actually running that puts the IS NULL + IS NOT NULL qual in the Filter clause (which honestly surprises me, because I had expected this "always true expression" to be optimized away by the planner).

> EXPLAIN (VERBOSE, ANALYZE) SELECT id1, id2, id3 FROM test WHERE id1 = 1 AND (id2 IS NULL OR id2 IS NOT NULL) AND id3 = 101;
                                                            QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Index Only Scan using test_idx on public.test  (cost=0.42..12809.10 rows=1 width=12) (actual time=0.027..11.234 rows=1 loops=1)
   Output: id1, id2, id3
   Index Cond: ((test.id1 = 1) AND (test.id3 = 101))
   Filter: ((test.id2 IS NULL) OR (test.id2 IS NOT NULL))

> What about cases where we legitimately have to vary our strategy
> during the same index scan?
Would my new suggestion above address this?

> In fact, that'd make sense even today, without skip scan (just with
> the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index
> scans, the number of primitive scans is hard to predict, and quite
> indicative of what's really going on with the scan.

*googles nbtree SAOP scans and finds the very helpful [1]*

Yes, I feel like this should definitely be part of the ANALYZE output. Seeing how Lukas has to look at pg_stat_user_tables to get this information seems quite annoying [2], and it's only possible on systems that have no concurrent queries.

So it sounds like we'd want a "Primitive Index Scans" counter in ANALYZE too, in addition to the number of filtered rows, which (if we go with my proposal above) should probably be called "Rows Removed by Index Cond".

[1]: https://www.youtube.com/watch?v=jg2KeSB5DR8
[2]: https://youtu.be/jg2KeSB5DR8?t=188
On Fri, 28 Jun 2024 at 10:59, Jelte Fennema-Nio <postgres@jeltef.nl> wrote: > > On Fri, 28 Jun 2024 at 00:41, Peter Geoghegan <pg@bowt.ie> wrote: > > Typically, no, it won't be. But there's really no telling for sure. > > The access patterns for a composite index on '(a, b)' with a qual > > "WHERE b = 5" are identical to a qual explicitly written "WHERE a = > > any(<every possible value in 'a'>) AND b = 5". > > Hmm, that's true. But in that case the explain plan gives a clear hint > that something like that might be going on, because you'll see: > > Index Cond: a = any(<every possible value in 'a'>) AND b = 5 > > That does make me think of another way, and maybe more "correct" way, > of representing Masahiros situation than adding a new "Skip Scan Cond" > row to the EXPLAIN output. We could explicitly include a comparison to > all prefix columns in the Index Cond: > > Index Cond: ((test.id1 = 1) AND (test.id2 = ANY) AND (test.id3 = 101)) > > Or if you want to go with syntactically correct SQL we could do the following: > > Index Cond: ((test.id1 = 1) AND ((test.id2 IS NULL) OR (test.id2 IS > NOT NULL)) AND (test.id3 = 101)) > > An additional benefit this provides is that you now know which > additional column you should use a more specific filter on to speed up > the query. In this case test.id2 > > OTOH, maybe it's not a great way because actually running that puts > the IS NULL+ IS NOT NULL query in the Filter clause (which honestly > surprises me because I had expected this "always true expression" > would have been optimized away by the planner). 
> > > EXPLAIN (VERBOSE, ANALYZE) SELECT id1, id2, id3 FROM test WHERE id1 = 1 AND (id2 IS NULL OR id2 IS NOT NULL) AND id3= 101; > QUERY PLAN > ───────────────────────────────────────────────────── > Index Only Scan using test_idx on public.test (cost=0.42..12809.10 > rows=1 width=12) (actual time=0.027..11.234 rows=1 loops=1) > Output: id1, id2, id3 > Index Cond: ((test.id1 = 1) AND (test.id3 = 101)) > Filter: ((test.id2 IS NULL) OR (test.id2 IS NOT NULL)) > > > What about cases where we legitimately have to vary our strategy > > during the same index scan? > > Would my new suggestion above address this? > > > In fact, that'd make sense even today, without skip scan (just with > > the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index > > scans, the number of primitive scans is hard to predict, and quite > > indicative of what's really going on with the scan. > > *googles nbtree SAOP scans and finds the very helpful[1]* > > Yes, I feel like this should definitely be part of the ANALYZE output. > Seeing how Lukas has to look at pg_stat_user_tables to get this > information seems quite annoying[2] and only possible on systems that > have no concurrent queries. > > So it sounds like we'd want a "Primitive Index Scans" counter in > ANALYZE too. In addition to the number of filtered rows by, which if > we go with my proposal above should probably be called "Rows Removed > by Index Cond". 
This all just made me more confident that this shows a need to enable index AMs to provide output for EXPLAIN: The knowledge about how index scankeys are actually used is exclusively known to the index AM, because the strategies are often unique to the index AM (or even chosen operator classes), and sometimes can only be applied at runtime: while the index scankeys' sizes, attribute numbers and operators are known in advance (even if not all arguments are filled in; `FROM a JOIN b ON b.id = ANY (a.ref_array)`), the AM can at least show what strategy it is likely going to choose, and how (in)efficient that strategy could be. Right now, Masahiro-san's patch tries to do that with an additional field in IndexPath populated (by proxy) exclusively in btcostestimate. I think that design is wrong, because it wires explain- and btree-specific data through the planner, adding overhead everywhere which is only used for btree- and btree-compatible indexes. I think the better choice would be adding an IndexAmRoutine->amexplain support function, which would get called in e.g. explain.c's ExplainIndexScanDetails to populate a new "Index Scan Details" (name to be bikeshed) subsection of explain plans. This would certainly be possible, as the essentials for outputting things to EXPLAIN are readily available in the explain.h header. Kind regards, Matthias van de Meent Neon (https://neon.tech)
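The callback shape Matthias sketches here can be illustrated outside PostgreSQL. The following is illustrative Python with made-up names (the real amexplain would be a C function pointer in IndexAmRoutine with a signature still to be designed): generic EXPLAIN code calls an optional per-AM hook, so only the btree hook needs to know how to label non-bound quals, and other AMs can emit whatever detail suits them.

```python
# Sketch (hypothetical names, Python instead of C) of the proposed design:
# each index AM may register an amexplain callback, and the generic
# EXPLAIN code invokes it without knowing any btree specifics.

class IndexAmRoutine:
    def __init__(self, name, amexplain=None):
        self.name = name
        self.amexplain = amexplain      # optional per-AM EXPLAIN hook

def btexplain(scan_state, lines):
    # btree-specific detail: quals that do not bound the index scan
    nonkey = scan_state.get('nonkey_quals', [])
    if nonkey:
        lines.append('Non Key Filter: ' + ' AND '.join(nonkey))

def explain_index_scan(am, scan_state):
    lines = ['Index Cond: ' + ' AND '.join(scan_state['bound_quals'])]
    if am.amexplain is not None:        # generic code stays AM-agnostic
        am.amexplain(scan_state, lines)
    return lines

btree = IndexAmRoutine('btree', amexplain=btexplain)
print(explain_index_scan(btree, {'bound_quals': ['test.id1 = 1'],
                                 'nonkey_quals': ['test.id3 = 101']}))
```

An AM without a callback (or BRIN reporting matched vs. unsummarized ranges instead) simply plugs in its own hook or none at all, which is the point of moving this out of the planner.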
On Thu, Jun 27, 2024 at 11:06 PM <Masahiro.Ikeda@nttdata.com> wrote: > OK. I would like to understand more about your proposed patch. I > have also registered as a reviewer in the commitfests entry. Great! > Although I haven't looked on your patch yet, if it's difficult to know > how it can optimize during the planning phase, it's enough for me to just > show "Skip Scan Cond (or Non-Key Filter)". This is because users can > understand that inefficient index scans *may* occur. That makes sense. The goal of your patch is to highlight when an index scan is using an index that is suboptimal for a particular query (a query that the user runs through EXPLAIN or EXPLAIN ANALYZE). The underlying rules that determine "access predicate vs. filter predicate" are not very complicated -- they're intuitive, even. But even an expert can easily make a mistake on a bad day. It seems to me that all your patch really needs to do is to give the user a friendly nudge in that direction, when it makes sense to. You want to subtly suggest to the user "hey, are you sure that the index the plan uses is exactly what you expected?". Fortunately, even when skip scan works well that should still be a useful nudge. If we assume that the query that the user is looking at is much more important than other queries, then the user really shouldn't be using skip scan in the first place. Even a good skip scan is a little suspicious (it's okay if it "stands out" a bit). > In terms of the concept of EXPLAIN output, I thought that runtime partition > pruning is similar. "EXPLAIN without ANALYZE" only shows the possibilities and > "EXPLAIN ANALYZE" shows the actual results. That seems logical. -- Peter Geoghegan
On 2024-06-28 21:05, Matthias van de Meent wrote: > On Fri, 28 Jun 2024 at 10:59, Jelte Fennema-Nio <postgres@jeltef.nl> wrote: >> >> On Fri, 28 Jun 2024 at 00:41, Peter Geoghegan <pg@bowt.ie> wrote: >> > Typically, no, it won't be. But there's really no telling for sure. >> > The access patterns for a composite index on '(a, b)' with a qual >> > "WHERE b = 5" are identical to a qual explicitly written "WHERE a = >> > any(<every possible value in 'a'>) AND b = 5". >> >> Hmm, that's true. But in that case the explain plan gives a clear hint >> that something like that might be going on, because you'll see: >> >> Index Cond: a = any(<every possible value in 'a'>) AND b = 5 >> >> That does make me think of another way, and maybe more "correct" way, >> of representing Masahiros situation than adding a new "Skip Scan Cond" >> row to the EXPLAIN output. We could explicitly include a comparison to >> all prefix columns in the Index Cond: >> >> Index Cond: ((test.id1 = 1) AND (test.id2 = ANY) AND (test.id3 = 101)) >> >> Or if you want to go with syntactically correct SQL we could do the following: >> >> Index Cond: ((test.id1 = 1) AND ((test.id2 IS NULL) OR (test.id2 IS >> NOT NULL)) AND (test.id3 = 101)) >> >> An additional benefit this provides is that you now know which >> additional column you should use a more specific filter on to speed up >> the query. In this case test.id2 >> >> OTOH, maybe it's not a great way because actually running that puts >> the IS NULL+ IS NOT NULL query in the Filter clause (which honestly >> surprises me because I had expected this "always true expression" >> would have been optimized away by the planner). 
>> >> > EXPLAIN (VERBOSE, ANALYZE) SELECT id1, id2, id3 FROM test WHERE id1 = 1 AND (id2 IS NULL OR id2 IS NOT NULL) AND id3= 101; >> QUERY PLAN >> ───────────────────────────────────────────────────── >> Index Only Scan using test_idx on public.test (cost=0.42..12809.10 >> rows=1 width=12) (actual time=0.027..11.234 rows=1 loops=1) >> Output: id1, id2, id3 >> Index Cond: ((test.id1 = 1) AND (test.id3 = 101)) >> Filter: ((test.id2 IS NULL) OR (test.id2 IS NOT NULL)) >> >> > What about cases where we legitimately have to vary our strategy >> > during the same index scan? >> >> Would my new suggestion above address this? >> >> > In fact, that'd make sense even today, without skip scan (just with >> > the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index >> > scans, the number of primitive scans is hard to predict, and quite >> > indicative of what's really going on with the scan. >> >> *googles nbtree SAOP scans and finds the very helpful[1]* >> >> Yes, I feel like this should definitely be part of the ANALYZE output. >> Seeing how Lukas has to look at pg_stat_user_tables to get this >> information seems quite annoying[2] and only possible on systems that >> have no concurrent queries. >> >> So it sounds like we'd want a "Primitive Index Scans" counter in >> ANALYZE too. In addition to the number of filtered rows by, which if >> we go with my proposal above should probably be called "Rows Removed >> by Index Cond". 
> This all just made me more confident that this shows a need to enable
> index AMs to provide output for EXPLAIN: The knowledge about how index
> scankeys are actually used is exclusively known to the index AM,
> because the strategies are often unique to the index AM (or even
> chosen operator classes), and sometimes can only be applied at
> runtime: while the index scankeys' sizes, attribute numbers and
> operators are known in advance (even if not all arguments are filled
> in; `FROM a JOIN b ON b.id = ANY (a.ref_array)`), the AM can at least
> show what strategy it is likely going to choose, and how (in)efficient
> that strategy could be.
>
> Right now, Masahiro-san's patch tries to do that with an additional
> field in IndexPath populated (by proxy) exclusively in btcostestimate.
> I think that design is wrong, because it wires explain- and
> btree-specific data through the planner, adding overhead everywhere
> which is only used for btree- and btree-compatible indexes.
>
> I think the better choice would be adding an IndexAmRoutine->amexplain
> support function, which would get called in e.g. explain.c's
> ExplainIndexScanDetails to populate a new "Index Scan Details" (name
> to be bikeshed) subsection of explain plans. This would certainly be
> possible, as the essentials for outputting things to EXPLAIN are
> readily available in the explain.h header.

Yes, that's one of my concerns. I agree that adding IndexAmRoutine->amexplain is better, because we can support several use cases. Although I'm not yet confident that adding only IndexAmRoutine->amexplain is enough, I'll make a PoC patch to confirm it.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
On 2024-06-29 03:27, Peter Geoghegan wrote:
> On Thu, Jun 27, 2024 at 11:06 PM <Masahiro.Ikeda@nttdata.com> wrote:
>> Although I haven't looked at your patch yet, if it's difficult to know
>> how it can optimize during the planning phase, it's enough for me to just
>> show "Skip Scan Cond (or Non-Key Filter)". This is because users can
>> understand that inefficient index scans *may* occur.
>
> That makes sense.
>
> The goal of your patch is to highlight when an index scan is using an
> index that is suboptimal for a particular query (a query that the user
> runs through EXPLAIN or EXPLAIN ANALYZE). The underlying rules that
> determine "access predicate vs. filter predicate" are not very
> complicated -- they're intuitive, even. But even an expert can easily
> make a mistake on a bad day.
>
> It seems to me that all your patch really needs to do is to give the
> user a friendly nudge in that direction, when it makes sense to. You
> want to subtly suggest to the user "hey, are you sure that the index
> the plan uses is exactly what you expected?". Fortunately, even when
> skip scan works well that should still be a useful nudge. If we assume
> that the query that the user is looking at is much more important than
> other queries, then the user really shouldn't be using skip scan in
> the first place. Even a good skip scan is a little suspicious (it's
> okay if it "stands out" a bit).

Yes, you're right. I'd like to give users an easy way to notice that.

--
Masahiro Ikeda
NTT DATA CORPORATION
> > I think the better choice would be adding an IndexAmRoutine->amexplain
> > support function, which would get called in e.g. explain.c's
> > ExplainIndexScanDetails to populate a new "Index Scan Details" (name
> > to be bikeshed) subsection of explain plans. This would certainly be
> > possible, as the essentials for outputting things to EXPLAIN are
> > readily available in the explain.h header.
>
> Yes, that's one of my concerns. I agree that adding
> IndexAmRoutine->amexplain is the better approach, because it can support
> several use cases.
>
> Although I'm not yet confident that adding IndexAmRoutine->amexplain
> alone is enough, I'll make a PoC patch to confirm it.

I attached a patch adding an IndexAmRoutine->amexplain. The patch makes
the following changes:

* Add a new index AM function, "amexplain_function()", which is called in
  ExplainNode().

  Although I tried to add the call in ExplainIndexScanDetails(), I don't
  think that is the proper place to show quals. So, in the patch,
  amexplain_function() is called after show_scanqual().

* Add an "amexplain_function" for the B-Tree index AM, which shows
  "Non Key Filter" if VERBOSE is specified.

  To avoid confusion with INCLUDE-d columns and the non-index-column
  "Filter", I've decided to show it only with the VERBOSE option.
  However, I'm not sure if this is the appropriate solution. It might be
  a good idea to include words like "b-tree" to make it clear that the
  output is specific to B-Tree indexes.
-- Example dataset
CREATE TABLE test (id1 int, id2 int, id3 int, value varchar(32));
CREATE INDEX test_idx ON test(id1, id2, id3); -- multicolumn B-Tree index
INSERT INTO test (SELECT i % 2, i, i, 'hello' FROM generate_series(1,1000000) s(i));
ANALYZE;

-- The output is the same as without the patch when the index can be searched efficiently
=# EXPLAIN (VERBOSE, ANALYZE, BUFFERS, MEMORY, SERIALIZE) SELECT id3 FROM test WHERE id1 = 1 AND id2 = 101;
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using test_idx on public.test (cost=0.42..4.44 rows=1 width=4) (actual time=0.058..0.060 rows=1 loops=1)
   Output: id3
   Index Cond: ((test.id1 = 1) AND (test.id2 = 101))
   Heap Fetches: 0
   Buffers: shared hit=4
 Planning:
   Memory: used=14kB allocated=16kB
 Planning Time: 0.166 ms
 Serialization: time=0.009 ms output=1kB format=text
 Execution Time: 0.095 ms
(10 rows)

-- "Non Key Filter" is displayed when the scan reads index tuples and filters them
=# EXPLAIN (VERBOSE, ANALYZE, BUFFERS, MEMORY, SERIALIZE) SELECT id3 FROM test WHERE id1 = 1 AND id3 = 101;
                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using test_idx on public.test (cost=0.42..12724.10 rows=1 width=4) (actual time=0.055..69.446 rows=1 loops=1)
   Output: id3
   Index Cond: ((test.id1 = 1) AND (test.id3 = 101))
   Heap Fetches: 0
   Non Key Filter: (test.id3 = 101)
   Buffers: shared hit=1920
 Planning:
   Memory: used=14kB allocated=16kB
 Planning Time: 0.113 ms
 Serialization: time=0.004 ms output=1kB format=text
 Execution Time: 69.491 ms
(11 rows)

Although I plan to support "Rows Removed by Non Key Filter" (or "Skip Scan
Filtered") as well, I'd first like to know whether the current direction is
good. One of my concerns is that there might be a better way to extract the
quals for boundary conditions in btexplain().

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION