Thread: Index not used in certain nested views but not in others
Dear List, I know how tedious mails with a subject of the type "I don't understand what the planner does" are, but on this one I'm really stumped. Regrettably, the situation is also a bit complex. Hopefully, someone will bear with me. So, in a PostgreSQL 15.12 I have a view over a single table with ~20 columns (the only relevant columns here are the ones that somehow contain "pub[lisher]_did", the others are just there for context; I'm going to call these "pubdids" from here on in the prose): CREATE OR REPLACE VIEW ivoa.obs_radio AS SELECT main.obs_publisher_did, main.s_resolution_min, main.s_resolution_max, NULL::real AS s_fov_min, [...] FROM emi.main (emi.main is a physical table). There is another view made up of about 20 tables, looking somewhat like this: CREATE OR REPLACE VIEW ivoa.obscore AS SELECT 'image'::text AS dataproduct_type, NULL::text AS dataproduct_subtype, 2::smallint AS calib_level, 'PPAKM31'::text AS obs_collection, [...] 'ivo://org.gavo.dc/~?'::text || gavo_urlescape(maps.accref) AS obs_publisher_did, [...] FROM ppakm31.maps UNION ALL [lots of similar definitions] UNION ALL SELECT ssa.ssa_dstype AS dataproduct_type, NULL::text AS dataproduct_subtype, [...] ssa.ssa_pubdid AS obs_publisher_did, [...] FROM dfbsspec.ssa UNION ALL [and still more] The dfbsspec.ssa in this definition is another view: CREATE OR REPLACE VIEW dfbsspec.ssa AS SELECT q.accref, q.owner, [...] q.ssa_pubdid, [...] FROM ( SELECT raw_spectra.accref, [...] raw_spectra.pub_did AS ssa_pubdid, [...] FROM dfbsspec.raw_spectra LEFT JOIN dfbsspec.platemeta ON platemeta.plateid = raw_spectra.plate) q raw_spectra finally is a physical table that has an index: "raw_spectra_pub_did" btree (pub_did) The first view, ivoa.obs_radio, is just a few hundred records, dfbsspec.raw_spectra is about 23 Megarows, the total ivoa.obscore is about 100 MRows which occasionally change, so materialising it is *really* unattractive. The pubdids are strings of about 40 characters. You may argue that this whole system looks a bit insane, but of course this is part of a large metadata handling suite, and all these views are, in some sense, more or less automatic adaptations to different metadata schemes, and dramatic simplifications are at least not entriely trivial. So, can you assume for the moment that I can't get rid of the nested views? Now, when I say EXPLAIN ANALYZE SELECT COUNT(*) FROM ivoa.obscore JOIN ivoa.obs_radio USING (obs_publisher_did); I get: Finalize Aggregate (cost=5114082.70..5114082.71 rows=1 width=8) (actual time=22595.715..22731.950 rows=1 loops=1) [...] -> Parallel Append (cost=0.56..4800918.33 rows=19267799 width=40) (actual time=1.566..18985.964rows=15410027 loops=5) -> Parallel Index Only Scan using phot_r_pkey on phot_r (cost=0.56..754384.72 rows=5118036width=32) (actual time=0.854..7995.762 rows=10197024 loops=2) Heap Fetches: 0 [...and lot more of these that have simple pubdid indexes on plain tables, the point being: Postgres *does* use pubdid indexes...] -> Subquery Scan on "*SELECT* 13" (cost=0.00..2685028.32 rows=5803266 width=58) (actualtime=0.142..7554.269 rows=4642657 loops=5) -> Parallel Seq Scan on raw_spectra (cost=0.00..2626995.66 rows=5803266 width=756)(actual time=0.137..6841.379 rows=4642657 loops=5) [... and a few more seqscans where there's no index on the pubdid because they are small, and one or two similar cases] My problem is: I can't seem to figure out why Postgres chooses to ignore the pubdid index on raw_spectra.pub_did and instead does the time-consuming seqscan. I thought maybe the genetic optimiser has kicked in because of the large number of tables and SELECTs in there and chose a suboptimal plan. But switching off the genetic optimiser doesn't change the plan. Trying to investigate more closely, I wanted to simplify the situation and created a view like ivoa.obscore but only having the evil table in it: CREATE TEMPORARY VIEW bla AS (SELECT [...] CAST(ssa_pubdid AS text) AS obs_publisher_did, [...] FROM dfbsspec.ssa) When I then say EXPLAIN ANALYZE SELECT COUNT(*) FROM ivoa.obs_radio JOIN bla USING (obs_publisher_did); the query plan looks like this: Aggregate (cost=4873.00..4873.01 rows=1 width=8) (actual time=2.484..2.486 rows=1 loops=1) -> Nested Loop (cost=0.56..4871.60 rows=561 width=0) (actual time=2.478..2.479 rows=0 loops=1) -> Seq Scan on main (cost=0.00..52.61 rows=561 width=48) (actual time=0.011..0.317 rows=561 loops=1) -> Index Scan using raw_spectra_pub_did on raw_spectra (cost=0.56..8.58 rows=1 width=66) (actual time=0.003..0.003rows=0 loops=561) Index Cond: (pub_did = main.obs_publisher_did) Planning Time: 5.386 ms Execution Time: 2.750 ms -- exactly as it should. So, when the SELECT statement on dfbsspec.ssa stands along in the view definition, Postgres does the right thing; when the exact same query stands in a UNION ALL with other tables, Postgres doesn't use the index. Hu? Is there anything that would explain that behaviour given I've switched off the genetic optimiser and postgres has hopefully exhaustively searched the space of plans in both cases? Thanks a lot! -- Markus
On Wed, Apr 30, 2025 at 3:51 PM Markus Demleitner <msdemlei@ari.uni-heidelberg.de> wrote:
Dear List,
I know how tedious mails with a subject of the type "I don't understand
what the planner does" are, but on this one I'm really stumped.
Regrettably, the situation is also a bit complex. Hopefully, someone
will bear with me.
So, in a PostgreSQL 15.12 I have a view over a single table with ~20
columns (the only relevant columns here are the ones that somehow
contain "pub[lisher]_did", the others are just there for context; I'm
going to call these "pubdids" from here on in the prose):
[snip]
Is there anything that would explain that behaviour given I've switched
off the genetic optimiser and postgres has hopefully exhaustively
searched the space of plans in both cases?
Are the tables regularly analyzed and vacuumed? (The default autovacuum analyze threshold of 20% is pretty high.)
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
Markus Demleitner <msdemlei@ari.uni-heidelberg.de> writes: > So, when the SELECT statement on dfbsspec.ssa stands along in the view > definition, Postgres does the right thing; when the exact same query > stands in a UNION ALL with other tables, Postgres doesn't use the > index. Hu? It's hard to be sure when you've shown us no table definitions and only fragments of the view definitions. But I suspect what is happening here is that the view's UNIONs are causing a data type coercion of raw_spectra.pub_did before it gets to the top level of the view output. That might interfere with the planner's ability to see that the outer query's join operator is compatible with the table's index. regards, tom lane
On Wed, 2025-04-30 at 17:02 -0400, Tom Lane wrote: > Markus Demleitner <msdemlei@ari.uni-heidelberg.de> writes: > > So, when the SELECT statement on dfbsspec.ssa stands along in the view > > definition, Postgres does the right thing; when the exact same query > > stands in a UNION ALL with other tables, Postgres doesn't use the > > index. Hu? > > It's hard to be sure when you've shown us no table definitions and > only fragments of the view definitions. But I suspect what is > happening here is that the view's UNIONs are causing a data type > coercion of raw_spectra.pub_did before it gets to the top level > of the view output. That might interfere with the planner's ability > to see that the outer query's join operator is compatible with > the table's index. For a more detailed description of that problem, see https://www.cybertec-postgresql.com/en/union-all-data-types-performance/ Yours, Laurenz Albe
On 2025-04-30 17:16:47 +0200, Markus Demleitner wrote: > There is another view made up of about 20 tables, looking somewhat > like this: [...] > The first view, ivoa.obs_radio, is just a few hundred records, > dfbsspec.raw_spectra is about 23 Megarows, [...] > -> Parallel Seq Scan on raw_spectra (cost=0.00..2626995.66 rows=5803266 width=756)(actual time=0.137..6841.379 rows=4642657 loops=5) [...] > > My problem is: I can't seem to figure out why Postgres chooses to ignore > the pubdid index on raw_spectra.pub_did and instead does the > time-consuming seqscan. It estimates that it has to read 5803266 of those 23000000 rows. That's 25 %! I'm not surprised that it thinks just reading the whole table is faster than doing almost 6 million index lookups (The actual count is 4642657, so that estimate wasn't totally off). > Trying to investigate more closely, I wanted to simplify the > situation and created a view like ivoa.obscore but only having the > evil table in it: [...] > -> Nested Loop (cost=0.56..4871.60 rows=561 width=0) (actual time=2.478..2.479 rows=0 loops=1) > -> Seq Scan on main (cost=0.00..52.61 rows=561 width=48) (actual time=0.011..0.317 rows=561 loops=1) > -> Index Scan using raw_spectra_pub_did on raw_spectra (cost=0.56..8.58 rows=1 width=66) (actual time=0.003..0.003rows=0 loops=561) > Index Cond: (pub_did = main.obs_publisher_did) Here you select only 561 rows. That's just a tiny fraction of the whole table, so the optimizer estimates that doing a few hundred index lookups is faster than reading the whole table. > So, when the SELECT statement on dfbsspec.ssa stands along in the view > definition, Postgres does the right thing; when the exact same query > stands in a UNION ALL with other tables, Postgres doesn't use the > index. Hu? It is obviously not the exact same query if one of them need to read 10000 times as many rows. hjp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
"Peter J. Holzer" <hjp-pgsql@hjp.at> writes: > On 2025-04-30 17:16:47 +0200, Markus Demleitner wrote: >> dfbsspec.raw_spectra is about 23 Megarows, >> -> Parallel Seq Scan on raw_spectra (cost=0.00..2626995.66 rows=5803266 width=756) (actual time=0.137..6841.379 rows=4642657loops=5) > It estimates that it has to read 5803266 of those 23000000 rows. No, you're misreading that (I admit it's confusing). The rows report is the average per parallel worker, and the loops count indicates we had 5 workers. So actually this parallel seqscan emitted 4642657*5 = 23213285 rows, or the whole table, which is what should be expected given it has no filter condition and no LIMIT. (I am wondering why the estimate is only 5803266 rows, because I don't think that number is scaled for the number of workers...) >> -> Nested Loop (cost=0.56..4871.60 rows=561 width=0) (actual time=2.478..2.479 rows=0 loops=1) >> -> Seq Scan on main (cost=0.00..52.61 rows=561 width=48) (actual time=0.011..0.317 rows=561 loops=1) >> -> Index Scan using raw_spectra_pub_did on raw_spectra (cost=0.56..8.58 rows=1 width=66) (actual time=0.003..0.003rows=0 loops=561) >> Index Cond: (pub_did = main.obs_publisher_did) > Here you select only 561 rows. That's just a tiny fraction of the whole > table, so the optimizer estimates that doing a few hundred index lookups > is faster than reading the whole table. The point here is that the chosen plan shape allows pushing the join qual "raw_spectra.pub_did = main.obs_publisher_did" down to be an index condition, which is exactly what we have to do if we want to avoid reading all of raw_spectra. What Markus is complaining about is that that fails to happen if there's a UNION ALL in the way. Postgres is capable of doing that in other cases, so it's a fair question. regards, tom lane