IN not handled very well? - Mailing list pgsql-performance
From | Ben |
---|---|
Subject | IN not handled very well? |
Date | |
Msg-id | 37536ADC-0D18-485C-9F77-B875D69FBCAA@silentmedia.com Whole thread Raw |
Responses |
Re: IN not handled very well?
|
List | pgsql-performance |
I've got this query with an IN clause: select count(*),public.album.gid,public.album.name,public.album.id from public.album,public.albumjoin,public.puid,public.puidjoin where albumjoin.album = public.album.id and public.puidjoin.track = public.albumjoin.track and public.puid.id = public.puidjoin.puid and public.puid.puid IN (select umdb.puid.name from umdb.puid,umdb.node where umdb.puid.id = umdb.node.puid and umdb.node.dir=5886) group by gid,name,public.album.id having count(*) >= 6 order by count(*) desc; It gives me a rather expensive plan: Sort (cost=35729.07..35729.75 rows=272 width=69) Sort Key: count(*) -> HashAggregate (cost=35713.31..35718.07 rows=272 width=69) Filter: (count(*) >= 6) -> Nested Loop (cost=51.67..35709.91 rows=272 width=69) -> Nested Loop (cost=51.67..34216.30 rows=272 width=4) -> Nested Loop (cost=51.67..33338.04 rows=272 width=4) -> Hash IN Join (cost=51.67..31794.72 rows=218 width=4) Hash Cond: (("outer".puid)::text = "inner".name) -> Seq Scan on puid (cost=0.00..23495.21 rows=1099421 width=44) -> Hash (cost=51.63..51.63 rows=15 width=40) -> Nested Loop (cost=0.00..51.63 rows=15 width=40) -> Index Scan using node_dir on node (cost=0.00..3.22 rows=16 width=4) Index Cond: (dir = 5886) -> Index Scan using puid_pkey on puid (cost=0.00..3.01 rows=1 width=44) Index Cond: (puid.id = "outer".puid) -> Index Scan using puidjoin_puidtrack on puidjoin (cost=0.00..7.05 rows=2 width=8) Index Cond: ("outer".id = puidjoin.puid) -> Index Scan using albumjoin_trackindex on albumjoin (cost=0.00..3.22 rows=1 width=8) Index Cond: ("outer".track = albumjoin.track) -> Index Scan using album_pkey on album (cost=0.00..5.48 rows=1 width=69) Index Cond: ("outer".album = album.id) If I'm reading this right, it looks like it's expensive because it's doing a sequential scan on public.puid.puid to satisfy the IN clause. (Although why it's doing that I'm not sure, given that there's a recently analyzed index on public.puid.puid.) Interestingly, if I replace that IN subselect with the 15 values it will return, my plan improves by two orders of magnitude: Sort (cost=235.53..235.56 rows=12 width=69) Sort Key: count(*) -> HashAggregate (cost=235.11..235.32 rows=12 width=69) Filter: (count(*) >= 6) -> Nested Loop (cost=20.03..234.96 rows=12 width=69) -> Nested Loop (cost=20.03..169.06 rows=12 width=4) -> Nested Loop (cost=20.03..130.32 rows=12 width=4) -> Bitmap Heap Scan on puid (cost=20.03..59.52 rows=10 width=4) Recheck Cond: ((puid = 'f68dcf86-992c-2e4a-21fb-2fc8c56edfeb'::bpchar) OR (puid = '7716dbcf-56ab-623b-ab33-3b2e67a0727c'::bpchar) OR (puid = '724d6a39-0d15-a296-2dd2-127c34f13809'::bpchar) OR (puid = '02f0cd9f-9fa5-abda-06cd-5dbb13826243'::bpchar) OR (puid = '165d5bea- b21f-9302-b991-0927f491787b'::bpchar) OR (puid = '4223dbc8-85af-a92e- b63d-72a726475e2c'::bpchar) OR (puid = '2d43ef9a- c7ee-2425-7fac-8b937cbed178'::bpchar) OR (puid = '9ff81c2f-04b7- cf5d-705f-7b944a5ae093'::bpchar) OR (puid = 'deaddddd-dfaf-18dd-6d4d- c483e8ba60f7'::bpchar) OR (puid = '20939b69- ff98-770a-1444-3b0e9892712f'::bpchar)) -> BitmapOr (cost=20.03..20.03 rows=10 width=0) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = 'f68dcf86-992c-2e4a-21fb-2fc8c56edfeb'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '7716dbcf-56ab-623b-ab33-3b2e67a0727c'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '724d6a39-0d15-a296-2dd2-127c34f13809'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '02f0cd9f-9fa5-abda-06cd-5dbb13826243'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '165d5bea-b21f-9302-b991-0927f491787b'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '4223dbc8-85af-a92e-b63d-72a726475e2c'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '2d43ef9a-c7ee-2425-7fac-8b937cbed178'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '9ff81c2f-04b7-cf5d-705f-7b944a5ae093'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = 'deaddddd-dfaf-18dd-6d4d-c483e8ba60f7'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '20939b69-ff98-770a-1444-3b0e9892712f'::bpchar) -> Index Scan using puidjoin_puidtrack on puidjoin (cost=0.00..7.05 rows=2 width=8) Index Cond: ("outer".id = puidjoin.puid) -> Index Scan using albumjoin_trackindex on albumjoin (cost=0.00..3.22 rows=1 width=8) Index Cond: ("outer".track = albumjoin.track) -> Index Scan using album_pkey on album (cost=0.00..5.48 rows=1 width=69) Index Cond: ("outer".album = album.id) I guess my question is: if postgres is (correctly) estimating that only 15 rows will come out of the subselect, and it knows it can choose a much better plan with bitmap index scans, should it be able to choose the bitmap plan over the sequential scan? Or should I run the subselect myself and then rewrite my query to push in the constant values?
pgsql-performance by date: