Re: Query plan choice issue - Mailing list pgsql-general
From | Yaroslav Tykhiy |
---|---|
Subject | Re: Query plan choice issue |
Date | |
Msg-id | 5CA2290E-801A-4C7F-821F-805C831A8C9A@barnet.com.au Whole thread Raw |
In response to | Re: Query plan choice issue (Martin Gainty <mgainty@hotmail.com>) |
Responses |
Re: Query plan choice issue
|
List | pgsql-general |
Hi Martin, Thank you for your response! On 13/09/2010, at 10:49 AM, Martin Gainty wrote: > a cursory look of the plan details a FTS on dbmail_headername > invoked by the JOIN clause > JOIN dbmail_headername n ON v.headername_id=n.id > you would accelerate the seek appreciably by placing indexes on both > participating columns > v.headername_id > n.id Granted, there was no index on v.headername_id but creating one just slowed the query down, with a different plan: CREATE INDEX dbmail_headervalue_testing ON dbmail_headervalue (headername_id); EXPLAIN ANALYSE ... QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=222020.81..222020.81 rows=1 width=8) (actual time=28636.426..28636.426 rows=0 loops=1) Sort Key: m.message_idnr Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=1409.82..222020.80 rows=1 width=8) (actual time=28636.409..28636.409 rows=0 loops=1) -> Nested Loop (cost=1409.82..222012.27 rows=1 width=24) (actual time=28636.405..28636.405 rows=0 loops=1) -> Nested Loop (cost=1409.82..221959.94 rows=6 width=8) (actual time=28543.441..28624.750 rows=1 loops=1) -> Seq Scan on dbmail_headername n (cost=0.00..111.17 rows=1 width=8) (actual time=0.022..1.114 rows=1 loops=1) Filter: ((headername)::text ~~* 'MESSAGE- ID'::text) -> Bitmap Heap Scan on dbmail_headervalue v (cost=1409.82..221813.70 rows=2805 width=16) (actual time=28543.411..28623.623 rows=1 loops=1) Recheck Cond: (v.headername_id = n.id) Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<...@mail.gmail.com>%'::text) -> Bitmap Index Scan on dbmail_headervalue_testing (cost=0.00..1409.82 rows=75940 width=0) (actual time=17555.572..17555.572 rows=1877009 loops=1) Index Cond: (v.headername_id = n.id) -> Index Scan using dbmail_messages_physmessage_idx on dbmail_messages m (cost=0.00..8.71 rows=1 width=16) (actual time=11.646..11.646 rows=0 loops=1) Index Cond: (m.physmessage_id = v.physmessage_id) Filter: ((m.status = ANY ('{0,1}'::integer[])) AND (m.mailbox_idnr = 12345)) -> Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p (cost=0.00..8.52 rows=1 width=8) (never executed) Index Cond: (p.id = m.physmessage_id) Total runtime: 28636.517 ms (19 rows) > I also see a FTS on domain_headervalue invoked by the JOIN cluase > JOIN dbmail_headervalue v ON v.physmessage_id=p.id > place indexes on both columns > v.physmessage_id > p.id Both columns already indexed here: On public.dbmail_headervalue (alias v): "dbmail_headervalue_2" btree (physmessage_id) On public.dbmail_physmessage (alias p): "dbmail_physmessage_pkey" PRIMARY KEY, btree (id) Perhaps I should provide some data on the table sizes. dbmail_headervalue is the largest table with respect to its record count: 36 million records. dbmail_headername is small: 5640 records. dbmail_physmessage and dbmail_messages are of an average size: ~2 million records each. Sorry for my cluelessness on this issue. But Postgresql's ability to build a fast query plan for this query type at least occasionally is encouraging. :-) Yar > the join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id > uses indexed for both participants > > Martin > ______________________________________________ > Verzicht und Vertraulichkeitanmerkung > > Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene > Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede > unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. > Diese Nachricht dient lediglich dem Austausch von Informationen und > entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten > Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den > Inhalt uebernehmen. > > >> From: yar@barnet.com.au >> To: pgsql-general@postgresql.org >> Subject: [GENERAL] Query plan choice issue >> Date: Mon, 13 Sep 2010 09:36:35 +1000 >> >> Hi all, >> >> I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can >> be executed using either of two different query plans, one taking a >> few milliseconds and the other, tens of seconds. The work_mem setting >> doesn't seem to affect it -- tried to increase or decrease it by 2 or >> 4 times, but it didn't seem to favour the fast plan choice. Honestly, >> I have no idea what affects the plan choice, but I saw Postgresql >> change it at random. >> >> The query in question looks like this -- sorry, it's rather complex: >> >> SELECT message_idnr >> FROM dbmail_messages m >> JOIN dbmail_physmessage p ON m.physmessage_id=p.id >> JOIN dbmail_headervalue v ON v.physmessage_id=p.id >> JOIN dbmail_headername n ON v.headername_id=n.id >> WHERE mailbox_idnr = 12345 AND status IN (0,1) AND >> headername ILIKE 'MESSAGE-ID' AND SUBSTRING(headervalue,0,255) >> ILIKE '%<...@mail.gmail.com>%' >> ORDER BY message_idnr; >> >> It comes from DBMail. That said, I don't think DBMail can be blamed >> here because at least sometimes Postgresql is able to do the right >> thing with respect to that query. >> >> Here is the slow plan: >> QUERY >> PLAN >> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual >> time=20933.166..20933.168 rows=1 loops=1) >> Sort Key: m.message_idnr >> Sort Method: quicksort Memory: 25kB >> -> Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual >> time=3327.658..20933.122 rows=1 loops=1) >> -> Nested Loop (cost=0.00..1234601.59 rows=1 width=24) >> (actual time=3327.599..20933.056 rows=1 loops=1) >> -> Nested Loop (cost=0.00..1234549.74 rows=6 width=8) >> (actual time=3327.503..20932.944 rows=1 loops=1) >> Join Filter: (v.headername_id = n.id) >> -> Seq Scan on dbmail_headername n >> (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1 >> loops=1) >> Filter: ((headername)::text ~~* 'MESSAGE- >> ID'::text) >> -> Seq Scan on dbmail_headervalue v >> (cost=0.00..1234407.96 rows=2666 width=16) (actual >> time=3327.465..20931.942 rows=1 loops=1) >> Filter: ("substring"(v.headervalue, 0, >> 255) ~~* '%<...@mail.gmail.com>%'::text) >> -> Index Scan using dbmail_messages_physmessage_idx on >> dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual >> time=0.088..0.100 rows=1 loops=1) >> Index Cond: (m.physmessage_id = v.physmessage_id) >> Filter: ((m.status = ANY ('{0,1}'::integer[])) >> AND (m.mailbox_idnr = 12345)) >> -> Index Scan using dbmail_physmessage_pkey on >> dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual >> time=0.048..0.050 rows=1 loops=1) >> Index Cond: (p.id = m.physmessage_id) >> Total runtime: 20933.241 ms >> (17 rows) >> >> And here is the fast plan: >> QUERY >> PLAN >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual >> time=14.163..14.165 rows=1 loops=1) >> Sort Key: m.message_idnr >> Sort Method: quicksort Memory: 25kB >> -> Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual >> time=4.272..14.152 rows=1 loops=1) >> -> Nested Loop (cost=0.00..1175276.41 rows=1 width=24) >> (actual time=4.261..14.135 rows=1 loops=1) >> -> Nested Loop (cost=0.00..1175268.46 rows=1 width=32) >> (actual time=4.249..14.117 rows=1 loops=1) >> -> Index Scan using dbmail_messages_mailbox_idx >> on dbmail_messages m (cost=0.00..4153.35 rows=786 width=16) (actual >> time=0.043..2.810 rows=358 loops=1) >> Index Cond: (mailbox_idnr = 12345) >> Filter: (status = ANY ('{0,1}'::integer[])) >> -> Index Scan using dbmail_headervalue_2 on >> dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual >> time=0.028..0.029 rows=0 loops=358) >> Index Cond: (v.physmessage_id = >> m.physmessage_id) >> Filter: ("substring"(v.headervalue, 0, >> 255) ~~* '%<...@mail.gmail.com>%'::text) >> -> Index Scan using dbmail_headername_pkey on >> dbmail_headername n (cost=0.00..7.94 rows=1 width=8) (actual >> time=0.007..0.008 rows=1 loops=1) >> Index Cond: (n.id = v.headername_id) >> Filter: ((n.headername)::text ~~* 'MESSAGE- >> ID'::text) >> -> Index Scan using dbmail_physmessage_pkey on >> dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual >> time=0.006..0.007 rows=1 loops=1) >> Index Cond: (p.id = m.physmessage_id) >> Total runtime: 14.231 ms >> (18 rows) >> >> Do you think this query plan choice can be affected by >> postgresql.conf >> settings? Or shall I file a bug report? >> >> I'll be happy to provide the table and index structure details if >> needed. >> >> Thanks! >> >> Yar >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general >
pgsql-general by date: