Home > mailing lists
Re: Query plan choice issue - Mailing list pgsql-general

From	Yaroslav Tykhiy
Subject	Re: Query plan choice issue
Date	September 13, 2010 06:07:15
Msg-id	5CA2290E-801A-4C7F-821F-805C831A8C9A@barnet.com.au Whole thread Raw
In response to	Re: Query plan choice issue (Martin Gainty <mgainty@hotmail.com>)
Responses	Re: Query plan choice issue
List	pgsql-general
Tree view
Hi Martin,

Thank you for your response!

On 13/09/2010, at 10:49 AM, Martin Gainty wrote:

> a cursory look of the plan details a FTS on dbmail_headername
> invoked by the JOIN clause
> JOIN dbmail_headername n ON v.headername_id=n.id
> you would accelerate the seek appreciably by placing indexes on both
> participating columns
> v.headername_id
> n.id

Granted, there was no index on v.headername_id but creating one just
slowed the query down, with a different plan:

CREATE INDEX dbmail_headervalue_testing ON dbmail_headervalue
(headername_id);
EXPLAIN ANALYSE ...
                                                                                    QUERY
  PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=222020.81..222020.81 rows=1 width=8) (actual
time=28636.426..28636.426 rows=0 loops=1)
    Sort Key: m.message_idnr
    Sort Method:  quicksort  Memory: 25kB
    ->  Nested Loop  (cost=1409.82..222020.80 rows=1 width=8) (actual
time=28636.409..28636.409 rows=0 loops=1)
          ->  Nested Loop  (cost=1409.82..222012.27 rows=1 width=24)
(actual time=28636.405..28636.405 rows=0 loops=1)
                ->  Nested Loop  (cost=1409.82..221959.94 rows=6
width=8) (actual time=28543.441..28624.750 rows=1 loops=1)
                      ->  Seq Scan on dbmail_headername n
(cost=0.00..111.17 rows=1 width=8) (actual time=0.022..1.114 rows=1
loops=1)
                            Filter: ((headername)::text ~~* 'MESSAGE-
ID'::text)
                      ->  Bitmap Heap Scan on dbmail_headervalue v
(cost=1409.82..221813.70 rows=2805 width=16) (actual
time=28543.411..28623.623 rows=1 loops=1)
                            Recheck Cond: (v.headername_id = n.id)
                            Filter: ("substring"(v.headervalue, 0,
255) ~~* '%<...@mail.gmail.com>%'::text)
                            ->  Bitmap Index Scan on
dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0)
(actual time=17555.572..17555.572 rows=1877009 loops=1)
                                  Index Cond: (v.headername_id = n.id)
                ->  Index Scan using dbmail_messages_physmessage_idx
on dbmail_messages m  (cost=0.00..8.71 rows=1 width=16) (actual
time=11.646..11.646 rows=0 loops=1)
                      Index Cond: (m.physmessage_id = v.physmessage_id)
                      Filter: ((m.status = ANY ('{0,1}'::integer[]))
AND (m.mailbox_idnr = 12345))
          ->  Index Scan using dbmail_physmessage_pkey on
dbmail_physmessage p  (cost=0.00..8.52 rows=1 width=8) (never executed)
                Index Cond: (p.id = m.physmessage_id)
  Total runtime: 28636.517 ms
(19 rows)

> I also see a FTS on domain_headervalue invoked by the JOIN cluase
> JOIN dbmail_headervalue v ON v.physmessage_id=p.id
> place indexes on both columns
> v.physmessage_id
> p.id

Both columns already indexed here:

On public.dbmail_headervalue (alias v):
     "dbmail_headervalue_2" btree (physmessage_id)

On public.dbmail_physmessage (alias p):
     "dbmail_physmessage_pkey" PRIMARY KEY, btree (id)

Perhaps I should provide some data on the table sizes.

dbmail_headervalue is the largest table with respect to its record
count: 36 million records.
dbmail_headername is small: 5640 records.
dbmail_physmessage and dbmail_messages are of an average size: ~2
million records each.

Sorry for my cluelessness on this issue.  But Postgresql's ability to
build a fast query plan for this query type at least occasionally is
encouraging. :-)

Yar

> the join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id
> uses indexed for both participants
>
> Martin
> ______________________________________________
> Verzicht und Vertraulichkeitanmerkung
>
> Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene
> Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede
> unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig.
> Diese Nachricht dient lediglich dem Austausch von Informationen und
> entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten
> Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den
> Inhalt uebernehmen.
>
>
>> From: yar@barnet.com.au
>> To: pgsql-general@postgresql.org
>> Subject: [GENERAL] Query plan choice issue
>> Date: Mon, 13 Sep 2010 09:36:35 +1000
>>
>> Hi all,
>>
>> I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can
>> be executed using either of two different query plans, one taking a
>> few milliseconds and the other, tens of seconds. The work_mem setting
>> doesn't seem to affect it -- tried to increase or decrease it by 2 or
>> 4 times, but it didn't seem to favour the fast plan choice. Honestly,
>> I have no idea what affects the plan choice, but I saw Postgresql
>> change it at random.
>>
>> The query in question looks like this -- sorry, it's rather complex:
>>
>> SELECT message_idnr
>> FROM dbmail_messages m
>> JOIN dbmail_physmessage p ON m.physmessage_id=p.id
>> JOIN dbmail_headervalue v ON v.physmessage_id=p.id
>> JOIN dbmail_headername n ON v.headername_id=n.id
>> WHERE mailbox_idnr = 12345 AND status IN (0,1) AND
>> headername ILIKE 'MESSAGE-ID' AND SUBSTRING(headervalue,0,255)
>> ILIKE '%<...@mail.gmail.com>%'
>> ORDER BY message_idnr;
>>
>> It comes from DBMail. That said, I don't think DBMail can be blamed
>> here because at least sometimes Postgresql is able to do the right
>> thing with respect to that query.
>>
>> Here is the slow plan:
>> QUERY
>> PLAN
>>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual
>> time=20933.166..20933.168 rows=1 loops=1)
>> Sort Key: m.message_idnr
>> Sort Method: quicksort Memory: 25kB
>> -> Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual
>> time=3327.658..20933.122 rows=1 loops=1)
>> -> Nested Loop (cost=0.00..1234601.59 rows=1 width=24)
>> (actual time=3327.599..20933.056 rows=1 loops=1)
>> -> Nested Loop (cost=0.00..1234549.74 rows=6 width=8)
>> (actual time=3327.503..20932.944 rows=1 loops=1)
>> Join Filter: (v.headername_id = n.id)
>> -> Seq Scan on dbmail_headername n
>> (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1
>> loops=1)
>> Filter: ((headername)::text ~~* 'MESSAGE-
>> ID'::text)
>> -> Seq Scan on dbmail_headervalue v
>> (cost=0.00..1234407.96 rows=2666 width=16) (actual
>> time=3327.465..20931.942 rows=1 loops=1)
>> Filter: ("substring"(v.headervalue, 0,
>> 255) ~~* '%<...@mail.gmail.com>%'::text)
>> -> Index Scan using dbmail_messages_physmessage_idx on
>> dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual
>> time=0.088..0.100 rows=1 loops=1)
>> Index Cond: (m.physmessage_id = v.physmessage_id)
>> Filter: ((m.status = ANY ('{0,1}'::integer[]))
>> AND (m.mailbox_idnr = 12345))
>> -> Index Scan using dbmail_physmessage_pkey on
>> dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual
>> time=0.048..0.050 rows=1 loops=1)
>> Index Cond: (p.id = m.physmessage_id)
>> Total runtime: 20933.241 ms
>> (17 rows)
>>
>> And here is the fast plan:
>> QUERY
>> PLAN
>>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual
>> time=14.163..14.165 rows=1 loops=1)
>> Sort Key: m.message_idnr
>> Sort Method: quicksort Memory: 25kB
>> -> Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual
>> time=4.272..14.152 rows=1 loops=1)
>> -> Nested Loop (cost=0.00..1175276.41 rows=1 width=24)
>> (actual time=4.261..14.135 rows=1 loops=1)
>> -> Nested Loop (cost=0.00..1175268.46 rows=1 width=32)
>> (actual time=4.249..14.117 rows=1 loops=1)
>> -> Index Scan using dbmail_messages_mailbox_idx
>> on dbmail_messages m (cost=0.00..4153.35 rows=786 width=16) (actual
>> time=0.043..2.810 rows=358 loops=1)
>> Index Cond: (mailbox_idnr = 12345)
>> Filter: (status = ANY ('{0,1}'::integer[]))
>> -> Index Scan using dbmail_headervalue_2 on
>> dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual
>> time=0.028..0.029 rows=0 loops=358)
>> Index Cond: (v.physmessage_id =
>> m.physmessage_id)
>> Filter: ("substring"(v.headervalue, 0,
>> 255) ~~* '%<...@mail.gmail.com>%'::text)
>> -> Index Scan using dbmail_headername_pkey on
>> dbmail_headername n (cost=0.00..7.94 rows=1 width=8) (actual
>> time=0.007..0.008 rows=1 loops=1)
>> Index Cond: (n.id = v.headername_id)
>> Filter: ((n.headername)::text ~~* 'MESSAGE-
>> ID'::text)
>> -> Index Scan using dbmail_physmessage_pkey on
>> dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual
>> time=0.006..0.007 rows=1 loops=1)
>> Index Cond: (p.id = m.physmessage_id)
>> Total runtime: 14.231 ms
>> (18 rows)
>>
>> Do you think this query plan choice can be affected by
>> postgresql.conf
>> settings? Or shall I file a bug report?
>>
>> I'll be happy to provide the table and index structure details if
>> needed.
>>
>> Thanks!
>>
>> Yar
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
pgsql-general by date:
From: Scott Marlowe
Date: 13 September 2010, 04:58:20
Subject: Re: Incrementally Updated Backups
From: Mikko Partio
Date: 13 September 2010, 06:25:20
Subject: Re: Incrementally Updated Backups
Re: Query plan choice issue - Mailing list pgsql-general

Previous

Next