Home > mailing lists

Re: Sequential scan instead of index scan - Mailing list pgsql-performance

From	Ioannis Anagnostopoulos
Subject	Re: Sequential scan instead of index scan
Date	August 6, 2012 19:04:36
Msg-id	50203F5A.3040805@anatec.com Whole thread Raw
In response to	Re: Sequential scan instead of index scan (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Sequential scan instead of index scan
List	pgsql-performance

Tree view

On 06/08/2012 16:34, Tom Lane wrote:

Ioannis Anagnostopoulos <ioannis@anatec.com> writes:

        I think this is a pretty good plan and quite quick given the       size of the table (88Million rows at present). However in real       life the parameter where I search for msg_id is not an array of       3 ids but of 300.000 or more. It is then that the query forgets       the plan and goes to sequential scan. Is there any way around?

If you've got that many, any(array[....]) is a bad choice.  I'd try
putting the IDs into a VALUES(...) list, or even a temporary table, and
then writing the query as a join.  It is a serious mistake to think that
a seqscan is evil when you're dealing with joining that many rows, btw.
What you should probably be looking for is a hash join plan.
		regards, tom lane

Ok in that scenario we are back to square one. Following your suggestion my resultant query is this (the temporary table is tmp_tbl_messages)
select
            ship_pos_messages.*
       from
           feed_all_y2012m08.ship_pos_messages join tmp_tbl_messages on (ship_pos_messages.msg_id = tmp_tbl_messages.msg_id)
       where
           extract('day' from msg_date_rec) = 1
           AND date_trunc('day', msg_date_rec) = '2012-08-01';

which gives us the following explain analyse:

"Merge Join (cost=1214220.48..3818359.46 rows=173574357 width=128) (actual time=465036.958..479089.731 rows=341190 loops=1)"
" Merge Cond: (feed_all_y2012m08.ship_pos_messages.msg_id = tmp_tbl_messages.msg_id)"
" -> Sort (cost=1178961.70..1179223.51 rows=104725 width=128) (actual time=464796.971..476579.208 rows=19512873 loops=1)"
"        Sort Key: feed_all_y2012m08.ship_pos_messages.msg_id"
"        Sort Method: external merge Disk: 1254048kB"
"        -> Append (cost=0.00..1170229.60 rows=104725 width=128) (actual time=0.033..438682.971 rows=19512883 loops=1)"
"              -> Seq Scan on ship_pos_messages (cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0 loops=1)"
"                    Filter: ((date_part('day'::text, msg_date_rec) = 1::double precision) AND (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone))"
"              -> Seq Scan on ship_a_pos_messages ship_pos_messages (cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0 loops=1)"
"                    Filter: ((date_part('day'::text, msg_date_rec) = 1::double precision) AND (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone))"
"              -> Index Scan using idx_ship_b_std_pos_messages_date_trunc on ship_b_std_pos_messages ship_pos_messages (cost=0.00..58657.09 rows=5269 width=128) (actual time=0.032..799.171 rows=986344 loops=1)"
"                    Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone)"
"                    Filter: (date_part('day'::text, msg_date_rec) = 1::double precision)"
"              -> Index Scan using idx_ship_b_ext_pos_messages_date_trunc on ship_b_ext_pos_messages ship_pos_messages (cost=0.00..1694.64 rows=141 width=128) (actual time=0.026..20.661 rows=26979 loops=1)"
"                    Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone)"
"                    Filter: (date_part('day'::text, msg_date_rec) = 1::double precision)"
"              -> Index Scan using idx_ship_a_pos_messages_wk0_date_trunc on ship_a_pos_messages_wk0 ship_pos_messages (cost=0.00..1109877.86 rows=99313 width=128) (actual time=0.029..435784.376 rows=18499560 loops=1)"
"                    Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone)"
"                    Filter: (date_part('day'::text, msg_date_rec) = 1::double precision)"
" -> Sort (cost=35258.79..36087.50 rows=331486 width=8) (actual time=239.908..307.576 rows=349984 loops=1)"
"        Sort Key: tmp_tbl_messages.msg_id"
"        Sort Method: quicksort Memory: 28694kB"
"        -> Seq Scan on tmp_tbl_messages (cost=0.00..4863.86 rows=331486 width=8) (actual time=0.047..55.227 rows=349984 loops=1)"
"Total runtime: 479336.869 ms"

Which is a Merge join and not a hash. Any ideas how to make it a hash join?

Kind Regards
Yiannis

pgsql-performance by date:

From: "Midge Brown"
Date: 06 August 2012, 15:37:04
Subject: Re: slow query, different plans

From: Tom Lane
Date: 07 August 2012, 00:13:49
Subject: Re: Sequential scan instead of index scan

Re: Sequential scan instead of index scan - Mailing list pgsql-performance

Previous

Next