Re: Sequential scan instead of index scan - Mailing list pgsql-performance
From | Ioannis Anagnostopoulos |
---|---|
Subject | Re: Sequential scan instead of index scan |
Date | |
Msg-id | 50203F5A.3040805@anatec.com Whole thread Raw |
In response to | Re: Sequential scan instead of index scan (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Sequential scan instead of index scan
|
List | pgsql-performance |
On 06/08/2012 16:34, Tom Lane wrote:
Ok in that scenario we are back to square one. Following your suggestion my resultant query is this (the temporary table is tmp_tbl_messages)Ioannis Anagnostopoulos <ioannis@anatec.com> writes:I think this is a pretty good plan and quite quick given the size of the table (88Million rows at present). However in real life the parameter where I search for msg_id is not an array of 3 ids but of 300.000 or more. It is then that the query forgets the plan and goes to sequential scan. Is there any way around?If you've got that many, any(array[....]) is a bad choice. I'd try putting the IDs into a VALUES(...) list, or even a temporary table, and then writing the query as a join. It is a serious mistake to think that a seqscan is evil when you're dealing with joining that many rows, btw. What you should probably be looking for is a hash join plan. regards, tom lane
select
ship_pos_messages.*
from
feed_all_y2012m08.ship_pos_messages join tmp_tbl_messages on (ship_pos_messages.msg_id = tmp_tbl_messages.msg_id)
where
extract('day' from msg_date_rec) = 1
AND date_trunc('day', msg_date_rec) = '2012-08-01';
which gives us the following explain analyse:
"Merge Join (cost=1214220.48..3818359.46 rows=173574357 width=128) (actual time=465036.958..479089.731 rows=341190 loops=1)"
" Merge Cond: (feed_all_y2012m08.ship_pos_messages.msg_id = tmp_tbl_messages.msg_id)"
" -> Sort (cost=1178961.70..1179223.51 rows=104725 width=128) (actual time=464796.971..476579.208 rows=19512873 loops=1)"
" Sort Key: feed_all_y2012m08.ship_pos_messages.msg_id"
" Sort Method: external merge Disk: 1254048kB"
" -> Append (cost=0.00..1170229.60 rows=104725 width=128) (actual time=0.033..438682.971 rows=19512883 loops=1)"
" -> Seq Scan on ship_pos_messages (cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0 loops=1)"
" Filter: ((date_part('day'::text, msg_date_rec) = 1::double precision) AND (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone))"
" -> Seq Scan on ship_a_pos_messages ship_pos_messages (cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0 loops=1)"
" Filter: ((date_part('day'::text, msg_date_rec) = 1::double precision) AND (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone))"
" -> Index Scan using idx_ship_b_std_pos_messages_date_trunc on ship_b_std_pos_messages ship_pos_messages (cost=0.00..58657.09 rows=5269 width=128) (actual time=0.032..799.171 rows=986344 loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) = 1::double precision)"
" -> Index Scan using idx_ship_b_ext_pos_messages_date_trunc on ship_b_ext_pos_messages ship_pos_messages (cost=0.00..1694.64 rows=141 width=128) (actual time=0.026..20.661 rows=26979 loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) = 1::double precision)"
" -> Index Scan using idx_ship_a_pos_messages_wk0_date_trunc on ship_a_pos_messages_wk0 ship_pos_messages (cost=0.00..1109877.86 rows=99313 width=128) (actual time=0.029..435784.376 rows=18499560 loops=1)"
" Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone)"
" Filter: (date_part('day'::text, msg_date_rec) = 1::double precision)"
" -> Sort (cost=35258.79..36087.50 rows=331486 width=8) (actual time=239.908..307.576 rows=349984 loops=1)"
" Sort Key: tmp_tbl_messages.msg_id"
" Sort Method: quicksort Memory: 28694kB"
" -> Seq Scan on tmp_tbl_messages (cost=0.00..4863.86 rows=331486 width=8) (actual time=0.047..55.227 rows=349984 loops=1)"
"Total runtime: 479336.869 ms"
Which is a Merge join and not a hash. Any ideas how to make it a hash join?
Kind Regards
Yiannis
pgsql-performance by date: