Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance - Mailing list pgsql-performance
From | Mario Splivalo |
---|---|
Subject | Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance |
Date | |
Msg-id | 49D9F39F.40106@megafon.hr Whole thread Raw |
In response to | Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance (Scott Marlowe <scott.marlowe@gmail.com>) |
Responses |
Re: Forcing seq_scan off for large table joined with tiny
table yeilds improved performance
|
List | pgsql-performance |
Scott Marlowe wrote: > > It's not really solved, it's just a happy coincidence that the current > plan runs well. In order to keep the query planner making good > choices you need to increase stats target for the field in the index > above. The easiest way to do so is to do this: > > alter database mydb set default_statistics_target=100; > > and run analyze again: > > analyze; So, i removed the index on field_name, set default_default_statistics_target to 100, analyzed, and the results are the same: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Hash Join (cost=1.04..58676.31 rows=218048 width=68) (actual time=0.067..12268.394 rows=6 loops=1) Hash Cond: ((u.field_name)::text = (t.key)::text) -> Seq Scan on photo_info_data u (cost=0.00..47500.30 rows=2398530 width=50) (actual time=0.013..6426.611 rows=2398446 loops=1) -> Hash (cost=1.02..1.02 rows=2 width=18) (actual time=0.015..0.015 rows=2 loops=1) -> Seq Scan on t_query_data t (cost=0.00..1.02 rows=2 width=18) (actual time=0.002..0.006 rows=2 loops=1) Total runtime: 12268.459 ms (6 rows) I even changed default_statistics_target to 1000: ------------------------------------------------------------------------------------------------------------------------------------ Hash Join (cost=1.04..58580.29 rows=208561 width=67) (actual time=0.054..12434.283 rows=6 loops=1) Hash Cond: ((u.field_name)::text = (t.key)::text) -> Seq Scan on photo_info_data u (cost=0.00..47499.46 rows=2398446 width=49) (actual time=0.012..6129.923 rows=2398446 loops=1) -> Hash (cost=1.02..1.02 rows=2 width=18) (actual time=0.015..0.015 rows=2 loops=1) -> Seq Scan on t_query_data t (cost=0.00..1.02 rows=2 width=18) (actual time=0.002..0.004 rows=2 loops=1) Total runtime: 12434.338 ms (6 rows) Even when I run this query, I get sequential scan: explain analyze select * from photo_info_data where field_name = 'f-spot' or field_name = 'shutter'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Seq Scan on photo_info_data (cost=0.00..59491.69 rows=1705 width=49) (actual time=0.018..1535.963 rows=6 loops=1) Filter: (((field_name)::text = 'f-spot'::text) OR ((field_name)::text = 'shutter'::text)) Total runtime: 1536.010 ms (3 rows) These are the representations of te values 'f-spot' and 'shutter' for the field field_name in photo_info_data table: xmltest=# select field_name, count(*) from user_info_data where field_name in ('visina', 'spol') group by field_name; field_name | count ------------+------- 'f-spot' | 3 'shutter' | 3 (2 rows) Maybe my test-data is poor? As I've mentioned, photo_info_data has little over 2300000 rows. And this is complete 'distribution' of the data: xmltest=# select field_name, count(*) from user_info_data group by field_name order by count(*) desc; field_name | count ----------------+-------- field_Xx1 | 350000 field_Xx2 | 332447 field_Xx3 | 297414 field_Xx4 | 262394 field_Xx5 | 227396 field_Xx6 | 192547 field_Xx7 | 157612 field_Xx8 | 122543 field_Xx9 | 87442 field_Xx10 | 52296 field_1 | 50000 field_2 | 47389 field_3 | 42412 field_4 | 37390 field_5 | 32366 field_6 | 27238 field_7 | 22360 field_Xx11 | 17589 field_8 | 17412 field_9 | 12383 field_10 | 7386 field_11 | 2410 f-spot | 3 shutter | 3 focal | 3 flash | 3 m_city | 3 person | 3 iso | 2 (29 rows) No matter what field_name value I enter in WHERE condition, planner chooses sequential scan. Only when I add seperate index on field_name, planner chooes index scan or bitmap index scan. Mike
pgsql-performance by date: