Re: Optimize date query for large child tables: GiST or GIN? - Mailing list pgsql-performance

From David Jarvis
Subject Re: Optimize date query for large child tables: GiST or GIN?
Msg-id AANLkTimjPLmnSZDe3nNBVGjJ0CGIQd_Ic8IxHb4fUG9v@mail.gmail.com
In response to Re: Optimize date query for large child tables: GiST or GIN?  (Matthew Wakeling <matthew@flymine.org>)
List pgsql-performance
Hi,

(An EXPLAIN ANALYSE would be better here). Look at the expected number of stations

"Nested Loop  (cost=0.00..994.94 rows=4046 width=4) (actual time=0.053..41.173 rows=78 loops=1)"
"  Join Filter: ((6371.009::double precision * sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::double precision), 2::double precision) + (cos((radians(((c.latitude_decimal + s.latitude_decimal))::double precision) / 2::double precision)) * pow(radians(((c.longitude_decimal - s.longitude_decimal))::double precision), 2::double precision))))) <= 25::double precision)"
"  ->  Index Scan using city_pkey1 on city c  (cost=0.00..6.27 rows=1 width=16) (actual time=0.014..0.016 rows=1 loops=1)"
"        Index Cond: (id = 5182)"
"  ->  Seq Scan on station s  (cost=0.00..321.08 rows=12138 width=20) (actual time=0.007..5.256 rows=12139 loops=1)"
"        Filter: ((s.elevation >= 0) AND (s.elevation <= 3000))"
"Total runtime: 41.235 ms"

expects to have to touch a large proportion of the measurement table, therefore it thinks that it will be fastest to do a seq scan. In actual fact, for 78 stations, the index would be faster, but for 4046 it wouldn't.

This is rather unexpected. I'd have figured it would use the actual number.
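
The planner has to pick a plan before the query runs, so at that point it only has the statistics-based estimate (rows=4046); the actual count (rows=78) is known only after execution, which is why it shows up solely in EXPLAIN ANALYZE output. As a rough sketch of how to spot such mismatches, the Python below pulls the estimated and actual row counts out of the plan lines quoted above. The names `NODE_RE` and `row_estimates` are made up for this illustration, and the regular expression assumes the text plan format shown in this message.

```python
import re

# Matches the estimated and actual row counts on one EXPLAIN ANALYZE node line
# (assumes the text plan format shown in this message).
NODE_RE = re.compile(
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(?P<est>\d+) width=\d+\)"
    r" \(actual time=[\d.]+\.\.[\d.]+ rows=(?P<act>\d+) loops=\d+\)"
)


def row_estimates(plan_text):
    """Return (estimated_rows, actual_rows) for each plan node, in order."""
    return [(int(m["est"]), int(m["act"])) for m in NODE_RE.finditer(plan_text)]


# Node lines copied from the plan above (filter lines omitted).
plan = """\
Nested Loop  (cost=0.00..994.94 rows=4046 width=4) (actual time=0.053..41.173 rows=78 loops=1)
  ->  Index Scan using city_pkey1 on city c  (cost=0.00..6.27 rows=1 width=16) (actual time=0.014..0.016 rows=1 loops=1)
  ->  Seq Scan on station s  (cost=0.00..321.08 rows=12138 width=20) (actual time=0.007..5.256 rows=12139 loops=1)
"""

for est, act in row_estimates(plan):
    # A ratio far from 1 marks a node where the planner's estimate was off.
    print(f"estimated={est:>6}  actual={act:>6}  ratio={est / act:.1f}")
```

Only the Nested Loop node is badly off here; the two scans underneath it are estimated almost exactly, so the misestimate comes from the selectivity guessed for the distance join filter.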
 
If you will be querying by season quite regularly, had you considered partitioning by season?

I have no idea what the "regular" queries will be. The purpose of the system is to open the data up to the public using a simple user interface so that they can generate their own custom reports. That user interface allows people to pick year intervals, day ranges, elevations, categories (temperature, precipitation, snow depth, etc.), and lat/long perimeter coordinates (encompassing any number of stations) or a city and radius.
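
For reference, the "city and radius" case is what the Join Filter in the plan above evaluates. The following Python is a literal transcription of that filter expression, offered only as a sketch: the function names and the sample coordinates are hypothetical, and the 25 is presumably kilometres, since 6371.009 is the Earth's mean radius in km.

```python
import math

EARTH_RADIUS_KM = 6371.009  # constant taken from the Join Filter


def filter_distance_km(lat_c, lon_c, lat_s, lon_s):
    """Distance as computed by the Join Filter; all inputs in decimal degrees."""
    dlat = math.radians(lat_c - lat_s)
    dlon = math.radians(lon_c - lon_s)
    mean_lat = math.radians(lat_c + lat_s) / 2
    # Same term order as the plan: pow(dlat, 2) + cos(mean_lat) * pow(dlon, 2)
    return EARTH_RADIUS_KM * math.sqrt(dlat ** 2 + math.cos(mean_lat) * dlon ** 2)


def within_radius(lat_c, lon_c, lat_s, lon_s, radius_km=25.0):
    """True when the station passes the filter's '<= 25' test for the city."""
    return filter_distance_km(lat_c, lon_c, lat_s, lon_s) <= radius_km


# Hypothetical coordinates: a station 0.1 degrees north of a city, and one 1 degree north.
print(within_radius(49.0, -123.0, 49.1, -123.0))
print(within_radius(49.0, -123.0, 50.0, -123.0))
```

Because the expression combines columns from both tables, the planner has no per-column statistic that maps onto it directly, which is consistent with the rough rows=4046 guess in the plan.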

Dave
