Nested loop join and date range query - Mailing list pgsql-performance
From: Ian Burrell
Subject: Nested loop join and date range query
Msg-id: d91f09cd0605021555t42905570s5df4dc005190733b@mail.gmail.com
Responses: Re: Nested loop join and date range query
List: pgsql-performance
We recently upgraded to PostgreSQL 8.1 from 7.4, and a few queries are having performance problems and running for very long times. The commonality seems to be that PostgreSQL 8.1 is choosing a nested loop join because it estimates there will be only a single row, when there are really thousands of rows, and the nested loop version takes much longer. Even with the bad plan, the test query below runs quickly; the real query is much more complicated, and we have had to kill it after it ran for 24 hours.

SELECT MAX(titles.name) AS title_name,
       MAX(providers.short_name) AS provider_short_name,
       SUM(x.xtns) AS xtns,
       SUM(x.rev) AS rev
  FROM xtns_by_mso_title_wk x
 INNER JOIN providers providers ON x.provider_no = providers.provider_no
 INNER JOIN titles titles ON x.title_no = titles.title_no
 WHERE x.mso_no = 50
   AND x.week BETWEEN '20060423 00:00:00' AND '20060423 00:00:00'
 GROUP BY x.title_no, x.provider_no

The EXPLAIN ANALYZE looks like:

GroupAggregate  (cost=11.63..11.67 rows=1 width=61) (actual time=1440.550..1467.602 rows=3459 loops=1)
  ->  Sort  (cost=11.63..11.64 rows=1 width=61) (actual time=1440.515..1446.634 rows=3934 loops=1)
        Sort Key: x.title_no, x.provider_no
        ->  Nested Loop  (cost=0.00..11.62 rows=1 width=61) (actual time=7.900..1422.686 rows=3934 loops=1)
              ->  Nested Loop  (cost=0.00..7.38 rows=1 width=49) (actual time=7.877..1373.392 rows=3934 loops=1)
                    ->  Index Scan using unq_xtns_by_mso_title_wk on xtns_by_mso_title_wk x  (cost=0.00..4.12 rows=1 width=26) (actual time=7.827..1297.681 rows=3934 loops=1)
                          Index Cond: ((week >= '2006-04-23 00:00:00'::timestamp without time zone) AND (week <= '2006-04-23 00:00:00'::timestamp without time zone) AND (mso_no = 50))
                    ->  Index Scan using pk_titles on titles  (cost=0.00..3.25 rows=1 width=27) (actual time=0.010..0.012 rows=1 loops=3934)
                          Index Cond: ("outer".title_no = titles.title_no)
              ->  Index Scan using pk_providers on providers  (cost=0.00..4.23 rows=1 width=16) (actual time=0.004..0.005 rows=1 loops=3934)
                    Index Cond: ("outer".provider_no = providers.provider_no)

If the query searches over multiple weeks (week BETWEEN '20060417 00:00:00' AND '20060423 00:00:00'), it estimates better and uses a hash join:

GroupAggregate  (cost=7848.20..7878.48 rows=156 width=61) (actual time=117.761..145.910 rows=3459 loops=1)
  ->  Sort  (cost=7848.20..7852.08 rows=1552 width=61) (actual time=117.735..123.823 rows=3934 loops=1)
        Sort Key: x.title_no, x.provider_no
        ->  Hash Join  (cost=5.95..7765.94 rows=1552 width=61) (actual time=6.539..102.825 rows=3934 loops=1)
              Hash Cond: ("outer".provider_no = "inner".provider_no)
              ->  Nested Loop  (cost=0.00..7736.71 rows=1552 width=49) (actual time=5.117..86.980 rows=3934 loops=1)
                    ->  Index Scan using idx_xtns_by_mso_ti_wk_wk_mso_t on xtns_by_mso_title_wk x  (cost=0.00..2677.04 rows=1552 width=26) (actual time=5.085..18.065 rows=3934 loops=1)
                          Index Cond: ((week >= '2006-04-17 00:00:00'::timestamp without time zone) AND (week <= '2006-04-23 00:00:00'::timestamp without time zone) AND (mso_no = 50))
                    ->  Index Scan using pk_titles on titles  (cost=0.00..3.25 rows=1 width=27) (actual time=0.006..0.010 rows=1 loops=3934)
                          Index Cond: ("outer".title_no = titles.title_no)
              ->  Hash  (cost=5.16..5.16 rows=316 width=16) (actual time=1.356..1.356 rows=325 loops=1)
                    ->  Seq Scan on providers  (cost=0.00..5.16 rows=316 width=16) (actual time=0.008..0.691 rows=325 loops=1)
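
Would raising the per-column statistics target and re-analyzing be the right direction for the single-week case? Something along these lines is what I have in mind (the target of 1000 is only an example value, not something we have tested):

ALTER TABLE xtns_by_mso_title_wk ALTER COLUMN week   SET STATISTICS 1000;  -- example target only
ALTER TABLE xtns_by_mso_title_wk ALTER COLUMN mso_no SET STATISTICS 1000;  -- example target only
ANALYZE xtns_by_mso_title_wk;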

If the week range is replaced by an equality (week = '20060423 00:00:00'), it also uses a hash join. Unfortunately, the queries are automatically generated, and changing them to use an equality could be problematic.

GroupAggregate  (cost=7828.75..7859.32 rows=157 width=61) (actual time=98.330..125.370 rows=3459 loops=1)
  ->  Sort  (cost=7828.75..7832.67 rows=1567 width=61) (actual time=98.303..104.055 rows=3934 loops=1)
        Sort Key: x.title_no, x.provider_no
        ->  Hash Join  (cost=5.95..7745.60 rows=1567 width=61) (actual time=1.785..83.830 rows=3934 loops=1)
              Hash Cond: ("outer".provider_no = "inner".provider_no)
              ->  Nested Loop  (cost=0.00..7716.14 rows=1567 width=49) (actual time=0.170..68.338 rows=3934 loops=1)
                    ->  Index Scan using idx_xtns_by_mso_ti_wk_wk_mso_t on xtns_by_mso_title_wk x  (cost=0.00..2607.56 rows=1567 width=26) (actual time=0.138..11.993 rows=3934 loops=1)
                          Index Cond: ((week = '2006-04-23 00:00:00'::timestamp without time zone) AND (mso_no = 50))
                    ->  Index Scan using pk_titles on titles  (cost=0.00..3.25 rows=1 width=27) (actual time=0.006..0.008 rows=1 loops=3934)
                          Index Cond: ("outer".title_no = titles.title_no)
              ->  Hash  (cost=5.16..5.16 rows=316 width=16) (actual time=1.565..1.565 rows=325 loops=1)
                    ->  Seq Scan on providers  (cost=0.00..5.16 rows=316 width=16) (actual time=0.008..0.677 rows=325 loops=1)

Does anyone have suggestions to try? The most worrying thing is that when the statistics are off, it can produce a pathological plan.

 - Ian
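
P.S. As a stopgap, would it be reasonable to simply discourage the planner from nested loops for these report sessions? A sketch of what I mean (a blunt diagnostic, not a real fix):

BEGIN;
SET LOCAL enable_nestloop = off;  -- only affects this transaction
SELECT ... ;                      -- the generated report query goes here
COMMIT;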