Thread: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3
Hm, CVS doesn't seem to be using a hash aggregate. At least, if it is it
isn't obvious from the plan. The query actually runs slightly slower in CVS
than with 7.3, though it's hard to compare because it seems to have done
everything differently. Every hash join has become a merge join and every
merge join has become a hash join :/

SELECT hier.level_0_id as parent_id,
       (select localized_text from localized_text
         where text_id = hier.short_name_text_id and lang_code = 'en') as name,
       *
  FROM hier LEFT OUTER JOIN (
       SELECT distinct level_0_id, level_1_id
         FROM cache_foo JOIN foo_hier USING (foo_id)
        WHERE key_value = 839
          AND dist < 60
       ) AS cache ON (hier.hier_id = cache.level_1_id)
 WHERE level = 1
 ORDER BY 1,2

The CVS plan:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=3691.30..3691.49 rows=76 width=1584) (actual time=917.19..917.23 rows=31 loops=1)
   Sort Key: hier.level_0_id, (subplan)
   ->  Hash Left Join  (cost=2837.80..3688.92 rows=76 width=1584) (actual time=914.46..916.89 rows=31 loops=1)
         Hash Cond: ("outer".hier_id = "inner".level_1_id)
         ->  Seq Scan on hier  (cost=0.00..63.86 rows=14 width=1576) (actual time=7.16..7.37 rows=31 loops=1)
               Filter: ("level" = 1)
         ->  Hash  (cost=2799.99..2799.99 rows=15122 width=16) (actual time=905.99..905.99 rows=0 loops=1)
               ->  Subquery Scan "cache"  (cost=2686.58..2799.99 rows=15122 width=16) (actual time=853.43..905.84 rows=31 loops=1)
                     ->  Unique  (cost=2686.58..2799.99 rows=15122 width=16) (actual time=853.41..905.68 rows=31 loops=1)
                           ->  Sort  (cost=2686.58..2724.38 rows=15122 width=16) (actual time=853.40..873.00 rows=16440 loops=1)
                                 Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                 ->  Merge Join  (cost=123.56..1636.78 rows=15122 width=16) (actual time=248.54..723.14 rows=16440 loops=1)
                                       Merge Cond: ("outer".foo_id = "inner".foo_id)
                                       ->  Index Scan using foo_hier_foo on foo_hier  (cost=0.00..1173.54 rows=45140 width=12) (actual time=0.04..234.30 rows=45140 loops=1)
                                       ->  Sort  (cost=123.56..123.73 rows=67 width=4) (actual time=248.43..267.31 rows=16441 loops=1)
                                             Sort Key: cache_foo.foo_id
                                             ->  Index Scan using idx_cache_foo on cache_foo  (cost=0.00..121.53 rows=67 width=4) (actual time=0.06..116.21 rows=16486 loops=1)
                                                   Index Cond: ((key_value = 839) AND (dist < 60::double precision))
   SubPlan
     ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.01 rows=1 width=516) (actual time=0.03..0.05 rows=1 loops=31)
           Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 928.46 msec

The 7.3 plan:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=4209.04..4209.12 rows=31 width=85) (actual time=849.01..849.05 rows=31 loops=1)
   Sort Key: hier.level_0_id, (subplan)
   ->  Merge Join  (cost=4199.51..4208.27 rows=31 width=85) (actual time=846.77..848.69 rows=31 loops=1)
         Merge Cond: ("outer".hier_id = "inner".level_1_id)
         ->  Sort  (cost=64.63..64.71 rows=31 width=77) (actual time=8.38..8.42 rows=31 loops=1)
               Sort Key: hier.hier_id
               ->  Seq Scan on hier  (cost=0.00..63.86 rows=31 width=77) (actual time=7.84..8.20 rows=31 loops=1)
                     Filter: ("level" = 1)
         ->  Sort  (cost=4134.88..4139.07 rows=1674 width=16) (actual time=837.74..837.78 rows=31 loops=1)
               Sort Key: "cache".level_1_id
               ->  Subquery Scan "cache"  (cost=3919.66..4045.23 rows=1674 width=16) (actual time=786.64..837.54 rows=31 loops=1)
                     ->  Unique  (cost=3919.66..4045.23 rows=1674 width=16) (actual time=786.63..837.38 rows=31 loops=1)
                           ->  Sort  (cost=3919.66..3961.52 rows=16743 width=16) (actual time=786.61..804.71 rows=16440 loops=1)
                                 Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                 ->  Hash Join  (cost=1599.78..2745.06 rows=16743 width=16) (actual time=349.16..628.43 rows=16440 loops=1)
                                       Hash Cond: ("outer".foo_id = "inner".foo_id)
                                       ->  Index Scan using idx_cache_foo on cache_foo  (cost=0.00..810.43 rows=16743 width=4) (actual time=0.07..144.44 rows=16486 loops=1)
                                             Index Cond: ((key_value = 839) AND (dist < 60::double precision))
                                       ->  Hash  (cost=746.40..746.40 rows=45140 width=12) (actual time=347.32..347.32 rows=0 loops=1)
                                             ->  Seq Scan on foo_hier  (cost=0.00..746.40 rows=45140 width=12) (actual time=0.05..222.63 rows=45140 loops=1)
   SubPlan
     ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.03 rows=1 width=17) (actual time=0.03..0.03 rows=1 loops=31)
           Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 854.57 msec

-- greg
Greg Stark <gsstark@mit.edu> writes:
> Hm, CVS doesn't seem to be using a hash aggregate. At least, if it is it
> isn't obvious from the plan.

> SELECT hier.level_0_id as parent_id,
>        (select localized_text from localized_text
>          where text_id = hier.short_name_text_id and lang_code = 'en') as name,
>        *
>   FROM hier LEFT OUTER JOIN (
>        SELECT distinct level_0_id, level_1_id
>          FROM cache_foo JOIN foo_hier USING (foo_id)
>         WHERE key_value = 839
>           AND dist < 60
>        ) AS cache ON (hier.hier_id = cache.level_1_id)
>  WHERE level = 1
>  ORDER BY 1,2

Why would you expect hash aggregation to be used here?  There's no
aggregates ... nor even any GROUP BY.  A hash aggregation plan looks
like this:

regression=# explain select ten, sum(unique1) from tenk1 group by ten;
                           QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=508.00..508.02 rows=10 width=8)
   ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=8)
(2 rows)

			regards, tom lane
Greg Stark <gsstark@mit.edu> writes:
> The query actually runs slightly slower in CVS than with 7.3, though
> it's hard to compare because it seems to have done everything
> differently. Every hash join has become a merge join and every merge
> join has become a hash join :/

The estimated row counts for the bottom-level scans show some
differences, which probably account for the differing choice of plans.

Are these actually the same data with up-to-date ANALYZE stats in both
cases?  I do not recall that we've made any adjustments to the basic
statistical estimation routines since 7.3 ...

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Greg Stark <gsstark@mit.edu> writes:
> > Hm, CVS doesn't seem to be using a hash aggregate. At least, if it is
> > it isn't obvious from the plan.
>
> > SELECT hier.level_0_id as parent_id,
> >        (select localized_text from localized_text
> >          where text_id = hier.short_name_text_id and lang_code = 'en') as name,
> >        *
> >   FROM hier LEFT OUTER JOIN (
> >        SELECT distinct level_0_id, level_1_id
> >          FROM cache_foo JOIN foo_hier USING (foo_id)
> >         WHERE key_value = 839
> >           AND dist < 60
> >        ) AS cache ON (hier.hier_id = cache.level_1_id)
> >  WHERE level = 1
> >  ORDER BY 1,2
>
> Why would you expect hash aggregation to be used here?  There's no
> aggregates ... nor even any GROUP BY.

Well, "SELECT distinct level_0_id, level_1_id" is equivalent to a
GROUP BY level_0_id, level_1_id.

Um, I think I grabbed the wrong query from the logs though, sorry. Here's
a better example from the actual code; in fact I think it's what the above
query turned into after more work.

There's only a small decrease in speed from 7.3 to CVS now, but I was
hoping for a big speed increase from hash aggregates, since most of the
time is being sunk into that sort. But it definitely isn't using them.
I guess TNSTAAFL.
SELECT hier.level_0_id as parent_id,
       (select localized_text from localized_text
         where text_id = hier.short_name_text_id and lang_code = 'en') as name,
       *
  FROM hier LEFT OUTER JOIN (
       SELECT min(dist) AS mindist,
              count(distinct foo_id) AS num_foos,
              level_1_id
         FROM cache_foos JOIN foo_hier USING (foo_id)
        WHERE key_id = 839
          AND dist < 60
        GROUP BY level_0_id, level_1_id
       ) AS cache ON (hier.hier_id = cache.level_1_id)
 WHERE level = 1
 ORDER BY level_0_id;

CVS:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=3936.87..3937.06 rows=76 width=1596) (actual time=1047.13..1047.16 rows=31 loops=1)
   Sort Key: hier.level_0_id
   ->  Hash Left Join  (cost=2989.02..3934.50 rows=76 width=1596) (actual time=1044.90..1046.87 rows=31 loops=1)
         Hash Cond: ("outer".hier_id = "inner".level_1_id)
         ->  Seq Scan on hier  (cost=0.00..63.86 rows=14 width=1576) (actual time=7.92..8.13 rows=31 loops=1)
               Filter: ("level" = 1)
         ->  Hash  (cost=2951.21..2951.21 rows=15122 width=24) (actual time=1033.78..1033.78 rows=0 loops=1)
               ->  Subquery Scan "cache"  (cost=2686.58..2951.21 rows=15122 width=24) (actual time=917.66..1033.60 rows=31 loops=1)
                     ->  GroupAggregate  (cost=2686.58..2951.21 rows=15122 width=24) (actual time=917.64..1033.40 rows=31 loops=1)
                           ->  Sort  (cost=2686.58..2724.38 rows=15122 width=24) (actual time=913.00..931.40 rows=16440 loops=1)
                                 Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                 ->  Merge Join  (cost=123.56..1636.78 rows=15122 width=24) (actual time=280.80..779.05 rows=16440 loops=1)
                                       Merge Cond: ("outer".foo_id = "inner".foo_id)
                                       ->  Index Scan using foo_hier_foo on foo_hier  (cost=0.00..1173.54 rows=45140 width=12) (actual time=0.04..225.13 rows=45140 loops=1)
                                       ->  Sort  (cost=123.56..123.73 rows=67 width=12) (actual time=280.69..302.62 rows=16441 loops=1)
                                             Sort Key: cache_foos.foo_id
                                             ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..121.53 rows=67 width=12) (actual time=0.05..128.19 rows=16486 loops=1)
                                                   Index Cond: ((key_id = 839) AND (dist < 60::double precision))
   SubPlan
     ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.01 rows=1 width=516) (actual time=0.03..0.03 rows=1 loops=31)
           Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 1058.63 msec

7.3:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=4103.84..4103.92 rows=31 width=97) (actual time=1033.79..1033.82 rows=31 loops=1)
   Sort Key: hier.level_0_id
   ->  Merge Join  (cost=4094.32..4103.08 rows=31 width=97) (actual time=1031.62..1033.54 rows=31 loops=1)
         Merge Cond: ("outer".hier_id = "inner".level_1_id)
         ->  Sort  (cost=64.63..64.71 rows=31 width=77) (actual time=7.92..7.96 rows=31 loops=1)
               Sort Key: hier.hier_id
               ->  Seq Scan on hier  (cost=0.00..63.86 rows=31 width=77) (actual time=7.25..7.77 rows=31 loops=1)
                     Filter: ("level" = 1)
         ->  Sort  (cost=4029.69..4033.87 rows=1674 width=24) (actual time=1023.54..1023.58 rows=31 loops=1)
               Sort Key: "cache".level_1_id
               ->  Subquery Scan "cache"  (cost=3730.75..3940.04 rows=1674 width=24) (actual time=829.88..1023.35 rows=31 loops=1)
                     ->  Aggregate  (cost=3730.75..3940.04 rows=1674 width=24) (actual time=829.86..1023.14 rows=31 loops=1)
                           ->  Group  (cost=3730.75..3856.32 rows=16743 width=24) (actual time=822.88..940.44 rows=16440 loops=1)
                                 ->  Sort  (cost=3730.75..3772.61 rows=16743 width=24) (actual time=822.86..841.47 rows=16440 loops=1)
                                       Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                       ->  Hash Join  (cost=1410.87..2556.15 rows=16743 width=24) (actual time=347.17..662.63 rows=16440 loops=1)
                                             Hash Cond: ("outer".foo_id = "inner".foo_id)
                                             ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..810.43 rows=16743 width=12) (actual time=0.07..152.56 rows=16486 loops=1)
                                                   Index Cond: ((key_id = 839) AND (dist < 60::double precision))
                                             ->  Hash  (cost=746.40..746.40 rows=45140 width=12) (actual time=345.53..345.53 rows=0 loops=1)
                                                   ->  Seq Scan on foo_hier  (cost=0.00..746.40 rows=45140 width=12) (actual time=0.06..213.59 rows=45140 loops=1)
   SubPlan
     ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.03 rows=1 width=17) (actual time=0.03..0.03 rows=1 loops=31)
           Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 1039.57 msec

-- greg
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Greg Stark <gsstark@mit.edu> writes:
>
> The estimated row counts for the bottom-level scans show some
> differences, which probably account for the differing choice of plans.
>
> Are these actually the same data with up-to-date ANALYZE stats in both
> cases?  I do not recall that we've made any adjustments to the basic
> statistical estimation routines since 7.3 ...

The exact same data. I just copied the data using pg_dumpall to do the
upgrade and haven't updated either database at all since. The ANALYZE
stats should be pretty up-to-date, but in any case should be the same for
both, assuming they get copied in the dump as well.

-- greg
Greg Stark <gsstark@mit.edu> writes:
> There's only a small decrease in speed from 7.3 to CVS now, but I was
> hoping for a big speed increase from hash aggregates since most of the
> time is being sunk into that sort. But it definitely isn't using them.
> I guess TNSTAAFL.

It looks like it's avoiding the hash choice because it thinks there will
be many groups, 15122 to be exact:

> ->  GroupAggregate  (cost=2686.58..2951.21 rows=15122 width=24) (actual time=917.64..1033.40 rows=31 loops=1)

You could probably persuade it that hashed aggregation will be okay by
increasing sort_mem sufficiently.  But it would also be interesting to
see if the number-of-groups estimate can be improved ... 15122 is rather
badly off from the true value of 31 ...

			regards, tom lane
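[The sort_mem experiment Tom suggests would look something like this. This is only a sketch using the subquery from the earlier message: sort_mem is a per-session setting measured in kilobytes, the value here is arbitrary, and whether the planner actually switches to HashAggregate still depends on its group-count estimate.]

```sql
-- Raise sort_mem for this session only, so the planner believes a hash
-- table over its ~15000 estimated groups would fit in memory.
SET sort_mem = 65536;

EXPLAIN ANALYZE
SELECT min(dist) AS mindist, level_1_id
  FROM cache_foos JOIN foo_hier USING (foo_id)
 WHERE key_id = 839 AND dist < 60
 GROUP BY level_0_id, level_1_id;

-- Put the setting back afterwards.
RESET sort_mem;
```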
Greg Stark <gsstark@mit.edu> writes:
> The exact same data. I just copied the data using pg_dumpall to do the
> upgrade and haven't updated either database at all since. The analyze
> stats should be pretty up-to-date but in any case should be the same for
> both assuming they get copied in the dump as well.

Uh, no, pg_dump doesn't copy pg_statistic; you must do a fresh ANALYZE
after loading the dump file.

			regards, tom lane
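[Concretely, that means running something like the following in psql on the freshly loaded database; a sketch, with the table names taken from the examples above:]

```sql
-- Rebuild planner statistics for the whole database ...
ANALYZE;

-- ... or just for the tables involved in the problem query:
ANALYZE cache_foos;
ANALYZE foo_hier;
ANALYZE hier;
```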
Tom Lane <tgl@sss.pgh.pa.us> writes:
> You could probably persuade it that hashed aggregation will be okay by
> increasing sort_mem sufficiently.  But it would also be interesting to
> see if the number-of-groups estimate can be improved ... 15122 is rather
> badly off from the true value of 31 ...

I don't see how you're ever going to reliably come up with a good estimate
for this. Consider that it's not just the distribution of the column that
matters, but the distribution given the where clauses in effect. That
depends on the degree to which the expressions in the where clauses are
independent of the expressions we're grouping on, which is hard to predict
in advance.

If the prediction is wrong, is it just a performance penalty? Can the hash
still proceed if it has to go to disk? In which case, is there a way for
me to force it to use hashes if I know better than the optimizer?

I've run VACUUM FULL and ANALYZE on both databases again. The data should
be identical, as I've just copied the database and haven't updated
anything. Two example queries tested with both. It isn't using hash
aggregates for either, and the plans are still quite different.

This is the previous query, except I've added hier.level_0_id to the join
clause in the hope it would skip the redundant sort. It didn't work,
though the second sort is fast due to there being relatively few records
coming out of the group.

SELECT hier.level_0_id as parent_id,
       (select localized_text from localized_text
         where text_id = hier.short_name_text_id and lang_code = 'en') as name,
       *
  FROM hier LEFT OUTER JOIN (
       SELECT min(dist) AS mindist,
              count(distinct foo_id) AS num_foos,
              level_1_id, level_0_id
         FROM cache_foos JOIN foo_hier USING (foo_id)
        WHERE region_id = 839
          AND dist < 60
        GROUP BY level_0_id, level_1_id
       ) AS cache ON (hier.hier_id = cache.level_1_id
                  and hier.level_0_id = cache.level_0_id)
 WHERE level = 1
 ORDER BY hier.level_0_id, hier.level_1_id

CVS:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=4091.65..4091.73 rows=31 width=101) (actual time=907.95..907.99 rows=31 loops=1)
   Sort Key: hier.level_0_id, hier.level_1_id
   ->  Merge Left Join  (cost=3964.22..4090.89 rows=31 width=101) (actual time=905.53..907.61 rows=31 loops=1)
         Merge Cond: (("outer".level_0_id = "inner".level_0_id) AND ("outer".hier_id = "inner".level_1_id))
         ->  Sort  (cost=64.63..64.71 rows=31 width=77) (actual time=7.61..7.64 rows=31 loops=1)
               Sort Key: hier.level_0_id, hier.hier_id
               ->  Seq Scan on hier  (cost=0.00..63.86 rows=31 width=77) (actual time=6.47..7.36 rows=31 loops=1)
                     Filter: ("level" = 1)
         ->  Sort  (cost=3899.59..3899.90 rows=124 width=24) (actual time=897.76..897.80 rows=31 loops=1)
               Sort Key: "cache".level_0_id, "cache".level_1_id
               ->  Subquery Scan "cache"  (cost=3697.05..3895.28 rows=124 width=24) (actual time=771.57..897.28 rows=31 loops=1)
                     ->  GroupAggregate  (cost=3697.05..3895.28 rows=124 width=24) (actual time=771.55..897.03 rows=31 loops=1)
                           ->  Sort  (cost=3697.05..3736.57 rows=15809 width=24) (actual time=764.08..782.64 rows=16440 loops=1)
                                 Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                 ->  Hash Join  (cost=853.20..2594.49 rows=15809 width=24) (actual time=307.51..603.83 rows=16440 loops=1)
                                       Hash Cond: ("outer".foo_id = "inner".foo_id)
                                       ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..752.94 rows=15808 width=12) (actual time=0.08..111.71 rows=16486 loops=1)
                                             Index Cond: ((region_id = 839) AND (dist < 60::double precision))
                                       ->  Hash  (cost=740.16..740.16 rows=45216 width=12) (actual time=306.11..306.11 rows=0 loops=1)
                                             ->  Seq Scan on foo_hier  (cost=0.00..740.16 rows=45216 width=12) (actual time=0.03..143.56 rows=45140 loops=1)
   SubPlan
     ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.05 rows=2 width=17) (actual time=0.03..0.03 rows=1 loops=31)
           Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 915.22 msec

7.3:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=4056.43..4056.51 rows=31 width=101) (actual time=1036.93..1036.97 rows=31 loops=1)
   Sort Key: hier.level_0_id, hier.level_1_id
   ->  Merge Join  (cost=4047.39..4055.66 rows=31 width=101) (actual time=1034.53..1036.59 rows=31 loops=1)
         Merge Cond: (("outer".level_0_id = "inner".level_0_id) AND ("outer".hier_id = "inner".level_1_id))
         ->  Sort  (cost=64.63..64.71 rows=31 width=77) (actual time=10.15..10.18 rows=31 loops=1)
               Sort Key: hier.level_0_id, hier.hier_id
               ->  Seq Scan on hier  (cost=0.00..63.86 rows=31 width=77) (actual time=9.59..9.94 rows=31 loops=1)
                     Filter: ("level" = 1)
         ->  Sort  (cost=3982.76..3986.82 rows=1624 width=24) (actual time=1024.23..1024.26 rows=31 loops=1)
               Sort Key: "cache".level_0_id, "cache".level_1_id
               ->  Subquery Scan "cache"  (cost=3693.23..3896.18 rows=1624 width=24) (actual time=835.96..1023.98 rows=31 loops=1)
                     ->  Aggregate  (cost=3693.23..3896.18 rows=1624 width=24) (actual time=835.95..1023.76 rows=31 loops=1)
                           ->  Group  (cost=3693.23..3815.00 rows=16236 width=24) (actual time=828.96..941.70 rows=16440 loops=1)
                                 ->  Sort  (cost=3693.23..3733.82 rows=16236 width=24) (actual time=828.93..848.83 rows=16440 loops=1)
                                       Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                       ->  Hash Join  (cost=1432.53..2557.77 rows=16236 width=24) (actual time=350.81..670.83 rows=16440 loops=1)
                                             Hash Cond: ("outer".foo_id = "inner".foo_id)
                                             ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..800.51 rows=16236 width=12) (actual time=0.08..159.95 rows=16486 loops=1)
                                                   Index Cond: ((region_id = 839) AND (dist < 60::double precision))
                                             ->  Hash  (cost=746.35..746.35 rows=45135 width=12) (actual time=349.15..349.15 rows=0 loops=1)
                                                   ->  Seq Scan on foo_hier  (cost=0.00..746.35 rows=45135 width=12) (actual time=0.06..219.15 rows=45140 loops=1)
   SubPlan
     ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.10 rows=1 width=17) (actual time=0.03..0.03 rows=1 loops=31)
           Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 1042.86 msec

This is another query, one that involves a DISTINCT ON and a group. I
guess the plan for this one could be different because the JOIN no longer
constrains the join order. Though the order seems the same to me; it's
the statistics that seem to be different.
SELECT *
  FROM (
       SELECT DISTINCT ON (bar_id)
              bar_id, bar_location_id, num_foos, mindist
         FROM (
              SELECT bar_id, bar_location_id,
                     count(distinct foo_id) as num_foos,
                     min(dist) as mindist
                FROM cache_foos
               WHERE region_id = 839
                 AND dist < 60
                 AND foo_id is not null
               GROUP BY bar_id, bar_location_id
              ) as x
        ORDER BY bar_id, mindist asc
       ) AS cache
  JOIN bar using (bar_id)
  JOIN bar_location using (bar_location_id)

CVS:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Merge Join  (cost=3438.87..3815.85 rows=200 width=654) (actual time=470.72..533.88 rows=112 loops=1)
   Merge Cond: ("outer".bar_location_id = "inner".bar_location_id)
   ->  Index Scan using bar_location_pkey on bar_location  (cost=0.00..344.50 rows=11792 width=610) (actual time=0.04..59.41 rows=11757 loops=1)
   ->  Sort  (cost=3438.87..3439.37 rows=200 width=44) (actual time=435.51..435.68 rows=112 loops=1)
         Sort Key: "cache".bar_location_id
         ->  Merge Join  (cost=3319.28..3431.22 rows=200 width=44) (actual time=407.23..434.97 rows=112 loops=1)
               Merge Cond: ("outer".bar_id = "inner".bar_id)
               ->  Index Scan using bar_pkey on bar  (cost=0.00..99.52 rows=3768 width=20) (actual time=0.02..18.18 rows=3619 loops=1)
               ->  Sort  (cost=3319.28..3319.78 rows=200 width=20) (actual time=407.17..407.29 rows=112 loops=1)
                     Sort Key: "cache".bar_id
                     ->  Subquery Scan "cache"  (cost=3232.65..3311.64 rows=200 width=20) (actual time=405.71..406.80 rows=112 loops=1)
                           ->  Unique  (cost=3232.65..3311.64 rows=200 width=20) (actual time=405.70..406.24 rows=112 loops=1)
                                 ->  Sort  (cost=3232.65..3272.14 rows=15797 width=20) (actual time=405.69..405.84 rows=146 loops=1)
                                       Sort Key: bar_id, mindist
                                       ->  Subquery Scan x  (cost=1854.57..2131.02 rows=15797 width=20) (actual time=272.07..405.08 rows=146 loops=1)
                                             ->  GroupAggregate  (cost=1854.57..2131.02 rows=15797 width=20) (actual time=272.06..404.21 rows=146 loops=1)
                                                   ->  Sort  (cost=1854.57..1894.06 rows=15797 width=20) (actual time=270.37..289.31 rows=16440 loops=1)
                                                         Sort Key: bar_id, bar_location_id
                                                         ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..752.94 rows=15797 width=20) (actual time=0.05..118.41 rows=16440 loops=1)
                                                               Index Cond: ((region_id = 839) AND (dist < 60::double precision))
                                                               Filter: (foo_id IS NOT NULL)
 Total runtime: 540.00 msec

7.3:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop  (cost=2236.97..2892.88 rows=162 width=159) (actual time=515.70..562.77 rows=112 loops=1)
   ->  Merge Join  (cost=2236.97..2397.80 rows=162 width=44) (actual time=515.58..555.70 rows=112 loops=1)
         Merge Cond: ("outer".bar_id = "inner".bar_id)
         ->  Index Scan using bar_pkey on bar  (cost=0.00..148.37 rows=3850 width=20) (actual time=0.05..29.76 rows=3619 loops=1)
         ->  Sort  (cost=2236.97..2237.37 rows=162 width=20) (actual time=515.49..515.64 rows=112 loops=1)
               Sort Key: "cache".bar_id
               ->  Subquery Scan "cache"  (cost=2222.92..2231.02 rows=162 width=20) (actual time=513.96..515.09 rows=112 loops=1)
                     ->  Unique  (cost=2222.92..2231.02 rows=162 width=20) (actual time=513.95..514.51 rows=112 loops=1)
                           ->  Sort  (cost=2222.92..2226.97 rows=1621 width=20) (actual time=513.94..514.11 rows=146 loops=1)
                                 Sort Key: bar_id, mindist
                                 ->  Subquery Scan x  (cost=1933.89..2136.50 rows=1621 width=20) (actual time=313.72..513.34 rows=146 loops=1)
                                       ->  Aggregate  (cost=1933.89..2136.50 rows=1621 width=20) (actual time=313.71..512.52 rows=146 loops=1)
                                             ->  Group  (cost=1933.89..2055.46 rows=16209 width=20) (actual time=311.41..422.29 rows=16440 loops=1)
                                                   ->  Sort  (cost=1933.89..1974.41 rows=16209 width=20) (actual time=311.39..329.79 rows=16440 loops=1)
                                                         Sort Key: bar_id, bar_location_id
                                                         ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..800.51 rows=16209 width=20) (actual time=0.05..181.92 rows=16440 loops=1)
                                                               Index Cond: ((region_id = 839) AND (dist < 60::double precision))
                                                               Filter: (foo_id IS NOT NULL)
   ->  Index Scan using bar_location_pkey on bar_location  (cost=0.00..3.04 rows=1 width=115) (actual time=0.04..0.04 rows=1 loops=112)
         Index Cond: ("outer".bar_location_id = bar_location.bar_location_id)
 Total runtime: 568.69 msec

[BTW I've had to search and replace on the plans at the request of the
client; I hope I didn't lose any relevant information doing that]

-- greg
Tom Lane <tgl@sss.pgh.pa.us> writes:
> You could probably persuade it that hashed aggregation will be okay by
> increasing sort_mem sufficiently.  But it would also be interesting to
> see if the number-of-groups estimate can be improved ... 15122 is rather
> badly off from the true value of 31 ...

My sort_mem is already quite large, 53M. I think that's large enough even
for 15,122 rows.

-- greg
Greg Stark <gsstark@mit.edu> writes:
> I don't see how you're ever going to reliably come up with a good
> estimate for this.

Well, it'll probably never be really good, but it's only been a few weeks
that we've had code that tried to do it at all; I'm not prepared to write
off the concept without even trying.  And I see it's not doing that badly
in your example, now that you've ANALYZEd --- 124 estimated groups vs 31
actual is well within what would make me happy.

But with only 124 estimated groups, it's certainly not the size of the
hashtable that's dissuading it from hashing.  [ Looks at code... ]  Oh,
here's the problem:

	 * Executor doesn't support hashed aggregation with DISTINCT
	 * aggregates.  (Doing so would imply storing *all* the input
	 * values in the hash table, which seems like a certain loser.)

The count(distinct) you've got in there turns it off.

> If the prediction is wrong is it just a performance penalty? The hash can
> still proceed if it has to go to disk?

Long as you don't run out of swap space, sure ;-).  So the estimate only
really has to be right within a factor of (swap space)/sort_mem.

			regards, tom lane
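[The effect Tom describes is easy to see in isolation. A sketch, reusing foo_hier from the examples above; which plan actually appears still depends on sort_mem and the planner's group estimate:]

```sql
-- A plain aggregate over a GROUP BY is eligible for HashAggregate:
EXPLAIN
SELECT level_1_id, count(foo_id)
  FROM foo_hier
 GROUP BY level_1_id;

-- Putting DISTINCT inside the aggregate rules hashing out, because a
-- hashed implementation would have to keep every input value per group,
-- so the planner falls back to sort-based grouping:
EXPLAIN
SELECT level_1_id, count(distinct foo_id)
  FROM foo_hier
 GROUP BY level_1_id;
```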
Tom Lane <tgl@sss.pgh.pa.us> writes:
> The count(distinct) you've got in there turns it off.

AAAH. That makes sense. Unfortunately I think I'm stuck with that; I'll
have to check, though.

It was always a bit mysterious to me how postgres could implement
count(distinct) without introducing a separate sort and aggregate for each
occurrence. I suppose it could rewrite it into an inner grouping with the
added column, then an outer grouping without it using a normal count().
But then that would only work with a single count(distinct). And I'm not
clear it would win for this example either, since there could actually be
a pretty large number of distinct values in that column. Hmm...

Thanks for all the help.

-- greg
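[The rewrite Greg sketches would look something like the following, applied to the grouping subquery from earlier in the thread. This is only a sketch: it assumes a single count(distinct), the inner alias dedup is made up, and the outer count(*) differs slightly from count(distinct foo_id) in its NULL handling.]

```sql
-- Inner grouping adds foo_id to the GROUP BY, so each inner row stands
-- for one distinct foo_id per (level_0_id, level_1_id); the outer level
-- then needs only a plain count(*), which hashed aggregation can handle.
SELECT min(mindist) AS mindist,   -- min of per-foo_id minimums = overall min
       count(*)     AS num_foos,  -- one row per distinct foo_id
       level_1_id
  FROM (SELECT min(dist) AS mindist, level_0_id, level_1_id, foo_id
          FROM cache_foos JOIN foo_hier USING (foo_id)
         WHERE key_id = 839
           AND dist < 60
         GROUP BY level_0_id, level_1_id, foo_id) AS dedup
 GROUP BY level_0_id, level_1_id;
```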
Greg Stark <gsstark@mit.edu> writes:
> It was always a bit mysterious to me how postgres could implement
> count(distinct) without introducing a separate sort and aggregate for
> each occurrence.

It can't.  There's a sort + uniq + aggregate process done behind the
scenes in the executor for each DISTINCT aggregate.  This doesn't show up
on the EXPLAIN output, because the planner has nothing to do with it.

I thought about doing this via a separate hashtable for each group ...
for about a minute.  The trouble is you have to run those things in
parallel if you're doing hashed aggregation, so the resources required
are really out of the question in most cases.  With the group approach,
the executor is only processing the values for one outer group at a time,
so it only has to run one inner sort + uniq + agg process at a time.

I suppose we could consider introducing two implementations (hash or
sort/uniq) for a DISTINCT agg within the executor, but it's code that
remains unwritten ...

			regards, tom lane
There is a simple case that isn't being handled: a straight DISTINCT is
exactly equivalent to a GROUP BY on all the columns listed, but right now
it's still doing a Sort+Unique. The HashAggregate seems to be about 2-3
times as fast for me.

Actually, a DISTINCT ON should also be possible to do as a hashagg as long
as there's no ORDER BY, though I guess that would be fairly uncommon since
it's not that useful.

slo=> explain select distinct id from tab;
                                QUERY PLAN
---------------------------------------------------------------------------
 Unique  (cost=3731.53..3932.49 rows=146 width=4)
   ->  Sort  (cost=3731.53..3832.01 rows=40192 width=4)
         Sort Key: id
         ->  Seq Scan on tab  (cost=0.00..657.92 rows=40192 width=4)

slo=> explain select id from tab group by id;
                             QUERY PLAN
---------------------------------------------------------------------
 HashAggregate  (cost=758.40..758.40 rows=146 width=4)
   ->  Seq Scan on tab  (cost=0.00..657.92 rows=40192 width=4)

-- greg
Greg Stark <gsstark@mit.edu> writes:
> There is a simple case that isn't being handled. A straight DISTINCT is
> exactly equivalent to a GROUP BY on all the columns listed. Right now
> it's still doing a Sort+Unique.

True.  The DISTINCT logic is so intertwined with ORDER BY that I couldn't
see an easy way to consider a hash-based alternative in the planner.  I
think it would require querytree changes propagating all the way back to
the parser to make this feasible at all, and then you'd still need to
think about how to make an honest cost-estimate comparison of the
alternatives (preferably without planning the whole query twice).
Anyone want to dig into it?

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Greg Stark <gsstark@mit.edu> writes:
> > There is a simple case that isn't being handled. A straight DISTINCT is
> > exactly equivalent to a GROUP BY on all the columns listed. Right now
> > it's still doing a Sort+Unique.
>
> True.  The DISTINCT logic is so intertwined with ORDER BY that I
> couldn't see an easy way to consider a hash-based alternative in the
> planner.  I think it would require querytree changes propagating all the
> way back to the parser to make this feasible at all, and then you'd
> still need to think about how to make an honest cost-estimate comparison
> of the alternatives (preferably without planning the whole query twice).
> Anyone want to dig into it?

So my thinking is that DISTINCT and DISTINCT ON are really just special
cases of GROUP BY. Instead of having separate code paths for them, I would
suggest rewriting them to GROUP BY, using a "first()" aggregate function
for the DISTINCT ON case, and then letting the normal logic handle it.

The only disadvantage is that if there is an index, it's possible to
implement DISTINCT ON more efficiently than arbitrary aggregate functions.
Is that the case currently for the Unique node? If we had a way of marking
arbitrary aggregate functions as only requiring the first or last record,
then we could notice that the only aggregate in rewritten DISTINCT ON
queries is first(), which only requires the first record of each value,
and optimize the index scan. As a bonus this would solve the min()/max()
issue as well.

-- greg
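[For reference, a "first()" aggregate of the kind described above can be hand-rolled today. This is only a sketch: first_sfunc and first are made-up names, it relies on the polymorphic anyelement type, and because it keeps the first non-null state it sees, it only matches DISTINCT ON semantics when no particular ordering is required and the column is non-null.]

```sql
-- State transition: once the state is set, keep it; otherwise adopt the
-- incoming value.  (NULL inputs are effectively skipped by coalesce.)
CREATE FUNCTION first_sfunc(anyelement, anyelement) RETURNS anyelement
    AS 'SELECT coalesce($1, $2)'
    LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE first (
    basetype = anyelement,
    sfunc    = first_sfunc,
    stype    = anyelement
);

-- A DISTINCT ON with no ORDER BY, e.g.
--   SELECT DISTINCT ON (bar_id) bar_id, bar_location_id FROM bar_location;
-- could then be rewritten as a GROUP BY, which is hashable:
SELECT bar_id, first(bar_location_id)
  FROM bar_location
 GROUP BY bar_id;
```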