Thread: pgsql-server/doc/src/sgml/runtime.sgml src/back ...
CVSROOT:	/cvsroot
Module name:	pgsql-server
Changes by:	tgl@postgresql.org	02/11/20 19:42:20

Modified files:
	doc/src/sgml:	runtime.sgml
	src/backend/optimizer/path:	costsize.c
	src/backend/optimizer/plan:	createplan.c planmain.c planner.c
	src/backend/utils/misc:	guc.c postgresql.conf.sample
	src/bin/psql:	tab-complete.c
	src/include/optimizer:	cost.h planmain.h
	src/test/regress/expected:	aggregates.out rangefuncs.out rules.out
		select_having.out select_having_1.out select_implicit.out
		select_implicit_1.out subselect.out
	src/test/regress/input:	misc.source
	src/test/regress/output:	misc.source
	src/test/regress/sql:	aggregates.sql rules.sql select_having.sql
		select_implicit.sql subselect.sql

Log message:
	Finish implementation of hashed aggregation.  Add enable_hashagg GUC
	parameter to allow it to be forced off for comparison purposes.
	Add ORDER BY clauses to a bunch of regression test queries that will
	otherwise produce randomly-ordered output in the new regime.
> Log message:
> Finish implementation of hashed aggregation.  Add enable_hashagg GUC
> parameter to allow it to be forced off for comparison purposes.
> Add ORDER BY clauses to a bunch of regression test queries that will
> otherwise produce randomly-ordered output in the new regime.

Out of interest (since I was away while this was proposed, I assume):
what's the idea with hashed aggregation?  I assume each group is now in
a hash bucket?  How did it work before?

Chris
Christopher Kings-Lynne wrote:
> > Log message:
> > Finish implementation of hashed aggregation.  Add enable_hashagg GUC
> > parameter to allow it to be forced off for comparison purposes.
> > Add ORDER BY clauses to a bunch of regression test queries that will
> > otherwise produce randomly-ordered output in the new regime.
>
> Out of interest (since I was away while this was proposed, I assume):
> what's the idea with hashed aggregation?  I assume each group is now in
> a hash bucket?  How did it work before?

It sequentially scanned the group of possible matches.  Now it hashes the
value and looks for matches that way --- much faster, and the way most
databases do it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
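[The contrast Bruce describes --- scanning for a matching group versus
hashing straight to it --- can be sketched with a toy Python model.  The
function names and data are hypothetical, not PostgreSQL's actual C code;
the point is only the per-row lookup cost.]

```python
# Toy model: each input row must locate the running state for its group
# before updating it.  Names and data here are illustrative only.

def lookup_by_scan(groups, key):
    """Old way: walk a list of (key, state) pairs until one matches.
    Cost is O(G) per input row, where G = number of groups seen so far."""
    for k, state in groups:
        if k == key:
            return state
    return None

def lookup_by_hash(groups, key):
    """New way: hash the key and jump straight to its bucket.
    Cost is O(1) on average per input row."""
    return groups.get(key)

scan_groups = [("a", 7), ("b", 6), ("c", 2)]   # list form, scanned linearly
hash_groups = dict(scan_groups)                # hash-table form

print(lookup_by_scan(scan_groups, "b"))        # 6
print(lookup_by_hash(hash_groups, "b"))        # 6
```

With many groups the scan version degrades linearly while the hash
version stays roughly constant per row, which is the speedup Bruce is
pointing at.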
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Out of interest (since I was away while this was proposed, I assume):
> what's the idea with hashed aggregation?

Old method: scan rows in order by the GROUP BY columns (requiring a sort,
or if you're lucky an indexscan), and execute one aggregation at a time.

New method: scan rows in any old order (typically a seqscan), and run all
the per-group aggregates in parallel.  It's a hash aggregation because we
use an in-memory hashtable, indexed by the values of the GROUP BY columns,
to keep track of the running state of each aggregate.

The hash method avoids a sort before aggregation, at the cost of a sort
afterwards if you want the results in non-random order.  But the post-sort
is only sorting one row per group, which is usually a lot less data than
the input rows.

One case where the old method can still win is

	SELECT ... GROUP BY foo ORDER BY foo LIMIT n;

for small n.  The hash method does not produce any output till it's read
all the input; the old method can produce a few rows very cheaply if foo
is indexed.  Also, of course, the hash method fails if you have too many
groups to permit the hashtable to fit in memory.

			regards, tom lane
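[The two methods Tom describes can be sketched in Python for a single
SUM() aggregate.  This is a simplified model with made-up names --- the
real implementation is in C, handles arbitrary transition functions, and
worries about memory --- but the work profile is the same.]

```python
from collections import defaultdict

rows = [("a", 3), ("b", 1), ("a", 4), ("c", 2), ("b", 5)]  # (group key, value)

def hash_agg_sum(rows):
    """New method: scan rows in any order, keeping one running aggregate
    state per GROUP BY key in an in-memory hash table."""
    state = defaultdict(int)          # key -> running SUM() state
    for key, value in rows:           # single pass, no pre-sort
        state[key] += value           # SUM's transition step
    # The hash table's output order is essentially random; a post-sort
    # over just one row per group restores a deterministic order.
    return sorted(state.items())

def sorted_agg_sum(rows):
    """Old method: sort by the grouping column first, then aggregate each
    run of equal keys, one group at a time."""
    result = []
    current_key, total = None, 0
    for key, value in sorted(rows):   # the input sort is the costly step
        if key != current_key:
            if current_key is not None:
                result.append((current_key, total))
            current_key, total = key, 0
        total += value
    if current_key is not None:
        result.append((current_key, total))
    return result

print(hash_agg_sum(rows))    # [('a', 7), ('b', 6), ('c', 2)]
print(sorted_agg_sum(rows))  # same answer, different work profile
```

Note how the sorted version could emit each group as soon as its run of
equal keys ends, while the hash version yields nothing until the whole
input is consumed --- which is exactly why the old method can win the
ORDER BY foo LIMIT n case Tom mentions.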