Re: tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers
From:           Tomas Vondra
Subject:        Re: tweaking NTUP_PER_BUCKET
Date:
Msg-id:         53BEEA9E.2080009@fuzzy.cz
In response to: Re: tweaking NTUP_PER_BUCKET  (Robert Haas <robertmhaas@gmail.com>)
Responses:      Re: tweaking NTUP_PER_BUCKET
List:           pgsql-hackers
On 9.7.2014 16:07, Robert Haas wrote:
> On Tue, Jul 8, 2014 at 5:16 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> Thinking about this a bit more, do we really need to build the hash
>> table on the first pass? Why not to do this:
>>
>> (1) batching
>>     - read the tuples, stuff them into a simple list
>>     - don't build the hash table yet
>>
>> (2) building the hash table
>>     - we have all the tuples in a simple list, batching is done
>>     - we know exact row count, can size the table properly
>>     - build the table
>
> We could do this, and in fact we could save quite a bit of memory if
> we allocated say 1MB chunks and packed the tuples in tightly instead
> of palloc-ing each one separately. But I worry that rescanning the
> data to build the hash table would slow things down too much.

I did a quick test of how much memory this could save. The attached
patch densely packs the tuples into 32kB chunks (1MB seems way too much
because of small work_mem values, but I guess this might be tuned based
on the number of tuples / work_mem size ...).

Tested on a query like this (see the first message in this thread for
how to generate the tables):

                               QUERY PLAN
-----------------------------------------------------------------------
 Aggregate  (cost=2014697.64..2014697.65 rows=1 width=33) (actual time=63796.270..63796.271 rows=1 loops=1)
   ->  Hash Left Join  (cost=318458.14..1889697.60 rows=50000016 width=33) (actual time=2865.656..61778.592 rows=50000000 loops=1)
         Hash Cond: (o.id = i.id)
         ->  Seq Scan on outer_table o  (cost=0.00..721239.16 rows=50000016 width=4) (actual time=0.033..2676.234 rows=50000000 loops=1)
         ->  Hash  (cost=193458.06..193458.06 rows=10000006 width=37) (actual time=2855.408..2855.408 rows=10000000 loops=1)
               Buckets: 1048576  Batches: 1  Memory Usage: 703125kB
               ->  Seq Scan on inner_table i  (cost=0.00..193458.06 rows=10000006 width=37) (actual time=0.044..952.802 rows=10000000 loops=1)
 Planning time: 1.139 ms
 Execution time: 63889.056 ms
(9 rows)

I.e. it creates a single batch with ~700MB of tuples.

Without the patch, top shows this:

    VIRT    RES    SHR S  %CPU %MEM   COMMAND
 2540356 1,356g   5936 R 100,0 17,6   postgres: EXPLAIN

and the MemoryContextStats added to MultiExecHash shows this:

  HashBatchContext: 1451221040 total in 182 blocks; 2826592 free (11 chunks); 1448394448 used

So yeah, the overhead is pretty huge in this case - basically 100%
overhead (~145B allocated per tuple, vs. ~73B with the patch), because
the inner table row width is only ~40B. With wider rows the overhead
will be lower.

Now, with the patch it looks like this:

    VIRT    RES    SHR S  %CPU %MEM   COMMAND
 1835332 720200   6096 R 100,0  8,9   postgres: EXPLAIN

  HashBatchContext: 729651520 total in 21980 blocks; 0 free (0 chunks); 729651520 used

So, pretty much no overhead at all. It was slightly faster too (~5%),
but I haven't done much testing, so it might be a measurement error.

This patch is pretty independent of the other changes discussed here
(tweaking NTUP_PER_BUCKET / nbuckets), so I'll keep it separate.

regards
Tomas
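PS: To make the dense-packing idea concrete, here is a minimal sketch
of such an allocator. The names (DenseChunk, dense_alloc) are made up
for illustration, not taken from the attached patch, and requests
larger than one chunk are not handled, for brevity:

#include "postgres.h"

#define DENSE_CHUNK_SIZE    ((Size) 32 * 1024)  /* 32kB, as in the test above */

typedef struct DenseChunk
{
    struct DenseChunk *next;    /* chain of all chunks, for freeing them at once */
    Size        used;           /* bytes already handed out from this chunk */
    char        data[FLEXIBLE_ARRAY_MEMBER];    /* tuples, packed tightly */
} DenseChunk;

static DenseChunk *chunks = NULL;

/*
 * Return 'size' bytes, packed tightly into the current 32kB chunk,
 * opening a new chunk when the current one can't fit the request.
 */
static void *
dense_alloc(Size size)
{
    char       *ptr;

    size = MAXALIGN(size);      /* keep each tuple properly aligned */

    if (chunks == NULL || chunks->used + size > DENSE_CHUNK_SIZE)
    {
        DenseChunk *chunk = (DenseChunk *)
            palloc(offsetof(DenseChunk, data) + DENSE_CHUNK_SIZE);

        chunk->next = chunks;
        chunk->used = 0;
        chunks = chunk;
    }

    ptr = chunks->data + chunks->used;
    chunks->used += size;

    return ptr;
}

The point is that the hash join only ever throws away a batch as a
whole, so per-tuple pfree is not needed - which is exactly what lets us
skip the per-allocation chunk header and power-of-two rounding that
palloc would otherwise add to each tuple.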