custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption - Mailing list pgsql-hackers
From:      Tomas Vondra
Subject:   custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
Date:
Msg-id:    525058FE.1060609@fuzzy.cz
Responses: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
List:      pgsql-hackers
Hi,

I've been playing with a custom COUNT(DISTINCT) aggregate based on hash tables - it seems to be working as planned, except for memory consumption, which is much higher than I expected. The code is available at https://github.com/tvondra/count_distinct and in short it works like this:

1) There's a simple hash table for each group (so with HashAggregate it's effectively a hash table of hash tables). The hash table contains only the distinct values for that particular group, so the result is simply the number of items in the table.

2) The table starts very small (4 buckets), and when it grows too large (more than 20 items per bucket on average) it gets resized on the fly to 4x the size. So the number of buckets grows 4 -> 16 -> 64 -> 256 -> ... (and the max number of items in the table grows 80 -> 320 -> 1280 -> ...).

Let's assume I have a table like this:

    CREATE TABLE test_table (a int, b int);

    INSERT INTO test_table
         SELECT mod(i,4000), (100000*random())::int
           FROM generate_series(1,80000000) s(i);

And let's run a query like this:

    SELECT a, count_distinct(b) FROM test_table GROUP BY a;

I.e. there are ~4k groups, with ~20k distinct values in each group. This query, however, consumes >5GB of memory, which is much more than I expected, and I've spent a fair amount of time looking for memory leaks in the code, so I'm wondering if I'm missing something important.

The custom hash tables are built from these very simple structures:

    typedef struct hash_element_t {
        uint32  hash;    /* hash of the value */
        uint32  length;  /* length of the value in bytes */
        char   *value;   /* the actual value (say 4B for int4) */
    } hash_element_t;

    typedef struct hash_bucket_t {
        uint32          nitems; /* number of items in this bucket */
        hash_element_t *items;  /* array of hash_element_t items */
    } hash_bucket_t;

    typedef struct hash_table_t {
        uint16         nbits;    /* number of significant bits of the hash */
        uint32         nbuckets; /* number of buckets (2^nbits) */
        uint32         nitems;   /* current number of elements in the table */
        hash_bucket_t *buckets;  /* array of hash_bucket_t */
    } hash_table_t;

I'm on a 64-bit architecture and the example works with int32, which means the sizes should be about this:

    hash_element_t => 20B (16B struct + 4B for the int4 value)
    hash_bucket_t  => 4B + (20B * items in the bucket [allocated in steps of 5])
    hash_table_t   => 4B + space for buckets

In the example above, there are ~20k unique values in each group. The threshold is 20 items per bucket on average, so that's 1024 buckets, and the buckets are almost full. So for a single group, the hash table size is about

    4B + 1024 * (4B + 20 * 20B) = 413700B = ~400 kB

There are 4000 groups, so the total estimate is ~1.6GB.

However, when executed (9.2, 9.3 and HEAD behave exactly the same), the query consumes ~5GB of RAM (excluding shared buffers). Here are the memory context stats: http://pastebin.com/JHjnehvC

The interesting part is:

    ExecutorState: 24576 total in 2 blocks; 16368 free (6 chunks); 8208 used
      ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
      AggContext: 5106224640 total in 4612 blocks; 565779368 free (1072086 chunks); 4540445272 used
        TupleHashTable: 253952 total in 5 blocks; 55360 free (16 chunks); 198592 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
      ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used

So there's ~5GB in the AggContext, and ~500MB of that is free. I'm aware that the HashAggregate itself allocates some more memory, and while I'm not sure exactly how much that should be, I'd expect a "small amount" compared to the 1.6GB. So I'm wondering what uses those extra 3.5GB. Is this the overhead of HashAggregate, or am I missing something (e.g. a memory leak in the code)?

Tomas
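PS: To make (2) a bit more concrete, the resize step looks roughly like this - a simplified sketch based on the structs above and PostgreSQL's palloc/pfree; function and constant names here are made up, and the actual code in the github repo differs in the details:

    #include "postgres.h"

    #define HASH_GROW_FACTOR  4   /* resize to 4x the buckets */
    #define HASH_AVG_LIMIT    20  /* max average items per bucket */
    #define HASH_STEP         5   /* items arrays grow in steps of 5 */

    static void
    resize_if_needed(hash_table_t *table)
    {
        uint32          i, j;
        uint32          old_nbuckets = table->nbuckets;
        hash_bucket_t  *old_buckets = table->buckets;

        /* resize once the average bucket holds more than 20 items */
        if (table->nitems <= HASH_AVG_LIMIT * table->nbuckets)
            return;

        table->nbits += 2;  /* 4x the buckets = 2 more significant bits */
        table->nbuckets = old_nbuckets * HASH_GROW_FACTOR;
        table->buckets = palloc0(table->nbuckets * sizeof(hash_bucket_t));

        /*
         * Move the elements to the new buckets - only the hash_element_t
         * structs are copied, the values themselves stay where they are.
         */
        for (i = 0; i < old_nbuckets; i++)
        {
            for (j = 0; j < old_buckets[i].nitems; j++)
            {
                hash_element_t  element = old_buckets[i].items[j];
                hash_bucket_t  *bucket
                    = &table->buckets[element.hash & (table->nbuckets - 1)];

                /* grow the items array in steps of 5 elements */
                if (bucket->nitems % HASH_STEP == 0)
                    bucket->items = (bucket->nitems == 0)
                        ? palloc(HASH_STEP * sizeof(hash_element_t))
                        : repalloc(bucket->items,
                                   (bucket->nitems + HASH_STEP) * sizeof(hash_element_t));

                bucket->items[bucket->nitems++] = element;
            }

            if (old_buckets[i].items != NULL)
                pfree(old_buckets[i].items);
        }

        pfree(old_buckets);
    }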
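PS2: And this is roughly how a new value gets inserted - note that each distinct value ends up in its own small palloc'ed chunk pointed to by the element's value field, matching the per-element estimate of 16B struct + 4B value (again just a sketch with made-up names):

    #include "postgres.h"

    #define HASH_STEP  5  /* items arrays grow in steps of 5 */

    static void
    add_element(hash_table_t *table, uint32 hash, const char *value, uint32 length)
    {
        uint32          i;
        hash_bucket_t  *bucket = &table->buckets[hash & (table->nbuckets - 1)];

        /* if the value is already in the bucket, it's not a new distinct value */
        for (i = 0; i < bucket->nitems; i++)
            if (bucket->items[i].hash == hash &&
                bucket->items[i].length == length &&
                memcmp(bucket->items[i].value, value, length) == 0)
                return;

        /* grow the items array in steps of 5 elements */
        if (bucket->nitems % HASH_STEP == 0)
            bucket->items = (bucket->nitems == 0)
                ? palloc(HASH_STEP * sizeof(hash_element_t))
                : repalloc(bucket->items,
                           (bucket->nitems + HASH_STEP) * sizeof(hash_element_t));

        bucket->items[bucket->nitems].hash = hash;
        bucket->items[bucket->nitems].length = length;

        /* copy the value into its own palloc'ed chunk (4B for int4) */
        bucket->items[bucket->nitems].value = palloc(length);
        memcpy(bucket->items[bucket->nitems].value, value, length);

        bucket->nitems += 1;
        table->nitems += 1;
    }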
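PS3: The back-of-the-envelope estimate above, spelled out step by step in case I've slipped somewhere:

    #include <stdio.h>

    int
    main(void)
    {
        long element = 16 + 4;            /* 16B struct + 4B int4 value = 20B */
        long bucket  = 4 + 20 * element;  /* 4B header + 20 full items = 404B */
        long group   = 4 + 1024 * bucket; /* 4B + 1024 buckets = 413700B */
        long total   = 4000L * group;     /* 4000 groups */

        printf("per group: %ld B (~%ld kB)\n", group, group / 1024);
        printf("total:     ~%.2f GB\n", total / 1e9);

        return 0;
    }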