Thread: Bucket and batch
Hi!

I saw that functions used in hash joins, such as ExecHashGetBucketAndBatch and ExecScanHashBucket, have bucket and batch concepts.
So, I would like to know the differences between bucket and batch.
I don't understand the table partitioning mechanism in hash join.

Thanks,
Ana Carolina
"Ana Carolina Brito de Almeida" <anacrl@ig.com.br> writes:
> So, I would like to know the differences between bucket and batch.

A bucket is, well, one bucket of a hash table --- it holds all the
tuples that have the same hash code (for as many bits of the hash
code as we are choosing to use). We try to size the hash table with
enough buckets so there's not more than 10 tuples per bucket on
average.

A batch is a range of buckets that we process at the same time.
Tuples (from either side of the join) whose hash codes show they fall
into batches other than the first one get dumped into temporary
holding files, and then (after finishing joining the first batch) we
pull each successive batch back into memory and join that portion of
the tuples. The batch size is chosen to make the amount of memory
needed be approximately work_mem.

IOW, there are really nbuckets * nbatches "virtual" buckets in the
hash table, but only nbuckets worth of them are kept in memory at any
one time.

			regards, tom lane
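To make the bucket/batch split concrete, here is a minimal sketch in C. It assumes nbuckets and nbatch are both powers of two, so the low bits of a 32-bit hash value select the bucket and the next bits select the batch. It is only loosely in the spirit of ExecHashGetBucketAndBatch and is not PostgreSQL's code; the names HashSizing, choose_sizes, get_bucket_and_batch and TUPLES_PER_BUCKET are hypothetical. The sizing rule is a deliberate simplification that ignores per-tuple overhead and the other factors the real sizing code accounts for.

```c
#include <stdint.h>

/* Target average chain length, from the thread ("not more than 10
 * tuples per bucket on average"). Hypothetical name. */
#define TUPLES_PER_BUCKET 10

/* Hypothetical summary of a hash join's table layout. */
typedef struct HashSizing
{
    uint32_t nbuckets;      /* buckets kept in memory at any one time */
    uint32_t nbatch;        /* number of batches (batch 0 stays in memory) */
    int      log2_nbuckets; /* nbuckets == 1 << log2_nbuckets */
} HashSizing;

/* Smallest power of two >= n; assumes 1 <= n <= 2^31. */
static uint32_t
next_pow2(uint64_t n)
{
    uint64_t p = 1;

    while (p < n)
        p <<= 1;
    return (uint32_t) p;
}

/*
 * Simplified sizing (an assumption, not PostgreSQL's rule): pick enough
 * batches that one batch of the inner relation fits in work_mem, then
 * enough buckets for about TUPLES_PER_BUCKET tuples each.
 * Assumes work_mem_bytes > 0.
 */
static HashSizing
choose_sizes(uint64_t inner_tuples, uint64_t tuple_width,
             uint64_t work_mem_bytes)
{
    HashSizing s;
    uint64_t inner_bytes = inner_tuples * tuple_width;
    uint64_t batches = (inner_bytes + work_mem_bytes - 1) / work_mem_bytes;
    uint64_t tuples_per_batch;
    uint64_t buckets;

    if (batches < 1)
        batches = 1;
    s.nbatch = next_pow2(batches);

    tuples_per_batch = (inner_tuples + s.nbatch - 1) / s.nbatch;
    buckets = (tuples_per_batch + TUPLES_PER_BUCKET - 1) / TUPLES_PER_BUCKET;
    if (buckets < 1)
        buckets = 1;
    s.nbuckets = next_pow2(buckets);

    s.log2_nbuckets = 0;
    while ((1u << s.log2_nbuckets) < s.nbuckets)
        s.log2_nbuckets++;
    return s;
}

/*
 * Split a 32-bit hash value into (bucket, batch): the low log2_nbuckets
 * bits choose the bucket, the next bits choose the batch. Assumes
 * nbuckets * nbatch <= 2^32 so every batch number is reachable.
 */
static void
get_bucket_and_batch(const HashSizing *s, uint32_t hashvalue,
                     uint32_t *bucketno, uint32_t *batchno)
{
    *bucketno = hashvalue & (s->nbuckets - 1);
    if (s->nbatch > 1)
        *batchno = (hashvalue >> s->log2_nbuckets) & (s->nbatch - 1);
    else
        *batchno = 0;
}
```

For example, with 1,000,000 inner tuples of 100 bytes and a 4 MB work_mem (all assumed numbers), choose_sizes picks 32 batches of 4096 buckets, i.e. 131072 "virtual" buckets. A hash value of 0xDEADBEEF lands in bucket 3823 of batch 27, which is the same slot you get by taking the hash modulo nbuckets * nbatch (114415 = 27 * 4096 + 3823). Tuples falling in batch 0 can be joined right away; the rest would go to temporary files until their batch is loaded.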
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> "Ana Carolina Brito de Almeida" <anacrl@ig.com.br> writes:
>> So, I would like to know the differences between bucket and batch.
>
> A bucket is, well, one bucket of a hash table --- it holds all the
> tuples that have the same hash code (for as many bits of the hash
> code as we are choosing to use). We try to size the hash table with
> enough buckets so there's not more than 10 tuples per bucket on
> average.
>
> A batch is a range of buckets that we process at the same time.

Note that we don't currently do batches for hash aggregates, only for joins.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com