Re: Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
Date |
Msg-id | CA+TgmoZhDc4mD1MQXxWTdL1siaJ0NTuXmakh++xmz5_nimbx=w@mail.gmail.com
In response to | Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller (b8flowerfire <b8flowerfire@gmail.com>)
Responses | Re: Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
List | pgsql-hackers
On Tue, Jun 10, 2014 at 1:13 AM, b8flowerfire <b8flowerfire@gmail.com> wrote:
> Thanks for the explanation. But i don't think it is very convincible.
> Simply reduce the value of NTUP_PER_BUCKET will enlarge the pointer array
> and reduce the tuples in one batch. But is that effect significant to the
> performance?
> The utilization of the work_mem, i think, is determined by the ratio of size
> of the pointer and the size of the tuple.
> Let's assume the size of tuple is 28 bytes, which is very reasonable because
> it's the sum of the size of HJTUPLE_OVERHEAD(at least 8 bytes), the size of
> MinimalTupleData(at least 10 bytes) and the content of a tuple(assume 10
> bytes). And the size of pointer is 4 bytes.

The size of a pointer is 8 bytes on most platforms these days. On the flip side, we shouldn't forget that each tuple has a 2-pointer, thus 16-byte, overhead due to the way AllocSetAlloc works, and that before adding that we will round up to the nearest power of two when allocating. So in fact, in your example, each tuple will require 48 bytes on a 64-bit platform, and each pointer will require 8. So if I'm calculating correctly, the memory allocation for the pointers would be about 1.6% of the total with NTUP_PER_BUCKET = 10 and about 14.3% of the total with NTUP_PER_BUCKET = 1.

> As a result, changing the value of NTUP_PER_BUCKET to 1 may increase the
> batches number by only about 10%. So it that enough to effect the
> performance? Or maybe i can not do the calculation simply in this way.

The problem case is when you have 1 batch and the increased memory consumption causes you to switch to 2 batches. That's expensive. It seems clear based on previous testing that *on the average* NTUP_PER_BUCKET = 1 will be better, but in the case where it causes an increase in the number of batches it will be much worse - particularly because the only way we ever increase the number of batches is to double it, which is almost always going to be a huge loss.

> Besides, we have larger main-memory now. If we set the work_mem larger, the
> more batches effect introduced by the smaller NTUP_PER_BUCKET value may be
> reduced, couldn't it?

If work_mem is large enough that we're going to do a single batch either way, or the same number of batches either way, then we can reduce NTUP_PER_BUCKET and it should be a clear win.

> I have read about discussion about the NTUP_PER_BUCKET before. It seems that
> if we change NTUP_PER_BUCKET to 50 or even larger, the performance wouldn't
> be much worse. Because every tuple in the chain of a bucket has a hash
> value. Having more tuples in a bucket simply increase some comparisons of
> two integers. So is it the same if we change it smaller, that we could not
> get much better? Is it one of the reasons that we define it as 10?

I'm not sure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
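As a minimal sketch of the arithmetic described above: the figures assume the hypothetical 28-byte tuple from the quoted message, 16 bytes of AllocSetAlloc chunk overhead, power-of-two rounding of the requested allocation size, and 8-byte bucket pointers on a 64-bit platform. None of the names below come from the PostgreSQL source; this is only an illustration of the calculation, not the hash-join code itself.

/* Back-of-the-envelope sketch of the per-tuple memory math discussed above.
 * Assumptions (hypothetical, not taken from the PostgreSQL source):
 * a 28-byte tuple, 16 bytes of AllocSetAlloc chunk overhead, power-of-two
 * rounding of the requested size, and 8-byte bucket pointers.
 */
#include <stdio.h>

/* Round n up to the next power of two (n > 0). */
static size_t
pow2_round(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

int
main(void)
{
    const size_t tuple_size = 28;      /* hypothetical tuple payload */
    const size_t chunk_overhead = 16;  /* two pointers of allocator bookkeeping */
    const size_t pointer_size = 8;     /* bucket pointer on a 64-bit platform */
    const size_t per_tuple = pow2_round(tuple_size) + chunk_overhead; /* 32 + 16 = 48 */
    const int    ntup_per_bucket[] = {10, 1};

    for (int i = 0; i < 2; i++)
    {
        int     n = ntup_per_bucket[i];
        /* One bucket pointer is shared by n tuples on average. */
        double  ptr_per_tuple = (double) pointer_size / n;
        double  pct = 100.0 * ptr_per_tuple / (per_tuple + ptr_per_tuple);

        printf("NTUP_PER_BUCKET = %2d: pointer array is ~%.1f%% of total\n", n, pct);
    }
    return 0;
}

Under those assumptions this prints roughly 1.6% for NTUP_PER_BUCKET = 10 and 14.3% for NTUP_PER_BUCKET = 1, matching the percentages quoted in the message.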