Re: [POC] A better way to expand hash indexes. - Mailing list pgsql-hackers
From: Mithun Cy
Subject: Re: [POC] A better way to expand hash indexes.
Msg-id: CAD__Oui1ZfVpvNxaALx2_SLEz6UTQaUGAcFMWOvZKQwpKE30Xg@mail.gmail.com
In response to: [HACKERS] [POC] A better way to expand hash indexes. (Mithun Cy <mithun.cy@enterprisedb.com>)
Responses:
  Re: [POC] A better way to expand hash indexes.
  Re: [POC] A better way to expand hash indexes.
List: pgsql-hackers
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think we can't change the number of buckets to be created or lowmask
> and highmask calculation here without modifying _h_spoolinit() because
> it sorts the data to be inserted based on hashkey which in turn
> depends on the number of buckets that we are going to create during
> create index operation. We either need to allow create index
> operation to still always create buckets in power-of-two fashion or we
> need to update _h_spoolinit according to new computation. One minor
> drawback of using power-of-two scheme for creation of buckets during
> create index is that it can lead to wastage of space and will be
> inconsistent with what the patch does during split operation.

Yes, this was a miss. The number of buckets allocated during metap_init is no longer always a power of two, so hashbuild, which uses just hash_mask to decide which bucket a hashkey belongs to, is not sufficient: it can produce buckets beyond max_buckets, and sorting index values by those buckets will be out of order. When we then actually insert them into the hash index we lose the advantage of the spatial locality that existed before, and hence index-build performance can suffer.

As you have said, we can solve it if we always allocate buckets in powers of two at hash index metapage init, and only when we later double the existing buckets do we allocate in 4 equal phases. But I think there are 2 more ways to solve the same problem (sketched in the code below):

A. Pass all 3 parameters high_mask, low_mask and max_buckets to tuplesort and let it use _hash_hashkey2bucket to figure out which bucket each key belongs to, and then sort. I think this way we make sorting and insertion into the hash index consistent with each other.

B. In tuplesort, use the hash function bucket = hash_key % num_buckets instead of the existing one, which does a bitwise AND to determine the bucket of a hash key. This way we will not wrongly assign buckets beyond max_buckets, and the sorted hash keys will be in sync with the actual insertion order of _hash_doinsert.

I am attaching both patches, Patch_A and Patch_B. My preference is Patch_A, and I am open to suggestions.
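For concreteness, here is a minimal standalone sketch of the two mappings. The function names and framing are mine for illustration only; the mask-based variant mirrors the logic of _hash_hashkey2bucket in hashutil.c, and the modulo variant is what Patch_B would have tuplesort do.

```c
#include <stdint.h>

typedef uint32_t Bucket;

/*
 * Patch_A-style mapping: the same mask arithmetic that
 * _hash_hashkey2bucket() uses for insertion.  If tuplesort is handed
 * high_mask, low_mask and max_buckets, it can bucket keys exactly the
 * way _hash_doinsert will, even when max_buckets + 1 is not a power
 * of two.
 */
static Bucket
bucket_by_masks(uint32_t hashkey, uint32_t max_buckets,
                uint32_t high_mask, uint32_t low_mask)
{
    Bucket bucket = hashkey & high_mask;

    if (bucket > max_buckets)
        bucket = bucket & low_mask; /* wrap back into the existing table */

    return bucket;
}

/*
 * Patch_B-style mapping: plain modulo over the current bucket count,
 * which by construction never yields a bucket beyond num_buckets - 1.
 */
static Bucket
bucket_by_modulo(uint32_t hashkey, uint32_t num_buckets)
{
    return hashkey % num_buckets;
}
```

For example, with max_buckets = 5 (low_mask = 3, high_mask = 7), a key hashing to 6 gives (6 & 7) = 6, which exceeds 5, so it wraps to (6 & 3) = 2, never landing beyond the last allocated bucket.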
> +#define SPLITPOINT_PHASES_PER_GRP 4
> +#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
> +#define Buckets_First_Split_Group 4

Fixed.

> In the above computation +2 and -2 still bothers me. I think you need
> to do this because you have defined split group zero to have four
> buckets, how about if you don't force that and rather define to have
> split point phases only from split point which has four or more
> buckets?

Okay, as suggested, instead of group zero having 4 phases of 1 bucket each, I have recalculated the spare mapping as below.

Allocating huge chunks of bucket pages all at once isn't optimal, and we would take ages to consume them. To avoid this exponential growth of index size, we use a trick to break up the allocation of buckets at a splitpoint into 4 equal phases. If 2^x is the total number of buckets to be allocated at a splitpoint (from now on we shall call this a splitpoint group), then we allocate a quarter of them, 2^(x - 2), at each phase of the splitpoint group. The next quarter is allocated only once the buckets of the previous phase have been consumed. Since a bucket count < 4 cannot be divided further into multiple phases, the first 3 groups have only one phase of allocation.

Groups 0, 1 and 2 allocate 1, 1 and 2 buckets respectively, at once, in a single phase. For groups > 2, where we allocate 4 or more buckets, the allocation is distributed among four equal phases. At group 3 we allocate 4 buckets in 4 phases, {1, 1, 1, 1}, where the numbers in curly braces indicate the buckets allocated in each phase of splitpoint group 3. For splitpoint groups 4 and 5 the phases are {2, 2, 2, 2} (bringing the index to 16 buckets in total) and {4, 4, 4, 4} (32 buckets in total). So each splitpoint group doubles the total number of buckets of the previous group, but incrementally, phase by phase. The bucket pages allocated within one phase of a splitpoint group appear consecutively in the index.
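To illustrate the arithmetic (a sketch only; the constant and function names here are hypothetical, not taken from the patch), the per-phase allocation size can be written as:

```c
#include <stdint.h>

#define SPLITPOINT_PHASES_PER_GRP       4
/* groups 0..2 allocate fewer than 4 buckets, so they get one phase */
#define SPLITPOINT_GRPS_WITH_ONE_PHASE  3

/*
 * Buckets allocated by each phase of the given splitpoint group.
 * A group as a whole doubles the table size: group 0 creates the
 * first bucket, and group g (g >= 1) adds 2^(g - 1) buckets, spread
 * over 4 equal phases once the group is big enough to split.
 */
static uint32_t
buckets_per_phase(uint32_t group)
{
    if (group == 0)
        return 1;
    if (group < SPLITPOINT_GRPS_WITH_ONE_PHASE)
        return 1U << (group - 1);   /* groups 1 and 2: 1 and 2 buckets */
    /* group total 2^(group - 1), split into 4 equal phases */
    return (1U << (group - 1)) / SPLITPOINT_PHASES_PER_GRP;
}
```

Summing the phases reproduces the totals above: the index holds 4 buckets after groups 0-2, 8 after group 3, 16 after group 4, and 32 after group 5.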
Either sortbuild_hash_*.patch can be applied independently on top of expand_hashbucket_efficiently_08.patch.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com