Re: [POC] A better way to expand hash indexes. - Mailing list pgsql-hackers
From: Mithun Cy
Subject: Re: [POC] A better way to expand hash indexes.
Msg-id: CAD__Oui1ZfVpvNxaALx2_SLEz6UTQaUGAcFMWOvZKQwpKE30Xg@mail.gmail.com
In response to: [HACKERS] [POC] A better way to expand hash indexes. (Mithun Cy <mithun.cy@enterprisedb.com>)
Responses:
  Re: [POC] A better way to expand hash indexes.
  Re: [POC] A better way to expand hash indexes.
List: pgsql-hackers
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think we can't change the number of buckets to be created or lowmask
> and highmask calculation here without modifying _h_spoolinit() because
> it sorts the data to be inserted based on hashkey which in turn
> depends on the number of buckets that we are going to create during
> create index operation. We either need to allow create index
> operation to still always create buckets in power-of-two fashion or we
> need to update _h_spoolinit according to new computation. One minor
> drawback of using power-of-two scheme for creation of buckets during
> create index is that it can lead to wastage of space and will be
> inconsistent with what the patch does during split operation.

Yes, this was a miss. The number of buckets allocated during metap_init is no longer always a power of two, so hashbuild, which uses just hash_mask to decide which bucket a hashkey belongs to, is not sufficient: it can produce buckets beyond max_buckets, and sorting index values by those buckets will be out of order. When we then actually insert them into the hash index we lose the advantage of the spatial locality that existed before, and hence index-build performance can suffer.

As you have said, we can solve it if we always allocate buckets in powers of two at hash index metapage init, and only when we later double the existing buckets do we allocate in 4 equal phases. But I think there are 2 more ways to solve the same problem (sketched in the code below):

A. Pass all 3 parameters high_mask, low_mask and max_buckets to tuplesort and let it use _hash_hashkey2bucket to figure out which bucket each key belongs to, and then sort. I think this way we make sorting and insertion into the hash index consistent with each other.

B. In tuplesort, use the hash function bucket = hash_key % num_buckets instead of the existing one, which does a bitwise AND to determine the bucket of a hash key. This way we will not wrongly assign buckets beyond max_buckets, and the sorted hash keys will be in sync with the actual insertion order of _hash_doinsert.

I am attaching both patches, Patch_A and Patch_B. My preference is Patch_A, and I am open to suggestions.
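For concreteness, here is a minimal standalone sketch of the two mappings. The function names and framing are mine for illustration only; the mask-based variant mirrors the logic of _hash_hashkey2bucket in hashutil.c, and the modulo variant is what Patch_B would have tuplesort do.

```c
#include <stdint.h>

typedef uint32_t Bucket;

/*
 * Patch_A-style mapping: the same mask arithmetic that
 * _hash_hashkey2bucket() uses for insertion.  If tuplesort is handed
 * high_mask, low_mask and max_buckets, it can bucket keys exactly the
 * way _hash_doinsert will, even when max_buckets + 1 is not a power
 * of two.
 */
static Bucket
bucket_by_masks(uint32_t hashkey, uint32_t max_buckets,
                uint32_t high_mask, uint32_t low_mask)
{
    Bucket bucket = hashkey & high_mask;

    if (bucket > max_buckets)
        bucket = bucket & low_mask; /* wrap back into the existing table */

    return bucket;
}

/*
 * Patch_B-style mapping: plain modulo over the current bucket count,
 * which by construction never yields a bucket beyond num_buckets - 1.
 */
static Bucket
bucket_by_modulo(uint32_t hashkey, uint32_t num_buckets)
{
    return hashkey % num_buckets;
}
```

For example, with max_buckets = 5 (low_mask = 3, high_mask = 7), a key hashing to 6 gives (6 & 7) = 6, which exceeds 5, so it wraps to (6 & 3) = 2, never landing beyond the last allocated bucket.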
> +#define SPLITPOINT_PHASES_PER_GRP 4
> +#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
> +#define Buckets_First_Split_Group 4

Fixed.

> In the above computation +2 and -2 still bothers me. I think you need
> to do this because you have defined split group zero to have four
> buckets, how about if you don't force that and rather define to have
> split point phases only from split point which has four or more
> buckets?

Okay, as suggested, instead of group zero having 4 phases of 1 bucket each, I have recalculated the spare mapping as below.

Allocating huge chunks of bucket pages all at once isn't optimal, and we would take ages to consume them. To avoid this exponential growth of index size, we use a trick to break up the allocation of buckets at a splitpoint into 4 equal phases. If 2^x is the total number of buckets to be allocated at a splitpoint (from now on we shall call this a splitpoint group), then we allocate a quarter of them, 2^(x - 2), at each phase of the splitpoint group. The next quarter is allocated only once the buckets of the previous phase have been consumed. Since a bucket count < 4 cannot be divided further into multiple phases, the first 3 groups have only one phase of allocation.

Groups 0, 1 and 2 allocate 1, 1 and 2 buckets respectively, at once, in a single phase. For groups > 2, where we allocate 4 or more buckets, the allocation is distributed among four equal phases. At group 3 we allocate 4 buckets in 4 phases, {1, 1, 1, 1}, where the numbers in curly braces indicate the buckets allocated in each phase of splitpoint group 3. For splitpoint groups 4 and 5 the phases are {2, 2, 2, 2} (bringing the index to 16 buckets in total) and {4, 4, 4, 4} (32 buckets in total). So each splitpoint group doubles the total number of buckets of the previous group, but incrementally, phase by phase. The bucket pages allocated within one phase of a splitpoint group appear consecutively in the index.
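To illustrate the arithmetic (a sketch only; the constant and function names here are hypothetical, not taken from the patch), the per-phase allocation size can be written as:

```c
#include <stdint.h>

#define SPLITPOINT_PHASES_PER_GRP       4
/* groups 0..2 allocate fewer than 4 buckets, so they get one phase */
#define SPLITPOINT_GRPS_WITH_ONE_PHASE  3

/*
 * Buckets allocated by each phase of the given splitpoint group.
 * A group as a whole doubles the table size: group 0 creates the
 * first bucket, and group g (g >= 1) adds 2^(g - 1) buckets, spread
 * over 4 equal phases once the group is big enough to split.
 */
static uint32_t
buckets_per_phase(uint32_t group)
{
    if (group == 0)
        return 1;
    if (group < SPLITPOINT_GRPS_WITH_ONE_PHASE)
        return 1U << (group - 1);   /* groups 1 and 2: 1 and 2 buckets */
    /* group total 2^(group - 1), split into 4 equal phases */
    return (1U << (group - 1)) / SPLITPOINT_PHASES_PER_GRP;
}
```

Summing the phases reproduces the totals above: the index holds 4 buckets after groups 0-2, 8 after group 3, 16 after group 4, and 32 after group 5.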
Either sortbuild_hash_*.patch can be applied independently on top of expand_hashbucket_efficiently_08.patch.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com