Home > mailing lists

Re: [HACKERS] [POC] A better way to expand hash indexes. - Mailing list pgsql-hackers

From	David Steele
Subject	Re: [HACKERS] [POC] A better way to expand hash indexes.
Date	March 16, 2017 20:25:53
Msg-id	090e90c8-63be-6fa2-554f-6cd3f5348446@pgmasters.net Whole thread Raw
In response to	Re: [HACKERS] [POC] A better way to expand hash indexes. (Mithun Cy <mithun.cy@enterprisedb.com>)
Responses	Re: [HACKERS] [POC] A better way to expand hash indexes.
List	pgsql-hackers

Tree view

On 2/21/17 4:58 AM, Mithun Cy wrote:
> Thanks, Amit
> 
> On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> How will high and lowmask calculations work in this new strategy?
>> Till now they always work on doubling strategy and I don't see you
>> have changed anything related to that code.  Check below places.
> 
> It is important that the mask has to be (2^x) -1, if we have to retain
> the same hash map function. So mask variables will take same values as
> before. Only place I think we need change is  _hash_metapinit();
> unfortunately, I did not test for the case where we build the hash
> index on already existing tuples. Now I have fixed in the latest
> patch.
> 
> 
>> Till now, we have worked hard for not changing the page format in a
>> backward incompatible way, so it will be better if we could find some
>> way of doing this without changing the meta page format in a backward
>> incompatible way.
> 
> We are not adding any new variable/ deleting some, we increase the
> size of hashm_spares and hence mapping functions should be adjusted.
> The problem is the block allocation, and its management is based on
> the fact that all of the buckets(will be 2^x in number) belonging to a
> particular split-point is allocated at once and together. The
> hashm_spares is used to record those allocations and that will be used
> further by map functions to reach a particular block in the file. If
> we want to change the way we allocate the buckets then hashm_spares
> will change and hence mapping function. So I do not think we can avoid
> incompatibility issue.
> 
> One thing I can think of is if we can increase the hashm_version of
> hash index; then for old indexes, we can continue to use doubling
> method and its mapping. For new indexes, we can use new way as above.
> 
> Have you considered to store some information in
>> shared memory based on which we can decide how much percentage of
>> buckets are allocated in current table half?  I think we might be able
>> to construct this information after crash recovery as well.
> 
> I think all of above data has to be persistent. I am not able to
> understand what should be/can be stored in shared buffers. Can you
> please correct me if I am wrong?

This patch does not apply at cccbdde:

$ patch -p1 < ../other/expand_hashbucket_efficiently_02
patching file src/backend/access/hash/hashovfl.c
Hunk #1 succeeded at 49 (offset 1 line).
Hunk #2 succeeded at 67 (offset 1 line).
patching file src/backend/access/hash/hashpage.c
Hunk #1 succeeded at 502 with fuzz 1 (offset 187 lines).
Hunk #2 succeeded at 518 with fuzz 2 (offset 168 lines).
Hunk #3 succeeded at 562 (offset 163 lines).
Hunk #4 succeeded at 744 (offset 124 lines).
Hunk #5 FAILED at 774.
Hunk #6 succeeded at 869 (offset 19 lines).
Hunk #7 succeeded at 1450 (offset 242 lines).
Hunk #8 succeeded at 1464 (offset 242 lines).
Hunk #9 succeeded at 1505 (offset 242 lines).
1 out of 9 hunks FAILED -- saving rejects to file
src/backend/access/hash/hashpage.c.rej
patching file src/backend/access/hash/hashutil.c
Hunk #1 succeeded at 150 (offset 1 line).
patching file src/include/access/hash.h
Hunk #2 succeeded at 180 (offset 12 lines).
Hunk #3 succeeded at 382 (offset 18 lines).

It does apply with fuzz on 2b32ac2, so it looks like c11453c and
subsequent commits are the cause.  They represent a fairly substantial
change to hash indexes by introducing WAL logging so I think you should
reevaluate your patches to be sure they still function as expected.

Marked "Waiting on Author".

-- 
-David
david@pgmasters.net

pgsql-hackers by date:

From: George Papadrosou
Date: 16 March 2017, 20:19:54
Subject: Re: [HACKERS] GSOC - TOAST'ing in slices

From: Jeff Janes
Date: 16 March 2017, 20:30:23
Subject: Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')

Re: [HACKERS] [POC] A better way to expand hash indexes. - Mailing list pgsql-hackers

Previous

Next