Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index - Mailing list pgsql-hackers
| From | Thomas Munro |
|---|---|
| Subject | Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index |
| Msg-id | CAEepm=3S+MtY+K20_yjj1h1n3vk2m7d0EYNmxZttb_01y18o8w@mail.gmail.com |
| In response to | Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index (Amit Kapila <amit.kapila16@gmail.com>) |
| Responses | Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index |
| List | pgsql-hackers |
On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Hmm.  I notice that this calls PredicateLockPageSplit() after both
>> calls to _hash_splitbucket() (the one in _hash_finish_split() and the
>> one in _hash_expandtable()) instead of doing it inside that function,
>> and that _hash_splitbucket() unlocks bucket_nbuf before returning.
>> What if someone else accesses bucket_nbuf between
>> LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK) and
>> PredicateLockPageSplit()?  Doesn't that mean that another session can
>> read a newly created page and miss a predicate lock that is about to
>> be transferred to it?
>
> Yes.  I think you are primarily worried about if there is an insert on
> new bucket from another session as scans will anyway take the
> predicate lock, right?

Yeah.

>> If that is indeed a race, could it be fixed by
>> calling PredicateLockPageSplit() at the start of _hash_splitbucket()
>> instead?
>
> Yes, but I think it would be better if we call this once we are sure
> that at least one tuple from the old bucket has been transferred
> (consider if all tuples in the old bucket are dead).  Apart from this,
> I think this patch has missed handling the cases where we scan the
> buckets when the split is in progress.  In such cases, we scan both
> old and new bucket, so I think we need to ensure that we take
> PredicateLock on both the buckets during such scans.

Hmm.  Yeah.

So, in _hash_first(), do you think we might just need this?

    if (H_BUCKET_BEING_POPULATED(opaque))
    {
        ...
        old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
        ...
        old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
+       PredicateLockPage(rel, BufferGetBlockNumber(old_buf), scan->xs_snapshot);
        TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));

That is, if you begin scanning a 'new' bucket, we remember the old
bucket and go and scan that too, so we'd better predicate-lock both up
front (or I suppose we could do it later when we visit that page, but
here it can be done in a single place).

What if we begin scanning an 'old' bucket that is being split?  I
think we'll only do that for tuples that actually belong in the old
bucket after the split, so no need to double-lock?  And I don't think
a split could begin while we are scanning.  Do I have that right?

As for inserting, I'm not sure if any special treatment is needed, as
long as the scan code path (above) and the split code path are
correct.  I'm not sure though.

I'm wondering how to test all this.  I'm thinking about a program
that repeatedly creates a hash index and then slowly adds more things
to it so that buckets split (maybe using distinct keys carefully
crafted to hit the same bucket?), while concurrently hammering it
with a ton of scans and then ... somehow checking correctness...

--
Thomas Munro
http://www.enterprisedb.com
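The ordering hazard discussed above (unlock bucket_nbuf, then transfer predicate locks) can be illustrated outside the server. Below is a toy model in plain Python, not PostgreSQL code: page locks are just sets, and the two functions replay the two possible orderings of the unlock and the lock transfer, with a concurrent reader interleaved in the window between them. All names here are illustrative.

```python
# Toy model of the suspected race: predicate locks are per-page sets,
# and a bucket split transfers locks from the old page to the new one.
# These are stand-ins, not PostgreSQL's real data structures.

predicate_locks = {"old_page": {"scanner_xact"}, "new_page": set()}

def split_unlock_then_transfer(events):
    # Suspect ordering: bucket_nbuf is unlocked before
    # PredicateLockPageSplit() runs, so another session can slip in.
    events.append("unlock new_page")                # LockBuffer(..., UNLOCK)
    events.append("other session reads new_page")   # interleaved access
    missed = not predicate_locks["new_page"]        # lock not yet transferred
    predicate_locks["new_page"] |= predicate_locks["old_page"]
    return missed

def split_transfer_then_unlock(events):
    # Proposed ordering: transfer locks while still holding the page lock.
    predicate_locks["new_page"] |= predicate_locks["old_page"]
    events.append("unlock new_page")
    events.append("other session reads new_page")
    return not predicate_locks["new_page"]          # reader sees the lock

ev = []
assert split_unlock_then_transfer(ev) is True       # race window exists
predicate_locks["new_page"] = set()                 # reset for second run
assert split_transfer_then_unlock(ev) is False      # window closed
```

In the first ordering the interleaved reader observes an empty lock set on the new page, which corresponds to a session reading a newly created page and missing a predicate lock that is about to be transferred to it.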
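For the "distinct keys carefully crafted to hit the same bucket" idea, one way to drive repeated splits of a single bucket is to brute-force keys whose hash values agree in the low bits. The sketch below uses a toy FNV-1a-style mixer as a hypothetical stand-in for PostgreSQL's hash_any()/hashint4(); it only demonstrates the key-crafting technique, and the real test would have to use the server's actual hash function.

```python
# Sketch: find distinct keys whose hashes share the low n bits, so they
# all land in the same bucket of a small (2^n-bucket) hash index.

def toy_hash(key: int) -> int:
    # Hypothetical stand-in for PostgreSQL's integer hash function;
    # any 32-bit mixer works for finding low-bit collisions.
    h = 2166136261
    for byte in key.to_bytes(8, "little"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h

def colliding_keys(nbuckets_log2: int, count: int):
    """Distinct non-negative keys whose hashes agree in the low
    nbuckets_log2 bits, i.e. the same bucket at 2^n buckets."""
    mask = (1 << nbuckets_log2) - 1
    target = toy_hash(0) & mask     # bucket of key 0 picks the target
    keys, k = [], 0
    while len(keys) < count:
        if toy_hash(k) & mask == target:
            keys.append(k)
        k += 1
    return keys

keys = colliding_keys(nbuckets_log2=6, count=10)
assert len(set(keys)) == 10                              # all distinct
assert len({toy_hash(k) & ((1 << 6) - 1) for k in keys}) == 1
```

Inserting such keys concentrates growth in one bucket, so splits of that bucket happen on a predictable schedule while concurrent scans hammer it, which is roughly the shape of the stress test described above.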