Re: [HACKERS] Page Scan Mode in Hash Index - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: [HACKERS] Page Scan Mode in Hash Index
Msg-id: CAA4eK1Kph6PEvd0mO8vamYT_2kXu9J_tOX2g_rx5U5fyje364w@mail.gmail.com
In response to: Re: [HACKERS] Page Scan Mode in Hash Index (Ashutosh Sharma <ashu.coek88@gmail.com>)
Responses: Re: [HACKERS] Page Scan Mode in Hash Index; Re: [HACKERS] Page Scan Mode in Hash Index
List: pgsql-hackers
On Mon, Aug 7, 2017 at 5:50 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> On Fri, Aug 4, 2017 at 7:14 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Sun, Jul 30, 2017 at 2:07 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>> Hi,
>>>
>>> On Wed, May 10, 2017 at 2:28 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>>> While doing the code coverage testing of the v7 patch shared at [1], I
>>>> found that there are a few lines of code in _hash_next() that are
>>>> redundant and need to be removed. I basically came to know this while
>>>> testing the scenario where a hash index scan starts when a split is in
>>>> progress. I have removed those lines, and attached is the newer version
>>>> of the patch.
>>>>
>>>
>>> Please find the new version of the patches for page-at-a-time scan in
>>> hash index, rebased on top of the latest commit in the master branch.
>>> Also, I have run the pgindent script with pg_bsd_indent version 2.0 on
>>> all the modified files. Thanks.
>>>
>>
>> Few comments:
>
> Thanks for reviewing the patch.
>

Comments on the latest patch:

1.
  /*
+  * We remember prev and next block number along with current block
+  * number so that if fetching the tuples using cursor we know the
+  * page from where to begin. This is for the case where we have
+  * reached the end of bucket chain without finding any matching
+  * tuples.
+  */
+ if (!BlockNumberIsValid(opaque->hasho_nextblkno))
+     prev_blkno = opaque->hasho_prevblkno;

This doesn't seem to be the right place for this comment, as you are not
saving the next or current block number here. I am not sure whether you
really need this comment at all. Will it be better if you move this to the
else part of the loop and reword it as: "Remember next and previous block
numbers for scrollable cursors to know the start position."?

2.
The code in _hash_readpage looks quite bloated. I think we can make it
better if we do something like below.

a.
+_hash_readpage(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
{
..
+     if (so->currPos.buf == so->hashso_bucket_buf ||
+         so->currPos.buf == so->hashso_split_bucket_buf)
+     {
+         so->currPos.prevPage = InvalidBlockNumber;
+         so->currPos.nextPage = opaque->hasho_nextblkno;
+         LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+     }
+     else
+     {
+         so->currPos.prevPage = opaque->hasho_prevblkno;
+         so->currPos.nextPage = opaque->hasho_nextblkno;
+         _hash_relbuf(rel, so->currPos.buf);
+         so->currPos.buf = InvalidBuffer;
+     }
..
}

This code chunk is the same for both forward and backward scans. I think
you can have a single copy of this code by moving it out of the if-else
block.

b.
+_hash_readpage(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
{
..
+     /* new page, locate starting position by binary search */
+     offnum = _hash_binsearch(page, so->hashso_sk_hash);
+
+     itemIndex = _hash_load_qualified_items(scan, page, offnum, dir);
+
+     while (itemIndex == 0)
+     {
+         /*
+          * Could not find any matching tuples in the current page, move to
+          * the next page. Before leaving the current page, also deal with
+          * any killed items.
+          */
+         if (so->numKilled > 0)
+             _hash_kill_items(scan);
+
+         /*
+          * We remember prev and next block number along with current block
+          * number so that if fetching the tuples using cursor we know the
+          * page from where to begin. This is for the case where we have
+          * reached the end of bucket chain without finding any matching
+          * tuples.
+          */
+         if (!BlockNumberIsValid(opaque->hasho_nextblkno))
+             prev_blkno = opaque->hasho_prevblkno;
+
+         _hash_readnext(scan, &buf, &page, &opaque);
+         if (BufferIsValid(buf))
+         {
+             so->currPos.buf = buf;
+             so->currPos.currPage = BufferGetBlockNumber(buf);
+             so->currPos.lsn = PageGetLSN(page);
+             offnum = _hash_binsearch(page, so->hashso_sk_hash);
+             itemIndex = _hash_load_qualified_items(scan, page,
+                                                    offnum, dir);
..
}

Have just one copy of the code that searches the offset and loads the
qualified items. Change the above while loop to a do..while loop and add a
check in between to break out of the loop when the item index is not zero.

3.
- * Find the first item in the index that
- * satisfies the qualification associated with the scan descriptor. On
- * success, the page containing the current index tuple is read locked
- * and pinned, and the scan's opaque data entry is updated to
- * include the buffer.
+ * We find the first item (or, if backward scan, the last item) in
+ * the index that satisfies the qualification associated with the
+ * scan descriptor. On success, the page containing the current
+ * index tuple is read locked and pinned, and data about the
+ * matching tuple(s) on the page has been loaded into so->currPos,
+ * scan->xs_ctup.t_self is set to the heap TID of the current tuple.
+ *
+ * If there are no matching items in the index, we return FALSE,
+ * with no pins or locks held.
  */
 bool
 _hash_first(IndexScanDesc scan, ScanDirection dir)
@@ -229,15 +297,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)

I don't think the lock or pin is held anymore on success; after this patch
only the pin will be retained, and that too only for the bucket page. Also,
there is no need to modify the part of the comment that is not related to
the change in this patch. I don't see scan->xs_ctup.t_self getting set
anywhere in this function. I think you are setting it in hashgettuple. It
is better to move that assignment from hashgettuple to _hash_first so as to
be consistent with _hash_next.

4.
+ * On successful exit, scan->xs_ctup.t_self is set to the TID
+ * of the next heap tuple, and if requested, scan->xs_itup
+ * points to a copy of the index tuple. so->currPos is updated
+ * as needed.
+ *
+ * On failure exit (no more tuples), we return FALSE with no
+ * pins or locks held.
  */
 bool
 _hash_next(IndexScanDesc scan, ScanDirection dir)

I don't see the usage of scan->xs_itup in this function. It seems to me
that you have copied it from the btree code and forgot to remove the part
of the comment that is not relevant to a hash index.

5.
@@ -514,19 +422,14 @@ hashendscan(IndexScanDesc scan)
{
..
+     if (HashScanPosIsValid(so->currPos))
      {
-         LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-         _hash_kill_items(scan);
-         LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+         /* Before leaving current page, deal with any killed items */
+         if (so->numKilled > 0)
+             _hash_kill_items(scan);
+         _hash_dropscanbuf(rel, so);
      }
-     _hash_dropscanbuf(rel, so);
-
..
}

I don't think it is a good idea to move _hash_dropscanbuf, as that check
just ensures that the current buffer is valid. It doesn't say anything
about the other buffers saved in HashScanOpaque. A similar change is
required in hashrescan.

6.
_hash_kill_items(IndexScanDesc scan)
{
..
+     if (so->hashso_bucket_buf == so->currPos.buf ||
+         HashScanPosIsPinned(so->currPos))
..
}

Isn't the second check enough? It should indirectly cover the first test.

7.
_hash_kill_items(IndexScanDesc scan)
{
..
+     /*
+      * If the page LSN differs, it means that the page was modified since
+      * the last read. killedItems could be invalid, so applying LP_DEAD
+      * hints is not safe.
+      */
+     page = BufferGetPage(buf);
+     if (PageGetLSN(page) != so->currPos.lsn)
+     {
+         _hash_relbuf(rel, buf);
+         return;
+     }
..
}

How does this check cover the case of unlogged tables?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com