Re: [HACKERS] Page Scan Mode in Hash Index - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: [HACKERS] Page Scan Mode in Hash Index
Msg-id: CAA4eK1Kph6PEvd0mO8vamYT_2kXu9J_tOX2g_rx5U5fyje364w@mail.gmail.com
In response to: Re: [HACKERS] Page Scan Mode in Hash Index (Ashutosh Sharma <ashu.coek88@gmail.com>)
Responses: Re: [HACKERS] Page Scan Mode in Hash Index; Re: [HACKERS] Page Scan Mode in Hash Index
List: pgsql-hackers
On Mon, Aug 7, 2017 at 5:50 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> On Fri, Aug 4, 2017 at 7:14 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Sun, Jul 30, 2017 at 2:07 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>> Hi,
>>>
>>> On Wed, May 10, 2017 at 2:28 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>>> While doing the code coverage testing of the v7 patch shared at [1], I
>>>> found that there are a few lines of code in _hash_next() that are
>>>> redundant and need to be removed. I basically came to know this while
>>>> testing the scenario where a hash index scan starts when a split is in
>>>> progress. I have removed those lines, and attached is the newer version
>>>> of the patch.
>>>>
>>>
>>> Please find the new version of the patches for page-at-a-time scan in
>>> hash index, rebased on top of the latest commit in the master branch.
>>> Also, I have run the pgindent script with pg_bsd_indent version 2.0 on
>>> all the modified files. Thanks.
>>>
>>
>> Few comments:
>
> Thanks for reviewing the patch.
>

Comments on the latest patch:

1.
  /*
+  * We remember prev and next block number along with current block
+  * number so that if fetching the tuples using cursor we know the
+  * page from where to begin. This is for the case where we have
+  * reached the end of bucket chain without finding any matching
+  * tuples.
+  */
+ if (!BlockNumberIsValid(opaque->hasho_nextblkno))
+     prev_blkno = opaque->hasho_prevblkno;

This doesn't seem to be the right place for this comment, as you are not
saving the next or current block number here. I am not sure whether you
really need this comment at all. Will it be better if you move this to the
else part of the loop and reword it as: "Remember next and previous block
numbers for scrollable cursors to know the start position."?

2.
The code in _hash_readpage looks quite bloated. I think we can make it
better if we do something like below.

a.
+_hash_readpage(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
{
..
+     if (so->currPos.buf == so->hashso_bucket_buf ||
+         so->currPos.buf == so->hashso_split_bucket_buf)
+     {
+         so->currPos.prevPage = InvalidBlockNumber;
+         so->currPos.nextPage = opaque->hasho_nextblkno;
+         LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+     }
+     else
+     {
+         so->currPos.prevPage = opaque->hasho_prevblkno;
+         so->currPos.nextPage = opaque->hasho_nextblkno;
+         _hash_relbuf(rel, so->currPos.buf);
+         so->currPos.buf = InvalidBuffer;
+     }
..
}

This code chunk is the same for both forward and backward scans. I think
you can have a single copy of this code by moving it out of the if-else
block.

b.
+_hash_readpage(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
{
..
+     /* new page, locate starting position by binary search */
+     offnum = _hash_binsearch(page, so->hashso_sk_hash);
+
+     itemIndex = _hash_load_qualified_items(scan, page, offnum, dir);
+
+     while (itemIndex == 0)
+     {
+         /*
+          * Could not find any matching tuples in the current page, move to
+          * the next page. Before leaving the current page, also deal with
+          * any killed items.
+          */
+         if (so->numKilled > 0)
+             _hash_kill_items(scan);
+
+         /*
+          * We remember prev and next block number along with current block
+          * number so that if fetching the tuples using cursor we know the
+          * page from where to begin. This is for the case where we have
+          * reached the end of bucket chain without finding any matching
+          * tuples.
+          */
+         if (!BlockNumberIsValid(opaque->hasho_nextblkno))
+             prev_blkno = opaque->hasho_prevblkno;
+
+         _hash_readnext(scan, &buf, &page, &opaque);
+         if (BufferIsValid(buf))
+         {
+             so->currPos.buf = buf;
+             so->currPos.currPage = BufferGetBlockNumber(buf);
+             so->currPos.lsn = PageGetLSN(page);
+             offnum = _hash_binsearch(page, so->hashso_sk_hash);
+             itemIndex = _hash_load_qualified_items(scan, page,
+                                                    offnum, dir);
..
}

Have just one copy of the code that searches the offset and loads the
qualified items. Change the above while loop to a do..while loop and add a
check in between to break out of the loop when the item index is not zero.

3.
- * Find the first item in the index that
- * satisfies the qualification associated with the scan descriptor. On
- * success, the page containing the current index tuple is read locked
- * and pinned, and the scan's opaque data entry is updated to
- * include the buffer.
+ * We find the first item (or, if backward scan, the last item) in
+ * the index that satisfies the qualification associated with the
+ * scan descriptor. On success, the page containing the current
+ * index tuple is read locked and pinned, and data about the
+ * matching tuple(s) on the page has been loaded into so->currPos,
+ * scan->xs_ctup.t_self is set to the heap TID of the current tuple.
+ *
+ * If there are no matching items in the index, we return FALSE,
+ * with no pins or locks held.
  */
 bool
 _hash_first(IndexScanDesc scan, ScanDirection dir)
@@ -229,15 +297,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)

I don't think the lock or pin is held anymore on success; after this patch
only the pin will be retained, and that too only for the bucket page. Also,
there is no need to modify the part of the comment that is not related to
the change in this patch. I don't see scan->xs_ctup.t_self getting set
anywhere in this function. I think you are setting it in hashgettuple. It
is better to move that assignment from hashgettuple to _hash_first so as to
be consistent with _hash_next.

4.
+ * On successful exit, scan->xs_ctup.t_self is set to the TID
+ * of the next heap tuple, and if requested, scan->xs_itup
+ * points to a copy of the index tuple. so->currPos is updated
+ * as needed.
+ *
+ * On failure exit (no more tuples), we return FALSE with no
+ * pins or locks held.
  */
 bool
 _hash_next(IndexScanDesc scan, ScanDirection dir)

I don't see the usage of scan->xs_itup in this function. It seems to me
that you have copied it from the btree code and forgot to remove the part
of the comment that is not relevant to a hash index.

5.
@@ -514,19 +422,14 @@ hashendscan(IndexScanDesc scan)
{
..
+     if (HashScanPosIsValid(so->currPos))
      {
-         LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-         _hash_kill_items(scan);
-         LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+         /* Before leaving current page, deal with any killed items */
+         if (so->numKilled > 0)
+             _hash_kill_items(scan);
+         _hash_dropscanbuf(rel, so);
      }
-     _hash_dropscanbuf(rel, so);
-
..
}

I don't think it is a good idea to move _hash_dropscanbuf, as that check
just ensures that the current buffer is valid. It doesn't say anything
about the other buffers saved in HashScanOpaque. A similar change is
required in hashrescan.

6.
_hash_kill_items(IndexScanDesc scan)
{
..
+     if (so->hashso_bucket_buf == so->currPos.buf ||
+         HashScanPosIsPinned(so->currPos))
..
}

Isn't the second check enough? It should indirectly cover the first test.

7.
_hash_kill_items(IndexScanDesc scan)
{
..
+     /*
+      * If the page LSN differs, it means that the page was modified since
+      * the last read. killedItems could be invalid, so applying LP_DEAD
+      * hints is not safe.
+      */
+     page = BufferGetPage(buf);
+     if (PageGetLSN(page) != so->currPos.lsn)
+     {
+         _hash_relbuf(rel, buf);
+         return;
+     }
..
}

How does this check cover the case of unlogged tables?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com