Re: Page Scan Mode in Hash Index - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Page Scan Mode in Hash Index
Date:
Msg-id: CAA4eK1JWj_kpUgwN2DEWOcpEXdhUoS3yHq52LCkO_2R0fyR0sw@mail.gmail.com
In response to: Re: Page Scan Mode in Hash Index (Ashutosh Sharma <ashu.coek88@gmail.com>)
Responses: Re: Page Scan Mode in Hash Index
List: pgsql-hackers
On Mon, Mar 27, 2017 at 7:04 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>
>> I am suspicious that _hash_kill_items() is going to have problems if
>> the overflow page is freed before it reacquires the lock.
>> _bt_killitems() contains safeguards against similar cases.
>
> I have added a similar check for _hash_kill_items() as well.
>

I think _hash_kill_items has a much bigger problem, which is that we
can't kill the items if the page has been modified after re-reading it.
Consider a case where Vacuum starts before the Scan on the bucket and
the Scan then gets ahead of it (which is possible after your 0003
patch), notes the killed items in the killed-items array, and before it
can kill all of them, Vacuum removes those items.  Now it is quite
possible that, before the scan tries to kill those items, the
corresponding itemids have been reused by different tuples.  I think
what we need to do here is to store the LSN of the page the first time
we read it and then compare it with the current page LSN after
re-reading the page in _hash_kill_items (see the sketch at the end of
this mail).

*
+   HashScanPosItem items[MaxIndexTuplesPerPage]; /* MUST BE LAST */
+} HashScanPosData;
..
    HashScanPosItem *killedItems;   /* tids and offset numbers of killed items */
    int         numKilled;          /* number of currently stored items */
+
+   /*
+    * Identify all the matching items on a page and save them
+    * in HashScanPosData
+    */
+   HashScanPosData currPos;        /* current position data */
 } HashScanOpaqueData;

After having the array of HashScanPosItems as currPos.items, I think
you don't need an array of HashScanPosItem for killedItems; an integer
array of indexes into currPos.items should be sufficient.

*
I think the below para in the README is not valid after this patch.

"To keep concurrency reasonably good, we require readers to cope with
concurrent insertions, which means that they have to be able to
re-find their current scan position after re-acquiring the buffer
content lock on page.  Since deletion is not possible while a reader
holds the pin on bucket, and we assume that heap tuple TIDs are unique,
this can be implemented by searching for the same heap tuple TID
previously returned.  Insertion does not move index entries across
pages, so the previously-returned index entry should always be on the
same page, at the same or higher offset number, as it was before."

*
-   next_buf = _hash_getbuf_with_strategy(rel, blkno, HASH_WRITE,
-                                         LH_OVERFLOW_PAGE,
-                                         bstrategy);
-
    /*
-    * release the lock on previous page after acquiring the lock on next
-    * page
+    * As the hash index scan work in page at a time mode,
+    * vacuum can release the lock on previous page before
+    * acquiring lock on the next page.
     */
..
+   next_buf = _hash_getbuf_with_strategy(rel, blkno, HASH_WRITE,
+                                         LH_OVERFLOW_PAGE,
+                                         bstrategy);
+

After this change, you need to modify the comments on top of the
functions hashbucketcleanup() and _hash_squeezebucket().

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
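[Editorial sketch] Below is a minimal sketch of the LSN-comparison idea
described above, modeled on what nbtree's _bt_killitems() does.  The
currPos.lsn field, the integer-array form of killedItems, the
HashScanPosItem field names, and the exact function signature are
assumptions for illustration only, not taken from the actual 0003 patch.

#include "postgres.h"

#include "access/hash.h"
#include "access/itup.h"
#include "storage/bufmgr.h"

void
_hash_kill_items(IndexScanDesc scan, Buffer buf)
{
    HashScanOpaque so = (HashScanOpaque) scan->opaque;
    Page        page;
    OffsetNumber maxoff;
    bool        killedsomething = false;
    int         i;

    LockBuffer(buf, BUFFER_LOCK_SHARE);
    page = BufferGetPage(buf);

    /*
     * so->currPos.lsn is assumed to have been saved with
     * BufferGetLSNAtomic() when the page was first read.  If the page has
     * been modified since then (e.g. VACUUM removed tuples and the line
     * pointers were reused), the remembered offsets may now point at
     * unrelated tuples, so give up without killing anything.
     */
    if (BufferGetLSNAtomic(buf) != so->currPos.lsn)
    {
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
        return;
    }

    maxoff = PageGetMaxOffsetNumber(page);

    for (i = 0; i < so->numKilled; i++)
    {
        /* killedItems[] assumed to hold indexes into so->currPos.items */
        HashScanPosItem *item = &so->currPos.items[so->killedItems[i]];
        OffsetNumber offnum = item->indexOffset;

        if (offnum <= maxoff)
        {
            ItemId      iid = PageGetItemId(page, offnum);
            IndexTuple  itup = (IndexTuple) PageGetItem(page, iid);

            /* confirm it is still the same index entry before killing it */
            if (ItemPointerEquals(&itup->t_tid, &item->heapTid))
            {
                ItemIdMarkDead(iid);
                killedsomething = true;
            }
        }
    }

    /*
     * Only hint bits were changed, so a dirty-hint is sufficient; no WAL
     * record is needed (same rationale as in nbtree).
     */
    if (killedsomething)
        MarkBufferDirtyHint(buf, true);

    LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}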