Re: Reduce pinning in btree indexes - Mailing list pgsql-hackers

From: Heikki Linnakangas
Subject: Re: Reduce pinning in btree indexes
Date:
Msg-id: 54EB8420.7090907@vmware.com
In response to: Reduce pinning in btree indexes (Kevin Grittner <kgrittn@ymail.com>)
Responses: Re: Reduce pinning in btree indexes, Re: Reduce pinning in btree indexes
List: pgsql-hackers
On 02/15/2015 02:19 AM, Kevin Grittner wrote:
> Interestingly, the btree README points out that using the old TID
> with a new tuple poses no hazard for a scan using an MVCC snapshot,
> because the new tuple would not be visible to a snapshot created
> that long ago.

The first question is: do we really need that interlock for the
non-MVCC snapshots either?

If we do: for non-MVCC snapshots, we need to ensure that all index
scans that started before the VACUUM did complete before the VACUUM
does. I wonder if we could find a different mechanism to enforce
that. Using the pin-interlock for that has always seemed a bit silly
to me. Perhaps grab a new heavy-weight lock on the table whenever a
non-MVCC index scan on the table begins, and have VACUUM wait on it.

> I found that the LP_DEAD hinting would be a problem with an old
> TID, but I figured we could work around that by storing the page
> LSN into the scan position structure when the page contents were
> read, and only doing hinting if that matched when we were ready to
> do the hinting. That wouldn't work for an index which was not
> WAL-logged, so the patch still holds pins for those.

Or you could use GetFakeLSNForUnloggedRel().

> Robert pointed out that the visibility information for an
> index-only scan wasn't checked while the index page READ lock was
> held, so those scans also still hold the pins.

Why does an index-only scan need to hold the pin?

> Finally, there was an "optimization" for marking buffer position
> for possible restore that was incompatible with releasing the pin.
> I use quotes because the optimization adds overhead to every move
> to the next page in order to set some variables in a structure when
> a mark is requested, instead of running two fairly small memcpy()
> statements. The two-day benchmark of the customer showed no
> performance hit, and looking at the code I would be amazed if the
> optimization yielded a measurable benefit. In general, optimization
> by adding overhead to moving through a scan to save time in a mark
> operation seems dubious.

Hmm. Did your test case actually exercise mark/restore? The
memcpy()s are not that small. Especially if it's an index-only scan,
you're copying a large portion of the page. Some scans call markpos
on every tuple, so that could add up.

> At some point we could consider building on this patch to recheck
> index conditions for heap access when a non-MVCC snapshot is used,
> check the visibility map for referenced heap pages when the TIDs
> are read for an index-only scan, and/or skip LP_DEAD hinting for
> non-WAL-logged indexes. But all those are speculative future work;
> this is a conservative implementation that just didn't modify
> pinning where there were any confounding factors.

Understood. Still, I'd like to see if we can easily get rid of the
pinning altogether.

- Heikki
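[Editor's note: the LSN-check idea Kevin describes above can be modeled in a few lines of C. This is a simplified, self-contained sketch under stated assumptions, not PostgreSQL code: FakePage, ScanPos, record_position, and try_lp_dead_hint are hypothetical stand-ins for the real page header, scan position structure, and LP_DEAD hinting path.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL-logged page: its LSN advances on
 * every modification, so an unchanged LSN proves the page is unchanged. */
typedef uint64_t FakeLSN;

typedef struct FakePage
{
    FakeLSN lsn;        /* advanced whenever the page is modified */
    bool    dead[32];   /* stand-in for per-item LP_DEAD flags */
} FakePage;

/* Scan position remembered when the page content was read, so the
 * pin can be released afterwards. */
typedef struct ScanPos
{
    FakeLSN lsn_at_read;    /* page LSN captured while the lock was held */
    int     item;           /* item we may later want to mark dead */
} ScanPos;

/* Capture the page LSN along with the rest of the position. */
static void
record_position(ScanPos *pos, const FakePage *page, int item)
{
    pos->lsn_at_read = page->lsn;
    pos->item = item;
}

/*
 * Apply the LP_DEAD hint only if the page is provably unchanged since
 * we read it: an unchanged LSN means no TID on the page can have been
 * recycled under us.  Returns true if the hint was applied.
 */
static bool
try_lp_dead_hint(FakePage *page, const ScanPos *pos)
{
    if (page->lsn != pos->lsn_at_read)
        return false;       /* page changed; the old TID may be recycled */
    page->dead[pos->item] = true;
    return true;
}
```

For an index that is not WAL-logged the page LSN is not maintained, which is why the patch keeps holding pins there; GetFakeLSNForUnloggedRel() would give such pages a monotonically advancing LSN to compare against instead.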