Re: Reduce pinning in btree indexes - Mailing list pgsql-hackers

From: Heikki Linnakangas
Subject: Re: Reduce pinning in btree indexes
Date:
Msg-id: 54EB8420.7090907@vmware.com
In response to: Reduce pinning in btree indexes (Kevin Grittner <kgrittn@ymail.com>)
Responses: Re: Reduce pinning in btree indexes, Re: Reduce pinning in btree indexes
List: pgsql-hackers
On 02/15/2015 02:19 AM, Kevin Grittner wrote:
> Interestingly, the btree README points out that using the old TID
> with a new tuple poses no hazard for a scan using an MVCC snapshot,
> because the new tuple would not be visible to a snapshot created
> that long ago.

The first question is: do we really need that interlock for the
non-MVCC snapshots either?

If we do: for non-MVCC snapshots, we need to ensure that all index
scans that started before the VACUUM did complete before the VACUUM
does. I wonder if we could find a different mechanism to enforce
that. Using the pin-interlock for that has always seemed a bit silly
to me. Perhaps grab a new heavy-weight lock on the table whenever a
non-MVCC index scan on the table begins, and have VACUUM wait on it.

> I found that the LP_DEAD hinting would be a problem with an old
> TID, but I figured we could work around that by storing the page
> LSN into the scan position structure when the page contents were
> read, and only doing hinting if that matched when we were ready to
> do the hinting. That wouldn't work for an index which was not
> WAL-logged, so the patch still holds pins for those.

Or you could use GetFakeLSNForUnloggedRel().

> Robert pointed out that the visibility information for an
> index-only scan wasn't checked while the index page READ lock was
> held, so those scans also still hold the pins.

Why does an index-only scan need to hold the pin?

> Finally, there was an "optimization" for marking buffer position
> for possible restore that was incompatible with releasing the pin.
> I use quotes because the optimization adds overhead to every move
> to the next page in order to set some variables in a structure when
> a mark is requested, instead of running two fairly small memcpy()
> statements. The two-day benchmark of the customer showed no
> performance hit, and looking at the code I would be amazed if the
> optimization yielded a measurable benefit. In general, optimization
> by adding overhead to moving through a scan to save time in a mark
> operation seems dubious.

Hmm. Did your test case actually exercise mark/restore? The
memcpy()s are not that small. Especially if it's an index-only scan,
you're copying a large portion of the page. Some scans call markpos
on every tuple, so that could add up.

> At some point we could consider building on this patch to recheck
> index conditions for heap access when a non-MVCC snapshot is used,
> check the visibility map for referenced heap pages when the TIDs
> are read for an index-only scan, and/or skip LP_DEAD hinting for
> non-WAL-logged indexes. But all those are speculative future work;
> this is a conservative implementation that just didn't modify
> pinning where there were any confounding factors.

Understood. Still, I'd like to see if we can easily get rid of the
pinning altogether.

- Heikki
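[Editor's note: the LSN-check idea Kevin describes above can be modeled in a few lines of C. This is a simplified, self-contained sketch under stated assumptions, not PostgreSQL code: FakePage, ScanPos, record_position, and try_lp_dead_hint are hypothetical stand-ins for the real page header, scan position structure, and LP_DEAD hinting path.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL-logged page: its LSN advances on
 * every modification, so an unchanged LSN proves the page is unchanged. */
typedef uint64_t FakeLSN;

typedef struct FakePage
{
    FakeLSN lsn;        /* advanced whenever the page is modified */
    bool    dead[32];   /* stand-in for per-item LP_DEAD flags */
} FakePage;

/* Scan position remembered when the page content was read, so the
 * pin can be released afterwards. */
typedef struct ScanPos
{
    FakeLSN lsn_at_read;    /* page LSN captured while the lock was held */
    int     item;           /* item we may later want to mark dead */
} ScanPos;

/* Capture the page LSN along with the rest of the position. */
static void
record_position(ScanPos *pos, const FakePage *page, int item)
{
    pos->lsn_at_read = page->lsn;
    pos->item = item;
}

/*
 * Apply the LP_DEAD hint only if the page is provably unchanged since
 * we read it: an unchanged LSN means no TID on the page can have been
 * recycled under us.  Returns true if the hint was applied.
 */
static bool
try_lp_dead_hint(FakePage *page, const ScanPos *pos)
{
    if (page->lsn != pos->lsn_at_read)
        return false;       /* page changed; the old TID may be recycled */
    page->dead[pos->item] = true;
    return true;
}
```

For an index that is not WAL-logged the page LSN is not maintained, which is why the patch keeps holding pins there; GetFakeLSNForUnloggedRel() would give such pages a monotonically advancing LSN to compare against instead.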