Re: temporary tables, indexes, and query plans - Mailing list pgsql-performance

From Robert Haas
Subject Re: temporary tables, indexes, and query plans
Date
Msg-id AANLkTikeaQ98udD7hvmiHtUJQ9LYR0A9+yfsOucUnBc3@mail.gmail.com
Whole thread Raw
In response to Re: temporary tables, indexes, and query plans  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: temporary tables, indexes, and query plans
List pgsql-performance
On Sat, Nov 13, 2010 at 7:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Nov 13, 2010 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> OK, this is an artifact of the "HOT update" optimization.  Before
>>> creating the index, you did updates on the table that would have been
>>> executed differently if the index had existed.  When the index does get
>>> created, its entries for those updates are incomplete, so the index
>>> can't be used in transactions that could in principle see the unmodified
>>> rows.
>
>> Is the "in principle" here because there might be an open snapshot
>> other than the one under which CREATE INDEX is running, like a cursor?
>
> Well, the test is based on xmin alone, not cmin, so it can't really tell
> the difference.  It's unclear that it'd be worth trying.

Yeah, I'm not familiar with the logic in that area of the code, so I
can't comment all that intelligently.  However, I feel like there's a
class of things that could potentially be optimized if we know that
the only snapshot they could affect is the one we're currently using.
For example, when bulk loading a newly created table with COPY or
CTAS, we could set the xmin-committed hint bit if it weren't for the
possibility that some snapshot with a command-ID equal to or lower
than our own might take a look and get confused.  That seems to
require a BEFORE trigger or another open snapshot.  And, if we
HOT-update a tuple created by our own transaction that can't be of
interest to anyone else ever again, it would be nice to either mark it
for pruning or maybe even overwrite it in place; similarly if we
delete such a tuple it would be nice to schedule its execution.  There
are problems with all of these ideas, and I'm not totally sure how to
make any of it work, but to me this sounds suspiciously like another
instance of a somewhat more general problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: temporary tables, indexes, and query plans
Next
From: Marti Raudsepp
Date:
Subject: Re: MVCC performance issue