Re: CREATE INDEX and HOT - revised design - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: CREATE INDEX and HOT - revised design |
Date | |
Msg-id | 1174592718.6069.172.camel@silverbirch.site Whole thread Raw |
In response to | Re: CREATE INDEX and HOT - revised design ("Pavan Deolasee" <pavan.deolasee@gmail.com>) |
Responses |
Re: CREATE INDEX and HOT - revised design
Re: CREATE INDEX and HOT - revised design |
List | pgsql-hackers |
On Thu, 2007-03-22 at 22:11 +0530, Pavan Deolasee wrote: > With this background, I propose to index ONLY the head of the > HOT-chain. The TID of the root tuple is used instead of the actual > TID of the tuple being indexed. This index will not be available to > the transactions which are started before the CREATE INDEX > transaction. Just like we use "indisvalid" flag to avoid including > an invalid index in the plan, we use the pg_index "xid" to decide > whether to use the index in the plan or not. Only transactions with > txid > pg_index:xid can see the index and use it. > > In fact, the serializable transactions started before CREATE INDEX > can not anyway see the index so all this is done to handle > read-committed transactions. > > In this proposal we indexed only the latest version. But none of the > transactions started after CREATE INDEX can anyway see the > older tuples and hence we should be fine even if we don't index > them in the new index. And none of the older transaction can see > the index, so again we are safe. The design also helps us to > preserve the heap HOT semantics and chain pruning and does not > need VACUUM or any special handling. Well, ISTM you've nailed it. CREATE INDEX returns as soon as possible, but people will have to wait for their next transaction before they can see it and use it too. Nice role reversal to avoid having CREATE INDEX wait. No restrictions on the number of indexes, no restrictions on multiple concurrent index builders and we can do this in just one pass. The ShareLock taken by CREATE INDEX guarantees all transactions that wrote data to the table have completed and that no new data can be added until after the index build commits. So the end of the chain is visible to CREATE INDEX and won't change. As long as you index the latest committed version on each HOT chain, then I think it works. Clearly want to ignore aborted versions. Sounds like you'll need to read the HOT chains in sequence to ensure we don't repeat the VACUUM FULL error. If there are no HOT chains then it will be just a normal seq scan of each block, so there's no real speed loss for situations where no HOT updates have taken place, such as reload from pg_dump. Sounds like you'll need to store the Next TransactionId rather than the TransactionId of the CREATE INDEX. We don't need to store the ComboId as well, since all commands are planned in ComboId sequence, assuming plan invalidation blows away any earlier plans held by our own backend. There is a slight hole in that SERIALIZABLE transactions won't be able to use any indexes they build during their transaction, since they may need to be able to see prior data, but I don't think anybody is going to complain about that restriction. Anyone? Maybe we can use this technique for CREATE INDEX CONCURRENTLY as well, so that it doesn't have to wait either. That needs some careful thinking... it may not work the same because of the locking differences. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: