Re: Multixid hindsight design - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Multixid hindsight design |
Date | |
Msg-id | CA+TgmoY8NMWnr8TaEnATV56y3NwyRZ0WFaAA9gSBz2Y61D7rxA@mail.gmail.com |
In response to | Re: Multixid hindsight design (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Multixid hindsight design
|
List | pgsql-hackers |
On Fri, Jun 5, 2015 at 10:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> It would be a great deal nicer if we didn't have to keep ANY of the
> transactional data for a tuple around once it's all-visible.  Heikki
> defined ephemeral as "only needed when xmin or xmax is in-progress",
> but if we extended that definition slightly to "only needed when xmin
> or xmax is in-progress or committed but not all-visible" then the
> amount of non-ephemeral data in the tuple header is 5 bytes
> (infomasks + t_hoff).

OK, I was wrong here: if you only have that stuff, you can't
distinguish between a tuple that is visible to everyone and a tuple
that is visible to no one.  I think the minimal amount of data we need
in order to distinguish visibility once no relevant transactions are in
progress is one XID: either XMIN, if the tuple was never updated at all
or only by the inserting transaction or one of its subxacts; or XMAX,
if the inserting transaction committed.  The other visibility
information -- including (1) the other of XMIN and XMAX, (2) CMIN and
CMAX, and (3) the CTID -- is only interesting until the transactions
involved are no longer running and, if they committed, visible to all
running transactions.

Heikki's proposal is basically to merge the 4-byte CID field and the
first 4 bytes of the CTID that currently store the block number into
one 8-byte field that can store a pointer into this new TED structure.
And after mulling it over, that sounds pretty good to me.  It's true
(as has been pointed out by several people) that the TED will need to
be persistent because of prepared transactions.  But it would still be
a big improvement over the status quo, because:

(1) We would no longer need to freeze MultiXacts.  TED wouldn't need
to be frozen either.  You'd just truncate it whenever RecentGlobalXmin
advances.

(2) If the TED becomes horribly corrupted, you can recover by
committing or aborting any prepared transactions, shutting the system
down, and truncating it, with no loss of data integrity.  Nothing in
the TED is required to determine whether tuples are visible to an
unrelated transaction - you only need it (a) to determine whether
tuples are visible to a particular command within a transaction that
has inserted, updated, or deleted the tuple and (b) to determine
whether tuples are locked.

(3) As a bonus, we'd eliminate combo CIDs, because the TED could have
space to separately store CMIN and CMAX.  Combo CIDs required special
handling for logical decoding, and they are one of the nastier barriers
to making parallelism support writes (because they are stored in
backend-local memory of unbounded size and therefore can't easily be
shared with workers), so it wouldn't be very sad if they went away.

I'm not quite sure how to decide whether something like this is worth
(a) the work and (b) the risk of creating new bugs, but the more I
think about it, the more the principle of the thing seems sound to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
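
To make point (1) a bit more concrete, here is a minimal sketch in C of
a TED-like structure that is truncated as the visibility horizon
advances instead of being frozen.  Everything in it is an assumption
made for illustration: the names (Ted, TedEntry, ted_append,
ted_lookup, ted_truncate, xid_precedes) are hypothetical and are not
PostgreSQL APIs, the fixed-size in-memory ring stands in for what would
have to be a persistent structure, and the horizon argument plays the
role of RecentGlobalXmin.  It also assumes entries are appended in
roughly XID order, so only a prefix is ever dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;            /* stand-in for a 32-bit transaction ID */
typedef uint32_t Cid;            /* stand-in for a command ID */

#define TED_CAPACITY 64          /* arbitrary size for the sketch */

/* Hypothetical entry: data that matters only while the transactions
 * touching the tuple are still running or not yet visible to all. */
typedef struct TedEntry {
    Xid      newest_xid;         /* newest XID that still needs this entry */
    Xid      other_xid;          /* whichever of xmin/xmax the tuple drops */
    Cid      cmin;               /* stored separately, so no combo CID */
    Cid      cmax;
    uint32_t ctid_block;         /* block number of the successor version */
} TedEntry;

typedef struct Ted {
    uint64_t first_valid;        /* pointers below this were truncated */
    uint64_t next;               /* next 8-byte pointer to hand out */
    TedEntry slots[TED_CAPACITY];
} Ted;

/* Modulo-2^32 comparison, so "older" survives XID wraparound. */
bool xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

void ted_init(Ted *ted)
{
    assert(ted != NULL);
    ted->first_valid = 1;        /* 0 is reserved to mean "no entry" */
    ted->next = 1;
}

/* Returns the 8-byte pointer a tuple header would store, or 0 if full. */
uint64_t ted_append(Ted *ted, TedEntry entry)
{
    assert(ted != NULL);
    if (ted->next - ted->first_valid >= TED_CAPACITY)
        return 0;
    ted->slots[ted->next % TED_CAPACITY] = entry;
    return ted->next++;
}

/* NULL means the entry is gone (or never existed); the caller then
 * falls back to the single XID kept in the tuple itself. */
const TedEntry *ted_lookup(const Ted *ted, uint64_t ptr)
{
    assert(ted != NULL);
    if (ptr < ted->first_valid || ptr >= ted->next)
        return NULL;
    return &ted->slots[ptr % TED_CAPACITY];
}

/* Drop the leading run of entries whose newest XID precedes the
 * horizon.  Nothing is rewritten or frozen; the prefix is forgotten. */
void ted_truncate(Ted *ted, Xid horizon)
{
    assert(ted != NULL);
    while (ted->first_valid < ted->next &&
           xid_precedes(ted->slots[ted->first_valid % TED_CAPACITY].newest_xid,
                        horizon))
        ted->first_valid++;
}
```

In this sketch, a lookup through a pointer that has been truncated away
simply returns NULL, which mirrors the claim in point (2): losing TED
contents for transactions that are already visible to everyone costs
nothing, because the remaining XID in the tuple is enough.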