Two-phase commit issues - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Two-phase commit issues |
Date | |
Msg-id | 25312.1116450909@sss.pgh.pa.us |
Responses | Re: Two-phase commit issues |
List | pgsql-hackers |
I've started to look seriously at Heikki's patch for two-phase commit. There are a few issues that probably deserve discussion:

* The major missing issue that I've come across so far is that subtransaction and multixact state isn't preserved across a crash. Assuming that we want to store only top-level XIDs in the shared-memory list of prepared XIDs (which I think is important), it is essential that crash restart rebuild the pg_subtrans status for prepared transactions. The subxacts of a prepared xact have to be seen as still running, and they won't be unless the subtrans links are there. Since subtrans.c is designed to wipe all its state on restart, we need to recreate those entries. Fortunately this doesn't seem hard: the state file for a prepared xact will include all of its subxact XIDs, and we can just do SubTransSetParent() on them while rereading the state file. (AFAICS it's sufficient to make each subxact link directly to the top XID, even if there was a more complex hierarchy originally.)

  Similarly, we've got to reconstruct MultiXactIds that any prepared xacts are members of, else row-level locks taken out by prepared xacts won't be enforced correctly. I think this can be handled if we add to the state files a list of all MultiXactIds that each prepared xact belongs to, and then during restart forcibly recreate those MultiXactIds. (They would only be rebuilt with prepared XIDs, not any ordinary XIDs that might originally have been members.) This seems to require some new code in multixact.c, but not anything fundamentally difficult --- Alvaro, do you see any likely problems in this stuff?

* The patch is designed to dump state files into WAL as well as onto disk. Why? Wouldn't it be better just to write and fsync the state file before reporting successful prepare? That would get rid of the need for checkpoint-time fsyncs.

* I'm inclined to think that the "gid" identifiers for prepared transactions ought to be SQL identifiers (names), not string literals. Was there a particular reason for making them strings?

* What are we going to do with GUC variables? My feeling is that the only sane answer is that PREPARE is the same as COMMIT as far as local GUC variables go, and COMMIT/ROLLBACK PREPARED have no effect on GUC state. Otherwise it's really unclear what to do. Consider

        SET myvar = foo;
        BEGIN;
        SET myvar = bar;
        PREPARE gid;
        SHOW myvar;  -- what do you see ... foo or bar?
        SET myvar = baz;  -- is this even legal?
        ROLLBACK PREPARED gid;
        SHOW myvar;  -- now what do you see ... foo or baz?

  Since local GUC changes aren't going to be saved/restored across a crash anyway, I can't see a point in doing anything really complex.

* There are some fairly ugly cases associated with creation and deletion of temporary tables as well. I think we might want to just decree that you can't PREPARE a transaction that included creating or dropping a temp table. Does anyone have much of a problem with that?

regards, tom lane
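As a rough illustration of the restart-time rebuild described in the first point, here is a toy model in Python. It is not PostgreSQL code: `PreparedStateFile`, `rebuild_after_crash`, and `is_still_running` are hypothetical names invented for this sketch, and plain dictionaries stand in for the subtransaction and multixact storage. It assumes each state file lists the top-level XID, its subxact XIDs, and the MultiXactIds it belongs to, as proposed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PreparedStateFile:
    """Hypothetical stand-in for the state file of one prepared xact."""
    top_xid: int
    subxact_xids: List[int] = field(default_factory=list)
    multixact_ids: List[int] = field(default_factory=list)


def rebuild_after_crash(state_files):
    """Rebuild in-memory state from the state files after a restart."""
    prepared: Set[int] = set()                   # top-level prepared XIDs only
    parent_of: Dict[int, int] = {}               # subxact XID -> top-level XID
    multixact_members: Dict[int, Set[int]] = {}  # MultiXactId -> prepared XIDs
    for sf in state_files:
        prepared.add(sf.top_xid)
        # Link every subxact directly to the top XID, ignoring any deeper
        # hierarchy that may have existed before the crash.
        for sub in sf.subxact_xids:
            parent_of[sub] = sf.top_xid
        # Recreate each MultiXactId with prepared members only.
        for mxid in sf.multixact_ids:
            multixact_members.setdefault(mxid, set()).add(sf.top_xid)
    return prepared, parent_of, multixact_members


def is_still_running(xid, prepared, parent_of):
    """An XID counts as running if it, or its recorded parent, is prepared."""
    return parent_of.get(xid, xid) in prepared


# Two prepared transactions that share MultiXactId 7 (made-up numbers).
state_files = [
    PreparedStateFile(top_xid=100, subxact_xids=[101, 102], multixact_ids=[7]),
    PreparedStateFile(top_xid=200, multixact_ids=[7, 8]),
]
prepared, parent_of, multixact_members = rebuild_after_crash(state_files)
```

With the links rebuilt, `is_still_running(101, prepared, parent_of)` is true. Passing an empty `parent_of` instead models the wiped state and makes the same subxact look finished, which is the failure the rebuild is meant to prevent.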