making an unlogged table logged - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | making an unlogged table logged |
Date | |
Msg-id | AANLkTinENZbRxdCwOHkqBba2BHUbfY8_C5JwRXLodxcX@mail.gmail.com Whole thread Raw |
Responses |
Re: making an unlogged table logged
Re: making an unlogged table logged Re: making an unlogged table logged Re: making an unlogged table logged |
List | pgsql-hackers |
Somebody asked about this on Depesz's blog today, and I think it's come up here before too, so I thought it might be worth my writing up a few comments on this. I don't think I'm going to have time to work on this any time soon, but if someone else wants to work up a patch, I'm game to review. I think it'd clearly be a good feature. Generally, to do this, it would be necessary to do the following things (plus anything I'm forgetting): 1. Take an AccessExclusiveLock on the target table. You might think that concurrent selects could be allowed, but I believe that's not the case. Read on. 2. Verify that there are no foreign keys referencing other unlogged tables, because if that were the case then after the change we'd have a permanent table referencing an unlogged table, which would violate referential integrity. (Note that unlogged referencing permanent is OK, but permanent referencing unlogged is a no-no, so what matters when upgrading is "outbound" foreign keys.) 3. Write out all shared buffers for the target table, and drop them. This ensures that there are no buffers floating around for the target relation that are marked BM_UNLOGGED, which would be a bad thing. Or maybe it's possible to just clear the BM_UNLOGGED flag, instead of dropping them. This is the step that makes me think we need an access exclusive lock - otherwise, somebody else might read in a buffer and, seeing that the relation is unlogged (which is true, since we haven't committed yet), mark it BM_UNLOGGED. 4. fsync() any segments of the target relation - of any fork except that init fork - that might have dirty pages not on disk. 5. Arrange for the appropriate file deletions at commit or abort, by updating pendingDeletes. On commit, we want to delete the init fork for the table and all its indexes. On abort, we want to delete everything else, but only for pretend; that is, the abort record should reflect the deletions since they'll need to happen on any standbys, but we shouldn't actually perform them on the master since we don't want to obliterate the contents of the table for no reason. There's a subtle problem here I'm not quite sure how to deal with: what happens if we *crash* without writing an abort record? It seems like that could leave a stray file around on a standby, because the current code only cleans things up on the standby at the start of recovery; to make this bullet-proof, I suppose it'd need to repeat that every time a crash happens on the master, but I don't know how to do that. Note also that if wal_level is minimal, then we need only worry about the commit case; the abort case can be a no-op. 6. If wal_level != minimal, XLOG every page of every fork except the init fork, for both the table and the associated indexes. (Note that this step also requires an AccessExclusiveLock rather than some weaker lock, because of the arbitrary rule that only AccessExclusiveLocks are sent to standbys. If we held only ShareRowExclusiveLock on the master, for example, a Hot Standby backend might read the table while it's only been half-copied.) 7. Update pg_class.relpersistence from 'u' to 'p', for both the table and the associated indexes. Going the other direction ought to be possible too, although it seems somewhat less useful. For that, you'd need to flip around the check in step #2 (i.e. check for a reference FROM a permanent table), perform step #3, skip step #4, do step #5 backwards (create and log init forks, arranging for them to be removed on abort - this too has an issue with crashes that don't leave abort records behind); and arrange for the rest of the forks to be removed on commit on any standby without doing it on the master), skip step #6, and do step #7 backwards. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: