Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | CA+Tgmob_Zf8rsDoHY8U=1=yc49CsAZ862DnM920QDm_gsAdqJA@mail.gmail.com Whole thread Raw |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Amit Khandekar <amitdkhan.pg@gmail.com>) |
Responses |
Re: POC: Cleaning up orphaned files using undo logs
|
List | pgsql-hackers |
On Tue, Jul 23, 2019 at 10:42 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > I think, even though there might not be a correctness issue with the > current code as it stands, we should still use a local variable. > Updating MyUndoWorker is a big side-effect, which the caller is not > supposed to be aware of, because all that function should do is just > get the slot info. Absolutely right. It's just routine good practice to avoid using global variables when there is no compelling reason to do otherwise. The reason you state here is one of several good ones. > Yes, I also think that the function would error out only because of > can't-happen cases, like "too many locks taken" or "out of binary heap > slots" or "out of memory" (this last one is not such a can't happen > case). These cases happen probably due to some bugs, I suppose. But I > was wondering : Generally when the code errors out with such > can't-happen elog() calls, worst thing that happens is that the > transaction gets aborted. Whereas, in this case, the worst thing that > could happen is : the undo action would never get executed, which > means selects for this tuple will keep on accessing the undo log ? > This does not sound like any data consistency issue, so we should be > fine after all ? I don't think so. Every XID present in undo has to be something we can look up in CLOG to figure out which transactions are aborted and which transactions are committed, so that we know which transactions need undo. If we forget to undo the transaction, we can't discard it, which means we can't advance the CLOG transaction horizon, which means we'll eventually start failing to assign XIDs, leading to a refusal of all write transactions. Oops. More generally, it's not OK for the generic undo layer to make assumptions about whether the operations performed by the undo handlers are essential or not. We don't want to impose a design constraint the undo can only be used for things that are not actually critical, because that will make it hard to write AMs that use it. And there's no reason to live with such a design constraint anyway, because, as noted above, CLOG truncation requires it. More generally still, some can't-happen situations should be checked via Assert() and others via elog(). For example, consider some code that looks up a syscache tuple and pulls data from the returned tuple. If the code that handles DDL is written in such a way that the tuple should always exist, then this is a can't-happen situation, but generally the code checks this via elog(), not Assert(), because it could also happen due to the catalog contents being corrupted. If Assert() were used, the checks would not run in production builds, and a corrupt catalog would lead to a seg fault. An elog() is much friendlier. As a general principle, when a certain thing ought to always be true, but it being true depends on a whole lot of assumptions elsewhere in the code, and especially if it also depends on assumptions like "the database is not corrupted," I think elog() is preferable. Assert() is better for things that are more localized and that really can't go wrong for any reason other than a bug. In this case, I think I would tend towards elog(PANIC), but it's arguable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: