Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date:
Msg-id: CA+TgmoZ8AOC0h=j9WEkOV_FVZFW54Q00aVDXMZTJp9jeWZNv=Q@mail.gmail.com
In response to: Re: POC: Cleaning up orphaned files using undo logs (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: POC: Cleaning up orphaned files using undo logs
           Re: POC: Cleaning up orphaned files using undo logs
List: pgsql-hackers
On Wed, Jul 10, 2019 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> As of now, after we finish executing the rollback actions, the entry
> from the hash table is removed. Now, at a later time (when queues are
> full and we want to insert a new entry) when we access the queue entry
> (to check whether we can remove it) corresponding to the removed hash
> table entry, will it be safe to access it? The hash table entry might
> have been freed or would have been reused as some other entry by the
> time we try to access it.

Hmm, yeah, that's a problem. I think we could possibly fix this by having the binary heaps just store a FullTransactionId rather than a pointer to the RollbackHashEntry. Then, if you get a FullTransactionId from the binary heap, you just do a hash table lookup to find the RollbackHashEntry instead of accessing it directly. If it doesn't exist, then you can just discard the entry: it's for some old transaction that's no longer relevant.
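To make that concrete, the dequeue side might look something like the sketch below -- untested, with RollbackHashLookup() standing in for whatever the real lookup function ends up being called, and assuming for the moment that a FullTransactionId can be stored directly in a Datum (more on that below):

static RollbackHashEntry *
GetNextRollbackRequest(binaryheap *queue)
{
    /* Keep discarding stale queue entries until we find a live request. */
    while (!binaryheap_empty(queue))
    {
        FullTransactionId fxid;
        RollbackHashEntry *entry;

        /* Assumes a Datum is wide enough to hold a FullTransactionId. */
        fxid.value = DatumGetUInt64(binaryheap_remove_first(queue));

        /*
         * Re-look up the entry instead of dereferencing a stored pointer.
         * If it's gone, the rollback has already been processed, so the
         * queue entry is stale and can simply be thrown away.
         */
        entry = RollbackHashLookup(fxid);
        if (entry != NULL)
            return entry;
    }

    return NULL;                /* no requests pending */
}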
However, there are a few problems with that idea. One is that I see that you've made the hash table keyed by full_xid + start_urec_ptr rather than just full_xid, so if the queues just store an XID, it's not enough to find the hash table entry. The comment claims that this is necessary because "in the same transaction, there could be rollback requests for both logged and unlogged relations," but I don't understand why that means we need start_urec_ptr in the hash table key. It would seem more natural to me to have a single entry that covers both the logged and the unlogged undo for that transaction. (Incidentally, I don't think it's correct that RollbackHashEntry starts with FullTransactionId full_xid and UndoRecPtr start_urec_ptr declared separately; I think it should start with RollbackHashKey -- although if we change the key back to just a FullTransactionId, then we don't need to worry separately about fixing this issue.)

Another problem is that on a 64-bit system, we can pass a FullTransactionId by value, but on a 32-bit system we can't. That's awkward, because if we can't pass the XID by value, then we're back to needing a separately-allocated structure for the queue entries, which I was really hoping to avoid.

A second possible approach to this problem is to just reset all the binary heaps (using binaryheap_reset) whenever we insert a new entry into the hash table, and rebuild them the next time they're needed by reinserting all of the current entries in the hash table. That might be too inefficient, though. You can insert a bunch of things in a row without re-heaping, and you can dequeue a bunch of things in a row without re-heaping, but if inserts and dequeues alternate, you'll re-heap a lot. I don't know whether that costs enough to worry about; it might be fine.

A third possible approach is to allocate a separate array whose entries are reused, and to maintain a freelist of entries from that array. All the real data is stored in this array, and the binary heaps and hash table entries just point to it. When the freelist is empty, the next allocation scans all the binary heaps and removes any pointers to inactive entries; it then puts all inactive entries back onto the freelist. (There's a rough sketch of this scheme below.) This is more complex than the previous approach, and it doesn't totally avoid re-heaping, because removing pointers to inactive entries from the binary heaps will necessitate a re-heap on next access. However, if the total capacity of the data structures is large compared to the number of entries actually in use, which will usually be true, we'll have to re-heap much less often, because we only have to do it when the number of allocations exhausts *everything* on the freelist, rather than after every allocation.

A fourth possible approach is to enhance the simplehash mechanism to allow us to do cleanup when an item to which there might still be residual pointers is reused. We could allow some code supplied by the definer of an individual simplehash implementation to be executed inside SH_INSERT, just at the point where we're going to set an entry's status to SH_STATUS_IN_USE. What we'd do is add a flag to the structure indicating whether there might be deferred cleanup work for that entry. Maybe it would be called something like 'bool processed' and would be set when we process the undo work for that entry. If, when we're about to reuse an entry, that flag is set, then we go scan all the binary heaps and remove all entries for which that flag is set, and then we unset the flag for all of those entries. Like the previous approach, this is basically a refinement of the second approach in that it tries to avoid re-heaping too often. Here, instead of re-heaping once we've been through the entire freelist, we'll re-heap when we (more or less randomly) happen to reuse a hash table entry that has been used before, but we avoid it when we happen to snag a hash table entry that hasn't been used recently. This is probably less efficient at avoiding re-heaping than the previous approach, but it avoids a separately-allocated data structure, which is nice.
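If simplehash grew such a hook, the definer-supplied part might look about like this -- purely hypothetical, since no such extension point exists today, and RemoveProcessedFromHeaps() is an invented name:

/*
 * Called (hypothetically) from inside SH_INSERT, just before the
 * entry's status is set to SH_STATUS_IN_USE.
 */
static void
rollback_hash_on_reuse(RollbackHashEntry *entry)
{
    if (entry->processed)
    {
        /*
         * The binary heaps may still contain pointers to entries whose
         * undo work is already done.  Sweep every heap, remove each
         * pointer to an entry with processed = true, and then clear
         * those flags, so the next sweep doesn't happen until another
         * processed entry gets reused.
         */
        RemoveProcessedFromHeaps();
    }
}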
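And, going back to the third approach for a moment, here is roughly the freelist scheme I have in mind -- again just a sketch, with all of the names invented:

typedef struct RollbackRequest
{
    FullTransactionId full_xid;
    UndoRecPtr        start_urec_ptr;
    bool              active;    /* false once the undo work is done */
    struct RollbackRequest *next_free;
} RollbackRequest;

static RollbackRequest *freelist = NULL;

static RollbackRequest *
AllocRollbackRequest(void)
{
    RollbackRequest *req;

    if (freelist == NULL)
    {
        /*
         * Freelist exhausted.  Scan every binary heap and remove the
         * pointers to inactive entries -- this is what forces a re-heap
         * on next access -- and then push all inactive entries back
         * onto the freelist.  (This assumes at least one entry is
         * inactive; real code would need to handle the queues being
         * genuinely full.)
         */
        PurgeInactiveEntriesFromHeaps();
        PushInactiveEntriesOntoFreelist();
    }

    req = freelist;
    freelist = req->next_free;
    req->active = true;
    return req;
}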
Broadly, you are correct to point out that you need to avoid chasing stale pointers, and there are a bunch of ways to accomplish that: approach #1 avoids using real pointers, and the rest just make sure that any stale pointers don't stick around long enough to cause any harm. There are probably also several other totally realistic alternatives, and I don't know for sure what is best, or how much it matters.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company