Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint) - Mailing list pgsql-hackers

From: Craig Ringer
Subject: Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)
Msg-id: CAMsr+YE8_XgRzkJm=e_z=t9E=_wLZBHBFioyMs5jqXVQnpjPXg@mail.gmail.com
In response to: Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint) (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 10 March 2017 at 02:55, Robert Haas <robertmhaas@gmail.com> wrote:

> Well, that's why I tried to advocate that there should be two separate
> XID limits in shared memory: leave what's there now the way it is, and
> then add a new limit for "don't try to look up XIDs before this point:
> splat". I still think that'd be less risky.

I'm coming back to this "cold" after an extended break, so I hope I get the details right.

TL;DR: doing it that way duplicates a bunch of stuff and is ugly, without offering a significant benefit over just fixing the original.

I started out with the approach you suggested, but it turned out to be less helpful than you'd think. Simply advancing a new lower-limit field before beginning truncation is no help: there's a race where the lower-limit field can be advanced after we test it but before we actually do the SLRU read from clog. To guard against this, SLRU truncation has to take an exclusive lock while it advances the lower bound. A concurrent reader can then take the lock in shared mode before reading the lower bound and hold it until it finishes the clog read, knowing that vacuum cannot truncate the data out from under it, because vacuum will block trying to advance the lower limit.

A spinlock isn't suitable for this. While vac_truncate_clog could take the lock only briefly to update the limit field, txid_status() has to hold it from when it tests the boundary through to when it finishes the SLRU clog lookup, and that lookup does I/O and might sleep. If we release the lock after testing the lower bound but before the SLRU lookup, the race comes back, since vacuum can jump in and truncate the data out from under us. So we need a new LWLock that vac_truncate_clog acquires before advancing the clog truncation point.

I implemented just that: a new ClogTruncateLock in the lwlocks array and a new field in ShmemVariableCache for the lower xid bound, IIRC.
Other than requiring an extra lwlock acquisition in vac_truncate_clog, it works fine ... for the master. But it doesn't fix the much bigger race on the standby. We only emit WAL for xid limits after we truncate clog, and the clog truncation record doesn't record the new limit. So now we need a new, somewhat redundant, xlog record and redo function for this lower clog bound pointer, which is really just tracking a slightly more up-to-date version of oldestXid.

At that point I was just papering over a race that should be fixed at its source instead: advance oldestXid before truncating clog, and write a clog truncation record that includes the new oldestXid.

So... I can go back to the old approach and add the new xlog record and redo method, new lwlock, new ShmemVariableCache field, etc., if you're really concerned this approach is risky. But I'd rather fix the original problem.

It might be helpful if I separate out the part of the patch that touches oldestXid for separate review, so I'll follow up here with a 2-patch series instead of a squashed single patch soon.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services