Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs
From: Thomas Munro
Subject: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date:
Msg-id: CAEepm=2AUwgy0dZMAXsQZPiRYAqW7x1k0kUbd5nZYUjCbthzQw@mail.gmail.com
In response to: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
List: pgsql-bugs
On Tue, Apr 21, 2015 at 12:25 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Hi Alvaro
>
> On Tue, Apr 21, 2015 at 7:04 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Here's a patch. I have tested locally and it closes the issue for me.
>> If those affected can confirm that it stops the file removal from
>> happening, I'd appreciate it.
>
> I was also starting to look at this problem. For what it's worth,
> here's a client program that I used to generate a lot of multixact
> members. The patch seems to work correctly so far: as the offset
> approached wraparound, I saw the warnings first, with the appropriate
> OID and number of members remaining, and then I was blocked from
> creating new multixacts.

One thing I noticed about your patch is that it effectively halves the
number of multixact members you can have on disk. Sure, I'd rather hit
an error at 2^31 members than a corrupt database at 2^32 members, but I
wondered if we should try to allow the full range to be used. I'm not
sure whether there is a valid use case for such massive amounts of
pg_multixact/members data (or at least one that won't go away if
autovacuum heuristics are changed in a later patch; I also understand
that there are other recent patches that reduce member traffic), but if
the plan is to backpatch this patch then I suppose it should ideally
not halve the amount of an important resource available in an existing
system when people do a point upgrade.

Here's a small patch (that applies after your patch) to show how this
could be done, using three-way comparisons with an explicit boundary to
detect wraparound. There may be other technical problems (for example,
MultiXactAdvanceNextMXact still uses MultiXactOffsetPrecedes), or this
may be a bad idea just because it breaks with the well-established
convention for wraparound detection used by xids.

Also, I wanted to make sure I could reproduce the original
bug/corruption in unpatched master with the client program I posted.
Here are my notes on doing that (sorry if they belabour the obvious;
partly this is just me learning how SLRUs and multixacts work...):

========

Member wraparound happens after segment file "14078": assuming the
default page size, you get 32 pages per segment and 1636 members per
page (409 groups of 4, plus some extra data), our max member offset
wraps after 0xffffffff, and 0xffffffff / 1636 / 32 = 82040 = 0x14078
(incidentally, that final segment is a shorter one).

Using my test client with 500 sessions and 35k loops I observed this:
it wrapped back around to writing to member file "0000" after creating
"14078", which is obviously broken, because the start of member segment
"0000" holds the members of multixact ID 1, which was still in play (it
was datminmxid for template0). Looking at the members of multixact ID 1
I see recent xids:

postgres=# select pg_get_multixact_members('1'::xid);
 pg_get_multixact_members
--------------------------
 (34238661,sh)
 (34238662,sh)
(2 rows)

Note that pg_get_multixact_members knows the correct *number* of
members for multixact ID 1; it's just that it's looking at members from
some much later multixact. By a tedious binary search I found it:

postgres=# select pg_get_multixact_members('17094780'::xid);
 pg_get_multixact_members
--------------------------
 ... snip ...
 (34238660,sh)
 (34238661,sh)   <-- here they are!
 (34238662,sh)   <--
 (34238663,sh)
 ... snip ...
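Stepping outside the notes for a moment: to make the "three-way
comparison with an explicit boundary" idea above more concrete, here is
a minimal standalone sketch of the kind of comparison I have in mind
(the names and exact shape are made up for illustration; the attached
patch is the actual proposal):

#include <stdio.h>
#include <stdint.h>

typedef uint32_t MultiXactOffset;

/*
 * Sketch: order two member offsets relative to an explicit boundary,
 * 'oldest' being the oldest member offset still needed anywhere in
 * the cluster.  Rebasing both offsets so that 'oldest' maps to zero
 * gives a total order over the full 2^32 range -- unlike the
 * xid-style modular rule, which can only order offsets that are less
 * than 2^31 apart.
 */
static int
MultiXactOffsetCompare(MultiXactOffset a, MultiXactOffset b,
                       MultiXactOffset oldest)
{
    MultiXactOffset ra = a - oldest;    /* unsigned wraparound intended */
    MultiXactOffset rb = b - oldest;

    return (ra < rb) ? -1 : (ra > rb) ? 1 : 0;
}

int
main(void)
{
    /*
     * With oldest = 0xF0000000, offset 5 (post-wrap) sorts after
     * 0xFFFFFFF0 (pre-wrap), so this prints 1.
     */
    printf("%d\n", MultiXactOffsetCompare(5, 0xFFFFFFF0, 0xF0000000));
    return 0;
}

The point being that nothing is sacrificed to a "half the range" rule,
so the full member space stays usable.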
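And since the segment arithmetic in these notes is easy to fumble,
here's a throwaway snippet (using the same assumed constants as above)
to double-check the "14078" figure:

#include <stdio.h>

/* Assumptions from the notes: default page size, 1636 members per page
 * (409 groups of 4 plus some extra data), 32 pages per member segment. */
#define MEMBERS_PER_PAGE   1636
#define PAGES_PER_SEGMENT  32

int
main(void)
{
    unsigned last_offset = 0xFFFFFFFFu; /* member offsets wrap after this */

    /* 0xFFFFFFFF / 1636 / 32 = 82040 = 0x14078: the short final segment */
    printf("last segment: %04X\n",
           last_offset / MEMBERS_PER_PAGE / PAGES_PER_SEGMENT);
    return 0;
}

It prints 14078, matching the last segment file I saw on disk before
the wrap.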
After a checkpoint, I saw that all the files got deleted except for a
few consecutively named files starting at "0000", which would be
correct behavior in general, if we hadn't allowed the member offset to
wrap. It had correctly kept the segments starting with the one holding
the members of multixact ID 1 (the cluster-wide oldest) up to the one
that corresponds to MultiXactState->nextOffset. My test program had
blown right past member offset 0xffffffff and back to 0 and then kept
going. The truncation code isn't the problem per se.

To produce the specific error message seen by the bug reporter via
normal interactions from a test program, I think we need some magic
that I can't figure out how to do yet: we need to run a query that
accesses a multixact with a member offset from before the offset
wraparound (eg 0xffffffff or similar), but whose members are not on a
page that is still in memory, after a checkpoint has unlinked the
segment file, so that it tries to load the page and discovers that the
segment file is missing! So, a pretty complex interaction of concurrent
processes, timing and caches. We can stimulate the error more
artificially by explicitly asking for multixact members like this,
though:

postgres=# select pg_get_multixact_members('10000000'::xid);
ERROR:  could not access status of transaction 10000000
DETAIL:  Could not open file "pg_multixact/members/BB55": No such file
or directory.

That's a totally valid multixact ID; obviously so, since it's been able
to figure out which segment to look in for its members. Here's one that
tries to open the segment that comes immediately before "0000" in
modulo numbering:

postgres=# select pg_get_multixact_members('17094770'::xid);
ERROR:  could not access status of transaction 17094770
DETAIL:  Could not open file "pg_multixact/members/14078": No such file
or directory.

If I try it with 17094779, the multixact ID immediately before the one
that has overwritten "0000", it does actually work, presumably because
its pages happen to be buffered for me so it doesn't try to open the
file (guessing here).

I don't currently believe it's necessary to reproduce that step via a
test program anyway; the root problem is clear enough just from
watching the thing wrap.

--
Thomas Munro
http://www.enterprisedb.com
Attachment