Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts - Mailing list pgsql-bugs
From | Alvaro Herrera |
---|---|
Subject | Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts |
Date | |
Msg-id | 20140619220425.GQ18688@eldon.alvh.no-ip.org Whole thread Raw |
In response to | Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts |
List | pgsql-bugs |
Bruce Momjian wrote: > I wasn't happy with having that delete code added there when we do > directory delete in the function above. I instead broke apart the > delete and copy code and called the delete code where needed, in the > attached patch. Makes sense, yeah. I didn't look closely enough to realize that the function that does the copying also does the rmtree part. I also now realize why the other case (upgrade from 9.3 to 9.4) does not have a bug: we are already deleting the files in that path. Bruce Momjian wrote: > > I think one reason is that not all upgrades see an issue here; for old > > clusters that haven't gone beyond the 0000 offset file, there is no > > problem. For clusters that have gone beyond 0000 but not by much, the > > file will be deleted during the first truncation. It only becomes a > > problem if the cluster is close enough to 2^31. Another thing to keep > > in consideration is that initdb initializes all databases' datminmxid to > > 1. If the old cluster was past the 2^31 point, it means the datminmxid > > doesn't move from 1 until the actual wraparound. > > OK, so the xid has to be beyond 2^31 during pg_upgrade to trigger a > problem? That might explain the rare reporting of this bug. What would > the test query look like so we can tell people when to remove the '0000' > files? Would we need to see the existence of '0000' and high-numbered > files? How high? What does a 2^31 file look like? I misspoke. I ran a few more upgrades, and then tried vacuuming all databases, which is when the truncate code is run. Say the original cluster had an oldestmulti of 10 million. If you just run VACUUM in the new cluster after the upgrade, the 0000 file is not deleted: it's not yet old enough in terms of multixact age. An error is not thrown, because we're still not attempting a truncate. But if you lower the vacuum_multixact_freeze_table_age to 10 million minus one, then we will try the deletion and that will raise the error. I think (didn't actually try) if you just let 150 million multixacts be generated, that's the first time you will get the error. Now if you run a VACUUM FREEZE after the upgrade, the file will be deleted with no error. I now think that the reason most people haven't hit the problem is that they don't generate enough multis after upgrading a database that had enough multis in the old database. This seems a bit curious > Also, is there a reason you didn't remove the 'members/0000' file in your > patch? I have removed it in my version. There's no point. That file is the starting point for new multis anyway, and it's compatible with the new format (because it's all zeroes). I guess you could get in trouble if you initdb the 9.3 database, generate a few multis there, and then drop all tables and use that modified cluster as a "new" cluster for pg_upgrade. I would question the sanity of a person doing something like that, however. Bruce Momjian wrote: > > OK, so the xid has to be beyond 2^31 during pg_upgrade to trigger a > > problem? That might explain the rare reporting of this bug. What would > > the test query look like so we can tell people when to remove the '0000' > > files? Would we need to see the existence of '0000' and high-numbered > > files? How high? What does a 2^31 file look like? > > Also, what would a legitimate 0000 file at wrap-around time look like? > Would there have to be an 'ffff' or 'ffffff' file? Since I was wrong, there is no point in further research here. Anyway the last file before wrapping around in pg_multixact/members is FFFF. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-bugs by date: