Re: Bug in point releases 9.3.6 and 9.2.10? - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Bug in point releases 9.3.6 and 9.2.10? |
Date | |
Msg-id | CAM3SWZQxsq7pHtPoyhmmZ8AhWnuxjgyOH-s5ANt-mF8WAF9eUg@mail.gmail.com |
In response to | Re: Bug in point releases 9.3.6 and 9.2.10? (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: Bug in point releases 9.3.6 and 9.2.10?
|
List | pgsql-hackers |
I'm in a hurry right now, so unfortunately I'll need to be brief.

On Thu, Mar 12, 2015 at 5:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-03-12 16:42:24 -0700, Peter Geoghegan wrote:
>> We want to create a new role when this happens, for various reasons.
>> This occurs after recovery ends, but before the database has been
>> "unfenced". The template code that generates various ALTER ROLE
>> statements in our internal provisioning system - which has apparently
>> worked just fine for a long time - is:
>
> Is this all the code that's executed after recovery? How are these
> forks brought up? Promoted how? Is it a common 'source' database?

We do PITR up to a recovery target. We're talking about the same issue
occurring on entirely distinct customer databases, with entirely
distinct major PG versions. I'm not sure what other code might have
already been run at this point, but it won't have been much. As I said,
the only common factor that I know of is that all affected databases
are on the latest point release.

> Have you looked at these files? Are they indeed zero bytes when this
> error occurs?

I think that they are indeed zero bytes. I looked at that last week,
before I considered that this might be a more widespread issue. I'll
check again later.

> Do you still have a base backup from the relevant time, so you could
> repeat the whole thing?

Yes.

>> The only common factor is that this occurs on the latest point
>> releases (either 9.3.6 or 9.2.10, at least so far). In all cases I've
>> seen so far, the relation in question is the pg_auth_members heap
>> relation. For example:
>
> Any chance that the new nodes also use a different kernel version or
> such?

They may differ, but that doesn't seem likely to be relevant, at least
to me. This has happened something like 6 or 7 times already, starting
late last week.
I am unfamiliar with this provisioning code, so, as I mentioned, offhand
I cannot be entirely sure that there isn't some other code run when the
problem originally arises (that I should have included in my report).
What I can tell you is that I saw the same error messages when I
manually ran the statements generated by the above code within a
transaction...until I ran "VACUUM FULL pg_auth_members;".

> This filenode got to be pg_auth_members' original one, given it's below
> FirstNormalObjectId. I get a lower value, but that's probably caused by
> having fewer collations and other data generated during initdb. That
> implies that the table hasn't ever been rewritten.
>
> What's 12811?

It's the same catalog, pg_auth_members. As I said, the messages you saw
are on entirely different customer databases, servers and (sometimes)
PG version.

--
Peter Geoghegan