Re: First draft of snapshot snapshot building design document - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: First draft of snapshot snapshot building design document |
Date | |
Msg-id | 201210181720.27616.andres@2ndquadrant.com |
In response to | Re: First draft of snapshot snapshot building design document (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: First draft of snapshot snapshot building design document |
List | pgsql-hackers |
On Thursday, October 18, 2012 04:47:12 PM Robert Haas wrote:
> On Tue, Oct 16, 2012 at 7:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On Thursday, October 11, 2012 01:02:26 AM Peter Geoghegan wrote:
> >> The design document [2] really just explains the problem (which is the
> >> need for catalog metadata at a point in time to make sense of heap
> >> tuples), without describing the solution that this patch offers with
> >> any degree of detail. Rather, [2] says "How we build snapshots is
> >> somewhat intricate and complicated and seems to be out of scope for
> >> this document", which is unsatisfactory. I look forward to reading the
> >> promised document that describes this mechanism in more detail.
> >
> > Here's the first version of the promised document. I hope it answers most
> > of the questions.
> >
> > Input welcome!
>
> I haven't grokked all of this in its entirety, but I'm kind of
> uncomfortable with the relfilenode -> OID mapping stuff. I'm
> wondering if we should, when logical replication is enabled, find a
> way to cram the table OID into the XLOG record. It seems like that
> would simplify things.
>
> If we don't choose to do that, it's worth noting that you actually
> need 16 bytes of data to generate a unique identifier for a relation,
> as in database OID + tablespace OID + relfilenode# + backend ID.
> Backend ID might be ignorable because WAL-based logical replication is
> going to ignore temporary relations anyway, but you definitely need
> the other two. ...

Hm. I should take a look at the way temporary tables are represented. As
you say, it is not going to matter for WAL decoding, but still...

> Another thing to think about is that, like catalog snapshots,
> relfilenode mappings have to be time-relativized; that is, you need to
> know what the mapping was at the proper point in the WAL sequence, not
> what it is now. In practice, the risk here seems to be minimal,
> because it takes a while to churn through 4 billion OIDs. However, I
> suspect it pays to think about this fairly carefully because if we do
> ever run into a situation where the OID counter wraps during a time
> period comparable to the replication lag, the bugs will be extremely
> difficult to debug.

I think with rollbacks + restarts we might even be able to see the same
relfilenode earlier.

> Anyhow, adding the table OID to the WAL header would chew up a few
> more bytes of WAL space, but it seems like it might be worth it to
> avoid having to think very hard about all of these issues.

I don't think it's necessary to change WAL logging here. The relfilenode
mapping is now looked up, using the timetravel snapshot we've built, with
(spcNode, relNode) as the key, so the time-relativized lookup is
"builtin". If we screw that up, much more is broken anyway.

Two problems are left:

1) (reltablespace, relfilenode) is not unique in pg_class because
InvalidOid is stored for relfilenode if it's a shared or nailed table.
That's not a problem for the lookup because we've already checked the
relmapper before that, so we never look those up anyway. But it violates
documented requirements of syscache.c. Even after some looking, I haven't
found any problem that this could cause.

2) We need to decide whether a HEAP[1-2]_* record made catalog changes
when building/updating snapshots. Unfortunately we also need to do this
*before* we have built the first snapshot. For now, treating all tables as
catalog-modifying until the first snapshot has been built seems to work
fine. I think encoding the OID in the xlog header wouldn't help all that
much here, because I am pretty sure we want the set of "catalog tables"
to be extensible at some point...

Greetings,

Andres

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
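
The lookup order described in the message (relmapper first, then pg_class keyed by (reltablespace, relfilenode)) can be illustrated with a small, self-contained C sketch. This is not code from the patch. The types ToyMapEntry and ToyClassRow and the function toy_filenode_to_oid are hypothetical names made up for illustration, the two arrays only stand in for the relmapper contents and for the pg_class rows visible under the timetravel snapshot, and the linear scans are a simplification of whatever cache or index the real lookup uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical stand-in for one relmapper entry (shared or nailed relation). */
typedef struct ToyMapEntry
{
    Oid relid;          /* relation OID */
    Oid relfilenode;    /* filenode currently assigned by the mapper */
} ToyMapEntry;

/* Hypothetical stand-in for the pg_class columns that matter here. */
typedef struct ToyClassRow
{
    Oid relid;          /* relation OID */
    Oid reltablespace;  /* tablespace part of the key */
    Oid relfilenode;    /* InvalidOid for mapped (shared/nailed) relations */
} ToyClassRow;

/*
 * Resolve (spcNode, relNode) to a relation OID, or InvalidOid if unknown.
 * "rows" is assumed to hold only the pg_class rows visible under the
 * snapshot in use, which is what makes the answer time-relative.
 */
Oid
toy_filenode_to_oid(const ToyMapEntry *map, size_t nmap,
                    const ToyClassRow *rows, size_t nrows,
                    Oid spcNode, Oid relNode)
{
    size_t i;

    /* A filenode taken from a WAL record is never InvalidOid. */
    assert(relNode != InvalidOid);

    /* Step 1: mapped relations are resolved through the mapper. */
    for (i = 0; i < nmap; i++)
    {
        if (map[i].relfilenode == relNode)
            return map[i].relid;
    }

    /* Step 2: everything else is resolved through pg_class. */
    for (i = 0; i < nrows; i++)
    {
        /* Mapped relations store InvalidOid here and can never match. */
        if (rows[i].relfilenode == InvalidOid)
            continue;
        if (rows[i].reltablespace == spcNode &&
            rows[i].relfilenode == relNode)
            return rows[i].relid;
    }

    return InvalidOid;
}
```

In this toy model the rows that carry InvalidOid as relfilenode are exactly the ones that make (reltablespace, relfilenode) non-unique, and they are never candidates in step 2, which matches the argument that the duplicate key does not affect the lookup itself.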