Re: First draft of snapshot snapshot building design document - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: First draft of snapshot snapshot building design document |
Date | |
Msg-id | CA+TgmoZXkCo5FAbU=3JHuXXF0Op2SLhGJcVuFM3tkmcBnmhBMQ@mail.gmail.com |
In response to | First draft of snapshot snapshot building design document (Andres Freund <andres@2ndquadrant.com>) |
Responses | Re: First draft of snapshot snapshot building design document |
List | pgsql-hackers |
On Tue, Oct 16, 2012 at 7:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Thursday, October 11, 2012 01:02:26 AM Peter Geoghegan wrote:
>> The design document [2] really just explains the problem (which is the
>> need for catalog metadata at a point in time to make sense of heap
>> tuples), without describing the solution that this patch offers with
>> any degree of detail. Rather, [2] says "How we build snapshots is
>> somewhat intricate and complicated and seems to be out of scope for
>> this document", which is unsatisfactory. I look forward to reading the
>> promised document that describes this mechanism in more detail.
>
> Here's the first version of the promised document. I hope it answers most of
> the questions.
>
> Input welcome!

I haven't grokked all of this in its entirety, but I'm kind of uncomfortable with the relfilenode -> OID mapping stuff. I'm wondering if we should, when logical replication is enabled, find a way to cram the table OID into the XLOG record. It seems like that would simplify things.

If we don't choose to do that, it's worth noting that you actually need 16 bytes of data to generate a unique identifier for a relation, as in database OID + tablespace OID + relfilenode# + backend ID. Backend ID might be ignorable because WAL-based logical replication is going to ignore temporary relations anyway, but you definitely need the other two. There's nothing, for example, to keep you from having two relations with the same value in pg_class.relfilenode in the same database but in different tablespaces. It's unlikely to happen, because for new relations we set OID = relfilenode, but a subsequent rewrite can bring it about if the stars align just right. (Such situations are, of course, a breeding ground for bugs, which might make you question whether our current scheme for assigning relfilenodes has much of anything to recommend it.)
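
To make the 16-byte point concrete, here is a minimal sketch of such an identifier. The names `RelKey`, `relkey_same_relfilenode` and `relkey_same_relation` are made up for illustration and are not PostgreSQL's own definitions. It only shows that two relations can share a relfilenode value while differing in tablespace, so a lookup keyed on relfilenode alone is ambiguous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 4-byte OID. */
typedef uint32_t Oid;

/* Hypothetical composite key: four 4-byte fields, 16 bytes in total. */
typedef struct RelKey
{
    Oid     database;     /* database OID */
    Oid     tablespace;   /* tablespace OID */
    Oid     relfilenode;  /* value stored in pg_class.relfilenode */
    int32_t backend;      /* backend ID, relevant only for temporary relations */
} RelKey;

_Static_assert(sizeof(RelKey) == 16, "RelKey is expected to occupy 16 bytes");

/* Lossy comparison: looks at relfilenode only. */
bool
relkey_same_relfilenode(const RelKey *a, const RelKey *b)
{
    assert(a != NULL && b != NULL);
    return a->relfilenode == b->relfilenode;
}

/*
 * Comparison on database, tablespace and relfilenode. The backend ID is
 * deliberately ignored, on the assumption that temporary relations are
 * skipped by logical replication anyway.
 */
bool
relkey_same_relation(const RelKey *a, const RelKey *b)
{
    assert(a != NULL && b != NULL);
    return a->database == b->database
        && a->tablespace == b->tablespace
        && a->relfilenode == b->relfilenode;
}
```

With two keys that differ only in `tablespace`, `relkey_same_relfilenode` reports a match while `relkey_same_relation` does not, which is the collision described above.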

Another thing to think about is that, like catalog snapshots, relfilenode mappings have to be time-relativized; that is, you need to know what the mapping was at the proper point in the WAL sequence, not what it is now. In practice, the risk here seems to be minimal, because it takes a while to churn through 4 billion OIDs. However, I suspect it pays to think about this fairly carefully because if we do ever run into a situation where the OID counter wraps during a time period comparable to the replication lag, the bugs will be extremely difficult to debug.

Anyhow, adding the table OID to the WAL header would chew up a few more bytes of WAL space, but it seems like it might be worth it to avoid having to think very hard about all of these issues.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
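
As an illustration of what a time-relativized mapping means, the sketch below keeps a history of (relfilenode, table OID, starting WAL position) entries and resolves a relfilenode as of a given position rather than as of now. Everything here is an assumption made for the example: the names `MappingEntry`, `Lsn`, `INVALID_OID` and `lookup_table_oid_at` are hypothetical, the key is shortened to relfilenode alone for brevity (a real key would also need the database and tablespace OIDs), and a linear scan stands in for whatever structure an implementation would actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;   /* hypothetical stand-in for a 4-byte OID */
typedef uint64_t Lsn;   /* hypothetical WAL position */

#define INVALID_OID ((Oid) 0)

/* One historical fact: from valid_from onward, relfilenode maps to table_oid. */
typedef struct MappingEntry
{
    Oid relfilenode;
    Oid table_oid;
    Lsn valid_from;
} MappingEntry;

/*
 * Return the table OID that relfilenode mapped to at WAL position lsn,
 * i.e. the matching entry with the greatest valid_from that is <= lsn.
 * Returns INVALID_OID when no entry applies at that position.
 */
Oid
lookup_table_oid_at(const MappingEntry *history, size_t n,
                    Oid relfilenode, Lsn lsn)
{
    bool found = false;
    Lsn  best_from = 0;
    Oid  result = INVALID_OID;

    assert(history != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        const MappingEntry *e = &history[i];

        if (e->relfilenode != relfilenode || e->valid_from > lsn)
            continue;           /* wrong relation, or not yet in effect */

        if (!found || e->valid_from >= best_from)
        {
            found = true;
            best_from = e->valid_from;
            result = e->table_oid;
        }
    }
    return result;
}
```

If relfilenode 16384 belonged to one table from position 100 and was reassigned to another at position 500, a record decoded at position 499 resolves to the first table and one at position 500 to the second. Consulting only the current mapping would attribute both to the second table.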