Re: First draft of snapshot snapshot building design document - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: First draft of snapshot snapshot building design document |
Date | |
Msg-id | CA+TgmoZ29jsb3yha_+Sshu=miJZNW4BizLTs7cckipkt=a7_7Q@mail.gmail.com |
In response to | Re: First draft of snapshot snapshot building design document (Andres Freund <andres@2ndquadrant.com>) |
Responses | Re: First draft of snapshot snapshot building design document |
List | pgsql-hackers |
On Thu, Oct 18, 2012 at 11:20 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Thursday, October 18, 2012 04:47:12 PM Robert Haas wrote:
>> On Tue, Oct 16, 2012 at 7:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > On Thursday, October 11, 2012 01:02:26 AM Peter Geoghegan wrote:
>> >> The design document [2] really just explains the problem (which is the
>> >> need for catalog metadata at a point in time to make sense of heap
>> >> tuples), without describing the solution that this patch offers with
>> >> any degree of detail. Rather, [2] says "How we build snapshots is
>> >> somewhat intricate and complicated and seems to be out of scope for
>> >> this document", which is unsatisfactory. I look forward to reading the
>> >> promised document that describes this mechanism in more detail.
>> >
>> > Here's the first version of the promised document. I hope it answers most
>> > of the questions.
>> >
>> > Input welcome!
>>
>> I haven't grokked all of this in its entirety, but I'm kind of
>> uncomfortable with the relfilenode -> OID mapping stuff. I'm
>> wondering if we should, when logical replication is enabled, find a
>> way to cram the table OID into the XLOG record. It seems like that
>> would simplify things.
>>
>> If we don't choose to do that, it's worth noting that you actually
>> need 16 bytes of data to generate a unique identifier for a relation,
>> as in database OID + tablespace OID + relfilenode# + backend ID.
>> Backend ID might be ignorable because WAL-based logical replication is
>> going to ignore temporary relations anyway, but you definitely need
>> the other two. ...
>
> Hm. I should take a look at the way temporary tables are represented. As you say
> it is not going to matter for WAL decoding, but still...
>
>> Another thing to think about is that, like catalog snapshots,
>> relfilenode mappings have to be time-relativized; that is, you need to
>> know what the mapping was at the proper point in the WAL sequence, not
>> what it is now. In practice, the risk here seems to be minimal,
>> because it takes a while to churn through 4 billion OIDs. However, I
>> suspect it pays to think about this fairly carefully because if we do
>> ever run into a situation where the OID counter wraps during a time
>> period comparable to the replication lag, the bugs will be extremely
>> difficult to debug.
>
> I think with rollbacks + restarts we might even be able to see the same
> relfilenode earlier.
>
>> Anyhow, adding the table OID to the WAL header would chew up a few
>> more bytes of WAL space, but it seems like it might be worth it to
>> avoid having to think very hard about all of these issues.
>
> I don't think it's necessary to change WAL logging here. The relfilenode mapping
> is now looked up using the timetravel snapshot we've built, using (spcNode,
> relNode) as the key, so the time-relativized lookup is "builtin". If we screw
> that up, way more is broken anyway.
>
> Two problems are left:
>
> 1) (reltablespace, relfilenode) is not unique in pg_class because InvalidOid is
> stored for relfilenode if it's a shared or nailed table. That's not a problem for
> the lookup because we've already checked the relmapper before that, so we never
> look those up anyway. But it violates documented requirements of syscache.c.
> Even after some looking I haven't found any problem that that could cause.
>
> 2) We need to decide whether a HEAP[1-2]_* record made catalog changes when
> building/updating snapshots. Unfortunately we also need to do this *before* we
> have built the first snapshot. For now, treating all tables as catalog modifying
> before we have built the snapshot seems to work fine.
> I think encoding the oid in the xlog header wouldn't help all that much here,
> because I am pretty sure we want to have the set of "catalog tables" to be
> extensible at some point...

I don't like catalog-only snapshots at all. I think that's just a
recipe for subtle or not-so-subtle breakage down the road...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
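
Editorial illustration: the two points argued in this message (a relation needs four 32-bit components, 16 bytes in total, to be identified uniquely, and a relfilenode mapping must be looked up as of a position in the WAL rather than as of now) can be made concrete with a small Python sketch. This is not PostgreSQL code and not the mechanism in the patch, which relies on a timetravel catalog snapshot. The names `relation_key` and `TimeRelativeMap`, the byte layout, the use of 0 as a "no backend" value, and the integer LSNs are all assumptions made for illustration.

```python
import bisect
import struct
from collections import defaultdict

# Hypothetical layout: four unsigned 32-bit fields, 16 bytes in total.
_KEY = struct.Struct("<IIII")


def relation_key(db_oid, spc_oid, relfilenode, backend_id=0):
    """Pack database OID, tablespace OID, relfilenode and backend ID into 16 bytes."""
    return _KEY.pack(db_oid, spc_oid, relfilenode, backend_id)


class TimeRelativeMap:
    """Toy relation-key -> table OID map that remembers when each mapping began."""

    def __init__(self):
        # key -> parallel lists, sorted by LSN, of start positions and table OIDs
        self._lsns = defaultdict(list)
        self._oids = defaultdict(list)

    def record(self, key, lsn, table_oid):
        """Note that `key` maps to `table_oid` from WAL position `lsn` onward."""
        i = bisect.bisect_right(self._lsns[key], lsn)
        self._lsns[key].insert(i, lsn)
        self._oids[key].insert(i, table_oid)

    def lookup(self, key, lsn):
        """Return the table OID in effect at `lsn`, or None if there was none yet."""
        i = bisect.bisect_right(self._lsns.get(key, []), lsn)
        return self._oids[key][i - 1] if i else None
```

If the same relfilenode is reused by a different table at a later WAL position, `lookup` with an earlier LSN still returns the earlier table's OID, which is the property a decoder lagging behind the current catalog state would need.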