Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node |
Date | |
Msg-id | 201206202115.26350.andres@2ndquadrant.com |
In response to | Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node |
List | pgsql-hackers |
Hi,

On Wednesday, June 20, 2012 08:32:53 PM Heikki Linnakangas wrote:
> On 20.06.2012 17:35, Simon Riggs wrote:
> > On 20 June 2012 16:23, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> >> On 20.06.2012 11:17, Simon Riggs wrote:
> >>> On 20 June 2012 15:45, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> >>>> So, if the origin id is not sufficient for some conflict resolution
> >>>> mechanisms, what extra information do you need for those, and where do
> >>>> you put it?
> >>>
> >>> As explained elsewhere, wal_level = logical (or similar) would be used
> >>> to provide any additional logical information required.
> >>>
> >>> Update and Delete WAL records already need to be different in that
> >>> mode, so additional info would be placed there, if there were any.
> >>>
> >>> In the case of reflexive updates you raised, a typical response in
> >>> other DBMS would be to represent the query
> >>>
> >>> UPDATE SET counter = counter + 1
> >>>
> >>> by sending just the "+1" part, not the current value of counter, as
> >>> would be the case with the non-reflexive update
> >>>
> >>> UPDATE SET counter = 1
> >>>
> >>> Handling such things in Postgres would require some subtlety, which
> >>> would not be resolved in first release but is pretty certain not to
> >>> require any changes to the WAL record header as a way of resolving it.
> >>> Having already thought about it, I'd estimate that is a very long
> >>> discussion and not relevant to the OT, but if you wish to have it
> >>> here, I won't stop you.
> >>
> >> Yeah, I'd like to hear briefly how you would handle that without any
> >> further changes to the WAL record header.
> >
> > I already did:
> >>> Update and Delete WAL records already need to be different in that
> >>> mode, so additional info would be placed there, if there were any.
> >
> > The case you mentioned relates to UPDATEs only, so I would suggest
> > that we add that information to a new form of update record only.
> >
> > That has nothing to do with the WAL record header.
>
> Hmm, so you need the origin id in the WAL record header to do filtering.
> Except when that's not enough, you add some more fields to heap update
> and delete records.

Imo the whole +1 stuff doesn't have anything to do with the origin_id proposal
and should be ignored for quite a while. We might go to something like it
sometime in the future, but it's nothing we work on (as far as I know ;)).

wal_level=logical (in patch 07) currently only changes the following things
about the WAL stream:

For HEAP_(INSERT|(HOT_)?UPDATE|DELETE)
* prevent full page writes from removing the row data (could be optimized at
  some point to just store the tuple slot)

For HEAP_DELETE
* add the primary key of the changed row

HEAP2_MULTI_INSERT obviously needs to get the same treatment in future.

The only real addition that I foresee in the near future is logging the old
primary key when the primary key changes in HEAP_UPDATE.

Kevin wants an option for full pre-images of rows in HEAP_(UPDATE|DELETE).

> Don't you think it would be simpler to only add the extra fields to heap
> insert, update and delete records, and leave the WAL record header
> alone? Do you ever need extra information on other record types?

It's needed in some more locations: HEAP_HOT_UPDATE, HEAP2_MULTI_INSERT,
HEAP_NEWPAGE, HEAP_XACT_(ASSIGN, COMMIT, COMMIT_PREPARED, COMMIT_COMPACT,
ABORT, ABORT_PREPARED) and probably some I didn't remember right now.

Sure, we can add it to all those, but then you need to have individual
knowledge about *all* of those, because the location where it's stored will be
different for each of them.
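To illustrate the last point, here is a minimal Python sketch (a toy model only, not PostgreSQL's actual record layout). The `WalRecord` class, the payload dictionaries and both accessor functions are hypothetical names made up for the comparison. It contrasts reading the origin from one fixed header field with reading it from per-record-type payloads.

```python
from dataclasses import dataclass, field


@dataclass
class WalRecord:
    """Toy stand-in for a WAL record (hypothetical, not the real XLogRecord)."""
    record_type: str   # e.g. "HEAP_INSERT"
    origin_id: int     # assumed header field, same place for every record type
    payload: dict = field(default_factory=dict)


def origin_from_header(rec: WalRecord) -> int:
    # One lookup that works regardless of the record type.
    return rec.origin_id


# Alternative design: the origin lives inside each record type's payload.
# The layouts below are invented; the point is that every type needs its
# own accessor because the location differs.
PAYLOAD_ORIGIN_READERS = {
    "HEAP_INSERT": lambda p: p["insert"]["origin"],
    "HEAP_DELETE": lambda p: p["delete"]["origin"],
}


def origin_from_payload(rec: WalRecord) -> int:
    try:
        reader = PAYLOAD_ORIGIN_READERS[rec.record_type]
    except KeyError:
        # A record type nobody taught the reader about cannot be filtered.
        raise NotImplementedError(f"no origin accessor for {rec.record_type}")
    return reader(rec.payload)
```

With the header variant a filter never has to look at the record type; with the payload variant every new record type means another entry in the accessor table.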
To recap why we think origin_id is a sensible design choice:

There are many sensible replication topologies where it does make sense that
you want to receive changes (on node C) from one node (say B) that originated
from some other node (say A).
Reasons include:
* the order of applying changes should be as similar as possible on all nodes.
  That means that when applying a change on C that originated on B, and changes
  replicate faster from A->B than from A->C, you want to be at least as far
  along with the replication from A as B was. Otherwise the conflict ratio
  will increase. If you can recreate the stream from the WAL of every node and
  still detect where an individual change originated, that's easy.
* the interconnects between some nodes may be more expensive than between
  others
* an interconnect between two nodes may fail while others don't

Because of that we think it's sensible to be able to generate the full LCR
stream with all changes, local and remote ones, on each individual node. If you
can then filter on individual origin_ids, you can build complex replication
topologies without much additional complexity.

> I'm not saying that we need to implement all possible conflict
> resolution algorithms right now - on the contrary I think conflict
> resolution belongs outside core - but if we're going to change the WAL
> record format to support such conflict resolution, we better make sure
> the foundation we provide for it is solid.

I think this already provides a lot. At some point we probably want to have
support for looking up on which node a certain local xid originated and when
it was originally executed. While querying that efficiently requires
additional support, we already have all the information for that.

There are some more complexities with consistently determining conflicts on
changes that happened in a very small time window on different nodes, but
that's something for another day.
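As a rough illustration of the origin_id filtering from the recap above, here is a small Python sketch of the A/B/C topology. The `Change` type, the string node names and `filter_by_origin` are assumptions made up for the example; the patch series itself works on WAL records, not on objects like these.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class Change:
    """One logical change; origin_id names the node where it first ran."""
    origin_id: str     # hypothetical: real origin ids would be numeric
    lsn: int           # position in the stream of the node we read from
    description: str


def filter_by_origin(stream: Iterable[Change], wanted: Set[str]) -> Iterator[Change]:
    """Yield only changes whose origin is in `wanted`, keeping stream order."""
    for change in stream:
        if change.origin_id in wanted:
            yield change


# Full stream as it might be generated on node B: B's own changes plus the
# changes B already applied from A, in B's apply order.
stream_on_b = [
    Change("A", 1, "insert row 1"),
    Change("B", 2, "update row 1"),
    Change("A", 3, "delete row 7"),
    Change("B", 4, "insert row 9"),
]

# Node C has no usable link to A, so it takes A's and B's changes from B.
for_c = list(filter_by_origin(stream_on_b, {"A", "B"}))

# Node A only wants what originated on B; its own changes must not loop back.
for_a = list(filter_by_origin(stream_on_b, {"B"}))
```

Because C reads A's changes through B's stream, it sees them in the same order in which B applied them, which is the ordering property described in the first bullet.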
> BTW, one way to work around the lack of origin id in the WAL record
> header is to just add an origin-id column to the table, indicating the
> last node that updated the row. That would be a kludge, but I thought
> I'd mention it..

Yuck. The aim is to improve on what's done today ;)

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services