Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
Date	June 20, 2012 16:16:01
Msg-id	201206202115.26350.andres@2ndquadrant.com
In response to	Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses	Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
List	pgsql-hackers

Hi,

On Wednesday, June 20, 2012 08:32:53 PM Heikki Linnakangas wrote:
> On 20.06.2012 17:35, Simon Riggs wrote:
> > On 20 June 2012 16:23, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> >> On 20.06.2012 11:17, Simon Riggs wrote:
> >>> On 20 June 2012 15:45, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> >>>> So, if the origin id is not sufficient for some conflict resolution
> >>>> mechanisms, what extra information do you need for those, and where do
> >>>> you put it?
> >>> 
> >>> As explained elsewhere, wal_level = logical (or similar) would be used
> >>> to provide any additional logical information required.
> >>> 
> >>> Update and Delete WAL records already need to be different in that
> >>> mode, so additional info would be placed there, if there were any.
> >>> 
> >>> In the case of reflexive updates you raised, a typical response in
> >>> other DBMS would be to represent the query
> >>> 
> >>>    UPDATE SET counter = counter + 1
> >>> 
> >>> by sending just the "+1" part, not the current value of counter, as
> >>> would be the case with the non-reflexive update
> >>> 
> >>>    UPDATE SET counter = 1
> >>> 
> >>> Handling such things in Postgres would require some subtlety, which
> >>> would not be resolved in first release but is pretty certain not to
> >>> require any changes to the WAL record header as a way of resolving it.
> >>> Having already thought about it, I'd estimate that is a very long
> >>> discussion and not relevant to the OT, but if you wish to have it
> >>> here, I won't stop you.
> >> 
> >> Yeah, I'd like to hear briefly how you would handle that without any
> >> further changes to the WAL record header.
> > 
> > I already did:
> >>> Update and Delete WAL records already need to be different in that
> >>> mode, so additional info would be placed there, if there were any.
> > 
> > The case you mentioned relates to UPDATEs only, so I would suggest
> > that we add that information to a new form of update record only.
> > 
> > That has nothing to do with the WAL record header.
> 
> Hmm, so you need the origin id in the WAL record header to do filtering.
> Except when that's not enough, you add some more fields to heap update
> and delete records.
IMO the whole "+1" business has nothing to do with the origin_id proposal 
and should be set aside for quite a while. We might move toward something like 
it in the future, but it's nothing we are working on (as far as I know ;)).

wal_level=logical (in patch 07) currently changes only the following things 
about the WAL stream:

For HEAP_(INSERT|(HOT_)?UPDATE|DELETE)
* prevent full-page writes from removing the row data (this could be optimized 
at some point to store just the tuple slot)

For HEAP_DELETE
* add the primary key of the changed row

HEAP_MULTI_INSERT obviously needs to get the same treatment in future.

The only real addition that I foresee in the near future is logging the old 
primary key when the primary key changes in HEAP_UPDATE.

Kevin wants an option for full pre-images of rows in HEAP_(UPDATE|DELETE).

> Don't you think it would be simpler to only add the extra fields to heap
> insert, update and delete records, and leave the WAL record header
> alone? Do you ever need extra information on other record types?
It's needed in several more places: HEAP_HOT_UPDATE, HEAP2_MULTI_INSERT, 
HEAP_NEWPAGE, HEAP_XACT_(ASSIGN, COMMIT, COMMIT_PREPARED, COMMIT_COMPACT, 
ABORT, ABORT_PREPARED) and probably some I don't remember right now.

Sure, we could add it to all of those, but then you need specific knowledge 
about *all* of them, because the location where it's stored will be different 
for each one.
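
To make the difference concrete, here is a minimal sketch, assuming a heavily simplified record model: WalRecord, the body field names and both lookup functions are hypothetical illustrations, not PostgreSQL's actual WAL structs. With the origin in the header, a decoder needs one lookup for every record type; with the origin stored per record type, it needs a table entry for each type and breaks on any type nobody taught it about.

```python
from dataclasses import dataclass, field


@dataclass
class WalRecord:
    """Hypothetical, heavily simplified WAL record (not PostgreSQL's real layout)."""
    record_type: str          # e.g. "HEAP_INSERT", "XACT_COMMIT"
    origin_id: int = 0        # header-level origin, as in the origin_id proposal
    body: dict = field(default_factory=dict)  # record-type-specific data


def origin_from_header(rec: WalRecord) -> int:
    # Header design: one lookup that works for every record type.
    return rec.origin_id


# Per-record design: each record type keeps its origin in its own body at a
# type-specific location, so the decoder needs one entry per type.
# These body field names are invented for illustration only.
BODY_ORIGIN_FIELD = {
    "HEAP_INSERT": "insert_origin",
    "HEAP_UPDATE": "update_origin",
    "HEAP_HOT_UPDATE": "hot_update_origin",
    "HEAP_DELETE": "delete_origin",
    "HEAP2_MULTI_INSERT": "multi_insert_origin",
    "XACT_COMMIT": "commit_origin",
}


def origin_from_body(rec: WalRecord) -> int:
    field_name = BODY_ORIGIN_FIELD.get(rec.record_type)
    if field_name is None:
        # Any record type the decoder was never taught about fails here.
        raise ValueError(f"no known origin location for {rec.record_type}")
    return rec.body[field_name]
```

An abort record, for example, works with origin_from_header() but raises in origin_from_body() until someone remembers to add it to the table.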

To recap why we think origin_id is a sensible design choice:

There are many sensible replication topologies in which you want to receive 
changes (on node C) from one node (say B) that originated on some other node 
(say A).
Reasons include:
* the order in which changes are applied should be as similar as possible on 
all nodes. That means that when applying a change on C that originated on B, 
if changes replicated faster from A->B than from A->C, you want C to be at 
least as far along in replicating from A as B was. Otherwise the conflict rate 
will increase. If you can recreate the stream from the WAL of every node and 
still detect where each individual change originated, that's easy.
* the interconnects between some nodes may be more expensive than between others
* an interconnect between two nodes may fail while others don't
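
The ordering argument in the first point can be sketched as follows, assuming (hypothetically) that every decoded change carries its origin node and its position (LSN) in that node's own WAL, and that changes from any one origin appear in increasing LSN order. Change and apply_stream() are illustrative names, not part of the patch series. Because B's stream also contains the A-origin changes B had applied, C picks up any it is missing before it reaches B's own change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical decoded change: origin node plus position in that node's WAL."""
    origin: str
    origin_lsn: int
    description: str


def apply_stream(stream, applied):
    """Apply one upstream node's full stream (local and forwarded changes).

    `applied` maps origin -> highest origin_lsn already applied on this node.
    Assumes changes from any one origin arrive in increasing origin_lsn order.
    Returns the changes that were actually applied, in order.
    """
    newly_applied = []
    for ch in stream:
        if ch.origin_lsn <= applied.get(ch.origin, 0):
            continue  # already have it, e.g. received directly from the origin
        applied[ch.origin] = ch.origin_lsn
        newly_applied.append(ch)
    return newly_applied
```

If C has applied A's changes up to LSN 10 and B's stream is A@10, A@20, B@5, then C skips A@10 and applies A@20 before B@5, so it is as far along with A as B was when B made its change.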

Because of that we think it's sensible to be able to generate the full LCR 
stream with all changes, local and remote, on each individual node. If you can 
then filter on individual origin_ids, you can build complex replication 
topologies without much additional complexity.
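
A minimal sketch of such filtering, assuming each change in a node's full stream is tagged with an integer origin_id; Change and filter_by_origin() are hypothetical names used only for illustration. Node B (origin 2) forwards to C (origin 3) its own changes and those from A (origin 1), but not C's own changes, so nothing is echoed back to where it came from.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class Change:
    """Hypothetical decoded change tagged with the node it originated on."""
    origin_id: int
    payload: str


def filter_by_origin(stream: Iterable[Change], wanted: Set[int]) -> Iterator[Change]:
    """Yield only the changes whose origin_id is in `wanted`, preserving order."""
    for ch in stream:
        if ch.origin_id in wanted:
            yield ch


# Node B's full stream: forwarded from A (1), local on B (2), forwarded from C (3).
b_stream = [Change(1, "from A"), Change(2, "local on B"), Change(3, "from C")]
to_c = list(filter_by_origin(b_stream, {1, 2}))
```

Changing the topology then only means changing the `wanted` set per downstream peer, not the WAL format.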

> I'm not saying that we need to implement all possible conflict
> resolution algorithms right now - on the contrary I think conflict
> resolution belongs outside core - but if we're going to change the WAL
> record format to support such conflict resolution, we better make sure
> the foundation we provide for it is solid.
I think this already provides a lot. At some point we will probably want 
support for looking up on which node a given local xid originated and when it 
was originally executed. Querying that efficiently requires additional support, 
but we already have all the information needed.

There are some more complexities with consistently detecting conflicts between 
changes that happened within a very small time window on different nodes, but 
that's something for another day.

> BTW, one way to work around the lack of origin id in the WAL record
> header is to just add an origin-id column to the table, indicating the
> last node that updated the row. That would be a kludge, but I thought
> I'd mention it..
Yuck. The aim is to improve on what's done today ;)

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
