Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader |
Date | |
Msg-id | 201206142338.33897.andres@2ndquadrant.com |
In response to | Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader |
List | pgsql-hackers |
On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
> On 13.06.2012 14:28, Andres Freund wrote:
> > Features:
> > - streaming reading/writing
> > - filtering
> > - reassembly of records
> >
> > Reusing the ReadRecord infrastructure in situations where the code that
> > wants to do so is not tightly integrated into xlog.c is rather hard and
> > would require changes to rather integral parts of the recovery code
> > which doesn't seem to be a good idea.
> It would be nice refactor ReadRecord and its subroutines out of xlog.c.
> That file has grown over the years to be really huge, and separating the
> code to read WAL sounds like it should be a pretty natural split. I
> don't want to duplicate all the WAL reading code, so we really should
> find a way to reuse that. I'd suggest rewriting ReadRecord into a thin
> wrapper that just calls the new xlogreader code.

I agree that it is not very nice to duplicate it. But I also don't want to go
the route of replacing ReadRecord with it for a while; we can replace
ReadRecord later if we want. As long as it is in flux like it is right now, I
don't really see the point in investing energy in that.
Also, I am not sure how a callback-oriented API fits into the xlog.c
workflow.

> > Missing:
> > - "compressing" the stream when removing uninteresting records
> > - writing out correct CRCs
> > - validating CRCs
> > - separating reader/writer
> > - comments.
> At a quick glance, I couldn't figure out how this works. There seems to
> be some callback functions? If you want to read an xlog stream using
> this facility, what do you do?

You currently have to fill out 4 callbacks:

    XLogReaderStateInterestingCB is_record_interesting;
    XLogReaderStateWriteoutCB writeout_data;
    XLogReaderStateFinishedRecordCB finished_record;
    XLogReaderStateReadPageCB read_page;

As an example of how to use it (from the walsender support for
START_LOGICAL_REPLICATION):

    if (!xlogreader_state)
    {
        xlogreader_state = XLogReaderAllocate();
        xlogreader_state->is_record_interesting = RecordRelevantForLogicalReplication;
        xlogreader_state->finished_record = ProcessRecord;
        xlogreader_state->writeout_data = WriteoutData;
        xlogreader_state->read_page = XLogReadPage;

        /* startptr is the current XLog position */
        xlogreader_state->startptr = startptr;

        XLogReaderReset(xlogreader_state);
    }

    /* how far does valid data go */
    xlogreader_state->endptr = endptr;

    XLogReaderRead(xlogreader_state);

The last step will then call the above callbacks until it reaches endptr. I.e.
it first reads a page with "read_page", then checks whether a record is
interesting for the use case ("is_record_interesting"); if it is interesting,
it gets reassembled and passed to the "finished_record" callback. Then the
bytestream gets written out again with "writeout_data".

In this case it gets written to the buffer the walsender has allocated. In
others it might just get thrown away.

> Can this be used for writing WAL, as well as reading? If so, what do you
> need the write support for?

It currently can replace records which are not interesting (e.g. index changes
in the case of logical rep). Filtered records are currently replaced with
XLOG_NOOP records of the correct length. In the future the actual amount of
data should really be reduced. I don't yet know how to map LSNs of the
uncompressed/compressed streams onto each other...
The filtered data is then passed to a writeout callback (in a streaming
fashion).

The whole writing-out part is pretty ugly at the moment; I just bolted it
on top because it was convenient for the moment.
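
To make the callback flow described above concrete, here is a minimal, self-contained sketch in C. It is not the code from the patch: every Toy* name, the record layout, the two-records-per-page rule and the '.' filler bytes are assumptions made up for illustration. Only the four callback roles and the idea of replacing filtered records with a same-length placeholder are taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All Toy* names are hypothetical stand-ins, not types from the patch. */
typedef struct ToyRecord
{
    int  is_index_change;   /* 1 = "uninteresting" for this consumer */
    char payload[8];        /* NUL-terminated record body */
} ToyRecord;

typedef struct ToyReader ToyReader;

struct ToyReader
{
    /* the four callbacks, mirroring the roles described in the mail */
    int    (*is_record_interesting)(ToyReader *r, const ToyRecord *rec);
    void   (*finished_record)(ToyReader *r, const ToyRecord *rec);
    void   (*writeout_data)(ToyReader *r, const char *data, size_t len);
    size_t (*read_page)(ToyReader *r, size_t pos, const ToyRecord **page);

    size_t startptr;        /* next record index to read */
    size_t endptr;          /* read up to, but not including, this index */
    int    finished_count;  /* records handed to finished_record */
    char   out[64];         /* stand-in for the consumer's output buffer */
    size_t outlen;
};

/* A tiny in-memory "log"; a "page" here is at most two records. */
static const ToyRecord toy_log[] = { {0, "ins"}, {1, "idx"}, {0, "upd"} };
#define TOY_LOG_LEN (sizeof toy_log / sizeof toy_log[0])

static size_t toy_read_page(ToyReader *r, size_t pos, const ToyRecord **page)
{
    (void) r;
    if (pos >= TOY_LOG_LEN)
        return 0;
    *page = &toy_log[pos];
    return (TOY_LOG_LEN - pos < 2) ? TOY_LOG_LEN - pos : 2;
}

static int toy_interesting(ToyReader *r, const ToyRecord *rec)
{
    (void) r;
    return !rec->is_index_change;
}

static void toy_finished(ToyReader *r, const ToyRecord *rec)
{
    (void) rec;
    r->finished_count++;
}

static void toy_writeout(ToyReader *r, const char *data, size_t len)
{
    if (r->outlen + len > sizeof r->out)    /* never overrun the buffer */
        len = sizeof r->out - r->outlen;
    memcpy(r->out + r->outlen, data, len);
    r->outlen += len;
}

/* Driver loop: read a page, filter each record, hand over, write out. */
static void toy_reader_read(ToyReader *r)
{
    size_t pos = r->startptr;

    assert(r->read_page && r->is_record_interesting &&
           r->finished_record && r->writeout_data);

    while (pos < r->endptr)
    {
        const ToyRecord *page;
        size_t n = r->read_page(r, pos, &page);

        if (n == 0)
            break;
        for (size_t i = 0; i < n && pos < r->endptr; i++, pos++)
        {
            const ToyRecord *rec = &page[i];
            size_t len = strlen(rec->payload);

            if (r->is_record_interesting(r, rec))
            {
                r->finished_record(r, rec);
                r->writeout_data(r, rec->payload, len);
            }
            else
            {
                /* filtered: emit a placeholder of the same length */
                char noop[sizeof rec->payload];

                memset(noop, '.', len);
                r->writeout_data(r, noop, len);
            }
        }
    }
    r->startptr = pos;
}

/* Wire up the callbacks and run over the whole toy log. */
ToyReader toy_demo(void)
{
    ToyReader r = {0};

    r.is_record_interesting = toy_interesting;
    r.finished_record = toy_finished;
    r.writeout_data = toy_writeout;
    r.read_page = toy_read_page;
    r.startptr = 0;
    r.endptr = TOY_LOG_LEN;
    toy_reader_read(&r);
    return r;
}
```

Calling toy_demo() runs the loop once over the three toy records. Because the placeholder has the same length as the record it replaces, positions in the output stream still line up with positions in the input, which is the property the same-length XLOG_NOOP replacement preserves and which a compressing writer would lose.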
I am not yet sure how the API for the writing-out part should look...

Andres

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services