Re: finding changed blocks using WAL scanning - Mailing list pgsql-hackers
From | Stephen Frost
---|---
Subject | Re: finding changed blocks using WAL scanning
Date |
Msg-id | 20190420044238.GX6197@tamriel.snowman.net
In response to | Re: finding changed blocks using WAL scanning (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: finding changed blocks using WAL scanning
List | pgsql-hackers
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Apr 19, 2019 at 8:39 PM Stephen Frost <sfrost@snowman.net> wrote:
> > While I do think we should at least be thinking about the load caused
> > from scanning the WAL to generate a list of blocks that are changed, the
> > load I was more concerned with in the other thread is the effort
> > required to actually merge all of those changes together over a large
> > amount of WAL.  I'm also not saying that we couldn't have either of
> > those pieces done as a background worker, just that it'd be really nice
> > to have an external tool (or library) that can be used on an independent
> > system to do that work.
>
> Oh.  Well, I already explained my algorithm for doing that upthread,
> which I believe would be quite cheap.
>
> 1. When you generate the .modblock files, stick all the block
> references into a buffer.  qsort().  Dedup.  Write out in sorted
> order.

Having all of the block references in a sorted order does seem like it
would help, but would also make those potentially quite a bit larger
than necessary (I had some thoughts about making them smaller elsewhere
in this discussion).  That might be worth it though.  I suppose it might
also be possible to line up the bitmaps suggested elsewhere to do
essentially a BitmapOr of them to identify the blocks changed (while
effectively de-duping at the same time).

> 2. When you want to use a bunch of .modblock files, do the same thing
> MergeAppend does, or what merge-sort does when it does a merge pass.
> Read the first 1MB of each file (or whatever amount).  Repeatedly pull
> an item from whichever file has the lowest remaining value, using a
> binary heap.  When no buffered data remains for a particular file,
> read another chunk from that file.

Sure, this is essentially a MergeAppend/MergeSort+GroupAgg on top to get
the distinct set, if I'm following what you're suggesting here.

> If each .modblock file covers 1GB of WAL, you could merge the data from
> across 1TB of WAL using only 1GB of memory, and that's assuming you
> have a 1MB buffer for each .modblock file.  You probably don't need
> such a large buffer.  If you use, say, a 128kB buffer, you could merge
> the data from across 8TB of WAL using 1GB of memory.  And if you have
> 8TB of WAL and you can't spare 1GB for the task of computing which
> blocks need to be included in your incremental backup, it's time for a
> hardware upgrade.

How much additional work is it going to be to sort/dedup for each 1GB of
WAL though, along with the resulting size?  I'm specifically thinking
about some of the very high WAL-generation rate systems that I've seen,
with TBs of WAL per day.

I get that once you've got nicely sorted inputs that it isn't too bad to
generate a distinct set from that, but it seems like that moves the
problem to the sorting side then rather than eliminating it, and may
make other things less efficient, particularly when there's workloads
that have strings of modified buffers together and we could capture that
more efficiently.

Thanks,

Stephen
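
For concreteness, a minimal sketch in C of the first step Robert describes above: buffer the block references for one WAL range, qsort(), de-duplicate, and write them out in sorted order. The BlockRef layout, the comparator, and the on-disk format here are assumptions made up for illustration; they are not the actual .modblock format being proposed.

```c
/*
 * Illustrative sketch only: the BlockRef layout and the .modblock file
 * format below are assumptions, not the actual proposal.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct BlockRef
{
	uint32_t	spcOid;			/* tablespace OID */
	uint32_t	dbOid;			/* database OID */
	uint32_t	relNumber;		/* relation file number */
	uint32_t	forkNum;		/* fork number */
	uint32_t	blockNum;		/* block number within the fork */
} BlockRef;

/* Total order over block references: relation identity, then block. */
static int
blockref_cmp(const void *pa, const void *pb)
{
	const BlockRef *a = pa;
	const BlockRef *b = pb;

	if (a->spcOid != b->spcOid)
		return a->spcOid < b->spcOid ? -1 : 1;
	if (a->dbOid != b->dbOid)
		return a->dbOid < b->dbOid ? -1 : 1;
	if (a->relNumber != b->relNumber)
		return a->relNumber < b->relNumber ? -1 : 1;
	if (a->forkNum != b->forkNum)
		return a->forkNum < b->forkNum ? -1 : 1;
	if (a->blockNum != b->blockNum)
		return a->blockNum < b->blockNum ? -1 : 1;
	return 0;
}

/* Sort the buffered references, squeeze out duplicates, return new count. */
static size_t
sort_and_dedup(BlockRef *refs, size_t n)
{
	size_t		keep = 0;

	if (n == 0)
		return 0;

	qsort(refs, n, sizeof(BlockRef), blockref_cmp);
	for (size_t i = 1; i < n; i++)
	{
		if (blockref_cmp(&refs[i], &refs[keep]) != 0)
			refs[++keep] = refs[i];
	}
	return keep + 1;
}

/* Write the sorted, de-duplicated references covering one WAL range. */
static int
write_modblock_file(const char *path, BlockRef *refs, size_t n)
{
	FILE	   *fp = fopen(path, "wb");

	if (fp == NULL)
		return -1;

	n = sort_and_dedup(refs, n);
	if (fwrite(refs, sizeof(BlockRef), n, fp) != n)
	{
		fclose(fp);
		return -1;
	}
	return fclose(fp);
}
```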
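The BitmapOr idea mentioned in the message could be as simple as the following, assuming (purely for illustration) a fixed-size bitmap per relation fork with one bit per block; OR-ing the per-WAL-range bitmaps yields the union of changed blocks, with de-duplication falling out by construction.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustration only: assume each WAL range produced a fixed-size bitmap
 * for a given relation fork, one bit per block.  OR-ing them gives the
 * set of blocks modified across all ranges, de-duplicated for free.
 */
static void
bitmap_or(uint64_t *result, const uint64_t *range_bitmap, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		result[i] |= range_bitmap[i];
}
```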
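And a sketch of the second step, the merge pass: read each sorted .modblock file through a small fixed-size buffer and repeatedly emit the smallest remaining reference, suppressing duplicates to produce the distinct set. For brevity this picks the minimum with a linear scan over the sources rather than the binary heap described in the message; the per-file buffering and the GroupAgg-style duplicate suppression are the point. It reuses the hypothetical BlockRef and blockref_cmp() from the first sketch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Illustration only, building on the BlockRef and blockref_cmp()
 * definitions from the previous sketch.  Each source file is read
 * through a small fixed-size buffer, as in the 1MB/128kB arithmetic
 * above; the size chosen here is arbitrary.
 */
#define REFS_PER_BUFFER 6553	/* roughly 128kB at 20 bytes per entry */

typedef struct Source
{
	FILE	   *fp;
	BlockRef	buf[REFS_PER_BUFFER];
	size_t		nbuf;			/* entries currently buffered */
	size_t		pos;			/* next buffered entry to consume */
} Source;

/* Peek at a source's next reference, refilling its buffer as needed. */
static bool
source_peek(Source *s, BlockRef *out)
{
	if (s->pos == s->nbuf)
	{
		s->nbuf = fread(s->buf, sizeof(BlockRef), REFS_PER_BUFFER, s->fp);
		s->pos = 0;
		if (s->nbuf == 0)
			return false;		/* file exhausted */
	}
	*out = s->buf[s->pos];
	return true;
}

/*
 * Merge the sorted inputs into one stream of distinct block references.
 * A linear scan finds the smallest remaining entry; a binary heap, as
 * described in the message, would do the same in O(log n) per entry.
 */
static int
merge_distinct(FILE **files, int nfiles, FILE *out)
{
	Source	   *srcs = calloc(nfiles, sizeof(Source));
	BlockRef	last;
	bool		have_last = false;

	if (srcs == NULL)
		return -1;
	for (int i = 0; i < nfiles; i++)
		srcs[i].fp = files[i];

	for (;;)
	{
		int			best = -1;
		BlockRef	best_ref, cur;

		for (int i = 0; i < nfiles; i++)
		{
			if (source_peek(&srcs[i], &cur) &&
				(best < 0 || blockref_cmp(&cur, &best_ref) < 0))
			{
				best = i;
				best_ref = cur;
			}
		}
		if (best < 0)
			break;				/* every source is exhausted */

		srcs[best].pos++;		/* consume the winning entry */

		/* GroupAgg-style dedup: inputs are sorted, duplicates are adjacent. */
		if (have_last && blockref_cmp(&best_ref, &last) == 0)
			continue;
		last = best_ref;
		have_last = true;

		if (fwrite(&best_ref, sizeof(BlockRef), 1, out) != 1)
		{
			free(srcs);
			return -1;
		}
	}
	free(srcs);
	return 0;
}
```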