Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: WIP: WAL prefetch (another approach) |
Date | |
Msg-id | CA+hUKGJ06d3h5JeOtAv4h52n0vG1jOPZxqMCn5FySJQUVZA32w@mail.gmail.com Whole thread Raw |
In response to | Re: WIP: WAL prefetch (another approach) (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: WIP: WAL prefetch (another approach)
|
List | pgsql-hackers |
Hi Alvaro, On Sat, Mar 14, 2020 at 10:15 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I tried my luck at a quick read of this patchset. Thanks! Here's a new patch set, and some inline responses to your feedback: > I didn't manage to go over 0005 though, but I agree with Tomas that > having this be configurable in terms of bytes of WAL is not very > user-friendly. The primary control is now maintenance_io_concurrency, which is basically what Tomas suggested. The byte-based control is just a cap to prevent it reading a crazy distance ahead, that also functions as the on/off switch for the feature. In this version I've added "max" to the name, to make that clearer. > First of all, let me join the crowd chanting that this is badly needed; > I don't need to repeat what Chittenden's talk showed. "WAL recovery is > now 10x-20x times faster" would be a good item for pg13 press release, > I think. We should be careful about over-promising here: Sean basically had a best case scenario for this type of techology, partly due to his 16kB filesystem blocks. Common results may be a lot more pedestrian, though it could get more interesting if we figure out how to get rid of FPWs... > > From a61b4e00c42ace5db1608e02165f89094bf86391 Mon Sep 17 00:00:00 2001 > > From: Thomas Munro <thomas.munro@gmail.com> > > Date: Tue, 3 Dec 2019 17:13:40 +1300 > > Subject: [PATCH 1/5] Allow PrefetchBuffer() to be called with a SMgrRelation. > > > > Previously a Relation was required, but it's annoying to have > > to create a "fake" one in recovery. > > LGTM. > > It's a pity to have to include smgr.h in bufmgr.h. Maybe it'd be sane > to use a forward struct declaration and "struct SMgrRelation *" instead. OK, done. While staring at this, I decided that SharedPrefetchBuffer() was a weird word order, so I changed it to PrefetchSharedBuffer(). Then, by analogy, I figured I should also change the pre-existing function LocalPrefetchBuffer() to PrefetchLocalBuffer(). Do you think this is an improvement? > > From acbff1444d0acce71b0218ce083df03992af1581 Mon Sep 17 00:00:00 2001 > > From: Thomas Munro <tmunro@postgresql.org> > > Date: Mon, 9 Dec 2019 17:10:17 +1300 > > Subject: [PATCH 2/5] Rename GetWalRcvWriteRecPtr() to GetWalRcvFlushRecPtr(). > > > > The new name better reflects the fact that the value it returns > > is updated only when received data has been flushed to disk. > > > > An upcoming patch will make use of the latest data that was > > written without waiting for it to be flushed, so use more > > precise function names. > > Ugh. (Not for your patch -- I mean for the existing naming convention). > It would make sense to rename WalRcvData->receivedUpto in this commit, > maybe to flushedUpto. Ok, I renamed that variable and a related one. There are more things you could rename if you pull on that thread some more, including pg_stat_wal_receiver's received_lsn column, but I didn't do that in this patch. > > From d7fa7d82c5f68d0cccf441ce9e8dfa40f64d3e0d Mon Sep 17 00:00:00 2001 > > From: Thomas Munro <tmunro@postgresql.org> > > Date: Mon, 9 Dec 2019 17:22:07 +1300 > > Subject: [PATCH 3/5] Add WalRcvGetWriteRecPtr() (new definition). > > > > A later patch will read received WAL to prefetch referenced blocks, > > without waiting for the data to be flushed to disk. To do that, > > it needs to be able to see the write pointer advancing in shared > > memory. > > > > The function formerly bearing name was recently renamed to > > WalRcvGetFlushRecPtr(), which better described what it does. > > > + pg_atomic_init_u64(&WalRcv->writtenUpto, 0); > > Umm, how come you're using WalRcv here instead of walrcv? I would flag > this patch for sneaky nastiness if this weren't mostly harmless. (I > think we should do away with local walrcv pointers altogether. But that > should be a separate patch, I think.) OK, done. > > + pg_atomic_uint64 writtenUpto; > > Are we already using uint64s for XLogRecPtrs anywhere? This seems > novel. Given this, I wonder if the comment near "mutex" needs an > update ("except where atomics are used"), or perhaps just move the > member to after the line with mutex. Moved. We use [u]int64 in various places in the replication code. Ideally I'd have a magic way to say atomic<XLogRecPtr> so I didn't have to assume that pg_atomic_uint64 is the right atomic integer width and signedness, but here we are. In dsa.h I made a special typedef for the atomic version of something else, but that's because the size of that thing varied depending on the build, whereas our LSNs are of a fixed width that ought to be en... <trails off>. > I didn't understand the purpose of inc_counter() as written. Why not > just pg_atomic_fetch_add_u64(..., 1)? I didn't want counters that wrap at ~4 billion, but I did want to be able to read and write concurrently without tearing. Instructions like "lock xadd" would provide more guarantees that I don't need, since only one thread is doing all the writing and there's no ordering requirement. It's basically just counter++, but some platforms need a spinlock to perform atomic read and write of 64 bit wide numbers, so more hoop jumping is required. > > /* > > * smgrprefetch() -- Initiate asynchronous read of the specified block of a relation. > > + * > > + * In recovery only, this can return false to indicate that a file > > + * doesn't exist (presumably it has been dropped by a later WAL > > + * record). > > */ > > -void > > +bool > > smgrprefetch(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum) > > I think this API, where the behavior of a low-level module changes > depending on InRecovery, is confusingly crazy. I'd rather have the > callers specifying whether they're OK with a file that doesn't exist. Hmm. But... md.c has other code like that. It's true that I'm adding InRecovery awareness to a function that didn't previously have it, but that's just because we previously had no reason to prefetch stuff in recovery. > > +extern PrefetchBufferResult SharedPrefetchBuffer(SMgrRelation smgr_reln, > > + ForkNumber forkNum, > > + BlockNumber blockNum); > > extern void PrefetchBuffer(Relation reln, ForkNumber forkNum, > > BlockNumber blockNum); > > Umm, I would keep the return values of both these functions in sync. > It's really strange that PrefetchBuffer does not return > PrefetchBufferResult, don't you think? Agreed, and changed. I suspect that other users of the main PrefetchBuffer() call will eventually want that, to do a better job of keeping the request queue full, for example bitmap heap scan and (hypothetical) btree scan with prefetch.
Attachment
pgsql-hackers by date: