Thread: pgsql: snapshot scalability: cache snapshots using a xact completion co
snapshot scalability: cache snapshots using a xact completion counter.

Previous commits made it faster/more scalable to compute snapshots. But not
building a snapshot is still faster. Now that GetSnapshotData() does not
maintain RecentGlobal* anymore, that is actually not too hard:

This commit introduces xactCompletionCount, which tracks the number of
top-level transactions with xids (i.e. which may have modified the database)
that completed in some form since the start of the server.

We can avoid rebuilding the snapshot's contents whenever the current
xactCompletionCount is the same as it was when the snapshot was
originally built. Currently this check happens while holding
ProcArrayLock. While it's likely possible to perform the check without
acquiring ProcArrayLock, it seems better to do that separately /
later, some careful analysis is required. Even with the lock this is a
significant win on its own.

On a smaller two socket machine this gains another ~1.03x, on a larger
machine the effect is roughly double (earlier patch version tested
though). If we were able to safely avoid the lock there'd be another
significant gain on top of that.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/623a9ba79bbdd11c5eccb30b8bd5c446130e521c

Modified Files
--------------
src/backend/replication/logical/snapbuild.c |   1 +
src/backend/storage/ipc/procarray.c         | 125 +++++++++++++++++++++++-----
src/backend/utils/time/snapmgr.c            |   4 +
src/include/access/transam.h                |   9 ++
src/include/utils/snapshot.h                |   7 ++
5 files changed, 126 insertions(+), 20 deletions(-)
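As a rough illustration of the caching idea described in the commit message, here is a minimal toy model in C. It is not the code from the commit: the ToyShared and ToySnapshot types and the toy_* functions are hypothetical names invented for this sketch, there is no locking, and xid wraparound is ignored. Only the idea of a completion counter (xactCompletionCount in the commit message) is taken from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_XACTS 8

/* Hypothetical stand-in for shared state; not PostgreSQL's real layout. */
typedef struct ToyShared {
    uint64_t xact_completion_count;   /* bumped when an xact with an xid completes */
    uint32_t next_xid;                /* next xid to hand out */
    uint32_t running[TOY_MAX_XACTS];  /* xids currently in progress */
    int      nrunning;
} ToyShared;

/* Hypothetical snapshot that remembers the counter value it was built at. */
typedef struct ToySnapshot {
    bool     valid;
    uint64_t built_at_count;
    uint32_t xmax;                    /* xids >= xmax are treated as in progress */
    uint32_t xip[TOY_MAX_XACTS];      /* in-progress xids below xmax */
    int      xcnt;
} ToySnapshot;

/* Start a transaction and assign it an xid. The counter is NOT touched. */
static uint32_t toy_begin(ToyShared *s)
{
    assert(s->nrunning < TOY_MAX_XACTS);
    uint32_t xid = s->next_xid++;
    s->running[s->nrunning++] = xid;
    return xid;
}

/* Complete (commit or abort) a transaction and bump the counter. */
static void toy_complete(ToyShared *s, uint32_t xid)
{
    for (int i = 0; i < s->nrunning; i++) {
        if (s->running[i] == xid) {
            s->running[i] = s->running[--s->nrunning];
            s->xact_completion_count++;
            return;
        }
    }
}

/*
 * Fill in 'snap'. Returns true if the previous contents were reused because
 * no transaction completed since they were built, false if rebuilt.
 */
static bool toy_get_snapshot(const ToyShared *s, ToySnapshot *snap)
{
    assert(snap != NULL);
    if (snap->valid && snap->built_at_count == s->xact_completion_count)
        return true;                  /* counter unchanged: skip the rebuild */

    snap->xmax = s->next_xid;
    snap->xcnt = s->nrunning;
    memcpy(snap->xip, s->running, sizeof(uint32_t) * (size_t) s->nrunning);
    snap->built_at_count = s->xact_completion_count;
    snap->valid = true;
    return false;
}
```

In this toy model, a transaction that merely starts does not invalidate a cached snapshot: its xid is at or above the cached xmax, so the old snapshot already treats it as in progress. Only a completion changes what a snapshot should see, which is why comparing the counter is enough here.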
Re: pgsql: snapshot scalability: cache snapshots using a xact completion co
From
Michael Paquier
Date:
On Tue, Aug 18, 2020 at 04:30:21AM +0000, Andres Freund wrote:
> snapshot scalability: cache snapshots using a xact completion counter.
>
> Previous commits made it faster/more scalable to compute snapshots. But not
> building a snapshot is still faster. Now that GetSnapshotData() does not
> maintain RecentGlobal* anymore, that is actually not too hard:
>
> This commit introduces xactCompletionCount, which tracks the number of
> top-level transactions with xids (i.e. which may have modified the database)
> that completed in some form since the start of the server.
>
> We can avoid rebuilding the snapshot's contents whenever the current
> xactCompletionCount is the same as it was when the snapshot was
> originally built. Currently this check happens while holding
> ProcArrayLock. While it's likely possible to perform the check without
> acquiring ProcArrayLock, it seems better to do that separately /
> later, some careful analysis is required. Even with the lock this is a
> significant win on its own.
>
> On a smaller two socket machine this gains another ~1.03x, on a larger
> machine the effect is roughly double (earlier patch version tested
> though). If we were able to safely avoid the lock there'd be another
> significant gain on top of that.

spurfowl and more animals are telling us that this commit has broken
2PC:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2020-08-18%2004%3A31%3A11
--
Michael
Andres Freund <andres@anarazel.de> writes:
> snapshot scalability: cache snapshots using a xact completion counter.

buildfarm doesn't like this a bit ...

regards, tom lane
Re: pgsql: snapshot scalability: cache snapshots using a xact completion co
From
Andres Freund
Date:
Hi,

On 2020-08-18 00:55:22 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > snapshot scalability: cache snapshots using a xact completion counter.
>
> buildfarm doesn't like this a bit ...

Yea, looking already. Unless that turns out to be incredibly bad luck
and only the first three animals failed (there's a few passes after), or
unless I find the issue in the next 30min or so, I'll revert.

Greetings,

Andres Freund
Re: pgsql: snapshot scalability: cache snapshots using a xact completion co
From
Andres Freund
Date:
On 2020-08-18 13:52:46 +0900, Michael Paquier wrote:
> On Tue, Aug 18, 2020 at 04:30:21AM +0000, Andres Freund wrote:
> spurfowl and more animals are telling us that this commit has broken
> 2PC:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2020-08-18%2004%3A31%3A11

It looks like it's a bit more subtle than outright breaking 2PC. We're
now at 3 out of 18 BF members having failed.

I locally ran also quite a few loops of the normal regression tests
without finding an issue.

I'd written to Tom that I was planning to revert unless the number of
failures were lower than initially indicated. But that actually seems to
have come to pass (the failures are quicker to report because they don't
run the subsequent tests, of course). I'd like to let the failures
accumulate a bit longer, say until tomorrow Midday if I haven't figured
it out by then. With the hope of finding some detail to help pinpoint
the issue.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> I'd written to Tom that I was planning to revert unless the number of
> failures were lower than initially indicated. But that actually seems to
> have come to pass (the failures are quicker to report because they don't
> run the subsequent tests, of course). I'd like to let the failures
> accumulate a bit longer, say until tomorrow Midday if I haven't figured
> it out by then. With the hope of finding some detail to help pinpoint
> the issue.

There's certainly no obvious pattern here, so I agree with waiting
for more data.

regards, tom lane