Re: Improve WALRead() to suck data directly from WAL buffers when possible - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Improve WALRead() to suck data directly from WAL buffers when possible |
Date | |
Msg-id | CALj2ACUpQGiwQTzmoSMOFk5=WiJc06FcYpxzBX0SEej4ProRzg@mail.gmail.com |
In response to | Re: Improve WALRead() to suck data directly from WAL buffers when possible (Nathan Bossart <nathandbossart@gmail.com>) |
Responses |
Re: Improve WALRead() to suck data directly from WAL buffers when possible
|
List | pgsql-hackers |
On Wed, Mar 1, 2023 at 9:45 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Feb 28, 2023 at 10:38:31AM +0530, Bharath Rupireddy wrote:
> > On Tue, Feb 28, 2023 at 6:14 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> Why do we only read a page at a time in XLogReadFromBuffersGuts()? What is
> >> preventing us from copying all the data we need in one go?
> >
> > Note that most of the WALRead() callers request a single page of
> > XLOG_BLCKSZ bytes even if the server has less or more available WAL
> > pages. It's the streaming replication wal sender that can request less
> > than XLOG_BLCKSZ bytes and upto MAX_SEND_SIZE (16 * XLOG_BLCKSZ). And,
> > if we read, say, MAX_SEND_SIZE at once while holding
> > WALBufMappingLock, that might impact concurrent inserters (at least, I
> > can say it in theory) - one of the main intentions of this patch is
> > not to impact inserters much.
>
> Perhaps we should test both approaches to see if there is a noticeable
> difference. It might not be great for concurrent inserts to repeatedly
> take the lock, either. If there's no real difference, we might be able to
> simplify the code a bit.

I took a stab at this, comparing two strategies: acquiring
WALBufMappingLock separately for each requested WAL buffer page vs.
acquiring WALBufMappingLock once for all requested WAL buffer pages. I
chose the pgbench tpcb-like benchmark, which has 3 UPDATE statements and
1 INSERT statement. I ran pgbench for 30 min with scale factor 100 and
4096 clients on a primary with 1 async standby, see [1]. I captured wait
events to see the contention on WALBufMappingLock. I didn't notice any
contention on the lock, and there was no difference in TPS either: see
[2] for results on HEAD, [3] for results on the v6 patch, which uses the
"acquire WALBufMappingLock separately for each requested WAL buffer
page" strategy, and [4] for results on the v7 patch (attached herewith),
which uses the "acquire WALBufMappingLock once for all requested WAL
buffer pages" strategy.
Another thing to note from the test results is the reduction in WALRead
IO wait events, from 136 on HEAD to 1 on either the v6 or the v7 patch.
So, reading from WAL buffers is really helping here.

With these observations, I'd like to use the approach that acquires
WALBufMappingLock once for all requested WAL buffer pages, unlike v6 and
the previous patches.

I'm attaching the v7 patch set with this change for further review.

[1]
shared_buffers = '8GB'
wal_buffers = '1GB'
max_wal_size = '16GB'
max_connections = '5000'
archive_mode = 'on'
archive_command='cp %p /home/ubuntu/archived_wal/%f'
./pgbench --initialize --scale=100 postgres
./pgbench -n -M prepared -U ubuntu postgres -b tpcb-like -c4096 -j4096 -T1800

[2]
HEAD:
done in 20.03 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 15.53 s, vacuum 0.19 s, primary keys 4.30 s).
tps = 11654.475345 (without initial connection time)
50950253 Lock | transactionid
16472447 Lock | tuple
3869523 LWLock | LockManager
739283 IPC | ProcArrayGroupUpdate
718549 |
439877 LWLock | WALWrite
130737 Client | ClientRead
121113 LWLock | BufferContent
70778 LWLock | WALInsert
43346 IPC | XactGroupUpdate
18547
18546 Activity | LogicalLauncherMain
18545 Activity | AutoVacuumMain
18272 Activity | ArchiverMain
17627 Activity | WalSenderMain
17207 Activity | WalWriterMain
15455 IO | WALSync
14963 LWLock | ProcArray
14747 LWLock | XactSLRU
13943 Timeout | CheckpointWriteDelay
10519 Activity | BgWriterHibernate
8022 Activity | BgWriterMain
4486 Timeout | SpinDelay
4443 Activity | CheckpointerMain
1435 Lock | extend
670 LWLock | XidGen
373 IO | WALWrite
283 Timeout | VacuumDelay
268 IPC | ArchiveCommand
249 Timeout | VacuumTruncate
136 IO | WALRead
115 IO | WALInitSync
74 IO | DataFileWrite
67 IO | WALInitWrite
36 IO | DataFileFlush
35 IO | DataFileExtend
17 IO | DataFileRead
4 IO | SLRUWrite
3 IO | BufFileWrite
2 IO | DataFileImmediateSync
1 Tuples only is on.
1 LWLock | SInvalWrite
1 LWLock | LockFastPath
1 IO | ControlFileSyncUpdate

[3]
done in 19.99 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 15.52 s, vacuum 0.18 s, primary keys 4.28 s).
tps = 11689.584538 (without initial connection time)
50678977 Lock | transactionid
16252048 Lock | tuple
4146827 LWLock | LockManager
768256 |
719923 IPC | ProcArrayGroupUpdate
432836 LWLock | WALWrite
140354 Client | ClientRead
124203 LWLock | BufferContent
74355 LWLock | WALInsert
39852 IPC | XactGroupUpdate
30728
30727 Activity | LogicalLauncherMain
30726 Activity | AutoVacuumMain
30420 Activity | ArchiverMain
29881 Activity | WalSenderMain
29418 Activity | WalWriterMain
23428 Activity | BgWriterHibernate
15960 Timeout | CheckpointWriteDelay
15840 IO | WALSync
15066 LWLock | ProcArray
14577 Activity | CheckpointerMain
14377 LWLock | XactSLRU
7291 Activity | BgWriterMain
4336 Timeout | SpinDelay
1707 Lock | extend
720 LWLock | XidGen
362 Timeout | VacuumTruncate
360 IO | WALWrite
304 Timeout | VacuumDelay
301 IPC | ArchiveCommand
106 IO | WALInitSync
82 IO | DataFileWrite
66 IO | WALInitWrite
45 IO | DataFileFlush
25 IO | DataFileExtend
18 IO | DataFileRead
5 LWLock | LockFastPath
2 IO | DataFileSync
2 IO | DataFileImmediateSync
1 Tuples only is on.
1 LWLock | BufferMapping
1 IO | WALRead
1 IO | SLRUWrite
1 IO | SLRURead
1 IO | ReplicationSlotSync
1 IO | BufFileRead

[4]
done in 19.92 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 15.53 s, vacuum 0.23 s, primary keys 4.16 s).
tps = 11671.869074 (without initial connection time)
50614021 Lock | transactionid
16482561 Lock | tuple
4086451 LWLock | LockManager
777507 |
714329 IPC | ProcArrayGroupUpdate
420593 LWLock | WALWrite
138142 Client | ClientRead
125381 LWLock | BufferContent
75283 LWLock | WALInsert
38759 IPC | XactGroupUpdate
20283
20282 Activity | LogicalLauncherMain
20281 Activity | AutoVacuumMain
20002 Activity | ArchiverMain
19467 Activity | WalSenderMain
19036 Activity | WalWriterMain
15836 IO | WALSync
15708 Timeout | CheckpointWriteDelay
15346 LWLock | ProcArray
15095 LWLock | XactSLRU
11852 Activity | BgWriterHibernate
8424 Activity | BgWriterMain
4636 Timeout | SpinDelay
4415 Activity | CheckpointerMain
2048 Lock | extend
1457 Timeout | VacuumTruncate
646 LWLock | XidGen
402 IO | WALWrite
306 Timeout | VacuumDelay
278 IPC | ArchiveCommand
117 IO | WALInitSync
74 IO | DataFileWrite
66 IO | WALInitWrite
35 IO | DataFileFlush
29 IO | DataFileExtend
24 LWLock | LockFastPath
14 IO | DataFileRead
2 IO | SLRUWrite
2 IO | DataFileImmediateSync
2 IO | BufFileWrite
1 Tuples only is on.
1 LWLock | BufferMapping
1 IO | WALRead
1 IO | SLRURead
1 IO | BufFileRead

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment