Re: Get rid of WALBufMappingLock - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: Get rid of WALBufMappingLock
Msg-id 0e0ae9c5-db03-45d6-9a61-c29502e3073c@postgrespro.ru
List pgsql-hackers
19.01.2025 03:11, Yura Sokolov wrote:
> Good day, hackers.
> 
> During the discussion of "Increasing NUM_XLOGINSERT_LOCKS" [1], Andres Freund
> used a benchmark which creates WAL records very intensively. While I think
> it is not completely fair (1MB log records are really rare), it pushed
> me to analyze write-side waiting in the XLog machinery.
> 
> First I tried to optimize WaitXLogInsertionsToFinish, but without great 
> success (yet).
> 
> While profiling, I found that a lot of time is spent clearing memory
> under the global WALBufMappingLock:
> 
>      MemSet((char *) NewPage, 0, XLOG_BLCKSZ);
> 
> It is an obvious scalability bottleneck.
> 
> So "challenge was accepted".
> 
> Certainly, a backend should initialize pages without an exclusive lock. But
> how can we then ensure the pages were initialized? In other words, how do we
> ensure XLogCtl->InitializedUpTo is correct?
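Taking the memset out of the critical section could look roughly like the minimal sketch below. It assumes a single atomic counter is enough to hand out pages to initialize; the names (`init_reserved`, `reserve_and_clear_page`, `WAL_PAGE_SIZE`, `NBUFFERS`) are hypothetical, and 8192 only mirrors the default XLOG_BLCKSZ. It shows just the first half of the problem: the clearing runs in parallel, but nothing yet tells other backends when a page is ready.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for XLOG_BLCKSZ and the WAL buffer ring. */
#define WAL_PAGE_SIZE 8192
#define NBUFFERS      16

static char wal_buffers[NBUFFERS][WAL_PAGE_SIZE];

/* Hypothetical counter: next WAL page number still to be initialized. */
static _Atomic uint64_t init_reserved = 0;

/*
 * Claim the next page with one atomic increment, then zero it with no
 * lock held. Only the increment is serialized; the WAL_PAGE_SIZE-byte
 * memset runs in parallel across backends that claimed different pages.
 * Assumption: the slot's previous page was already written out.
 */
static uint64_t reserve_and_clear_page(void)
{
    uint64_t page = atomic_fetch_add(&init_reserved, 1);

    assert(page % NBUFFERS < NBUFFERS);
    memset(wal_buffers[page % NBUFFERS], 0, WAL_PAGE_SIZE);
    return page;
}
```

The real code must also make sure a buffer slot is not reused before its old page has been written; the sketch simply assumes that.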
> 
> I tried playing around with WALBufMappingLock, holding it only for a short
> time and spinning on XLogCtl->xlblocks[nextidx]. But in the end I found
> that WALBufMappingLock is not needed at all.
> 
> Instead of holding a lock, it is better to let backends cooperate:
> - I bound a ConditionVariable to each xlblocks entry,
> - every backend now checks that each required block, starting from the one
> pointed to by InitializedUpTo, was successfully initialized, or sleeps on
> its condvar,
> - once a backend is sure a block is initialized, it tries to update
> InitializedUpTo using the condition variable.
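A minimal pthread sketch of that cooperation, under loudly stated assumptions: page numbers stand in for LSNs, a `page_end` field stands in for an xlblocks entry, `NSLOTS` and every function name are hypothetical, and slots are assumed never to be recycled before their old page is written out. Advancing the watermark with a compare-and-swap loop is this sketch's own choice, not a description of the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define WAL_PAGE_SIZE 8192
#define NSLOTS        16          /* hypothetical number of WAL buffer pages */

/* One ring slot: a page plus the condvar bound to its xlblocks-like entry. */
typedef struct
{
    _Atomic uint64_t page_end;    /* page number + 1 of the page in this slot, 0 = none */
    pthread_mutex_t  mutex;
    pthread_cond_t   cv;
    char             data[WAL_PAGE_SIZE];
} WalSlot;

static WalSlot slots[NSLOTS];
static _Atomic uint64_t initialized_upto = 0;   /* pages [0, initialized_upto) are ready */

static void slots_init(void)
{
    for (int i = 0; i < NSLOTS; i++)
    {
        atomic_store(&slots[i].page_end, 0);
        pthread_mutex_init(&slots[i].mutex, NULL);
        pthread_cond_init(&slots[i].cv, NULL);
    }
}

/* Publish that 'page' is initialized and wake everyone sleeping on its slot. */
static void publish_page(uint64_t page)
{
    WalSlot *s = &slots[page % NSLOTS];

    pthread_mutex_lock(&s->mutex);
    atomic_store(&s->page_end, page + 1);
    pthread_cond_broadcast(&s->cv);
    pthread_mutex_unlock(&s->mutex);
}

/* Sleep on the slot's condvar until 'page' has been published. */
static void wait_page(uint64_t page)
{
    WalSlot *s = &slots[page % NSLOTS];

    pthread_mutex_lock(&s->mutex);
    while (atomic_load(&s->page_end) != page + 1)
        pthread_cond_wait(&s->cv, &s->mutex);
    pthread_mutex_unlock(&s->mutex);
}

/* Move initialized_upto past every consecutive published page, lock-free. */
static void advance_initialized_upto(void)
{
    uint64_t upto = atomic_load(&initialized_upto);

    while (atomic_load(&slots[upto % NSLOTS].page_end) == upto + 1)
    {
        /* On failure 'upto' is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(&initialized_upto, &upto, upto + 1))
            upto++;
    }
}

/* Initialize 'page' without any global lock, then help advance the watermark. */
static void init_page(uint64_t page)
{
    memset(slots[page % NSLOTS].data, 0, WAL_PAGE_SIZE);
    publish_page(page);
    advance_initialized_upto();
}

/* Block until every page below 'target' is initialized. */
static void wait_initialized(uint64_t target)
{
    uint64_t upto;

    while ((upto = atomic_load(&initialized_upto)) < target)
    {
        wait_page(upto);
        advance_initialized_upto();
    }
    assert(atomic_load(&initialized_upto) >= target);
}
```

Out-of-order completion is the interesting case: a backend that finishes page 1 before page 0 cannot move the watermark, but whoever finishes page 0 then advances it past both, so no backend ever has to hold a global lock while clearing memory.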
> 
> Andres's benchmark looks like:
> 
>    c=100 && install/bin/psql -c checkpoint -c 'select pg_switch_wal()' postgres && \
>      install/bin/pgbench -n -M prepared -c$c -j$c \
>      -f <(echo "SELECT pg_logical_emit_message(true, 'test', repeat('0', 1024*1024));";) \
>      -P1 -T45 postgres
> 
> So it generates 1MB records as fast as possible for 45 seconds.
> 
> The test machine is a Ryzen 5825U (8 cores/16 threads) limited to 2GHz.
> Config:
> 
>    max_connections = 1000
>    shared_buffers = 1024MB
>    fsync = off
>    wal_sync_method = fdatasync
>    full_page_writes = off
>    wal_buffers = 1024MB
>    checkpoint_timeout = 1d
> 
> Results are given as "average over 45 sec" / "max 1-second outlier" (tps):
> 
> Results for master @ d3d098316913 :
>    25  clients: 2908  /3230
>    50  clients: 2759  /3130
>    100 clients: 2641  /2933
>    200 clients: 2419  /2707
>    400 clients: 1928  /2377
>    800 clients: 1689  /2266
> 
> With v0-0001-Get-rid-of-WALBufMappingLock.patch :
>    25  clients: 3103  /3583
>    50  clients: 3183  /3706
>    100 clients: 3106  /3559
>    200 clients: 2902  /3427
>    400 clients: 2303  /2717
>    800 clients: 1925  /2329
> 
> Combined with v0-0002-several-attempts-to-lock-WALInsertLocks.patch
> 
> No WALBufMappingLock + attempts on XLogInsertLock:
>    25  clients: 3518  /3750
>    50  clients: 3355  /3548
>    100 clients: 3226  /3460
>    200 clients: 3092  /3299
>    400 clients: 2575  /2801
>    800 clients: 1946  /2341
> 
> These results are with NUM_XLOGINSERT_LOCKS left at its default of 8.
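The message does not describe what v0-0002 changes beyond its name. Purely as a hedged illustration, "several attempts to lock WALInsertLocks" could mean something like the pthread sketch below: try a few insertion locks without blocking, and only sleep on the preferred one if all of them are busy. `MAX_ATTEMPTS`, the function names, and the use of pthread mutexes in place of LWLocks are all assumptions.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_INSERT_LOCKS 8   /* mirrors the default NUM_XLOGINSERT_LOCKS */
#define MAX_ATTEMPTS     3   /* hypothetical number of non-blocking tries */

static pthread_mutex_t insert_locks[NUM_INSERT_LOCKS];

static void insert_locks_init(void)
{
    for (int i = 0; i < NUM_INSERT_LOCKS; i++)
        pthread_mutex_init(&insert_locks[i], NULL);
}

/*
 * Try a few locks without blocking, starting at the caller's preferred
 * one; block only if all of them are busy. Returns the index now held.
 */
static int acquire_insert_lock(int preferred)
{
    assert(preferred >= 0 && preferred < NUM_INSERT_LOCKS);

    for (int i = 0; i < MAX_ATTEMPTS; i++)
    {
        int idx = (preferred + i) % NUM_INSERT_LOCKS;

        if (pthread_mutex_trylock(&insert_locks[idx]) == 0)
            return idx;
    }
    pthread_mutex_lock(&insert_locks[preferred]);
    return preferred;
}

static void release_insert_lock(int idx)
{
    pthread_mutex_unlock(&insert_locks[idx]);
}
```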
> 
> [1] http://postgr.es/m/flat/3b11fdc2-9793-403d-b3d4-67ff9a00d447%40postgrespro.ru
> 
> 
> PS.
> Increasing NUM_XLOGINSERT_LOCKS to 64 gives:
>    25  clients: 3457  /3624
>    50  clients: 3215  /3500
>    100 clients: 2750  /3000
>    200 clients: 2535  /2729
>    400 clients: 2163  /2400
>    800 clients: 1700  /2060
> 
> Doing the same on master gives:
>    25  clients: 2645  /2953
>    50  clients: 2562  /2968
>    100 clients: 2364  /2756
>    200 clients: 2266  /2564
>    400 clients: 1868  /2228
>    800 clients: 1527  /2133
> 
> So the patched version with increased NUM_XLOGINSERT_LOCKS looks no worse
> than unpatched master without increasing the number of locks.
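The "no worse" claim can be checked with a few lines of C: the percentage change of the patched build with 64 locks over unpatched master with the default 8 locks, using the average tps values copied from the tables above. The function and array names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Average tps copied from the tables above (25..800 clients). */
static const int    clients[]        = {25, 50, 100, 200, 400, 800};
static const double master_8_locks[] = {2908, 2759, 2641, 2419, 1928, 1689};
static const double patched_64[]     = {3457, 3215, 2750, 2535, 2163, 1700};
#define NROWS (sizeof(clients) / sizeof(clients[0]))

/* Percentage change of 'new_tps' relative to 'base_tps'. */
static double pct_change(double base_tps, double new_tps)
{
    assert(base_tps > 0);
    return (new_tps / base_tps - 1.0) * 100.0;
}

/* Print one row per client count; return the smallest change seen. */
static double print_comparison(void)
{
    double worst = pct_change(master_8_locks[0], patched_64[0]);

    for (size_t i = 0; i < NROWS; i++)
    {
        double d = pct_change(master_8_locks[i], patched_64[i]);

        printf("%3d clients: %+5.1f%%\n", clients[i], d);
        if (d < worst)
            worst = d;
    }
    return worst;
}
```

Every row comes out positive: the largest margin is about +18.9% at 25 clients and the smallest about +0.7% at 800 clients.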

I'm too brave... or too sleepy (it's 3:30am)...
But I took the risk of sending the patch to the commitfest:
https://commitfest.postgresql.org/52/5511/

------
regards
Yura Sokolov aka funny-falcon


