Re: MultiXact\SLRU buffers configuration - Mailing list pgsql-hackers
From | Andrey M. Borodin |
---|---|
Subject | Re: MultiXact\SLRU buffers configuration |
Date | |
Msg-id | 3B099683-ECCD-43CD-A3D6-F08C3745002A@yandex-team.ru |
In response to | Re: MultiXact\SLRU buffers configuration (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses | Re: MultiXact\SLRU buffers configuration |
List | pgsql-hackers |
> On 14 May 2020, at 06:25, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> At Wed, 13 May 2020 23:08:37 +0500, "Andrey M. Borodin" <x4mmm@yandex-team.ru> wrote in
>>
>>> On 11 May 2020, at 16:17, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>>>
>>> I've gone ahead and created 3 patches:
>>> 1. Configurable SLRU buffer sizes for MultiXactOffsets and MultiXactMembers
>>> 2. Reduce locking level to shared on read of MultiXactId members
>>> 3. Configurable cache size
>>
>> I'm looking more at MultiXact and it seems to me that we have a race condition there.
>>
>> When we create a new MultiXact we do:
>> 1. Generate new MultiXactId under MultiXactGenLock
>> 2. Record new mxid with members and offset to WAL
>> 3. Write offset to SLRU under MultiXactOffsetControlLock
>> 4. Write members to SLRU under MultiXactMemberControlLock
>
> But, don't we hold exclusive lock on the buffer through all the steps
> above?

Yes... unless the MultiXact is observed on a standby. This can lead to observing an inconsistent snapshot: one of the lockers has committed its tuple delete, but the standby still sees the tuple as alive.

>> When we read a MultiXact we do:
>> 1. Retrieve offset by mxid from SLRU under MultiXactOffsetControlLock
>> 2. If offset is 0 - it's not filled in at step 4 of the previous algorithm, we sleep and goto 1
>> 3. Retrieve members from SLRU under MultiXactMemberControlLock
>> 4. ..... what do we do if there are just zeroes because step 4 is not executed yet? Nothing, return an empty members list.
>
> So transactions never see such incomplete mxids, I believe.

I've observed the sleep in step 2, and I believe it's possible to observe the special effects of step 4 too.

Maybe we could add a lock on the standby to dismiss this 1000us wait? Sometimes it hits standbys hard: if someone is locking a whole table on the primary, all seq scans on the standbys follow it, contending on MultiXactOffsetControlLock.
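To make the read-side polling concrete, here is a minimal standalone sketch of step 2 of the read path (a simulation only, not the actual multixact.c code; all names and the 5ms writer delay are illustrative). A reader that finds offset == 0 assumes the creator has not reached step 3 yet, sleeps 1000us and retries -- the same wait that shows up as pg_usleep(1000) in the backtrace below:

/* compile with: cc -pthread sketch.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_uint mxact_offset;        /* 0 means "not filled in yet" */

/* plays the role of the creator: the offset becomes visible some time later */
static void *creator(void *arg)
{
    (void) arg;
    usleep(5000);                       /* WAL record written, SLRU not yet updated */
    atomic_store(&mxact_offset, 42);    /* step 3: offset is filled in */
    return NULL;
}

/* plays the role of the reader: step 2, poll until the offset is filled in */
static void *reader(void *arg)
{
    unsigned    offset;
    int         retries = 0;

    (void) arg;
    while ((offset = atomic_load(&mxact_offset)) == 0)
    {
        retries++;
        usleep(1000);                   /* the 1000us wait seen on standbys */
    }
    printf("got offset %u after %d retries\n", offset, retries);
    return NULL;
}

int main(void)
{
    pthread_t   c, r;

    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&c, NULL, creator, NULL);
    pthread_join(r, NULL);
    pthread_join(c, NULL);
    return 0;
}

In the real server each retry also re-reads the offsets SLRU page under MultiXactOffsetControlLock, so when many standby backends poll the same not-yet-filled mxid, this is where the contention described above comes from.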
It looks like this:

#0  0x00007fcd56896ff7 in __GI___select (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffd83376fe0) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x000056186e0d54bd in pg_usleep (microsec=microsec@entry=1000) at ./build/../src/port/pgsleep.c:56
#2  0x000056186dd5edf2 in GetMultiXactIdMembers (from_pgupgrade=0 '\000', onlyLock=<optimized out>, members=0x7ffd83377080, multi=3106214809) at ./build/../src/backend/access/transam/multixact.c:1370
#3  GetMultiXactIdMembers () at ./build/../src/backend/access/transam/multixact.c:1202
#4  0x000056186dd2d2d9 in MultiXactIdGetUpdateXid (xmax=<optimized out>, t_infomask=<optimized out>) at ./build/../src/backend/access/heap/heapam.c:7039
#5  0x000056186dd35098 in HeapTupleGetUpdateXid (tuple=tuple@entry=0x7fcba3b63d58) at ./build/../src/backend/access/heap/heapam.c:7080
#6  0x000056186e0cd0f8 in HeapTupleSatisfiesMVCC (htup=<optimized out>, snapshot=0x56186f44a058, buffer=230684) at ./build/../src/backend/utils/time/tqual.c:1091
#7  0x000056186dd2d922 in heapgetpage (scan=scan@entry=0x56186f4c8e78, page=page@entry=3620) at ./build/../src/backend/access/heap/heapam.c:439
#8  0x000056186dd2ea7c in heapgettup_pagemode (key=0x0, nkeys=0, dir=ForwardScanDirection, scan=0x56186f4c8e78) at ./build/../src/backend/access/heap/heapam.c:1034
#9  heap_getnext (scan=scan@entry=0x56186f4c8e78, direction=direction@entry=ForwardScanDirection) at ./build/../src/backend/access/heap/heapam.c:1801
#10 0x000056186de84f51 in SeqNext (node=node@entry=0x56186f4a4f78) at ./build/../src/backend/executor/nodeSeqscan.c:81
#11 0x000056186de6a3f1 in ExecScanFetch (recheckMtd=0x56186de84ef0 <SeqRecheck>, accessMtd=0x56186de84f20 <SeqNext>, node=0x56186f4a4f78) at ./build/../src/backend/executor/execScan.c:97
#12 ExecScan (node=0x56186f4a4f78, accessMtd=0x56186de84f20 <SeqNext>, recheckMtd=0x56186de84ef0 <SeqRecheck>) at ./build/../src/backend/executor/execScan.c:164

Best regards, Andrey Borodin.