Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoBAROsGaRn846j6KfHO3_0xRsb3a_LxLwCjWJG8Hjw9wg@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
|
List | pgsql-hackers |
On Tue, Jan 30, 2024 at 7:20 PM John Naylor <johncnaylorls@gmail.com> wrote: > > On Tue, Jan 30, 2024 at 7:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Jan 29, 2024 at 8:48 PM John Naylor <johncnaylorls@gmail.com> wrote: > > > > I meant the macro could probably be > > > > > > Max(SLAB_DEFAULT_BLOCK_SIZE, (size) * N) > > > > > > (Right now N=32). I also realize I didn't answer your question earlier > > > about block sizes being powers of two. I was talking about PG in > > > general -- I was thinking all block sizes were powers of two. If > > > that's true, I'm not sure if it's because programmers find the macro > > > calculations easy to reason about, or if there was an implementation > > > reason for it (e.g. libc behavior). 32*2088 bytes is about 65kB, or > > > just above a power of two, so if we did round that up it would be > > > 128kB. > > > > Thank you for your explanation. It might be better to follow other > > codes. Does the calculation below make sense to you? > > > > RT_SIZE_CLASS_ELEM size_class = RT_SIZE_CLASS_INFO[i]; > > Size inner_blocksize = SLAB_DEFAULT_BLOCK_SIZE; > > while (inner_blocksize < 32 * size_class.allocsize) > > inner_blocksize <<= 1; > > It does make sense, but we can do it more simply: > > Max(SLAB_DEFAULT_BLOCK_SIZE, pg_nextpower2_32(size * 32)) Thanks! I've attached the new patch set (v56). I've squashed previous updates and addressed review comments on v55 in separate patches. Here are the update summary: 0004: fix compiler warning caught by ci test. 0005-0008: address review comments on radix tree codes. 0009: cleanup #define and #undef 0010: use TEST_SHARED_RT macro for shared radix tree test. RT_SHMEM is undefined after including radixtree.h so we should not use it in test code. 0013-0015: address review comments on tidstore codes. 0017-0018: address review comments on vacuum integration codes. Looking at overall changes, there are still XXX and TODO comments in radixtree.h: --- * XXX There are 4 node kinds, and this should never be increased, * for several reasons: * 1. With 5 or more kinds, gcc tends to use a jump table for switch * statements. * 2. The 4 kinds can be represented with 2 bits, so we have the option * in the future to tag the node pointer with the kind, even on * platforms with 32-bit pointers. This might speed up node traversal * in trees with highly random node kinds. * 3. We can have multiple size classes per node kind. Can we just remove "XXX"? --- * WIP: notes about traditional radix tree trading off span vs height... Are you going to write it? --- #ifdef RT_SHMEM /* WIP: do we really need this? */ typedef dsa_pointer RT_HANDLE; #endif I think it's worth having it. --- * WIP: The paper uses at most 64 for this node kind. "isset" happens to fit * inside a single bitmapword on most platforms, so it's a good starting * point. We can make it higher if we need to. */ #define RT_FANOUT_48_MAX (RT_NODE_MAX_SLOTS / 4) Are you going to work something on this? --- /* WIP: We could go first to the higher node16 size class */ newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_16, RT_CLASS_16_LO); Does it mean to go to RT_CLASS_16_HI and then further go to RT_CLASS_16_LO upon further deletion? --- * TODO: The current locking mechanism is not optimized for high concurrency * with mixed read-write workloads. In the future it might be worthwhile * to replace it with the Optimistic Lock Coupling or ROWEX mentioned in * the paper "The ART of Practical Synchronization" by the same authors as * the ART paper, 2016. I think it's not TODO for now, but a future improvement. We can remove it. --- /* TODO: consider 5 with subclass 1 or 2. */ #define RT_FANOUT_4 4 Is there something we need to do here? --- /* * Return index of the chunk and slot arrays for inserting into the node, * such that the chunk array remains ordered. * TODO: Improve performance for non-SIMD platforms. */ Are you going to work on this? --- /* Delete the element at 'idx' */ /* TODO: replace slow memmove's */ Are you going to work on this? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: