Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | John Naylor |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CANWCAZY46dP5=igG1f3REPq32BqQsKCV19YS-UZy3D309F=qkQ@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
|
List | pgsql-hackers |
On Mon, Dec 11, 2023 at 1:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I've attached the updated patch set. From the previous patch set, I've > merged patches 0007 to 0010. The other changes such as adding RT_GET() > still are unmerged for now, for discussion. Probably we can make them > as follow-up patches as we discussed. 0011 to 0015 patches are new > changes for v44 patch set, which removes RT_SEARCH() and RT_SET() and > support variable-length values. This looks like the right direction, and I'm pleased it's not much additional code on top of my last patch. v44-0014: +#ifdef RT_VARLEN_VALUE + /* XXX: need to choose block sizes? */ + tree->leaf_ctx = AllocSetContextCreate(ctx, + "radix tree leaves", + ALLOCSET_DEFAULT_SIZES); +#else + tree->leaf_ctx = SlabContextCreate(ctx, + "radix tree leaves", + RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)), + sizeof(RT_VALUE_TYPE)); +#endif /* RT_VARLEN_VALUE */ Choosing block size: Similar to what we've discussed previously around DSA segments, we might model this on CreateWorkExprContext() in src/backend/executor/execUtils.c. Maybe tid store can pass maint_w_m / autovac_w_m (later work_mem for bitmap scan). RT_CREATE could set the max block size to 1/16 of that, or less. Also, it occurred to me that compile-time embeddable values don't need a leaf context. I'm not sure how many places assume that there is always a leaf context. If not many, it may be worth not creating one here, just to be tidy. + size_t copysize; - memcpy(leaf.local, value_p, sizeof(RT_VALUE_TYPE)); + copysize = sizeof(RT_VALUE_TYPE); +#endif + + memcpy(leaf.local, value_p, copysize); I'm not sure this indirection adds clarity. I guess the intent was to keep from saying "memcpy" twice, but now the code has to say "copysize = foo" twice. For varlen case, we need to watch out for slowness because of memcpy. Let's put that off for later testing, though. We may someday want to avoid a memcpy call for the varlen case, so let's keep it flexible here. v44-0015: +#define SizeOfBlocktableEntry (offsetof( Unused. + char buf[MaxBlocktableEntrySize] = {0}; Zeroing this buffer is probably going to be expensive. Also see this pre-existing comment: /* WIP: slow, since it writes to memory for every bit */ page->words[wordnum] |= ((bitmapword) 1 << bitnum); For this function (which will be vacuum-only, so we can assume ordering), in the loop we can: * declare the local bitmapword variable to be zero * set the bits on it * write it out to the right location when done. Let's fix both of these at once. + if (TidStoreIsShared(ts)) + shared_rt_set(ts->tree.shared, blkno, (void *) page, page_len); + else + local_rt_set(ts->tree.local, blkno, (void *) page, page_len); Is there a reason for "void *"? The declared parameter is "RT_VALUE_TYPE *value_p" in 0014. Also, since this function is for vacuum (and other uses will need a new function), let's assert the returned bool is false. Does iteration still work? If so, it's not too early to re-wire this up with vacuum and see how it behaves. Lastly, my compiler has a warning that CI doesn't have: In file included from ../src/test/modules/test_radixtree/test_radixtree.c:121: ../src/include/lib/radixtree.h: In function ‘rt_find.isra’: ../src/include/lib/radixtree.h:2142:24: warning: ‘slot’ may be used uninitialized [-Wmaybe-uninitialized] 2142 | return (RT_VALUE_TYPE*) slot; | ^~~~~~~~~~~~~~~~~~~~~ ../src/include/lib/radixtree.h:2112:23: note: ‘slot’ was declared here 2112 | RT_PTR_ALLOC *slot; | ^~~~
pgsql-hackers by date: