Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoAWhHSDeNdVEmBpOcVLMav0mz4AJfMN2wURc+h85q7PUQ@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
|
List | pgsql-hackers |
Hi, On Tue, May 23, 2023 at 7:17 PM John Naylor <john.naylor@enterprisedb.com> wrote: > > I wrote: > > the current insert/delete paths are quite complex. Using bitmap heap scan as a motivating use case, I hope to refocuscomplexity to where it's most needed, and aggressively simplify where possible. > > Sometime in the not-too-distant future, I will start a new thread focusing on bitmap heap scan, but for now, I just wantto share some progress on making the radix tree usable not only for that, but hopefully a wider range of applications,while making the code simpler and the binary smaller. The attached patches are incomplete (e.g. no iteration)and quite a bit messy, so tar'd and gzip'd for the curious (should apply on top of v32 0001-03 + 0007-09 ). > Thank you for making progress on this. I agree with these directions overall. I have some comments and questions: > - With the latter in mind, searching the child within a node now returns the address of the slot. This allows the sameinterface whether the slot contains a child pointer or a value. Probably we can apply similar changes to the iteration as well. > * Other nodes will follow in due time, but only after I figure out how to do it nicely (ideas welcome!) -- currently node32'stwo size classes work fine for growing, but the code should be simplified before extending to other cases.) Within the size class, we just alloc a new node of lower size class and do memcpy(). I guess it will be almost same as what we do for growing. It might be a good idea to support node shrinking within the size class for node32 (and node125 if we support). I don't think shrinking class-3 to class-1 makes sense. > > Limited support for "combined pointer-value slots". At compile-time, choose either that or "single-value leaves" basedon the size of the value type template parameter. Values that are pointer-sized or less can fit in the last-level childslots of nominal "inner nodes" without duplicated leaf-node code. Node256 now must act like the previous 'node256 leaf',since zero is a valid value. Aside from that, this was a small change. Yes, but it also means that we use pointer-sized value anyway even if the value size is less than that, which wastes the memory, no? > > What I've shared here could work (in principal, since it uses uint64 values) for tidstore, possibly faster (untested) becauseof better code density, but as mentioned I want to shoot for higher. For tidbitmap.c, I want to extend this idea andbranch at run-time on a per-value basis, so that a page-table entry that fits in a pointer can go there, and if not, it'llbe a full leaf. (This technique enables more flexibility in lossifying pages as well.) Run-time info will require e.g.an additional bit per slot. Since the node header is now 3 bytes, we can spare one more byte in the node3 case. In addition,we can and should also bump it back up to node4, still keeping the metadata within 8 bytes (no struct padding). Sounds good. > I've started in this patchset to refer to the node kinds as "4/16/48/256", regardless of their actual fanout. This is forreadability (by matching the language in the paper) and maintainability (should *not* ever change again). The size classes(including multiple classes per kind) could be determined by macros and #ifdef's. For example, in non-SIMD architectures,it's likely slow to search an array of 32 key chunks, so in that case the compiler should choose size classessimilar to these four nominal kinds. If we want to use the node kinds used in the paper, I think we should change the number in RT_NODE_KIND_X too. Otherwise, it would be confusing when reading the code without referring to the paper. Particularly, this part is very confusing: case RT_NODE_KIND_3: RT_ADD_CHILD_4(tree, ref, node, chunk, child); break; case RT_NODE_KIND_32: RT_ADD_CHILD_16(tree, ref, node, chunk, child); break; case RT_NODE_KIND_125: RT_ADD_CHILD_48(tree, ref, node, chunk, child); break; case RT_NODE_KIND_256: RT_ADD_CHILD_256(tree, ref, node, chunk, child); break; Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: