Home > mailing lists

Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From	Masahiko Sawada
Subject	Re: [PoC] Improve dead tuple storage for lazy vacuum
Date	June 5, 2023 10:31:41
Msg-id	CAD21AoAWhHSDeNdVEmBpOcVLMav0mz4AJfMN2wURc+h85q7PUQ@mail.gmail.com Whole thread
In response to	Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>)
Responses	Re: [PoC] Improve dead tuple storage for lazy vacuum
List	pgsql-hackers

Tree view

Hi,

On Tue, May 23, 2023 at 7:17 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> I wrote:
> > the current insert/delete paths are quite complex. Using bitmap heap scan as a motivating use case, I hope to
refocuscomplexity to where it's most needed, and aggressively simplify where possible. 
>
> Sometime in the not-too-distant future, I will start a new thread focusing on bitmap heap scan, but for now, I just
wantto share some progress on making the radix tree usable not only for that, but hopefully a wider range of
applications,while making the code simpler and the binary smaller. The attached patches are incomplete (e.g. no
iteration)and quite a bit messy, so tar'd and gzip'd for the curious (should apply on top of v32 0001-03 + 0007-09 ). 
>

Thank you for making progress on this. I agree with these directions
overall. I have some comments and questions:

> - With the latter in mind, searching the child within a node now returns the address of the slot. This allows the
sameinterface whether the slot contains a child pointer or a value. 

Probably we can apply similar changes to the iteration as well.

> * Other nodes will follow in due time, but only after I figure out how to do it nicely (ideas welcome!) -- currently
node32'stwo size classes work fine for growing, but the code should be simplified before extending to other cases.) 

Within the size class, we just alloc a new node of lower size class
and do memcpy(). I guess it will be almost same as what we do for
growing. It might be a good idea to support node shrinking within the
size class for node32 (and node125 if we support). I don't think
shrinking class-3 to class-1 makes sense.

>
> Limited support for "combined pointer-value slots". At compile-time, choose either that or "single-value leaves"
basedon the size of the value type template parameter. Values that are pointer-sized or less can fit in the last-level
childslots of nominal "inner nodes" without duplicated leaf-node code. Node256 now must act like the previous 'node256
leaf',since zero is a valid value. Aside from that, this was a small change. 

Yes, but it also means that we use pointer-sized value anyway even if
the value size is less than that, which wastes the memory, no?

>
> What I've shared here could work (in principal, since it uses uint64 values) for tidstore, possibly faster (untested)
becauseof better code density, but as mentioned I want to shoot for higher. For tidbitmap.c, I want to extend this idea
andbranch at run-time on a per-value basis, so that a page-table entry that fits in a pointer can go there, and if not,
it'llbe a full leaf. (This technique enables more flexibility in lossifying pages as well.) Run-time info will require
e.g.an additional bit per slot. Since the node header is now 3 bytes, we can spare one more byte in the node3 case. In
addition,we can and should also bump it back up to node4, still keeping the metadata within 8 bytes (no struct
padding).

Sounds good.

> I've started in this patchset to refer to the node kinds as "4/16/48/256", regardless of their actual fanout. This is
forreadability (by matching the language in the paper) and maintainability (should *not* ever change again). The size
classes(including multiple classes per kind) could be determined by macros and #ifdef's. For example, in non-SIMD
architectures,it's likely slow to search an array of 32 key chunks, so in that case the compiler should choose size
classessimilar to these four nominal kinds. 

If we want to use the node kinds used in the paper, I think we should
change the number in RT_NODE_KIND_X too. Otherwise, it would be
confusing when reading the code without referring to the paper.
Particularly, this part is very confusing:

        case RT_NODE_KIND_3:
            RT_ADD_CHILD_4(tree, ref, node, chunk, child);
            break;
        case RT_NODE_KIND_32:
            RT_ADD_CHILD_16(tree, ref, node, chunk, child);
            break;
        case RT_NODE_KIND_125:
            RT_ADD_CHILD_48(tree, ref, node, chunk, child);
            break;
        case RT_NODE_KIND_256:
            RT_ADD_CHILD_256(tree, ref, node, chunk, child);
            break;

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

pgsql-hackers by date:

From: shveta malik
Date: 05 June 2023, 09:30:01
Subject: Re: Support logical replication of DDLs

From: Ranier Vilela
Date: 05 June 2023, 11:06:26
Subject: Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)

Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

Previous

Next