Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoDg8S1QP-YR4132oPazPSvhdn-uArDF1AGEQiAYKhHfqA@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum
List pgsql-hackers
On Wed, Sep 28, 2022 at 3:18 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> On Wed, Sep 28, 2022 at 10:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > BTW We need to consider not only aset/slab but also DSA since we
> > allocate dead tuple TIDs on DSM in parallel vacuum cases. FYI DSA uses
> > the following size classes:
> >
> > static const uint16 dsa_size_classes[] = {
> > [...]
>
> Thanks for that info -- I wasn't familiar with the details of DSA. For the non-parallel case, I plan to at least
benchmarkusing aset because I gather it's the most heavily optimized. I'm thinking that will allow other problem areas
tobe more prominent. I'll also want to compare total context size compared to slab to see if possibly less
fragmentationmakes up for other wastage. 

Thanks!

>
> Along those lines, one thing I've been thinking about is the number of size classes. There is a tradeoff between
memoryefficiency and number of branches when searching/inserting. My current thinking is there is too much coupling
betweensize class and data type. Each size class currently uses a different data type and a different algorithm to
searchand set it, which in turn requires another branch. We've found that a larger number of size classes leads to poor
branchprediction [1] and (I imagine) code density. 
>
> I'm thinking we can use "flexible array members" for the values/pointers, and keep the rest of the control data in
thestruct the same. That way, we never have more than 4 actual "kinds" to code and branch on. As a bonus, when
migratinga node to a larger size class of the same kind, we can simply repalloc() to the next size. 

Interesting idea. Using flexible array members for values would be
good also for the case in the future where we want to support other
value types than uint64.

With this idea, we can just repalloc() to grow to the larger size in a
pair but I'm slightly concerned that the more size class we use, the
more frequent the node needs to grow. If we want to support node
shrink, the deletion is also affected.

> To show what I mean, consider this new table:
>
> node2:   5 +  6       +(5)+  2*8 =   32 bytes
> node6:   5 +  6       +(5)+  6*8 =   64
>
> node12:  5 + 27       +     12*8 =  128
> node27:  5 + 27       +     27*8 =  248(->256)
>
> node91:  5 + 256 + 28 +(7)+ 91*8 = 1024
> node219: 5 + 256 + 28 +(7)+219*8 = 2048
>
> node256: 5 + 32       +(3)+256*8 = 2088(->4096)
>
> Seven size classes are grouped into the four kinds.
>
> The common base at the front is here 5 bytes because there is a new uint8 field for "capacity", which we can ignore
fornode256 since we assume we can always insert/update that node. The control data is the same in each pair, and so the
offsetto the pointer/value array is the same. Thus, migration would look something like: 

I think we can use a bitfield for capacity. That way, we can pack
count (9bits), kind (2bits)and capacity (4bits) in uint16.

> Somewhat unrelated, we could still implement Andres' idea [1] to dispense with the isset array in inner nodes of the
indirectarray type (now node128), since we can just test if the pointer is null. 

Right. I didn't do that to use the common logic for inner node128 and
leaf node128.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH v1] [meson] add a default option prefix=/usr/local/pgsql
Next
From: Vik Fearing
Date:
Subject: Re: Tracking last scan time