Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From	Masahiko Sawada
Subject	Re: [PoC] Improve dead tuple storage for lazy vacuum
Date	March 14, 2024 05:05:31
Msg-id	CAD21AoAUiYYw7MmEc9KcsjbN3SDKiQp8N7z-=3BxoPEtqT09Qw@mail.gmail.com
In response to	Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <johncnaylorls@gmail.com>)
Responses	Re: [PoC] Improve dead tuple storage for lazy vacuum Re: [PoC] Improve dead tuple storage for lazy vacuum
List	pgsql-hackers

On Thu, Mar 14, 2024 at 1:29 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 8:53 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Mar 14, 2024 at 9:59 AM John Naylor <johncnaylorls@gmail.com> wrote:
> > > > BTW do we still want to test the tidstore by using a combination of
> > > > SQL functions? We might no longer need to input TIDs via a SQL
> > > > function.
> > >
> > > I'm not sure. I stopped short of doing that to get feedback on this
> > > much. One advantage with SQL functions is we can use generate_series
> > > to easily input lists of blocks with different numbers and strides,
> > > and array literals for offsets are a bit easier. What do you think?
> >
> > While I'm not a fan of the following part, I agree that it makes sense
> > to use SQL functions for test data generation:
> >
> > -- Constant values used in the tests.
> > \set maxblkno 4294967295
> > -- The maximum number of heap tuples (MaxHeapTuplesPerPage) in 8kB block is 291.
> > -- We use a higher number to test tidstore.
> > \set maxoffset 512
>
> I'm not really a fan of these either, and they could be removed at
> some point if we've done everything else nicely.
>
> > It would also be easier for developers to test the tidstore with their
> > own data set. So I agree with the current approach: use SQL functions
> > for data generation and do the actual tests inside C functions.
>
> Okay, here's another idea: Change test_lookup_tids() to be more
> general and put the validation down into C as well. First we save the
> blocks from do_set_block_offsets() into a table, then with all those
> blocks look up a sufficiently large range of possible offsets and save
> the found values in another array. So the static items structure would
> have 3 arrays: inserts, successful lookups, and iteration (currently
> the iteration output is private to check_set_block_offsets()). Then
> sort as needed and check they are all the same.

That's a promising idea. We can use the same mechanism for randomized
tests too. If you're going to work on this, I'll do other tests in my
environment in the meantime.
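
For illustration only, the "three arrays, sort, then compare" check
described in the quoted text could be sketched roughly as below. This is
a standalone sketch, not the test module's code: TestTid,
tid_arrays_match() and check_three_way() are hypothetical names, and
TestTid merely stands in for a (block, offset) pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a TID: a (block number, offset number) pair. */
typedef struct TestTid
{
	uint32_t	blkno;
	uint16_t	offset;
} TestTid;

/* qsort comparator: order by block number first, then by offset. */
static int
test_tid_cmp(const void *a, const void *b)
{
	const TestTid *ta = a;
	const TestTid *tb = b;

	if (ta->blkno != tb->blkno)
		return (ta->blkno < tb->blkno) ? -1 : 1;
	if (ta->offset != tb->offset)
		return (ta->offset < tb->offset) ? -1 : 1;
	return 0;
}

/* Sort both arrays in place, then compare them element by element. */
static bool
tid_arrays_match(TestTid *x, size_t nx, TestTid *y, size_t ny)
{
	assert(x != NULL && y != NULL);

	if (nx != ny)
		return false;

	qsort(x, nx, sizeof(TestTid), test_tid_cmp);
	qsort(y, ny, sizeof(TestTid), test_tid_cmp);

	for (size_t i = 0; i < nx; i++)
	{
		if (test_tid_cmp(&x[i], &y[i]) != 0)
			return false;
	}
	return true;
}

/*
 * The inserted TIDs, the TIDs found by lookup, and the TIDs returned by
 * iteration must all describe the same set.
 */
static bool
check_three_way(TestTid *inserted, size_t n_inserted,
				TestTid *looked_up, size_t n_looked_up,
				TestTid *iterated, size_t n_iterated)
{
	return tid_arrays_match(inserted, n_inserted, looked_up, n_looked_up) &&
		tid_arrays_match(inserted, n_inserted, iterated, n_iterated);
}
```

Comparing through the comparator rather than memcmp() avoids depending
on struct padding bytes. The same check would work unchanged for
randomized input, since it only assumes the three arrays hold the same
set of TIDs.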

>
> Further thought: We may not really need to test block numbers that
> vigorously, since the radix tree tests should cover keys/values pretty
> well.

Agreed. Testing the boundary block numbers (0, 1, MaxBlockNumber - 1,
and MaxBlockNumber) would probably be sufficient.
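
As a concrete reference, the four boundary values could be written down
as below. The BlockNumber typedef and the two macros are re-declared
here only to keep the sketch standalone (they are meant to mirror the
definitions in PostgreSQL's storage/block.h); boundary_blocks and
nth_boundary_block() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Re-declared for a standalone sketch; mirrors storage/block.h. */
typedef uint32_t BlockNumber;

#define InvalidBlockNumber	((BlockNumber) 0xFFFFFFFF)
#define MaxBlockNumber		((BlockNumber) 0xFFFFFFFE)

/* The boundary block numbers proposed for the tests. */
static const BlockNumber boundary_blocks[] = {
	0,
	1,
	MaxBlockNumber - 1,
	MaxBlockNumber
};

#define N_BOUNDARY_BLOCKS \
	(sizeof(boundary_blocks) / sizeof(boundary_blocks[0]))

/* Return the i-th boundary block number (hypothetical helper). */
static BlockNumber
nth_boundary_block(size_t i)
{
	assert(i < N_BOUNDARY_BLOCKS);
	return boundary_blocks[i];
}
```

Note that MaxBlockNumber is one less than InvalidBlockNumber, so the
largest valid block number is 4294967294.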

>  The difference here is using bitmaps of tids and that should be
> well covered.

Right. We would need to test offset numbers vigorously instead.
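
To make "testing offset numbers vigorously" a bit more concrete, here is
a rough standalone sketch of a per-block offset bitmap with a round-trip
check. It is not the tidstore's actual layout or API: OffsetBitmap, the
helper functions, and SKETCH_MAX_OFFSET (set to 512 to match the
maxoffset value quoted earlier) are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed upper bound on offsets, matching "\set maxoffset 512". */
#define SKETCH_MAX_OFFSET	512
#define SKETCH_WORD_BITS	64
#define SKETCH_NWORDS		((SKETCH_MAX_OFFSET / SKETCH_WORD_BITS) + 1)

/* Hypothetical per-block bitmap: bit N is set if offset N is present. */
typedef struct OffsetBitmap
{
	uint64_t	words[SKETCH_NWORDS];
} OffsetBitmap;

/* Mark an offset as present. Offsets are 1-based. */
static void
offset_bitmap_set(OffsetBitmap *bm, uint16_t off)
{
	assert(off >= 1 && off <= SKETCH_MAX_OFFSET);
	bm->words[off / SKETCH_WORD_BITS] |=
		UINT64_C(1) << (off % SKETCH_WORD_BITS);
}

/* Report whether an offset is present; out-of-range offsets are absent. */
static bool
offset_bitmap_test(const OffsetBitmap *bm, uint16_t off)
{
	if (off > SKETCH_MAX_OFFSET)
		return false;
	return ((bm->words[off / SKETCH_WORD_BITS] >>
			 (off % SKETCH_WORD_BITS)) & 1) != 0;
}

/*
 * Set the given offsets, then probe every offset in 1..SKETCH_MAX_OFFSET
 * and verify that exactly the inserted ones are reported as present.
 */
static bool
offsets_round_trip(const uint16_t *offs, size_t n)
{
	OffsetBitmap bm;

	memset(&bm, 0, sizeof(bm));
	for (size_t i = 0; i < n; i++)
		offset_bitmap_set(&bm, offs[i]);

	for (uint16_t probe = 1; probe <= SKETCH_MAX_OFFSET; probe++)
	{
		bool		expected = false;

		for (size_t i = 0; i < n; i++)
		{
			if (offs[i] == probe)
			{
				expected = true;
				break;
			}
		}
		if (offset_bitmap_test(&bm, probe) != expected)
			return false;
	}
	return true;
}
```

Offsets around word boundaries (63, 64, 65) and at the assumed maximum
are the interesting inputs for this kind of check.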

>
> Locally (not CI), we should try big inputs to make sure we can
> actually go up to many GB -- it's easier and faster this way than
> having vacuum give us a large data set.

I'll do these tests.

>
> > Is it
> > convenient for developers if we have functions like generate_tids()
> > and generate_random_tids() to generate TIDs so that they can pass them
> > to do_set_block_offsets()?
>
> I guess I don't see the advantage of adding a layer of indirection at
> this point, but it could be useful at a later time.

Agreed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
