Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoAUiYYw7MmEc9KcsjbN3SDKiQp8N7z-=3BxoPEtqT09Qw@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
Re: [PoC] Improve dead tuple storage for lazy vacuum |
List | pgsql-hackers |
On Thu, Mar 14, 2024 at 1:29 PM John Naylor <johncnaylorls@gmail.com> wrote: > > On Thu, Mar 14, 2024 at 8:53 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Mar 14, 2024 at 9:59 AM John Naylor <johncnaylorls@gmail.com> wrote: > > > > BTW do we still want to test the tidstore by using a combination of > > > > SQL functions? We might no longer need to input TIDs via a SQL > > > > function. > > > > > > I'm not sure. I stopped short of doing that to get feedback on this > > > much. One advantage with SQL functions is we can use generate_series > > > to easily input lists of blocks with different numbers and strides, > > > and array literals for offsets are a bit easier. What do you think? > > > > While I'm not a fan of the following part, I agree that it makes sense > > to use SQL functions for test data generation: > > > > -- Constant values used in the tests. > > \set maxblkno 4294967295 > > -- The maximum number of heap tuples (MaxHeapTuplesPerPage) in 8kB block is 291. > > -- We use a higher number to test tidstore. > > \set maxoffset 512 > > I'm not really a fan of these either, and could be removed a some > point if we've done everything else nicely. > > > It would also be easier for developers to test the tidstore with their > > own data set. So I agreed with the current approach; use SQL functions > > for data generation and do the actual tests inside C functions. > > Okay, here's an another idea: Change test_lookup_tids() to be more > general and put the validation down into C as well. First we save the > blocks from do_set_block_offsets() into a table, then with all those > blocks lookup a sufficiently-large range of possible offsets and save > found values in another array. So the static items structure would > have 3 arrays: inserts, successful lookups, and iteration (currently > the iteration output is private to check_set_block_offsets(). Then > sort as needed and check they are all the same. That's a promising idea. We can use the same mechanism for randomized tests too. If you're going to work on this, I'll do other tests on my environment in the meantime. > > Further thought: We may not really need to test block numbers that > vigorously, since the radix tree tests should cover keys/values pretty > well. Agreed. Probably boundary block numbers: 0, 1, MaxBlockNumber - 1, and MaxBlockNumber, would be sufficient. > The difference here is using bitmaps of tids and that should be > well covered. Right. We would need to test offset numbers vigorously instead. > > Locally (not CI), we should try big inputs to make sure we can > actually go up to many GB -- it's easier and faster this way than > having vacuum give us a large data set. I'll do these tests. > > > Is it > > convenient for developers if we have functions like generate_tids() > > and generate_random_tids() to generate TIDs so that they can pass them > > to do_set_block_offsets()? > > I guess I don't see the advantage of adding a layer of indirection at > this point, but it could be useful at a later time. Agreed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: