Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoDRXHb1ed7KpAPjBYiBs2KpiVJ+YnJD5MKWTw10Jz_zQQ@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
|
List | pgsql-hackers |
On Sun, Jan 14, 2024 at 10:43 PM John Naylor <johncnaylorls@gmail.com> wrote: > > On Fri, Jan 12, 2024 at 3:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Jan 11, 2024 at 9:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > So I agree to remove both max_bytes and num_items from the control > > > object.Also, as you mentioned, we can remove the tidstore control > > > object itself. TidStoreGetHandle() returns a radix tree handle, and we > > > can pass it to TidStoreAttach(). I'll try it. > > Thanks. It's worth looking closely here. > > > I realized that if we remove the whole tidstore control object > > including max_bytes, processes who attached the shared tidstore cannot > > use TidStoreIsFull() actually as it always returns true. > > I imagine that we'd replace that with a function (maybe an earlier > version had it?) to report the memory usage to the caller, which > should know where to find max_bytes. > > > Also they > > cannot use TidStoreReset() as well since it needs to pass max_bytes to > > RT_CREATE(). It might not be a problem in terms of lazy vacuum, but it > > could be problematic for general use. > > HEAD has no problem finding the necessary values, and I don't think > it'd be difficult to maintain that ability. I'm not actually sure what > "general use" needs to have, and I'm not sure anyone can guess. > There's the future possibility of parallel heap-scanning, but I'm > guessing a *lot* more needs to happen for that to work, so I'm not > sure how much it buys us to immediately start putting those two fields > in a special abstraction. The only other concrete use case mentioned > in this thread that I remember is bitmap heap scan, and I believe that > would never need to reset, only free the whole thing when finished. > > I spent some more time studying parallel vacuum, and have some > thoughts. In HEAD, we have > > -/* > - * VacDeadItems stores TIDs whose index tuples are deleted by index vacuuming. > - */ > -typedef struct VacDeadItems > -{ > - int max_items; /* # slots allocated in array */ > - int num_items; /* current # of entries */ > - > - /* Sorted array of TIDs to delete from indexes */ > - ItemPointerData items[FLEXIBLE_ARRAY_MEMBER]; > -} VacDeadItems; > > ...which has the tids, plus two fields that function _very similarly_ > to the two extra fields in the tidstore control object. It's a bit > strange to me that the patch doesn't have this struct anymore. > > I suspect if we keep it around (just change "items" to be the local > tidstore struct), the patch would have a bit less churn and look/work > more like the current code. I think it might be easier to read if the > v17 commits are suited to the current needs of vacuum, rather than try > to anticipate all uses. Richer abstractions can come later if needed. Just changing "items" to be the local tidstore struct could make the code tricky a bit, since max_bytes and num_items are on the shared memory while "items" is a local pointer to the shared tidstore. This is a reason why I abstract them behind TidStore. However, IIUC the current parallel vacuum can work with such VacDeadItems fields, fortunately. The leader process can use VacDeadItems allocated on DSM, and worker processes can use a local VacDeadItems of which max_bytes and num_items are copied from the shared one and "items" is a local pointer. Assuming parallel heap scan requires for both the leader and workers to update the shared VacDeadItems concurrently, we may need such richer abstractions. I've implemented this idea in the v52 patch set. Here is the summary of the updates: 0008: Remove the control object from tidstore. Also removed some unsupported functions such as TidStoreNumTids() 0009: Adjust lazy vacuum integration patch with the control object removal. I've not updated any locking code yet. Once we confirm this direction, I'll update the locking code too. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: