Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoAxXGf7FbuA9jvx+XUFrEJUz9JNOmkxF8SLppOOOvhBtg@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
|
List | pgsql-hackers |
On Mon, Mar 25, 2024 at 3:25 PM John Naylor <johncnaylorls@gmail.com> wrote: > > On Fri, Mar 22, 2024 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Mar 21, 2024 at 7:48 PM John Naylor <johncnaylorls@gmail.com> wrote: > > > > v77-0001 > > > > > > - dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items)); > > > - dead_items->max_items = max_items; > > > - dead_items->num_items = 0; > > > + vacrel->dead_items = TidStoreCreate(vac_work_mem, NULL, 0); > > > + > > > + dead_items_info = (VacDeadItemsInfo *) palloc(sizeof(VacDeadItemsInfo)); > > > + dead_items_info->max_bytes = vac_work_mem * 1024L; > > > > > > This is confusing enough that it looks like a bug: > > > > > > [inside TidStoreCreate()] > > > /* choose the maxBlockSize to be no larger than 1/16 of max_bytes */ > > > while (16 * maxBlockSize > max_bytes * 1024L) > > > maxBlockSize >>= 1; > > > > > > This was copied from CreateWorkExprContext, which operates directly on > > > work_mem -- if the parameter is actually bytes, we can't "* 1024" > > > here. If we're passing something measured in kilobytes, the parameter > > > is badly named. Let's use convert once and use bytes everywhere. > > > > True. The attached 0001 patch fixes it. > > v78-0001 and 02 are fine, but for 0003 there is a consequence that I > didn't see mentioned: I think that the fix done in 0001 patch can be merged into 0003 patch. > vac_work_mem now refers to bytes, where before > it referred to kilobytes. It seems pretty confusing to use a different > convention from elsewhere, especially if it has the same name but > different meaning across versions. Worse, this change is buried inside > a moving-stuff-around diff, making it hard to see. Maybe "convert only > once" is still possible, but I was actually thinking of > > + dead_items_info->max_bytes = vac_work_mem * 1024L; > + vacrel->dead_items = TidStoreCreate(dead_items_info->max_bytes, NULL, 0); > > That way it's pretty obvious that it's correct. That may require a bit > of duplication and moving around for shmem, but there is some of that > already. Agreed. > > More on 0003: > > - * The major space usage for vacuuming is storage for the array of dead TIDs > + * The major space usage for vacuuming is TidStore, a storage for dead TIDs > > + * autovacuum_work_mem) memory space to keep track of dead TIDs. If the > + * TidStore is full, we must call lazy_vacuum to vacuum indexes (and to vacuum > > I wonder if the comments here should refer to it using a more natural > spelling, like "TID store". > > - * items in the dead_items array for later vacuuming, count live and > + * items in the dead_items for later vacuuming, count live and > > Maybe "the dead_items area", or "the dead_items store" or "in dead_items"? > > - * remaining LP_DEAD line pointers on the page in the dead_items > - * array. These dead items include those pruned by lazy_scan_prune() > - * as well we line pointers previously marked LP_DEAD. > + * remaining LP_DEAD line pointers on the page in the dead_items. > + * These dead items include those pruned by lazy_scan_prune() as well > + * we line pointers previously marked LP_DEAD. > > Here maybe "into dead_items". > > Also, "we line pointers" seems to be a pre-existing typo. > > - (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages", > - vacrel->relname, (long long) index, vacuumed_pages))); > + (errmsg("table \"%s\": removed " INT64_FORMAT "dead item identifiers > in %u pages", > + vacrel->relname, vacrel->dead_items_info->num_items, vacuumed_pages))); > > This is a translated message, so let's keep the message the same. > > /* > * Allocate dead_items (either using palloc, or in dynamic shared memory). > * Sets dead_items in vacrel for caller. > * > * Also handles parallel initialization as part of allocating dead_items in > * DSM when required. > */ > static void > dead_items_alloc(LVRelState *vacrel, int nworkers) > > This comment didn't change at all. It's not wrong, but let's consider > updating the specifics. Fixed above comments. > v78-0005: > > "Although commit XXX > allowed specifying the initial and maximum DSA segment sizes, callers > still needed to clamp their own limits, which was not consistent and > user-friendly." > > Perhaps s/still needed/would have needed/ ..., since we're preventing > that necessity. > > > > Did you try it with 1MB m_w_m? > > > > I've incorporated the above comments and test results look good to me. > > Could you be more specific about what the test was? > Does it work with 1MB m_w_m? If m_w_m is 1MB, both the initial and maximum segment sizes are 256kB. FYI other test cases I tested were: * m_w_m = 2199023254528 (maximum value) initial: 1MB max: 128GB * m_w_m = 64MB (default) initial: 1MB max: 8MB > > + /* > + * Choose the initial and maximum DSA segment sizes to be no longer > + * than 1/16 and 1/8 of max_bytes, respectively. If the initial > + * segment size is low, we end up having many segments, which risks > + * exceeding the total number of segments the platform can have. > > The second sentence is technically correct, but I'm not sure how it > relates to the code that follows. > > + while (16 * dsa_init_size > max_bytes) > + dsa_init_size >>= 1; > + while (8 * dsa_max_size > max_bytes) > + dsa_max_size >>= 1; > > I'm not sure we need a separate loop for "dsa_init_size". Can we just have : > > while (8 * dsa_max_size > max_bytes) > dsa_max_size >>= 1; > > if (dsa_max_size < DSA_MIN_SEGMENT_SIZE) > dsa_max_size = DSA_MIN_SEGMENT_SIZE; > > if (dsa_init_size > dsa_max_size) > dsa_init_size = dsa_max_size; Agreed. > > @@ -113,13 +113,10 @@ static void > tidstore_iter_extract_tids(TidStoreIter *iter, BlockNumber blkno, > * CurrentMemoryContext at the time of this call. The TID storage, backed > * by a radix tree, will live in its child memory context, rt_context. The > * TidStore will be limited to (approximately) max_bytes total memory > - * consumption. If the 'area' is non-NULL, the radix tree is created in the > - * DSA area. > - * > - * The returned object is allocated in backend-local memory. > + * consumption. > > The existing comment slipped past my radar, but max_bytes is not a > limit, it's a hint. Come to think of it, it never was a limit in the > normal sense, but in earlier patches it was the criteria for reporting > "I'm full" when asked. Updated the comment. > > void > TidStoreDestroy(TidStore *ts) > { > - /* Destroy underlying radix tree */ > if (TidStoreIsShared(ts)) > + { > + /* Destroy underlying radix tree */ > shared_ts_free(ts->tree.shared); > + > + dsa_detach(ts->area); > + } > else > local_ts_free(ts->tree.local); > > It's still destroyed in the local case, so not sure why this comment was moved? > > v78-0006: > > -#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2 > +/* 2 was PARALLEL_VACUUM_KEY_DEAD_ITEMS */ > > I don't see any use in core outside this module -- maybe it's possible > to renumber these? Fixed the above points. I've attached the latest patches. The 0004 and 0006 patches are updates from the previous version. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
- v79-0006-Address-review-comments-on-vacuum-integration.patch
- v79-0004-Address-review-comments-on-tidstore.patch
- v79-0005-Use-TidStore-for-dead-tuple-TIDs-storage-during-.patch
- v79-0002-Allow-specifying-initial-and-maximum-segment-siz.patch
- v79-0003-Rethink-create-and-attach-APIs-of-shared-TidStor.patch
- v79-0001-Fix-an-inconsistent-function-prototype-with-the-.patch
pgsql-hackers by date: