Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From: John Naylor
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id: CANWCAZZYSeB3i+UeSBjihcT1FuaQLiCkQCjGHp8hvPpXp9mm5w@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: [PoC] Improve dead tuple storage for lazy vacuum
List: pgsql-hackers
On Thu, Dec 14, 2023 at 7:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> In v45, 0001 - 0006 are from earlier versions but I've merged previous
> updates. So the radix tree now has RT_SET() and RT_FIND() but not
> RT_GET() and RT_SEARCH(). 0007 and 0008 are the updates from previous
> versions that incorporated the above comments. 0009 patch integrates
> tidstore with lazy vacuum.

Excellent! I repeated a quick run of the small "test 1" with very low
m_w_m from

https://www.postgresql.org/message-id/CAFBsxsHrvTPUK%3DC1%3DxweJjGujja4Xjfgva3C8jnW3Shz6RBnFg%40mail.gmail.com

...and got similar results, so we still have good space-efficiency on
this test:

master:
INFO:  finished vacuuming "john.public.test": index scans: 9
system usage: CPU: user: 56.83 s, system: 9.36 s, elapsed: 119.62 s

v45:
INFO:  finished vacuuming "john.public.test": index scans: 1
system usage: CPU: user: 6.82 s, system: 2.05 s, elapsed: 10.89 s

More sparse TID distributions won't be as favorable, but we have ideas
to improve that in the future.

For my next steps, I will finish the node-shrinking behavior and save it
for a later patchset. It's not needed for tid store, but it needs to
happen because of assumptions in the code. Also, some time ago, I think
I commented out RT_FREE_RECURSE to get something working, so I'll fix
that, and look at other FIXMEs and TODOs.

> Note that DSA segment problem is not
> resolved yet in this patch.

I remember you started a separate thread about this, but I don't think
it got any attention. Maybe reply with a "TLDR;" and share a patch to
allow controlling max segment size.

Some more comments:

v45-0003:

Since RT_ITERATE_NEXT_PTR works for tid store, do we even need
RT_ITERATE_NEXT anymore? The former should handle fixed-length values
just fine? If so, we should rename it to match the latter.

+ * The caller is responsible for locking/unlocking the tree in shared mode.

This is not new to v45, but it will come up again below. It needs more
explanation: since we're returning a pointer (to support variable-length
values), the caller needs to maintain control until it's finished with
the value.
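To spell out the lifetime rule I have in mind, usage would look roughly
like this (RT_BEGIN_ITERATE/RT_END_ITERATE are the names I remember from
the template; the lock variable and process_value() are stand-ins, so
treat this as a sketch rather than the patch's exact API):

    RT_ITER    *iter;
    uint64      key;
    RT_VALUE_TYPE *value;

    /* the caller takes the tree's lock in shared mode first */
    LWLockAcquire(tree_lock, LW_SHARED);

    iter = RT_BEGIN_ITERATE(tree);
    while ((value = RT_ITERATE_NEXT_PTR(iter, &key)) != NULL)
    {
        /*
         * "value" points into the tree itself, so it's only safe to
         * dereference while the shared lock is held -- anything needed
         * later must be copied out here.
         */
        process_value(key, value);
    }
    RT_END_ITERATE(iter);

    LWLockRelease(tree_lock);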
v45-0005:

+ * Regarding the concurrency support, we use a single LWLock for the TidStore.
+ * The TidStore is exclusively locked when inserting encoded tids to the
+ * radix tree or when resetting itself. When searching on the TidStore or
+ * doing the iteration, it is not locked but the underlying radix tree is
+ * locked in shared mode.

This is just stating facts without giving any reasons. Readers are going
to wonder why it's inconsistent. The "why" is much more important than
the "what". Even with that, this comment is also far from the relevant
parts, and so will get out of date. Maybe we can just make sure each
relevant function is explained individually (see the sketch at the end
of this mail).

v45-0007:

-RT_SCOPE RT_RADIX_TREE * RT_CREATE(MemoryContext ctx);
+RT_SCOPE RT_RADIX_TREE * RT_CREATE(MemoryContext ctx, Size work_mem);

Tid store calls this max_bytes -- can we use that name here, too?
"work_mem" is highly specific.

-    RT_PTR_ALLOC *slot;
+    RT_PTR_ALLOC *slot = NULL;

We have a macro for an invalid pointer because of DSA.

v45-0008:

-    if (off < 1 || off > MAX_TUPLES_PER_PAGE)
+    if (unlikely(off < 1 || off > MAX_TUPLES_PER_PAGE))
         elog(ERROR, "tuple offset out of range: %u", off);

This is a superfluous distraction, since the error path is located way
off in the cold segment of the binary.

v45-0009: (just a few small things for now)

- * lazy_vacuum_heap_page() -- free page's LP_DEAD items listed in the
- * vacrel->dead_items array.
+ * lazy_vacuum_heap_page() -- free page's LP_DEAD items.

I think we can keep it as "listed in the TID store".

- * Allocate dead_items (either using palloc, or in dynamic shared memory).
- * Sets dead_items in vacrel for caller.
+ * Allocate a (local or shared) TidStore for storing dead TIDs. Sets dead_items
+ * in vacrel for caller.

I think we want to keep "in dynamic shared memory". It's still true. I'm
not sure anything needs to change here, actually.

 parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
-                     int nrequested_workers, int max_items,
-                     int elevel, BufferAccessStrategy bstrategy)
+                     int nrequested_workers, int vac_work_mem,
+                     int max_offset, int elevel,
+                     BufferAccessStrategy bstrategy)

It seems very strange to me that this function has to pass the
max_offset. In general, it's been simpler to assume we have a constant
max_offset, but in this case that fact is not helping. Something to
think about for later.

- (errmsg("scanned index \"%s\" to remove %d row versions",
+ (errmsg("scanned index \"%s\" to remove " UINT64_FORMAT " row versions",

This should be signed int64.

v45-0010:

Thinking about this some more, I'm not sure we need to do anything
different for the *starting* segment size. (Controlling *max* size does
seem important, however.) For the corner case of m_w_m = 1MB, it's fine
if vacuum quits pruning immediately after (in effect) it finds the DSA
has gone to 2MB. It's not worth bothering with, IMO. If the memory
accounting starts >1MB because we're adding the trivial size of some
struct, let's just stop doing that. The segment allocations are what we
care about.

v45-0011:

+ /*
+  * max_bytes is forced to be at least 64kB, the current minimum valid
+  * value for the work_mem GUC.
+  */
+ max_bytes = Max(64 * 1024L, max_bytes);

Why? I believe I mentioned months ago that copying a hard-coded value
that can get out of sync is not maintainable, but I don't even see the
point of this part.
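On that clamp, for reference: the 64kB figure exists only as a bare
literal in work_mem's entry in guc_tables.c, something like the below
(paraphrased from memory), so nothing ties the copy in tidstore to it:

    /* guc_tables.c: the 64 here is work_mem's minimum, in kB */
    {
        {"work_mem", PGC_USERSET, RESOURCES_MEM, ... },
        &work_mem,
        4096, 64, MAX_KILOBYTES,
        ...
    },

    /* v45-0011 then re-derives the same number independently: */
    max_bytes = Max(64 * 1024L, max_bytes);

If the GUC minimum ever changes, this copy silently diverges.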
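Returning to the v45-0005 locking comment, here's the kind of
per-function explanation I mean (the function name, signature, and
wording are mine, purely for illustration):

    /*
     * Set the offsets of dead tuples on the given block in the TidStore.
     *
     * Insertion modifies the radix tree, so the caller must hold the
     * TidStore's lock in exclusive mode. Lookup and iteration only read
     * the tree, which is why those code paths instead take the lock in
     * shared mode.
     */
    void
    TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno,
                            OffsetNumber *offsets, int num_offsets);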