Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date:
Msg-id: CAD21AoAyc1j=BCdUqZfk6qbdjZ68UgRx1Gkpk0oah4K7S0Ri9g@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <johncnaylorls@gmail.com>)
Responses: Re: [PoC] Improve dead tuple storage for lazy vacuum
List: pgsql-hackers
On Wed, Mar 20, 2024 at 11:19 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Wed, Mar 20, 2024 at 8:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I forgot to report the results. Yes, I did some tests where I inserted
> > many TIDs to make the tidstore use several GB memory. I did two cases:
> >
> > 1. insert 100M blocks of TIDs with an offset of 100.
> > 2. insert 10M blocks of TIDs with an offset of 2048.
> >
> > The tidstore used about 4.8GB and 5.2GB, respectively, and all lookup
> > and iteration results were expected.
>
> Thanks for confirming!
>
> > While reviewing the codes again, the following two things caught my eyes:
> >
> > in check_set_block_offset() function, we don't take a lock on the
> > tidstore while checking all possible TIDs. I'll add
> > TidStoreLockShare() and TidStoreUnlock() as follows:
> >
> > +       TidStoreLockShare(tidstore);
> >         if (TidStoreIsMember(tidstore, &tid))
> >             ItemPointerSet(&items.lookup_tids[num_lookup_tids++],
> >                            blkno, offset);
> > +       TidStoreUnlock(tidstore);
>
> In one sense, all locking in the test module is useless since there is
> only a single process. On the other hand, it seems good to at least
> run what we have written to run it trivially, and serve as an example
> of usage. We should probably be consistent, and document at the top
> that the locks are pro-forma only.

Agreed.

> > Regarding TidStoreMemoryUsage(), IIUC the caller doesn't need to take
> > a lock on the shared tidstore since dsa_get_total_size() (called by
> > RT_MEMORY_USAGE()) does appropriate locking. I think we can mention it
> > in the comment as follows:
> >
> > -/* Return the memory usage of TidStore */
> > +/*
> > + * Return the memory usage of TidStore.
> > + *
> > + * In shared TidStore cases, since shared_ts_memory_usage() does appropriate
> > + * locking, the caller doesn't need to take a lock.
> > + */
> >
> > What do you think?
>
> That duplicates the underlying comment on the radix tree function that
> this calls, so I'm inclined to leave it out. At this level it's
> probably best to document when a caller _does_ need to take an action.

Okay, I didn't change it.

> One thing I forgot to ask about earlier:
>
> +-- Add tids in out of order.
>
> Are they (the blocks to be precise) really out of order? The VALUES
> statement is ordered, but after inserting it does not output that way.
> I wondered if this is platform independent, but CI and our dev
> machines haven't failed this test, and I haven't looked into what
> determines the order. It's easy enough to hide the blocks if we ever
> need to, as we do elsewhere...

It doesn't seem necessary, as such a test is already covered by
test_radixtree. I've changed the query to hide the output blocks.

I've pushed the tidstore patch after incorporating the above changes.
In addition to that, I made the following changes before the push:

- Added a src/test/modules/test_tidstore/.gitignore file.
- Removed an unnecessary #include from tidstore.c.

The buildfarm has been all-green so far.

I've attached the latest vacuum improvement patch.

I just remembered that the tidstore still cannot be used for parallel
vacuum with the minimum maintenance_work_mem. Even when the shared
tidstore is empty, its reported memory usage is 1056768 bytes, a bit
above 1MB (1048576 bytes). We need something discussed on another
thread[1] in order to make it work.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoCVMw6DSmgZY9h%2BxfzKtzJeqWiwxaUD2T-FztVcV-XibQ%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
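
A minimal sketch of the pro-forma shared-lock pattern discussed above,
using only the functions quoted in this thread (TidStoreLockShare,
TidStoreIsMember, TidStoreUnlock, ItemPointerSet). The helper name and
its parameters are hypothetical and not part of the committed test
module; this illustrates the usage the test is meant to exemplify, not
the exact committed code.

    #include "postgres.h"

    #include "access/tidstore.h"
    #include "storage/itemptr.h"

    /*
     * check_tid_membership -- illustrative only.
     *
     * Look up one (block, offset) pair in the tidstore, taking the shared
     * lock around the lookup even though the test module runs in a single
     * process; the lock here is pro forma, as discussed in the thread.
     */
    static bool
    check_tid_membership(TidStore *tidstore, BlockNumber blkno,
                         OffsetNumber offset)
    {
        ItemPointerData tid;
        bool            found;

        ItemPointerSet(&tid, blkno, offset);

        TidStoreLockShare(tidstore);
        found = TidStoreIsMember(tidstore, &tid);
        TidStoreUnlock(tidstore);

        return found;
    }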
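
A simplified illustration of the parallel-vacuum limitation mentioned at
the end of the message: with the minimum maintenance_work_mem (1MB), an
empty shared TidStore already reports about 1056768 bytes, so a size
check like the one below would fire before any TIDs are stored. The
helper and the bare limit comparison are stand-ins for the real vacuum
logic, not the actual code path.

    #include "postgres.h"

    #include "access/tidstore.h"

    /*
     * dead_items_over_limit -- illustrative only.
     *
     * Returns true as soon as the tidstore's reported memory usage exceeds
     * the given budget.  With a 1MB budget and a shared tidstore, this is
     * already true for an empty store (~1056768 bytes), which is why the
     * minimum maintenance_work_mem cannot work for parallel vacuum yet.
     */
    static bool
    dead_items_over_limit(TidStore *tidstore, size_t limit_bytes)
    {
        size_t  used = TidStoreMemoryUsage(tidstore);

        return used > limit_bytes;
    }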