Thread: GIN non-intrusive vacuum of posting tree
Hi hackers!

Here is a patch with a more concurrency-friendly GIN vacuum.

===Short problem statement===
Currently, while cleaning a posting tree, GIN vacuum can lock the whole tree for a long time without real need.

===Problem statement===
Vacuum of a posting tree (function ginVacuumPostingTree() in ginvacuum.c) makes two passes through the tree:

1. ginVacuumPostingTreeLeaves() takes LockBufferForCleanup() on the root, effectively excluding all inserts. Then it traverses down through the tree taking exclusive locks, effectively excluding all reads. On the leaf level it calls ginVacuumPostingTreeLeaf(), which deletes all dead tuples. If there are any empty pages, the root lock is not released and we pass to stage two.

Interim: vacuum_delay_point(), which can hang for a while, still holding the CleanupLock on the root.

2. If there are any empty pages, ginScanToDelete() scans through the tree, deleting empty leaf pages, then deleting empty inner pages if they emerged after leaf page deletion.

===Proposed algorithm===
ginVacuumPostingTreeLeaves() takes SHARED locks on inner pages and EXCLUSIVE locks on leaf pages. On leaf pages it calls ginVacuumPostingTreeLeaf(). If ginVacuumPostingTreeLeaves() encounters a subtree with all leaves empty, it takes LockBufferForCleanup() on the page P pointing to the empty subtree. After taking the lock on P, ginScanToDelete() is called for P. For every parent of P we can guarantee that this procedure will not be necessary: if P were an empty subtree itself, we would pass this procedure up to its parent (unless P is the root, in which case the behavior is effectively equal to the unpatched one). A condensed sketch of this traversal is at the end of this mail.

===Testing===
This patch solved a problem encountered by Evgeniy Efimkin and Vladimir Borodin from Yandex.Mail. They implemented a testbed with a GIN index:

CREATE TABLE box (uid bigint,
                  lids integer[] NOT NULL CHECK (array_ndims(lids) = 1));

CREATE OR REPLACE FUNCTION ulids(
    i_uid  bigint,
    i_lids integer[]
) RETURNS bigint[] AS $$
SELECT array_agg((i_uid << 32) | lid)
  FROM unnest(i_lids) lid;
$$ LANGUAGE SQL IMMUTABLE STRICT;

CREATE INDEX i_box_uid_lids ON box USING gin (ulids(uid, lids))
  WITH (FASTUPDATE=OFF);

They executed the following script with pgbench:

\setrandom uid 1 1500000
\setrandom lid 1 1000
\setrandom lid2 1 1000
\setrandom lid3 1 1000
BEGIN;
insert into box values(:uid,'{:lid,:lid2,:lid3}');
insert into box values(:uid,'{}');
END;

and eventually deleted some of the data. This testbed showed VACUUM holding inserts for up to tens of seconds. They report that the proposed patch made vacuum invisible in this test.

Evgeniy, Vladimir, if I missed something or you have something to add, please join the discussion.

===Known issues===
1. There probably exist better algorithms. I could not adapt the B-tree vacuum wizardry to GIN, but that does not mean it is impossible.
2. More CleanupLocks may be taken. Under write-dense scenarios this can lead to a longer vacuum, but it should not consume more resources, just wait more.
3. I have changed the lock sequence during page deletion. I think it is safe, but the old comments there feared inserts (now excluded by the CleanupLock). More details in the code of ginDeletePage().
4. ginVacuumPostingTreeLeaves() traverses child pages after releasing the parent lock (even a SHARED one). This is safe if there is only one vacuum at a time. Is there a reason to be afraid of concurrent vacuums?

I will be happy if someone points me to the best place to read about the B-tree vacuum process, or explains in a few words how it works.
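To make the proposed locking scheme easier to follow, here is a condensed C sketch of the traversal. It is not the patch itself: the function name vacuumLeavesSketch is mine, and WAL logging, vacuum_delay_point() calls and the handling of incomplete splits are all omitted. It only illustrates which lock is taken where and how emptiness is reported upward.

#include "postgres.h"
#include "access/gin_private.h"
#include "storage/bufmgr.h"

/*
 * Returns true iff every leaf under 'blkno' turned out empty, so the
 * caller (the parent) can decide to start the deletion scan one level
 * higher instead of us.
 */
static bool
vacuumLeavesSketch(GinVacuumState *gvs, BlockNumber blkno, bool isRoot)
{
    Buffer      buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
                                            RBM_NORMAL, gvs->strategy);
    Page        page;

    /* inner pages only get a SHARED lock now */
    LockBuffer(buffer, GIN_SHARE);
    page = BufferGetPage(buffer);

    if (GinPageIsLeaf(page))
    {
        bool        isEmpty;

        /* leaves are the only pages taken EXCLUSIVE on this pass */
        LockBuffer(buffer, GIN_UNLOCK);
        LockBuffer(buffer, GIN_EXCLUSIVE);
        ginVacuumPostingTreeLeaf(gvs->index, buffer, gvs);
        isEmpty = GinDataLeafPageIsEmpty(BufferGetPage(buffer));
        UnlockReleaseBuffer(buffer);
        return isEmpty;
    }
    else
    {
        OffsetNumber i;
        OffsetNumber maxoff = GinPageGetOpaque(page)->maxoff;
        BlockNumber children[BLCKSZ / sizeof(PostingItem)];
        bool        allEmpty = true;
        bool        anyEmpty = false;

        for (i = FirstOffsetNumber; i <= maxoff; i++)
            children[i - 1] = PostingItemGetBlockNumber(
                                GinDataPageGetPostingItem(page, i));

        /* release the parent before descending -- known issue 4 */
        UnlockReleaseBuffer(buffer);

        for (i = 0; i < maxoff; i++)
        {
            if (vacuumLeavesSketch(gvs, children[i], false))
                anyEmpty = true;
            else
                allEmpty = false;
        }

        /* whole subtree empty: defer to our own parent (page P above) */
        if (allEmpty && !isRoot)
            return true;

        if (anyEmpty)
        {
            /*
             * Only here do we take the cleanup lock, and only on the
             * page pointing to the empty subtree(s); in the patch the
             * existing ginScanToDelete() pass starts from this page.
             */
            buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
                                        RBM_NORMAL, gvs->strategy);
            LockBufferForCleanup(buffer);
            /* ginScanToDelete() over the empty children runs here */
            UnlockReleaseBuffer(buffer);
        }
        return false;
    }
}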
The dev process is here: https://github.com/x4m/postgres_g/pull/2

Thank you for reading; I look forward to hearing your thoughts on the matter.

Best regards, Andrey Borodin.
On 28 Nov 2016, at 20:31, Andrew Borodin <borodin@octonica.com> wrote:
> This patch solved a problem encountered by Evgeniy Efimkin and
> Vladimir Borodin from Yandex.Mail.
> and eventually deleted some of the data. This testbed showed VACUUM
> holding inserts for up to tens of seconds. They report that the
> proposed patch made vacuum invisible in this test.
> Evgeniy, Vladimir, if I missed something or you have something to add,
> please join the discussion.
Yep, in our load environment the table is not so big (~50 GB) and the GIN index size is ~1 GB. Under our load profile we have seen 90-second stretches during which an insert into the table was impossible because this index was being vacuumed. I confirm that with this patch we no longer see the spikes of errors we saw previously.
Here is v1 of the patch. It now includes changes to the README and contains more comments clarifying the changes to the locking model. Let me also elaborate some more on what is patched.

The main portion of changes is in the function ginVacuumPostingTreeLeaves(). Before the patch it traversed the tree depth-first, acquiring an exclusive lock on each page and removing dead tuples from leaves. It also used to hold a cleanup lock. Now it performs the same DFS, but without the cleanup lock, acquiring only read locks on inner pages and an exclusive lock on a leaf before eliminating dead tuples. If this function finds empty leaves, it computes the minimal subtree containing only empty pages and starts a scan for empty pages from the parent page pointing to that subtree.

This scan acquires a cleanup lock on the root of the scan (not necessarily the root of the posting tree). The cleanup lock ensures no inserts are inside the subtree. The scan traverses the subtree depth-first, taking exclusive locks from left to right. For any page being deleted, all pages to its left were locked and unlocked before. New reads cannot enter the subtree, and all old read scans were excluded by the lock/unlock sequence. Thus there should be no deadlocks with ginStepRight().

We take a lock on the page being deleted, then on its left sibling, while ginStepRight() takes a lock on the left page, then on the right page. But its presence is excluded by the cleanup lock and by the DFS scan having locked the upper and left parts of the tree. A sketch of this lock order is below.

Thank you for reading this. Concurrency bothers me a lot. If you see anything wrong or suspicious, please do not hesitate to post your thoughts.

Best regards, Andrey Borodin.
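To make the lock ordering concrete, here is an illustrative C fragment of the unlink step under this model. The function and variable names are mine, not those of the patch, and the actual right-link rewiring, parent update and WAL logging of ginDeletePage() are reduced to a comment; this only shows the order in which the two exclusive locks are taken.

/*
 * dBlkno is the empty page being unlinked, lBlkno its left sibling.
 * We are somewhere below the page that was taken with
 * LockBufferForCleanup(), so no insert or scan is inside the subtree.
 */
static void
unlinkEmptyPageSketch(GinVacuumState *gvs,
                      BlockNumber dBlkno, BlockNumber lBlkno)
{
    Buffer      dBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM,
                                             dBlkno, RBM_NORMAL,
                                             gvs->strategy);
    Buffer      lBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM,
                                             lBlkno, RBM_NORMAL,
                                             gvs->strategy);

    /*
     * Deleted page first, then its left sibling -- the reverse of
     * ginStepRight(), which holds a page and then locks its right
     * sibling.  The reversed order cannot deadlock here because the
     * cleanup lock on the scan root plus the left-to-right DFS
     * guarantee nobody can still step right into dBlkno.
     */
    LockBuffer(dBuffer, GIN_EXCLUSIVE);
    LockBuffer(lBuffer, GIN_EXCLUSIVE);

    /* rewire rightlinks, update the parent, mark deleted, WAL-log */

    UnlockReleaseBuffer(lBuffer);
    UnlockReleaseBuffer(dBuffer);
}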
On Wed, Nov 30, 2016 at 11:38 AM, Andrew Borodin <borodin@octonica.com> wrote:
> This scan acquires a cleanup lock on the root of the scan (not
> necessarily the root of the posting tree). The cleanup lock ensures no
> inserts are inside the subtree. The scan traverses the subtree
> depth-first, taking exclusive locks from left to right. For any page
> being deleted, all pages to its left were locked and unlocked before.
> New reads cannot enter the subtree, and all old read scans were
> excluded by the lock/unlock sequence. Thus there should be no
> deadlocks with ginStepRight().
>
> We take a lock on the page being deleted, then on its left sibling,
> while ginStepRight() takes a lock on the left page, then on the right
> page. But its presence is excluded by the cleanup lock and by the DFS
> scan having locked the upper and left parts of the tree.
>
> Thank you for reading this. Concurrency bothers me a lot. If you see
> anything wrong or suspicious, please do not hesitate to post your
> thoughts.

I don't know much about GIN, but this seems like an interesting improvement. I hope somebody who knows more about GIN will step up to review it in depth.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company