GIN non-intrusive vacuum of posting tree - Mailing list pgsql-hackers
From | Andrew Borodin |
---|---|
Subject | GIN non-intrusive vacuum of posting tree |
Date | |
Msg-id | CAJEAwVEA4iLqBfdi3Qm=UwGcFX=h5x0FSbifo2+EE4G3D9e3wQ@mail.gmail.com Whole thread Raw |
Responses |
Re: GIN non-intrusive vacuum of posting tree
|
List | pgsql-hackers |
Hi hackers! Here is a patch with more concurrency-friendly vacuum of GIN. ===Short problem statement=== Currently GIN vacuum during cleaning posting tree can lock this tree for a long time without real need. ===Problem statement=== Vacuum of posting tree (function ginVacuumPostingTree() in ginvacuum.c ) is doing two passes through posting tree: 1. ginVacuumPostingTreeLeaves() takes LockBufferForCleanup, effectively excluding all inserts. Then it traverses down trough tree taking exclusive lock, effectively excluding all reads. On leaf level it calls ginVacuumPostingTreeLeaf(), which deletes all dead tuples. If there are any empty page, root lock is not relaesed, it passes to stage two. Interim: vacuum_delay_point(), which can hand for a while, holding CleanupLock on root. 2. If there are any empty pages, ginScanToDelete() scans through tree, deleting empty lead pages, then deleting empty inner pages, if they emerged after leaf page deletion. ===Proposed algorithm=== ginVacuumPostingTreeLeaves() takes SHARED locks on inner pages and EXCLUSIVE locks on leaf pages. On leaf pages it calls ginVacuumPostingTreeLeaf(). If ginVacuumPostingTreeLeaves() encounters subtree with all leafs empty, then it takes LockBufferForCleanup() on page P pointing to empty subtree. After taking lock on P, ginScanToDelete() is called for P. For every parent of P we can guarantee that this procedure will not be necessary: if P was empty subtree itself, we would pass this procedure to parent (unless P is root, than behavior is effectively equal to before-patched). ===Testing=== This patch solved a problem encountered by Evgeniy Efimkin and Vladimir Borodin from Yandex.Mail. They implemented testbed with GIN index CREATE TABLE BOX (uid bigint, lids integer[] NOT NULL CHECK (array_ndims(lids) = 1)); CREATE OR REPLACE FUNCTION ulids( i_uid bigint, i_lids integer[] ) RETURNS bigint[] AS $$ SELECT array_agg((i_uid << 32) | lid) FROM unnest(i_lids) lid; $$ LANGUAGE SQL IMMUTABLE STRICT; CREATE INDEX i_box_uid_lids ON box USING gin (ulids(uid, lids)) WITH (FASTUPDATE=OFF); Executing by a pgbench following query \setrandom uid 1 1500000 \setrandom lid 1 1000 \setrandom lid2 1 1000 \setrandom lid3 1 1000 BEGIN; insert into box values(:uid,'{:lid,:lid2,:lid3}'); insert into box values(:uid,'{}'); END; and eventually deleting some of data. This testbed showed VACUUM holding inserts for up to tenths of seconds. They claim that proposed patch made vacuum invisible in this test. Evgeniy, Vladimir, if I missed something or you have something to add, please join discussion. ===Known issues=== 1. Probably, there exists better algorithms. I could not adapt B-tree vacuum wizardy to GIN, but that does not mean it is impossible. 2. There may be more CleanUp locks taken. Under write-dense scenarios, this can lead to longer vacuum, but it should not consume more resources, just wait. 3. I have changed locks sequence during page deleteion. I think that it is safe, but comments there were in fear of inserts (excluded by CleanupLock). More details in a code of ginDeletePage(). 4. ginVacuumPostingTreeLeaves() traverses children pages after release of parent lock (event SHARED). This is safe if there is only one vacuum at a time. Is there a reason to be afraid of concurrent vacuums? I will be happy if someone points me were is a best place to read about B-tree vacuum process. Or if someone will explain me how it works in a few words. Dev process is here https://github.com/x4m/postgres_g/pull/2 Thank you for reading, I'm looking forward to hear your thought on the matter. Best regards, Andrey Borodin.
Attachment
pgsql-hackers by date: