Re: PATCH: optimized DROP of multiple tables within a transaction - Mailing list pgsql-hackers
From: Shigeru HANADA
Subject: Re: PATCH: optimized DROP of multiple tables within a transaction
Msg-id: 507E89B0.2040108@gmail.com
In response to: PATCH: optimized DROP of multiple tables within a transaction (Tomas Vondra <tv@fuzzy.cz>)
List: pgsql-hackers
Hi Tomas,

Sorry to be late.

On Sat, Aug 25, 2012 at 7:36 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> attached is a patch that improves performance when dropping multiple
> tables within a transaction. Instead of scanning the shared buffers for
> each table separately, the patch removes this and evicts all the tables
> in a single pass through shared buffers.

Here are my review comments.

Submission
==========
The patch is in unified diff format and applies cleanly to the head of master. It includes neither tests nor documentation, but that seems acceptable because the patch doesn't change any visible behavior.

Usability
=========
The patch aims to improve the performance of dropping many tables within a single transaction. Such an improvement would be useful in cases such as 1) dropping a set of partitioned tables as periodic maintenance work, and 2) dropping a lot of work tables. The patch doesn't change any SQL syntax or built-in objects; it only changes internal behavior.

Feature test
============
The patch provides no visible functionality, and all existing regression tests passed.

Performance test
================
I tested the 1000-table case (each a copy of pgbench_branches with 100,000 rows) on a server with 1GB shared_buffers. Please note that I tested on a MacBook Air, i.e. the storage is SSD, not HDD. Here is the test procedure:

1) loop 1000 times
   1-1) create a copy of pgbench_accounts as accounts$i
   1-2) load 100,000 rows
   1-3) add a primary key
   1-4) select all rows to cache the pages in shared buffers
2) BEGIN
3) loop 1000 times
   3-1) DROP TABLE accounts$i
4) COMMIT

The numbers below are averages of 5 trials.

---------+----------+-------------
 Build   | DROP *   | COMMIT
---------+----------+-------------
 Master  | 0.239 ms | 1220.421 ms
 Patched | 0.240 ms |  432.209 ms
---------+----------+-------------
* time elapsed for one DROP TABLE statement

IIUC the patch's target is improving COMMIT performance by avoiding the repeated buffer search loop, and these results show that the patch achieves its goal.
Coding review
=============
I have some comments about the code.

* Some cosmetic changes are necessary.
* The variable j in DropRelFileNodeAllBuffersList seems redundant.
* The RelFileNodeBackendIsTemp() macro is provided for checking whether a relation is local, so using it would be better.

Please see the attached patch for the changes above.

* As Robert commented, this patch adds DropRelFileNodeAllBuffersList by copying code from DropRelFileNodeAllBuffers. Please refactor to avoid the code duplication.
* In smgrDoPendingDeletes, you free srels explicitly. Can't we leave that to the memory context machinery? Even if the explicit free is required, pfree must at least be called in the nrels == 0 case too.
* In smgrDoPendingDeletes, the srels buffer is expanded on every iteration. This seems a bit inefficient. How about doubling the capacity whenever it is used up? This requires an additional variable, but reduces the number of repalloc calls considerably.
* Just an idea, but if we could remove the entries for local relations from the rnodes array before the buffer loop in DropRelFileNodeAllBuffersList, the subsequent bsearch might be more efficient, though dropping many temporary tables might be rare.

> Our system creates a lot of "working tables" (even 100.000) and we need
> to perform garbage collection (dropping obsolete tables) regularly. This
> often took ~ 1 hour, because we're using big AWS instances with lots of
> RAM (which tends to be slower than RAM on bare hw). After applying this
> patch and dropping tables in groups of 100, the gc runs in less than 4
> minutes (i.e. a 15x speed-up).

Hmm, my environment seems very different from yours. Could you show me the shared_buffers setting in your environment? I'd like to make my test environment as similar to yours as possible.

> This is not likely to improve usual performance, but for systems like
> ours, this patch is a significant improvement.
I'll test the performance of bare DROP TABLEs (not surrounded by BEGIN and COMMIT) tomorrow to confirm that the patch doesn't introduce a performance regression.

Regards,
--
Shigeru HANADA