Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers
From | Jeff Janes |
---|---|
Subject | Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] |
Date | |
Msg-id | CAMkU=1zf8s7HJd+tzp_BPD4X8UYLitd+E3Q6mDDOe7jhRkv6GQ@mail.gmail.com |
In response to | Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses | Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] |
List | pgsql-hackers |
On Mon, Jun 30, 2014 at 3:17 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Jeff Janes wrote:
>
>> In particular, pgpipe is almost an exact duplicate between them,
>> except the copy in vac_parallel.c has fallen behind changes made to
>> parallel.c. (Those changes would have fixed the Windows warnings). I
>> think that this function (and perhaps other parts as
>> well--"exit_horribly" for example) need to refactored into a common
>> file that both files can include. I don't know where the best place
>> for that would be, though. (I haven't done this type of refactoring
>> myself.)
>
> I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos.
> Maybe we should move pgpipe back to src/port and have pg_dump and this
> new thing use that. I'm not sure about the rest of duplication in
> vac_parallel.c; there might be a lot in common with what
> pg_dump/parallel.c does too. Having two copies of code is frowned upon
> for good reasons. This patch introduces 1200 lines of new code in
> vac_parallel.c, ugh.
>
> If we really require 1200 lines to get parallel vacuum working for
> vacuumdb, I would question the wisdom of this effort. To me, it seems
> better spent improving autovacuum to cover whatever it is that this
> patch is supposed to be good for --- or maybe just enable having a shell
> script that launches multiple vacuumdb instances in parallel ...

I would only envision using the parallel feature for vacuumdb after a
pg_upgrade or some other major maintenance window (that is the only
time I ever envision using vacuumdb at all). I don't think autovacuum
can be expected to handle such situations well, as it is designed to be
a smooth background process.

I guess the ideal solution would be for manual VACUUM to have a
PARALLEL option, then vacuumdb could just invoke that one table at a
time. That way you would get within-table parallelism which would be
important if one table dominates the entire database cluster. But I
don't foresee that happening any time soon.

I don't know how to calibrate the number of lines that is worthwhile.
If you write in C and need to have cross-platform compatibility and
robust error handling, it seems to take hundreds of lines to do much of
anything. The code duplication is a problem, but I don't think just raw
line count is, especially since it has already been written.

The trend in this project seems to be for shell scripts to eventually
get converted into C programs. In fact, src/bin/scripts now has no
scripts at all.

Also it is important to vacuum/analyze tables in the same database at
the same time, otherwise you will not get much speed-up in the ordinary
case where there is only one meaningful database. Doing that in a shell
script would be fairly hard. It should be pretty easy in Perl (at least
for me--I'm sure others disagree), but that also doesn't seem to be the
way we do things for programs intended for end users.

Cheers,

Jeff
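
To make the last point concrete, here is a minimal sketch of what "tables in the same database at the same time" amounts to: several workers pulling table names from one shared queue. It is not the patch's code and not how vacuumdb is implemented. The names run_parallel and vacuum_one are hypothetical; vacuum_one stands in for whatever would issue VACUUM/ANALYZE for a single table over that worker's own connection.

```python
import queue
import threading


def run_parallel(tables, jobs, vacuum_one):
    """Process every table in `tables` using `jobs` worker threads.

    `vacuum_one(table)` is a hypothetical callback supplied by the caller;
    in real use it would run VACUUM/ANALYZE for that one table.
    Returns a dict mapping worker index -> list of tables it handled.
    """
    work = queue.Queue()
    for table in tables:
        work.put(table)

    # Each worker appends only to its own list, so no lock is needed here.
    handled = {i: [] for i in range(jobs)}

    def worker(idx):
        while True:
            try:
                table = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            vacuum_one(table)
            handled[idx].append(table)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return handled
```

Because workers take the next table as soon as they are free, a single large table occupies one worker while the others work through the rest. It still bounds the total run time, which is the case that within-table parallelism would address.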