Re: Better way of dealing with pgstat wait timeout during buildfarm runs? - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Better way of dealing with pgstat wait timeout during buildfarm runs? |
Date | |
Msg-id | CA+TgmoZHEg1aHAewYpV9yCbFFj8sDOFufMnPgyT_2jkj2nU89A@mail.gmail.com |
In response to | Re: Better way of dealing with pgstat wait timeout during buildfarm runs? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Better way of dealing with pgstat wait timeout during buildfarm runs? |
List | pgsql-hackers |
On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> On 12/27/2014 12:16 AM, Alvaro Herrera wrote:
>>> Tom Lane wrote:
>>>> The argument that autovac workers need fresher stats than anything else
>>>> seems pretty dubious to start with. Why shouldn't we simplify that down
>>>> to "they use PGSTAT_STAT_INTERVAL like everybody else"?
>
>>> The point of wanting fresher stats than that, eons ago, was to avoid a
>>> worker vacuuming a table that some other worker vacuumed more recently
>>> than PGSTAT_STAT_INTERVAL. ...
>>> Nowadays we can probably disregard the whole issue, since starting a new
>>> vacuum just after the prior one finished should not cause much stress to
>>> the system thanks to the visibility map.
>
>> Vacuuming is far from free, even if the visibility map says that most
>> pages are visible to all: you still scan all indexes, if you remove any
>> dead tuples at all.
>
> With typical autovacuum settings, I kinda doubt that there's much value in
> reducing the window for this problem from 500ms to 10ms. As Alvaro says,
> this was just a partial, kluge solution from the start --- if we're
> worried about such duplicate vacuuming, we should undertake a real
> solution that closes the window altogether. In any case, timeouts
> occurring inside autovacuum are not directly causing the buildfarm
> failures, since autovacuum's log entries don't reflect into regression
> outputs. (It's possible that autovacuum's tight tolerance is contributing
> to the failures by increasing the load on the stats collector, but I'm
> not sure I believe that.)
>
> To get back to that original complaint about buildfarm runs failing,
> I notice that essentially all of those failures are coming from "wait
> timeout" warnings reported by manual VACUUM commands. Now, VACUUM itself
> has no need to read the stats files. What's actually causing these
> messages is failure to get a timely response in pgstat_vacuum_stat().
> So let me propose a drastic solution: let's dike out this bit in vacuum.c:
>
> /*
>  * Send info about dead objects to the statistics collector, unless we are
>  * in autovacuum --- autovacuum.c does this for itself.
>  */
> if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
>     pgstat_vacuum_stat();
>
> This would have the effect of transferring all responsibility for
> dead-stats-entry cleanup to autovacuum. For ordinary users, I think
> that'd be just fine. It might be less fine though for people who
> disable autovacuum, if there still are any.

-1. I don't think it's a good idea to inflict pain on people who want to schedule their vacuums manually (and yes, there are some) to get clean buildfarm runs.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company