Re: Heavily modified big table bloat even in auto vacuum is running - Mailing list pgsql-hackers
From | Haribabu kommi |
---|---|
Subject | Re: Heavily modified big table bloat even in auto vacuum is running |
Date | |
Msg-id | 8977CB36860C5843884E0A18D8747B0372BF0B5B@szxeml558-mbs.china.huawei.com Whole thread Raw |
In response to | Re: Heavily modified big table bloat even in auto vacuum is running (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Heavily modified big table bloat even in auto vacuum is running
|
List | pgsql-hackers |
On 29 November 2013 12:00 Amit Kapila wrote: > On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi > <haribabu.kommi@huawei.com> wrote: > > On 25 November 2013 10:43 Amit Kapila wrote: > >> On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi > >> <haribabu.kommi@huawei.com> wrote: > >> > On 19 November 2013 10:33 Amit Kapila wrote: > >> >> If I understood correctly, then your patch's main intention is to > >> >> correct the estimate of dead tuples, so that it can lead to > Vacuum > >> >> cleaning the table/index which otherwise is not happening as per > >> >> configuration value (autovacuum_vacuum_threshold) in some of the > >> >> cases, also it is not reducing the complete bloat (Unpatched - > >> 1532MB > >> >> ~Patched - 1474MB), as the main reason of bloat is extra space > in > >> >> index which can be reclaimed by reindex operation. > >> >> > >> >> So if above is correct then this patch has 3 advantages: > >> >> a. Extra Vacuum on table/index due to better estimation of dead > >> tuples. > >> >> b. Space reclaim due to this extra vacuum c. may be some > >> >> performance advantage as it will avoid the delay in cleaning dead > >> >> tuples > >> >> > >> >> I think better way to test the patch is to see how much benefit > is > >> >> there due to above (a and b points) advantages. Different values > >> >> of autovacuum_vacuum_threshold can be used to test. > >> > > >> > > >> > The performance effect of the patch is not much visible as I think > >> the > >> > analyze on the table estimates the number of dead tuples of the > >> > table > >> with some estimation. > >> > >> Yes, that seems to be the reason why you are not seeing any > >> performance benefit, but still I think this is useful optimization > to > >> do, as > >> analyze updates both the livetuples and dead tuples and similarly > >> vacuum should also update both the counts. Do you see any reason > >> why Vacuum should only update live tuples and not deadtuples? > > > > As vacuum touches all the pages where the dead tuples are present. > > This is not the Same with analyzer. Because of this reason, the > analyzer estimates the dead tuples also. > > With the proposed patch the vacuum also estimates the dead tuples. > > Few questions about your latest patch: > a. Is there any reason why you are doing estimation of dead tuples only > for Autovacuum and not for Vacuum. No, changed. > /* clear and get the new stats for calculating proper dead tuples */ > pgstat_clear_snapshot(); tabentry = > pgstat_fetch_stat_tabentry(RelationGetRelid(onerel)); > b. In the above code, to get latest data you are first clearing > snapshot and then calling pgstat function. It will inturn perform I/O > (read of stats file) and send/receive message from stats collector > to ensure it can read latest data. I think it will add overhead > to Vacuum, especially if 'nkeep' calculated in function > lazy_scan_heap() can serve the purpose. In my simple test[1], I > observed > that value of keep can serve the purpose. > > Can you please once try the test on 'nkeep' approach patch. Using the nkeep and snapshot approach, I ran the test for 40 mins with a high analyze_threshold and results are below. Auto vacuum count Bloat size Master 11 220MB Patched_nkeep 14 215MB Patched_snapshot 18 198MB Both the approaches are showing good improvement in the test. Updated patches, test script and configuration is attached in the mail. Regards, Hari babu.
Attachment
pgsql-hackers by date: