Re: Heavily modified big table bloat even in auto vacuum is running - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Heavily modified big table bloat even in auto vacuum is running |
Date | |
Msg-id | CAA4eK1Kh1OXrGo20Na=3ZiRrui7KfPseXBxJ67pAmH6CueVBng@mail.gmail.com Whole thread Raw |
In response to | Re: Heavily modified big table bloat even in auto vacuum is running (Haribabu kommi <haribabu.kommi@huawei.com>) |
Responses |
Re: Heavily modified big table bloat even in auto vacuum
is running
|
List | pgsql-hackers |
On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi <haribabu.kommi@huawei.com> wrote: > On 25 November 2013 10:43 Amit Kapila wrote: >> On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi >> <haribabu.kommi@huawei.com> wrote: >> > On 19 November 2013 10:33 Amit Kapila wrote: >> >> If I understood correctly, then your patch's main intention is to >> >> correct the estimate of dead tuples, so that it can lead to Vacuum >> >> cleaning the table/index which otherwise is not happening as per >> >> configuration value (autovacuum_vacuum_threshold) in some of the >> >> cases, also it is not reducing the complete bloat (Unpatched - >> 1532MB >> >> ~Patched - 1474MB), as the main reason of bloat is extra space in >> >> index which can be reclaimed by reindex operation. >> >> >> >> So if above is correct then this patch has 3 advantages: >> >> a. Extra Vacuum on table/index due to better estimation of dead >> tuples. >> >> b. Space reclaim due to this extra vacuum c. may be some performance >> >> advantage as it will avoid the delay in cleaning dead tuples >> >> >> >> I think better way to test the patch is to see how much benefit is >> >> there due to above (a and b points) advantages. Different values of >> >> autovacuum_vacuum_threshold can be used to test. >> > >> > >> > The performance effect of the patch is not much visible as I think >> the >> > analyze on the table estimates the number of dead tuples of the table >> with some estimation. >> >> Yes, that seems to be the reason why you are not seeing any >> performance benefit, but still I think this is useful optimization to >> do, as >> analyze updates both the livetuples and dead tuples and similarly >> vacuum should also update both the counts. Do you see any reason >> why Vacuum should only update live tuples and not deadtuples? > > As vacuum touches all the pages where the dead tuples are present. This is not the > Same with analyzer. Because of this reason, the analyzer estimates the dead tuples also. > With the proposed patch the vacuum also estimates the dead tuples. Few questions about your latest patch: a. Is there any reason why you are doing estimation of dead tuples only for Autovacuum and not for Vacuum. /* clear and get the new stats for calculating proper dead tuples */ pgstat_clear_snapshot(); tabentry = pgstat_fetch_stat_tabentry(RelationGetRelid(onerel)); b. In the above code, to get latest data you are first clearing snapshot and then calling pgstat function. It will inturn perform I/O (read of stats file) and send/receive message fromstats collector to ensure it can read latest data. I think it will add overhead to Vacuum, especially if 'nkeep' calculated in function lazy_scan_heap() can serve the purpose. In my simple test[1], I observed that value of keep can serve the purpose. Can you please once try the test on 'nkeep' approach patch. >> > Because of this reason not much performance improvement is not >> visible >> > as the missed dead tuple calculation in vacuum is covered by the >> analyze. >> >> Yeah, so might be we can check once by configuring >> analyze_threshold/scalefactor in a way that analyze doesn't get trigger >> during your test. > > I ran the test for one hour with a high analyze_threshold and results are below. > > Auto vacuum count Bloat size > Master 15 155MB > Patched 23 134MB > > Updated test script and configuration is attached in the mail. I just had a brief look on your test, please check if you can simplify your script file and make the test results to come in 15~20 mins. Don't put too much effort on it, if you can do it easily then it is okay. [1] Simple test case to verify the value of dead tuples: Session-1 ----------------- a. Create table t1(c1 int); b. insert into t1 values(generate_series(1,1000)); c. delete from t1; d. Vacuum t1; -- here I stopped in debugger, after fetching dead tuple count first time (line 235, vacuumlazy.c, after applying your patch) as per your code (modified a bit so that I can get the value for Vacuum) Session-2 ----------------- a. insert into t1 values (generate_series(1000,1500)); b. delete from t1; Session -1 ----------------- b. Verified the value of nkeep in lazy_scan_heap(), it is 501 which is what we expect. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: