Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs
From:           Thomas Munro
Subject:        Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date:
Msg-id:         CAEepm=2W9LyVy5ASPtH7cLnKsFnw5KGo3vRyteLXYhe=C+6ywQ@mail.gmail.com
In response to: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses:      Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
                Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
                Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
List:           pgsql-bugs
On Wed, Apr 29, 2015 at 11:41 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> 1. It seems that you are using
> autovacuum_multixact_freeze_max_age_adjusted() only at a couple of
> places; for example, it is not used in the below calculation:
>
> vacuum_set_xid_limits()
> {
> ..
>     mxid_freezemin = Min(mxid_freezemin,
>                          autovacuum_multixact_freeze_max_age / 2);
> ..
> }
>
> What is the reason for using this calculation at some places and
> not at others?

You're right. I will review the other places where that variable is
used and come back on this point.

> 2.
> @@ -2684,8 +2719,8 @@ relation_needs_vacanalyze(Oid relid,
>      : autovacuum_freeze_max_age;
>
>    multixact_freeze_max_age = (relopts && relopts->multixact_freeze_max_age >= 0)
> -    ? Min(relopts->multixact_freeze_max_age, autovacuum_multixact_freeze_max_age)
> -    : autovacuum_multixact_freeze_max_age;
> +    ? Min(relopts->multixact_freeze_max_age, autovacuum_multixact_freeze_max_age_adjusted())
> +    : autovacuum_multixact_freeze_max_age_adjusted();
>
> It seems that it will try to read from the offsets file for each
> relation, which might or might not be good; shall we try to
> cache the oldestMultiXactMemberOffset?

True. I will look into this.

> 3. Currently there is a minimum limit on autovacuum_multixact_freeze_age
> (10000000) which might not be honored by this calculation, so I am
> not sure whether that can impact system performance in some cases
> where it is currently working sanely.

The reason we need to be able to set the effective freeze age below
that minimum in cases of high member data consumption rates is that
you could hit the new member-space wraparound prevention error before
you consume anywhere near that many multixact IDs. That minimum may
well be entirely reasonable if the only thing you're worried about is
multixact ID wraparound prevention. For example, my test program eats
an average of 250 members per multixact ID when run with 500 sessions
(each loop creates 500 multixact IDs having 1, 2, 3, ..., 500
members). At that rate, you run out of addressable member space after
2^32 / 250 = 17,179,869 multixact IDs. To prevent an error condition
using only the existing multixact ID wraparound prevention machinery,
we need an effective max table age (so that autovacuum wakes up and
scans all tables) and a min freeze age (so that it actually freezes
the tuples) below that number. So we have to ignore the GUC minimum
in this situation.
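To make that arithmetic concrete, here is a tiny back-of-envelope
sketch (my own illustration, not code from the patch; the function
name is made up):

#include <stdint.h>
#include <stdio.h>

/*
 * Back-of-envelope: if each multixact consumes avg_members member
 * slots on average, the 2^32-member address space is exhausted after
 * this many multixact IDs, so any effective freeze max age needs to
 * stay safely below this number.
 */
static uint32_t
mxids_until_member_wrap(uint32_t avg_members)
{
	return (uint32_t) ((UINT64_C(1) << 32) / avg_members);
}

int
main(void)
{
	/* 500 sessions -> ~250 members per multixact ID on average */
	printf("%u\n", (unsigned) mxids_until_member_wrap(250));	/* 17179869 */
	return 0;
}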
> 4. Can you please share results that show the improvement with the
> current patch versus unpatched master?

Here is my test procedure: initdb, default settings except
max_connections = 600 and shared_buffers = 1280MB, "CREATE TABLE foo
AS SELECT 42 AS id;", start monitor.sh (attached) and then run
explode_mxact_members.c (attached to my first message in this thread)
with arguments 500 100000. Here are the edited highlights of the
result for the patched server (the complete results are attached):

time     segments  usage_fraction   usage_kb  oldest_mxid  next_mxid  next_offset
10:37:01        1           0             16            1          1            0
10:38:01      656           .0079     168368            1          1            0
10:39:01     1426           .0173     366112            1     190619     47845119
10:40:01     2201           .0268     565080            1     384730     96566980
10:41:01     2979           .0363     764760            1     578342    145163592
...
11:58:57    60455           .7368   15529256            1   12601248   3162912998
11:59:59    61184           .7457   15717760            1   12601248   3162912998
12:01:01    61918           .7547   15906288            1   12794860   3211509610
12:02:03     2178           .0265     559360     12610230   12988472   3260106222  <-- oldest_mxid moved
12:03:03     2936           .0357     753936     12610230   13182583   3308828083
12:04:03     3702           .0451     950704     12610230   13376694   3357549944
...
13:18:23    58236           .7098   14956680     12610230   24623156   1885444610
13:19:23    58965           .7187   15143520     12610230   24817267   1934166471
13:20:24    59690           .7275   15329864     12610230   25011378   1982888332
13:21:24    59709           .7278   15334664     12758932   25204990   2031484944  <-- oldest_mxid moved
13:22:25    60447           .7367   15524464     12758932   25204990   2031484944
13:23:25    60836           .7415   15624192     12833782   25399101   2080206805  <-- oldest_mxid moved
13:24:26    25068           .3055    6438312     20447005   25592713   2128803417
13:25:26    25826           .3147    6632888     20447005   25786824   2177525278
13:26:26    26583           .3240    6827456     20447005   25980935   2226247139
...
14:10:42    59115           .7205   15182368     20447005   32767834   3929758788
14:11:42    59836           .7293   15367912     20447005   32767834   3929758788
14:12:43    60571           .7383   15556680     20447005   32961446   3978355400
14:13:43    38557           .4699    9902472     25189521   33155557   4027077261  <-- oldest_mxid moved
14:14:44    39267           .4786   10084408     25189521   33349169   4075673873
14:15:44    39975           .4872   10266816     25189521   33349169   4075673873
...
14:43:56    59805           .7289   15359072     25189521   37615619    851585527
14:44:56    60492           .7373   15536024     25189521   37615619    851585527
14:45:57    60468           .7370   15529536     25342215   37809231    900182139
14:46:58    48309           .5888   12406904     28023821   38003342    948904000  <-- oldest_mxid moved
14:47:58    49010           .5973   12586784     28023821   38196954    997500612
14:48:58    49716           .6059   12768152     28023821   38391065   1046222473
...
15:02:04    58882           .7177   15122600     28023821   40136068   1484218226
15:03:04    59588           .7263   15303856     28023821   40330179   1532940087
15:04:05    60293           .7349   15484592     28023821   40523791   1581536699
15:05:06    24657           .3005    6332800     35601635   40717902   1630258560  <-- oldest_mxid moved
15:06:06    25397           .3095    6522664     35601635   40717902   1630258560
15:07:06    26140           .3186    6713296     35601635   40912013   1678980421
...
15:52:22    58685           .7153   15071912     35601635   47698912   3382492070
15:53:23    59403           .7240   15255936     35601635   47892524   3431088682
15:54:23    60118           .7327   15439648     35601635   48086635   3479810543
15:55:24    50415           .6145   12947712     37776297   48280247   3528407155  <-- oldest_mxid moved
15:56:24    51144           .6234   13135568     37776297   48280247   3528407155
15:57:25    51869           .6322   13321872     37776297   48474358   3577129016
...
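As an aside on reading the table: usage_fraction can be derived from
the offsets even though next_offset wraps past 2^32. A minimal sketch
of that arithmetic (my illustration; monitor.sh and the server may
compute it differently):

#include <stdint.h>

/*
 * Illustration only: the member address space is a 2^32-slot circle,
 * so the amount in use is the unsigned 32-bit distance from the
 * oldest offset still needed to the next offset to be allocated.
 * Unsigned subtraction handles wraparound for free, which is why
 * next_offset can "jump down" (e.g. 3929758788 -> 851585527 above)
 * while usage changes smoothly.
 */
static double
member_space_usage_fraction(uint32_t oldest_offset, uint32_t next_offset)
{
	uint32_t	used = next_offset - oldest_offset;	/* wraps correctly */

	return (double) used / 4294967296.0;	/* fraction of 2^32 */
}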
Observations:

1. Sometimes the values don't change from minute to minute, presumably
because there hasn't been a checkpoint to update pg_controldata on
disk, but hopefully we can still see what's going on here despite the
slight lag in the data.

2. We get to somewhere in the 73-75% SLRU-used range before wraparound
vacuums are triggered. We probably need to spread things out more
than that.

3. When autovacuum runs, it advances oldest_mxid by a different amount
each time; that's because I'm using the adjusted freeze max age (the
max age of a table before it gets a wraparound vacuum) as our freeze
min age (the max age for individual tuples before they're frozen)
here:

@@ -1931,7 +1964,9 @@ do_autovacuum(void)
 	{
 		default_freeze_min_age = vacuum_freeze_min_age;
 		default_freeze_table_age = vacuum_freeze_table_age;
-		default_multixact_freeze_min_age = vacuum_multixact_freeze_min_age;
+		default_multixact_freeze_min_age =
+			Min(vacuum_multixact_freeze_min_age,
+				autovacuum_multixact_freeze_max_age_adjusted());
 		default_multixact_freeze_table_age = vacuum_multixact_freeze_table_age;
 	}

Without that change, autovacuum would trigger repeatedly as we got
near 75% SLRU usage but would not freeze anything, because
default_multixact_freeze_min_age was higher than the age of any tuples
(which had only reached an age of around ~12 million; actually it's
not exactly the tuple age per se... I don't fully understand the
treatment of locker and updater multixact IDs in the vacuum code,
HeapTupleSatisfiesVacuum and heap_freeze_tuple etc yet, so I'm not
sure exactly how that value translates into vacuum work, but I can see
experimentally that a low multixact freeze min age is needed to get
relminmxid moved forward).

It's good that freeze table age ramps down, so that the autovacuum
launcher trigger point jumps around a bit and we spread the autovacuum
launches over time, but it's not great that we finish up truncating
different amounts of multixacts and associated SLRU each time. We
could instead use a freeze min age of 0 to force freezing of *all*
tuples if this is a member-space wraparound prevention vacuum (that
is, if autovacuum_multixact_freeze_max_age !=
autovacuum_multixact_freeze_max_age_adjusted()). IIUC that would be
like running VACUUM FREEZE, and would space our wraparound vacuums as
far apart as possible, since it would trim SLRU space as much as
possible whenever a wraparound vacuum runs, but it would presumably
generate more writes. Maybe it doesn't matter that we drop different
amounts of multixact history each time, as long as the long-term
behaviour works. Thoughts?
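In case a sketch helps the discussion, one minimal way to express that
rule (my illustration, not the posted patch, though it reuses the
names that appear in it):

#include <stdint.h>

/* GUC and helper from the patch under discussion; extern here only to
 * make the sketch self-contained. */
extern int	autovacuum_multixact_freeze_max_age;
extern int	autovacuum_multixact_freeze_max_age_adjusted(void);

/*
 * Sketch of the idea above: pick a freeze min age of 0 whenever
 * member-space pressure has pulled the effective freeze max age below
 * the GUC value, so the wraparound vacuum frees as much SLRU space as
 * possible (like VACUUM FREEZE); otherwise keep the configured value.
 */
static int
choose_multixact_freeze_min_age(int vacuum_multixact_freeze_min_age)
{
	if (autovacuum_multixact_freeze_max_age !=
		autovacuum_multixact_freeze_max_age_adjusted())
		return 0;

	return vacuum_multixact_freeze_min_age;
}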
There is less to say about the results with an unpatched server: it
drives in a straight line for a while, and then crashes into a wall
(ie the new error preventing member wraparound), which I see you have
also reproduced. It has used up all of the circular member space, but
only has around 17 million multixacts, so autovacuum can't help you
(it's not even possible to set autovacuum_multixact_freeze_max_age
below 100 million); to get things moving again you need to manually
VACUUM FREEZE all databases, including template databases.

> 5.
> + /*
> +  * TODO: In future, could oldestMultiXactMemberOffset be stored in shmem,
> +  * pg_controdata, alongside oldestMultiXactId?
> +  */
>
> You might want to write the comment as:
> XXX: We can store oldestMultiXactMemberOffset in shmem, pg_controldata
> alongside oldestMultiXactId?

Thanks, done.

> 6.
> + * Returns vacuum_multixact_freeze_max_age, adjusted down to prevent
> + * excessive use of addressable multixact member space if required.
>
> I think here you mean autovacuum_multixact_freeze_max_age?

Thanks, done.

FWIW, in some future release, I think we should consider getting a
bigger multixact member address space that wraps around at 2^48 or
2^64 instead of 2^32, so that we can sidestep this whole business and
go back to having just xid and mxid wraparounds to worry about.
pg_multixact/offsets would be 50% or 100% bigger (an extra byte or two
per multixact), but it's not very big. pg_multixact/members would be
no bigger for any workload that currently works without hitting the
wraparound error, but could grow bigger if needed.
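Purely to illustrate the size arithmetic (this is speculation about a
hypothetical future layout, nothing in the patch): a 48-bit member
offset could be stored in 6 bytes per multixact in
pg_multixact/offsets, i.e. 50% bigger than today's 4-byte offset.

#include <stdint.h>

/* Hypothetical 6-byte (48-bit) member offset entry, little-endian. */
typedef struct MultiXactOffset48
{
	uint8_t		b[6];
} MultiXactOffset48;

static uint64_t
offset48_get(const MultiXactOffset48 *o)
{
	uint64_t	v = 0;

	for (int i = 5; i >= 0; i--)
		v = (v << 8) | o->b[i];	/* value is < 2^48 by construction */
	return v;
}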
--
Thomas Munro
http://www.enterprisedb.com