Thread: RE: Actually it's a bufmgr issue (was Re: Another pg_listener issue)
> -----Original Message----- > From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On > Behalf Of Tom Lane [snip] > > regression=# vacuum foo; > NOTICE: FlushRelationBuffers(foo, 0): block 0 is dirty (private > 1, global 1) > VACUUM > > This is being caused by the buffer manager changes I recently made > to avoid doing unnecessary writes at commit. After the DELETE, foo's > buffer is written to disk, but at that point its (single) tuple is > marked with an uncommitted xmax transaction. The SELECT verifies the > tuple is dead and changes its on-row status to xmax-committed, but > *that change is not forced to disk* (the rationale being that even if > we crashed without writing the change, it could be done again later). > Now VACUUM comes along, finds no live tuples, and decides to truncate > the relation to zero blocks. During the truncation, > FlushRelationBuffers sees that the buffer it's flushing is still marked > dirty, and hence emits the above notice. > This means vacuum doesn't necessarily flush all dirty buffers of the target table. Doesn't this break the assumption of pg_upgrade ? > > Comments anyone? > Some changes around bufmgr seems to be needed. 1) Provide a function which could flush all dirty buffers and vacuum calls the function for example. or 2) SetBufferCommitInfoNeedsSave() sets SharedBuffer Changed and BufferDirtiedByMe. and/or 3) Flush dirty buffers even in case of abort. Regards. Hiroshi Inoue Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >> Now VACUUM comes along, finds no live tuples, and decides to truncate >> the relation to zero blocks. During the truncation, >> FlushRelationBuffers sees that the buffer it's flushing is still marked >> dirty, and hence emits the above notice. > This means vacuum doesn't necessarily flush all dirty buffers of > the target table. Doesn't this break the assumption of pg_upgrade ? No, because it does still flush the buffer. It's only emitting a warning, because it thinks this condition suggests a bug in VACUUM. But with the way bufmgr behaves now, the condition is actually fairly normal, and so the warning is no longer of any value. regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > >> Now VACUUM comes along, finds no live tuples, and decides to truncate > >> the relation to zero blocks. During the truncation, > >> FlushRelationBuffers sees that the buffer it's flushing is still marked > >> dirty, and hence emits the above notice. > > > This means vacuum doesn't necessarily flush all dirty buffers of > > the target table. Doesn't this break the assumption of pg_upgrade ? > > No, because it does still flush the buffer. Yes FlushRelationBuffers notices and flushes dirty buffers >= the specified block. But doesn't it notice dirty buffers < the specified block ? Or does vacuum flush all pages < the specified block while processing ? Regards. Hiroshi Inoue Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >>>> This means vacuum doesn't necessarily flush all dirty buffers of >>>> the target table. Doesn't this break the assumption of pg_upgrade ? >> >> No, because it does still flush the buffer. > Yes FlushRelationBuffers notices and flushes dirty buffers >= > the specified block. But doesn't it notice dirty buffers < the > specified block ? Or does vacuum flush all pages < the > specified block while processing ? Oh, I see what you mean: even after a VACUUM, it's possible for there to be unwritten dirty buffers for a relation, containing either changes entered by an aborted transaction, or updates of on-row status values made by non-VACUUM transactions. Hmm. It's barely possible that the second case could break pg_upgrade, if something before VACUUM had updated the on-row status values in a page and then VACUUM itself had no reason to dirty the page. If those status values never get written then pg_upgrade fails. Maybe it would be a good idea for VACUUM to force out all dirty pages for the relation even when theu're only dirty because of on-row status updates. Normally we wouldn't care about risking losing those updates, but for pg_upgrade we would. I was about to change FlushRelationBuffers anyway to get rid of the bogus warning message. What I propose to do is give it two behaviors: (1) write out dirty buffers at or beyond the specified block, but don't remove buffers from pool; or (2) remove buffers at or beyond the specified block from pool, after writing them out if dirty. VACUUM should apply case (2) beginning at the proposed truncation point, and then apply case (1) starting at block 0. Sound good? regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] [snip] > Maybe it would be a good idea for VACUUM to force out all dirty > pages for the relation even when theu're only dirty because of > on-row status updates. Normally we wouldn't care about risking > losing those updates, but for pg_upgrade we would. > > I was about to change FlushRelationBuffers anyway to get rid of > the bogus warning message. What I propose to do is give it two > behaviors: > (1) write out dirty buffers at or beyond the specified block, > but don't remove buffers from pool; or > (2) remove buffers at or beyond the specified block from pool, > after writing them out if dirty. > > VACUUM should apply case (2) beginning at the proposed truncation point, > and then apply case (1) starting at block 0. > > Sound good? > Agreed. Regards. Hiroshi Inoue Inoue@tpf.co.jp
>> I was about to change FlushRelationBuffers anyway to get rid of >> the bogus warning message. What I propose to do is give it two >> behaviors: >> (1) write out dirty buffers at or beyond the specified block, >> but don't remove buffers from pool; or >> (2) remove buffers at or beyond the specified block from pool, >> after writing them out if dirty. >> >> VACUUM should apply case (2) beginning at the proposed truncation point, >> and then apply case (1) starting at block 0. >> >> Sound good? > Agreed. OK, I've committed a fix for this. After looking at the uses of FlushRelationBuffers, I gave it just one behavior: flush *all* dirty buffers of the relation, and remove from the cache those that are at or beyond the specified block number. This allows VACUUM's needs to be met in one buffer-cache scan instead of two. I also cleaned up ReleaseRelationBuffers, which should have but did not remove the relation's buffers from the cache. This left us with "valid" buffers for a deleted relation after any DROP TABLE. No known bug there, but clearly trouble waiting to happen. Likewise for DropBuffers (same thing for a whole database). Finally, the "removal" of the deleted buffers in these routines consisted of calling BufTableDelete(), which removes the buffer from the shared-buffer hashtable, so it will not be found by a lookup for a specific block --- but the various routines that scan the whole shared-buffer array would still think the buffer belongs to its former relation! That can't be good either, so I made BufTableDelete() clear the tag field for the buffer. regards, tom lane