Thread: Incorrect XLogRegisterBuffer flag for revmapbuf in brin
Hi all, In brin_doupdate(line 290), REGBUF_STANDARD is used to register revmap buffer reference in WAL record. But, revmap buffer page doesn't have a standard page layout and it doesn't update pd_upper and pd_lower as well. Either we should register revmapbuf as following: XLogRegisterBuffer(1, revmapbuf, 0); Or, we can update the pd_upper and pd_lower in brin_page_init() as follows: if (BRIN_IS_REVMAP_PAGE(page)) p->pd_lower = p->upper. To fix this, I've attached a small patch which follows the first approach. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Nov 9, 2016 at 9:33 PM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > In brin_doupdate(line 290), REGBUF_STANDARD is used to register > revmap buffer reference in WAL record. But, revmap buffer page doesn't > have a standard page layout and it doesn't update pd_upper and > pd_lower as well. > > Either we should register revmapbuf as following: > XLogRegisterBuffer(1, revmapbuf, 0); As this is not a standard buffer, let's do it this way. This issue has been captured by the WAL consistency patch. -- Michael
Added to next commitfest for tracking. I should've done it earlier. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Kuntal Ghosh wrote: > Added to next commitfest for tracking. I should've done it earlier. Yeah -- I hadn't noticed this thread until last week. I intend to apply this very soon. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Kuntal Ghosh wrote: > Hi all, > > In brin_doupdate(line 290), REGBUF_STANDARD is used to register > revmap buffer reference in WAL record. But, revmap buffer page doesn't > have a standard page layout and it doesn't update pd_upper and > pd_lower as well. Hmm. This bug should be causing WAL replay to zero out the revmap page contents, since essentially the whole page is covered by the "hole" in standard pages. I can't see what is causing that not to happen, but evidently it isn't, since the index works in a replica. What am I missing? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote: > Kuntal Ghosh wrote: > > Hi all, > > > > In brin_doupdate(line 290), REGBUF_STANDARD is used to register > > revmap buffer reference in WAL record. But, revmap buffer page doesn't > > have a standard page layout and it doesn't update pd_upper and > > pd_lower as well. > > Hmm. This bug should be causing WAL replay to zero out the revmap page > contents, since essentially the whole page is covered by the "hole" in > standard pages. I can't see what is causing that not to happen, but > evidently it isn't, since the index works in a replica. What am I > missing? Ah, I figured it out, and I can reproduce that this bug loses the BRIN data, effectively corrupting the index -- but AFAICS user queries would not return corrupted results, because since all the revmap entries become zeroes, BRIN interprets this as "the range is not summarized" and so all ranges become lossy for the bitmap scan. Also, the bug is very low probability. You need to cause an UPDATE xlog record, which only occurs when a range gets a new index tuple that doesn't fit in the existing index page (so the index tuple is wider than the original -- something that doesn't happen with fixed-width datatypes). And you need to have a backup block for the revmap page -- IOW that revmap page update needs to be the first after a checkpoint (and not be running full_page_writes=off). If you examine the revmap in a replica after running the script below, you'd observe it's different than the one in the master. I confirmed that the proposed patch fixes the problem. DROP TABLE IF EXISTS brin_iso; CREATE TABLE brin_iso (value text) WITH (fillfactor = 10); CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1); DO $$ DECLARE curtid tid; BEGIN LOOP INSERT INTO brin_iso VALUES ('a'); PERFORM brin_summarize_new_values('brinidx'); SELECT max(pages) INTO curtid FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages <> '(0,0)'; EXIT WHEN curtid > tid '(3, 0)'; END LOOP; END; $$ ; DELETE FROM brin_iso WHERE ctid < '(0,99)'; VACUUM brin_iso ; CHECKPOINT; INSERT INTO brin_iso VALUES (repeat('xyzxxz', 24)); (I'm not sure if it's possible that revmap page 1 can be filled before the first regular page is full; if that happens, this will loop forever.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote: > If you examine the revmap in a replica after running the script below, > you'd observe it's different than the one in the master. I confirmed > that the proposed patch fixes the problem. Pushed. Thanks for the report! -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services