Re: WAL logging problem in 9.4.3? - Mailing list pgsql-hackers
From | Martijn van Oosterhout |
---|---|
Subject | Re: WAL logging problem in 9.4.3? |
Date | |
Msg-id | 20150703195736.GB2841@svana.org Whole thread Raw |
In response to | Re: WAL logging problem in 9.4.3? (Andres Freund <andres@anarazel.de>) |
Responses |
Re: WAL logging problem in 9.4.3?
|
List | pgsql-hackers |
On Fri, Jul 03, 2015 at 07:21:21PM +0200, Andres Freund wrote: > On 2015-07-03 19:14:26 +0200, Martijn van Oosterhout wrote: > > Am I missing something. ISTM that if the truncate record was simply not > > logged at all everything would work fine. The whole point is that the > > table was created in this transaction and so if it exists the table on > > disk must be the correct representation. > > That'd not work either. Consider: > > BEGIN; > CREATE TABLE ... > INSERT; > TRUNCATE; > INSERT; > COMMIT; > > If you replay that without a truncation wal record the second INSERT > will try to add stuff to already occupied space. And they can have > different lengths and stuff, so you cannot just ignore that fact. I was about to disagree with you by suggesting that if the table was created in this transaction then WAL logging is skipped. But testing shows that inserts are indeed logged, as you point out. With inserts the WAL records look as follows (relfilenodes changed): martijn@martijn-jessie:~/git/ctm/docker$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p /tmp/pgtest/postgres/pg_xlog/ 000000010000000000000001|grep -wE '16386|16384|16390' rmgr: Storage len (rec/tot): 16/ 48, tx: 0, lsn: 0/016A79C8, prev 0/016A79A0, bkp: 0000, desc: file create:base/12139/16384 rmgr: Sequence len (rec/tot): 158/ 190, tx: 683, lsn: 0/016B4258, prev 0/016B2508, bkp: 0000, desc: log: rel1663/12139/16384 rmgr: Storage len (rec/tot): 16/ 48, tx: 683, lsn: 0/016B4318, prev 0/016B4258, bkp: 0000, desc: file create:base/12139/16386 rmgr: Storage len (rec/tot): 16/ 48, tx: 683, lsn: 0/016B9468, prev 0/016B9418, bkp: 0000, desc: file create:base/12139/16390 rmgr: Sequence len (rec/tot): 158/ 190, tx: 683, lsn: 0/016BC938, prev 0/016BC880, bkp: 0000, desc: log: rel1663/12139/16384 rmgr: Sequence len (rec/tot): 158/ 190, tx: 683, lsn: 0/016BCAF0, prev 0/016BCAA0, bkp: 0000, desc: log: rel1663/12139/16384 rmgr: Heap len (rec/tot): 35/ 67, tx: 683, lsn: 0/016BCBB0, prev 0/016BCAF0, bkp: 0000, desc: insert(init):rel 1663/12139/16386; tid 0/1 rmgr: Btree len (rec/tot): 20/ 52, tx: 683, lsn: 0/016BCBF8, prev 0/016BCBB0, bkp: 0000, desc: newroot:rel 1663/12139/16390; root 1 lev 0 rmgr: Btree len (rec/tot): 34/ 66, tx: 683, lsn: 0/016BCC30, prev 0/016BCBF8, bkp: 0000, desc: insert:rel 1663/12139/16390; tid 1/1 rmgr: Storage len (rec/tot): 16/ 48, tx: 683, lsn: 0/016BCC78, prev 0/016BCC30, bkp: 0000, desc: file truncate:base/12139/16386 to 0 blocks rmgr: Storage len (rec/tot): 16/ 48, tx: 683, lsn: 0/016BCCA8, prev 0/016BCC78, bkp: 0000, desc: file truncate:base/12139/16390 to 0 blocks rmgr: Heap len (rec/tot): 35/ 67, tx: 683, lsn: 0/016BCCD8, prev 0/016BCCA8, bkp: 0000, desc: insert(init):rel 1663/12139/16386; tid 0/1 rmgr: Btree len (rec/tot): 20/ 52, tx: 683, lsn: 0/016BCD20, prev 0/016BCCD8, bkp: 0000, desc: newroot:rel 1663/12139/16390; root 1 lev 0 rmgr: Btree len (rec/tot): 34/ 66, tx: 683, lsn: 0/016BCD58, prev 0/016BCD20, bkp: 0000, desc: insert:rel 1663/12139/16390; tid 1/1 relname | relfilenode -------------+-------------test | 16386test_id_seq | 16384test_pkey | 16390 (3 rows) And amazingly, the database cluster successfuly recovers and there's no error now. So the problem is *only* because there is no data in the table at commit time. Which indicates that it's the 'newroot" record that saves the day normally. And it's apparently generated by the first insert. > Agreed. I think the problem is something else though. Namely that we > reuse the relfilenode for heap_truncate_one_rel(). That's just entirely > broken afaics. We need to allocate a new relfilenode and write stuff > into that. Then we can forgo WAL logging the truncation record. Would that properly initialise the index though? Anyway, this is way outside my expertise, so I'll bow out now. Let me know if I can be of more assistance. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer
pgsql-hackers by date: