Re: Failed to delete old ReorderBuffer spilled files - Mailing list pgsql-hackers
From | atorikoshi |
---|---|
Subject | Re: Failed to delete old ReorderBuffer spilled files |
Date | |
Msg-id | d5dddea9-4182-a7e4-f368-156f5470d244@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Failed to delete old ReorderBuffer spilled files (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Failed to delete old ReorderBuffer spilled files
Re: Failed to delete old ReorderBuffer spilled files |
List | pgsql-hackers |
Thanks for reviewing! On 2017/11/21 18:12, Masahiko Sawada wrote: > On Tue, Nov 21, 2017 at 3:48 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Mon, Nov 20, 2017 at 7:35 PM, atorikoshi >> <torikoshi_atsushi_z2@lab.ntt.co.jp> wrote: >>> Hi, >>> >>> I put many queries into one transaction and made ReorderBuffer spill >>> data to disk, and sent SIGKILL to postgres before the end of the >>> transaction. >>> >>> After starting up postgres again, I observed the files spilled to >>> data wasn't deleted. >>> >>> I think these files should be deleted because its transaction was no >>> more valid, so no one can use these files. >>> >>> >>> Below is a reproduction instructions. >>> >>> ------------------------------------------------ >>> 1. Create table and publication at publiser. >>> >>> @pub =# CREATE TABLE t1( >>> id INT PRIMARY KEY, >>> name TEXT); >>> >>> @pub =# CREATE PUBLICATION pub FOR TABLE t1; >>> >>> 2. Create table and subscription at subscriber. >>> >>> @sub =# CREATE TABLE t1( >>> id INT PRIMARY KEY, >>> name TEXT >>> ); >>> >>> @sub =# CREATE SUBSCRIPTION sub >>> CONNECTION 'host=[hostname] port=[port] dbname=[dbname]' >>> PUBLICATION pub; >>> >>> 3. Put many queries into one transaction. >>> >>> @pub =# BEGIN; >>> INSERT INTO t1 >>> SELECT >>> i, >>> 'aaaaaaaaaa' >>> FROM >>> generate_series(1, 1000000) as i; >>> >>> 4. Then we can see spilled files. >>> >>> @pub $ ls -1 ${PGDATA}/pg_replslot/sub/ >>> state >>> xid-561-lsn-0-1000000.snap >>> xid-561-lsn-0-2000000.snap >>> xid-561-lsn-0-3000000.snap >>> xid-561-lsn-0-4000000.snap >>> xid-561-lsn-0-5000000.snap >>> xid-561-lsn-0-6000000.snap >>> xid-561-lsn-0-7000000.snap >>> xid-561-lsn-0-8000000.snap >>> xid-561-lsn-0-9000000.snap >>> >>> 5. Kill publisher's postgres process before COMMIT. >>> >>> @pub $ kill -s SIGKILL [pid of postgres] >>> >>> 6. Start publisher's postgres process. >>> >>> @pub $ pg_ctl start -D ${PGDATA} >>> >>> 7. After a while, we can see the files remaining. >>> (Immediately after starting publiser, we can not see these files.) >>> >>> @pub $ pg_ctl start -D ${PGDATA} >>> >>> When I configured with '--enable-cassert', below assertion error >>> was appeared. >>> >>> TRAP: FailedAssertion("!(txn->final_lsn != 0)", File: "reorderbuffer.c", >>> Line: 2576) >>> ------------------------------------------------ >>> >>> Attached patch sets final_lsn to the last ReorderBufferChange if >>> final_lsn == 0. >> >> Thank you for the report. I could reproduce this issue with the above >> step. My analysis is, the cause of that a serialized reorder buffer >> isn't cleaned up is that the aborted transaction without an abort WAL >> record has no chance to set ReorderBufferTXN->final_lsn. So if there >> is such serialized transactions ReorderBufferRestoreCleanup cleanups >> no files, which is cause of the assertion failure (or a file being >> orphaned). What do you think? >> >> On detail of your patch, I'm not sure it's safe if we set the lsn of >> other than commit record or abort record to final_lsn. The comment in >> reorderbuffer.h says, >> >> typedef trcut ReorderBufferTXN >> { >> (snip) >> >> /* ---- >> * LSN of the record that lead to this xact to be committed or >> * aborted. This can be a >> * * plain commit record >> * * plain commit record, of a parent transaction >> * * prepared transaction commit >> * * plain abort record >> * * prepared transaction abort >> * * error during decoding >> * ---- >> */ >> XLogRecPtr final_lsn; >> >> But with your patch, we could set a lsn of a record that is other than >> what listed above to final_lsn. One way I came up with is to make I added some comments on final_lsn. >> ReorderBufferRestoreCleanup accept an invalid value of final_lsn and >> regards it as a aborted transaction that doesn't has a abort WAL >> record. So we can cleanup all serialized files if final_lsn of a >> transaction is invalid. > > After more thought, since there are some codes cosmetically setting > final_lsn when the fate of transaction is determined possibly we > should not accept a invalid value of final_lsn even in the case. > My new patch keeps setting final_lsn, but changed its location to the top of ReorderBufferCleanupTXN(). I think it's a kind of preparation, so doing it at the top improves readability. > Anyway I think you should register this patch to the next commit fest so as not forget. Thanks for you advice, I've registered this issue as a bug. Regards, -- Atsushi Torikoshi NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
pgsql-hackers by date: