Re: WAL logging problem in 9.4.3? - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: WAL logging problem in 9.4.3?
Date	July 10, 2015 10:39:04
Msg-id	559FA0BA.3080808@iki.fi
In response to	Re: WAL logging problem in 9.4.3? (Andres Freund <andres@anarazel.de>)
Responses	Re: WAL logging problem in 9.4.3?
List	pgsql-hackers

On 07/10/2015 12:14 PM, Andres Freund wrote:
> On 2015-07-10 11:50:33 +0300, Heikki Linnakangas wrote:
>> On 07/10/2015 02:06 AM, Tom Lane wrote:
>>> cab9a0656c36739f was based on an actual user complaint, so we have good
>>> evidence that there are people out there who care about the cost of
>>> truncating a table many times in one transaction.
>>
>> Yeah, if we specifically made that case cheap, in response to a complaint,
>> it would be a regression to make it expensive again. We might get away with
>> it in a major version, but would hate to backpatch that.
>
> Sure. But making COPY slower would also be one. Of a longer standing
> behaviour, with massively bigger impact if somebody relies on it? I mean
> a new relfilenode includes a couple heap and storage options. Missing
> the skip wal optimization can easily double or triple COPY durations.

Completely disabling the skip-WAL optimization is not acceptable either, 
IMO. It's a false dichotomy that we have to choose between those two 
options. We'll have to consider the exact scenarios where we'd have to 
disable the optimization vs. using a new relfilenode.

>>>> My tentative guess is that the best course is to
>>>> a) Make heap_truncate_one_rel() create a new relfilenode. That fixes the
>>>>     truncation replay issue.
>>>> b) Force new pages to be used when using the heap_sync mode in
>>>>     COPY. That avoids the INIT danger you found. It seems rather
>>>>     reasonable to avoid using pages that have already been the target of
>>>>     WAL logging here in general.
>>>
>>> And what reason is there to think that this would fix all the problems?
>>> We know of those two, but we've not exactly looked hard for other cases.
>>
>> Hmm. Perhaps that could be made to work, but it feels pretty fragile.
>
> It does. I'm not very happy about this mess.
>
>> For
>> example, you could have an insert trigger on the table that inserts
>> additional rows to the same table, and those inserts would be intermixed
>> with the rows inserted by COPY.
>
> That should be fine? As long as copy only uses new pages INSERT can use
> the same ones without problem. I think...
>
>> Full-page images in general are a problem.
>
> With the above rules I don't think it'd be. They'd contain the previous
> contents, and we'll not target them again with COPY.

Well, you really have to ensure that COPY never uses a page that any
other operation (INSERT, DELETE, UPDATE, hint-bit update) has ever
touched and created a full-page image (FPW) for. The naive approach,
where you just reset the target block at the beginning of COPY and use
the HEAP_INSERT_SKIP_FSM option, is not enough. It's possible, but
requires a lot more bookkeeping than might seem at first glance.

>> I think we should
>> 1. reliably and explicitly keep track of whether we've WAL-logged any
>> TRUNCATE, INSERT/UPDATE+INIT, or any other full-page-logging operations on
>> the relation, and
>> 2. make sure we never skip WAL-logging again if we have.
>>
>> Let's add a flag, rd_skip_wal_safe, to RelationData that's initially set
>> when a new relfilenode is created, i.e. whenever rd_createSubid or
>> rd_newRelfilenodeSubid is set. Whenever a TRUNCATE or a full-page image
>> (including INSERT/UPDATE+INIT) is WAL-logged, clear the flag. In copy.c,
>> only skip WAL-logging if the flag is still set. To deal with the case that
>> the flag gets cleared in the middle of COPY, also check the flag whenever
>> we're about to skip WAL-logging in heap_insert, and if it's been cleared,
>> ignore the HEAP_INSERT_SKIP_WAL option and WAL-log anyway.
>
> Am I missing something or will this break the BEGIN; TRUNCATE; COPY;
> pattern we use ourselves and have suggested a number of times ?

Sorry, I was imprecise above. I meant "whenever an XLOG_SMGR_TRUNCATE
record is WAL-logged", rather than "whenever a TRUNCATE [command] is
WAL-logged". TRUNCATE on a table that wasn't created in the same
transaction doesn't emit an XLOG_SMGR_TRUNCATE record, because it
creates a whole new relfilenode. So that's OK.
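
To make the proposed bookkeeping concrete, here is a minimal sketch in
plain C. It is a toy model, not PostgreSQL code: ModelRelation and the
model_* functions are hypothetical names invented for illustration, and
only the idea of the rd_skip_wal_safe flag comes from the proposal above.
The flag is set when a new relfilenode is assigned, cleared when an
XLOG_SMGR_TRUNCATE record or a full-page image is WAL-logged, and checked
again on every insert that asks to skip WAL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of RelationData. */
typedef struct ModelRelation
{
    bool skip_wal_safe; /* models the proposed rd_skip_wal_safe flag */
    int  wal_records;   /* number of records this model has "WAL-logged" */
} ModelRelation;

/*
 * A new relfilenode was assigned in this transaction (models
 * rd_createSubid or rd_newRelfilenodeSubid being set).
 */
static void
model_new_relfilenode(ModelRelation *rel)
{
    rel->skip_wal_safe = true;
}

/*
 * An XLOG_SMGR_TRUNCATE record or a full-page image (including
 * INSERT/UPDATE+INIT) was WAL-logged for this relation.
 */
static void
model_logged_truncate_or_fpi(ModelRelation *rel)
{
    rel->wal_records++;
    rel->skip_wal_safe = false; /* never skip WAL for this relfilenode again */
}

/*
 * Models the check in heap_insert: the caller's request to skip WAL is
 * honoured only while the flag is still set. Returns true if the insert
 * was WAL-logged.
 */
static bool
model_heap_insert(ModelRelation *rel, bool skip_wal_requested)
{
    if (skip_wal_requested && rel->skip_wal_safe)
        return false;           /* safe: rely on syncing the file at commit */
    rel->wal_records++;         /* flag cleared or skip not requested */
    return true;
}
```

In this model, BEGIN; TRUNCATE; COPY on a pre-existing table calls
model_new_relfilenode() and never model_logged_truncate_or_fpi(), so
the COPY's inserts still skip WAL.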

In the long term, I'd like to refactor this whole thing so that we never
WAL-log any operations on a relation that's created in the same
transaction (when wal_level=minimal). Instead, at COMMIT, we'd fsync()
the relation, or if it's smaller than some threshold, WAL-log the
contents of the whole file at that point. That would move all that
more-difficult-than-it-seems-at-first-glance logic from COPY and
the index AMs to a central location, and it would allow the same
optimization for all operations, not just COPY. But that probably isn't
feasible to backpatch.
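
The commit-time decision described above could look roughly like the
following sketch. It is a toy model in plain C, not PostgreSQL code:
CommitAction, model_commit_action and MODEL_WAL_LOG_THRESHOLD_BYTES are
hypothetical names, and the 64 kB value is an arbitrary placeholder,
since the text above says only "some threshold".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What to do at COMMIT for one relation (hypothetical enum). */
typedef enum CommitAction
{
    COMMIT_NOTHING,          /* operations were WAL-logged as usual */
    COMMIT_WAL_LOG_CONTENTS, /* small file: WAL-log the whole contents */
    COMMIT_FSYNC             /* large file: fsync() it instead */
} CommitAction;

/* Arbitrary placeholder; the real threshold is left open. */
#define MODEL_WAL_LOG_THRESHOLD_BYTES ((size_t) 64 * 1024)

/*
 * Decide how to make a relation durable at COMMIT. Only relations
 * created in the committing transaction under wal_level=minimal had
 * their WAL-logging skipped, so only those need extra work here.
 */
static CommitAction
model_commit_action(bool created_in_xact, bool wal_level_minimal,
                    size_t relation_bytes)
{
    if (!created_in_xact || !wal_level_minimal)
        return COMMIT_NOTHING;
    if (relation_bytes < MODEL_WAL_LOG_THRESHOLD_BYTES)
        return COMMIT_WAL_LOG_CONTENTS;
    return COMMIT_FSYNC;
}
```

The point of centralizing the choice is that individual operations no
longer need to reason about whether skipping WAL is safe for them.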

- Heikki
