Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers
From           | Heikki Linnakangas
Subject        | Re: Performance Improvement by reducing WAL for Update Operation
Date           |
Msg-id         | 52F223E0.6030306@vmware.com
In response to | Re: Performance Improvement by reducing WAL for Update Operation (Amit Kapila <amit.kapila16@gmail.com>)
Responses      | Re: Performance Improvement by reducing WAL for Update Operation
               | Re: Performance Improvement by reducing WAL for Update Operation
List           | pgsql-hackers
On 02/05/2014 07:54 AM, Amit Kapila wrote:
> On Tue, Feb 4, 2014 at 11:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Feb 4, 2014 at 12:39 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Now there is approximately 1.4~5% CPU gain for
>>> "hundred tiny fields, half nulled" case
>
>> Assuming that the logic isn't buggy, a point in need of further study,
>> I'm starting to feel like we want to have this.  And I might even be
>> tempted to remove the table-level off switch.
>
> I have tried to stress on worst case more, as you are thinking to
> remove table-level switch and found that even if we increase the
> data by approx. 8 times ("ten long fields, all changed", each field contains
> 80 byte data), the CPU overhead is still < 5% which clearly shows that
> the overhead doesn't increase much even if the length of unmatched data
> is increased by much larger factor.
> So the data for worst case adds more weight to your statement
> ("remove table-level switch"), however there is no harm in keeping
> table-level option with default as 'true' and if some users are really sure
> the updates in their system will have nothing in common, then they can
> make this new option as 'false'.
>
> Below is data for the new case "ten long fields, all changed" added
> in attached script file:

That's not the worst case, by far. First, note that the skipping while
scanning the new tuple is only performed in the first loop. That means that
as soon as you have a single match, you fall back to hashing every byte. So
for the worst case, put one 4-byte field as the first column, and don't
update it.

Also, I suspect the runtimes in your test were dominated by I/O. When I
scale down the number of rows involved so that the whole test fits in RAM,
I get much bigger differences with and without the patch. You might also
want to turn off full_page_writes, to make the effect clear with less data.

So, I came up with the attached worst-case test, modified from your latest
test suite.

unpatched:

               testname               | wal_generated |     duration
--------------------------------------+---------------+------------------
 ten long fields, all but one changed |     343385312 | 2.20806908607483
 ten long fields, all but one changed |     336263592 | 2.18997097015381
 ten long fields, all but one changed |     336264504 | 2.17843413352966
(3 rows)

pgrb_delta_encoding_v8.patch:

               testname               | wal_generated |     duration
--------------------------------------+---------------+------------------
 ten long fields, all but one changed |     338356944 | 3.33501315116882
 ten long fields, all but one changed |     344059272 | 3.37364101409912
 ten long fields, all but one changed |     336257840 | 3.36244201660156
(3 rows)

So with this test, the overhead is very significant.

With the skipping logic, another kind of "worst case" is one where the old
and new tuples have a lot in common, but you miss it because you skip: for
example, if you change the first few columns but leave a large text column
at the end of the tuple unchanged.

- Heikki
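[Editor's note] To make the "skipping is only done until the first match"
point above easier to see, here is a small, self-contained C sketch of that
control flow. It is not the code from pgrb_delta_encoding_v8.patch: the chunk
size, the stride, and the naive memcmp-based history probe are hypothetical
stand-ins for the patch's rolling-hash machinery, chosen only to illustrate
why an unchanged 4-byte leading column defeats the skipping.

/*
 * Illustrative sketch only -- not the patch's code.  A hypothetical encoder
 * probes positions of the new tuple against the old one; the skip stride is
 * applied only until the first match is found, after which every byte
 * position is examined.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE  4   /* bytes compared per probe (hypothetical) */
#define SKIP_STRIDE 4   /* stride used before the first match (hypothetical) */

/* Naive stand-in for the history lookup: does this chunk of the new tuple
 * occur anywhere in the old tuple? */
static bool
probe_history(const char *oldt, size_t oldlen, const char *chunk)
{
    for (size_t i = 0; i + CHUNK_SIZE <= oldlen; i++)
        if (memcmp(oldt + i, chunk, CHUNK_SIZE) == 0)
            return true;
    return false;
}

/* Count how many positions of the new tuple end up being probed. */
static size_t
count_probes(const char *oldt, size_t oldlen, const char *newt, size_t newlen)
{
    size_t probes = 0;
    bool   have_match = false;

    for (size_t i = 0; i + CHUNK_SIZE <= newlen;
         i += have_match ? 1 : SKIP_STRIDE)
    {
        probes++;
        if (probe_history(oldt, oldlen, newt + i))
            have_match = true;  /* skipping stops here for good */
    }
    return probes;
}

int
main(void)
{
    char oldt[1024], newt[1024];

    /* Worst case from the mail: a short leading column that is unchanged,
     * followed by long data that has nothing in common. */
    memset(oldt, 'a', 4); memset(oldt + 4, 'x', sizeof(oldt) - 4);
    memset(newt, 'a', 4); memset(newt + 4, 'y', sizeof(newt) - 4);
    printf("probes, first column unchanged: %zu\n",
           count_probes(oldt, sizeof(oldt), newt, sizeof(newt)));

    /* Change the first column as well and the stride keeps skipping. */
    memset(newt, 'b', 4);
    printf("probes, nothing in common:      %zu\n",
           count_probes(oldt, sizeof(oldt), newt, sizeof(newt)));
    return 0;
}

With these hypothetical sizes, the tuple whose first column is unchanged is
probed at roughly four times as many positions as the tuple with nothing in
common, and every probe after the first one fails, which is broadly the kind
of extra per-tuple work the durations in the table above are measuring.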