Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility - Mailing list pgsql-hackers

From Chapman Flack
Subject Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
Msg-id 5A92F0C0.7070706@anastigmatix.net
In response to Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
List pgsql-hackers
On 07/17/17 11:29, Michael Paquier wrote:
> On Thu, Jul 6, 2017 at 3:48 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> On 07/03/2017 06:30 PM, Chapman Flack wrote:
>>> Although it's moot in the straightforward approach of re-zeroing in
>>> the loop, it would still help my understanding of the system to know
>>> if there is some subtlety that would have broken what I proposed
>>> earlier, which was an extra flag to AdvanceXLInsertBuffer() that
>>> ...
>>
>> Yeah, I suppose that would work, too.
> 
> FWIW, I would rather see any optimization done in
> AdvanceXLInsertBuffer() instead of seeing a second memset re-zeroing
> the WAL page header after its data has been initialized by
> AdvanceXLInsertBuffer() once.

Here is a patch implementing the simpler approach Heikki suggested
(though my original leaning had been to wrench on AdvanceXLInsertBuffer
as Michael suggests). The sheer simplicity of the smaller change
eventually won me over, unless there's a strong objection.

As noted before, the cost of the extra small MemSet is proportional
to the number of unused pages in the segment, and that is an indication
of how busy the system *isn't*. I don't have a time benchmark of the
patch's impact; if I should, what would be a good methodology?
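To put a rough upper bound on that cost, here is a minimal sketch of the arithmetic. The defaults are assumptions, not values taken from the patch: a 16 MiB segment, 8192-byte WAL pages, and a 24-byte short page header being cleared on every unused page. The function name rezero_cost is made up for illustration.

```python
def rezero_cost(switch_offset, segment_size=16 * 1024 * 1024,
                page_size=8192, header_size=24):
    """Return (unused_pages, bytes_cleared) for a switch at switch_offset.

    All defaults are assumed typical values, not read from the patch.
    """
    # First page that lies entirely after the switch point (ceiling division).
    first_unused_page = -(-switch_offset // page_size)
    unused_pages = segment_size // page_size - first_unused_page
    # One small clear per unused page, so the cost grows linearly with them.
    return unused_pages, unused_pages * header_size
```

Under these assumptions, even a segment switched after its first page clears only about 48 KiB in total, spread over 2047 small writes.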

Before the change, here is what some common compression tools achieve on
a mostly empty (select pg_switch_wal()) segment on my hardware
(compressed size in bytes, then elapsed time):

gzip  27052 in 0.145s
xz     5852 in 0.678s
lzip   5747 in 1.254s
bzip2  1445 in 0.261s

bzip2 is already the clear leader (I don't have lz4 handy to include in
the comparison) at around 1/20th the size gzip can achieve, and that's
before this patch. After:

gzip 16393 in 0.143s
xz    2640 in 0.520s
lzip  2535 in 1.198s
bzip2  147 in 0.238s

The patch gives gzip an extra factor of about 1.65, xz and lzip a bit
more than a factor of two, and bzip2 nearly another order of magnitude.
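For anyone wanting to reproduce the effect without a running server, the following is a rough sketch using only the Python standard library. It does not read a real WAL segment: it builds a synthetic 16 MiB buffer in which the unused pages either carry a made-up 20-byte header (magic, flags, timeline, page address, remaining length) or are left entirely zero. The header layout, the constants, and the function names are assumptions for illustration, and zlib's deflate stands in for gzip, so the absolute numbers will not match the figures above.

```python
import bz2
import lzma
import struct
import zlib

SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size
PAGE_SIZE = 8192                 # assumed default WAL page size


def build_segment(used_pages, zero_unused_headers):
    """Build a synthetic segment: used_pages of filler, the rest unused."""
    seg = bytearray(SEGMENT_SIZE)
    for page in range(SEGMENT_SIZE // PAGE_SIZE):
        start = page * PAGE_SIZE
        if page < used_pages:
            # Filler bytes standing in for real WAL records.
            seg[start:start + PAGE_SIZE] = bytes(
                (page * 31 + i) % 251 for i in range(PAGE_SIZE))
        elif not zero_unused_headers:
            # Hypothetical page header; the page address differs on every
            # page, which is what breaks up the long runs of zeros.
            header = struct.pack("<HHIQI", 0xD097, 0, 1, start, 0)
            seg[start:start + len(header)] = header
    return bytes(seg)


def compressed_sizes(segment):
    """Return the compressed size in bytes for each stdlib compressor."""
    return {
        "deflate": len(zlib.compress(segment, 6)),
        "xz": len(lzma.compress(segment)),
        "bzip2": len(bz2.compress(segment)),
    }
```

Comparing compressed_sizes(build_segment(4, False)) with compressed_sizes(build_segment(4, True)) shows the direction of the change for each compressor; measuring a real segment with the command-line tools is still the authoritative test.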

I considered adding a regression test for unfilled-segment compressibility,
but wasn't sure where it would most appropriately go. I assume a TAP test
would be the way to do it.

-Chap

Attachment
