Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Performance Improvement by reducing WAL for Update Operation |
Date | |
Msg-id | CAA4eK1Kyr-fGVwMK9W1p_h_k-k9m7Ci9bNf8uxRjLwLzppe5GA@mail.gmail.com |
In response to | Re: Performance Improvement by reducing WAL for Update Operation (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses | Re: Performance Improvement by reducing WAL for Update Operation |
List | pgsql-hackers |
On Wed, Feb 5, 2014 at 5:29 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> I'm pretty sure the overhead of that would be negligible, so we could always
> enable it. There are certainly a lot of scenarios where prefix/suffix
> detection alone wouldn't help, but so what.
>
> Attached is a quick patch for that, if you want to test it.

I have updated the patch to correct a few problems, address the review
comments from Andres, and add some optimizations to reduce the CPU overhead
in the worst case. Let me start with the performance data for this patch.

Performance Data
-----------------------------
Non-default settings:
autovacuum = off
checkpoint_segments = 256
checkpoint_timeout = 15min
full_page_writes = off

Unpatched

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 one short and one long field, no change |     573502736 | 9.70863103866577
 one short and one long field, no change |     573504920 | 10.1023359298706
 one short and one long field, no change |     573498936 | 9.84194612503052
 hundred tiny fields, all changed        |     364891128 | 13.9618089199066
 hundred tiny fields, all changed        |     364888088 | 13.4061119556427
 hundred tiny fields, all changed        |     367753480 | 13.433109998703
 hundred tiny fields, half changed       |     364892928 | 13.5090639591217
 hundred tiny fields, half changed       |     364890384 | 13.5632100105286
 hundred tiny fields, half changed       |     364888136 | 13.6033401489258
 hundred tiny fields, half nulled        |     300702272 | 13.7366359233856
 hundred tiny fields, half nulled        |     300703656 | 14.5007920265198
 hundred tiny fields, half nulled        |     300705216 | 13.9954152107239
 9 short and 1 long, short changed       |     396987760 | 9.5885021686554
 9 short and 1 long, short changed       |     396988864 | 9.11789703369141
 9 short and 1 long, short changed       |     396985728 | 9.52586102485657
(15 rows)

wal-update-prefix-suffix-encode-4.patch

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 one short and one long field, no change |     156854192 | 6.74417304992676
 one short and one long field, no change |     156279384 | 6.61455297470093
 one short and one long field, no change |     156277824 | 6.84297394752502
 hundred tiny fields, all changed        |     364893056 | 13.9131588935852
 hundred tiny fields, all changed        |     364890912 | 13.1628270149231
 hundred tiny fields, all changed        |     364889424 | 13.7095680236816
 hundred tiny fields, half changed       |     364895592 | 13.6322529315948
 hundred tiny fields, half changed       |     365172160 | 14.0036828517914
 hundred tiny fields, half changed       |     364892400 | 13.5247440338135
 hundred tiny fields, half nulled        |     206833992 | 12.4429869651794
 hundred tiny fields, half nulled        |     208443760 | 12.1079058647156
 hundred tiny fields, half nulled        |     205858280 | 12.7899498939514
 9 short and 1 long, short changed       |     236516832 | 8.36392688751221
 9 short and 1 long, short changed       |     236515744 | 8.46648907661438
 9 short and 1 long, short changed       |     236518336 | 8.02749991416931
(15 rows)

There is a major WAL reduction and a CPU gain for the best and average cases,
and for the cases where there is no WAL reduction (because the updated tuple
contains different data) there is no CPU overhead. The test script
(wal-update-testsuite.sh) used to collect the above data is attached.

Now for the worst case, where the tuple matches only up to the compression
boundary: I have tried keeping the required compression rate at 25% and at
30%, and observed that the overhead at 30% is quite minimal.
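For illustration, here is a minimal sketch of the kind of prefix/suffix check
with a minimum compression-rate cutoff that these boundary tests exercise. It
is not the patch code: the names delta_worthwhile and MIN_COMP_RATE are made
up, tag overhead is ignored, and the real encoder works on raw heap tuple data
and emits pglz-format output.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_COMP_RATE 30		/* require at least 30% of the tuple to match */

/*
 * Decide whether delta-encoding the new tuple against the old one is worth
 * it, based only on the lengths of the common prefix and suffix.
 */
static bool
delta_worthwhile(const char *oldp, uint32_t oldlen,
				 const char *newp, uint32_t newlen)
{
	uint32_t	prefixlen = 0;
	uint32_t	suffixlen = 0;
	uint32_t	minlen = (oldlen < newlen) ? oldlen : newlen;

	/* Count matching bytes from the front of both versions. */
	while (prefixlen < minlen && oldp[prefixlen] == newp[prefixlen])
		prefixlen++;

	/* Count matching bytes from the back, without overlapping the prefix. */
	while (suffixlen < minlen - prefixlen &&
		   oldp[oldlen - suffixlen - 1] == newp[newlen - suffixlen - 1])
		suffixlen++;

	/*
	 * Encode only if the matched prefix plus suffix cover at least
	 * MIN_COMP_RATE percent of the new tuple; otherwise WAL-log the full
	 * tuple as before.
	 */
	return (prefixlen + suffixlen) >= (newlen * MIN_COMP_RATE) / 100;
}
```

The 25% vs. 30% question is simply the choice of MIN_COMP_RATE in this kind of
comparison.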
Performance data for this worst case is below.

Case 1: change some bytes just after 30% of the tuple

Unpatched

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     352055792 | 5.4274320602417
 ten long fields, 8 bytes changed |     352050536 | 6.44699001312256
 ten long fields, 8 bytes changed |     352057880 | 5.78993391990662
(3 rows)

wal-update-prefix-suffix-encode-4.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     281447616 | 5.79180097579956
 ten long fields, 8 bytes changed |     281451096 | 5.63260507583618
 ten long fields, 8 bytes changed |     281445728 | 5.56671595573425
(3 rows)

Case 2: change some bytes just before 30% of the tuple

Unpatched

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     350873408 | 6.44963002204895
 ten long fields, 8 bytes changed |     348842888 | 6.33179092407227
 ten long fields, 8 bytes changed |     348848488 | 6.66787099838257
(3 rows)

wal-update-prefix-suffix-encode-4.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     352660656 | 8.03470301628113
 ten long fields, 8 bytes changed |     348843208 | 6.36861610412598
 ten long fields, 8 bytes changed |     348844728 | 6.56955599784851
(3 rows)

With the compression rate kept at 30%: when the match extends to only ~29% of
the tuple there is about a 2% CPU overhead (considering the median), and when
the match extends to ~31% there is a 20% WAL reduction and no CPU overhead. If
we instead keep the compression rate at 25%, it performs better when the match
extends to 24% but worse when the match extends to 26%.

I have attached separate scripts for the two boundary tests
(wal-update-testsuite-ten-long-8-bytes-changed-at-30-percent-boundary.sh and
wal-update-testsuite-ten-long-8-bytes-changed-at-25-percent-boundary.sh). To
switch between them, change the value of PGDE_MIN_COMP_RATE in the patch; it
is currently 30, so set it to 25 to run the 25% boundary test.

Note - the worst-case performance data fluctuated, so I ran each test 5 times
and report the figures that occurred most often.

The main changes in this patch:
1. An unnecessary tag was being added to the encoded tuple even when there was
   no prefix/suffix match.
2. The maximum tag length was not being considered; the tag is now split when
   its length exceeds 273 bytes, the maximum tag length the format supports
   (a toy sketch of this splitting follows the list).
3. The check for whether the prefix/suffix length achieves the required
   compression ratio was wrong; the code has been changed accordingly.
4. Even after deciding that the prefix/suffix match achieves the compression
   ratio, the end of the encode function was still returning based on the
   maximum size. I think that is not required, as the buffer has sufficient
   space, and it was causing overhead in the worst cases. If we want to be
   extra careful, we could instead check against the maximum buffer size
   passed to the encode function.
5. Renamed the files to pg_decompress.c/.h ("de" for delta encoding).
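For change 2, here is a toy, self-contained sketch of the splitting. It is not
the patch code (the real encoder emits pglz-format tags rather than printing);
it only shows how a long match is broken into chunks of at most 273 bytes,
which I understand to come from the pglz tag format (base length up to 18 plus
one 255-byte extension byte).

```c
#include <stdio.h>

#define MAX_TAG_LEN 273			/* longest span a single tag can describe */

/* Print the tag lengths a long prefix/suffix match would be split into. */
static void
split_match(int matchlen)
{
	while (matchlen > 0)
	{
		int		chunk = (matchlen > MAX_TAG_LEN) ? MAX_TAG_LEN : matchlen;

		printf("emit tag covering %d bytes\n", chunk);
		matchlen -= chunk;
	}
}

int
main(void)
{
	split_match(600);			/* -> 273, 273, 54 */
	return 0;
}
```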
Fixes for the review comments by Andres:

> +
> +	if (RelationIsEnabledForWalCompression(reln) &&
> +		(oldbuf == newbuf) &&
> +		!XLogCheckBufferNeedsBackup(newbuf))
> +	{
> +		uint32		enclen;

> You should note that the check for RelationIsEnabledForWalCompression()
> here is racy and that that's ok because the worst that can happen is
> that a delta is uselessly generated.

I think we might not even keep this switch, as the performance data seems to
be okay, but even if we keep it, it should not do any harm.

> +	if (compressed)
> +		xlrec.flags |= XLOG_HEAP_DELTA_ENCODED;

> I think this also needs to unset XLOG_HEAP_CONTAINS_NEW_TUPLE and
> conditional on !need_tuple_data.

I could not understand this point. From the above it seems you want to unset
XLOG_HEAP_CONTAINS_NEW_TUPLE when !need_tuple_data, but I am not sure; could
you explain it a bit more?

> +bool
> +XLogCheckBufferNeedsBackup(Buffer buffer)
> +{

> Should note very, very boldly that this can only be used in contexts
> where a race is acceptable.

Yes, this is racy; however, in the worst case it will either do the encoding
when it is not required or skip the encoding when it could have saved WAL, and
neither case does any harm.

> + *
> + * Copyright (c) 1999-2014, PostgreSQL Global Development Group
> + *
> + * src/backend/utils/adt/pg_rbcompress.c
> + * ----------
> + */

> This needs significantly more explanations about the algorithm and the
> reasoning behind it.

Agreed. I have added more explanation of the algorithm and the reasoning for
choosing it.

> +static const PGRB_Strategy strategy_default_data = {
> +	32,			/* Data chunks less than 32 bytes are not
> +				 * compressed */
> +	INT_MAX,	/* No upper limit on what we'll try to
> +				 * compress */
> +	35,			/* Require 25% compression rate, or not worth
> +				 * it */
> +};

> compression rate looks like it's a mismatch between comment and code.

Corrected. I have now removed the strategy structure itself and use #defines
for the values instead, as we don't require multiple strategies for this
encoding.

> +/* ----------
> + * pgrb_out_ctrl -
> + *
> + *		Outputs the last and allocates a new control byte if needed.
> + * ----------
> + */
> +#define pgrb_out_ctrl(__ctrlp,__ctrlb,__ctrl,__buf) \
> +do { \
> +	if ((__ctrl & 0xff) == 0) \
> +	{ \
> +		*(__ctrlp) = __ctrlb; \
> +		__ctrlp = (__buf)++; \
> +		__ctrlb = 0; \
> +		__ctrl = 1; \
> +	} \
> +} while (0)
> +

> double underscore variables are reserved for the compiler and os.

These macros are mostly the same as in pg_lz, as we have not changed the
format of the encoded buffer. There are a couple of other places, such as
like.c and valid.h, where double underscores are used in macros. However, I
agree there is no compelling need for them, and it is not the recommended way
to name variables. I am not sure why they were used in pg_lzcompress.c in the
first place, so for now let's keep them as they are; I will change these names
if we decide to go ahead with this version of the patch. Right now the main
decision is about the performance data; once that is settled, I will change
this along with some other similar things.

> +#define pgrb_out_literal(_ctrlp,_ctrlb,_ctrl,_buf,_byte) \
> +do { \
> +	pgrb_out_ctrl(_ctrlp,_ctrlb,_ctrl,_buf); \
> +	*(_buf)++ = (unsigned char)(_byte); \
> +	_ctrl <<= 1; \
> +} while (0)

> What's the reason to use macros here? Just use inline functions when
> dealing with file-local stuff.

Again, the same reason as above: this is basically copied from pg_lz, as the
encoding format is the same. (An illustrative inline-function version is
sketched after these replies.)

> +
> +	/*
> +	 * Tuples of length greater than PGRB_HISTORY_SIZE are not allowed for
> +	 * delta encode as this is the maximum size of history offset.
> +	 * XXX: still true?
> +	 */

> Why didn't you define a maximum tuple size in the strategy definition
> above then?

As above, I have removed the strategy structure itself and use #defines for
the values instead, as we don't require multiple strategies for this encoding.
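On the inline-function suggestion above, this is a rough illustration only and
not part of the patch; the RbEncoderState struct and the function names are
hypothetical. It shows what file-local inline equivalents of
pgrb_out_ctrl/pgrb_out_literal could look like with the encoder state bundled
into a struct instead of being passed as macro arguments.

```c
typedef struct RbEncoderState
{
	unsigned char *ctrlp;		/* where the current control byte is stored */
	unsigned char ctrlb;		/* control bits accumulated so far */
	unsigned int ctrl;			/* mask of the next control bit to use */
	unsigned char *buf;			/* next output position */
} RbEncoderState;

/*
 * Flush the finished control byte and reserve space for a new one once all
 * eight bits have been used (mirrors pgrb_out_ctrl).
 */
static inline void
rb_out_ctrl(RbEncoderState *es)
{
	if ((es->ctrl & 0xff) == 0)
	{
		*es->ctrlp = es->ctrlb;
		es->ctrlp = es->buf++;
		es->ctrlb = 0;
		es->ctrl = 1;
	}
}

/*
 * Emit one literal byte; its control bit is left as 0
 * (mirrors pgrb_out_literal).
 */
static inline void
rb_out_literal(RbEncoderState *es, unsigned char byte)
{
	rb_out_ctrl(es);
	*es->buf++ = byte;
	es->ctrl <<= 1;
}
```

As in pg_lzcompress.c, the state would start with ctrl = 0 and ctrlp pointing
at a dummy byte, so the first call allocates the first real control byte in
the output buffer.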
> +	need_rate = strategy->min_comp_rate;
> +	if (need_rate < 0)
> +		need_rate = 0;
> +	else if (need_rate > 99)
> +		need_rate = 99;

> Is there really need for all this stuff here? This is so specific to the
> usecase that I have significant doubts that all the pglz boiler plate
> makes much sense.

Agreed. I have removed these extra checks.

> +	else
> +	{
> +		result_max = (slen * (100 - need_rate)) / 100;
> +	}*/

> err?

Fixed.

> +--
> +-- Test to update continuos and non continuos columns
> +--

> *continuous

Fixed.

>> Previously we have tried to do at column boundaries, but the main problem
>> turned out to be in worst cases where we spend time in extracting values
>> from tuples based on column boundaries and later found that data is not
>> compressible.

> I think that hugely depends on how you implement it. I think you'd need
> to have a loop traversing over the both tuples at the same time on the
> level of heap_deform_tuple(). If you'd use the result to get rid of
> HeapSatisfiesHOTandKeyUpdate() at the same time I am pretty sure you
> wouldn't see very high overhead.

The case where it can have more overhead is when we compress and later find
that it is not a HOT update; then we have to log the new tuple as-is anyway,
so the cycles spent on compression are wasted. We always have to determine
whether it is a HOT update or not, but we might choose to give up on tuple
compression part-way through based on the compression ratio, in which case
there can still be some overhead. It sounds like this strategy might work even
better than the methods tried so far for the best and average cases, but we
can't be sure about the negative scenarios.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com