Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date:
Msg-id: 20140215145109.GC19470@alap3.anarazel.de
In response to: Re: Performance Improvement by reducing WAL for Update Operation (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Performance Improvement by reducing WAL for Update Operation
List: pgsql-hackers
Hi,

Some quick review comments:

On 2014-02-13 18:14:54 +0530, Amit Kapila wrote:
> +    /*
> +     * EWT can be generated for all new tuple versions created by Update
> +     * operation. Currently we do it when both the old and new tuple versions
> +     * are on same page, because during recovery if the page containing old
> +     * tuple is corrupt, it should not cascade that corruption to other pages.
> +     * Under the general assumption that for long runs most updates tend to
> +     * create new tuple version on same page, there should not be significant
> +     * impact on WAL reduction or performance.
> +     *
> +     * We should not generate EWT when we need to backup the whole block in
> +     * WAL as in that case there is no saving by reduced WAL size.
> +     */
> +
> +    if (RelationIsEnabledForWalCompression(reln) &&
> +        (oldbuf == newbuf) &&
> +        !XLogCheckBufferNeedsBackup(newbuf))
> +    {
> +        uint32      enclen;

You should note that the check for RelationIsEnabledForWalCompression() here is racy and that that's ok because the worst that can happen is a uselessly generated delta.

>     xlrec.target.node = reln->rd_node;
>     xlrec.target.tid = oldtup->t_self;
>     xlrec.old_xmax = HeapTupleHeaderGetRawXmax(oldtup->t_data);
> @@ -6619,6 +6657,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
>     xlrec.newtid = newtup->t_self;
>     if (new_all_visible_cleared)
>         xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
> +    if (compressed)
> +        xlrec.flags |= XLOG_HEAP_DELTA_ENCODED;

I think this also needs to unset XLOG_HEAP_CONTAINS_NEW_TUPLE and be conditional on !need_tuple_data.

> /*
> + * Determine whether the buffer referenced has to be backed up. Since we don't
> + * yet have the insert lock, fullPageWrites and forcePageWrites could change
> + * later, but will not cause any problem because this function is used only to
> + * identify whether EWT is required for update.
> + */
> +bool
> +XLogCheckBufferNeedsBackup(Buffer buffer)
> +{

Should note very, very boldly that this can only be used in contexts where a race is acceptable.

> diff --git a/src/backend/utils/adt/pg_rbcompress.c b/src/backend/utils/adt/pg_rbcompress.c
> new file mode 100644
> index 0000000..877ccd7
> --- /dev/null
> +++ b/src/backend/utils/adt/pg_rbcompress.c
> @@ -0,0 +1,355 @@
> +/* ----------
> + * pg_rbcompress.c -
> + *
> + *      This is a delta encoding scheme specific to PostgreSQL and designed
> + *      to compress similar tuples. It can be used as it is or extended for
> + *      other purpose in PostgrSQL if required.
> + *
> + *      Currently, this just checks for a common prefix and/or suffix, but
> + *      the output format is similar to the LZ format used in pg_lzcompress.c.
> + *
> + * Copyright (c) 1999-2014, PostgreSQL Global Development Group
> + *
> + * src/backend/utils/adt/pg_rbcompress.c
> + * ----------
> + */

This needs significantly more explanation about the algorithm and the reasoning behind it.
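As far as I can tell from the header comment, the core of it is just computing the length of the common prefix and suffix between the old and the new tuple, roughly along these lines (illustration only, the helper name is invented and this is not code from the patch):

    /*
     * Illustration only: compute how many leading and trailing bytes the new
     * tuple (source) shares with the old tuple (history).  The encoder then
     * emits those two regions as pg_lzcompress-style back references and the
     * bytes in between as literals.
     */
    static void
    find_common_affixes(const char *history, int hlen,
                        const char *source, int slen,
                        int *prefixlen, int *suffixlen)
    {
        int     minlen = (hlen < slen) ? hlen : slen;
        int     pre = 0;
        int     suf = 0;

        /* longest common prefix */
        while (pre < minlen && history[pre] == source[pre])
            pre++;

        /* longest common suffix, not overlapping the prefix */
        while (suf < minlen - pre &&
               history[hlen - suf - 1] == source[slen - suf - 1])
            suf++;

        *prefixlen = pre;
        *suffixlen = suf;
    }

Spelling that much out in the file header, together with the reasoning for not doing a full pg_lzcompress-style match search, would already help a lot.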
> +static const PGRB_Strategy strategy_default_data = {
> +    32,             /* Data chunks less than 32 bytes are not
> +                     * compressed */
> +    INT_MAX,        /* No upper limit on what we'll try to
> +                     * compress */
> +    35,             /* Require 25% compression rate, or not worth
> +                     * it */
> +};

The compression rate looks like a mismatch between the comment and the code.

> +/* ----------
> + * pgrb_out_ctrl -
> + *
> + *      Outputs the last and allocates a new control byte if needed.
> + * ----------
> + */
> +#define pgrb_out_ctrl(__ctrlp,__ctrlb,__ctrl,__buf) \
> +do { \
> +    if ((__ctrl & 0xff) == 0) \
> +    { \
> +        *(__ctrlp) = __ctrlb; \
> +        __ctrlp = (__buf)++; \
> +        __ctrlb = 0; \
> +        __ctrl = 1; \
> +    } \
> +} while (0)
> +

Double-underscore identifiers are reserved for the compiler and OS.

> +/* ----------
> + * pgrb_out_literal -
> + *
> + *      Outputs a literal byte to the destination buffer including the
> + *      appropriate control bit.
> + * ----------
> + */
> +#define pgrb_out_literal(_ctrlp,_ctrlb,_ctrl,_buf,_byte) \
> +do { \
> +    pgrb_out_ctrl(_ctrlp,_ctrlb,_ctrl,_buf); \
> +    *(_buf)++ = (unsigned char)(_byte); \
> +    _ctrl <<= 1; \
> +} while (0)
> +
> +
> +/* ----------
> + * pgrb_out_tag -
> + *
> + *      Outputs a backward reference tag of 2-4 bytes (depending on
> + *      offset and length) to the destination buffer including the
> + *      appropriate control bit.
> + * ----------
> + */
> +#define pgrb_out_tag(_ctrlp,_ctrlb,_ctrl,_buf,_len,_off) \
> +do { \
> +    pgrb_out_ctrl(_ctrlp,_ctrlb,_ctrl,_buf); \
> +    _ctrlb |= _ctrl; \
> +    _ctrl <<= 1; \
> +    if (_len > 17) \
> +    { \
> +        (_buf)[0] = (unsigned char)((((_off) & 0xf00) >> 4) | 0x0f); \
> +        (_buf)[1] = (unsigned char)(((_off) & 0xff)); \
> +        (_buf)[2] = (unsigned char)((_len) - 18); \
> +        (_buf) += 3; \
> +    } else { \
> +        (_buf)[0] = (unsigned char)((((_off) & 0xf00) >> 4) | ((_len) - 3)); \
> +        (_buf)[1] = (unsigned char)((_off) & 0xff); \
> +        (_buf) += 2; \
> +    } \
> +} while (0)
> +

What's the reason to use macros here? Just use inline functions when dealing with file-local stuff.
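Something like the following (untested, just to illustrate the point; the arguments have to become pointers because the macro mutates them) would be easier to read and to debug:

    static inline void
    pgrb_out_ctrl(unsigned char **ctrlp, unsigned char *ctrlb,
                  unsigned char *ctrl, unsigned char **buf)
    {
        if ((*ctrl & 0xff) == 0)
        {
            /* flush the previous control byte and start a new one */
            **ctrlp = *ctrlb;
            *ctrlp = (*buf)++;
            *ctrlb = 0;
            *ctrl = 1;
        }
    }

pgrb_out_literal() and pgrb_out_tag() could then call it with &ctrlp etc. and be converted the same way, and the double-underscore naming issue disappears along with the macros.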
> +/* ----------
> + * pgrb_delta_encode - find common prefix/suffix between inputs and encode.
> + *
> + *  source is the input data to be compressed
> + *  slen is the length of source data
> + *  history is the data which is used as reference for compression
> + *  hlen is the length of history data
> + *  The encoded result is written to dest, and its length is returned in
> + *  finallen.
> + *  The return value is TRUE if compression succeeded,
> + *  FALSE if not; in the latter case the contents of dest
> + *  are undefined.
> + * ----------
> + */
> +bool
> +pgrb_delta_encode(const char *source, int32 slen,
> +                  const char *history, int32 hlen,
> +                  char *dest, uint32 *finallen,
> +                  const PGRB_Strategy *strategy)
> +{
> +    unsigned char *bp = ((unsigned char *) dest);
> +    unsigned char *bstart = bp;
> +    const char *dp = source;
> +    const char *dend = source + slen;
> +    const char *hp = history;
> +    unsigned char ctrl_dummy = 0;
> +    unsigned char *ctrlp = &ctrl_dummy;
> +    unsigned char ctrlb = 0;
> +    unsigned char ctrl = 0;
> +    int32       result_size;
> +    int32       result_max;
> +    int32       need_rate;
> +    int         prefixlen;
> +    int         suffixlen;
> +
> +    /*
> +     * Tuples of length greater than PGRB_HISTORY_SIZE are not allowed for
> +     * delta encode as this is the maximum size of history offset.
> +     * XXX: still true?
> +     */

Why didn't you define a maximum tuple size in the strategy definition above then?

> +    if (hlen >= PGRB_HISTORY_SIZE || hlen < PGRB_MIN_MATCH)
> +        return false;
> +
> +    /*
> +     * Our fallback strategy is the default.
> +     */
> +    if (strategy == NULL)
> +        strategy = PGRB_strategy_default;
>
> +    /*
> +     * If the strategy forbids compression (at all or if source chunk size out
> +     * of range), fail.
> +     */
> +    if (slen < strategy->min_input_size ||
> +        slen > strategy->max_input_size)
> +        return false;
> +
> +    need_rate = strategy->min_comp_rate;
> +    if (need_rate < 0)
> +        need_rate = 0;
> +    else if (need_rate > 99)
> +        need_rate = 99;

Is there really a need for all this stuff here? This is so specific to the use case that I have significant doubts that all the pglz boilerplate makes much sense.

> +    /*
> +     * Compute the maximum result size allowed by the strategy, namely the
> +     * input size minus the minimum wanted compression rate.  This had better
> +     * be <= slen, else we might overrun the provided output buffer.
> +     */
> +    /*if (slen > (INT_MAX / 100))
> +    {
> +        /* Approximate to avoid overflow */
> +        /*result_max = (slen / 100) * (100 - need_rate);
> +    }
> +    else
> +    {
> +        result_max = (slen * (100 - need_rate)) / 100;
> +    }*/

err?

> +--
> +-- Test to update continuos and non continuos columns
> +--

*continuous

I have to admit, I have serious doubts about this approach. I have a very hard time believing this won't cause performance regressions in many common cases... More importantly I don't think doing the compression on this level is that interesting. I know Heikki argued for it, but I think extending the bitmap that's computed for HOT to cover all columns and doing this on a column level sounds much more sensible to me.
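Roughly what I mean, as pure hand-waving (none of this is working code, it ignores toasted datums, and the real thing would presumably reuse whatever heap_update already computes for HOT):

    /*
     * Hand-waving sketch: determine which columns actually changed between
     * the old and new tuple, so that only the changed columns (plus a bitmap
     * saying which ones they are) have to go into the WAL record.
     */
    static void
    compute_modified_columns(TupleDesc tupdesc,
                             HeapTuple oldtup, HeapTuple newtup,
                             bool *modified)
    {
        Datum   oldvals[MaxHeapAttributeNumber];
        bool    oldnulls[MaxHeapAttributeNumber];
        Datum   newvals[MaxHeapAttributeNumber];
        bool    newnulls[MaxHeapAttributeNumber];
        int     i;

        heap_deform_tuple(oldtup, tupdesc, oldvals, oldnulls);
        heap_deform_tuple(newtup, tupdesc, newvals, newnulls);

        for (i = 0; i < tupdesc->natts; i++)
        {
            Form_pg_attribute att = tupdesc->attrs[i];

            if (oldnulls[i] || newnulls[i])
                modified[i] = (oldnulls[i] != newnulls[i]);
            else
                modified[i] = !datumIsEqual(oldvals[i], newvals[i],
                                            att->attbyval, att->attlen);
        }
    }

The WAL record would then only need that bitmap plus the data of the changed columns, instead of a byte-level delta against the old tuple.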
Greetings,

Andres Freund