Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Performance Improvement by reducing WAL for Update Operation |
Date | |
Msg-id | CAA4eK1+853pjPyL3kxKfwi-Tow57sO5L4purbAxBhaC-Czih+Q@mail.gmail.com |
In response to | Re: Performance Improvement by reducing WAL for Update Operation (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Performance Improvement by reducing WAL for Update Operation |
List | pgsql-hackers |
On Thu, Feb 13, 2014 at 10:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Feb 10, 2014 at 10:02 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I think if we want to change LZ format, it will be bit more work and
>> verification for decoding has to be done much more strenuously.
>
> I don't think it'll be that big of a deal.  And anyway, the evidence
> here suggests that we still need more speed.

Okay. I did one small hack in the patch: the unmatched part is copied
directly to the destination buffer with memcpy instead of being passed
through LZ byte by byte. The aim was to find out whether a format change,
or doing memcpy instead of a byte-by-byte copy, can give us any benefit.
I found that it does give some benefit, but it may not be very large. We
cannot make the change like this if the format has to change; this is
just a quick hack to see whether such a change can help.

The data fluctuates because this is a purely CPU-bound test, so I ran the
same test five times and took the best data for each of the three builds
(unpatched and the two patches).

An explanation of the changes in the two patches (relative to master) is
given after the data.

Performance Data
-----------------------------

Non-Default settings
checkpoint_segments = 128
checkpoint_timeout = 15 min
full_page_writes = off

Unpatched

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     348847264 | 5.30486917495728
 ten long fields, 8 bytes changed |     348848384 | 5.42504191398621
 ten long fields, 8 bytes changed |     348841384 | 5.59665489196777
(3 rows)

wal-update-prefix-suffix-encode-2.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     300706992 | 5.83324003219604
 ten long fields, 8 bytes changed |     303039200 |  5.8794629573822
 ten long fields, 8 bytes changed |     300707256 | 6.04627680778503
(3 rows)

wal-update-prefix-suffix-encode-3.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     271815824 | 4.74523997306824
 ten long fields, 8 bytes changed |     273221608 | 5.36515283584595
 ten long fields, 8 bytes changed |     271818664 | 5.76620006561279
(3 rows)

Changes in wal-update-prefix-suffix-encode-2.patch
1. Removed the check at the end of pgrb_delta_encode() that checks whether
   the encoded buffer holds more than 75% of the tuple data. Before we
   start copying literal bytes we have already ensured that the
   compression achieved is greater than 25%, so there should not be any
   harm in dropping this check.

Changes in wal-update-prefix-suffix-encode-3.patch
1. Kept the change from wal-update-prefix-suffix-encode-2.patch
2. Changed copying of unmatched literal bytes to memcpy

Considering the median data for all patches, there is a CPU overhead of
8.37% with version-2 and a CPU gain of 1.11% with version-3 of the patch.

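The percentages above can be re-derived from the best-run tables. What
follows is a minimal sketch, assuming the overhead is the change in median
duration relative to the unpatched median. The mail does not spell out the
formula, so the result may differ slightly in the last digit from the
quoted figures. The name overhead_pct is illustrative only and is not part
of any patch.

```python
from statistics import median

# Durations in seconds, copied from the best-run tables above.
unpatched = [5.30486917495728, 5.42504191398621, 5.59665489196777]
v2 = [5.83324003219604, 5.8794629573822, 6.04627680778503]
v3 = [4.74523997306824, 5.36515283584595, 5.76620006561279]


def overhead_pct(patched, baseline):
    """Percent change of the median patched duration versus the median baseline.

    Assumption: "overhead" means (patched - baseline) / baseline on medians.
    A positive value is extra CPU time, a negative value is a gain.
    """
    base = median(baseline)
    return (median(patched) - base) / base * 100.0


print(f"version-2: {overhead_pct(v2, unpatched):+.2f}%")
print(f"version-3: {overhead_pct(v3, unpatched):+.2f}%")
```

With only three samples per run the median is simply the middle value, so
a single slow iteration does not move the result.
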
Now there is a small catch here: even if we change the LZ format for
prefix-suffix encoding, the CPU data shown above with memcpy might not be
the same; rather, it will depend on whether we can come up with a good
format that gives us the same benefit as the direct memcpy.

One of the ideas for a change in format:

Tag for prefix/suffix match
12 bits - offset
12 bits - length

Value for unmatched data
1 or 2 bytes for length, depending on the length of the data
(first bit can indicate whether we need 1 byte or 2 bytes)
data

Now, considering the above format, let us see how much difference in data
it would create compared to the LZ format.
For example, consider the data of the current worst case:
Suffix match ~ 200 bytes
unmatched data ~ 600 bytes

To represent the suffix match, both formats will take the same number of
bytes. For the unmatched data, the LZ format would take 10 extra bytes (it
uses 1 bit to indicate 1 uncompressed byte), whereas the changed format
above will take 2 bytes; also, the more uncompressed data there is, the
more extra bytes the LZ format can take. However, for a few unchanged
bytes (<64), I think the LZ format will use fewer bits, but in that case
we will get compression anyway, so losing a few bits should not matter.

I think a CPU overhead of less than 5% for the worst case could have been
considered acceptable, and this is a bit on the higher side, but do you
think it is so high that it deserves a change in format?

One more idea I have in mind, but have not yet tried for the prefix-suffix
match, is to use a minimum compression ratio of 30% rather than 25%. I am
not sure whether it can reduce the overhead to less than 5% for the worst
case without losing on any other case.

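To make the format idea above concrete, here is a rough sketch of how the
two record types could be packed. Everything about the bit layout is an
assumption made for illustration: the offset is placed in the high 12 bits
of a 3-byte tag, and the high bit of the first length byte selects between
a 7-bit and a 15-bit length. The function names are hypothetical and do
not come from the patch.

```python
def pack_match_tag(offset, length):
    """Pack a 12-bit offset and a 12-bit length into 3 bytes (assumed layout)."""
    if not (0 <= offset < 4096 and 0 <= length < 4096):
        raise ValueError("offset and length must each fit in 12 bits")
    return ((offset << 12) | length).to_bytes(3, "big")


def unpack_match_tag(tag):
    """Inverse of pack_match_tag: return (offset, length)."""
    value = int.from_bytes(tag[:3], "big")
    return value >> 12, value & 0xFFF


def pack_literal_run(data):
    """Prefix unmatched data with a 1- or 2-byte length, then append it whole."""
    n = len(data)
    if n < 0x80:
        header = bytes([n])                       # first bit clear: 1-byte length
    elif n < 0x8000:
        header = (0x8000 | n).to_bytes(2, "big")  # first bit set: 2-byte length
    else:
        raise ValueError("run too long for a 2-byte length")
    # The payload is appended in one piece, the analogue of a single memcpy.
    return header + data


def unpack_literal_run(buf):
    """Return (data, bytes_consumed) for a run written by pack_literal_run."""
    if buf[0] & 0x80:
        n = int.from_bytes(buf[:2], "big") & 0x7FFF
        start = 2
    else:
        n = buf[0]
        start = 1
    return buf[start:start + n], start + n
```

Under these assumptions the worst case above (about 200 bytes of suffix
match and about 600 bytes of unmatched data) would be one 3-byte tag plus
a 2-byte length followed by the 600 bytes themselves, and a decoder can
copy the whole run at once instead of testing a control bit per byte.
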
The test used is the same as provided in mail:
http://www.postgresql.org/message-id/CAA4eK1+k5-Jo3SLHFuSK2Y59TL+zctVVBFGwXawH6KhrLnW6=w@mail.gmail.com

Patches for v-2 and v-3 are attached.

Below is the data for 5 runs with all the patches; this is just to show
the fluctuation in the data:

Unpatched

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     348844424 |  6.0697078704834
 ten long fields, 8 bytes changed |     348845440 | 6.25980114936829
 ten long fields, 8 bytes changed |     348846632 | 6.28065395355225
(3 rows)

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     352182832 | 7.78950119018555
 ten long fields, 8 bytes changed |     348841592 | 6.33335590362549
 ten long fields, 8 bytes changed |     348842592 | 5.47767996788025
(3 rows)

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     352481368 | 6.10013723373413
 ten long fields, 8 bytes changed |     348845216 | 6.23139500617981
 ten long fields, 8 bytes changed |     348846328 | 7.20329117774963
(3 rows)

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     352780032 | 5.71489500999451
 ten long fields, 8 bytes changed |     348848256 | 6.01294183731079
 ten long fields, 8 bytes changed |     348845640 | 5.97938108444214
(3 rows)

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     348847264 | 5.30486917495728
 ten long fields, 8 bytes changed |     348848384 | 5.42504191398621
 ten long fields, 8 bytes changed |     348841384 | 5.59665489196777
(3 rows)

wal-update-prefix-suffix-encode-2.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     300706992 | 5.83324003219604
 ten long fields, 8 bytes changed |     303039200 |  5.8794629573822
 ten long fields, 8 bytes changed |     300707256 | 6.04627680778503
(3 rows)

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     300703744 | 7.27797102928162
 ten long fields, 8 bytes changed |     300701984 |  7.3160879611969
 ten long fields, 8 bytes changed |     300700360 | 7.88055396080017
(3 rows)

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     300705024 | 7.86505889892578
 ten long fields, 8 bytes changed |     300702544 | 7.78658819198608
 ten long fields, 8 bytes changed |     300700128 | 6.14991092681885
(3 rows)

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     300700520 | 6.61981701850891
 ten long fields, 8 bytes changed |     301010008 | 6.38593101501465
 ten long fields, 8 bytes changed |     300705136 | 6.31078720092773
(3 rows)

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     300705512 | 5.61318206787109
 ten long fields, 8 bytes changed |     300703776 |  6.2267439365387
 ten long fields, 8 bytes changed |     300701240 |  6.4169659614563
(3 rows)

wal-update-prefix-suffix-encode-3.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     271821064 | 6.24568295478821
 ten long fields, 8 bytes changed |     271818992 | 6.68939399719238
 ten long fields, 8 bytes changed |     271816880 | 6.63792490959167
(3 rows)

6.86% overhead.

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     271819992 | 5.78784203529358
 ten long fields, 8 bytes changed |     271822232 | 4.71433019638062
 ten long fields, 8 bytes changed |     271820128 | 5.84002709388733
(3 rows)

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     271815824 | 4.74523997306824
 ten long fields, 8 bytes changed |     273221608 | 5.36515283584595
 ten long fields, 8 bytes changed |     271818664 | 5.76620006561279
(3 rows)

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     271818872 | 5.49491405487061
 ten long fields, 8 bytes changed |     271816776 | 6.59977793693542
 ten long fields, 8 bytes changed |     271822752 |  5.1178731918335
(3 rows)

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     275747216 | 6.48244714736938
 ten long fields, 8 bytes changed |     274589280 | 5.66005206108093
 ten long fields, 8 bytes changed |     271818400 | 5.08064913749695
(3 rows)

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com