Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC] - Mailing list pgsql-hackers
From | Anastasia Lubennikova |
---|---|
Subject | Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC] |
Date | |
Msg-id | ac2d0b41-84e2-153c-d41a-3456fdeb5785@postgrespro.ru Whole thread Raw |
In response to | Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC] (Andrew Borodin <borodin@octonica.com>) |
Responses |
Re: Re: GiST optimizing memmoves in gistplacetopage for
fixed-size updates [PoC]
|
List | pgsql-hackers |
09.08.2016 19:45, Andrew Borodin: > Here is new version of the patch, now it includes recommendations from > Anastasia Lubennikova. > > > I've investigated anamalous index size decrease. Most probable version > appeared to be true. > Cube extension, as some others, use Guttman's polynomial time split > algorithm. It is known to generate "needle-like" MBBs (MBRs) for > sorted data due to imbalanced splits (splitting 100 tuples as 98 to > 2). Due to previous throw-to-the-end behavior of GiST this imbalance > was further amplificated (most of inserts were going to bigger part > after split). So GiST inserts were extremely slow for sorted data. > There is no need to do exact sorting to trigger this behavior. > This behavior can be fused by implementation of small-m restriction in > split (AFAIR this is described in original R-tree paper from 84), > which prescribes to do a split in a parts no smaller than m, where m > is around 20% of a page capacity (in tuples number). After application > of this patch performance for sorted data is roughly the same as > performance for randomized data. Thank you for explanation. Now it's clear to me. I did some more testing and found nothing special. The declared feature is implemented correctly. It passes all regression tests and improves performance. I still have a few minor nitpicks about the patch style. You can find a style guide on https://www.postgresql.org/docs/9.6/static/source.html 1) remove extra whitespace in README 2) add whitespace: if(ntup == 1) 3) fix comments in accordance with coding conventions /* In case of single tuple update GiST calls overwrite * In all other cases function gistplacetopage deletes * old tuplesand place updated at the end * */ + /* Normally here was following assertion + * Assert(ItemIdHasStorage(ii)); + * This assertion was introduced in PageIndexTupleDelete + * But if this function will be used from BRIN index + * this assertion will fail. Thus, here we do not check that + * item has the storage. + * */ 4) remove unrelated changes - data += sizeof(OffsetNumber) * xldata->ntodelete; + data += sizeof(OffsetNumber) *xldata->ntodelete; 5) If the comment is correct, maxalignment is not necessary. + /* tuples on a page are always maxaligned */ + oldsize = MAXALIGN(oldsize); 6) I'd rather use alignednewsize here. + ItemIdSetNormal(tupid, offset + size_diff, newsize); After the cleanup you can change status to "Ready for Committer" without waiting for the response. > If data is randomized this effect of Guttman poly-time split is not in > effect; test from the start of the thread will show no statistically > confident improvement by the patch. > To prove performance increase in randomized case I've composed > modified test https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1 > This test with five runs shows following: > Before patch > Insert Time AVG 78 seconds STD 9.5 > Afer patch > Insert Time AVG 68 seconds STD 7.7 > This is marginal but statistically viable performance improvement. > > For sorted data performance is improved by a factor of 3. > Best regards, Andrey Borodin, Octonica & Ural Federal University. > -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
pgsql-hackers by date: