Re: Improving btree performance through specializing by key shape, take 2 - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Improving btree performance through specializing by key shape, take 2 |
Date | |
Msg-id | CAFiTN-t1jfttH7MEZ8JsqfE7mguq8gg07hS0o3EHA2zAFCWBsQ@mail.gmail.com Whole thread Raw |
In response to | Re: Improving btree performance through specializing by key shape, take 2 (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Improving btree performance through specializing by key shape, take 2
|
List | pgsql-hackers |
On Tue, Jun 27, 2023 at 9:42 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Jun 23, 2023 at 8:16 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > > On Fri, 23 Jun 2023 at 11:26, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > and I have one confusion in the > > > below hunk in the _bt_moveright function, basically, if the parent > > > page's right key is exactly matching the HIGH key of the child key > > > then I suppose while doing the "_bt_compare" with the HIGH_KEY we can > > > use the optimization right, i.e. column number from where we need to > > > start the comparison should be used what is passed by the caller. But > > > in the below hunk, you are always passing that as 'cmpcol' which is 1. > > > I think this should be '*comparecol' because '*comparecol' will either > > > hold the value passed by the parent if high key data exactly match > > > with the parent's right tuple or it will hold 1 in case it doesn't > > > match. Am I missing something? > > > > We can't carry _bt_compare prefix results across pages, because the > > key range of a page may shift while we're not holding a lock on that > > page. That's also why the code resets the prefix to 1 every time it > > accesses a new page ^1: it cannot guarantee correct results otherwise. > > See also [0] and [1] for why that is important. > > Yeah that makes sense > > > ^1: When following downlinks, the code above your quoted code tries to > > reuse the _bt_compare result of the parent page in the common case of > > a child page's high key that is bytewise equal to the right separator > > tuple of the parent page's downlink to this page. However, if it > > detects that this child high key has changed (i.e. not 100% bytewise > > equal), we can't reuse that result, and we'll have to re-establish all > > prefix info on that page from scratch. > > In any case, this only establishes the prefix for the right half of > > the page's keyspace, the prefix of the left half of the data still > > needs to be established separetely. > > > > I hope this explains the reasons for why we can't reuse comparecol as > > _bt_compare argument. > > Yeah got it, thanks for explaining this. Now I see you have explained > this in comments as well above the memcmp() statement. At high level 0001 looks fine to me, just some suggestions 1. +Notes about dynamic prefix truncation +------------------------------------- I feel instead of calling it "dynamic prefix truncation" should we can call it "dynamic prefix skipping", I mean we are not really truncating anything right, we are just skipping those attributes in comparison? 2. I think we should add some comments in the _bt_binsrch() function where we are having main logic around maintaining highcmpcol and lowcmpcol. I think the notes section explains that very clearly but adding some comments here would be good and then reference to that section in the README. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: