Re: GiST kNN search queue (Re: KNN-GiST with recheck) - Mailing list pgsql-hackers
From | Arthur Silva |
---|---|
Subject | Re: GiST kNN search queue (Re: KNN-GiST with recheck) |
Date | |
Msg-id | CAO_YK0UWipBifVgNY1zqha8DJi84Kr=WEhDFLvneKo9Wzj2fBw@mail.gmail.com Whole thread Raw |
In response to | GiST kNN search queue (Re: KNN-GiST with recheck) (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: GiST kNN search queue (Re: KNN-GiST with recheck)
|
List | pgsql-hackers |
<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Wed, Dec 10, 2014 at 1:50 PM, Heikki Linnakangas<span dir="ltr"><<a href="mailto:hlinnakangas@vmware.com" target="_blank">hlinnakangas@vmware.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px#ccc solid;padding-left:1ex">On 01/28/2014 04:12 PM, Alexander Korotkov wrote:<br /><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> >3. A binary heap would be abetter data structure to buffer the rechecked<br /> >values. A Red-Black tree allows random insertions and deletions,but in<br /> >this case you need to insert arbitrary values but only remove the minimum<br /> >item. That'sexactly what a binary heap excels at. We have a nice binary<br /> >heap implementation in the backend that you canuse, see<br /> >src/backend/lib/binaryheap.c.<br /> ><br /></blockquote> Hmm. For me binary heap would be a betterdata structure for KNN-GiST at<br /> all :-)<br /></blockquote><br /> I decided to give this a shot, replacing thered-black tree in GiST with the binary heap we have in lib/binaryheap.c. It made the GiST code somewhat simpler, as thebinaryheap interface is simpler than the red-black tree one. Unfortunately, performance was somewhat worse. That was quitesurprising, as insertions and deletions are both O(log N) in both data structures, but the red-black tree implementationis more complicated.<br /><br /> I implemented another data structure called a Pairing Heap. It's also a fairlysimple data structure, but insertions are O(1) instead of O(log N). It also performs fairly well in practice.<br /><br/> With that, I got a small but measurable improvement. To test, I created a table like this:<br /><br /> create tablegisttest (id integer, p point);<br /> insert into gisttest select id, point(random(), random()) from generate_series(1,1000000) id;<br /> create index i_gisttest on gisttest using gist (p);<br /><br /> And I ran this querywith pgbench:<br /><br /> select id from gisttest order by p <-> '(0,0)' limit 1000;<br /><br /> With unpatchedmaster, I got about 650 TPS, and with the patch 720 TPS. That's a nice little improvement, but perhaps more importantly,the pairing heap implementation consumes less memory. To measure that, I put a MemoryContextStats(so-><u></u>queueCtx)call into gistendscan. With the above query, but without the "limit" clause, onmaster I got:<br /><br /> GiST scan context: 2109752 total in 10 blocks; 2088456 free (24998 chunks); 21296 used<br /><br/> And with the patch:<br /><br /> GiST scan context: 1061160 total in 9 blocks; 1040088 free (12502 chunks); 21072used<br /><br /> That's 2MB vs 1MB. While that's not much in absolute terms, it'd be nice to reduce that memory consumption,as there is no hard upper bound on how much might be needed. If the GiST tree is really disorganized for somereason, a query might need a lot more.<br /><br /><br /> So all in all, I quite like this patch, even though it doesn'tdo anything too phenomenal. It adds a some code, in the form of the new pairing heap implementation, but it makesthe GiST code a little bit simpler. And it gives a small performance gain, and reduces memory usage a bit.<span><fontcolor="#888888"><br /><br /> - Heikki<br /><br /></font></span><br /><br /> --<br /> Sent via pgsql-hackersmailing list (<a href="mailto:pgsql-hackers@postgresql.org" target="_blank">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your subscription:<br /><a href="http://www.postgresql.org/mailpref/pgsql-hackers" target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/><br /></blockquote></div><br /></div><div class="gmail_extra">Itmay be better to replace the lib/binaryheap altogether as it offers comparable/better performance.<br/></div></div>
pgsql-hackers by date: