
Re: GIN improvements part2: fast scan - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: GIN improvements part2: fast scan
Date	March 12, 2014 17:42:58
Msg-id	CAPpHfduJZiJAiR7qfOt2mH6kNbuBoQMbYJb9+7LgEb5pv53eHA@mail.gmail.com
In response to	Re: GIN improvements part2: fast scan (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses	Re: GIN improvements part2: fast scan
List	pgsql-hackers


<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Mar 12, 2014 at 8:29 PM, Heikki Linnakangas
<span dir="ltr">&lt;<a href="mailto:hlinnakangas@vmware.com" target="_blank">hlinnakangas@vmware.com</a>&gt;</span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div
class="">On 03/12/2014 12:09 AM, Tomas Vondra wrote:<br /><blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi all,<br
/><br/> a quick question that just occurred to me: do you plan to tweak the cost<br /> estimation for GIN indexes in
this patch?<br /><br /> IMHO it would be appropriate, given the improvements and gains, but it<br /> seems to me
gincostestimate() was not touched by this patch.<br /></blockquote><br /></div> Good point. We have made two major
changes to GIN in this release cycle: changed the data page format and made it possible to skip items without fetching
all the keys ("fast scan"). gincostestimate doesn't know about either change.<br /><br /> Adjusting gincostestimate for
the more compact data page format seems easy. When I hacked on that, I assumed all along that gincostestimate wouldn't
need to be changed, as the index would just be smaller, which would be taken into account automatically. But now that I
look at gincostestimate, it assumes that the size of one item on a posting tree page is a constant 6 bytes
(SizeOfIptrData), which is no longer true. I'll go fix that.<br /><br /> Adjusting for the effects of skipping is
harder. gincostestimate needs to do the same preparation steps as startScanKey: sort the query keys by frequency, and
call the consistent function to split the keys into "required" and "additional" sets. And then model the fact that the
"additional" entries only need to be fetched when the other keys match. That's doable in principle, but requires a bunch
of extra code.<br /><br /> Alexander, any thoughts on that? It's getting awfully late to add new code for that, but it sure would
be nice to somehow take fast scan into account.</blockquote><div class="gmail_quote"><br /></div><div class="gmail_quote">
The preparation we do in startScanKey requires knowledge of the estimated size of the posting lists/trees. We get this
estimate by traversing down to the leaf pages. I think gincostestimate is expected to be much cheaper than that, so we
probably need a rougher estimate there, don't we?</div><br />------<br />With best regards,<br />Alexander Korotkov.<br /></div></div></div>
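The required/additional split that Heikki describes (sort keys by estimated frequency, then ask the consistent function which keys are needed) can be illustrated with a small standalone model. This is a sketch under stated assumptions, not PostgreSQL code: the `Tri` type, the `tri_and`/`tri_or` consistent callbacks, `split_required`, and `estimate_items_fetched` are all hypothetical names, and the per-key item counts are taken as given inputs. Where a cheap estimate of those counts would come from is exactly the open question in this thread. The cost rule for additional keys (charge at most one item per candidate produced by the required keys) is a deliberately crude assumption, not how the real scan skips.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ternary value type (not PostgreSQL's own API). */
typedef enum { TRI_FALSE, TRI_TRUE, TRI_MAYBE } Tri;

/* Hypothetical ternary consistent callback: can an item with these key states match? */
typedef Tri (*TriConsistentFn)(const Tri *states, size_t nkeys);

#define MAX_KEYS 32

/* Models "k1 & k2 & ...": FALSE if any key is FALSE, TRUE if all are TRUE, else MAYBE. */
static Tri tri_and(const Tri *s, size_t n)
{
    bool all_true = true;
    for (size_t i = 0; i < n; i++) {
        if (s[i] == TRI_FALSE)
            return TRI_FALSE;
        if (s[i] != TRI_TRUE)
            all_true = false;
    }
    return all_true ? TRI_TRUE : TRI_MAYBE;
}

/* Models "k1 | k2 | ...": TRUE if any key is TRUE, FALSE if all are FALSE, else MAYBE. */
static Tri tri_or(const Tri *s, size_t n)
{
    bool all_false = true;
    for (size_t i = 0; i < n; i++) {
        if (s[i] == TRI_TRUE)
            return TRI_TRUE;
        if (s[i] != TRI_FALSE)
            all_false = false;
    }
    return all_false ? TRI_FALSE : TRI_MAYBE;
}

typedef struct { double est_items; size_t idx; } KeyEst;

/* Ascending by estimated item count, so the rarest keys come first. */
static int cmp_est(const void *a, const void *b)
{
    double x = ((const KeyEst *) a)->est_items;
    double y = ((const KeyEst *) b)->est_items;
    return (x > y) - (x < y);
}

/*
 * Sort keys by estimated frequency and grow the required set in that order
 * until the consistent function says no item can match when every required
 * key is FALSE and the rest are MAYBE. Fills order[] with key indexes
 * (rarest first) and returns how many of them are required.
 */
static size_t split_required(const double *est, size_t n, TriConsistentFn fn, size_t *order)
{
    KeyEst keys[MAX_KEYS];
    Tri states[MAX_KEYS];
    size_t i;

    assert(n > 0 && n <= MAX_KEYS);
    for (i = 0; i < n; i++) {
        keys[i].est_items = est[i];
        keys[i].idx = i;
    }
    qsort(keys, n, sizeof(KeyEst), cmp_est);
    for (i = 0; i < n; i++)
        order[i] = keys[i].idx;

    for (i = 0; i + 1 < n; i++) {
        for (size_t j = 0; j < n; j++)
            states[order[j]] = (j <= i) ? TRI_FALSE : TRI_MAYBE;
        if (fn(states, n) == TRI_FALSE)
            break;
    }
    return i + 1; /* keys order[0..i] are required */
}

/*
 * Rough number of posting items read: required keys are read in full;
 * each additional key is charged at most one item per required-key
 * candidate (crude, hypothetical model of skipping).
 */
static double estimate_items_fetched(const double *est, size_t n, TriConsistentFn fn)
{
    size_t order[MAX_KEYS];
    size_t nreq = split_required(est, n, fn, order);
    double required = 0.0;
    double total;

    for (size_t i = 0; i < nreq; i++)
        required += est[order[i]];
    total = required;
    for (size_t i = nreq; i < n; i++) {
        double e = est[order[i]];
        total += (e < required) ? e : required;
    }
    return total;
}
```

For `frequent & rare` with estimates 1000 and 10, the sketch puts only the rare key in the required set and charges 10 + 10 = 20 items instead of 1010; for `frequent | rare`, neither key can be dropped, so both are required and the estimate stays at 1010. This only shows the shape of the calculation; it still depends on per-key estimates that would have to be obtained more cheaply than startScanKey does.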
