Re: Collect frequency statistics for arrays - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: Collect frequency statistics for arrays |
Date | |
Msg-id | CAPpHfdtwyn7QFy-1mo+fHHof=Bf8BTdbOO84a_aH9gGLXTmb0w@mail.gmail.com Whole thread Raw |
In response to | Re: Collect frequency statistics for arrays (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Collect frequency statistics for arrays
|
List | pgsql-hackers |
<div class="gmail_quote">On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane <span dir="ltr"><<a href="mailto:tgl@sss.pgh.pa.us"target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br /><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> BTW, one other thing about thecount histogram: seems like we are<br /> frequently generating uselessly large ones. For instance, do ANALYZE<br /> inthe regression database and then run<br /><br /> select tablename,attname,elem_count_histogram from pg_stats<br /> whereelem_count_histogram is not null;<br /><br /> You get lots of entries that look like this:<br /><br /> pg_proc | proallargtypes | {1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,6,6,6,2.80556}<br /> pg_proc | proargmodes | {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1.61111}<br /> pg_proc | proargnames | {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,7,7,7,7,8,8,8,14,14,15,16,3.8806}<br /> pg_proc | proconfig | {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}<br /> pg_class | reloptions | {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}<br /><br/> which seems to me to be a rather useless expenditure of space.<br /> Couldn't we reduce the histogram size when therearen't many<br /> different counts?<br /><br /> It seems fairly obvious to me that we could bound the histogram<br />size with (max count - min count + 1), but maybe something even<br /> tighter would work; or maybe I'm missing somethingand this would<br /> sacrifice accuracy.<br /></blockquote><div class="gmail_quote"><br /></div><div class="gmail_quote">True.If (max count - min count + 1) is small, enumerating of frequencies is both more compact and moreprecise representation. Simultaneously, if (max count - min count + 1) is large, we can run out of statistics_targetwith such representation. We can use same representation of count distribution as for scalar column value: MCVand HISTOGRAM, but it would require additional statkind and statistics slot. Probably, you've better ideas?</div><br/>------<br />With best regards,<br />Alexander Korotkov.</div>
pgsql-hackers by date: