Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I am starting to look at this patch now. I'm wondering exactly why the
>> decision was made to continue storing btree-style statistics for arrays,
> Probably, btree statistics really does matter for some sort of arrays? For
> example, arrays representing paths in the tree. We could request a subtree
> in a range query on such arrays.
That seems like a pretty narrow, uncommon use-case. Also, to get accurate stats for such queries that way, you'd need really enormous histograms. I doubt that the existing parameters for histogram size will permit meaningful estimation of more than the first array entry (since we don't make the histogram any larger than we do for a scalar column).
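To make the histogram-size argument concrete, here is a rough simulation in Python. It is a sketch, not PostgreSQL code. Everything in it is an assumption chosen for illustration: 100,000 three-level paths with a fanout of 50 per level, a 100-bucket equi-depth histogram, and Python tuple ordering standing in for the array btree comparator. The helper names are hypothetical.

```python
import bisect
import random

def equi_depth_bounds(values, num_buckets):
    """Return the num_buckets + 1 boundary values of an equi-depth histogram."""
    # Tuples compare element by element, then by length, which stands in
    # here for btree-style ordering of arrays (an assumption of this sketch).
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[(i * last) // num_buckets] for i in range(num_buckets + 1)]

def bounds_inside_subtree(bounds, prefix):
    """Count histogram boundaries whose path starts with the given prefix."""
    # Every path starting with `prefix` sorts at or after `prefix` and
    # before the same prefix with its last element incremented.
    lo = bisect.bisect_left(bounds, prefix)
    hi = bisect.bisect_left(bounds, prefix[:-1] + (prefix[-1] + 1,))
    return hi - lo

rng = random.Random(0)
FANOUT = 50          # hypothetical number of children per tree level
NUM_BUCKETS = 100    # assumed histogram size, same as for a scalar column

paths = [tuple(rng.randrange(FANOUT) for _ in range(3)) for _ in range(100_000)]
bounds = equi_depth_bounds(paths, NUM_BUCKETS)

# For every two-level subtree, count how many boundaries land inside it.
hits = [bounds_inside_subtree(bounds, (a, b))
        for a in range(FANOUT) for b in range(FANOUT)]
unresolved = sum(1 for h in hits if h == 0)

print(f"{unresolved} of {len(hits)} two-level subtrees contain no histogram boundary")
```

Under these assumptions, a subtree with no boundary inside it lies entirely within a single bucket, so the histogram gives no information about its size beyond "less than one bucket". Only the first path element is resolved to any useful degree.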
The real point here is that the fact that we're storing btree-style stats for arrays is an accident, backed into by having added btree comparators for arrays plus analyze.c's habit of applying default scalar-oriented analysis functions to any type without an explicit typanalyze entry. I don't recall that we ever thought hard about it or showed that those stats were worth anything.
OK. I don't object to removing btree stats from arrays.
What do you think about the pg_stats view in this case? Should it combine the values histogram and the array length histogram in a single column, as it does for MCV and MCELEM?