Re: multivariate statistics v14 - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: multivariate statistics v14 |
Date | |
Msg-id | 89341a68-4729-ad28-bb39-cef31849aedb@2ndquadrant.com |
In response to | Re: multivariate statistics v14 (Tatsuo Ishii <ishii@postgresql.org>) |
Responses | Re: multivariate statistics v14 |
List | pgsql-hackers |
Hello,

On 03/22/2016 09:13 AM, Tatsuo Ishii wrote:
>>> Do you have any other missing parts in this work? I am asking
>>> because I wonder if you want to push this into 9.6 or rather 9.7.
>>
>> I think the first few parts of the patch series, namely:
>>
>>   * shared infrastructure (0002)
>>   * functional dependencies (0003)
>>   * MCV lists (0004)
>>   * histograms (0005)
>>
>> might make it into 9.6. I believe the code for building and storing
>> the different kinds of stats is reasonably solid. What probably needs
>> more thorough review are the changes in clauselist_selectivity(), but
>> the code in these parts is reasonably simple as it only supports using
>> a single multi-variate statistics per relation.
>>
>> The part (0006) that allows using multiple statistics (i.e. selects
>> which of the available stats to use and in what order) is probably the
>> most complex part of the whole patch, and I myself do have some
>> questions about some aspects of it. I don't think this part might get
>> into 9.6 at this point (although it'd be nice if we managed to do
>> that).
>
> Hum. So without 0006 or beyond, there's not much benefit for the
> PostgreSQL users, and you are not too confident about 0006 or
> beyond. Then I would think it is a little bit hard to justify in
> putting 000[2-5] into 9.6. I really like this feature and would like
> to see in PostgreSQL someday, but I'm not sure if we should put the
> patches (0002-0005) into PostgreSQL now. Please let me know if there's
> some reasons we should put the patches into PostgreSQL now.

I don't think that's the case. While being able to combine multiple
statistics is certainly useful, I'm convinced that the initial patches
add enough value on their own, even if the 0006 patch gets committed
later.

A lot of queries will be just fine with the "single multivariate
statistics" limitation, either because they use fewer than 8 columns,
or because only 8 columns are actually correlated. (FWIW the 8-column
limit is mostly arbitrary; it may get increased if needed.)

I haven't really mentioned the aspects of 0006 that I think need more
discussion, but it's mostly about the question of whether combining the
statistics by using the overlapping clauses as "conditions" is the
right thing to do (or whether a more expensive approach is needed).
None of that, however, invalidates the preceding patches.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
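For readers unfamiliar with the idea of using overlapping clauses as "conditions", here is a minimal sketch of one way to read it. It is not the code from the 0006 patch; the toy table, the column names a, b, c, and the combination formula P(a,b) * P(b,c) / P(b) are assumptions made purely for illustration. It compares an estimate that treats the three clauses as independent with one that combines two two-column statistics through their shared column.

```python
from fractions import Fraction

# Hypothetical toy table with columns (a, b, c); b determines both a and c,
# so the three columns are strongly correlated.
rows = [(b % 2, b, b % 3) for b in range(6)] * 10

def sel(pred):
    """Exact fraction of rows satisfying pred (stands in for a statistic)."""
    return Fraction(sum(1 for r in rows if pred(r)), len(rows))

# Query clauses: a = 0 AND b = 0 AND c = 0
p_a = sel(lambda r: r[0] == 0)
p_b = sel(lambda r: r[1] == 0)
p_c = sel(lambda r: r[2] == 0)
p_ab = sel(lambda r: r[0] == 0 and r[1] == 0)  # as if from a statistic on (a, b)
p_bc = sel(lambda r: r[1] == 0 and r[2] == 0)  # as if from a statistic on (b, c)
p_abc = sel(lambda r: r == (0, 0, 0))          # true selectivity

# Per-column estimate under the independence assumption.
independent = p_a * p_b * p_c

# Combine the two statistics, using the shared clause on b as the condition:
# P(a, b, c) ~ P(a, b) * P(c | b) = P(a, b) * P(b, c) / P(b)
conditioned = p_ab * p_bc / p_b

print(independent, conditioned, p_abc)
```

On this toy data the conditioned estimate matches the true selectivity, while the independence estimate is several times too low. Whether that kind of combination is accurate enough in general is the open question mentioned above.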