Re: Subquery flattening causing sequential scan - Mailing list pgsql-performance

From	Ondrej Ivanič
Subject	Re: Subquery flattening causing sequential scan
Date	December 27, 2011 18:21:18
Msg-id	CAM6mieL3XY25gGQacD7EYnWg9z-P2=kAEN_15xAQvic=LQTa7w@mail.gmail.com
In response to	Re: Subquery flattening causing sequential scan (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Subquery flattening causing sequential scan
List	pgsql-performance

Hi,

On 28 December 2011 05:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Possibly raising the stats target on emsg_messages would help.

In the function std_typanalyze() is this comment:

        /*--------------------
         * The following choice of minrows is based on the paper
         * "Random sampling for histogram construction: how much is enough?"
         * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in
         * Proceedings of ACM SIGMOD International Conference on Management
         * of Data, 1998, Pages 436-447.  Their Corollary 1 to Theorem 5
         * says that for table size n, histogram size k, maximum relative
         * error in bin size f, and error probability gamma, the minimum
         * random sample size is
         *      r = 4 * k * ln(2*n/gamma) / f^2
         * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain
         *      r = 305.82 * k
         * Note that because of the log function, the dependence on n is
         * quite weak; even at n = 10^12, a 300*k sample gives <= 0.66
         * bin size error with probability 0.99.  So there's no real need to
         * scale for n, which is a good thing because we don't necessarily
         * know it at this point.
         *--------------------
         */

The question is: why is the parameter f not exposed as a GUC? Sometimes
it could make sense to have fewer bins with better estimation (for the
same r).

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
