Re: Slow query with planner row strange estimation - Mailing list pgsql-performance
| From | phb07 |
|---|---|
| Subject | Re: Slow query with planner row strange estimation |
| Date | |
| Msg-id | 4C3B7C0D.6070404@apra.asso.fr |
| In response to | Re: Slow query with planner row strange estimation (Dimitri <dimitrik.fr@gmail.com>) |
| Responses | Re: Slow query with planner row strange estimation; Re: Slow query with planner row strange estimation |
| List | pgsql-performance |
Dimitri wrote:
> It's probably one of the cases when having HINTS in PostgreSQL may be
> very helpful..
>
> SELECT /*+ enable_nestloop=off */ ... FROM ...
>
> will just fix this query without impacting other queries and without
> adding any additional instructions into the application code..
>
> So, why is there such resistance to implementing hints within SQL
> queries in PG?..
>
> Rgds,
> -Dimitri

+1. Another typical case where it would be helpful is setting the
cursor_tuple_fraction GUC variable for a specific statement, without being
obliged to issue two SET statements, one before the SELECT and the other
after.

> On 7/9/10, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jul 9, 2010 at 6:13 AM, damien hostin <damien.hostin@axege.com>
>> wrote:
>>>> Have you tried running ANALYZE on the production server?
>>>>
>>>> You might also want to try ALTER TABLE ... SET STATISTICS to a large
>>>> value on some of the join columns involved in the query.
>>>
>>> Hello,
>>>
>>> Before comparing the test case on the two machines, I ran ANALYZE on
>>> the whole database and looked at the pg_stats table to see if the
>>> columns' statistics changed, but on the production server the stats
>>> never became as good as on the desktop computer. I set the statistics
>>> target to 10000 on the columns used by the join, ran ANALYZE (which
>>> took a 3,000,000-row sample), then looked at the stats. They are not
>>> as good as on the desktop: the row count is nearly the same, but only
>>> 1 or 2 values are found.
>>>
>>> The data are not balanced the same way on the two computers:
>>> - Desktop: 12,000 rows with 6,000 implicated in the query (50%),
>>> - "Production" (actually a dev/test server): 6 million rows with
>>>   6,000 implicated in the query (0.1%).
>>> The columns used in the query are nullable, and in the 5,994,000
>>> other rows that are not implicated in the query these columns are
>>> null.
>>>
>>> I don't know if the statistics target is a percentage or a number of
>>> values to obtain,
>>
>> It's a number of values to obtain.
>>
>>> but even set at the max (10000), it didn't manage to collect good
>>> stats (for this particular query).
>>
>> I think there's a cutoff where it won't collect values unless they
>> occur significantly more often than the average frequency. I wonder
>> if that might be biting you here: without the actual values in the MCV
>> table, the join selectivity estimates probably aren't too good.
>>
>>> As I don't know what more to do, my conclusion is that the data need
>>> to be better balanced to allow ANALYZE to gather better stats. But if
>>> there is a way to improve the stats/query with this badly balanced
>>> data, I'm open to it!
>>>
>>> I hope that in real production, data will never be loaded this way.
>>> If this happened we would maybe set enable_nestloop to off, but I
>>> don't think it's a good solution; other queries might get slower.
>>
>> Yeah, that usually works out poorly.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise Postgres Company

Regards,
Philippe Beaudoin.
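[Editor's note] Since PostgreSQL has no built-in /*+ ... */ hint syntax, a common way to get the per-query effect Dimitri and Philippe are asking for is SET LOCAL, which confines a setting to the enclosing transaction. This is a minimal sketch of that workaround, not something proposed in the thread; the tables t1/t2 and cursor c are hypothetical placeholders.

```sql
-- Disable nested loops for one query only: SET LOCAL reverts automatically
-- at COMMIT or ROLLBACK, so other queries keep the default planner settings.
BEGIN;
SET LOCAL enable_nestloop = off;
SELECT t1.id, t2.val
  FROM t1
  JOIN t2 ON t2.t1_id = t1.id;
COMMIT;

-- The same pattern covers Philippe's cursor_tuple_fraction case with a
-- single SET LOCAL instead of one SET before the query and another after.
BEGIN;
SET LOCAL cursor_tuple_fraction = 1.0;  -- plan for total runtime, not first rows
DECLARE c CURSOR FOR SELECT t1.id FROM t1;
FETCH ALL FROM c;
COMMIT;
```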
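And a hedged sketch of the ALTER TABLE ... SET STATISTICS / ANALYZE sequence Robert suggests, with a pg_stats check like the one damien describes; the table name `fact` and join column `dim_id` are hypothetical.

```sql
-- Raise the per-column statistics target (a number of entries to collect,
-- not a percentage; 10000 is the maximum), then re-ANALYZE the column so
-- the planner sees the new sample.
ALTER TABLE fact ALTER COLUMN dim_id SET STATISTICS 10000;
ANALYZE fact (dim_id);

-- Inspect what ANALYZE actually collected; a high null_frac with only a
-- handful of most_common_vals entries is the symptom damien reports.
SELECT null_frac, n_distinct, most_common_vals
  FROM pg_stats
 WHERE tablename = 'fact'
   AND attname = 'dim_id';
```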