Re: More stable query plans via more predictable column statistics - Mailing list pgsql-hackers
From | Joel Jacobson
---|---
Subject | Re: More stable query plans via more predictable column statistics
Date |
Msg-id | CAASwCXdpk5648Jo-QCmnya42x8a-4zm6+qm6=pZguxOTKJDsvw@mail.gmail.com
In response to | Re: More stable query plans via more predictable column statistics ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Responses | Re: More stable query plans via more predictable column statistics
List | pgsql-hackers
On Wed, Mar 9, 2016 at 1:25 AM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:

> Thank you for spending your time to run these :-)

n/p, it took like 30 seconds :-)

> I don't want to be asking for too much here, but is there a chance you could
> try the effects of the proposed patch on an offline copy of your database?

Yes, I think that should be possible.

> Do you envision or maybe have experienced problems with query plans
> referring to the columns that are near the top of the above hist_ratio
> report? In other words: what are the practical implications for you with
> the values being duplicated rather badly throughout the histogram like in
> the example you shown?

I don't know much about the internals of the query planner; I've just read "57.1. Row Estimation Examples" to get a basic understanding. If I understand it correctly, if the histogram_bounds contain a lot of duplicated values, then the row estimates will be inaccurate, which in turn can trick the query planner into choosing a sub-optimal plan?

We've had some problems lately with the query planner. Actually, we've always had them but never noticed nor cared about them, but now during peak times we've had short periods where we haven't been able to fully cope with the traffic.

I tracked down the most self_time-consuming functions and quickly saw how to optimize them. Many of them were of the form:

SELECT .. FROM SomeBigTable
WHERE Col1 = [some dynamic value]
AND Col2 = [some constant value]
AND Col3 = [some other constant value]

The number of rows matching the WHERE clause was very small, a perfect match for a partial index:

CREATE INDEX .. ON SomeBigTable USING btree (Col1)
WHERE Col2 = [some constant value]
AND Col3 = [some other constant value];

Even though the new partial index matched the query perfectly, the query planner didn't want to use it. Instead it continued to use some other sub-optimal index.
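(For anyone who wants to reproduce this kind of check, a minimal sketch using EXPLAIN follows; the table, column, index names and constant values here are made-up placeholders, not our real schema:)

```sql
-- Hypothetical stand-in for the partial index described above:
CREATE INDEX somebigtable_col1_partial
    ON SomeBigTable USING btree (Col1)
    WHERE Col2 = 42 AND Col3 = 7;

-- EXPLAIN shows which index the planner actually picks:
EXPLAIN
SELECT *
FROM SomeBigTable
WHERE Col1 = 123
  AND Col2 = 42
  AND Col3 = 7;
```

If the plan node is an Index Scan on some other index rather than on somebigtable_col1_partial, you're seeing the same behaviour I describe.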
The only way to force it to use the correct index was the "+0" trick, which I recently learned from one of my colleagues:

SELECT .. FROM SomeBigTable
WHERE Col1 = [some dynamic value]
AND Col2+0 = [some constant value]
AND Col3+0 = [some other constant value]

CREATE INDEX .. ON SomeBigTable USING btree (Col1)
WHERE Col2+0 = [some constant value]
AND Col3+0 = [some other constant value];

By adding +0 to the columns, the query planner will, as I understand it, be extremely motivated to use the correct index, since otherwise it would have to do a seq scan of the entire big table, which would be very costly. I'm glad the trick worked; now the system is fast again.

We're still on 9.1, so maybe these problems will go away once we upgrade to 9.5.

I don't know if the problems I described can be fixed by your patch, but I wanted to share this story since I know our systems (Trustly's and Zalando's) are quite similar in design, so maybe you have experienced something similar.

(Side note: My biggest wish would be some way to specify explicitly, per top-level function, a list of indexes the query planner is allowed, or NOT allowed, to consider.)
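(Another side note, for anyone wanting to check their own tables for the histogram duplication discussed in this thread: a rough sketch of inspecting histogram_bounds via pg_stats is below. The table/column names are placeholders, the text-array cast is the usual workaround for the anyarray column type, and this is only an illustrative count, not the hist_ratio report from earlier in the thread:)

```sql
-- Sketch: compare total vs. distinct entries in histogram_bounds
-- for one column; a large gap suggests badly duplicated bounds.
SELECT tablename,
       attname,
       array_length(histogram_bounds::text::text[], 1) AS n_bounds,
       (SELECT count(DISTINCT b)
          FROM unnest(histogram_bounds::text::text[]) AS b) AS n_distinct_bounds
FROM pg_stats
WHERE tablename = 'somebigtable'
  AND attname = 'col1';
```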