Re: [HACKERS] Perfomance bug in v10 - Mailing list pgsql-hackers
| From | David Rowley |
|---|---|
| Subject | Re: [HACKERS] Perfomance bug in v10 |
| Date | |
| Msg-id | CAKJS1f-LzkUsFxdJ_-Luy38orQ+AdEXM5o+vANR+-pHAWPSecg@mail.gmail.com |
| In response to | [HACKERS] Perfomance bug in v10 (Teodor Sigaev <teodor@sigaev.ru>) |
| Responses | Re: [HACKERS] Perfomance bug in v10, Re: [HACKERS] Perfomance bug in v10 |
| List | pgsql-hackers |
On 1 June 2017 at 04:16, Teodor Sigaev <teodor@postgrespro.ru> wrote:
> I found an example where v10 chooses an extremely non-optimal plan:
>
> select
>   i::int as a,
>   i::int + 1 as b,
>   0 as c
> into t
> from
>   generate_series(1,32) as i;
>
> create unique index i on t (c, a);
>
> explain analyze
> SELECT
>   t1.a, t1.b,
>   t2.a, t2.b,
>   t3.a, t3.b,
>   t4.a, t4.b,
>   t5.a, t5.b,
>   t6.a, t6.b
> /*
>   ,
>   t7.a, t7.b,
>   t8.a, t8.b,
>   t9.a, t9.b,
>   t10.a, t10.b
> */
> FROM t T1
> LEFT OUTER JOIN t T2
>   ON T1.b = T2.a AND T2.c = 0
> LEFT OUTER JOIN t T3
>   ON T2.b = T3.a AND T3.c = 0
> LEFT OUTER JOIN t T4
>   ON T3.b = T4.a AND T4.c = 0
> LEFT OUTER JOIN t T5
>   ON T4.b = T5.a AND T5.c = 0
> LEFT OUTER JOIN t T6
>   ON T5.b = T6.a AND T6.c = 0
> LEFT OUTER JOIN t T7
>   ON T6.b = T7.a AND T7.c = 0
> LEFT OUTER JOIN t T8
>   ON T7.b = T8.a AND T8.c = 0
> LEFT OUTER JOIN t T9
>   ON T8.b = T9.a AND T9.c = 0
> LEFT OUTER JOIN t T10
>   ON T9.b = T10.a AND T10.c = 0
> WHERE T1.c = 0 AND T1.a = 5
> ;

That's pretty unfortunate. It only chooses this plan due to the lack of any useful stats on the table. The planner thinks that a seqscan on t6 with Filter (c = 0) will return 1 row, which is not correct.

In the good plan, t1 is the outer rel of the innermost loop. Here the filter is c = 0 AND a = 5, which *does* actually return only 1 row, which means that all of the other nested loops execute only once, as predicted.

This is all caused by get_variable_numdistinct() deciding that all values are distinct because ntuples < DEFAULT_NUM_DISTINCT. I see that if the example is increased to use 300 tuples instead of 32, that's enough for the planner to estimate 2 rows instead of clamping to 1, and the bad plan no longer looks so good, since the planner predicts that those nested loops need to be executed more than once.

I really think the planner is too inclined to take risks by nesting Nested Loops like this, but I'm not at all sure of the best way to fix it, and certainly not for beta1.
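[For reference, the misestimate described above can be reproduced directly with Teodor's setup; this is a minimal sketch, and the exact plan output will vary by installation:]

```sql
-- Recreate the quoted setup (32 rows, no stats gathered yet).
select i::int as a, i::int + 1 as b, 0 as c
into t
from generate_series(1, 32) as i;

create unique index i on t (c, a);

-- With no statistics, get_variable_numdistinct() assumes all values are
-- distinct (ntuples < DEFAULT_NUM_DISTINCT), so the c = 0 filter is
-- estimated to return a single row despite matching the whole table:
explain select * from t where c = 0;

-- After gathering real statistics, the planner knows c has only one
-- distinct value and the estimate rises accordingly:
analyze t;
explain select * from t where c = 0;
```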
So, I'm a bit unsure exactly how best to deal with this. It seems we'd better make some effort, as this could well be a case that occurs when temp tables are used and not ANALYZEd, but the only way I can think to deal with it is to stop favouring unique inner nested loops in the costing model. The unfortunate thing about that is that the planner will no longer swap the join order of a 2-way join to put the unique rel on the inner side. This is evident from the regression test failures caused by patching with the attached, which reverts the nested loop cost model to what it was before unique joins.

My other line of thought is just not to bother doing anything about this. There are plenty more queries you could handcraft to trick the planner into generating a plan that'll blow up like this. Is this one realistic enough to bother accounting for? Did it come from a real-world case? If not, how did you stumble upon it?

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services