Thread: Optimizer regression
Just upgraded to 8.4 (I know, I know…) and ran across this. Unfortunately I have no way to test this on 9.x, so I don't knowif it's been fixed or not. I'm hoping that someone *cough*Tom*cough* would quickly recognize whether this push into subqueryissue has been fixed or not, so I haven't included full details or a test case. I have a work-around so I don't careabout this in 8.4, but if this regression still exists it would be nice if it were fixed. CREATE VIEW loans.payday AS SELECT * FROM loans WHERE loan_type_cd IN ( 'payday', 'other' ); CREATE VIEW loans.payday_defaulted AS SELECT * FROM loans.payday p WHERE EXISTS( SELECT * FROM loan_statuses ls WHERE ls.loan_id= p.id AND ls.status_cd = 'in_default' ); This query is fast: SELECT defaulted_then_paid_loans , ( SELECT count(*) FROM loans.payday WHERE ROW( customer_id, status_cd )= ROW( d.customer_id, d.status_cd ) AND id > coalesce( max_defaulted_loan_id, 0 ) ) AS number_of_loans_since_last_default FROM ( SELECT p.customer_id, p.status_cd, count( d.id ) AS defaulted_then_paid_loans , max( d.id ) AS max_defaulted_loan_id FROM loans.payday p LEFT JOIN loans.payday_defaulted d USING( id ) WHERE d.customer_id= ? GROUP BY p.customer_id, p.status_cd ) d WHERE status_cd = 'paid_off'; This query is not (but was fine on 8.3): SELECT defaulted_then_paid_loans , ( SELECT count(*) FROM loans.payday WHERE ROW( customer_id, status_cd )= ROW( d.customer_id, d.status_cd ) AND id > coalesce( max_defaulted_loan_id, 0 ) ) AS number_of_loans_since_last_default FROM ( SELECT p.customer_id, p.status_cd, count( d.id ) AS defaulted_then_paid_loans , max( d.id ) AS max_defaulted_loan_id FROM loans.payday p LEFT JOIN loans.payday_defaulted d USING( id ) GROUP BYp.customer_id, p.status_cd ) d WHERE status_cd = 'paid_off' AND customer_id = ? ; Plan from the "bad" query on 8.3: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------Subquery Scand (cost=0.00..438.00 rows=2 width=162) (actual time=4883.286..4883.286 rows=1 loops=1) -> GroupAggregate (cost=0.00..421.91rows=2 width=17) (actual time=4883.181..4883.181 rows=1 loops=1) -> Nested Loop Left Join (cost=0.00..421.75rows=13 width=17) (actual time=314.426..4883.082 rows=31 loops=1) -> Index Scan using loans_m13on loans (cost=0.00..36.72 rows=13 width=17) (actual time=52.209..561.240 rows=31 loops=1) IndexCond: ((customer_id = 10287151) AND ((status_cd)::text = 'paid_off'::text)) Filter: ((loan_type_cd)::text= ANY ('{payday,cso}'::text[])) -> Index Scan using loans_pkey on loans (cost=0.00..29.61rows=1 width=4) (actual time=139.410..139.410 rows=0 loops=31) Index Cond: (cnu.loans.id= cnu.loans.id) Filter: (((cnu.loans.loan_type_cd)::text = ANY ('{payday,cso}'::text[]))AND (subplan)) SubPlan -> Index Scan using loan_status_u1on loan_statuses ls (cost=0.00..23.43 rows=1 width=88) (actual time=109.521..109.521 rows=0 loops=31) Index Cond: (loan_id = $3) Filter: ((status_cd)::text = 'in_default'::text) SubPlan -> Aggregate (cost=8.03..8.04 rows=1 width=0) (actual time=0.100..0.100 rows=1 loops=1) -> Index Scan using loans_m13 on loans (cost=0.00..8.02 rows=1 width=0) (actual time=0.041..0.084 rows=31loops=1) Index Cond: ((customer_id = $0) AND ((status_cd)::text = ($1)::text)) Filter:(((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) AND (id > COALESCE($2, 0)))Total runtime: 4883.439 ms (19 rows) And from 8.4… QUERY PLAN ----------------------------------------------------------------------------------------------------------------------Subquery Scand (cost=3003014.53..3027074.69 rows=2 width=162) -> GroupAggregate (cost=3003014.53..3027059.89 rows=2 width=17) -> Hash Left Join (cost=3003014.53..3027059.73 rows=13 width=17) Hash Cond: (cnu.loans.id= cnu.loans.id) -> Index Scan using loans_m13 on loans (cost=0.00..36.01 rows=13 width=17) Index Cond: ((customer_id = 10287151) AND ((status_cd)::text = 'paid_off'::text)) Filter:((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) -> Hash (cost=2902187.44..2902187.44 rows=6145607width=4) -> Hash Join (cost=2027941.10..2902187.44 rows=6145607 width=4) Hash Cond: (cnu.loans.id = ls.loan_id) -> Seq Scan on loans (cost=0.00..688340.03 rows=10783881width=4) Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) -> Hash (cost=2015760.83..2015760.83 rows=974422 width=4) -> HashAggregate (cost=2006016.61..2015760.83 rows=974422 width=4) -> Seq Scan on loan_statusesls (cost=0.00..1984621.11 rows=8558199 width=4) Filter: ((status_cd)::text= 'in_default'::text) SubPlan 1 -> Aggregate (cost=7.38..7.39 rows=1 width=0) -> IndexScan using loans_m13 on loans (cost=0.00..7.37 rows=1 width=0) Index Cond: ((customer_id = $0) AND ((status_cd)::text= ($1)::text)) Filter: (((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) AND (id >COALESCE($2, 0))) (21 rows) -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Jim Nasby <jim@nasby.net> writes: > Just upgraded to 8.4 (I know, I know�) and ran across this. Unfortunately I have no way to test this on 9.x, so I don'tknow if it's been fixed or not. I'm hoping that someone *cough*Tom*cough* would quickly recognize whether this pushinto subquery issue has been fixed or not, so I haven't included full details or a test case. I have a work-around soI don't care about this in 8.4, but if this regression still exists it would be nice if it were fixed. It's hard to be sure with such an incomplete example, but I think 8.4 is flattening the EXISTS to a semijoin and then getting trapped by join order constraints into doing something less than optimal for this particular use-case. It was this type of example that motivated the "parameterized path" stuff I've been working on for the past couple of years. In short, 9.2 should produce at least as good a plan as 8.3 for this example, but 8.4 through 9.1 might not. BTW, your workaround looks wrong --- you need to constrain the outside of the left join not the inside, no? regards, tom lane
On 10/13/12 2:45 PM, Tom Lane wrote: > BTW, your workaround looks wrong --- you need to constrain the outside > of the left join not the inside, no? Ugh, yes, you're correct. :( -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On 10/13/12 2:45 PM, Tom Lane wrote: > Jim Nasby <jim@nasby.net> writes: >> Just upgraded to 8.4 (I know, I know…) and ran across this. Unfortunately I have no way to test this on 9.x, so I don'tknow if it's been fixed or not. I'm hoping that someone *cough*Tom*cough* would quickly recognize whether this pushinto subquery issue has been fixed or not, so I haven't included full details or a test case. I have a work-around soI don't care about this in 8.4, but if this regression still exists it would be nice if it were fixed. > > It's hard to be sure with such an incomplete example, but I think 8.4 is > flattening the EXISTS to a semijoin and then getting trapped by join > order constraints into doing something less than optimal for this > particular use-case. It was this type of example that motivated the > "parameterized path" stuff I've been working on for the past couple > of years. > > In short, 9.2 should produce at least as good a plan as 8.3 for this > example, but 8.4 through 9.1 might not. FWIW, it's definitely an issue of not being able to push down past the GROUP BY: cnuapp_prod@postgres10.obr=# explain WITH default_stats AS (select customer_id, status_cd, count(*), max(id) from loans.payday_defaultedgroup by customer_id, status_cd) SELECT * FROM default_stats where customer_id=10287151; QUERY PLAN ------------------------------------------------------------------------------------------------------------ CTE Scan ondefault_stats (cost=2980046.56..3004313.73 rows=5393 width=162) Filter: (customer_id = 10287151) CTE default_stats -> HashAggregate (cost=2963868.44..2980046.56 rows=1078541 width=17) -> Hash Join (cost=2028045.22..2902409.22rows=6145922 width=17) Hash Cond: (loans.id = ls.loan_id) -> Seq Scan on loans (cost=0.00..688437.25 rows=10785404 width=17) Filter: ((loan_type_cd)::text =ANY ('{payday,cso}'::text[])) -> Hash (cost=2015864.33..2015864.33 rows=974471 width=4) -> HashAggregate (cost=2006119.62..2015864.33 rows=974471 width=4) -> Seq Scan onloan_statuses ls (cost=0.00..1984723.02 rows=8558638 width=4) Filter: ((status_cd)::text= 'in_default'::text) (12 rows) cnuapp_prod@postgres10.obr=# explain analyze select customer_id, status_cd, count(*), max(id) from loans.payday_defaultedwhere customer_id=10287151 group by customer_id, status_cd; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=202.16..202.19 rows=2 width=17) (actual time=0.422..0.422 rows=0 loops=1) -> Nested Loop Semi Join (cost=0.00..202.07 rows=9 width=17) (actual time=0.422..0.422 rows=0 loops=1) -> Index Scan using loans_m12on loans (cost=0.00..41.48 rows=16 width=17) (actual time=0.028..0.121 rows=31 loops=1) Index Cond:(customer_id = 10287151) Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) -> IndexScan using loan_statuses__loan_id__status on loan_statuses ls (cost=0.00..10.17 rows=1 width=4) (actual time=0.009..0.009rows=0 loops=31) Index Cond: ((ls.loan_id = loans.id) AND ((ls.status_cd)::text = 'in_default'::text))Total runtime: 0.510 ms (8 rows) cnuapp_prod@postgres10.obr=# I hope that we'll have 9.2 stood up before the year is out, so we'll check this then and see if it's fixed. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Jim Nasby <jim@nasby.net> writes: > FWIW, it's definitely an issue of not being able to push down past the GROUP BY: I think it's not that so much as the EXISTS inside a LEFT JOIN. regards, tom lane
On 10/13/12 3:15 PM, Jim Nasby wrote: > FWIW, it's definitely an issue of not being able to push down past the GROUP BY: I take that back... GROUP BY doesn't matter. It's an issue of having the EXISTS in the inner query. I realize the exampleshave gotten a bit silly, but this seems to break it down to the simplest case of what's happening. FAST: explain analyze SELECT p.customer_id, p.status_cd, EXISTS( SELECT * FROM loan_statusesls WHERE ls.loan_id = p.id AND ls.status_cd = 'in_default' ) FROM loans p LEFT JOIN ( SELECT * FROM loans pWHERE loan_type_cd IN ( 'payday', 'cso' ) ) d USING( id ) WHERE p.customer_id= 10287151 AND p.status_cd = 'paid_off' AND p.loan_type_cd IN ( 'payday', 'cso' ) ; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------- NestedLoop Left Join (cost=0.00..234.87 rows=13 width=17) (actual time=0.085..0.861 rows=31 loops=1) -> Index Scan usingloans_m13 on loans p (cost=0.00..36.01 rows=13 width=17) (actual time=0.045..0.137 rows=31 loops=1) Index Cond:((customer_id = 10287151) AND ((status_cd)::text = 'paid_off'::text)) Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) -> Index Scan using loans_pkey on loans p (cost=0.00..5.12 rows=1 width=4) (actual time=0.011..0.011rows=1 loops=31) Index Cond: (p.id = p.id) Filter: ((p.loan_type_cd)::text = ANY ('{payday,cso}'::text[])) SubPlan 1 -> Index Scan using loan_statuses__loan_id__status on loan_statuses ls (cost=0.00..10.17rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=31) Index Cond: ((loan_id = $0) AND ((status_cd)::text= 'in_default'::text)) Total runtime: 0.950 ms (11 rows) SLOW: cnuapp_prod@postgres10.obr=# explain SELECT p.customer_id, p.status_cd FROM loans p LEFT JOIN ( SELECT * FROM loans p WHERE loan_type_cd IN ( 'payday','cso' ) AND EXISTS( SELECT * FROM loan_statuses ls WHERE ls.loan_id = p.id AND ls.status_cd = 'in_default' ) ) d USING( id ) WHERE p.customer_id = 10287151 AND p.status_cd = 'paid_of f' AND p.loan_type_cd IN ( 'payday', 'cso' ) ; QUERY PLAN ---------------------------------------------------------------------------------------------------------- Hash Left Join (cost=3003251.16..3027297.36 rows=13 width=13) Hash Cond: (p.id = p.id) -> Index Scan using loans_m13 on loansp (cost=0.00..36.01 rows=13 width=17) Index Cond: ((customer_id = 10287151) AND ((status_cd)::text = 'paid_off'::text)) Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) -> Hash (cost=2902419.07..2902419.07rows=6145927 width=4) -> Hash Join (cost=2028047.07..2902419.07 rows=6145927 width=4) Hash Cond: (p.id = ls.loan_id) -> Seq Scan on loans p (cost=0.00..688444.00 rows=10785509width=4) Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) -> Hash (cost=2015866.17..2015866.17 rows=974472 width=4) -> HashAggregate (cost=2006121.45..2015866.17rows=974472 width=4) -> Seq Scan on loan_statuses ls (cost=0.00..1984724.84rows=8558646 width=4) Filter: ((status_cd)::text = 'in_default'::text) (14 rows) cnuapp_prod@postgres10.obr=# -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net