Thread: wrong rows estimation by hash join
How does hash join estimation rows ? pg v14, it make wrong rows estimation then leave nest loop lef join that make poor sql plan. A
-> Nested Loop Left Join (cost=171112.69..475856.90 rows=1 width=521)
-> Nested Loop Left Join (cost=171111.31..474489.54 rows=1 width=423)
-> Hash Join (cost=171110.76..474488.93 rows=1 width=257) <<< here , actually the rows is 98000 ,but optimizer returns
Hash Cond: (((ccsm.xxx_id)::text = (cc.xxx_id)::text) AND ((ccsm.xxx_key)::text = (cc.account_key)::text)) <<< ccsm.xx_id and ccsm.xx_key are part of primary key.
-> Seq Scan on cs_xxxxx ccsm (cost=0.00..254328.08 rows=4905008 width=201)
-> Hash (cost=167540.92..167540.92 rows=237989 width=115)
-> Index Scan using cs_xxxx_test on cs_contract cc (cost=0.43..167540.92 rows=237989 width=115)
Index Cond: ((xx_to > CURRENT_DATE) AND ((status)::text = ANY ('{Active,Inactive,Pending}'::text[])))
-> Index Scan using cs_xxx_pk on cs_site cs (cost=0.56..0.61 rows=1 width=203)
Index Cond: ((xxx_key)::text = (ccsm.xxx_key)::text)
Thanks,
James
Hi, On 6/9/23 10:36, James Pang (chaolpan) wrote: > How does hash join estimation rows ? pg v14, it make wrong rows > estimation then leave nest loop lef join that make poor sql plan. A > I doubt this is specific to hashjoins, we estimate cardinality the same way for all joins (or more precisely, we estimate it before picking the particular join method). I'm just guessing, but I'd bet the join condition is correlated with the filter on cs_contract: > -> Index > Scan using cs_xxxx_test on cs_contract cc (cost=0.43..167540.92 > rows=237989 width=115) > Index Cond: ((xx_to > CURRENT_DATE) AND ((status)::text = ANY > ('{Active,Inactive,Pending}'::text[]))) > If you remove that condition, does the estimate improve? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company