Re: [HACKERS] please help on query - Mailing list pgsql-sql
From | Luis Alberto Amigo Navarro |
---|---|
Subject | Re: [HACKERS] please help on query |
Date | |
Msg-id | 012101c22998$8bbe5670$cab990c1@atc.unican.es Whole thread Raw |
In response to | Re: [HACKERS] please help on query (Manfred Koizar <mkoi-pg@aon.at>) |
Responses |
Re: [HACKERS] please help on query
|
List | pgsql-sql |
> The cost is now only 1141741215.35 compared to 2777810917708.17 > before; this is an improvement factor of more than 2000. So what's > your problem? ;-) > > Servus > Manfred > In fact planner is estimating incredibly badly, it took only 833msecs now runs perfectly I'm going to keep on asking about another query: SELECTcustomer.name,customer.custkey,orders.orderkey,orders.orderdate,orders.totalprice,sum(lineitem.quantity) FROMcustomer,orders,lineitem WHEREexists( SELECT lineitem.orderkey FROM lineitem WHERE lineitem.orderkey=orders.orderkey GROUP BY lineitem.orderkeyHAVING sum(lineitem.quantity)>300 )AND customer.custkey=orders.custkeyAND orders.orderkey=lineitem.orderkey GROUP BYcustomer.name,customer.custkey,orders.orderkey,orders.orderdate,orders.totalprice ORDER BYorders.totalprice DESC,orders.orderdate; NOTICE: QUERY PLAN: Sort (cost=26923941.97..26923941.97 rows=300061 width=66) -> Aggregate (cost=26851634.86..26896644.05 rows=300061 width=66) -> Group (cost=26851634.86..26889142.52 rows=3000612 width=66) -> Sort (cost=26851634.86..26851634.86rows=3000612 width=66) -> Hash Join (cost=26107574.81..26457309.10 rows=3000612 width=66) -> Seq Scan on lineitem (cost=0.00..222208.25 rows=6001225 width=8) -> Hash (cost=26105699.81..26105699.81 rows=750000 width=58) -> Hash Join (cost=7431.00..26105699.81 rows=750000 width=58) -> Seq Scan on orders (cost=0.00..26083268.81 rows=750000 width=25) SubPlan -> Aggregate (cost=0.00..17.35 rows=1 width=8) -> Group (cost=0.00..17.34 rows=5 width=8) -> Index Scan using lineitem_pkey on lineitem (cost=0.00..17.33 rows=5 width=8) -> Hash (cost=7056.00..7056.00 rows=150000 width=33) -> Seq Scan on customer (cost=0.00..7056.00 rows=150000 width=33) again: orders 1500000 tuples lineitem 6000000 tuples there are 1 to 7 lineitems per orderkey Customer 150000 tuples select attname,n_distinct,correlation from pg_stats where tablename='lineitem'; attname | n_distinct | correlation ---------------+------------+-------------orderkey | -0.199847 | 1partkey | 196448 | 0.0223377suppkey | 9658 | -0.00822751linenumber | 7 | 0.17274quantity | 50 | 0.0150153extendedprice| 25651 | -0.00790245discount | 11 | 0.103761tax | 9 | 0.0993771returnflag | 3 | 0.391434linestatus | 2 | 0.509791shipdate | 2440 | 0.0072777commitdate | 2497 | 0.00698162receiptdate | 2416 | 0.00726686shipinstruct | 4 | 0.241511shipmode | 7 | 0.138432comment | 275488 | 0.0188006 (16 rows) select attname,n_distinct,correlation from pg_stats where tablename='orders'; attname | n_distinct | correlation ---------------+------------+-------------orderkey | -1 | -0.999925custkey | 76309 | 0.00590596orderstatus | 3 | 0.451991totalprice | -1 | -0.00768806orderdate | 2431 | -0.0211354orderpriority| 5 | 0.182489clerk | 1009 | 0.00546939shippriority | 1 | 1comment | -0.750125 | -0.0123887 Customer attname | n_distinct | correlation ------------+------------+-------------custkey | -1 | 1name | -1 | 1address | -1 | -0.00510274nationkey | 25 | 0.0170533phone | -1 | -0.0227816acctbal | -0.83444 | -0.00220958mktsegment | 5 | 0.205013comment | -1 | 0.0327827 This query takes 12 minutes to run and returns about 50 customers. lineitem.quantity takes values from 1 to 50, so 300 per orderkey is very restrictive May someone help on improving performance? Again thanks in advance Regards