Thread: Difference from average
Hi all, I'm developing a property rental database. One of the tables tracks the price per week for different properties: CREATE TABLE "public"."prices" ( "id" SERIAL, "property_id" INTEGER, "start_date" TIMESTAMP WITHOUT TIME ZONE, "end_date"TIMESTAMP WITHOUT TIME ZONE, "price" DOUBLE PRECISION NOT NULL ) WITH OIDS; CREATE INDEX "prices_idx" ON "public"."prices" USING btree ("property_id"); I'd like to display the prices per property in a table, with each row coloured different shades; darker shades representing the more expensive periods for that property. To do this, I propose to calculate the percentage difference of each rows price from the average for that property, so if for example I have two rows, one for price=200 and one for price=300, i'd like to retrieve both records along with the calculated field indicating that the rows are -20%, +20% from the average, respectively. I've started with the following query, but since I'm still learning how PostgreSQL works, I'm confused as to the efficiency of the following statement: SELECT *, (price - (SELECT avg(price) from prices)) as diff FROM prices; EXPLAIN reveals (albeit not a real test, as only the two rows above) Seq Scan on prices (cost=1.03..2.05 rows=2 width=32) InitPlan -> Aggregate (cost=1.03..1.03 rows=1 width=8) -> Seq Scan on prices (cost=0.00..1.02 rows=2 width=8) Does this mean that I'll be performing a nested table scan every time I run this query? Also, I haven't yet calculated the percentage difference for this, which in my eyes means another instance of "SELECT avg(price) from prices". Is this the best way of doing this? Can I optimize this away by re-writing this as a function and storing "SELECT avg(price) from prices)" in a variable? All opinions gratefully received. Kind Regards, Neil
Neil Saunders wrote: > Hi all, > > I'm developing a property rental database. One of the tables tracks > the price per week for different properties: > > CREATE TABLE "public"."prices" ( > "id" SERIAL, > "property_id" INTEGER, > "start_date" TIMESTAMP WITHOUT TIME ZONE, > "end_date" TIMESTAMP WITHOUT TIME ZONE, > "price" DOUBLE PRECISION NOT NULL > ) WITH OIDS; > > CREATE INDEX "prices_idx" ON "public"."prices" > USING btree ("property_id"); > > I'd like to display the prices per property in a table, with each row > coloured different shades; darker shades representing the more > expensive periods for that property. To do this, I propose to > calculate the percentage difference of each rows price from the > average for that property, so if for example I have two rows, one for > price=200 and one for price=300, i'd like to retrieve both records > along with the calculated field indicating that the rows are -20%, > +20% from the average, respectively. > > I've started with the following query, but since I'm still learning > how PostgreSQL works, I'm confused as to the efficiency of the > following statement: > > SELECT *, (price - (SELECT avg(price) from prices)) as diff FROM prices; I'd personally write it something like: SELECT prices.property_id, prices.price AS actual_price, averages.avg_price, (averages.avg_price - prices.price) AS price_diff ((averages.avg_price - prices.price)/averages.avg_price) AS pc_diff FROM prices, (SELECT property_id, avg(price) as avg_price FROM prices) AS averages WHERE prices.property_id = averages.property_id ; That's as much to do with how I think about the problem as to any testing though. -- Richard Huxton Archonet Ltd