Thread: Difference from average

Difference from average

From

Neil Saunders

Date:

11 October 2005, 10:20:17

Hi all,

I'm developing a property rental database. One of the tables tracks
the price per week for different properties:

CREATE TABLE "public"."prices" ( "id" SERIAL, "property_id" INTEGER, "start_date" TIMESTAMP WITHOUT TIME ZONE,
"end_date"TIMESTAMP WITHOUT TIME ZONE, "price" DOUBLE PRECISION NOT NULL 
) WITH OIDS;

CREATE INDEX "prices_idx" ON "public"."prices" USING btree ("property_id");

I'd like to display the prices per property in a table, with each row
coloured different shades; darker shades representing the more
expensive periods for that property. To do this, I propose to
calculate the percentage difference of each rows price from the
average for that property, so if for example I have two rows, one for
price=200 and one for price=300, i'd like to retrieve both records
along with the calculated field indicating that the rows are -20%,
+20% from the average, respectively.

I've started with the following query, but since I'm still learning
how PostgreSQL works, I'm confused as to the efficiency of the
following statement:

SELECT *, (price - (SELECT avg(price) from prices)) as diff FROM prices;

EXPLAIN reveals (albeit not a real test, as only the two rows above)

Seq Scan on prices  (cost=1.03..2.05 rows=2 width=32) InitPlan   ->  Aggregate  (cost=1.03..1.03 rows=1 width=8)
->  Seq Scan on prices  (cost=0.00..1.02 rows=2 width=8) 

Does this mean that I'll be performing a nested table scan every time
I run this query? Also, I haven't yet calculated the percentage
difference for this, which in my eyes means another instance of
"SELECT avg(price) from prices". Is this the best way of doing this?
Can I optimize this away by re-writing this as a function and storing
"SELECT avg(price) from prices)" in a variable?

All opinions gratefully received.

Kind Regards,

Neil

Re: Difference from average

From

Richard Huxton

Date:

11 October 2005, 11:25:39

Neil Saunders wrote:
> Hi all,
> 
> I'm developing a property rental database. One of the tables tracks
> the price per week for different properties:
> 
> CREATE TABLE "public"."prices" (
>   "id" SERIAL,
>   "property_id" INTEGER,
>   "start_date" TIMESTAMP WITHOUT TIME ZONE,
>   "end_date" TIMESTAMP WITHOUT TIME ZONE,
>   "price" DOUBLE PRECISION NOT NULL
> ) WITH OIDS;
> 
> CREATE INDEX "prices_idx" ON "public"."prices"
>   USING btree ("property_id");
> 
> I'd like to display the prices per property in a table, with each row
> coloured different shades; darker shades representing the more
> expensive periods for that property. To do this, I propose to
> calculate the percentage difference of each rows price from the
> average for that property, so if for example I have two rows, one for
> price=200 and one for price=300, i'd like to retrieve both records
> along with the calculated field indicating that the rows are -20%,
> +20% from the average, respectively.
> 
> I've started with the following query, but since I'm still learning
> how PostgreSQL works, I'm confused as to the efficiency of the
> following statement:
> 
> SELECT *, (price - (SELECT avg(price) from prices)) as diff FROM prices;

I'd personally write it something like:

SELECT  prices.property_id,  prices.price AS actual_price,  averages.avg_price,  (averages.avg_price - prices.price) AS
price_diff ((averages.avg_price - prices.price)/averages.avg_price) AS pc_diff
 
FROM  prices,  (SELECT property_id, avg(price) as avg_price FROM prices) AS averages
WHERE  prices.property_id = averages.property_id
;

That's as much to do with how I think about the problem as to any 
testing though.

--   Richard Huxton  Archonet Ltd