Re: Another Plpgsql trigger example - summary table - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: Another Plpgsql trigger example - summary table |
Date | |
Msg-id | 200501040408.j0448O200354@candle.pha.pa.us Whole thread Raw |
In response to | Another Plpgsql trigger example - summary table (Mark Kirkwood <markir@coretech.co.nz>) |
Responses |
Re: Another Plpgsql trigger example - summary table
|
List | pgsql-patches |
Wow, yea, that is long. Not sure where that should go. --------------------------------------------------------------------------- Mark Kirkwood wrote: > I have always thought that an example of how to maintain a summary table > via triggers would be nice... but until the other day, had not attempted > to do one, so contributing a simplified version seemed like a good thing > to do. > > I have made the example pretty much self contained, which unfortunately > means it is longish. This made me wonder about its placement (i.e in > plpgsql examples). It could go in a 'Data warehousing' chapter - if we > had one.... > > Any suggestions welcome. > > best wishes > > Mark > > P.s : use is made of a schema from Ralph Kimball's "The Data Warehouse > Toolkit" - I mailed him to check it was ok (and it was, in fact he was > quite pleased) > --- plpgsql.sgml.orig 2004-12-29 15:48:53.089973005 +1300 > +++ plpgsql.sgml 2004-12-29 12:43:50.000000000 +1300 > @@ -2632,6 +2632,306 @@ > ; > </programlisting> > </example> > + > + <para> > + An area where triggers can be useful is maintaining a summary table > + of another table. The resulting summary can be used in place of the > + original table for certain queries - with often vastly reduced run > + times. > + </para> > + > + <para> > + This technique is commonly used in Data Warehousing, where the tables > + of measured or observed data (called fact tables) can be extremely large. > + </para> > + > + <example id="plpgsql-trigger-summary-example"> > + <title>A <application>PL/pgSQL</application> Trigger Procedure For Maintaining A Summary Table</title> > + > + <para> > + <xref linkend="plpgsql-trigger-summary-example"> shows an example of a > + trigger procedure in <application>PL/pgSQL</application> that maintains > + a summary table for a fact table in a data warehouse. > + </para> > + > + <para> > + The schema detailed here is loosely based on the <emphasis>Grocery Store > + </emphasis> example from <emphasis>The Data Warehouse Toolkit</emphasis> > + by Ralph Kimball. > + </para> > + > +<programlisting> > +-- > +-- Three dimension tables. > +-- > +CREATE TABLE time_dimension ( > + time_key integer NOT NULL, > + day_of_week integer NOT NULL, > + day_of_month integer NOT NULL, > + month integer NOT NULL, > + quarter integer NOT NULL, > + year integer NOT NULL > +); > + > +CREATE TABLE product_dimension ( > + product_key integer NOT NULL, > + description varchar(100) NOT NULL, > + brand varchar(50) NOT NULL, > + catageory varchar(20) NOT NULL > +); > + > +CREATE TABLE store_dimension ( > + store_key integer NOT NULL, > + store_name varchar(100) NOT NULL, > + address varchar(100) NOT NULL > +); > + > + > +-- > +-- Sales fact. > +-- > +CREATE TABLE sales_fact ( > + time_key integer NOT NULL, > + product_key integer NOT NULL, > + store_key integer NOT NULL, > + amount_sold numeric(12,2) NOT NULL, > + units_sold integer NOT NULL, > + amount_cost numeric(12,2) NOT NULL > +); > + > + > +-- > +-- Sales summary. > +-- > +CREATE TABLE sales_summary_bytime ( > + time_key integer NOT NULL, > + amount_sold numeric(15,2) NOT NULL, > + units_sold numeric(12) NOT NULL, > + amount_cost numeric(15,2) NOT NULL > +); > + > +-- > +-- COPY in data. > +-- > +COPY time_dimension FROM '/var/dump/time.dat' DELIMITERS ','; > +COPY product_dimension FROM '/var/dump/prod.dat' DELIMITERS ','; > +COPY store_dimension FROM '/var/dump/store.dat' DELIMITERS ','; > +COPY sales_fact FROM '/var/dump/sales.dat' DELIMITERS ','; > + > + > +-- > +-- Create indexes on the dimensions, facts and summary. > +-- > +CREATE UNIQUE INDEX time_dimension_key ON time_dimension(time_key); > +CREATE INDEX time_dimension_year ON time_dimension(year); > +CREATE UNIQUE INDEX product_dimension_key ON product_dimension(product_key); > +CREATE UNIQUE INDEX store_dimension_key ON store_dimension(store_key); > + > +CREATE INDEX sales_fact_time ON sales_fact(time_key); > +CREATE INDEX sales_fact_product ON sales_fact(product_key); > +CREATE INDEX sales_fact_store ON sales_fact(store_key); > + > +CREATE UNIQUE INDEX sales_summary_bytime_key ON sales_summary_bytime(time_key); > + > + > +-- > +-- Collect statistics for the optimizer. > +-- > +ANALYZE VERBOSE time_dimension; > +ANALYZE VERBOSE product_dimension; > +ANALYZE VERBOSE store_dimension; > +ANALYZE VERBOSE sales_fact; > + > + > +-- > +-- Pre populate (and collect statistics for) the summary table. > +-- > +INSERT INTO sales_summary_bytime ( > + time_key, > + amount_sold, > + units_sold, > + amount_cost) > + SELECT f.time_key, > + sum(f.amount_sold), > + sum(f.units_sold), > + sum(f.amount_cost) > + FROM sales_fact f > + GROUP BY f.time_key; > + > +ANALYZE VERBOSE sales_summary_bytime; > + > + > +-- > +-- Function to amend summarized column(s) on UPDATE, INSERT, DELETE. > +-- > +CREATE OR REPLACE FUNCTION maint_sales_summary_bytime() RETURNS TRIGGER AS $maint_sales_summary_bytime$ > + DECLARE > + delta_time_key integer; > + delta_amount_sold numeric(15,2); > + delta_units_sold numeric(12); > + delta_amount_cost numeric(15,2); > + BEGIN > + > + -- Work out the increment/decrement amount(s). > + IF (TG_OP = 'DELETE') THEN > + > + delta_time_key = OLD.time_key; > + delta_amount_sold = -1 * OLD.amount_sold; > + delta_units_sold = -1 * OLD.units_sold; > + delta_amount_cost = -1 * OLD.amount_cost; > + > + ELSIF (TG_OP = 'UPDATE') THEN > + > + -- forbid updates that change the time_key - > + -- (probably not too onerous, as DELETE + INSERT is how most > + -- changes will be made). > + IF ( OLD.time_key != NEW.time_key) THEN > + RAISE EXCEPTION 'Update of time_key : % -> % not allowed', OLD.time_key, NEW.time_key; > + END IF; > + > + delta_time_key = OLD.time_key; > + delta_amount_sold = NEW.amount_sold - OLD.amount_sold; > + delta_units_sold = NEW.units_sold - OLD.units_sold; > + delta_amount_cost = NEW.amount_cost - OLD.amount_cost; > + > + ELSIF (TG_OP = 'INSERT') THEN > + > + delta_time_key = NEW.time_key; > + delta_amount_sold = NEW.amount_sold; > + delta_units_sold = NEW.units_sold; > + delta_amount_cost = NEW.amount_cost; > + > + END IF; > + > + > + -- Update the summary row with the new values. > + UPDATE sales_summary_bytime > + SET amount_sold = amount_sold + delta_amount_sold, > + units_sold = units_sold + delta_units_sold, > + amount_cost = amount_cost + delta_amount_cost > + WHERE time_key = delta_time_key; > + > + > + -- There might have been no row with this time_key (e.g new data!). > + IF (NOT FOUND) THEN > + BEGIN > + INSERT INTO sales_summary_bytime ( > + time_key, > + amount_sold, > + units_sold, > + amount_cost) > + SELECT f.time_key, > + sum(f.amount_sold), > + sum(f.units_sold), > + sum(f.amount_cost) > + FROM sales_fact f > + WHERE f.time_key = delta_time_key > + GROUP BY f.time_key; > + -- This query can potentially be very expensive if the trigger > + -- is created on sales_fact without the time_key indexes. > + -- Some care is needed to ensure that this situation does > + -- *not* occur. > + EXCEPTION > + -- > + -- Catch race condition when two transactions are adding data > + -- for a new time_key. > + -- > + WHEN UNIQUE_VIOLATION THEN > + UPDATE sales_summary_bytime > + SET amount_sold = amount_sold + delta_amount_sold, > + units_sold = units_sold + delta_units_sold, > + amount_cost = amount_cost + delta_amount_cost > + WHERE time_key = delta_time_key; > + > + END; > + END IF; > + RETURN NULL; > + > + END; > +$maint_sales_summary_bytime$ LANGUAGE plpgsql; > + > + > +-- > +-- The trigger. > +-- > +CREATE TRIGGER maint_sales_summary_bytime > +AFTER INSERT OR UPDATE OR DELETE ON sales_fact > + FOR EACH ROW EXECUTE PROCEDURE maint_sales_summary_bytime() > +; > +</programlisting> > + <para> > + The effectiveness of the summary technique can be easily demonstrated. In > + the first case shown below, the base fact table is used. In the second the > + summary table is substituted. The run times and plans shown are real. > + </para> > +<programlisting> > +dwexample=# > +SELECT > + d0.quarter, > + sum(f.amount_sold) > +FROM > + time_dimension d0, > + sales_fact f > +WHERE d0.time_key = f.time_key > +AND d0.year = 2004 > +GROUP BY > + d0.quarter > +; > + quarter | sum > +---------+------------- > + 2 | 90000000.00 > + 1 | 90000000.00 > + 4 | 90000000.00 > + 3 | 90000000.00 > +(4 rows) > + > +Time: 2898.236 ms > + QUERY PLAN > +----------------------------------------------------------------------------------------------------- > + HashAggregate (cost=53237.46..53237.46 rows=1 width=14) > + -> Nested Loop (cost=0.00..51447.46 rows=358001 width=14) > + -> Index Scan using time_year on time_dimension d0 (cost=0.00..9.83 rows=358 width=8) > + Index Cond: ("year" = 2004) > + -> Index Scan using sales_fact_time on sales_fact f (cost=0.00..96.72 rows=3757 width=14) > + Index Cond: ("outer".time_key = f.time_key) > +(6 rows) > + > + > +dwexample=# > +SELECT > + d0.quarter, > + sum(f.amount_sold) > +FROM > + time_dimension d0, > + sales_summary_bytime f > +WHERE d0.time_key = f.time_key > +AND d0.year = 2004 > +GROUP BY > + d0.quarter > +; > + quarter | sum > +---------+------------- > + 2 | 90000000.00 > + 1 | 90000000.00 > + 4 | 90000000.00 > + 3 | 90000000.00 > +(4 rows) > + > +Time: 28.459 ms > + QUERY PLAN > +------------------------------------------------------------------------------------------------------- > + HashAggregate (cost=260.10..260.10 rows=1 width=14) > + -> Hash Join (cost=10.72..258.31 rows=358 width=14) > + Hash Cond: ("outer".time_key = "inner".time_key) > + -> Seq Scan on sales_summary_bytime f (cost=0.00..194.00 rows=10000 width=14) > + -> Hash (cost=9.83..9.83 rows=358 width=8) > + -> Index Scan using time_year on time_dimension d0 (cost=0.00..9.83 rows=358 width=8) > + Index Cond: ("year" = 2004) > +(7 rows) > + > + > +</programlisting> > + </example> > </sect1> > > <!-- **** Porting from Oracle PL/SQL **** --> > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
pgsql-patches by date: