When to store data that could be derived - Mailing list pgsql-general
From | Frank |
---|---|
Subject | When to store data that could be derived |
Date | |
Msg-id | a2b19f30-9e1d-aea5-7b33-427047522461@chagford.com Whole thread Raw |
Responses |
Re: When to store data that could be derived
Re: When to store data that could be derived |
List | pgsql-general |
Hi all As I understand it, a general rule of thumb is that you should never create a physical column if the data could be derived from existing columns. A possible reason for breaking this rule is for performance reasons. I have a situation where I am considering breaking the rule, but I am not experienced enough in SQL to know if my reason is valid. I would appreciate it if someone could glance at my 'before' and 'after' scenarios and see if, from a 'gut-feel' point of view, I should proceed. I have a VIEW constructed as follows - CREATE VIEW view_name AS [select statement 1] UNION ALL [select statement 2] etc. This is one of the select statements. I will give the 'after' scenario first - SELECT 'arec' AS tran_type, a.row_id AS tran_row_id, a.tran_number AS tran_number, a.cust_row_id AS cust_row_id, CASE WHEN a.tran_type = 'ar_rec' THEN y.tran_date WHEN a.tran_type = 'cb_rec' THEN w.tran_date END AS tran_date, CASE WHEN a.tran_type = 'ar_rec' THEN y.text WHEN a.tran_type = 'cb_rec' THEN w.text END AS text, 0 - a.arec_cust AS amount_cust, 0 - a.arec_local AS amount_local FROM ar_rec_subtran a LEFT JOIN ar_tran_rec_det z ON z.row_id = a.tran_det_row_id LEFT JOIN ar_tran_rec y ON y.row_id = z.tran_row_id LEFT JOIN cb_tran_rec_det x ON x.row_id = a.tran_det_row_id LEFT JOIN cb_tran_rec w ON w.row_id = x.tran_row_id WHERE CASE WHEN a.tran_type = 'ar_rec' THEN y.posted WHEN a.tran_type = 'cb_rec' THEN w.posted END = '1' The two columns a.arec_cust and a.arec_local *can* be derived from other columns, and in fact that is how it is working at the moment, so here is the 'before' scenario - SELECT 'arec' AS tran_type, a.row_id AS tran_row_id, a.tran_number AS tran_number, a.cust_row_id AS cust_row_id, CASE WHEN a.tran_type = 'ar_rec' THEN y.tran_date WHEN a.tran_type = 'cb_rec' THEN w.tran_date END AS tran_date, CASE WHEN a.tran_type = 'ar_rec' THEN y.text WHEN a.tran_type = 'cb_rec' THEN w.text END AS text, ROUND(0 - (ROUND(a.arec_amount / CASE WHEN a.tran_type = 'ar_rec' THEN y.tran_exch_rate WHEN a.tran_type = 'cb_rec' THEN w.tran_exch_rate END * a.cust_exch_rate, u.scale)), 2) AS amount_cust, ROUND(0 - (ROUND(a.arec_amount / CASE WHEN a.tran_type = 'ar_rec' THEN y.tran_exch_rate WHEN a.tran_type = 'cb_rec' THEN w.tran_exch_rate END, s.scale)), 2) AS amount_local FROM ar_rec_subtran a LEFT JOIN ar_tran_rec_det z ON z.row_id = a.tran_det_row_id LEFT JOIN ar_tran_rec y ON y.row_id = z.tran_row_id LEFT JOIN cb_tran_rec_det x ON x.row_id = a.tran_det_row_id LEFT JOIN cb_tran_rec w ON w.row_id = x.tran_row_id LEFT JOIN ar_customers v ON v.row_id = a.cust_row_id LEFT JOIN adm_currencies u ON u.row_id = v.currency_id LEFT JOIN adm_params t ON t.row_id = 1 LEFT JOIN adm_currencies s ON s.row_id = t.local_curr_id WHERE CASE WHEN a.tran_type = 'ar_rec' THEN y.posted WHEN a.tran_type = 'cb_rec' THEN w.posted END = '1' As you can see, complexity has increased and there are four additional JOINs. I am expecting the VIEW to be used extensively for query purposes, and my gut-feel says that the second one is likely to lead to performance problems in a system with a lot of data and a lot of users. I am not looking for an answer - I know that I should create dummy data and run some timing tests. I was just wondering if someone more experienced would wince when they look at the second SELECT, or if they would shrug and think that it looks fine. Any input will be appreciated. Frank Millman
pgsql-general by date: