Thread: BUG #3958: Self-Join Group-By Clause Produces Incorrect Results
The following bug has been logged online: Bug reference: 3958 Logged by: Marcus Torres Email address: marcsf23@yahoo.com PostgreSQL version: 8.19 Operating system: Linux - Ubuntu Description: Self-Join Group-By Clause Produces Incorrect Results Details: This is a repost of my original bug with self-contained sql to reproduce the problem: I wrote a simple self-join query to sum the transaction count of different types of records in a audit table and the result set for the different sum totals was the same which is incorrect. SQL: ------ DROP TABLE T_AUDIT; DROP TABLE T_POLICY; CREATE TABLE T_AUDIT ( ID integer NOT NULL, POLICY_ID integer NOT NULL, AUDIT_DATE date NOT NULL, AUDIT_TYPE_CODE character varying(50) NOT NULL, TXN_COUNT integer NOT NULL DEFAULT 1 ) WITHOUT OIDS TABLESPACE pg_default; ALTER TABLE t_audit OWNER TO postgres; GRANT ALL ON TABLE t_audit TO public; CREATE TABLE T_POLICY ( ID integer NOT NULL, CONTENT_POLICY_NAME character varying(50) NOT NULL ) WITHOUT OIDS TABLESPACE pg_default; ALTER TABLE t_audit OWNER TO postgres; GRANT ALL ON TABLE t_policy TO public; INSERT INTO T_POLICY VALUES (100, 'TEST POLICY'); INSERT INTO T_AUDIT VALUES (1000, 100, '2008-01-01','CONTENT_1',1); INSERT INTO T_AUDIT VALUES (1001, 100, '2008-01-01','CONTENT_1',1); INSERT INTO T_AUDIT VALUES (1002, 100, '2008-01-01','CONTENT_1',1); INSERT INTO T_AUDIT VALUES (1003, 100, '2008-01-01','CONTENT_1',1); INSERT INTO T_AUDIT VALUES (1004, 100, '2008-01-01','CONTENT_1',1); INSERT INTO T_AUDIT VALUES (1005, 100, '2008-01-01','CONTENT_2',1); INSERT INTO T_AUDIT VALUES (1006, 100, '2008-01-01','CONTENT_2',1); SELECT A1.AUDIT_DATE, P.CONTENT_POLICY_NAME, SUM(A1.TXN_COUNT) AS SUM_1, SUM(A2.TXN_COUNT) AS SUM_2 FROM T_AUDIT A1, T_AUDIT A2, T_POLICY P WHERE P.ID = A1.POLICY_ID AND P.ID = A2.POLICY_ID AND A1.POLICY_ID = A2.POLICY_ID AND A1.AUDIT_DATE = A2.AUDIT_DATE AND A1.AUDIT_TYPE_CODE = 'CONTENT_1' AND A2.AUDIT_TYPE_CODE = 'CONTENT_2' GROUP BY A1.AUDIT_DATE, P.CONTENT_POLICY_NAME;
Marcus Torres wrote: > The following bug has been logged online: > > Bug reference: 3958 > Logged by: Marcus Torres > Email address: marcsf23@yahoo.com > PostgreSQL version: 8.19 > Operating system: Linux - Ubuntu > Description: Self-Join Group-By Clause Produces Incorrect Results > Details: > > This is a repost of my original bug with self-contained sql to reproduce the > problem: > > I wrote a simple self-join query to sum the transaction count of different > types of records in a audit table and the result set for the different sum > totals was the same which is incorrect. Looks perfectly correct to me. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > Marcus Torres wrote: >> I wrote a simple self-join query to sum the transaction count of different >> types of records in a audit table and the result set for the different sum >> totals was the same which is incorrect. > Looks perfectly correct to me. Me too. The underlying data before grouping/aggregation is regression=# select A1.AUDIT_DATE, P.CONTENT_POLICY_NAME, A1.TXN_COUNT, A2.TXN_COUNT FROM T_AUDIT A1, T_AUDIT A2, T_POLICY P WHERE P.ID = A1.POLICY_ID AND P.ID = A2.POLICY_ID AND A1.POLICY_ID = A2.POLICY_ID AND A1.AUDIT_DATE = A2.AUDIT_DATE AND A1.AUDIT_TYPE_CODE = 'CONTENT_1' AND A2.AUDIT_TYPE_CODE = 'CONTENT_2'; audit_date | content_policy_name | txn_count | txn_count ------------+---------------------+-----------+----------- 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 (10 rows) from which it's clear that given all ones in txn_count, the sums *must* be the same because they're taken over the same number of rows. I suspect what the OP needs is two separate queries (perhaps union'ed together) not a self-join. regards, tom lane
Please disregard the previous email. After rereading what you sent, I realized that I need an outer join to A2 and not simplya self join...thanks and my apologies! ----- Original Message ---- From: Tom Lane <tgl@sss.pgh.pa.us> To: Heikki Linnakangas <heikki@enterprisedb.com> Cc: Marcus Torres <marcsf23@yahoo.com>; pgsql-bugs@postgresql.org Sent: Wednesday, February 13, 2008 7:40:32 AM Subject: Re: [BUGS] BUG #3958: Self-Join Group-By Clause Produces Incorrect Results "Heikki Linnakangas" <heikki@enterprisedb.com> writes: > Marcus Torres wrote: >> I wrote a simple self-join query to sum the transaction count of different >> types of records in a audit table and the result set for the different sum >> totals was the same which is incorrect. > Looks perfectly correct to me. Me too. The underlying data before grouping/aggregation is regression=# select A1.AUDIT_DATE, P.CONTENT_POLICY_NAME, A1.TXN_COUNT, A2.TXN_COUNT FROM T_AUDIT A1, T_AUDIT A2, T_POLICY P WHERE P.ID = A1.POLICY_ID AND P.ID = A2.POLICY_ID AND A1.POLICY_ID = A2.POLICY_ID AND A1.AUDIT_DATE = A2.AUDIT_DATE AND A1.AUDIT_TYPE_CODE = 'CONTENT_1' AND A2.AUDIT_TYPE_CODE = 'CONTENT_2'; audit_date | content_policy_name | txn_count | txn_count ------------+---------------------+-----------+----------- 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 (10 rows) from which it's clear that given all ones in txn_count, the sums *must* be the same because they're taken over the same number of rows. I suspect what the OP needs is two separate queries (perhaps union'ed together) not a self-join. regards, tom lane ____________________________________________________________________________________ Never miss a thing. Make Yahoo your home page. http://www.yahoo.com/r/hs
I respectfully challenge that the aggregation is correct. In the where clause, I specify A2.AUDIT_TYPE_CODE = CONTENT_2, thus returning only 2 rows for A2 and not all of the rowsin A2 which happen to have a TXN_COUNT of 1 / row but could in fact be any positive number. I used 1 for simplicity. Similarly, if you take out A1 from the query you receive the following result: audit_date | content_policy_name | sum_2 2008-01-01 | TEST POLICY | 2 I do not see how/why a self-join changes the condition specified in the where clause and thus returns a sum of 8 rows thatdo not meet the specified condition in the query? Thanks in advance, Marcus Torres ----- Original Message ---- From: Tom Lane <tgl@sss.pgh.pa.us> To: Heikki Linnakangas <heikki@enterprisedb.com> Cc: Marcus Torres <marcsf23@yahoo.com>; pgsql-bugs@postgresql.org Sent: Wednesday, February 13, 2008 7:40:32 AM Subject: Re: [BUGS] BUG #3958: Self-Join Group-By Clause Produces Incorrect Results "Heikki Linnakangas" <heikki@enterprisedb.com> writes: > Marcus Torres wrote: >> I wrote a simple self-join query to sum the transaction count of different >> types of records in a audit table and the result set for the different sum >> totals was the same which is incorrect. > Looks perfectly correct to me. Me too. The underlying data before grouping/aggregation is regression=# select A1.AUDIT_DATE, P.CONTENT_POLICY_NAME, A1.TXN_COUNT, A2.TXN_COUNT FROM T_AUDIT A1, T_AUDIT A2, T_POLICY P WHERE P.ID = A1.POLICY_ID AND P.ID = A2.POLICY_ID AND A1.POLICY_ID = A2.POLICY_ID AND A1.AUDIT_DATE = A2.AUDIT_DATE AND A1.AUDIT_TYPE_CODE = 'CONTENT_1' AND A2.AUDIT_TYPE_CODE = 'CONTENT_2'; audit_date | content_policy_name | txn_count | txn_count ------------+---------------------+-----------+----------- 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 2008-01-01 | TEST POLICY | 1 | 1 (10 rows) from which it's clear that given all ones in txn_count, the sums *must* be the same because they're taken over the same number of rows. I suspect what the OP needs is two separate queries (perhaps union'ed together) not a self-join. regards, tom lane ____________________________________________________________________________________ Never miss a thing. Make Yahoo your home page. http://www.yahoo.com/r/hs