Thread: How to idenity duplicate rows
Can anybody assist with this problem I have a table that has some duplicated rows of data, I want to place a unique constraint on the columns userid and procno to eliminate this problem in the future but how do I identify and get rid of the existing duplication.
Thanks in advance
Regards
David Inglis
David Inglis wrote: > Can anybody assist with this problem I have a table that has some > duplicated rows of data, I want to place a unique constraint on the > columns userid and procno to eliminate this problem in the future but > how do I identify and get rid of the existing duplication. To find them, something like SELECT a, b, c FROM table GROUP BY a, b, c HAVING count(*) > 1; comes to mind, where you have to list all columns of the table in place of a, b, c. As for deleting all but one row in a duplicated group, you're going to have to get at them by the oid or ctid columns perhaps. The other idea is to run CREATE TABLE newtable AS SELECT DISTINCT * FROM oldtable;. -- Peter Eisentraut http://developer.postgresql.org/~petere/
As for deleting all but one row in a duplicated group, you're going to
have to get at them by the oid or ctid columns perhaps.
The other idea is to run CREATE TABLE newtable AS SELECT DISTINCT * FROM
oldtable;.
I believe getting oid and/or ctid is not possible since it would not display/get duplicate records
in a "HAVING CLAUSE" since their oid/ctid wouldn't be the same.
And creating a newtable and use SELECT DISTINCT isn't an acceptable idea to the audit team when you have millions of records in a table in production server when you only need to remove let say 70 records from that table.
I tried another approach where I queried and insert the duplicate record (35 records) into
a new/temporary table. Then I created a stored procedure something like this:
--START
DECLARE
foo table;
BEGIN
FOR foo IN
SELECT * FROM newtable
LOOP
DELETE FROM oldtable
where oldtable.field1 = foo.field1
and oldtable.field2 = foo.field2
and oldtable.field3 = foo.field3
and oldtable.field4 = foo.field4
...';
END LOOP;
END;
-- END
Problem with this approach I got 35 duplicate records (count = 2)
from the new table and delete only 20 records from the oldtable.
If I am not mistaken it should delete 70 records.
I wonder
Peter Eisentraut wrote: >David Inglis wrote: > > >>Can anybody assist with this problem I have a table that has some >>duplicated rows of data, I want to place a unique constraint on the >>columns userid and procno to eliminate this problem in the future but >>how do I identify and get rid of the existing duplication. >> >> > >To find them, something like > >SELECT a, b, c FROM table GROUP BY a, b, c HAVING count(*) > 1; > >comes to mind, where you have to list all columns of the table in place >of a, b, c. > >As for deleting all but one row in a duplicated group, you're going to >have to get at them by the oid or ctid columns perhaps. > >The other idea is to run CREATE TABLE newtable AS SELECT DISTINCT * FROM >oldtable;. > > This doesn't bring over to the new table any foreign key relationships or triggers. Another approach (if you don't have OID's) is to create uniqueness by appending a column to the table, populating it with sequential integers. Then you proceed as otherwise suggested above by using aggregation to identify the duplicated rows.