Re: identifying duplicates in table with redundancies - Mailing list pgsql-sql
From | Oliver d'Azevedo Christina |
---|---|
Subject | Re: identifying duplicates in table with redundancies |
Date | |
Msg-id | 065CBEA1-1647-4233-BBB6-48E879FD91B0@gmail.com Whole thread Raw |
In response to | Re: identifying duplicates in table with redundancies ("Tarlika Elisabeth Schmitz" <postgresql@numerixtechnology.de>) |
Responses |
Re: identifying duplicates in table with redundancies
Re: identifying duplicates in table with redundancies |
List | pgsql-sql |
Hey,Tarlika. I tried to reproduce your test case through a series of inserts and It seems that the lower case "d" went unnoticed. That explains the empty list i got as result. My fault. Sorry :-( Great to hear it helped you Best, Oliveiros Enviado de meu iPhone Em 24/09/2010, às 05:12 PM, "Tarlika Elisabeth Schmitz" <postgresql@numerixtechnology.d e> escreveu: > Dear Oliveiros, > Thank you for taking the time to help. > > On Fri, 24 Sep 2010 11:22:21 +0100 > "Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt> wrote: > >> ----- Original Message ----- >> From: "Tarlika Elisabeth Schmitz" <postgresql@numerixtechnology.de> >> To: <pgsql-sql@postgresql.org> >> Sent: Thursday, September 23, 2010 10:39 PM >> Subject: [SQL] identifying duplicates in table with redundancies >> >> >>> [...] I want to check for duplicates: >>> >>> 1) multiples trainer names for same trainer id >>> 2) multiple trainer ids for same trainer name >>> >>> I cobbled together the SQL and it does the job but it seems rather >>> convoluted. I would like to know how I can improve it. >>> >>> CREATE TABLE student ( >>> id INTEGER NOT NULL, >>> name VARCHAR(256) NOT NULL, >>> trainer_id INTEGER, >>> trainer_name VARCHAR(256), >>> ); >>> >>> ==== >>> EXAMPLE DATA >>> >>> 22 John 1 Macdonald >>> 23 Jane 1 MacDonald >>> 24 Paul 1 MacDonald >>> 25 Dick 2 Smith >>> 26 Bill 3 Smith >>> 27 Kate 3 Smith >>> ==== >>> >>> select trainer_id, trainer_name from >>> ( >>> select distinct on (trainer_name) trainer_id, trainer_name >>> from student >>> where trainer_id in >>> ( >>> select distinct on (id) id >>> from >>> ( >>> select distinct on (trainer_id,trainer_name) >>> trainer_id as id, >>> trainer_name as name from student >>> ) as trainer >>> group by trainer.id >>> having count (trainer.name) > 1 >>> ) >>> ) as y >>> order by trainer_id > > >> Howdy, Tarlika. >> >> First, did you past correctly your query into your mail? >> I am asking this because your query doesn't seem work for me, it >> returns an empty list :-| >> >> Your most nested query, [...] >> returns this >> >> 1|"MacDonald" >> 2|"Smith" >> 3|"Smith" > > > For me, the innermost query returns: > 1|"Macdonald" > 1|"MacDonald" > 2|"Smith" > 3|"Smith" > > (note the lower/uppercase "d" in MacDonald) > > The whole query returns: > 1|"Macdonald" > 1|"MacDonald" > > >>> 1) multiples trainer names for same trainer id >>> 2) multiple trainer ids for same trainer name >> >> To achieve 2) I would use this query >> >> SELECT DISTINCT trainer_id,trainer_name >> FROM ( >> SELECT trainer_name -- The field you want to test for duplicates >> FROM ( >> SELECT DISTINCT "trainer_id","trainer_name" >> FROM student >> ) x >> GROUP BY "trainer_name" -- the field you want to test for >> duplicates >> HAVING (COUNT(*) > 1) >> ) z >> NATURAL JOIN student y > > I see my 2 innermost queries are the same as yours, just a bit more > wordy. I messed up at the third query, which threw up an error when I > tried to add an ORDER BY. > >> It will give you a list of the trainer names who have more than one >> trainer ID and the respective trainer IDS. >> For your particular example data result will be >> 2|"Smith" >> 3|"Smith" > > Splendid! Just what I wanted. > >> As Smith is the only trainer with records with diferent trainer IDs. >> Question : Can this be what you want? > > The real table has 250000 entries and quite a few dups. > >> If you want to achieve 1) just substitute the trainer_name by >> trainer_id on the commented places. > > 1) works as well now - just had to transpose id/name. > > > > > > -- > > Best Regards, > Tarlika Elisabeth Schmitz > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql