Thursday, March 22, 2012

Deleting duplicate rows within a single table

I was wondering if anyone had a suggestion as to how to delete duplicate rows from a table. I have been doing this:

SELECT * INTO TempUsersNoRepeats
FROM TempUsers2
UNION
SELECT * FROM TempUsers3

This way I end up with a total of four tables (the fourth table being the original Users table) and I was hoping that there was a way that I could do this all within the the original Users table and not have to create the three TempUsers tables.

Thanks,
RonDo you have a primary key on the table?|||Douglas,

Thanks for the reply. These tables are staging tables and not part of my asp.net application as they are made from .csv files that I get from FOCUS jobs from a mainframe. I cannot add a PK as there are so many duplicate rows. The table cols are:

FacultyID,FacultyPW,FacultyFName,FacultyLName,FacultyEmailID

I could add a Identity col to get some unique values assoc. with each row though.

Thanks,
Ronald|||If you add an IDENTITY field (lets say, named ID):


DELETE FROM Faculty WHERE ID IN (
SELECT MAX(ID) FROM Faculty
GROUP BY
FacultyID,FacultyPW,FacultyFName,FacultyLName,FacultyEmailID
HAVING COUNT(*)>1
)

This will delete one row of the duplicates that are complete dups except the new ID field. Please try this on a test database first! This is untested SQL, but I believe it will work.|||Douglas,

Thanks that was pretty cool. If you could possibly answer one more question, some of the rows are duplicated multiple times, there could be one Faculty row 7 times (they are teaching 7 courses that semester) or some are just teaching 5 times thus only showing 5 dupes.

What would the proper syntax be for the HAVING COUNT(*)>1 so that I either catch a range (say 1-20) or for it to loop maybe.

I kept running the above script and was able to clear out all the duplicates after several runs.

Thanks very much,
Ronald|||This will delete one duplicate for each exact duplicate. If you have 20 identical duplicates, you would need to use the query 20 times (run it until 0 rows are effected).

Alternately, you could play and make it a query that deletes those dupes that are NOT MAX(ID).

No comments:

Post a Comment