Tuesday, February 14, 2012

Delete one of the duplicated row

I have many data in a table in which some rows are duplicated. How can I, for all duplicated rows, delete the extra rows and leave only one? You may assume checking one column is enough to tell if a row is duplicated.

Thanks

Step one: create a second table with identical structure: yourSecondTable;

Step two: CREATE UNIQUE INDEX removedups ON yourSecondTable (col1) WITH IGNORE_DUP_KEY

Step three:

INSERT yourSecondTable

SELECT * FROM yourFirstTable

--Duplicate key was ignored.

|||

DECLARE @.FirstName varchar(20),
@.LastName varchar(20),
@.Age int --declare all fields in table

DECLARE c1 CURSOR FOR

--Find Dupes
SELECT FirstName, LastName, Age
FROM MyTable
GROUP BY FirstName, LastName, Age
HAVING COUNT(FirstName) > 1

OPEN c1
FETCH NEXT FROM c1
INTO @.FirstName, @.LastName, @.Age
WHILE @.@.FETCH_STATUS = 0

BEGIN

--Delete all dupes...the cursor remembers the current record
DELETE FROM MyTable
WHERE FirstName IN (SELECT FirstName FROM MyTable GROUP BY FirstName HAVING COUNT(FirstName) > 1)

--insert the current record back into the table
INSERT INTO MyTable(FirstName, LastName, Age) VALUES(@.FirstName, @.LastName, @.Age)

FETCH NEXT FROM c1
INTO @.FirstName, @.LastName, @.Age
END

CLOSE c1
DEALLOCATE c1

Adamus

|||

Please look at my reply in the thread below for some solutions. You can do this with a single DELETE statement and batch it using SET ROWCOUNT or TOP clause if you have large number of rows to delete. And if you don't have any primary key or unique key on the table then you can use a cursor based approach.

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=666011&SiteID=1&PageID=1

|||

Umachandar Jayachandran - MS wrote:

Please look at my reply in the thread below for some solutions. You can do this with a single DELETE statement and batch it using SET ROWCOUNT or TOP clause if you have large number of rows to delete. And if you don't have any primary key or unique key on the table then you can use a cursor based approach.

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=666011&SiteID=1&PageID=1

Question. Why would the absence of a primary key be a determinant in selecting a cursor based approach?

Adamus

|||Because there is no easy way to determine which row to keep out of the duplicates which you can do easily in a single DELETE statement without writing procedural code.

No comments:

Post a Comment