Sunday, March 25, 2012

Deleting million rows after checking duplication

I have a table(9 fields) which contains around 10 million records. Within these 10 million records, don't know how many are duplicated rows. I wrote a cursor which checks duplication on all of the 9 fields within each row, and then returns back with the count of rows that are duplicated. I subtact 1 row and delete all the other rows. Problem is that this thing takes a lot of time to execute. At the current rate(approx. 90 records/hour) this cursor is going to take months to clean the table.

There is no PK or no indexing what so ever on the table, and I HAVE to check each and every field for duplication(all except 1 fields are nvarchar).

Please help me. I need to sort this out

What about doing a grouping on the table and pulling hte data out to another table.

Something like:

Select cola,ColB,Colc (all columns here)
INTO SomeNewTable
FROM SomeTable
Group by cola,ColB,Colc (all columns here)

HTH, Jens K. Suessmeyer.

http://www.sqlserver2005.de

No comments:

Post a Comment