Friday, February 17, 2012

Delete query to remove duplicate rows

Hi,

I have a query which is taking hours to complete. It is a self join on one table. Can some one please help me to tune this, so that it is completed in an efficient time. The table has only 1000 rows. This query runs on SQL server 7.0 (Windows) from a stored procedure.

The query is below.

delete a
from wtn a, wtn b
where a.code = b.code
and a.num = b.num
and a.unit = 'DLMMLT'
and b.unit != 'DLMMLT'

The data is like the following:

Table wtn
---

code num unit
-- -- --
123456 2345671990 ifty
123456 2345671990 DLMMLT
456123 4567891230 jkty
456123 4567891230 DLMMLT

All I need to do is to delete the duplicate rows with "unit = DLMMLT". The duplicate rows should be identified with "code" and "num" columns. The primary key includes all the three columns. One index exists for all the three columns.

I need this urgent. Any help is highly appreciated.
Thanx..
-BheemsenTry thisDELETE a
FROM wtn a
WHERE EXISTS
(
SELECT *
FROM wtn b
WHERE a.code = b.code
AND a.num = b.num
AND a.unit = 'DLMMLT'
AND b.unit != 'DLMMLT'
)|||Thanx I will give it a try. In my post, I was worng on number of rows in the table. The table has 15 million rows. But the rows to be deleted are around 1000.

Thanx..
-Bheemsen|||achorozy,

It did not help. But, I had created a clustered index on "unit" column now. So the query is using index scan. But I am still thinking that the query can be rewritten in a good way.

thanx..
-Bheemsen|||How long does it take to delete 1 record from this table using explicit values of a row that you know needs to be deleted ?|||I did not try deleting just one record. It is a production query. The no. of rows to be deleted varies every day. There may be around 1000 to be deleted every day as I said in my previuos notes.

Thanx..
-Bheemsen|||I understand that you want to delete x number of rows, but to establish a baseline, a good start is to see how long 1 record takes to delete -- It may not be that you are deleting 1000 records that only takes a long time to complete - 1 record could take a long(respectively) time as well - which would mean that your query is ok (especially if you use explicit values of the record that you know needs to be deleted and not try to pull the information by trying to delete all the records).|||rnealejr,

Sorry to get back to you so late. The reason was that the query tooks so long time to run. I ran the query on a mirror database to production with 18 millim rows. To delete 6 rows, the query took
21 hrs 31 min 34 sec.

I badly need help in tuning this query to reduce the time of execution.
I am going to rerun the same query with a clustered index created on unit.

Thanx..
-Bheemsen|||I am still curious to see what happens when you delete 1 row explicity - with hard-coded values in your where clause:

e.g.

delete from wtn
where wtn.code = 123456 and
wtn.num = 2345671990 and
wtn.unit = 'ifty'

What is your table structure (include name, type, size) and your indexes (type, columns) and any other dependencies ?|||One row got deleted with hard coded values in less than one second.
I tested the original delete query after creating the clustered index on 'unit'. The original delete query tihis time surprisingly finished in just 54 secs. But this may not happen always since there is lot of update/insert/delete activity on the table.

That's why I wanted to tune the query too, besides having a clustered index.

The table strucutre is as follows:

name type size
--- --- ---
code char 6
num decimal 9
unit varchar 8

one full-text index exists on all the three columns.
Primary key includes all three columns.
No other dependencies.

thanx..
-Bheemsen|||TRY THIS OUT

DELETE FROM WTN
WHERE UNIT IN
(SELECT UNIT FROM WTN GROUP BY UNIT HAVING COUNT(UNIT)>1)

No comments:

Post a Comment