Tuesday, March 27, 2012

deleting records

I have a couple of SQL tables that have been appended to daily over the last two years. There are now about 50,000,000 records in the table. Does anyone know the fastest way to delete records before a certain date to shorten these tables? Delete queries and everything else I've tried are taking way too long.

I'm guessing there is no index on the date column, so you are suffering a table scan. You'll just have to suck it up and let the delete take however long it takes.

If this will be an ongoing requirement (i.e. purging this table), why don't you add an index?

.....gtr

|||

Thanks for your response

This table was set up way before my time, but there is no index. I am still relatively new with MS SQL, so I am not very familiar with indexing. Is this all I have to do?

CREATE INDEX IDX_Daily_snapshot_INV_File_Date
on Daily_snapshot_INV (File_Date)

After this, do I just query the table normally and it will bypass the table scan?

|||

yup, that's all there is to it
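
If you want to see whether the index is actually getting used, you can turn on the I/O statistics in Query Analyzer and run a quick query against the date column - a full table scan will report far more reads than an index seek. A minimal sketch, using the table and column from your CREATE INDEX (the date is just a placeholder):

set statistics io on

-- with the index in place this should use an index seek instead of a table scan
select count(*)
from Daily_snapshot_INV
where File_Date < '20050401'

set statistics io off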

.....gtr

|||

>> it will bypass the table scan?<<

This is not possible to guess, but if you are only deleting a small fraction of the rows, say 10% or less (a VERY ROUGH estimate), the index should do the trick.

Even so, if you have 50 million rows over 2 years it is going to take a while to delete a large number of them regardless. If it is slow, try deleting only a small number of rows at a time, like a day's, week's, or month's worth (instead of a year at a time); how much you can do in one go will depend a lot on your hardware.
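
For example, something along these lines deletes everything before a cutoff date, one calendar month per pass. This is only a rough sketch; the table and column names come from the index you posted, and the cutoff date is just a placeholder, so adjust both to match your data:

declare @cutoff datetime, @oldest datetime
set @cutoff = '20050401'

-- start at the oldest date currently in the table
select @oldest = min(File_Date) from Daily_snapshot_INV

while @oldest < @cutoff
begin
    -- remove at most one month's worth of rows, never past the cutoff
    delete Daily_snapshot_INV
    where File_Date >= @oldest
      and File_Date < dateadd(month, 1, @oldest)
      and File_Date < @cutoff

    -- move the one-month window forward
    set @oldest = dateadd(month, 1, @oldest)
end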

|||

Thank you for your answers. I know now that indexing will help me search, but unfortunately there is nothing to speed up the actual deletion itself. I am going to run a delete query in Query Analyzer over the weekend when our server is not being used.

Matt

|||

Here is a small example that you can use to experiment with, and you can modify it to delete your data in batches.

The code creates a table holding the au_id values from the authors table. The delete query has a WHERE clause that matches exactly 9 rows, and the loop runs until @@ROWCOUNT = 0, deleting in batches of 2.

This is of course just to illustrate the concept; in practice you would want a much higher batch size, perhaps 10000.

use pubs
set nocount on

create table SomeTable (ID varchar(49))
insert into SomeTable
select au_id from authors

declare @count int
select @count = count(*) from SomeTable

print 'Count before delete == ' + convert(varchar(10), @count)

declare @rowcount int
set rowcount 2          --delete in batches of 2
select @rowcount = 666  --initialize to <> 0

while @rowcount <> 0    --do until we are done
begin
    --the table should be 9 rows smaller after we are done
    delete SomeTable
    where ID like '7%'
    or ID like '8%'

    select @rowcount = @@ROWCOUNT
end

set rowcount 0 --very important to set this back to 0

select @count = count(*) from SomeTable

print 'Count after delete == ' + convert(varchar(10), @count)
set nocount off

Denis the SQL Menace

http://sqlservercode.blogspot.com/

|||

Is there an advantage to doing it that way when I could just do this:

DELETE FROM dbo.Daily_snapshot_INV
WHERE (File_date < CONVERT(DATETIME, '2005-04-01 00:00:00', 102))

The problem is that this probably covers about 20,000,000 rows, so instead of one query to delete them all I will need to delete a month or half a month at a time. Each day has about 120,000 rows. This table was created in October 2004.

Like I said above, I am relatively new at SQL so any help in understanding is appreciated.

|||

The advantage is that you don't have to write a lot of separate WHERE clauses for each date range; the loop just keeps deleting in batches until there are no matching rows left.
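
As a rough sketch adapted to the table, column, and cutoff date from your query (pick whatever batch size your hardware is comfortable with):

set nocount on
set rowcount 10000   --delete 10000 rows per pass

declare @rowcount int
select @rowcount = 1 --initialize to <> 0

while @rowcount <> 0
begin
    delete dbo.Daily_snapshot_INV
    where File_date < '20050401'

    select @rowcount = @@ROWCOUNT
end

set rowcount 0 --very important to set this back to 0
set nocount off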

Denis the SQL Menace

http://sqlservercode.blogspot.com/

|||

Since this is the first time you are deleting from the table - yes, you are just going to have to let the thing run - deleting many rows (and logging) just plain takes time.

In the future, as this becomes a regular process (perhaps monthly), the index will help a great deal.

.....gtr
