Sunday, March 25, 2012

Deleting large number of rows in SQL Server 2000

Hi,

I have some problems with our database, which is growing too large, and I was hoping someone might have some tips on what I can do!

I have about 100 clients, each logging about 10 000 rows of status messages a day, so after just a few days the database grows very large.

At present it's manageable, since I don't need to "dig" into the logs more than a few times a day. The system itself is not affected by the size of the log or the traffic on the server. But it will increase to about 500 clients in 2004, and 1000-1500 in 2005. So I really need a smarter solution than what I have today to be able to use the log efficiently.

98-99% of these rows are status messages which are more or less garbage during normal operation. But I still need to keep them in case an error occurs and we need to go back an hour or two (maybe a day) to see what went wrong. After 24-48 hours these 98-99% are of no use. I do, however, like to keep the remaining 1-2%; they are messages like startup, errors, etc. Ideally they should be logged into two separate tables by the clients, but unfortunately I cannot make the clients change their logging.

This presents problems on multiple levels. Mainly in searching, which often times out, but also with backup and storage space. At the moment I check the system for errors, and every other day I just truncate the log file. It works, but it's not exactly elegant...

The server is an 1100 MHz P3 / 512 MB / Windows 2000 Server / SQL Server 2000. Faster hardware would help, but this is more of a "bad design" problem than a "slow hardware" problem.

My log is pretty simple, as follows:

LogId - int - primary key - clustered index
ClientId - int - index asc
LogTypeId - int - index asc
LogValue - nvarchar(2500) - no index
LogTimeStamp - datetime - index asc
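
In DDL terms, that's roughly the following (the IDENTITY on LogId and the index names are approximations, so treat them as guesses):

    CREATE TABLE db_log (
        LogId        int IDENTITY(1,1) NOT NULL,
        ClientId     int NOT NULL,
        LogTypeId    int NOT NULL,
        LogValue     nvarchar(2500) NULL,
        LogTimeStamp datetime NOT NULL,
        CONSTRAINT PK_db_log PRIMARY KEY CLUSTERED (LogId)
    )
    CREATE INDEX IX_db_log_ClientId     ON db_log (ClientId)
    CREATE INDEX IX_db_log_LogTypeId    ON db_log (LogTypeId)
    CREATE INDEX IX_db_log_LogTimeStamp ON db_log (LogTimeStamp)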

I have come up with 3 possible solutions:

Method 1:
Simply run "Delete from db_log where logtypeid <> stuff_I_want_to_keep".

This is the simplest and the one I prefer, but it takes too long to complete. Any tips to speed this process up?
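
One approach I've seen suggested (not tested yet) is to nibble the delete in batches with SET ROWCOUNT, so each transaction, and hence each lock and log hit, stays small. Something like this, with the type ids invented for illustration:

    SET ROWCOUNT 5000                -- SQL 2000 idiom: cap rows affected per statement
    WHILE 1 = 1
    BEGIN
        DELETE FROM db_log
        WHERE LogTypeId NOT IN (1000, 1001)   -- hypothetical "keep" type ids
        IF @@ROWCOUNT = 0 BREAK      -- stop once nothing is left to delete
    END
    SET ROWCOUNT 0                   -- restore normal behavior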

Method 2:
Create a scheduled job (rather than a trigger, since triggers only fire on data changes) which runs something like "Delete from db_log where logtypeid <> stuff_I_want_to_keep and date < today_minus_two_days" every hour or so. This will ensure that the db doesn't grow too large. But if I'm away from work for a few days, we might lose data we wanted to keep.
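
The job body would just be the delete with a date filter, something like (again with invented type ids):

    DELETE FROM db_log
    WHERE LogTypeId NOT IN (1000, 1001)       -- hypothetical "keep" type ids
      AND LogTimeStamp < DATEADD(day, -2, GETDATE())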

Method 3:

Copy what I want to keep into another table, and empty the log. Sort of like "Insert into db_log_keep select stuff_to_keep; drop table db_log; create table db_log;" (or TRUNCATE TABLE instead of the drop/create; truncate is minimally logged, so it should actually be quick).

But then I would be stuck with two log tables, "48-hour_db_log" and "db_log_keep". I could use a view to "union" them so they would appear as a single table, but that's not ideal either.
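
Roughly like this (assuming db_log_keep has the same columns but with LogId as a plain int rather than an identity, and with the view name invented):

    INSERT INTO db_log_keep (LogId, ClientId, LogTypeId, LogValue, LogTimeStamp)
    SELECT LogId, ClientId, LogTypeId, LogValue, LogTimeStamp
    FROM db_log
    WHERE LogTypeId IN (1000, 1001)   -- hypothetical "keep" type ids

    TRUNCATE TABLE db_log   -- fast: deallocates pages (note: also resets the identity seed)
    GO

    -- the "union" view to make the two tables look like one again:
    CREATE VIEW db_log_all AS
    SELECT LogId, ClientId, LogTypeId, LogValue, LogTimeStamp FROM db_log
    UNION ALL
    SELECT LogId, ClientId, LogTypeId, LogValue, LogTimeStamp FROM db_log_keep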

However, it seems as if this method is what will work best for my setup, unless there are other suggestions?

Method 4:

...eagerly awaiting ideas!!! :-)

(Also, any tips and/or links to info on maintaining VLDBs are greatly appreciated.)

Thanks in advance for your help! :-)

Nikolai

I would personally use method 3, and then rename the table back to the original name after the original table has been dropped.
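
Something like this, once the rows you want are copied out (you'd still have to re-create indexes and permissions on the renamed table):

    DROP TABLE db_log
    EXEC sp_rename 'db_log_keep', 'db_log'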

I find this site quite useful:

http://www.sql-server-performance.com/

Method 3 is what I would do as well, just because I like to be overprotective. I would move the data you don't need into another table, keep it there for a day, and then empty it out at night. Sort of like an archive table.

Is there any redundancy in:

LogTypeId - int - index asc
LogValue - nvarchar(2500) - no index?

What I mean is, are the messages free-form, or are the same messages and types being sent repeatedly? If there's good redundancy, particularly in the pair values (LogTypeId AND LogValue), these could be normalized out and the log reduced to holding just the client id and the time stamp.

(Sorry about the "ikke" in my original post; it's Norwegian for "no index"... :-) )

Since my last post, I tried turning off automatic statistics updates on the table (sp_autostats) and turning them back on after deleting, but that didn't help either... deleting such large amounts just takes too much time, I guess.
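
(The statements were something along these lines, around the big delete:)

    EXEC sp_autostats 'db_log', 'OFF'   -- suspend auto-update statistics
    -- ... run the big DELETE here ...
    EXEC sp_autostats 'db_log', 'ON'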

Indeed, pairing LogValue and LogTypeId might reduce the log size; I'll try it and see what improvement I get. The problem, however, is that as I mentioned I cannot change the way the clients log, so I would have to add some sort of trigger or something to get around it.

Of the 98% "garbage", LogValue is typically one of about 20 distinct values that are repeated often. This number will increase to approximately 150 or so by 2005.

Overly simplified for demonstration purposes, LogTypeId and LogValue look something like this:

2010, "stringvalue1"
2010, "stringvalue2"
2010, "stringvalue3"
2010, "stringvalue4"
2020, "some_status_message_different_each_time"
2010, "stringvalue1"
2010, "stringvalue2"
2010, "stringvalue3"
2010, "stringvalue4"

2010 is the most used value, and it's paired with stringvalue1 through stringvalue20 or so. So exchanging (2010, "stringvalue1") for LogEntryValueId1, (2010, "stringvalue2") for LogEntryValueId2, and so on, would really shrink my database.
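
A lookup table for those pairs might look something like this (all names invented on the spot); a view with an INSTEAD OF INSERT trigger could then do the mapping, so the clients wouldn't have to change anything:

    CREATE TABLE db_log_message (
        MessageId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        LogTypeId int NOT NULL,
        LogValue  nvarchar(2500) NOT NULL
    )
    -- the log itself would then shrink to (ClientId, MessageId, LogTimeStamp), with
    -- the free-form 2020-style messages kept as-is in a separate overflow table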

I didn't actually think about renaming the table back to the original, but that's a great idea! I'll try that if the pairing thing has to be postponed to the next version.

Thanks so very much for all your help, guys! :-)
