Hi,
I have to perform a delete on a table: about 73,000
records that take up roughly 2.5 GB of storage. I will simply
perform a DELETE FROM table WHERE .... Should I
consider doing this after hours? What impact should I
consider for this action? This is a 24x7 DB server.
I'm planning the delete, and afterwards a DBCC CLEANTABLE to
organize the table and index.
Any suggestions or recommendations?
Thanks in advance
Mike|||Personally I would set up a job to:
1. Backup the database
2. Delete the data
3. CLEANDB
4. DBREINDEX
5. ShrinkFile
The major impact will be on any batch jobs happening on
that table at the same time, i.e. a job may not be using the table
directly but may be linked to the process, which could cause locking.
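A rough T-SQL sketch of that job, assuming step 3 means DBCC CLEANTABLE and using placeholder names throughout (MyDb, dbo.MyTable, MyDb_Data, and the WHERE clause are not from the original post):
-- 1. Backup the database first
BACKUP DATABASE MyDb TO DISK = 'D:\Backups\MyDb_before_delete.bak'
-- 2. Delete the data (one statement here; batching is discussed further down the thread)
DELETE FROM dbo.MyTable WHERE CreatedDate < '20040101'
-- 3. DBCC CLEANTABLE reclaims space left behind by dropped variable-length columns
DBCC CLEANTABLE ('MyDb', 'dbo.MyTable')
-- 4. Rebuild all indexes on the table
DBCC DBREINDEX ('dbo.MyTable')
-- 5. Shrink the data file only if the disk space is really needed back
DBCC SHRINKFILE (MyDb_Data, 2000)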
J
>--Original Message--
>Hi,
>I have to perform a delete on a table: about 73,000
>records that take up roughly 2.5 GB of storage. I will simply
>perform a DELETE FROM table WHERE .... Should I
>consider doing this after hours? What impact should I
>consider for this action? This is a 24x7 DB server.
>I'm planning the delete, and afterwards a DBCC CLEANTABLE to
>organize the table and index.
>Any suggestions or recommendations?
>Thanks in advance
>Mike
>.
>|||I agree with Julie on some points but not all. Always backup the db first.
I would do the deletes in smaller batches and you might need to backup the
tran log in between. If the rows take up 2.5GB of space then you will have
at least that much in the tran log. DBCC CLEANTABLE may not buy you
anything here since you aren't dropping any columns. I would do a DBREINDEX
but DO NOT issue a Shrinkfile unless you absolutely need that space. A
Shrinkfile will most likely undo all you just accomplished with the
DBREINDEX.
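A hedged sketch of that batched approach (names and the delete criteria are placeholders; SET ROWCOUNT is the SQL Server 2000 idiom, and DELETE TOP (n) does the same job on 2005 and later):
SET ROWCOUNT 5000                    -- limit each pass to 5000 rows
WHILE 1 = 1
BEGIN
    DELETE FROM dbo.MyTable
    WHERE CreatedDate < '20040101'   -- placeholder criteria
    IF @@ROWCOUNT = 0 BREAK          -- nothing left to delete, we're done
    BACKUP LOG MyDb TO DISK = 'D:\Backups\MyDb_log.trn'   -- keep the tran log in check
END
SET ROWCOUNT 0                       -- reset to the default (no limit)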
--
Andrew J. Kelly
SQL Server MVP
"Julie" <anonymous@.discussions.microsoft.com> wrote in message
news:12c0d01c4433e$ffa93890$a301280a@.phx.gbl...
> Personally I would set up a job to:
> 1. Backup the database
> 2. Delete the data
> 3. CLEANDB
> 4. DBREINDEX
> 5. ShrinkFile
> The major impact will be on any batch jobs happening on
> that table at the same time, i.e. a job may not be using the table
> directly but may be linked to the process, which could cause locking.
> J
>
> >--Original Message--
> >Hi,
> >
> >I have to perform a delete on a table: about 73,000
> >records that take up roughly 2.5 GB of storage. I will simply
> >perform a DELETE FROM table WHERE .... Should I
> >consider doing this after hours? What impact should I
> >consider for this action? This is a 24x7 DB server.
> >
> >I'm planning the delete, and afterwards a DBCC CLEANTABLE to
> >organize the table and index.
> >
> >Any suggestions or recommendations?
> >
> >Thanks in advance
> >Mike
> >.
> >|||i'd recommend deleting the data in small portions instead of one delete
statement that deletes all 73k rows at once. deleting small groups of
data should have minimal to no impact on your users. if you're using
full or bulk-logged recovery, be sure to do transaction log backups at
appropriate times during your deletes.
Mike wrote:
> Hi,
> I have to perform a delete on a table: about 73,000
> records that take up roughly 2.5 GB of storage. I will simply
> perform a DELETE FROM table WHERE .... Should I
> consider doing this after hours? What impact should I
> consider for this action? This is a 24x7 DB server.
> I'm planning the delete, and afterwards a DBCC CLEANTABLE to
> organize the table and index.
> Any suggestions or recommendations?
> Thanks in advance
> Mike|||Agreed, you only need to do the shrink file if you want to
get your disk space back.
Totally agree with cleantable.
On the deletion, not too sure I agree; personally I'd prefer to get
it all over with at once.
Shrinkfile affecting DBREINDEX, I didn't know that,
thanks, but wouldn't it just squeeze the pages together
rather than changing the order?
J
>--Original Message--
>I agree with Julie on some points but not all. Always backup the db first.
>I would do the deletes in smaller batches and you might need to backup the
>tran log in between. If the rows take up 2.5GB of space then you will have
>at least that much in the tran log. DBCC CLEANTABLE may not buy you
>anything here since you aren't dropping any columns. I would do a DBREINDEX
>but DO NOT issue a Shrinkfile unless you absolutely need that space. A
>Shrinkfile will most likely undo all you just accomplished with the
>DBREINDEX.
>--
>Andrew J. Kelly
>SQL Server MVP
>
>"Julie" <anonymous@.discussions.microsoft.com> wrote in message
>news:12c0d01c4433e$ffa93890$a301280a@.phx.gbl...
>> Personally I would set up a job to:
>> 1. Backup the database
>> 2. Delete the data
>> 3. CLEANDB
>> 4. DBREINDEX
>> 5. ShrinkFile
>> The major impact will be on any batch jobs happening on
>> that table at the same time, i.e. a job may not be using the table
>> directly but may be linked to the process, which could cause locking.
>> J
>>
>> >--Original Message--
>> >Hi,
>> >
>> >I have to perform a delete on a table: about 73,000
>> >records that take up roughly 2.5 GB of storage. I will simply
>> >perform a DELETE FROM table WHERE .... Should I
>> >consider doing this after hours? What impact should I
>> >consider for this action? This is a 24x7 DB server.
>> >
>> >I'm planning the delete, and afterwards a DBCC CLEANTABLE to
>> >organize the table and index.
>> >
>> >Any suggestions or recommendations?
>> >
>> >Thanks in advance
>> >Mike
>> >.
>> >
>
>.
>|||Not sure if this applies, but if you are deleting all the records in the
table you can use the TRUNCATE command.
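Assuming the whole table really is being emptied and nothing references it with a foreign key (TRUNCATE refuses to run in that case), it would just be:
TRUNCATE TABLE dbo.MyTable   -- placeholder name; deallocates pages, minimal logging, resets any identity seed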
Rand
This posting is provided "as is" with no warranties and confers no rights.|||Julie wrote:
> Shrinkfile affecting DBREINDEX, I didn't know that,
> thanks, but wouldn't it just squeeze the pages together
> rather than changing the order?
it changes the page order dramatically. in fact, it's sort of a goofy
situation. you shrink the db to get all the data to the front of the
file. after you shrink it, it's most likely very heavily fragmented.
so then you defrag it. when you defrag it, it moves pages out into the
free space in the db file(s), so now your pages are no longer all at the
front of the file.|||Thanks ch,
I never knew that.
J
>--Original Message--
>Julie wrote:
>> Shrinkfile affecting DBREINDEX, I didn't know that,
>> thanks, but wouldn't it just squeeze the pages together
>> rather than changing the order?
>it changes the page order dramatically. in fact, it's sort of a goofy
>situation. you shrink the db to get all the data to the front of the
>file. after you shrink it, it's most likely very heavily fragmented.
>so then you defrag it. when you defrag it, it moves pages out into the
>free space in the db file(s), so now your pages are no longer all at the
>front of the file.
>.
>|||Hi,
Thanks for your comments, but now I have a problem:
1. I backed up the database
2. I've made the delete in small batches
3. This table only has a clustered index on a PK column;
I have performed a DBREINDEX on that PK clustered index.
Results:
name            rows    reserved    data        index_size  unused
C29___BINARIES  221392  4902000 KB  2724296 KB  112 KB      2177592 KB
How can I free the unused table space? Why didn't the
DBREINDEX release this space?
Sorry :) Can you help me?
Thanks again
Mike|||dbcc updateusage (0) with count_rows
it may actually be freed but not recorded yet.
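A quick sketch of re-checking the numbers (the table name is the one from the output above):
DBCC UPDATEUSAGE (0) WITH COUNT_ROWS   -- correct the space/rowcount metadata in the current db
EXEC sp_spaceused 'C29___BINARIES'     -- then look at the reserved/unused figures again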
Mike wrote:
> Hi,
> Thanks for your comments, but now I have a problem:
> 1. I backed up the database
> 2. I've made the delete in small batches
> 3. This table only has a clustered index on a PK column;
> I have performed a DBREINDEX on that PK clustered index.
> Results:
> name            rows    reserved    data        index_size  unused
> C29___BINARIES  221392  4902000 KB  2724296 KB  112 KB      2177592 KB
> How can I free the unused table space? Why didn't the
> DBREINDEX release this space?
> Sorry :) Can you help me?
> Thanks again
> Mike
Tuesday, February 14, 2012
Delete 'Non-Unique' Rows Question (with DDL)
Hi all,
I'm working on a query which will ultimately be used in a stored proc
for a report. This report will quite simply list the components that
are unique to a particular customer's goods. I've created a mechanism
for traversing our BOM structure and getting a list of all components
used in any customer's parts. I then intended to clear out all the
components that were used by more than one customer. Unfortunately,
after about 5 hours' effort, I've come up short. Perhaps someone can
offer a suggestion.
The DDL is posted below. What I'm looking for, ultimately, is to get a
'selection' containing only the two "1000" partnumbers, as components
"1001" and "1002" are also used by customer "B". Also, Customer "A" has
two products which use the "1000" component, but it's still unique to
them, so we want to keep it in the results.
You can see the DELETE statement I was trying, but it misses the mark
rather wildly. I'm open to tackling this problem a different way too, if
that would provide a clearer solution. In the meantime, I'll keep
plugging away at it; perhaps I can get lucky.
Thanks for your help,
Brian Ackermann
---
-- DDL
---
DROP TABLE zz_report
GO
CREATE TABLE zz_report
(
partNumber varchar(10)
, customer varchar(10)
)
INSERT zz_report VALUES ('1000', 'A')
INSERT zz_report VALUES ('1000', 'A')
INSERT zz_report VALUES ('1001', 'A')
INSERT zz_report VALUES ('1002', 'A')
INSERT zz_report VALUES ('1001', 'B')
INSERT zz_report VALUES ('1002', 'B')
INSERT zz_report VALUES ('1002', 'B')
DELETE zz_report FROM zz_report R
WHERE EXISTS
(
SELECT Z.partnumber
FROM zz_report Z
WHERE R.partnumber = Z.partnumber
GROUP BY Z.customer, Z.partnumber
HAVING count(*) <> 1
)
SELECT * FROM zz_report
GO
DROP TABLE zz_report
GO|||a shot in the dark:
DELETE zz_report FROM zz_report R
WHERE EXISTS
(
SELECT 1
FROM zz_report Z
WHERE R.partnumber = Z.partnumber
AND z.customer > r.customer
)
partNumber customer
---------- --------
1000       A
1000       A
1001       B
1002       B
1002       B|||Try using a select, without deleting anything...
Select a.Customer, a.partnumber
from zz_report as a
where not exists (select 1
from zz_report b
where b.partnumber = a.partnumber
and b.customer <> a.customer)
"Brian J. Ackermann" <brianj774@.allmail.net> wrote in message
news:gdudnZReWYKmavDZnZ2dnUVZ_tednZ2d@.giganews.com...
> Hi all,
> I'm working on a query which will ultimately be used in a stored proc
> for a report. This report will quite simply list the components that
> are unique to a particular customers goods. I've created a mechanism
> for traversing our BOM structure, and getting a list of all components
> used in any customers parts. I then intended to clear out all the
> components that were used by more than one customer. Unfortunately,
> after about 5 hours effort, I've come up short. Perhaps someone can
> offer a suggestion.
> The DDL is posted below. What I'm looking for, ultimately, is to get a
> 'selection' containing only the two "1000" partnumbers, as components
> "1001" and "1002" are also used by customer "B". Also, Customer "A" has
> two products which use the "1000" component, but its still unique to
> them, so we want to keep it in the results.
> You can see the DELETE statement I was trying, but it misses the mark
> rather wildly. I'm open to tackling this problem a different way to, if
> that would provide a clearer solution. In the meanwhile, I'll keep
> plugging away at it, perhaps I can get lucky.
> Thanks for your help,
> Brian Ackermann
>
> ---
> -- DDL
> ---
> DROP TABLE zz_report
> GO
> CREATE TABLE zz_report
> (
> partNumber varchar(10)
> , customer varchar(10)
> )
>
> INSERT zz_report VALUES ('1000', 'A')
> INSERT zz_report VALUES ('1000', 'A')
> INSERT zz_report VALUES ('1001', 'A')
> INSERT zz_report VALUES ('1002', 'A')
> INSERT zz_report VALUES ('1001', 'B')
> INSERT zz_report VALUES ('1002', 'B')
> INSERT zz_report VALUES ('1002', 'B')
>
> DELETE zz_report FROM zz_report R
> WHERE EXISTS
> (
> SELECT Z.partnumber
> FROM zz_report Z
> WHERE R.partnumber = Z.partnumber
> GROUP BY Z.customer, Z.partnumber
> HAVING count(*) <> 1
> )
>
> SELECT * FROM zz_report
> GO
> DROP TABLE zz_report
> GO|||Jim Underwood wrote:
> Try using a select, without deleting anything...
> Select a.Customer, a.partnumber
> from zz_report as a
> where not exists (select 1
> from zz_report b
> where b.partnumber = a.partnumber
> and b.customer <> a.customer)
Thanks Jim!
This works, at least as far as the small dataset I provided. I'm
running it now on my real data. Unfortunately, its taking quite a long
time to process (about a minute and a half). I'll report back my
findings in a few.|||Brian J. Ackermann wrote
> Thanks Jim!
> This works, at least as far as the small dataset I provided. I'm
> running it now on my real data. Unfortunately, its taking quite a long
> time to process (about a minute and a half). I'll report back my
> findings in a few.
>
Well, unfortunately, it's not what I am needing. The very first part I checked
(part "000001") is used in just about every single finished product
produced here, and consequently, is used by nearly every customer. I'll
poke around with it some, because the idea seems good to me...
Brian|||Brian J. Ackermann wrote:
> Well, unfortunately, it's not what I am needing. The very first part I checked
> (part "000001") is used in just about every single finished product
> produced here, and consequently, is used by nearly every customer. I'll
> poke around with it some, because the idea seems good to me...
> Brian
Okay, I found a bit of garbage data (some customers with NULL values).
Removing that, I've found that the first hundred items are perfect. The
logic seems strong, so I'm going to go with it.
But do you have any suggestions for picking up the pace a little bit?
That little query you provided me with takes about 00:01:05 to run. The
table is about 19,000 rows. Perhaps there's some kind of optimization we
can try on it?
Thanks again, I appreciate your help.
Brian Ackermann|||Well, if you post full DDL showing your actual table and indexes, someone
may be able to offer a suggestion.
If partnumber and customer form the primary key of this table (or have a
unique constraint), then you might try something like...
Select customer, partnumber from zz_report
where partnumber in (
select partnumber
from zz_report
group by partnumber
having count(customer) = 1
)
Having an index on partnumber may help in either case. Note, this is an
index with partnumber as the first column, not the second.
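Something like this, with a hypothetical index name (keeping customer as the second column lets the query be answered from the index alone):
CREATE INDEX IX_zz_report_partnumber ON zz_report (partnumber, customer)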
"Brian J. Ackermann" <brianj774@.allmail.net> wrote in message
news:446E1199.8070504@.allmail.net...
> Brian J. Ackermann wrote:
> Okay, I found a bit of garbage data (some customers with NULL values).
> Removing that, I've found that the first hundred items are perfect. The
> logic seems strong, so I'm going to go with it.
> But, do you have any suggestions for picking up the pace a little bit.
> That little query you provided me with takes about 00:01:05 to run. The
> table is about 19,000 rows. Perhaps theres some kind of optimization we
> can try on it?
> Thanks again, I appreciate your help.
> Brian Ackermann|||Jim Underwood wrote:
> Well, if you post full DDL showing your actual table and indexes, someone
> may be able to offer a suggestion.
The DDL is pretty much as I gave it. It's just a temporary table, with
only those two fields. I'll try putting an index on it, and see if that
makes a big difference. I expect it will.
Thanks again for your excellent help.
Brian.|||Do you really need the temporary table? If the data is coming from an
existing table, why not access it directly? Or is this data coming from an
application, and not the database?
"Brian J. Ackermann" <brianj774@.allmail.net> wrote in message
news:hbSdnfRYaMeOh_PZRVn-uw@.giganews.com...
> Jim Underwood wrote:
someone
> The DDL is pretty much as I gave it. Its just a temporary table, with
> only those two fields. I'll try putting an index on it, and see if that
> make a big difference. I expect it will.
> Thanks again for your excellent help.
> Brian.
>|||Jim Underwood wrote:
> Do you really need the temporary table? If the data is coming from an
> existing table, why not access it directly? Or is this data coming from an
> application, and not the database?
>
I am not sure if I _need_ it, but I can't really think of a better way
to get to it. All of that data comes out of our Bill of Material
Tables, which is a hierarchical data structure, and some of our parts
are only 2 levels deep, and some are 5 levels deep, and even that could
change in the future.
Presently, I am using a cursor to loop over all our 'non-obsolete'
parts, and then a UDF which navigates the BOM, pulling out all the
components along the way. The results of this UDF I put into the temp
table.
I'm quite certain there are other ways to tackle this problem, but that
seemed the most likely to generate easily understood results.
Brian
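For reference, a very rough sketch of that population step as described; every object name here (dbo.Parts, Customer, IsObsolete, dbo.fn_ExplodeBOM, ComponentNumber) is an assumption, not the actual schema:
DECLARE @part varchar(10), @customer varchar(10)
DECLARE part_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT PartNumber, Customer
    FROM dbo.Parts
    WHERE IsObsolete = 0                  -- hypothetical parts master
OPEN part_cur
FETCH NEXT FROM part_cur INTO @part, @customer
WHILE @@FETCH_STATUS = 0
BEGIN
    -- dbo.fn_ExplodeBOM is assumed to return one row per component
    -- reachable from @part, however many BOM levels deep it sits
    INSERT INTO zz_report (partNumber, customer)
    SELECT ComponentNumber, @customer
    FROM dbo.fn_ExplodeBOM(@part)
    FETCH NEXT FROM part_cur INTO @part, @customer
END
CLOSE part_cur
DEALLOCATE part_cur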