Tuesday, February 14, 2012

Delete 'Non-Unique' Rows Question (with DDL)

Hi all,
I'm working on a query which will ultimately be used in a stored proc
for a report. This report will quite simply list the components that
are unique to a particular customers goods. I've created a mechanism
for traversing our BOM structure, and getting a list of all components
used in any customers parts. I then intended to clear out all the
components that were used by more than one customer. Unfortunately,
after about 5 hours effort, I've come up short. Perhaps someone can
offer a suggestion.
The DDL is posted below. What I'm looking for, ultimately, is to get a
'selection' containing only the two "1000" partnumbers, as components
"1001" and "1002" are also used by customer "B". Also, Customer "A" has
two products which use the "1000" component, but its still unique to
them, so we want to keep it in the results.
You can see the DELETE statement I was trying, but it misses the mark
rather wildly. I'm open to tackling this problem a different way to, if
that would provide a clearer solution. In the meanwhile, I'll keep
plugging away at it, perhaps I can get lucky.
Thanks for your help,
Brian Ackermann
---
-- DDL
---
DROP TABLE zz_report
GO
CREATE TABLE zz_report
(
partNumber varchar(10)
, customer varchar(10)
)
INSERT zz_report VALUES ('1000', 'A')
INSERT zz_report VALUES ('1000', 'A')
INSERT zz_report VALUES ('1001', 'A')
INSERT zz_report VALUES ('1002', 'A')
INSERT zz_report VALUES ('1001', 'B')
INSERT zz_report VALUES ('1002', 'B')
INSERT zz_report VALUES ('1002', 'B')
DELETE zz_report FROM zz_report R
WHERE EXISTS
(
SELECT Z.partnumber
FROM zz_report Z
WHERE R.partnumber = Z.partnumber
GROUP BY Z.customer, Z.partnumber
HAVING count(*) <> 1
)
SELECT * FROM zz_report
GO
DROP TABLE zz_report
GOa shot in the dark:
DELETE zz_report FROM zz_report R
WHERE EXISTS
(
SELECT 1
FROM zz_report Z
WHERE R.partnumber = Z.partnumber
AND z.customer > r.customer
)
partNumber customer
-- --
1000 A
1000 A
1001 B
1002 B
1002 B|||Try using a select, without deleting anything...
Select a.Customer, a.partnumber
from zz_report as a
where not exists (select 1
from zz_report b
where b.partnumber = a.partnumber
and b.customer <> a.customer)
"Brian J. Ackermann" <brianj774@.allmail.net> wrote in message
news:gdudnZReWYKmavDZnZ2dnUVZ_tednZ2d@.gi
ganews.com...
> Hi all,
> I'm working on a query which will ultimately be used in a stored proc
> for a report. This report will quite simply list the components that
> are unique to a particular customers goods. I've created a mechanism
> for traversing our BOM structure, and getting a list of all components
> used in any customers parts. I then intended to clear out all the
> components that were used by more than one customer. Unfortunately,
> after about 5 hours effort, I've come up short. Perhaps someone can
> offer a suggestion.
> The DDL is posted below. What I'm looking for, ultimately, is to get a
> 'selection' containing only the two "1000" partnumbers, as components
> "1001" and "1002" are also used by customer "B". Also, Customer "A" has
> two products which use the "1000" component, but its still unique to
> them, so we want to keep it in the results.
> You can see the DELETE statement I was trying, but it misses the mark
> rather wildly. I'm open to tackling this problem a different way to, if
> that would provide a clearer solution. In the meanwhile, I'll keep
> plugging away at it, perhaps I can get lucky.
> Thanks for your help,
> Brian Ackermann
>
> ---
> -- DDL
> ---
> DROP TABLE zz_report
> GO
> CREATE TABLE zz_report
> (
> partNumber varchar(10)
> , customer varchar(10)
> )
>
> INSERT zz_report VALUES ('1000', 'A')
> INSERT zz_report VALUES ('1000', 'A')
> INSERT zz_report VALUES ('1001', 'A')
> INSERT zz_report VALUES ('1002', 'A')
> INSERT zz_report VALUES ('1001', 'B')
> INSERT zz_report VALUES ('1002', 'B')
> INSERT zz_report VALUES ('1002', 'B')
>
> DELETE zz_report FROM zz_report R
> WHERE EXISTS
> (
> SELECT Z.partnumber
> FROM zz_report Z
> WHERE R.partnumber = Z.partnumber
> GROUP BY Z.customer, Z.partnumber
> HAVING count(*) <> 1
> )
>
> SELECT * FROM zz_report
> GO
> DROP TABLE zz_report
> GO|||Jim Underwood wrote:
> Try using a select, without deleting anything...
> Select a.Customer, a.partnumber
> from zz_report as a
> where not exists (select 1
> from zz_report b
> where b.partnumber = a.partnumber
> and b.customer <> a.customer)
Thanks Jim!
This works, at least as far as the small dataset I provided. I'm
running it now on my real data. Unfortunately, its taking quite a long
time to process (about a minute and a half). I'll report back my
findings in a few.|||Brian J. Ackermann wrote
> Thanks Jim!
> This works, at least as far as the small dataset I provided. I'm
> running it now on my real data. Unfortunately, its taking quite a long
> time to process (about a minute and a half). I'll report back my
> findings in a few.
>
Well, ly, its not what I am needing. The very first part I checked
(part "000001") is used in just about every single finished product
produced here, and consequently, is used by nearly every customer. I'll
poke around with it some, because the idea seems good to me...
Brian|||Brian J. Ackermann wrote:
> Well, ly, its not what I am needing. The very first part I checked
> (part "000001") is used in just about every single finished product
> produced here, and consequently, is used by nearly every customer. I'll
> poke around with it some, because the idea seems good to me...
> Brian
Okay, I found a bit of garbage data (some customers with NULL values).
Removing that, I've found that the first hundred items are perfect. The
logic seems strong, so I'm going to go with it.
But, do you have any suggestions for picking up the pace a little bit.
That little query you provided me with takes about 00:01:05 to run. The
table is about 19,000 rows. Perhaps theres some kind of optimization we
can try on it?
Thanks again, I appreciate your help.
Brian Ackermann|||Well, if you post full DDL showing your actual table and indexes, someone
may be able to offer a suggestion.
If partnumber and customer form the primary key of this table (or have a
unique constraint) , then you might try something like...
Select customer, partnumber from zz_report
where partnumber in (
select partnumber
from zz_report
group by partnumber
having count(customer) = 1
)
Having an index on partnumber may help in either case. Note, this is an
index with partnumber as the first column, not the second.
"Brian J. Ackermann" <brianj774@.allmail.net> wrote in message
news:446E1199.8070504@.allmail.net...
> Brian J. Ackermann wrote:
> Okay, I found a bit of garbage data (some customers with NULL values).
> Removing that, I've found that the first hundred items are perfect. The
> logic seems strong, so I'm going to go with it.
> But, do you have any suggestions for picking up the pace a little bit.
> That little query you provided me with takes about 00:01:05 to run. The
> table is about 19,000 rows. Perhaps theres some kind of optimization we
> can try on it?
> Thanks again, I appreciate your help.
> Brian Ackermann|||Jim Underwood wrote:
> Well, if you post full DDL showing your actual table and indexes, someone
> may be able to offer a suggestion.
The DDL is pretty much as I gave it. Its just a temporary table, with
only those two fields. I'll try putting an index on it, and see if that
make a big difference. I expect it will.
Thanks again for your excellent help.
Brian.|||Do you really need the temporary table? If the data is coming from an
existing table, why not access it directly? Or is this data coming from an
application, and not the database?
"Brian J. Ackermann" <brianj774@.allmail.net> wrote in message
news:hbSdnfRYaMeOh_PZRVn-uw@.giganews.com...
> Jim Underwood wrote:
someone
> The DDL is pretty much as I gave it. Its just a temporary table, with
> only those two fields. I'll try putting an index on it, and see if that
> make a big difference. I expect it will.
> Thanks again for your excellent help.
> Brian.
>|||Jim Underwood wrote:
> Do you really need the temporary table? If the data is coming from an
> existing table, why not access it directly? Or is this data coming from a
n
> application, and not the database?
>
I am not sure if I _need_ it, but I can't really think of a better way
to get to it. All of that data comes out of our Bill of Material
Tables, which is a hierarchical data structure, and some of our parts
are only 2 levels deep, and some are 5 levels deep, and even that could
change in the future.
Presently, I am using a cursor to loop over all our 'non-obsolete'
parts, and then a UDF which navigates the BOM, pulling out all the
components along the way. The results of this UDF I put into the temp
table.
I'm quite certain there are other ways to tackle this problem, but that
seemed the most likely to generate easily understood results.
Brian

No comments:

Post a Comment