I need to delete all rows that match at least one of the account_id values of another row *and* that have the same email address. However, if they have the same email address but share none of the account_id values, I need to keep them. I've attached a sample dataset along with the expected results.
I have this:
DELETE [acctID_emailAddress_tmp] FROM [acctID_emailAddress_tmp]
JOIN
(select emailaddress, account_id, max(contact_id_tmp) max_cid
from [acctID_emailAddress_tmp]
group by emailaddress, account_id) AS tempImportTable
ON tempImportTable.[emailaddress] = [acctID_emailAddress_tmp].[emailaddress]
 WHERE [acctID_emailAddress_tmp].[contact_id_tmp] < tempImportTable.[max_cid]
 AND tempImportTable.[account_id] = [acctID_emailAddress_tmp].[account_id];
but it doesn't work, since it keeps a subset of the duplicate row(s).
Can someone shed some light?
TIA

So why do you keep 2 instead of 1?
EDIT: You won't be able to answer that one... unless you say that it's because it comes "first"... is that it?

Because it comes first :)

OK
USE Northwind
GO
SET NOCOUNT ON
CREATE TABLE myTable99(temp_id int, contact_id_tmp int, account_id int, emailAddress varchar(255))
GO
INSERT INTO myTable99(temp_id, contact_id_tmp, account_id, emailAddress)    
SELECT 1, 1, 3,  'test@acme.com' UNION ALL
SELECT 2, 1, 5,  'test@acme.com' UNION ALL
SELECT 3, 1, 10, 'test@acme.com' UNION ALL
SELECT 4, 2, 12, 'test@acme.com' UNION ALL
SELECT 5, 2, 3,  'test@acme.com' UNION ALL
SELECT 6, 2, 9,  'test@acme.com'
GO
SELECT *
  FROM (SELECT *
          FROM myTable99
         WHERE contact_id_tmp IN (SELECT MAX(contact_id_tmp)
                                    FROM (SELECT contact_id_tmp, emailAddress
                                            FROM myTable99
                                        GROUP BY contact_id_tmp, emailAddress
                                          HAVING COUNT(*) > 1) AS XXX)) AS L
  JOIN (SELECT *
          FROM myTable99
         WHERE contact_id_tmp NOT IN (SELECT MAX(contact_id_tmp)
                                        FROM (SELECT contact_id_tmp, emailAddress
                                                FROM myTable99
                                            GROUP BY contact_id_tmp, emailAddress
                                              HAVING COUNT(*) > 1) AS XXX)) AS R
    ON L.account_id = R.account_id
GO
SET NOCOUNT OFF
DROP TABLE myTable99
GO
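For anyone following the set logic above, here is a rough Python sketch of one reading of that query (just an illustration against the sample rows, not part of the solution): among contacts sharing an email address, the contact with the highest contact_id_tmp survives, and any lower contact that shares at least one account_id with it has all of its rows deleted.

```python
from collections import defaultdict

# (temp_id, contact_id_tmp, account_id, emailAddress) -- the sample rows above
rows = [
    (1, 1, 3,  "test@acme.com"),
    (2, 1, 5,  "test@acme.com"),
    (3, 1, 10, "test@acme.com"),
    (4, 2, 12, "test@acme.com"),
    (5, 2, 3,  "test@acme.com"),
    (6, 2, 9,  "test@acme.com"),
]

def dedupe(rows):
    # collect the set of account_ids per (email, contact_id_tmp)
    accounts = defaultdict(set)
    for _, contact, account, email in rows:
        accounts[(email, contact)].add(account)

    doomed = set()
    for (email, contact) in list(accounts):
        # highest contact_id_tmp sharing this email address
        top = max(c for (e, c) in accounts if e == email)
        # a lower contact that shares an account with the top one is deleted
        if contact != top and accounts[(email, contact)] & accounts[(email, top)]:
            doomed.add((email, contact))

    return [r for r in rows if (r[3], r[1]) not in doomed]

survivors = dedupe(rows)
print(survivors)  # only the contact_id_tmp = 2 rows (temp_id 4, 5, 6) remain
```

Contact 1 and contact 2 share account_id 3, so contact 1 loses all three of its rows, which matches the "keep 2 instead of 1" result discussed above.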
You can convert that into a DELETE... let me know if you can't.

Thanks very much. Looks like this will do the trick, if I ever figure out the delete statement for it =\

Did you cut and paste the example into QA and run it?
Also, I would run that as a SELECT to see the affected population BEFORE you unleash the delete... oh, and make sure you back up the table BEFORE you do the DELETE.

Yes, I ran it in QA and got the expected results. I need to delete R.contact_id_tmp.
Currently trying to figure out the delete statement...

Come on... that's too easy:
DELETE FROM myTable99
 WHERE contact_id_tmp IN (
       SELECT R.contact_id_tmp
         FROM (SELECT *
                 FROM myTable99
                WHERE contact_id_tmp IN (SELECT MAX(contact_id_tmp)
                                           FROM (SELECT contact_id_tmp, emailAddress
                                                   FROM myTable99
                                               GROUP BY contact_id_tmp, emailAddress
                                                 HAVING COUNT(*) > 1) AS XXX)) AS L
         JOIN (SELECT *
                 FROM myTable99
                WHERE contact_id_tmp NOT IN (SELECT MAX(contact_id_tmp)
                                               FROM (SELECT contact_id_tmp, emailAddress
                                                       FROM myTable99
                                                   GROUP BY contact_id_tmp, emailAddress
                                                     HAVING COUNT(*) > 1) AS XXX)) AS R
           ON L.account_id = R.account_id)
GO
SELECT * FROM myTable99
GO

OK. This just got uglier... Turns out the accounts will be stored as a comma-separated list in one field. See the attachment for an example.
Now, is it possible to de-dupe when the accounts are a comma-separated list in a single field? I know I could create a new table and loop over each account/emailAddress combination, but I'd like to know if it's possible to avoid that (since it will be much slower) and de-dupe straight from the source table.
Thanks again.

Read this:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=25830&SearchTerms=udf,csv,string
Then create a new table where each value is on its own row, and use the query from earlier in this thread.

I tried the UDF but I'm getting 0 rows back. I've attached the sample dataset (1 row) and the UDF call.

You really should normalize your data... I feel I'm doing more harm than good...
EDIT: This is pure theft from this article (http://www.sqlteam.com/item.asp?ItemID=2652), written by a Most Valuable Yak (http://www.sqlteam.com/forums/pop_profile.asp?mode=display&id=1176), Rob Volk (http://weblogs.sqlteam.com/robv/).
USE Northwind
GO
SET NOCOUNT ON
CREATE TABLE myTable99(temp_id int IDENTITY(1,1), contact_id_tmp int, account_id int, emailAddress varchar(255))
CREATE TABLE myTable00(contact_id int, accounts varchar(200), emailAddress varchar(255))
CREATE TABLE Tally(ID int)
GO
INSERT INTO myTable00(contact_id, accounts, emailAddress)
SELECT 1, '1,2,3,10,15', 'test@acme.com'  UNION ALL
SELECT 2, '5,10,20,30',  'test@acme.com'  UNION ALL
SELECT 3, '88,42',       'test@acme.com'  UNION ALL
SELECT 4, '1,2,3',       'test2@acme.com'
GO
DECLARE @x int
SELECT @x = 1
WHILE @x < 100
  BEGIN
	INSERT INTO Tally(ID) SELECT @x
	SELECT @x = @x + 1
  END
GO
INSERT INTO myTable99(contact_id_tmp, Account_Id, emailAddress)
SELECT contact_id AS contact_id_tmp, 
NullIf(SubString(',' + accounts + ',' , ID , CharIndex(',' , ',' + accounts + ',' , ID) - ID) , '') AS Account_Id,
emailAddress
FROM Tally, myTable00 
WHERE ID <= Len(',' + accounts + ',') AND SubString(',' + accounts + ',' , ID - 1, 1) = ',' 
AND CharIndex(',' , ',' + accounts + ',' , ID) - ID > 0  --remove this line to keep NULL rows
SELECT * FROM myTable99
GO
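To see what the Tally-table trick above is actually computing, here is a small Python sketch of the same idea (an illustration only, with the loop standing in for the Tally join): in the padded string ',' + accounts + ',', every position that sits just after a comma starts a value, and the next comma (CHARINDEX) ends it.

```python
def split_csv(accounts: str):
    s = "," + accounts + ","
    values = []
    # the Tally table plays the role of this loop over candidate positions
    for ID in range(2, len(s) + 1):          # 1-based IDs, like the SQL
        if s[ID - 2] == ",":                 # SUBSTRING(s, ID - 1, 1) = ','
            end = s.index(",", ID - 1)       # CHARINDEX(',', s, ID), 0-based here
            if end - (ID - 1) > 0:           # skip empty values (the NULL rows)
                values.append(s[ID - 1:end])
    return values

print(split_csv("1,2,3,10,15"))  # ['1', '2', '3', '10', '15']
```

In the real query the Tally table supplies every candidate ID at once, so the whole split happens as a single set-based INSERT instead of a loop.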
-- Now just add the other SQL
SET NOCOUNT OFF
DROP TABLE myTable99
DROP TABLE myTable00
DROP TABLE Tally
GO|||This is exactly what I needed, but...users have decided to go back and separate each account in individual rows!!
Thanks a lot for your help. I guess I'll check out sqlteam.com from now on before posting here. :)

Even if they do... the data is still not normalized if you need an operation like this... anyway, it was a neat exercise...

This is just 1/3 of the entire process. The data is normalized in its final state.