Showing posts with label account_id. Show all posts
Showing posts with label account_id. Show all posts

Thursday, March 22, 2012

Deleting dupes in special cases

I need to delete all rows that match at least one of the account_id values of another row *and* that has the same email address. However, if they have the same email address and none of the account_id values then I need to keep it. I've attached a sample dataset along with the expected results.

I have this:
DELETE [acctID_emailAddress_tmp] FROM [acctID_emailAddress_tmp]
JOIN
(select emailaddress, account_id, max(contact_id_tmp) max_cid
from [acctID_emailAddress_tmp]
group by emailaddress, account_id) AS tempImportTable
ON tempImportTable.[emailaddress] = [acctID_emailAddress_tmp].[emailaddress]
WHERE [acctID_emailAddress_tmp].[contact_id_tmp] < tempImportTable.[max_cid]
AND tempImportTable.[account_id] = [acctID_emailAddress_tmp].[account_id];

but it doesn't work since it's keeping the subset of the dupe row(s).

Can someone shed some light?

TIASo why do you keep 2 instead of 1?

EDIT: You won't be able to answer that one..unless you say that because it comes "first"...is that it?|||Because it comes first :)|||OK

USE Northwind
GO

SET NOCOUNT ON
CREATE TABLE myTable99(temp_id int, contact_id_tmp int, account_id int, emailAddress varchar(255))
GO

INSERT INTO myTable99(temp_id, contact_id_tmp, account_id, emailAddress)
SELECT 1, 1, 3, 'test@.acme.com' UNION ALL
SELECT 2, 1, 5, 'test@.acme.com' UNION ALL
SELECT 3, 1, 10, 'test@.acme.com' UNION ALL
SELECT 4, 2, 12, 'test@.acme.com' UNION ALL
SELECT 5, 2, 3, 'test@.acme.com' UNION ALL
SELECT 6, 2, 9, 'test@.acme.com'
GO

SELECT *
FROM ( SELECT * FROM myTable99 l
WHERE contact_id_tmp IN (
SELECT MAX(contact_id_tmp)
FROM (
SELECT contact_id_tmp, emailAddress
FROM myTable99
GROUP BY contact_id_tmp, emailAddress
HAVING COUNT(*) > 1) AS XXX)
) AS L
JOIN ( SELECT * FROM myTable99 l
WHERE contact_id_tmp NOT IN (
SELECT MAX(contact_id_tmp)
FROM (
SELECT contact_id_tmp, emailAddress
FROM myTable99
GROUP BY contact_id_tmp, emailAddress
HAVING COUNT(*) > 1) AS XXX)
) AS R
ON L.account_id = R.account_Id
GO

SET NOCOUNT OFF
DROP TABLE myTable99
GO

You can convert that in to a delete...let me know if you can't|||Thanks very much. Looks like this will do the trick if I ever figure out the delete statement for it =\|||Did you cut and paste the example in to QA and run it?

Also, I would use that as SELECT to see the population BEFORE you unleash the delete...oh and make sure you back up the table BEFORE you do the DELETE|||Yes, I ran it in QA and got the expected results. I need to delete R.contact_id_tmp

Currently trying to figure out the delete statement...|||Come on...that's too easy

DELETE FROM myTable99 WHERE contact_id_tmp IN (
SELECT R.contact_id_tmp
FROM ( SELECT * FROM myTable99 l
WHERE contact_id_tmp IN (
SELECT MAX(contact_id_tmp)
FROM (
SELECT contact_id_tmp, emailAddress
FROM myTable99
GROUP BY contact_id_tmp, emailAddress
HAVING COUNT(*) > 1) AS XXX)
) AS L
JOIN ( SELECT * FROM myTable99 l
WHERE contact_id_tmp NOT IN (
SELECT MAX(contact_id_tmp)
FROM (
SELECT contact_id_tmp, emailAddress
FROM myTable99
GROUP BY contact_id_tmp, emailAddress
HAVING COUNT(*) > 1) AS XXX)
) AS R
ON L.account_id = R.account_Id
)

GO

SELECT * FROM myTable99
GO|||OK. This just got uglier... Turns out accounts will be stored as a comma separated list in one field. See the attachment for an example.

Now, is it possible to de-dupe from a comma separated list all in one field? I know I could create a new table and loop over each account-emailAddress combo, but I'd like to know if it's possible to avoid doing this (since it will be much slower) and de-dupe straight from the source table.

Thanks again|||Read this

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=25830&SearchTerms=udf,csv,string

Then create a new table where all the values are on 1 row, then use the query in this thread|||I tried the UDF but I'm getting 0 rows back. I've attached the sample dataset (1 row) and the UDF call.|||You really should normalized your data...I feel I'm doing more harm than good...

EDIT: This is pure theft from this Article (http://www.sqlteam.com/item.asp?ItemID=2652) written by A most Valuable Yak (http://www.sqlteam.com/forums/pop_profile.asp?mode=display&id=1176) , Rob Volk (http://weblogs.sqlteam.com/robv/)

USE Northwind
GO

SET NOCOUNT ON
SET NOCOUNT ON
CREATE TABLE myTable99(temp_id int IDENTITY(1,1), contact_id_tmp int, account_id int, emailAddress varchar(255))
CREATE TABLE myTable00(contact_id int, accounts varchar(200), emailAddress varchar(255))
CREATE TABLE Tally(ID int)
GO

INSERT INTO myTable00(contact_id, accounts, emailAddress)
SELECT 1, '1,2,3,10,15', 'test@.acme.com' UNION ALL
SELECT 2, '5,10,20,30', 'test@.acme.com' UNION ALL
SELECT 3, '88,42', 'test@.acme.com' UNION ALL
SELECT 4, '1,2,3', 'test2@.acme.com'
GO

DECLARE @.x int
SELECT @.x = 1
WHILE @.x < 100
BEGIN
INSERT INTO Tally(ID) SELECT @.x
SELECT @.x = @.x + 1
END
GO

INSERT INTO myTable99(contact_id_tmp, Account_Id, emailAddress)
SELECT contact_id AS contact_id_tmp,
NullIf(SubString(',' + accounts + ',' , ID , CharIndex(',' , ',' + accounts + ',' , ID) - ID) , '') AS Account_Id,
emailAddress
FROM Tally, myTable00
WHERE ID <= Len(',' + accounts + ',') AND SubString(',' + accounts + ',' , ID - 1, 1) = ','
AND CharIndex(',' , ',' + accounts + ',' , ID) - ID > 0 --remove this line to keep NULL rows

SELECT * FROM myTable99
GO

-- Now just add the other SQL

SET NOCOUNT OFF
DROP TABLE myTable99
DROP TABLE myTable00
DROP TABLE Tally
GO|||This is exactly what I needed, but...users have decided to go back and separate each account in individual rows!!

Thanks a lot for your help. I guess I'll check out sqlteam.com from now on before posting here. :)|||Even if they do...the data is still not normalized if you need an operation like this...anyway, it was a neat exercise...|||This is just 1/3 of the entire process. The data is normalized in its final state.