Thursday, March 29, 2012

Deleting repeated data

The table contains data as follows,

SAP_CUS TOMER_NBRSAP_CUSTOMER_NBR1
A B
B A
C D
D C
C E
E C

A=B is same as B=A

The final table should be as follows,

SAP_CUSTOMER_NBRSAP_CUSTOMER_NBR1
A B
C D
C E

please help me in doing this

Quote:

Originally Posted by venkat81

The table contains data as follows,

SAP_CUS TOMER_NBRSAP_CUSTOMER_NBR1
A B
B A
C D
D C
C E
E C

A=B is same as B=A

The final table should be as follows,

SAP_CUSTOMER_NBRSAP_CUSTOMER_NBR1
A B
C D
C E

please help me in doing this


Provided the duplication set is as you say a single character swapped round like you indicate, the principle is to extract the SAME AS records as you point out and remove them. The logic being to identify the row to extract and leave you with the result set you want. So.. how can this be done?.

Each character will have an ASCII representation, a numeric value. 'A'=65 and B='66' Added together they make 131 so over two rows we are going to see two values of 131.

If we created a third column to store the ASCII value and worked on that we could extract out the MAXIMUM record ID identifying the row, extract those rows into a temporary table compare the tables against each other and remove the rows by the relevant SQL comparison on a LEFT INNER JOIN between the relevant tables.

One word of CAUTION here, The ascii values when summed will give a value remember that 8+5=12 so does 5+7 so the actual theory has limited scope
and is absolutely based on your data provided above, You do NOT want a summation giving you an erronous result value on which you base your delete.

Anyway below is a script I knocked up to assist you in demonstrating the HOWS!! not intended as the SOLUTION obviously as that is for you to deal with If you create a physical table ..like so

CREATE TABLE mytable (
[recid] [int] null ,
[sap_customer_nbr] [char] (1)null ,
[sap_customer_nbr1] [char] (1) null ,
[asciivalue] [int] null )

and populate it with data of the type you provided above and then run the script below in query analyser you will see that it works only on temporary tables, which you can adjust to suit your production environment when you are ready and happy it works for you.

--USE whateverdatabasenamehere --<<<replace database name with yours
SET NOCOUNT ON
--create a temporary table to store the values
CREATE TABLE #mytable (
[recid] [int] null ,
[sap_customer_nbr] [char] (1)null ,
[sap_customer_nbr1] [char] (1) null ,
[asciivalue] [int] null )
--and the insert into this table
INSERT #mytable
--values from the main production table
SELECT * from mytable --<<substitute your table name here the column orders must match
--then update the temporary table with --the ACSII numeric values representing the characters ie A and B summed together=131
UPDATE #mytable
SET asciivalue=ASCII(sap_customer_nbr) + ASCII(sap_customer_nbr1)
--create another temporary table
CREATE TABLE #mylink (
[asciivalue] [int] null ,
[recid] [int] null )
--into which we will insert
INSERT #mylink
-- the MAXIMUM RecordID (RecID) for each Asciivalue which essential only gives
-- ACTUAL recordID to identify the row that we wish ultimately to Delete from the table
SELECT #mytable.asciivalue, MAX(#mytable.recid) AS linkkey
FROM #mytable
GROUP BY #mytable.asciivalue

--the next two lines are left in to merely show you in query analyser the resultsets
SELECT * FROM #mytable
SELECT * FROM #mylink

--we now delete from the first temporary table the rows we wish to remove
DELETE #mytable
FROM #mylink INNER JOIN #mytable ON #mylink.recid = #mytable.recid
--and to finalise we show the results again in query analyser
SELECT * FROM #mytable
--and finally drop the temporary explicitly
DROP TABLE #mylink
DROP TABLE #mytable|||

Quote:

Originally Posted by Jim Doherty

Provided the duplication set is as you say a single character swapped round like you indicate, the principle is to extract the SAME AS records as you point out and remove them. The logic being to identify the row to extract and leave you with the result set you want. So.. how can this be done?.

Each character will have an ASCII representation, a numeric value. 'A'=65 and B='66' Added together they make 131 so over two rows we are going to see two values of 131.

If we created a third column to store the ASCII value and worked on that we could extract out the MAXIMUM record ID identifying the row, extract those rows into a temporary table compare the tables against each other and remove the rows by the relevant SQL comparison on a LEFT INNER JOIN between the relevant tables.

One word of CAUTION here, The ascii values when summed will give a value remember that 8+5=12 so does 5+7 so the actual theory has limited scope
and is absolutely based on your data provided above, You do NOT want a summation giving you an erronous result value on which you base your delete.

Anyway below is a script I knocked up to assist you in demonstrating the HOWS!! not intended as the SOLUTION obviously as that is for you to deal with If you create a physical table ..like so

CREATE TABLE mytable (
[recid] [int] null ,
[sap_customer_nbr] [char] (1)null ,
[sap_customer_nbr1] [char] (1) null ,
[asciivalue] [int] null )

and populate it with data of the type you provided above and then run the script below in query analyser you will see that it works only on temporary tables, which you can adjust to suit your production environment when you are ready and happy it works for you.

--USE whateverdatabasenamehere --<<<replace database name with yours
SET NOCOUNT ON
--create a temporary table to store the values
CREATE TABLE #mytable (
[recid] [int] null ,
[sap_customer_nbr] [char] (1)null ,
[sap_customer_nbr1] [char] (1) null ,
[asciivalue] [int] null )
--and the insert into this table
INSERT #mytable
--values from the main production table
SELECT * from mytable --<<substitute your table name here the column orders must match
--then update the temporary table with --the ACSII numeric values representing the characters ie A and B summed together=131
UPDATE #mytable
SET asciivalue=ASCII(sap_customer_nbr) + ASCII(sap_customer_nbr1)
--create another temporary table
CREATE TABLE #mylink (
[asciivalue] [int] null ,
[recid] [int] null )
--into which we will insert
INSERT #mylink
-- the MAXIMUM RecordID (RecID) for each Asciivalue which essential only gives
-- ACTUAL recordID to identify the row that we wish ultimately to Delete from the table
SELECT #mytable.asciivalue, MAX(#mytable.recid) AS linkkey
FROM #mytable
GROUP BY #mytable.asciivalue

--the next two lines are left in to merely show you in query analyser the resultsets
SELECT * FROM #mytable
SELECT * FROM #mylink

--we now delete from the first temporary table the rows we wish to remove
DELETE #mytable
FROM #mylink INNER JOIN #mytable ON #mylink.recid = #mytable.recid
--and to finalise we show the results again in query analyser
SELECT * FROM #mytable
--and finally drop the temporary explicitly
DROP TABLE #mylink
DROP TABLE #mytable


i have not read the entire solution, but i have one comment, 8 + 5 <> 12...|||

Quote:

Originally Posted by ck9663

i have not read the entire solution, but i have one comment, 8 + 5 <> 12...


It is where I come from hahahaha

No comments:

Post a Comment