Sunday, March 25, 2012

Deleting lines from a text file

I have a flat file that look like this

############################################################
# Market Issue Lookup
# Applies to: Muni,Pfd,Govt,Corp
###########################################################
LU_MARKET_ISSUE| |AUSTRALIAN|Corp|
LU_MARKET_ISSUE| |BULLDOG|Corp|
LU_MARKET_ISSUE| |CANADIAN|Corp|
LU_MARKET_ISSUE| |WARRANTS|Muni|
LU_MARKET_ISSUE| |YANKEE|Corp|

############################################################
# Maturity Type Lookup

and i want it to look like this

LU_MARKET_ISSUE| |AUSTRALIAN|Corp|
LU_MARKET_ISSUE| |BULLDOG|Corp|
LU_MARKET_ISSUE| |CANADIAN|Corp|
LU_MARKET_ISSUE| |WARRANTS|Muni|
LU_MARKET_ISSUE| |YANKEE|Corp|

Basically remove any line that start with a "#" or any blank lines..

I am assuming you can do this only using a script component and not directly through ssis..but i am not too familiar with scripting...so some code would be helpful

Thanks for any help in advance.

smathew

You don't need a script component for that. Use a derived colum to trim leading spaces and then a conditional split to discard all rows that start with '#'|||

but when it reads the file.. I am assuming you have to use a Flat File Source...

in that case all the lines are read into a single column...

true that you will be able to delete empty lines and lines that start with a #, but you end up having the rest in a single column instead of having 4 coulmns.

|||

You could add a second dataflow where you read the file after removing the '#' rows. This time, the flat file source has 4 columns instead a single one. This is just in case you don't want to use the script component. A drawback is that you need to read the file twice.

|||Given your example, I'd follow Rafael's recommendation. If the file is large, and you don't want to process it twice, take a look at http://agilebi.com/cs/blogs/jwelch/archive/2007/05/08/handling-flat-files-with-varying-numbers-of-columns.aspx. This has some example script for taking in the flat file and parsing it. You could alter the script slightly to look for and discard the "#" rows.

No comments:

Post a Comment