Convert text with whitespace to CSV
-
Perl, sed, or awk help for a newbie?

I am trying to get my tax preparation done, so I need to get my transaction data into Moneydance. One of my credit card accounts does not offer OFX downloads, just PDFs of the monthly account summaries. I have managed to extract the text from the PDFs, and now I need to transform that text into CSV format to import it. I have roughly a thousand transactions to process. I am on Mac OS X 10.7 and can use the Terminal, but I have no regex skills.

The text is currently formatted as

MMM DD MMM DD PAYEE INFORMATION CITY PROVINCE TRANSACTIONNUMBER COMMENT $AMOUNT

and here are two example records:

DEC 19 DEC 21 RADIO PARADISE 530-872-4993 CA 85450930354980007961385 Foreign Currency-USD 20.00 Exchange rate-1.046500 $20.93
DEC 23 DEC 24 SUNTERRA LENDRUM MA CALGARY AB 55181360357461606795546 $101.45

How can I transform this into a CSV file such as

MMM DD, MMM DD, PAYEE INFORMATION CITY PROVINCE, TRANSACTIONNUMBER COMMENT, AMOUNT

which I can import into my financial software?
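A minimal sketch of the conversion asked for above, written as a shell function wrapping awk (which ships with Mac OS X). Beyond what the question states, it assumes that each record sits on one line as in the two examples, that the transaction number is the only all-digit word of 20 or more digits (both sample numbers have 23), and that the amount is the last word and starts with "$". The name `records_to_csv` is hypothetical. Every field is wrapped in double quotes so a stray comma in a payee name cannot shift the columns, and the "$" is dropped to match the requested AMOUNT column.

```shell
# Hypothetical helper: one-line statement records on stdin -> CSV on stdout.
# Assumptions: one record per line, the transaction number is the only
# all-digit word of 20+ digits, and the last word is the "$" amount.
records_to_csv() {
  awk '
  # Wrap a value in double quotes, doubling any embedded quotes.
  function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
  # Join fields i through j of the current line with single spaces.
  function words(i, j,    s, k) {
    for (k = i; k <= j; k++) s = (k == i) ? $k : s " " $k
    return s
  }
  # A record line starts with two "MMM DD" dates.
  $1 ~ /^[A-Z][A-Z][A-Z]$/ && $2 ~ /^[0-9][0-9]$/ &&
  $3 ~ /^[A-Z][A-Z][A-Z]$/ && $4 ~ /^[0-9][0-9]$/ {
    num = 0
    # Find the transaction number: first all-digit word of 20+ digits.
    for (i = 5; i < NF; i++)
      if ($i ~ /^[0-9]+$/ && length($i) >= 20) { num = i; break }
    # Without a number or a "$" amount, report the line and move on.
    if (num == 0 || $NF !~ /^\$/) { print "skipped: " $0 > "/dev/stderr"; next }
    amount = $NF
    sub(/^\$/, "", amount)
    line = q($1 " " $2) "," q($3 " " $4) "," q(words(5, num - 1))
    line = line "," q(words(num, NF - 1)) "," q(amount)
    print line
  }'
}
```

After loading the function into the shell, something like `records_to_csv < statement.txt > statement.csv` (file names hypothetical) would turn the second example into `"DEC 23","DEC 24","SUNTERRA LENDRUM MA CALGARY AB","55181360357461606795546","101.45"`. Lines that start with two dates but have no transaction number or amount are echoed to standard error for fixing by hand; any other lines are ignored.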
-
Answer:
You can do some hacky Perl. Here's the Perl:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $output = "";

    # Read the extracted statement text from standard input, one line at a time.
    while (<STDIN>) {
        my $entry = $_;

        if ($entry =~ m/([A-Z]{3}\s+\d{2})\s+([A-Z]{3}\s+\d{2})(.*)/) {
            # Date line: "MMM DD MMM DD PAYEE INFORMATION CITY PROVINCE"
            my ($date1, $date2, $payee) = ($1, $2, $3);
            $output .= "$date1, $date2, $payee,";
        } elsif ($entry =~ m/(\d{10,})\s+(.*?)\n/) {
            # Transaction number (10+ digits) followed by a comment.
            # Only matches if something follows the digits on the same
            # line, so a number standing alone on its line is dropped.
            my ($transactionNum, $comment) = ($1, $2);
            $output .= "$transactionNum, $comment";
        } elsif ($entry =~ m/(\$\d+.*)/) {
            # Amount line ends the record.
            my $amount = $1;
            $output .= "$amount\n";
        }
    }
    print "$output\n";

Usage: pipe the text into the script with cat. I just tested it, and it seems to work:

    cat testinput | perl test-script
    DEC 19, DEC 21, RADIO PARADISE 530-872-4993 CA,85450930354980007961385, Foreign Currency-USD 20.00 Exchange rate-1.046500$20.93
    DEC 23, DEC 24, SUNTERRA LENDRUM MA CALGARY AB,$101.45
v-tach at Ask.Metafilter.Com
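The Perl answer above assumes each record spans three lines (dates and payee; transaction number and comment; amount). In its sample output the second record lost its transaction number, most likely because the number regex only matches when something follows the digits on the same line, and the first record has no comma before the amount. Below is a sketch of the same line-by-line idea in awk that accepts a bare number line and extra comment lines. The multi-line layout is an assumption taken from that answer, not confirmed by the question, and `multiline_to_csv` is a hypothetical name.

```shell
# Sketch, assuming each record spans several lines, as the Perl answer does:
#   MMM DD MMM DD PAYEE ...          (date line, starts a record)
#   TRANSACTIONNUMBER [COMMENT ...]  (comment optional)
#   [more comment lines]             (assumed possible)
#   $AMOUNT                          (ends the record)
# multiline_to_csv is a hypothetical name; it reads stdin and writes CSV.
multiline_to_csv() {
  awk '
  # Wrap a value in double quotes, doubling any embedded quotes.
  function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
  # Drop a trailing carriage return so CRLF files also work.
  { sub(/\r$/, "") }
  # Date line: remember both dates and the rest of the line as the payee.
  /^ *[A-Z][A-Z][A-Z] +[0-9][0-9] +[A-Z][A-Z][A-Z] +[0-9][0-9]/ {
    d1 = $1 " " $2; d2 = $3 " " $4; payee = $5
    for (i = 6; i <= NF; i++) payee = payee " " $i
    detail = ""; inrec = 1
    next
  }
  # Amount line: strip the "$" and print the finished record.
  inrec && /^ *\$/ {
    amount = $1; sub(/^\$/, "", amount)
    print q(d1) "," q(d2) "," q(payee) "," q(detail) "," q(amount)
    inrec = 0
    next
  }
  # Anything else inside a record is transaction number and/or comment text.
  inrec && NF > 0 {
    $1 = $1    # collapse runs of whitespace to single spaces
    detail = (detail == "") ? $0 : detail " " $0
  }'
}
```

Because the amount line, not the number line, closes each record, a record whose number has no comment still keeps its number, and the comma before the amount comes from the join rather than from the input.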
Other answers
The input file has only white space and returns.
v-tach
Success with both scripts!!! Thanks to introp and lyra4!!!
v-tach
I just checked the Smultron preferences: I had it set to leave line endings alone. I've changed it to Unix line endings and will try again.
v-tach
Your banktest.txt *does* have Mac line endings. It looks like your version of awk doesn't appreciate that, for some reason. (Or, at least, I get the same strange-looking parse errors as you do without running mac2unix on the stream.) If you don't have mac2unix on your machine, try:

    cat banktest.txt | sed -e 's/\r/\n/g' | ./records.awk
introp
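The sed command above depends on sed understanding the `\r` and `\n` escapes. GNU sed does, but the BSD sed shipped with Mac OS X may not, which could explain why sed also choked on the Mac line endings later in this thread. The `tr` utility does accept `\r` and `\n`, so a sketch like the following can check a file for carriage returns and convert CR-only endings to Unix newlines. The helper names `has_cr` and `cr_to_lf` are hypothetical.

```shell
# Hypothetical helpers built on tr, which understands the \r and \n escapes.
has_cr() {
  # Succeeds (exit status 0) if the named file contains a carriage return.
  [ -n "$(tr -cd '\r' < "$1")" ]
}
cr_to_lf() {
  # Turn every carriage return on stdin into a newline (CR-only Mac -> Unix).
  tr '\r' '\n'
}
```

For example, `cr_to_lf < banktest.txt | ./records.awk`. For Windows-style CRLF files, `tr -d '\r'` (delete instead of replace) avoids producing blank lines between records.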
I tried the Acrobat paste; it's still not formatted. I saved the file as text with Smultron. Perhaps it didn't save with Mac line endings?
v-tach
Hm. I suppose I could've used sed to convert the format, too.
retypepassword
That's funny you noted that, introp. I've been trying to do it with sed, and it doesn't like the Mac line endings, either, so I did: perl -p -e 's/\r/\n/g' < banktest.txt > banktestfixed.txt, and then sed behaved properly.
retypepassword
Well, I can process your data file with no trouble *if* I convert your Mac line endings into my platform's format:

    $ cat banktest.txt | mac2unix | ./records.awk
    DEC 13,DEC 14,AMAZON.CA AMAZON.CA ON ,55490530347000817023251,$147.97
    DEC 13,DEC 15,CRESTWOOD APOTHECARY(P EDMONTON AB ,75259110347920789651500,$147.82
    DEC 14,DEC 15,ITUNES 800-676-2775 ON ,55490530348000069666079,$14.99
    DEC 15,DEC 17,ORIGINAL JOE'S EDMONTON AB ,55134420350800116193147,$128.93
    DEC 15,DEC 16,MOUNTAIN EQUIP CO-OP EDMONTON AB,55134420349800172792422,$32.55
    DEC 16,DEC 17,SOUNDSPECTRUM INC 02123444400 NY ,55460290350200229000282 Foreign Currency-USD 19.95 Exchange rate-1.032581,$20.60
    DEC 16,DEC 17,THE WILDBIRD GENERA EDMONTON AB,55181360350463608305461,$76.59
    ... etc. etc. etc.

If I don't include the mac2unix, I get trashed lines like you do. I would've *thought* all Mac utilities would understand native Mac line endings ('\r'), but apparently not?
introp
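On why the Mac utilities don't understand '\r': the command-line tools in Mac OS X are Unix tools, and they treat only the newline ('\n') as the end of a line. awk, however, lets you change its record separator `RS`, so it can read a CR-terminated file directly without mac2unix. A minimal sketch, where `show_cr_records` is a hypothetical name, prints each record's number, field count, and text, separated by tabs:

```shell
# Hypothetical demo: make awk treat a carriage return as the end of a record.
# Default whitespace field splitting still applies, so NF counts the words.
show_cr_records() {
  awk 'BEGIN { RS = "\r" } { printf "%d\t%d\t%s\n", NR, NF, $0 }'
}
```

The same `BEGIN { RS = "\r" }` block could be placed at the top of an awk script such as records.awk; since that script isn't shown in the thread, whether it needs other changes is unknown.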
What about copying the tabular data from the PDF into a spreadsheet? Opening a PDF in Acrobat will let you "Copy as Table" and paste that into whatever spreadsheet you prefer. Any transformations after that should be cake to do in the spreadsheet.
sub-culture