How to extract information from a text file in Python?

Convert text with whitespace to csv

  • perl or sed or awk help for newbie

    I am trying to get my tax preparation done and so I need to get info into Moneydance. One of my credit card accounts does not give OFX downloads, just PDFs of the monthly account summaries, and I have managed to extract the text from the PDFs and I now need to transform the text into CSV format to import it. I have roughly a thousand transactions to process. I have MacOS 10.7 and can use the terminal but no regex skills.

    The text is currently formatted as

    MMM DD MMM DD PAYEE INFORMATION CITY PROVINCE TRANSACTIONNUMBER COMMENT $AMOUNT

    and here is an example of two records:

    DEC 19 DEC 21 RADIO PARADISE 530¯872¯4993 CA 85450930354980007961385 Foreign Currency¯USD 20.00 Exchange rate¯1.046500 $20.93
    DEC 23 DEC 24 SUNTERRA LENDRUM MA CALGARY AB 55181360357461606795546 $101.45

    How can I transform this to a CSV file such as

    MMM DD, MMM DD, PAYEE INFORMATION CITY PROVINCE, TRANSACTIONNUMBER COMMENT, AMOUNT

    which I can import into my financial software?

  • Answer:

    You can do some hacky perl. Here's the perl:

    #!/usr/bin/perl
    my %times;
    my $output = "";
    # Read the statement text line by line from standard input.
    while (<STDIN>) {
        my $entry = $_;
        my $date1;
        my $date2;
        my $payee;
        my $transactionNum;
        my $comment;
        my $amount;
        if ($entry =~ m/([A-Z]{3}\s+\d{2})\s+([A-Z]{3}\s+\d{2})(.*)/) {
            # Line with the two dates followed by the payee information.
            $date1 = $1;
            $date2 = $2;
            $payee = $3;
            $output .= "$date1, $date2, $payee,";
        } elsif ($entry =~ m/(\d{10,})\s+(.*?)\n/) {
            # Line with the long transaction number and an optional comment.
            $transactionNum = $1;
            $comment = $2;
            $output .= "$transactionNum, $comment";
        } elsif ($entry =~ m/(\$\d+.*)/) {
            # Line with the dollar amount, which ends the record.
            $amount = $1;
            $output .= "$amount\n";
        }
    }
    print "$output\n";

    -------end of perl----

    Usage is like: cat the input file and pipe it to perl running the script. I just tested, it seems to work:

    cat testinput | perl test-script
    DEC 19, DEC 21, RADIO PARADISE 530¯872¯4993 CA,85450930354980007961385, Foreign Currency¯USD 20.00 Exchange rate¯1.046500$20.93
    DEC 23, DEC 24, SUNTERRA LENDRUM MA CALGARY AB,$101.45

v-tach at Ask.Metafilter.Com
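
For readers who came here for a Python approach, the same transformation can be sketched with the standard `re` and `csv` modules. This is a minimal sketch, not the script from the thread: `RECORD` and `records_to_csv` are hypothetical names, and the code assumes every record starts with two `MMM DD` dates, contains a transaction number of at least 20 digits, and ends with a `$` amount with two decimals. It writes the transaction number and the comment as separate columns and drops the `$` sign, which is only one reading of the requested layout.

```python
import csv
import io
import re

# Assumed record shape: two dates, payee text, a long transaction number,
# an optional comment, and a dollar amount. DOTALL lets a record span lines.
RECORD = re.compile(
    r"([A-Z]{3}\s+\d{1,2})\s+"   # transaction date, e.g. "DEC 19"
    r"([A-Z]{3}\s+\d{1,2})\s+"   # posting date, e.g. "DEC 21"
    r"(.*?)\s+"                  # payee, city, province (shortest match)
    r"(\d{20,})\s*"              # transaction number (assumed 20+ digits)
    r"(.*?)\s*"                  # optional comment
    r"\$([\d,]+\.\d{2})",        # amount, captured without the "$"
    re.DOTALL,
)


def records_to_csv(text):
    """Return CSV text with one row per record found in `text`."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for match in RECORD.finditer(text):
        # Collapse runs of whitespace (including newlines) inside each field.
        writer.writerow([" ".join(field.split()) for field in match.groups()])
    return out.getvalue()
```

Because the pattern is matched against the whole text rather than line by line, it should not matter whether a record sits on one line or is wrapped across several. A record that is missing its transaction number or its amount would be merged into the following one, so it is worth checking the row count against the statements.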


Other answers

The input file has only white space and returns.

v-tach

Success!!! with both scripts!!! Thanks to introp and lyra4!!!

v-tach

Just checked Smultron preferences and I had it set to leave line endings alone, changed it to Unix line endings and will try again.

v-tach

Your banktest.txt *does* have Mac line endings. It looks like your version of awk doesn't appreciate that, for some reason. (Or, at least, I get the same strange-looking parse errors as you do without running mac2unix on the stream.) If you don't have mac2unix on your machine, try:

cat banktest.txt | sed -e 's/\r/\n/g' | ./records.awk

introp

Tried the Acrobat paste; still not formatted. I saved the file as text with Smultron; perhaps it didn't save with Mac line endings?

v-tach
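
One way to check which line endings a saved file actually uses is to count them in the raw bytes. A minimal sketch; `count_line_endings` is a hypothetical helper name, not something from the thread, and it expects the file contents read in binary mode.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CR, and lone LF line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                       # Windows style
        "CR": data.count(b"\r") - crlf,     # classic Mac style
        "LF": data.count(b"\n") - crlf,     # Unix style
    }
```

A file saved with classic Mac line endings would show a nonzero `CR` count and zero for the other two.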

Hm. I suppose I could've used sed to convert the format, too.

retypepassword

That's funny you noted that, introp. I've been trying to do it with sed, and it doesn't like the Mac line endings, either, so I did: perl -p -e 's/\r/\n/g' < banktest.txt > banktestfixed.txt, and then sed behaved properly.

retypepassword
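
The same line-ending conversion can also be done in Python. A minimal sketch, assuming the file is small enough to read into memory; `normalize_newlines` and `convert_file` are hypothetical helper names.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so a Windows-style ending does not become two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src: str, dst: str) -> None:
    """Read `src` as raw bytes and write it to `dst` with LF line endings."""
    with open(src, "rb") as infile:
        data = infile.read()
    with open(dst, "wb") as outfile:
        outfile.write(normalize_newlines(data))
```

Calling `convert_file("banktest.txt", "banktestfixed.txt")` would correspond to the perl one-liner, using the file names from this thread. Unlike a plain `\r` to `\n` substitution, this version treats a CRLF pair as a single line ending.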

Well, I can process your data file with no trouble *if* I convert your Mac line endings into my platform's format:

$ cat banktest.txt | mac2unix | ./records.awk
DEC 13,DEC 14,AMAZON.CA AMAZON.CA ON ,55490530347000817023251,$147.97
DEC 13,DEC 15,CRESTWOOD APOTHECARY(P EDMONTON AB ,75259110347920789651500,$147.82
DEC 14,DEC 15,ITUNES 800¯676¯2775 ON ,55490530348000069666079,$14.99
DEC 15,DEC 17,ORIGINAL JOE’S EDMONTON AB ,55134420350800116193147,$128.93
DEC 15,DEC 16,MOUNTAIN EQUIP CO¯OP EDMONTON AB,55134420349800172792422,$32.55
DEC 16,DEC 17,SOUNDSPECTRUM INC 02123444400 NY ,55460290350200229000282 Foreign Currency¯USD 19.95 Exchange rate¯1.032581,$20.60
DEC 16,DEC 17,THE WILDBIRD GENERA EDMONTON AB,55181360350463608305461,$76.59
... etc. etc. etc.

If I don't include the mac2unix, I get trashed lines like you do. I would've *thought* all Mac utilities would understand native Mac line endings ('\r'), but apparently not?

introp

What about copying the tabular data from the PDF into a spreadsheet? Opening a PDF in Acrobat will let you "Copy as Table" and paste that into whatever spreadsheet you prefer. Performing transformations from there on the spreadsheet should be cake after that.

sub-culture
