Using grep to extract x characters of text after a predictable pattern
-
I'm trying to work out how to use something like grep or sed or awk (or maybe even some Perl) to extract a string of characters which appears in a predictable place in a series of text files. Help/advice/tutorials very welcome!

I have a number of text files in a Google Drive folder, which gets replicated on my Mac, and a new one gets added every day. The files are plain text and they aren't structured formally (as in, they're not XML or anything), but they each contain a certain string ("Total Portfolio Value") which is always followed by a space, followed by "$xx,xxx.xx", followed by five more spaces. The value of $xx,xxx.xx changes every day, and that's what I'm trying to extract, to put into a separate file.

I can use Automator to check whenever a new file appears and run a shell script on the file, so I'm trying to work out what goes in the shell script. As much as anything else, I'm using this as a practical exercise to teach myself a bit about text processing using grep/sed/awk, Perl and regular expressions (any/all of the above!), so just a few pointers about the best approach to the contents of the shell script would be great!
-
Answer:
If you want to extract just the price, you can do that with egrep -o 'Total Portfolio Value \$[^ ]*' /tmp/foo | cut -d$ -f2 | tr -d ,. (Here | means "send the output of the previous command to be the input of the subsequent one", cut -d$ -f2 means "split the input at every $, and extract the second field", and tr -d , means "delete all the commas".)
infinitejones at Ask.Metafilter.Com
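For the Automator side of the question, the pipeline above could be wrapped in a small function that takes a file path. This is only a sketch: the name extract_value and the sample file contents are made up for illustration, and it uses grep -E (the modern spelling of egrep) with + instead of * so that a bare "$" is not accepted.

```shell
#!/bin/sh
# Hypothetical helper: print the portfolio value found in one report file,
# without the dollar sign or the thousands separators.
extract_value() {
    grep -Eo 'Total Portfolio Value \$[^ ]+' "$1" \
        | cut -d'$' -f2 \
        | tr -d ','
}

# Demo on a throwaway sample file (contents invented for illustration).
sample=$(mktemp)
printf 'Summary for today\nTotal Portfolio Value $12,345.67     end\n' > "$sample"
extract_value "$sample"   # prints 12345.67
rm -f "$sample"
```

In the script that Automator runs, the call would presumably be followed by a redirect such as >> some_output_file, so that each day's value is appended to the separate file; the path is up to you.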
Other answers
Here's a solution with egrep and sed that searches for 'Total Portfolio Value', works if there are other numbers in the file (empath's solution returns every $x,x.x in a file, which may not be what you want), and throws away the commas and the dollar sign.

    $ cat test.txt
    Total Portfolio Value $11,1234.56 more text $99,999.99
    Extra line
    $ cat test.txt | egrep -o 'Total Portfolio Value \$[^ ]*' | sed -e 's/^Total\ Portfolio\ Value\ \$//' | sed -e 's/,//g'
    111234.56
caek
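The same idea can probably be collapsed into a single sed call that captures the amount and prints only the capture. A rough sketch, assuming the value consists only of digits, commas and a decimal point; value_with_sed is a hypothetical name and the sample line is invented:

```shell
# Hypothetical helper: read text on stdin, print the value after the label.
value_with_sed() {
    # -n suppresses normal output; the trailing p prints only lines where
    # the substitution succeeded. \( \) captures the amount, \1 emits it.
    sed -n 's/.*Total Portfolio Value \$\([0-9.,]*\).*/\1/p' | tr -d ','
}

printf 'Total Portfolio Value $11,234.56 more text $99,999.99\nExtra line\n' \
    | value_with_sed   # prints 11234.56
```

The second dollar amount on the line is ignored because the capture is tied to the text immediately after the label.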
grep -o '\$[0-9]*,[0-9]*\.[0-9]*'
empath
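That's kind of a quick and dirty way of doing it. It'll basically match $*,*.* where * is any number of digits (including none), so it returns every dollar amount of that shape in the file, not just the one after "Total Portfolio Value".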
empath
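One way to keep that pattern but limit it to the right line is to filter on the label first. A minimal sketch, assuming the label and the value sit on the same line and the portfolio value is the first amount on it; the function name and the sample text are made up:

```shell
# Hypothetical helper: keep only the labelled line, then take the first
# dollar amount that appears on it.
first_amount_on_label_line() {
    grep 'Total Portfolio Value' \
        | grep -o '\$[0-9]*,[0-9]*\.[0-9]*' \
        | head -n 1
}

printf 'Fees $1,000.00\nTotal Portfolio Value $12,345.67     \nCash $2,500.00\n' \
    | first_amount_on_label_line   # prints $12,345.67
```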
Or similarly egrep -o 'Total Portfolio Value \$[^ ]*' /your/file, which will match "Total Portfolio Value " and then everything up to the next space. (Because [^ ] means "anything that isn't a space", so [^ ]* means "as many characters as possible as long as they're not a space", and egrep -o means "search for the following and only return the result of the match".)
katrielalex
In some of these examples, the [^ ]* should really be [^ ]+, i.e. require there to be at least one non-space character. * matches any number of characters, including zero, so some of the examples above would match a line ending in "Total Portfolio Value $". Changing it to [^[:blank:]]+ would be more robust (it would stop at tabs too) but wouldn't guarantee that it was a number which followed the '$' sign. [[:digit:],]+ is better still, but it would also match an arbitrary number of commas with no digits.
epo
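Taking that one step further, the pattern can spell out the shape of the number itself. This is a sketch under the assumption that values always look like 12,345.67 (groups of three digits separated by commas, then exactly two decimals); strict_value is a hypothetical name:

```shell
# Hypothetical helper: accept only amounts shaped like 1,234.56
# (1-3 leading digits, comma-separated groups of 3, two decimals).
strict_value() {
    grep -Eo 'Total Portfolio Value \$[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}' \
        | cut -d'$' -f2 \
        | tr -d ','
}

# The first two lines are rejected; only the well-formed amount is printed.
printf 'Total Portfolio Value $\nTotal Portfolio Value $,,,\nTotal Portfolio Value $12,345.67     \n' \
    | strict_value   # prints 12345.67
```

The trade-off is that a value formatted differently (no decimals, say) would be skipped silently, so the looser patterns may still be the better choice if the format is not guaranteed.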
These all look brilliant. Lots of detail for me to get my head around, which is what I was hoping for! Thanks very much everyone.
infinitejones
None of the answers offered actually match your specification. This will print the number found after "Total Portfolio Value" and a space and a dollar sign, with five spaces after it:

    perl -lne 'print $1 if /Total Portfolio Value \$([\d.,]+ {5})/' input_file.txt
nicwolff
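For comparison, here is a sed sketch of the same check. It is not nicwolff's command: it requires the five trailing spaces but leaves them out of the captured group, so only the number is printed. value_before_five_spaces is a hypothetical name, and sed -E (extended regular expressions) is assumed to be available, as it is in both GNU and BSD/macOS sed.

```shell
# Hypothetical helper: print the amount only when it is followed by
# exactly the five spaces described in the question (or more).
value_before_five_spaces() {
    sed -nE 's/.*Total Portfolio Value \$([0-9.,]+) {5}.*/\1/p'
}

printf 'Total Portfolio Value $12,345.67     next\n' \
    | value_before_five_spaces   # prints 12,345.67
```

Piping the result through tr -d , would strip the comma, as in the earlier answers.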