What's the best way to extract phrases from a corpus of text using Python?
I've been looking into some of the features of NLTK, and the section on "Information Extraction" seems to be along the lines of what I'm looking to do: http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html Am I headed in the right direction? The limitation seems to be that it is only truly effective at chunking noun phrases. This is a follow-up question to .
Answer:
To train an NLTK chunker, see http://streamhacker.com/2009/02/23/chunk-extraction-with-nltk/. Like NLTK, MontyLingua (http://web.media.mit.edu/~hugo/montylingua/) is an end-to-end NLP framework; it uses common-sense knowledge and doesn't require training. It also supports Python, so you can use the MontyLingua chunker. If you want to extract key phrases, use the Python term extractor topia.termextract (http://pypi.python.org/pypi/topia.termextract/), which uses POS-tag rules to extract important phrases.
Vineet Yadav at Quora
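To make the idea of extracting phrases with POS-tag rules concrete, here is a minimal sketch using only the standard library. It is not the algorithm used by topia.termextract, MontyLingua, or NLTK; the function name extract_noun_phrases and the rule "zero or more adjectives followed by one or more nouns" are assumptions chosen for illustration. The input is assumed to be already tagged with Penn Treebank tags (for example by a tagger such as the one in NLTK); the tags below are written by hand.

```python
def extract_noun_phrases(tagged_tokens):
    """Collect runs matching (adjective)* (noun)+ from (word, tag) pairs.

    Hypothetical helper for illustration: tags are assumed to follow the
    Penn Treebank convention (JJ* = adjective, NN* = noun).
    """
    phrases = []
    adjectives = []  # adjectives seen before the nouns of the current phrase
    nouns = []       # nouns of the phrase being built

    def flush():
        # Emit the current phrase only if it contains at least one noun.
        if nouns:
            phrases.append(" ".join(adjectives + nouns))
        adjectives.clear()
        nouns.clear()

    for word, tag in tagged_tokens:
        if tag.startswith("NN"):
            nouns.append(word)
        elif tag.startswith("JJ"):
            if nouns:
                flush()  # an adjective after a noun starts a new phrase
            adjectives.append(word)
        else:
            flush()      # any other tag ends the current phrase
    flush()
    return phrases


# Hand-tagged example sentence (the tags are an assumption, not tagger output).
tagged = [
    ("Strong", "JJ"), ("computer", "NN"), ("security", "NN"),
    ("requires", "VBZ"),
    ("regular", "JJ"), ("software", "NN"), ("updates", "NNS"),
    (".", "."),
]
phrases = extract_noun_phrases(tagged)
print(phrases)
```

A rule this simple only finds noun phrases, which matches the limitation raised in the question; other phrase types would need additional rules or a trained chunker.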
Other answers
It depends on how you define a phrase. My answer is about collocations, which are roughly token combinations that appear in a text more often than they would be statistically likely to appear by chance (e.g. in many texts the phrase "San Francisco" appears more often than the individual frequencies of the tokens "San" and "Francisco" would predict).
For some theory, please see Chapter 5 of Foundations of Statistical Natural Language Processing (Manning & Schütze): http://nlp.stanford.edu/fsnlp/promo/colloc.pdf
https://radimrehurek.com/gensim/models/phrases.html and http://www.nltk.org/howto/collocations.html show how to find collocations using gensim and NLTK, respectively.
This blog post by Mark Needham gives a nice explanation of using gensim to find phrases: http://www.markhneedham.com/blog/2015/02/12/pythongensim-creating-bigrams-over-how-i-met-your-mother-transcripts/
Yuval Feinstein
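As a rough illustration of the collocation idea in the answer above, the following sketch scores adjacent token pairs by pointwise mutual information (PMI), one of several association measures used for collocations. It uses only the standard library and is not how gensim or NLTK implement their collocation finders; the function name pmi_bigrams, the min_count threshold, and the toy text are assumptions made for the example.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent token pairs by pointwise mutual information.

    PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities
    estimated from raw counts. Hypothetical helper for illustration.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpus, already lowercased and whitespace-tokenized (an assumption).
text = (
    "i moved to san francisco last year . san francisco is foggy . "
    "i like the fog in san francisco . the year is long ."
)
tokens = text.split()
ranked = pmi_bigrams(tokens)
for pair, score in ranked:
    print(pair, round(score, 2))
```

On a real corpus the min_count threshold matters: PMI tends to overrate pairs of rare words, so a frequency cutoff or a different measure (such as a likelihood ratio) is usually worth trying.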
How about the solution found here: https://github.com/cirlabs/citizen-quotes/ It's a wonderful solution by https://twitter.com/chasedavis
Shola Smith
There are quite a lot of text processing examples at http://streamhacker.com, and they use NLTK.
Dima Kuchin
After many hours of checking various APIs, we decided to go with TextRazor.
Quality: the NLP phrase extraction and classification results are superb. TextRazor uses Freebase and DBpedia (among other repositories), which allows it to classify, categorize, and extract phrases such as "computer security" correctly as one entity (many other APIs incorrectly classify this example as one class for "computer" and another for "security"). Programmatic control over which terms TextRazor will and will not use is, again, very simple.
Speed: TextRazor is amazingly fast. If I understand correctly, it uses parallel computing on many (hundreds? thousands?) of Amazon on-demand machines.
Cost: we compared it to others and did an in-depth analysis against one of their competitors (a very large three-letter company), and they are definitely competitive and reasonable.
Integration: integrating with their API using Python was (relatively) straightforward, except for a minor issue with HTTPS when working locally on the Web2Py framework. If you hit an obstacle while using TextRazor on Web2Py locally, feel free to ping me and I'll gladly share our solution.
Service and support: almost instantaneous. They usually reply to all inquiries within 12 hours.
Disclosure: I have no interests, shares, or any other financial benefits related to TextRazor, and we are actually still on their free plan, so we haven't paid them yet for their API services.
Dan Toren