List of Natural Language Processing Tools for Sentiment Analysis - Which one do you recommend?
First up, sorry for my not-so-perfect English... I am from Germany ;)

For a research project of mine (Bachelor thesis) I need to analyze the sentiment of English-language tweets about certain companies and brands. For this purpose I will need to script my own program or use some sort of modified open source code (no APIs, because I need to understand what is happening).

Below you will find a list of some of the NLP applications I found. My question now is: which one, and which approach, would you recommend? And which one does not require long nights adjusting the code?

For example, when I screen Twitter for the music player >iPod< and someone writes:

"It's a terrible day but at least my iPod makes me happy"

or even harder:

"It's a terrible day but at least my iPod makes up for it"

Which software is smart enough to understand that the focus is on the iPod and not the weather?

Also, which software is scalable / resource efficient (I want to analyze several tweets and don't want to spend thousands of dollars)?

Machine learning and data mining

- Weka: a collection of machine learning algorithms for data mining. It is one of the most popular text classification frameworks. It contains implementations of a wide variety of algorithms, including Naive Bayes and Support Vector Machines (SVM, listed under SMO). [Note: Other commonly used non-Java SVM implementations are SVM-Light, LibSVM, and SVMTorch.] A related project is Kea (Keyphrase Extraction Algorithm), an algorithm for extracting keyphrases from text documents.
- Apache Lucene Mahout: an incubator project to create highly scalable distributed implementations of common machine learning algorithms on top of the Hadoop map-reduce framework.

NLP Tools

- LingPipe (not technically 'open-source', see below): Alias-i's LingPipe is a suite of Java tools for linguistic processing of text, including entity extraction, part-of-speech (POS) tagging, clustering, classification, etc. It is one of the most mature and widely used NLP toolkits in industry. It is known for its speed, stability, and scalability. One of its best features is the extensive collection of well-written tutorials to help you get started. They have a list of links to competition, both academic and industrial tools. Be sure to check out their blog. LingPipe is released under a royalty-free commercial license that includes the source code, but it's not technically 'open-source'.
- OpenNLP: hosts a variety of Java-based NLP tools which perform sentence detection, tokenization, part-of-speech tagging, chunking and parsing, named-entity detection, and co-reference analysis using the Maxent machine learning package.
- Stanford Parser and Part-of-Speech (POS) Tagger: Java packages for sentence parsing and part-of-speech tagging from the Stanford NLP group. It has implementations of probabilistic natural language parsers, both highly optimized PCFG and lexicalized dependency parsers, and a lexicalized PCFG parser. It is released under the full GNU GPL license.
- OpenFST: a package for manipulating weighted finite state automata. These are often used to represent a probabilistic model. They are used to model text for speech recognition, OCR error correction, machine translation, and a variety of other tasks. The library was developed by contributors from Google Research and NYU. It is a C++ library that is meant to be fast and scalable.
- NLTK: the Natural Language Toolkit is a tool for teaching and researching classification, clustering, part-of-speech tagging and parsing, and more. It contains a set of tutorials and data sets for experimentation. It is written by Steven Bird, from the University of Melbourne.
- Opinion Finder: a system that performs subjectivity analysis, automatically identifying when opinions, sentiments, speculations and other private states are present in text. Specifically, OpinionFinder aims to identify subjective sentences and to mark various aspects of the subjectivity in these sentences, including the source (holder) of the subjectivity and words that are included in phrases expressing positive or negative sentiments.
- Tawlk/osae: a Python library for sentiment classification on social text. The end-goal is to have a simple library that "just works". It should have an easy barrier to entry and be thoroughly documented. We have achieved best accuracy using stopwords filtering with tweets collected on negwords.txt and poswords.txt.
- GATE: GATE is over 15 years old and is in active use for all types of computational task involving human language. GATE excels at text analysis of all shapes and sizes. From large corporations to small startups, from €multi-million research consortia to undergraduate projects, our user community is the largest and most diverse of any system of this type, and is spread across all but one of the continents.
- textir: a suite of tools for text and sentiment mining. This includes the 'mnlm' function, for sparse multinomial logistic regression, 'pls', a concise partial least squares routine, and the 'topics' function, for efficient estimation and dimension selection in latent topic models.
- NLP Toolsuite: the JULIE Lab offers a comprehensive NLP tool suite for the application purposes of semantic search, information extraction and text mining. Most of our continuously expanding tool suite is based on machine learning methods and thus is domain- and language-independent.
- ...

On a side note: would you recommend EC2 as a streaming intermediary? As for me, I am a fan of Python and Java ;)

Thanks a lot for your help!!!
Answer:
To analyze the sentiment of tweets, one approach is to use the label propagation (LP) algorithm as detailed in the paper "Twitter Polarity Classification with Label Propagation over Lexical Links and the Follower Graph".

This approach can be made to run faster by running the label propagation algorithm in a distributed environment. Hadoop is a good place to start with distributed computation.

There is an existing implementation of the LP algorithm on graphs: https://github.com/parthatalukdar/junto

Junto includes both distributed and non-distributed versions of the LP algorithm.
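To make the idea of label propagation more concrete, here is a minimal single-machine sketch in Python. It is not Junto's code and not necessarily the exact variant used in the paper: it is the basic "clamped" form, where seed nodes keep their known polarity and every other node repeatedly takes the weighted average of its neighbors. The function name propagate_labels, the toy graph, the node names and the seed scores are all made up for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=50):
    """Basic clamped label propagation on an undirected weighted graph.

    edges: iterable of (node_a, node_b, weight) tuples.
    seeds: dict mapping a node to a fixed polarity score in [-1.0, 1.0].
    Returns a dict mapping every node in the graph to its propagated score.
    """
    neighbors = defaultdict(list)
    for a, b, weight in edges:
        neighbors[a].append((b, weight))
        neighbors[b].append((a, weight))

    # Unlabeled nodes start neutral; seed nodes start at their known score.
    scores = {node: seeds.get(node, 0.0) for node in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, links in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # clamp seeds every round
                continue
            total_weight = sum(w for _, w in links)
            updated[node] = sum(scores[n] * w for n, w in links) / total_weight
        scores = updated
    return scores


# Hypothetical toy graph: tweets link to the words they contain (lexical
# links), and two tweets by the same made-up user link through a user node.
edges = [
    ("tweet1", "word:happy", 1.0),
    ("tweet1", "word:ipod", 1.0),
    ("tweet2", "word:terrible", 1.0),
    ("tweet2", "word:ipod", 1.0),
    ("tweet3", "word:ipod", 1.0),
    ("tweet3", "user:alice", 1.0),
    ("tweet1", "user:alice", 1.0),
]
# Assumed seed polarities; in practice these would come from a lexicon.
seeds = {"word:happy": 1.0, "word:terrible": -1.0}

scores = propagate_labels(edges, seeds)
for node in ("tweet1", "tweet2", "tweet3"):
    print(node, round(scores[node], 3))
```

In this toy graph tweet1 ends up positive and tweet2 negative, while tweet3, which has no sentiment words of its own, picks up a mildly positive score through the shared user and word nodes. Note that tweet2 is scored negative simply because it is linked to "terrible", so the sketch does nothing about the target problem from the question (iPod versus weather); that would need extra modeling on top.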
Hemanth Kumar Mantri at Quora