Where do I initialize a database?

Where can I get a database/corpus containing sentences that are classified into the following emotions: 1) happy 2) sad 3) angry 4) surprise 5) disgust 6) neutral?

  • example: "I am angry" should belong to class 3. My task is to classify a sentence into one of the six classes mentioned above. I am using NLTK as my classifier. For those who don't know what NLTK is, here is the link: http://www.nltk.org/. To train NLTK I need a database/corpus of sentences that are already classified into these categories. If you know of any such database/corpus, or of a simple and fast method to create my own, please point me to it.

  • Answer:

    http://www.lrec-conf.org/proceedings/lrec2012/pdf/201_Paper.pdf describes building a similar corpus. The authors might be willing to share the data.

    Rada Mihalcea, an expert on emotions in text, built the Affective Text dataset: http://www.cse.unt.edu/~rada/downloads.html#affective

    Alternatively, you can use semi-supervised learning: start with a small set of sentences that you label with emotions manually, and then use sentence similarity (say, the EM algorithm, as in http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.154.3651&rep=rep1&type=pdf) to automatically label other sentences.
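    The semi-supervised idea can be illustrated with a minimal sketch. Note that this is not the EM algorithm from the linked paper: it swaps in plain self-training with word-overlap (Jaccard) similarity, which is simpler to read. The function names, the 0.4 threshold, and the example sentences are all assumptions made for illustration, and only the Python standard library is used.

```python
# Self-training sketch (an assumption, not the EM method from the cited paper):
# each unlabeled sentence copies the label of its most similar labeled
# sentence, but only when the similarity clears a threshold.

def tokens(sentence):
    """Lowercase the sentence and return its set of whitespace-separated words."""
    return set(sentence.lower().split())

def jaccard(a, b):
    """Word-overlap similarity between two token sets, in the range [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def self_train(seed, unlabeled, threshold=0.4, max_rounds=5):
    """Grow the labeled set from `seed`, a list of (sentence, label) pairs.

    Returns (labeled, remaining): the enlarged labeled list and the
    sentences that never became similar enough to receive a label.
    """
    labeled = list(seed)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        newly_labeled, still_unlabeled = [], []
        for sentence in remaining:
            best_label, best_score = None, 0.0
            for known, label in labeled:
                score = jaccard(tokens(sentence), tokens(known))
                if score > best_score:
                    best_label, best_score = label, score
            if best_score >= threshold:
                newly_labeled.append((sentence, best_label))
            else:
                still_unlabeled.append(sentence)
        if not newly_labeled:
            break  # nothing new was labeled, so further rounds cannot help
        labeled.extend(newly_labeled)
        remaining = still_unlabeled
    return labeled, remaining

# Hypothetical hand-labeled seed sentences and unlabeled sentences.
seed = [
    ("I am angry", "angry"),
    ("I am so happy today", "happy"),
    ("I feel sad and lonely", "sad"),
]
unlabeled = [
    "I am very angry",
    "I am so happy",
    "she feels sad and lonely",
    "the train leaves at noon",
]

labeled, remaining = self_train(seed, unlabeled)
for sentence, label in labeled:
    print(label, "<-", sentence)
print("left unlabeled:", remaining)
```

    Sentences that stay below the threshold are left unlabeled rather than forced into a class, so they can be reviewed by hand. The resulting (sentence, label) pairs would still need to be converted into feature dictionaries before being passed to an NLTK classifier.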

Yuval Feinstein at Quora

Other answers

Not exactly what you want, but http://boston.lti.cs.cmu.edu/classes/95-865/HW/HW2/ has a corpus for sentiment analysis from Twitter and IMDB, as well as corpora for other classification tasks.

Prakash B Pimpale
