How would you design a question answering project?
-
I was just thinking about designing a question answering project. The system should read emails/messages from users, analyze their content, classify them, and then answer them as accurately as possible. Let's assume I can already analyze the emails and classify them by topic and other parameters. I would appreciate your ideas and recommendations on the abstract draft I have right now. Let's say all the questions are somehow related to Computer Science.

What I'm thinking is to:
- Crawl a website which could ideally have all the possible answers/articles related to Computer Science. Possible free crawling tools: Nutch
- Parse the data and index the collection using .
- Improve request classification using MALLET or any other free tools.
- Develop answer extraction from the retrieved content (from crawling).

What would be your abstract, architecture-level ideas and recommendations, and which open source software/tools would you recommend?
-
Answer:
There is no single website that has all the answers in Computer Science; you would need to query a web-scale index. Stack Overflow perhaps comes closest. But if you just mine one specific Q&A site, your project is unlikely to offer any benefit over using that site directly, except as a simple email gateway.

For a more realistic test, you could:
1. Index Wikipedia.
2. Collect questions and answers from Stack Overflow for benchmarking.
3. Run those questions against the Wikipedia index, extracting/creating the answers from the index with your project:
   3.1 Query processing: transform the natural language question into an index query, possibly splitting it into several sub-queries.
   3.2 Retrieval: fetch the documents from the index that contain the answer or parts of the answer.
   3.3 Extraction: extract the answer or partial facts, and combine partial facts into an answer.
4. Benchmark your answers against those from Stack Overflow, e.g. on Amazon Mechanical Turk.

For more complex queries it is likely that no single document contains the final answer, and then you need to combine facts located across different documents.

As an alternative to indexing Wikipedia yourself, you could use the API of a search engine that already has Wikipedia indexed, and focus solely on query processing and answer extraction.

NLP
- http://en.wikipedia.org/wiki/Natural_language_user_interface
- https://en.wikipedia.org/wiki/Outline_of_natural_language_processing

Open Source NLP libraries
- Carrot2 http://project.carrot2.org/
- Mahout http://mahout.apache.org/
- Kea http://www.nzdl.org/Kea/index.html
- SharpNLP http://sharpnlp.codeplex.com/
- OpenNLP http://opennlp.sourceforge.net/projects.html
- NLTK http://nltk.org/

NLP API
- Alchemy http://www.alchemyapi.com/
- Semantria https://semantria.com/
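Steps 3.1 to 3.3 above can be sketched in a few lines of plain Python. This is only a toy illustration under loud assumptions: the three-document `CORPUS` stands in for a real Wikipedia index, and the names `TinyIndex`, `tokenize`, `extract_answer` and `answer`, the stop-word list, and the TF-IDF scoring are all hypothetical choices made for the example, not something the answer prescribes. A real system would use a proper index and real NLP tooling.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, just large enough for this toy example.
STOP_WORDS = {"a", "an", "the", "is", "are", "what", "which", "how", "does",
              "do", "of", "in", "to", "for", "and", "or", "it", "on", "by"}

# Toy stand-in for an indexed Wikipedia (assumption: title -> article text).
CORPUS = {
    "Hash table": "A hash table maps keys to values. A hash function computes "
                  "an index into an array of buckets. Lookup takes constant "
                  "time on average.",
    "Binary search": "Binary search finds a target value in a sorted array. "
                     "Each step halves the search interval. The running time "
                     "of binary search is logarithmic in the array length.",
    "Quicksort": "Quicksort is a divide and conquer sorting algorithm. "
                 "It partitions the array around a pivot element.",
}


def tokenize(text):
    """3.1 Query processing: lowercase, split into words, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOP_WORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


class TinyIndex:
    """In-memory TF-IDF index over a dict of documents."""

    def __init__(self, docs):
        self.docs = docs
        self.tf = {title: Counter(tokenize(text))
                   for title, text in docs.items()}
        df = Counter()
        for counts in self.tf.values():
            df.update(counts.keys())
        n = len(docs)
        # Smoothed inverse document frequency.
        self.idf = {term: math.log((n + 1) / (d + 1)) + 1.0
                    for term, d in df.items()}

    def search(self, query_terms, k=2):
        """3.2 Retrieval: titles of the k best-scoring documents."""
        scores = {}
        for title, counts in self.tf.items():
            score = sum(counts[t] * self.idf.get(t, 0.0) for t in query_terms)
            if score > 0:
                scores[title] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


def extract_answer(query_terms, text):
    """3.3 Extraction: the sentence sharing the most query terms."""
    terms = set(query_terms)
    return max(split_sentences(text),
               key=lambda s: len(terms & set(tokenize(s))))


def answer(question, index):
    query_terms = tokenize(question)
    hits = index.search(query_terms)
    if not hits:
        return None  # nothing in the index matches the question
    return extract_answer(query_terms, index.docs[hits[0]])


index = TinyIndex(CORPUS)
print(answer("What is the running time of binary search?", index))
```

With this corpus the question retrieves the "Binary search" article and returns its last sentence. The sketch only extracts from the single best document; combining partial facts across several documents, as needed for more complex queries, is exactly the part it leaves out.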
Wolf Garbe at Quora
Other answers
You should check out Siri, which does this for a collection of verticals.
Greg Lindahl