How would you design a question answering project?

  • I was just thinking about designing a question answering project. The system should read emails/messages from users, analyze their content, classify them, and then answer them as accurately as possible. Let's assume I can analyze the emails and classify them by topic and other parameters. I would appreciate your ideas and recommendations on the abstract draft I have right now. Let's say all the questions are somehow related to Computer Science. What I'm thinking is to:

    1. Crawl a website that would ideally have all the possible answers/articles related to Computer Science. Possible free crawling tool: Nutch.
    2. Parse the data and index the collection using .
    3. Improve request classification using MALLET or any other free tool.
    4. Develop answer extraction from the retrieved (crawled) content.

    What abstract, architectural ideas and recommendations would you have, and which open source software/tools would you recommend?

  • Answer:

    No single website has all the answers to Computer Science questions. You would need to query a web-scale index.

    Stack Overflow perhaps comes closest. But if you just mine one specific Q&A site, your project is unlikely to offer any benefit over using that site directly, except as a simple email gateway.

    For a more realistic test, you could:

    1. Index Wikipedia.
    2. Collect questions and answers from Stack Overflow for benchmarking.
    3. Run those questions against the Wikipedia index, extracting or creating the answers from the index using your project:
        3.1 Query processing: transform the natural language question into an index query, possibly splitting it into several sub-queries.
        3.2 Retrieval: fetch the documents from the index that contain the answer or parts of the answer.
        3.3 Extraction: extract the answer or partial facts, and combine partial facts into an answer.
    4. Benchmark your answers against those from Stack Overflow, e.g. on Amazon Mechanical Turk.

    For more complex queries, there is likely no single document that contains the final answer, and you then need to combine facts located across different documents.

    As an alternative to indexing Wikipedia yourself, you could use the API of a search engine that has already indexed Wikipedia, and focus solely on query processing and answer extraction.

    NLP
    http://en.wikipedia.org/wiki/Natural_language_user_interface
    https://en.wikipedia.org/wiki/Outline_of_natural_language_processing

    Open Source NLP libraries
    Carrot2 http://project.carrot2.org/
    Mahout http://mahout.apache.org/
    Kea http://www.nzdl.org/Kea/index.html
    SharpNLP http://sharpnlp.codeplex.com/
    OpenNLP http://opennlp.sourceforge.net/projects.html
    NLTK http://nltk.org/

    NLP API
    Alchemy http://www.alchemyapi.com/
    Semantria https://semantria.com/
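The query processing, retrieval, and extraction steps (3.1 to 3.3) in the answer above can be sketched roughly as follows. This is only a minimal illustration, assuming a tiny in-memory TF-IDF index in place of a real Wikipedia index; the names `STOPWORDS`, `tokenize`, `TinyIndex`, and `extract_answer` are hypothetical and do not come from the answer or from any of the libraries it lists.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately short stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "are", "what", "which", "how", "does",
             "do", "of", "in", "to", "and", "for", "it", "on", "with", "by"}


def tokenize(text):
    """Lowercase the text, keep alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]


class TinyIndex:
    """In-memory inverted index with TF-IDF scoring (stand-in for a real index)."""

    def __init__(self, docs):
        self.docs = docs                    # doc_id -> raw text
        self.postings = defaultdict(dict)   # term -> {doc_id: term frequency}
        for doc_id, text in docs.items():
            for term, tf in Counter(tokenize(text)).items():
                self.postings[term][doc_id] = tf

    def idf(self, term):
        """Smoothed inverse document frequency; rarer terms weigh more."""
        df = len(self.postings.get(term, {}))
        return math.log((1 + len(self.docs)) / (1 + df)) + 1.0

    def search(self, terms, k=3):
        """Return the ids of the top-k documents ranked by summed TF-IDF."""
        scores = Counter()
        for term in terms:
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * self.idf(term)
        return [doc_id for doc_id, _ in scores.most_common(k)]


def extract_answer(index, question, k=3):
    """Return the best-matching sentence from the top-k documents, or None."""
    terms = tokenize(question)                      # 3.1 query processing
    best, best_score = None, 0.0
    for doc_id in index.search(terms, k):           # 3.2 retrieval
        for sentence in re.split(r"(?<=[.!?])\s+", index.docs[doc_id]):
            words = set(tokenize(sentence))
            # 3.3 extraction: score a sentence by the query terms it contains
            score = sum(index.idf(t) for t in terms if t in words)
            if score > best_score:
                best, best_score = sentence, score
    return best
```

A call such as `extract_answer(TinyIndex(docs), "What is the average time complexity of quicksort?")` returns the single highest-scoring sentence from the retrieved documents. The sketch does not split a question into sub-queries or combine facts from several documents, which the answer identifies as necessary for more complex queries.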

Wolf Garbe at Quora

Other answers

You should check out Siri, which does this for a collection of verticals.

Greg Lindahl
