How to make a web crawler?

How can a web app listen to the web and learn about a specific topic?

There are many topics on the web that people may be interested in. For some topics, there are already specialized websites, e.g., http://weather.com for weather and espn.com for sports. Not all topics are as well organized as these two. For the rest, we usually have to dig out the information ourselves using search engines. My question is whether there is a way to make a web app teach itself to be the "expert" on a specific topic, digest the web content, and present it to subscribed users in a nice way. Does it have to continuously crawl the whole web and filter out the related information, or are there better ways?
Answer:

For a web app to really become an "expert", you would have to invent Artificial General Intelligence, or "Strong AI". Good luck there. But you could certainly achieve a lot by having a crawler collect data from the web and present it in a useful way. Think about what Google News does, for example. To start down that path, you might want to look into the Machine Learning techniques known as "clustering" and "classification". Also look into "concept mining" and the Linked Data initiative. I think you'll find that there is, indeed, a lot of cool stuff you can do in this regard.

As to whether or not you would need to crawl the entire Web... it depends. If you want to be as close as possible to 100% sure that you find all the relevant content, then you would probably need a crawler that covers the whole Web. The downside is that now you're talking about needing Google or Yahoo scale infrastructure. But if you don't have that exact requirement, you have a few other options:

1. Use data from http://CommonCrawl.org
2. Only index content from sites with RSS feeds. You could develop a much more limited, and less intensive, indexer that only indexes data from RSS provided links, and pair that with a crawler that just looks for more feeds. You would still wind up indexing an awful lot of data, but I think that would be easier than building a Google scale crawler from scratch.

At the very least, if you seeded your system with feeds from sites with high quality content, you could start producing useful results almost immediately.

Also, depending on just how "smart" you want your system to appear to be, you might find some value in exploring something like http://en.wikipedia.org/wiki/Literature-based_discovery and related techniques.
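To make the feed-based option a bit more concrete, here is a minimal Python sketch of the idea: parse the items out of an RSS 2.0 feed and keep only the ones that look relevant to a topic. Everything specific in it is an assumption for illustration: the `SAMPLE_RSS` feed and its example.com links are made up, the function names `parse_items`, `topic_score` and `filter_by_topic` are hypothetical, and simple keyword counting stands in for the real classification or clustering step, which would need something considerably smarter.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed; in a real system this text would be
# downloaded from the RSS URL of a site you chose as a seed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>New web crawler framework released</title>
      <link>http://example.com/crawler</link>
      <description>An open source crawler for indexing feeds.</description>
    </item>
    <item>
      <title>Local team wins championship</title>
      <link>http://example.com/sports</link>
      <description>Fans celebrate after the final game.</description>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return one dict (title, link, description) per RSS 2.0 <item>."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


def topic_score(item, keywords):
    """Count how many of the keywords occur as whole words in the item text."""
    text = (item["title"] + " " + item["description"]).lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    return sum(1 for kw in keywords if kw.lower() in words)


def filter_by_topic(items, keywords, min_score=1):
    """Keep items scoring at least min_score, most relevant first."""
    scored = [(topic_score(it, keywords), it) for it in items]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [it for _, it in kept]


if __name__ == "__main__":
    for it in filter_by_topic(parse_items(SAMPLE_RSS), ["crawler", "indexing"]):
        print(it["title"], it["link"])
```

In a fuller system, the keyword scorer is the piece you would swap for a trained classifier, and the feed text would come from a small crawler that only hunts for more feed URLs.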

Phillip Rhodes at Quora
