What are some really interesting web crawling projects?
I want to build a small project using a web crawler to get familiar with Python. Any ideas for what I should build? It could be a console or a website application. Please share your experience and details if you've ever done this before :)
Answer:
The Udacity CS101 course is the best resource for a beginner, and it's in Python :) http://www.udacity.com/overview/Course/cs101/CourseRev/apr2012 If you just want to build a crawler, skip to the related videos.
Dhiraj Thakur at Quora
Other answers
A nice little project that I've done in the past is a simple 'sites similar to X' recommendation engine. You crawl sites, strip out any HTML tags, and build a word list from the content of each one. Then you just need some metric for comparing how similar they are to one another (I used the Jaccard index, http://en.wikipedia.org/wiki/Jaccard_index, to calculate the similarity of the word lists for each site). The user can then enter a URL and get a list of the most similar sites (it could be a console app; I did mine with a web front end). It's a heap of fun to tweak, and you end up with something pretty usable at the end of it.
Tom Robert
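The 'sites similar to X' idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's actual code: the pages are hard-coded in a hypothetical `PAGES` dictionary instead of being fetched by a crawler, and the names `TextExtractor`, `word_set`, `jaccard`, and `most_similar` are made up for this example. It uses only the standard library (`html.parser`) to strip tags.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects text content, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def word_set(html):
    """Strip the HTML tags and return the set of lowercase words on the page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(target_url, pages, top_n=5):
    """Rank every other page by Jaccard similarity to the target page."""
    target = word_set(pages[target_url])
    scores = [
        (jaccard(target, word_set(html)), url)
        for url, html in pages.items()
        if url != target_url
    ]
    return sorted(scores, reverse=True)[:top_n]


# Hypothetical stand-in for crawled pages (assumption: a real crawler
# would download these, for example with urllib.request).
PAGES = {
    "http://a.example": "<h1>Python web crawler</h1><p>crawl pages with python</p>",
    "http://b.example": "<p>python crawler tutorial</p>",
    "http://c.example": "<p>chocolate cake recipe</p><script>var python = 1;</script>",
}

if __name__ == "__main__":
    for score, url in most_similar("http://a.example", PAGES):
        print(f"{score:.2f}  {url}")
```

Plain word sets are the simplest thing to feed into the Jaccard index; dropping very common words or weighting rare ones are the kinds of tweaks the answer alludes to.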
I've got an ongoing, partially unresolved inquiry into converting sitemaps into mindmaps [1], which in ideal conditions would involve crawling a website, parsing the directory structure into a nested outline, and then inserting the crawled webpage titles into the outline. A cloud-based (web) API would be the ideal form for this. I can further recommend my Meta Guide webpage, "100 Best Web Crawler Videos" [2].
[1] http://www.meta-guide.com/home/ai-engine/mindmap-conversion
[2] http://www.meta-guide.com/home/about/best-of-the-best-videos/100-best-web-crawler-videos
Marcus L Endicott
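The sitemap-to-outline step described above (turn the URL directory structure into a nested outline, then attach page titles) might look something like this. It is a minimal sketch under assumptions: the `CRAWLED` mapping of URL to title is hypothetical and stands in for real crawler output, the function names `build_outline` and `render` are invented for illustration, and the result is a plain indented text outline rather than any particular mindmap format.

```python
from urllib.parse import urlparse


def build_outline(pages):
    """Build a nested tree from URL paths.

    `pages` maps each crawled URL to its page title. Each tree node is a
    dict with a "title" (or None if no page was crawled at that path)
    and a "children" dict keyed by path segment.
    """
    root = {"title": None, "children": {}}
    for url, title in pages.items():
        node = root
        segments = [s for s in urlparse(url).path.split("/") if s]
        for segment in segments:
            node = node["children"].setdefault(
                segment, {"title": None, "children": {}}
            )
        node["title"] = title
    return root


def render(node, name="/", depth=0):
    """Return the tree as a list of indented outline lines."""
    label = f"{name} ({node['title']})" if node["title"] else name
    lines = ["  " * depth + label]
    for child_name in sorted(node["children"]):
        lines.extend(render(node["children"][child_name], child_name, depth + 1))
    return lines


# Hypothetical crawl result (assumption: titles would come from each
# page's <title> element).
CRAWLED = {
    "http://site.example/": "Home",
    "http://site.example/docs/": "Docs",
    "http://site.example/docs/install": "Install guide",
    "http://site.example/blog/2012/hello": "Hello world",
}

if __name__ == "__main__":
    print("\n".join(render(build_outline(CRAWLED))))
```

Intermediate directories that were never crawled (such as `blog` here) still appear in the outline, just without a title.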
http://Import.io is by far the best crawling software. Unfortunately, it will no longer be free starting next month (overpriced, too expensive). The best thing to do is to learn Python and use BeautifulSoup and/or the frameworks built on that library. It has no limits: with some research you can crawl any website, even ones known to be "uncrawlable" (Google results, LinkedIn, ...)
Hamadi Lanouar