How to make a web crawler?

Make a Web Crawler/Spider

  • I'm looking into making a web crawler/spider, but I need someone to point me in the right direction to get started. Basically, my spider is going to search for audio files and index them. I'm just wondering if anyone has any ideas for how I should do it. I've heard that doing it in PHP would be extremely slow. I know VB.NET, so could that come in handy? I was thinking about using Google's filetype search to get links to crawl; would that be OK? Thanks, guys.

  • Answer:

    In VB.NET you first need to fetch the HTML, using either the WebClient class or the HttpWebRequest and HttpWebResponse classes. There is plenty of information online on how to use these. Then you need to parse the HTML; I recommend using regular expressions for this. Your idea of using Google with a filetype search is a good one. I did a similar thing a few years ago to gather PDFs to test PDF indexing in SharePoint, and it worked really well.
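    The two steps described above (fetch the HTML, then pull links out with a regular expression) can be sketched roughly as follows. This is not the answerer's code: it is written in Python rather than VB.NET purely for brevity, with urllib.request standing in for WebClient. The function names, the User-Agent string, and the list of audio extensions are all assumptions made for illustration. The regex is deliberately naive and only catches quoted href values, so treat it as a starting point rather than a robust HTML parser.

```python
import re
import urllib.request
from urllib.parse import urljoin, urlparse

# Assumed list of extensions; adjust to the audio formats you want to index.
AUDIO_EXTENSIONS = (".mp3", ".ogg", ".wav", ".flac", ".m4a")

# Naive pattern for quoted href values. It misses unquoted attributes and
# can also match links that sit inside comments or scripts.
HREF_PATTERN = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (the WebClient step)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "audio-indexer-sketch/0.1"}  # made-up name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html, base_url):
    """Return absolute http(s) URLs for every quoted href found in the HTML."""
    links = []
    for href in HREF_PATTERN.findall(html):
        absolute = urljoin(base_url, href)  # resolve relative links
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def is_audio_link(url):
    """True when the URL path ends in one of the assumed audio extensions."""
    return urlparse(url).path.lower().endswith(AUDIO_EXTENSIONS)


def find_audio_links(html, base_url):
    """Split a page's links into audio files to index and pages to crawl next."""
    audio, pages = [], []
    for link in extract_links(html, base_url):
        (audio if is_audio_link(link) else pages).append(link)
    return audio, pages
```

    A crawler would call fetch_html on a starting URL, pass the result to find_audio_links, record the audio URLs in its index, and queue the remaining page URLs for later visits while keeping track of which URLs it has already seen.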

Belgin Fish at Stack Overflow

Other answers

Here is a link to a tutorial on how to write a web crawler in Java: http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/ I'm sure that if you search for it you can find tutorials for other languages.

qw3n
