How to scrape data from a website?

Web scraping of paginated data on a web page (C#)?

  • I have been given an interesting problem at work. We scrape the Dept. of Labor & Industries website to get information on contractors, which is then used to populate some fields in our web portal for our insurance agents. Recently, they changed their site. To access some of the data, you now have to click a link on the page, which causes a postback, and a table with the additional information then shows up. The site is built with ASP.NET. Personally, I have never done any website scraping. Getting the HTML and parsing the data I need from it is not a problem; I am using the HttpWebRequest/HttpWebResponse objects to get the HTML. The issue is how to cause the postback to occur for a specific control on the page. I imagine ViewState will come into play. Since I have never done anything like this, I am not sure how to approach the problem, and I am looking for suggestions on how best to approach it. Thanks.

  • Answer:

    I would start by installing Firebug in Firefox and using its Net panel to look at exactly what is being posted to the server. The Net panel has a Post tab that shows everything being sent. And yes, that probably does include ViewState: in ASP.NET Web Forms it travels in the hidden __VIEWSTATE field, alongside __EVENTTARGET, which names the control that triggered the postback. You can then add this same information to your HttpWebRequest by writing it to the request stream. See link below for an example of how to do that.
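
    As a rough illustration of writing the captured form data to the request stream, here is a minimal sketch in C#. It assumes the page is a standard ASP.NET Web Forms page whose hidden inputs render the name attribute before the value attribute, and that the link posts back through __EVENTTARGET. The class name PostbackScraper, its method names, and the control ID used in the usage note are hypothetical. The real page may need extra form fields; whatever the Post tab shows is the authority.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of any library.
public static class PostbackScraper
{
    // Returns the decoded value of a hidden <input> (e.g. __VIEWSTATE),
    // or "" if it is not found. Assumes name="..." precedes value="...".
    public static string ExtractHiddenField(string html, string name)
    {
        string pattern = "<input[^>]*name=\"" + Regex.Escape(name)
                       + "\"[^>]*value=\"([^\"]*)\"";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : "";
    }

    // Builds the URL-encoded form body that simulates clicking the control
    // identified by eventTarget (the first argument of __doPostBack).
    public static string BuildPostbackBody(string html, string eventTarget,
                                           string eventArgument = "")
    {
        string[][] fields =
        {
            new[] { "__EVENTTARGET", eventTarget },
            new[] { "__EVENTARGUMENT", eventArgument },
            new[] { "__VIEWSTATE", ExtractHiddenField(html, "__VIEWSTATE") },
            new[] { "__EVENTVALIDATION", ExtractHiddenField(html, "__EVENTVALIDATION") },
        };

        var body = new StringBuilder();
        foreach (string[] field in fields)
        {
            if (body.Length > 0) body.Append('&');
            body.Append(WebUtility.UrlEncode(field[0]))
                .Append('=')
                .Append(WebUtility.UrlEncode(field[1]));
        }
        return body.ToString();
    }

    // POSTs the simulated postback and returns the response HTML.
    // Pass the same CookieContainer used for the initial GET so the
    // session cookie, if any, is sent back to the server.
    public static string PostBack(string url, string html, string eventTarget,
                                  CookieContainer cookies)
    {
        byte[] body = Encoding.UTF8.GetBytes(BuildPostbackBody(html, eventTarget));

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        request.CookieContainer = cookies;

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

    Typical use would be to GET the page first, then call something like PostbackScraper.PostBack(url, html, "ctl00$Main$lnkDetails", cookies), where the control ID is a made-up placeholder for the value the link's href passes to __doPostBack. The regex extraction is only adequate for a quick test; an HTML parser is likely to be more robust if the markup varies.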

Ratchetr at Yahoo! Answers
