Running a data only website?

  • Any ideas on starting or running a website just for data and charts? I love data but hate the news. I always wonder, "How do they know Americans drove 50 million fewer miles last month?" Where do they get that data? I want to start a website just for data and charts, where anyone can upload data and anyone can search for it. The only limitations would be: the data has to be in Excel format (ubiquitous), and there has to be a citation (web link) to the original source of the data. Any ideas on how to construct and manage such a thing?

  • Answer:

    It's really not my desire to piss in your cornflakes, but there are already several websites doing this: http://services.alphaworks.ibm.com/manyeyes/home, http://www.swivel.com/, and http://dabbledb.com/ (see in particular http://dabbledb.com/explore/commons/). These are just the ones I can think of off the top of my head; there are probably others.

gnossos at Ask.Metafilter.Com

Other answers

I reckon you're best off starting it as a blog, using your own knowledge and enthusiasm to build up some content and a small audience with virtually no investment in coding. If it takes off then you'll find people will send you URLs and data and you might decide to add community/sharing features to make the most of this and let the site grow. If you start by immediately building a more advanced data-sharing tool then you'll spend far more time and money, be in direct competition with some deeper pockets, and still have to kickstart it by providing/encouraging contributions.

malevolent

"data has to be in Excel format": what a PITA to download, and not as ubiquitous as you might think. Even the Census Bureau provides CSV files. Plus, what do you do about Excel 2007 files vs. 2003 files, etc.? I think you'd do better to focus on data/charts linked to some particular topic of interest to you, i.e. provide depth rather than breadth.

desjardins
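To illustrate the Excel 2003 vs. 2007 point: the two are different formats on disk (.xls is an OLE2 compound file, .xlsx is a ZIP archive of XML parts), so a site accepting both would have to tell them apart before parsing. Below is a minimal Python sketch, assuming uploads arrive as raw bytes. `sniff_format` is a hypothetical helper name, and treating any UTF-8 text as CSV is a deliberately naive assumption.

```python
import io
import zipfile

# Magic numbers at the start of each file type.
XLS_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (Excel 97-2003 .xls; other OLE2 files like .doc also match)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header (Excel 2007+ .xlsx is a ZIP archive)


def sniff_format(data: bytes) -> str:
    """Hypothetical helper: return 'xls', 'xlsx', 'csv', or 'unknown'."""
    if data.startswith(XLS_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        # Many formats are ZIP archives; an .xlsx workbook contains xl/workbook.xml.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                if "xl/workbook.xml" in zf.namelist():
                    return "xlsx"
        except zipfile.BadZipFile:
            pass
        return "unknown"
    # Naive assumption: anything that decodes as UTF-8 text is treated as CSV.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "csv"
```

In practice a site would probably pass the detected .xls/.xlsx files to a spreadsheet library and convert them to one internal format, which is part of why plain CSV uploads are simpler to handle.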

malevolent: I like the suggestion of a low-key approach. desjardins: Hmmm, CSVs, good idea. You're right, they're easier to upload.

gnossos
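If the site accepted CSV uploads and required a citation link, as proposed in the question, the upload check might look something like the minimal Python sketch below. `validate_submission` is a hypothetical name, and the specific rules it enforces (an http/https source URL, a header row plus at least one data row, equal-width rows) are illustrative assumptions, not requirements stated in the thread.

```python
import csv
import io
from urllib.parse import urlparse


def validate_submission(csv_text: str, source_url: str) -> list[list[str]]:
    """Hypothetical upload check: parse the CSV and require a citation link.

    Raises ValueError if the citation is not an http(s) URL, if there is no
    header plus at least one data row, or if rows have differing widths.
    """
    # Rule 1: the submission must cite its original source with a web link.
    parsed = urlparse(source_url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("a citation link (http/https URL) to the original source is required")

    # Rule 2: the data must parse as a non-empty, rectangular CSV table.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]  # skip blank lines
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one data row")
    width = len(rows[0])
    for i, row in enumerate(rows[1:], start=2):  # i counts non-blank rows, header = 1
        if len(row) != width:
            raise ValueError(f"non-blank row {i} has {len(row)} columns, expected {width}")
    return rows
```

Keeping the citation as a required field alongside the file, rather than something buried inside the spreadsheet, makes it easy to display and search next to each dataset.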

Maybe you could frame your blog posts in the form of a question, for example, the one in your OP: "How do they know Americans drove ...?" Then answer the question with data + charts. (Obviously, it will take you some work to go find them.) As the site gains popularity, have people ask/answer others' questions. Kind of like AskMe but for data.

desjardins
