How to scrape data from a website?

PHP: how to extract content or scrape datasets from a website's source page

  • I would like to know how to scrape content from a website's source code using PHP. I have tried http://simplehtmldom.sourceforge.net/ and also looked at http://stackoverflow.com/questions/3577641/best-methods-to-parse-html-with-php, but I'm still having a hard time getting the information out of the source code.

    The main page lists links to the authors, each with the years and the number of books written:

        <div id="fleft">
        <ul>
        <li><a href="http://www.books.com/john-smith/index.html">John Smith (2011-2012)</a> : 11 books
        <li><a href="http://www.books.com/bobby-bob/index.html">Bobby Bob (2011-2012)</a> : 89 books
        ....
        </ul>
        </div>

    When I click on John Smith, it opens the list of books that John Smith wrote:

        <h1>John Smith (11 Books)</h1>
        <div id="fleft">
        <ul>
        <li><a href="http://www.books.com/john-smith/best-book.html">Best Book</a>
        <li><a href="http://www.books.com/john-smith/other-best-book.html">Other Best Book</a>
        ....
        </ul>
        </div>

    When I click on one of the books, "Best Book", it shows the title of the book, the author, and the whole story of the book:

        <div id="bookbox">
        <h1>Book : Best Book</h1>
        <h2>Aurther : John Smith</h2>
        <pre> story of the best book...... ....... .... the end </pre>

    I would like to grab all the author names and their years, the list of books, and the content of each book, essentially as a dataset. Can someone help me or show me a PHP code sample to make this happen? I would like to create a database of all the authors' names, the years of their lives, the books they created, book titles, categories, book content, etc.

  • Answer:

    You should mention which approach you are using to get the HTML of the target page. I suppose you already have the HTML of the target page in a $targetHTML variable; you can load it into a DOM like this:

        /*********** Load In Dom *********/
        libxml_use_internal_errors(true); // don't print warnings for sloppy real-world markup
        $html = new DOMDocument;
        $html->loadHTML($targetHTML);
        $xPath = new DOMXPath($html);
        /*********** Load In Dom *********/

    You can then use XPath to fetch the data you want from the HTML loaded in the DOM. If you are already using this approach, please show your code so the problem can be found.

    Regards
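Applying the DOMDocument/DOMXPath approach to the three page layouts from the question could look roughly like the minimal sketch below. It assumes the real pages match the question's snippets exactly: lists inside div#fleft, a div#bookbox holding h1, h2 and pre, and author links whose text looks like "John Smith (2011-2012)" followed by " : 11 books". The function names loadXPath, parseAuthorIndex, parseBookList and parseBook are hypothetical. Fetching the pages and writing the rows to a database are left out.

```php
<?php
// Minimal sketch: extract the fields shown in the question with DOMDocument + DOMXPath.
// Assumption: the real pages match the question's snippets (div#fleft link lists,
// div#bookbox with h1/h2/pre). All function names here are hypothetical.

/** Parse an HTML string and return an XPath object for querying it. */
function loadXPath(string $html): DOMXPath
{
    $doc = new DOMDocument();
    // Real-world HTML (e.g. unclosed <li>) makes libxml emit warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return new DOMXPath($doc);
}

/** Index page: name, years, book count and link for each author in div#fleft. */
function parseAuthorIndex(string $html): array
{
    $xPath = loadXPath($html);
    $authors = [];
    foreach ($xPath->query('//div[@id="fleft"]//li/a') as $a) {
        // Link text is assumed to look like "John Smith (2011-2012)".
        if (!preg_match('/^(.*?)\s*\((\d{4})-(\d{4})\)$/', trim($a->textContent), $m)) {
            continue;
        }
        // The count is the text node right after the link, e.g. " : 11 books".
        $next = $a->nextSibling;
        $count = null;
        if ($next !== null && preg_match('/(\d+)\s+books?/i', $next->textContent, $c)) {
            $count = (int) $c[1];
        }
        $authors[] = [
            'name'  => $m[1],
            'from'  => (int) $m[2],
            'to'    => (int) $m[3],
            'books' => $count,
            'url'   => $a->getAttribute('href'),
        ];
    }
    return $authors;
}

/** Author page: title and link of every book listed in div#fleft. */
function parseBookList(string $html): array
{
    $xPath = loadXPath($html);
    $books = [];
    foreach ($xPath->query('//div[@id="fleft"]//li/a') as $a) {
        $books[] = ['title' => trim($a->textContent), 'url' => $a->getAttribute('href')];
    }
    return $books;
}

/** Book page: title, author and story from div#bookbox, or null if a part is missing. */
function parseBook(string $html): ?array
{
    $xPath = loadXPath($html);
    $h1  = $xPath->query('//div[@id="bookbox"]/h1')->item(0);
    $h2  = $xPath->query('//div[@id="bookbox"]/h2')->item(0);
    $pre = $xPath->query('//div[@id="bookbox"]/pre')->item(0);
    if ($h1 === null || $h2 === null || $pre === null) {
        return null;
    }
    // Drop the "Book :" / "Aurther :" label in front of each value.
    $stripLabel = function (string $text): string {
        return trim(preg_replace('/^[^:]*:\s*/', '', trim($text)));
    };
    return [
        'title'  => $stripLabel($h1->textContent),
        'author' => $stripLabel($h2->textContent),
        'story'  => trim($pre->textContent),
    ];
}

// Usage with the index snippet from the question. In a crawler, each page's HTML
// would come from e.g. file_get_contents($url) instead of a literal string.
$index = <<<'HTML'
<div id="fleft"><ul>
<li><a href="http://www.books.com/john-smith/index.html">John Smith (2011-2012)</a> : 11 books
<li><a href="http://www.books.com/bobby-bob/index.html">Bobby Bob (2011-2012)</a> : 89 books
</ul></div>
HTML;
print_r(parseAuthorIndex($index));
```

Each function returns plain arrays, so the rows can be inserted into whichever database you choose. If the live pages differ from the snippets, the XPath expressions and the two regular expressions are the parts most likely to need adjusting.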

merrill at Stack Overflow
