PHP: how to extract content or scrape data sets from a website's source page
-
I would like to know how to scrape content from a website's page source using PHP. I have tried http://simplehtmldom.sourceforge.net/ and also looked at http://stackoverflow.com/questions/3577641/best-methods-to-parse-html-with-php but I'm still having a hard time getting the information out of the source code.

The main page contains a list of author links; each entry includes the author's years and the number of books written:

    <div id="fleft">
    <ul>
    <li><a href="http://www.books.com/john-smith/index.html">John Smith (2011-2012)</a> : 11 books
    <li><a href="http://www.books.com/bobby-bob/index.html">Bobby Bob (2011-2012)</a> : 89 books
    ....
    </ul>
    </div>

Clicking on John Smith opens the list of books that John Smith wrote:

    <h1>John Smith (11 Books)</h1>
    <div id="fleft">
    <ul>
    <li><a href="http://www.books.com/john-smith/best-book.html">Best Book</a>
    <li><a href="http://www.books.com/john-smith/other-best-book.html">Other Best Book</a>
    ....
    </ul>
    </div>

Clicking on one of the books, "Best Book", shows the title of the book, the author, and the whole story:

    <div id="bookbox">
    <h1>Book : Best Book</h1>
    <h2>Aurther : John Smith</h2>
    <pre>
    story of the best book......
    .......
    ....
    the end
    </pre>

I would like to grab every author's name and years, their list of books, and the content of each book, as a dataset. Can someone help me or show me sample PHP code to make this happen? I want to build a database holding each author's name, the years of their life, the books they wrote, book titles, categories, book contents, etc.
-
Answer:
You should mention which approach you are using to get the HTML of the target page. Assuming you already have the HTML of the target page in a $targetHTML variable, you can load it into a DOM like this:

    /*********** Load In Dom *********/
    $html = new DOMDocument;
    $html->loadHTML($targetHTML);
    $xPath = new DOMXPath($html);
    /*********** Load In Dom *********/

You can then use XPath to fetch the data you want from the HTML loaded in the DOM. If you are already using this approach, show your code so we can find the problem. Regards
merrill at Stack Overflow
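As an illustration of the DOMDocument/DOMXPath approach described in the answer, here is a minimal sketch applied to the sample markup from the question. It is not part of the original answer. The helper names (loadXPath, stripLabel, parseAuthors, parseBook) are hypothetical, the XPath expressions assume the pages really look like the snippets in the question, and the regular expression assumes every author link reads "Name (YYYY-YYYY)". Fetching the pages and storing the rows in a database are left out.

```php
<?php
// Sample markup copied from the question. In practice these strings would be
// the HTML you fetched yourself (for example with file_get_contents or cURL).
$authorIndexHtml = <<<'HTML'
<div id="fleft">
<ul>
<li><a href="http://www.books.com/john-smith/index.html">John Smith (2011-2012)</a> : 11 books
<li><a href="http://www.books.com/bobby-bob/index.html">Bobby Bob (2011-2012)</a> : 89 books
</ul>
</div>
HTML;

$bookPageHtml = <<<'HTML'
<div id="bookbox">
<h1>Book : Best Book</h1>
<h2>Aurther : John Smith</h2>
<pre>
story of the best book......
the end
</pre>
</div>
HTML;

// Hypothetical helper: parse an HTML string and return a DOMXPath for it.
function loadXPath(string $html): DOMXPath
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    return new DOMXPath($dom);
}

// Hypothetical helper: drop a leading "Label : " prefix such as "Book : ".
function stripLabel(string $text): string
{
    return trim(preg_replace('/^[^:]*:\s*/', '', trim($text)));
}

// Extract name, years and URL for each author link on the index page.
function parseAuthors(string $html): array
{
    $authors = [];
    foreach (loadXPath($html)->query('//div[@id="fleft"]//li/a') as $link) {
        $text = trim($link->textContent);
        // Assumed link text format: "Name (YYYY-YYYY)"
        if (preg_match('/^(.+?)\s*\((\d{4})-(\d{4})\)$/', $text, $m)) {
            $authors[] = [
                'name' => $m[1],
                'from' => (int) $m[2],
                'to'   => (int) $m[3],
                'url'  => $link->getAttribute('href'),
            ];
        }
    }
    return $authors;
}

// Extract title, author and story text from a single book page.
function parseBook(string $html): array
{
    $xPath = loadXPath($html);
    return [
        'title'  => stripLabel($xPath->evaluate('string(//div[@id="bookbox"]/h1)')),
        'author' => stripLabel($xPath->evaluate('string(//div[@id="bookbox"]/h2)')),
        'story'  => trim($xPath->evaluate('string(//div[@id="bookbox"]/pre)')),
    ];
}

$authors = parseAuthors($authorIndexHtml);
$book    = parseBook($bookPageHtml);
print_r($authors);
print_r($book);
```

The same //div[@id="fleft"]//li/a query should also pick up the book links on an author's page, since the question shows that page using the same markup; each href can then be fetched and passed to parseBook.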