How to extract data from any website?

Extract data from a website via PHP

  • I am trying to create a simple alert app for some friends. Basically, I want to extract the price and stock availability from web pages like the following two: http://www.sparkfun.com/commerce/product_info.php?products_id=5 and http://www.sparkfun.com/commerce/product_info.php?products_id=9279. I have already built the e-mail and SMS alert part, but now I need to pull the quantity and price out of the web pages (those two or any others) so that I can compare them and alert us to place an order when a product falls within certain thresholds. I have tried some regexes from tutorials, but I am too much of a beginner to get them working. Any good tips or examples?

  • Answer:

    $content = file_get_contents('http://www.sparkfun.com/commerce/product_info.php?products_id=9279');

    // Grab the price from the pricing table row.
    preg_match('#<tr><th>(.*)</th> <td><b>price</b></td></tr>#', $content, $match);
    $price = $match[1];

    // Grab the stock level from the hidden quantity_on_hand field.
    preg_match('#<input type="hidden" name="quantity_on_hand" value="(.*?)">#', $content, $match);
    $in_stock = $match[1];

    echo "Price: $price - Availability: $in_stock\n";
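    To connect this back to the alert logic described in the question, the extracted values can feed a simple threshold check. The threshold values and the notify() helper below are placeholders for the e-mail/SMS code the asker already has, not part of the original answer:

    // Sketch only: $price may contain a currency symbol, so strip it before comparing.
    $price_value = (float) preg_replace('/[^0-9.]/', '', $price);
    $stock_value = (int) $in_stock;

    $max_price = 50.00;   // example threshold: alert when the price is at or below this
    $min_stock = 100;     // example threshold: alert when at least this many are in stock

    if ($price_value <= $max_price && $stock_value >= $min_stock) {
        // notify() stands in for the existing e-mail/SMS alert.
        notify("Product 9279: price $price_value, stock $stock_value - place an order.");
    }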

Mike at Stack Overflow


Other answers

It's called screen scraping, in case you need to google for it. I would suggest that you use a DOM parser and XPath expressions instead. Feed the HTML through HTML Tidy first, to ensure that it's valid markup. For example:

$html = file_get_contents("http://www.example.com");
$html = tidy_repair_string($html);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Now query the document:
foreach ($xpath->query('//table[@class="pricing"]/th') as $node) {
    echo $node->textContent, "\n";
}
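Applied to the product pages from the question, the same Tidy plus XPath pattern might look roughly like the sketch below. The XPath expressions are guesses at the page structure (a "pricing" table and the hidden quantity_on_hand field mentioned in the accepted answer), so they would need to be checked against the real page source:

$url  = 'http://www.sparkfun.com/commerce/product_info.php?products_id=9279';
$html = tidy_repair_string(file_get_contents($url));

$doc = new DOMDocument();
// Suppress warnings about markup Tidy could not fully repair.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Assumed structure: a "pricing" table holding the price in its first header cell.
$price_node = $xpath->query('//table[@class="pricing"]//th')->item(0);
$price = $price_node !== null ? trim($price_node->textContent) : 'unknown';

// Assumed structure: the hidden quantity_on_hand input from the accepted answer.
$stock_node = $xpath->query('//input[@name="quantity_on_hand"]/@value')->item(0);
$in_stock = $stock_node !== null ? $stock_node->nodeValue : 'unknown';

echo "Price: $price - Availability: $in_stock\n";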

troelskn

You are probably best off loading the HTML code into a DOM parser like http://simplehtmldom.sourceforge.net/manual.htm and searching for the "pricing" table. However, any kind of scraping you do can break whenever they change their page layout, and is probably illegal without their consent. The best way, though, would be to talk to the people who run the site, and see whether they have alternative, more reliable forms of data delivery (Web services, RSS, or database exports come to mind).
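A minimal sketch of the Simple HTML DOM approach, assuming the price sits in a table with class "pricing" (the selector is an assumption about the page, and simple_html_dom.php must be downloaded from the site linked above):

// Requires the library from http://simplehtmldom.sourceforge.net/
include 'simple_html_dom.php';

// Sketch only: the URL is from the question; the "table.pricing" selector is an assumption.
$page = file_get_html('http://www.sparkfun.com/commerce/product_info.php?products_id=9279');

if ($page !== false) {
    $pricing = $page->find('table.pricing', 0);   // first table with class "pricing", if any
    if ($pricing !== null) {
        echo trim($pricing->plaintext), "\n";      // dump the table text to locate the price
    }
    $page->clear();   // free the parser's memory
}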

Pekka

First, this question goes deep into the details. Second, extracting data from a website might not be legitimate. Still, a few hints:

  • Use Firebug or the Chrome/Safari inspector to explore the HTML and find the pattern around the information you want.

  • Test your regex to see whether it actually matches. You may need several passes (multi-pass parsing/extraction).

  • Write a client with cURL, or even simpler, use file_get_contents (note that some hosts disable loading URLs with file_get_contents); a minimal cURL sketch follows this list.

  • Personally, I would rather use Tidy to convert the page to valid XHTML and then use XPath to extract the data instead of regex, because HTML is not a regular language and XPath is very flexible. You can also learn XSLT to transform the result.

Good luck!
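The cURL fetch mentioned above could look roughly like this; it is a sketch using one of the product URLs from the question, and the options shown are common defaults rather than anything taken from the answer:

// Sketch only: fetch a product page with cURL instead of file_get_contents.
$ch = curl_init('http://www.sparkfun.com/commerce/product_info.php?products_id=9279');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects, if any
curl_setopt($ch, CURLOPT_TIMEOUT, 30);            // do not hang forever on a slow server

$content = curl_exec($ch);
if ($content === false) {
    echo 'Fetch failed: ' . curl_error($ch) . "\n";
}
curl_close($ch);

// $content can now go through the same regex or XPath extraction shown above.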

Viet

The best HTML parser I've ever come across: http://simplehtmldom.sourceforge.net/

Vishal
