How can I parse a complex XML with PHP and CDATA?

PHP XML Cannot parse the CDATA

  • I am parsing an RSS feed from the National Data Buoy Center in PHP. I am not able to parse the description, which is wrapped in CDATA. The end goal is to have the description items available as variables such as Location, Wind Direction, Wind Speed, etc. I am unsure how to break these out and omit the tags. Here is a snippet of the feed:

        <item>
          <pubDate>Thu, 08 Sep 2011 17:59:39 UT</pubDate>
          <title>Station SFXC1 - SAN FRANCISCO BAY RESERVE, CA</title>
          <description><![CDATA[
            <strong>September 8, 2011 9:45 am PDT</strong><br />
            <strong>Location:</strong> 38.223N 122.026W or 77 nautical miles S of search location of 39.5N 122.1W.<br />
            <strong>Wind Direction:</strong> W (270&#176;)<br />
            <strong>Wind Speed:</strong> 11 knots<br />
            <strong>Atmospheric Pressure:</strong> 30.03 in (1017.0 mb)<br />
            <strong>Air Temperature:</strong> 62&#176;F (16.9&#176;C)<br />
            <strong>Dew Point:</strong> 50&#176;F (10.2&#176;C)<br />
          ]]></description>
          <link>http://www.ndbc.noaa.gov/station_page.php?station=sfxc1</link>
          <guid>http://www.ndbc.noaa.gov/station_page.php?station=sfxc1&amp;ts=1315500300</guid>
          <georss:point>38.223 -122.026</georss:point>
        </item>

    Here is the PHP:

        $feed_url = "http://www.ndbc.noaa.gov/rss/ndbc_obs_search.php?lat=39.5&lon=-122.1&radius=400";
        $xmlString = file_get_contents($feed_url);
        $xmlString = str_replace('georss:point','point',$xmlString);
        $xml = new SimpleXMLElement($xmlString);
        $items = $xml->xpath('channel/item');
        $closeItems = array();
        $new_array = array();

        foreach($items as &$item) {
            echo "<br>";
            $item_title = $item->title;
            $item_title = mb_convert_case($item_title, MB_CASE_UPPER, "UTF-8");
            list($lat, $lng) = explode(' ',trim($item->point));
            echo $item_title;
            echo "<br>";
            echo $lat;
            echo "<br>";
            echo $lng;
            echo "<br>";
            echo $item->description;
            echo "<br>";
            echo $item->pubDate;
            echo "<br>";
        }

  • Answer:

    Rewrote my solution to actually be correct:

        $feed_url = "http://www.ndbc.noaa.gov/rss/ndbc_obs_search.php?lat=39.5&lon=-122.1&radius=400";
        $xmlString = file_get_contents($feed_url);
        // Drop the namespace prefix so <georss:point> can be read as $item->point.
        $xmlString = str_replace('georss:point', 'point', $xmlString);
        $xml = new SimpleXMLElement($xmlString);
        $items = $xml->xpath('channel/item');

        foreach ($items as $item) {
            $item_title = mb_convert_case((string) $item->title, MB_CASE_UPPER, "UTF-8");
            // Casting the element to a string yields the CDATA text. Then strip the
            // tags, decode the entities, trim, and upper-case the result.
            // Note: str_replace(' ', '', ...) removes every space, including those between words.
            $description = mb_convert_case(
                str_replace(' ', '', trim(html_entity_decode(strip_tags((string) $item->description)))),
                MB_CASE_UPPER,
                "UTF-8"
            );
            list($lat, $lng) = explode(' ', trim((string) $item->point));
            echo $item_title . PHP_EOL
                . $lat . ' x ' . $lng . PHP_EOL
                . 'published: ' . $item->pubDate . PHP_EOL
                . 'Description: ' . PHP_EOL
                . $description . PHP_EOL . PHP_EOL;
        }

    I took the CDATA, removed the tags, decoded the HTML entities, and removed the pesky whitespace. A regex might be better for removing the whitespace.

matt colley at Stack Overflow
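The answer above flattens the description into a single string. To get at the individual readings the question asks for (Location, Wind Direction, Wind Speed, and so on), one possible approach is to split the CDATA text on its <br /> tags and read each <strong>Label:</strong> value pair. This is only a sketch: the function name parseBuoyDescription() and the 'Observed' key for the unlabeled timestamp line are made up for illustration, the inline XML is a shortened copy of the snippet in the question, and it assumes the feed keeps that exact markup.

```php
<?php
// Hypothetical helper (not from the thread): turns the HTML inside the
// <description> CDATA into an array of label => value pairs.
function parseBuoyDescription(string $html): array
{
    $fields = [];
    // Every reading ends with a <br /> tag, so split on it.
    foreach (preg_split('~<br\s*/?>~i', $html) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (preg_match('~^<strong>([^<:]+):</strong>\s*(.*)$~s', $line, $m)) {
            // "<strong>Label:</strong> value"
            $label = $m[1];
            $value = $m[2];
        } else {
            // The first line is only a timestamp; 'Observed' is an assumed key.
            $label = 'Observed';
            $value = strip_tags($line);
        }
        // Decode entities such as &#176; into real characters.
        $fields[$label] = trim(html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return $fields;
}

// Shortened copy of the item from the question, used as sample input.
$xml = new SimpleXMLElement(<<<'XML'
<item>
  <title>Station SFXC1 - SAN FRANCISCO BAY RESERVE, CA</title>
  <description><![CDATA[
    <strong>September 8, 2011 9:45 am PDT</strong><br />
    <strong>Wind Direction:</strong> W (270&#176;)<br />
    <strong>Wind Speed:</strong> 11 knots<br />
  ]]></description>
</item>
XML
);

// Casting the element to string returns the text inside the CDATA section.
$fields = parseBuoyDescription((string) $xml->description);

foreach ($fields as $label => $value) {
    echo $label . ': ' . $value . PHP_EOL;
}
```

After this, $fields['Wind Speed'] holds the wind speed text and can be used like any other variable. If the feed changes its markup, the regex would need adjusting.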


Other answers

In situations like this, don't just echo the expected value, give up when it's empty, and come crying to SO (just kidding about the crying part). Use PHP's var_dump (http://php.net/manual/en/function.var-dump.php) or print_r (http://php.net/manual/en/function.print-r.php) to see what you're really getting. Is it NULL? Is it the empty string? Is it some other SimpleXMLElement object you need to descend into? Not only will this make your question more informative and more likely to be answered, but you'll probably end up solving the problem yourself (and can then post an answer here for other people who stumble upon it in the future).

dkamins
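As a rough illustration of that advice, the sketch below dumps a CDATA-wrapped description node and then its string value. The one-item XML string is invented for the example. With SimpleXML, the dump of the object itself can look empty even though the text is there, which is easy to mistake for a parsing failure; casting to string, or parsing with the LIBXML_NOCDATA option, exposes the content.

```php
<?php
// Invented one-item sample, just to have a CDATA node to inspect.
$source = '<item><description><![CDATA[ <strong>Wind Speed:</strong> 11 knots<br /> ]]></description></item>';

$xml = new SimpleXMLElement($source);

// Inspect the node itself: this is a SimpleXMLElement object, not a string.
var_dump($xml->description);

// Casting to string returns the text inside the CDATA section.
$raw = (string) $xml->description;
var_dump($raw);

// Alternative: LIBXML_NOCDATA asks libxml to merge CDATA into ordinary
// text nodes while parsing, so dumps show the text directly.
$merged = simplexml_load_string($source, 'SimpleXMLElement', LIBXML_NOCDATA);
print_r($merged);
```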
