How to query XML like SQL?

I'm looking for a software tool that will ingest a large XML file and allow me to write SQL-like queries against it, both for summary reporting and for detailed reports. I've been tasked with extracting some statistical information about the data in XML files we've been generating over the last year and a half. The suggestion from dev is to construct some horrific grep that pulls the lines before and after each match, then iterate through its output with a further series of greps. I've looked into XPath and XQuery, and they both seem to be a start along the road I want, but what I'd really like is something that ingests my XML and lets me write SQL against it. Ever heard of any such beast out there?

I have two tasks to accomplish. The first is to get a count of a certain kind of element where it has a specific value. The second is to get the list of IDs for those elements.
Answer:

The original poster's later comment: "My whole goal is to be able to use SQL, which I already know."

You're trying to drive a screw with a hammer. Get yourself a screwdriver and learn how to use it; you'll save yourself a lot of time.

"Two tasks... one is to get a count of a certain kind of element, where it has a specific value"

    <xsl:value-of select="count(//foo[@bar='baz'])" />

will count all <foo> nodes whose bar attribute is equal to baz. Adjust as necessary for your data.

"The second is to get the list of IDs for those elements."

    <xsl:for-each select="//foo[@bar='baz']">
      <xsl:value-of select="@id" />
      <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each>
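If you would rather script the same two queries than write a stylesheet, here is a minimal sketch using Python's standard xml.etree.ElementTree module, which supports a limited XPath subset that includes attribute-equality predicates. The names foo, bar, baz and id are the placeholders from the answer above; the sample document and the helper matching_ids are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; for a real file use ET.parse("yourfile.xml").getroot()
SAMPLE = """<root>
  <foo id="1" bar="baz"/>
  <group>
    <foo id="2" bar="baz"/>
    <foo id="3" bar="qux"/>
  </group>
  <foo id="4" bar="baz"/>
</root>"""

def matching_ids(root, bar_value="baz"):
    """Return the id of every <foo>, at any depth, whose bar attribute equals bar_value.

    ".//foo[@bar='...']" is ElementTree's counterpart of //foo[@bar='baz'].
    bar_value must not itself contain a single quote.
    """
    return [el.get("id") for el in root.iterfind(f".//foo[@bar='{bar_value}']")]

root = ET.fromstring(SAMPLE)
ids = matching_ids(root)
print(len(ids))        # the count, like count(//foo[@bar='baz']) -> 3
print(", ".join(ids))  # the ID list, like the xsl:for-each loop -> 1, 2, 4
```

For a very large file, ET.iterparse can stream elements instead of loading the whole tree into memory.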

nomisxid at Ask.Metafilter.Com


Other answers

holloway

"I can't think of ANY real-world XML that doesn't map meaningfully to the relational model." Wow. That's just... wow. How about every XHTML file on earth?

ook

ook, is there a way to test CONTAINS or BEGINS-WITH style clauses? I.e., counting foo nodes whose value begins with 'baz'?

nomisxid

I have to agree that you want XPath, and if you want to do queries that XPath won't support, you want XQuery. Sleepycat's Berkeley DB XML (http://www.sleepycat.com/products/bdbxml.html) supports XQuery and XPath. I'm not sure how well it works; the only XQuery database I've used is a huge commercial product.

nev

nomisxid: XPath has a contains() and a starts-with(). See the XPath specification: http://www.w3.org/TR/xpath

nev

The XPath equivalents are starts-with() and contains(). The fact that you need those, however, implies that your XML schema is badly in need of a redesign. Which, given that your dev group's suggestion was to use grep to solve this problem, is perhaps not too surprising :)
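In full XPath 1.0 those queries would read count(//foo[starts-with(., 'baz')]) and count(//foo[contains(., 'baz')]). Python's standard ElementTree implements only a limited XPath subset without these functions, so this minimal sketch does the string test in Python instead. The sample data is hypothetical, and the "value" of a node is assumed to be its text content; itertext() collects all descendant text, which approximates XPath's string value of ".". (The third-party lxml library does support full XPath 1.0.)

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the text of each <foo> is the value being tested
SAMPLE = """<root>
  <foo id="1">bazooka</foo>
  <foo id="2">abazaba</foo>
  <foo id="3">baz</foo>
  <foo id="4">quux</foo>
</root>"""

def text_value(el):
    """Concatenate all text inside el, roughly XPath's string value of '.'."""
    return "".join(el.itertext())

root = ET.fromstring(SAMPLE)
foos = root.findall(".//foo")

# Equivalent of //foo[starts-with(., 'baz')]
starts = [f.get("id") for f in foos if text_value(f).startswith("baz")]
# Equivalent of //foo[contains(., 'baz')]
contains = [f.get("id") for f in foos if "baz" in text_value(f)]

print(len(starts), starts)      # 2 ['1', '3']
print(len(contains), contains)  # 3 ['1', '2', '3']
```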

ook

ook, isn't XHTML the PRESENTATION of data, not storage of data? It is related to XML, but they are not the same thing, nor used for the same purposes. That being said, I don't buy it, even for XHTML. Show me your XHTML that isn't a structured document. The whole point of a markup language is to present your data in a non-random order so that it has meaning. Every example I see of XHTML is just reading data from a data source and presenting it on a web page.

nomisxid

Thanks, nev. I was looking at this stupid page, http://www.w3schools.com/xpath/xpath_operators.asp, which made it seem like the handful of operators listed there was the definitive list.

nomisxid

I'm sure there are a lot of XML documents which could very easily be translated into an SQL database. Take this example from the W3Schools introduction to XSLT: http://www.w3schools.com/xsl/cdcatalog.xml

Excerpt:

    <catalog>
      <cd>
        <title>Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <country>USA</country>
        <company>Columbia</company>
        <price>10.90</price>
        <year>1985</year>
      </cd>
      <cd>
        <title>Hide your heart</title>
        <artist>Bonnie Tyler</artist>
        <country>UK</country>
        <company>CBS Records</company>
        <price>9.90</price>
        <year>1988</year>
      </cd>
      [etc]
    </catalog>

Its relationships are so simple that of course you could transfer it into a database in five minutes. You'd have a "titles" table, a "titles_artists" linking table, a "titles_companies" linking table, a couple more and you're done. Do a join to get all artists on Columbia, or all albums by Bob Dylan, or all albums by Bob Dylan on Columbia, and so on.

But that's a deceptively simple example. A CD has a title, an artist and a couple of other things. It's nested one level deep and that's it. Because of the arbitrary levels of complexity which are possible in XML, you have no way of knowing whether any given XML file could be easily translated. What about a file with nesting seventeen levels deep, with attributes at each level? Do you end up with 17² tables to describe all the relationships? Maybe it's 17¹⁷, but whatever, it's too many. You'd either have to use human judgement to decide which relationships were important and discard the rest, or end up creating thousands of tables to describe the relationships between different bits of data. XSLT and XPath are obviously much better ways to get at the data matching any given relationship.
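The "five minutes" claim for the flat CD catalog can be sketched with Python's standard xml.etree.ElementTree and sqlite3 modules: each <cd> becomes one row, after which plain SQL works. This is a minimal sketch under simplifying assumptions: it uses a single denormalized table (the hypothetical name cds) rather than the linking tables described above, load_catalog is a hypothetical helper, and the data is only the two CDs from the excerpt.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The two <cd> records from the W3Schools cdcatalog.xml excerpt above
CATALOG = """<catalog>
  <cd><title>Empire Burlesque</title><artist>Bob Dylan</artist><country>USA</country>
      <company>Columbia</company><price>10.90</price><year>1985</year></cd>
  <cd><title>Hide your heart</title><artist>Bonnie Tyler</artist><country>UK</country>
      <company>CBS Records</company><price>9.90</price><year>1988</year></cd>
</catalog>"""

FIELDS = ("title", "artist", "country", "company", "price", "year")

def load_catalog(xml_text):
    """Shred each <cd> element into one row of an in-memory SQLite table named cds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cds (title TEXT, artist TEXT, country TEXT, "
        "company TEXT, price REAL, year INTEGER)"
    )
    root = ET.fromstring(xml_text)
    # findtext returns the text of the first matching child element (None if absent)
    rows = [tuple(cd.findtext(f) for f in FIELDS) for cd in root.iter("cd")]
    conn.executemany("INSERT INTO cds VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn

conn = load_catalog(CATALOG)

# "All albums by Bob Dylan on Columbia" as plain SQL
for (title,) in conn.execute(
    "SELECT title FROM cds WHERE artist = ? AND company = ?",
    ("Bob Dylan", "Columbia"),
):
    print(title)  # Empire Burlesque

# A summary count, the kind of report the original question asks for
print(conn.execute("SELECT COUNT(*) FROM cds WHERE country = 'UK'").fetchone()[0])  # 1
```

This only stays this short because every <cd> has the same flat set of children; deeper or irregular nesting would need more tables and joins, which is the point made above.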

AmbroseChapel
