What's wrong with my XML?
XML Character Woes: I'm getting the error "Reference to undefined entity 'ldquo'" (and 'rdquo', etc.) when I try to open my XML files with IE or Firefox. Can you help me fix it?

I know the parser is seeing it as an entity and looking for a definition, but I can't define them in the DTD because I don't know what entities might be coming in. I've set the elements to CDATA hoping the parser would ignore it, but that doesn't change anything. I've also tried changing the entities to the various numerical entities. My goal is just to have valid XHTML entities in the text. These files are certainly going to be converted to HTML at some point, but who knows where else they'll go. They might go back into InDesign, etc.

In case it matters: I'm getting the content from InDesign and running it against some scripts to fix it up. InDesign is giving me Unicode, and I'm converting the Unicode special characters to the 'rdquo' style HTML entities.
Answer:
"The only XML entities are: amp, lt, gt, apos, quot. That's it."
Untrue. See http://www.w3.org/TR/REC-xml/#sec-references.

"XHTML has optional support for the character entities but you'll have to include the entities in xhtml-lat1.ent, xhtml-special.ent, xhtml-symbol.ent"
Also untrue. See http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict. You don't need to do anything special to include them.

"Is there a Decimal Entity equivalent to HTML-ENTITIES? I couldn't find one in the PHP docs."
Not that I know of. A while back http://www.randomchaos.com/documents/?source=php_and_unicode. You could use those like so:

$contents_for_xml_and_html = unicode_to_entities_preserving_ascii( utf8_to_unicode( $utf8_contents ) );
miniape at Ask.Metafilter.Com
Other answers
I think it's because those are HTML entities, not XML entities. Would & quot ; work better? (spaces because without them it looks like ")
utsutsu
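The distinction drawn above can be checked with any XML parser. The thread itself works in PHP; what follows is only a minimal Python sketch using the standard library's `xml.etree.ElementTree`, and the helper name `parses` is made up for this illustration. It compares the predefined entities, an undeclared HTML-style entity, and numeric character references.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# The five predefined entities need no declaration.
predefined_ok = parses("<p>&amp; &lt; &gt; &apos; &quot;</p>")

# HTML-style named entities are rejected when no DTD declares them.
named_ok = parses("<p>&ldquo;hello&rdquo;</p>")

# Numeric character references never need a declaration.
numeric_ok = parses("<p>&#8220;hello&#8221;</p>")

print(predefined_ok, named_ok, numeric_ok)
```

Only the middle case fails, which matches the "undefined entity" error the browsers report.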
You could just leave the Unicode (UTF8 I assume) alone and encode the bare minimum required by XML, as long as you store and process the content with Unicode-friendly software throughout. If you want to handle it as ASCII or ISO-8859-1 then using numeric entities should work fine, I've done that myself to get around Unicode-hostile systems. Are you sure you tried it properly?
malevolent
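As a rough illustration of "encode the bare minimum": the sketch below (Python rather than the thread's PHP, with a made-up helper name `minimal_xml_text`) escapes only the characters XML itself reserves in text content and leaves the curly quotes as real Unicode characters, written out as UTF-8. It assumes every tool downstream reads UTF-8 correctly.

```python
from xml.sax.saxutils import escape

def minimal_xml_text(text):
    """Escape only &, < and >; every other character passes through unchanged."""
    return escape(text)

snippet = "\u201cTom & Jerry\u201d <draft>"
element = "<p>" + minimal_xml_text(snippet) + "</p>"

# Store or transmit the result as UTF-8 bytes.
utf8_bytes = element.encode("utf-8")
```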
"InDesign is giving me Unicode, and I'm converting the Unicode special characters to the 'rdquo' style html entities"
Don't do that. Use the Unicode values to create numerical character entities, which work in both HTML and XML.
scottreynen
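A minimal sketch of that approach, written in Python rather than the PHP used elsewhere in this thread: the standard `xmlcharrefreplace` error handler turns every non-ASCII character into a decimal reference such as `&#8221;`. The function name `to_ascii_xml_text` is hypothetical, and the sketch assumes the input is plain text with no markup of its own.

```python
from xml.sax.saxutils import escape

def to_ascii_xml_text(text):
    """Return pure-ASCII text that is safe inside an XML or HTML element."""
    # Escape &, < and > first, so the ampersands in the generated
    # references are not escaped a second time.
    escaped = escape(text)
    # Replace each non-ASCII character with a decimal character reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_ascii_xml_text("\u201cHi\u201d & bye"))
```

The result is plain ASCII, so it also survives systems that cannot handle UTF-8.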
The only XML entities are: amp, lt, gt, apos, quot. That's it. XHTML has optional support for the character entities, but you'll have to include the entities in xhtml-lat1.ent, xhtml-special.ent, xhtml-symbol.ent, and even then many applications won't read the entities correctly. As malevolent said, use UTF-8. All modern applications support it. Or use numeric entities if not.
Sharcho
Wow. Thanks. I must have been trying the decimal numbers improperly because it looks like it might work. I would like to use UTF-8, and I'll probably keep a copy in that form, but I'm getting a little resistance from outside forces. Someone we shipped some of this content to (a huge internet company, no less) actually asked for ASCII CSVs, so we're trying to keep it as simple as possible in case something like that comes up again.

So basically, scottreynen's solution, but I was using a PHP function to convert them: mb_convert_encoding($contents, 'HTML-ENTITIES', "UTF-8"); Is there a Decimal Entity equivalent to HTML-ENTITIES? I couldn't find one in the PHP docs. I prefer not to come up with a conversion table if possible.
miniape
If you're using a stylesheet (XSL), you can also define the ones you want near the top. For example:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY nbsp "&#160;">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

XSLT brand noobie here, but I ran into that problem trying to get non-breaking spaces into table cells and googled the specifics. Here's the page I pulled this from, in case the code above doesn't show up right: http://www.biglist.com/lists/xsl-list/archives/200011/msg00056.html I'm not sure it's the same one I found from work, but the general approach is the same. (Isn't there a way to show pre/code stuff here? I guess I should check the FAQ.)
phrits
On overnight review, it's not limited to stylesheets, I guess. You can declare any of your commonly used entities such that the shorthand (e.g., &nbsp;) references the numeric code.
phrits
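To illustrate the declaration approach outside a stylesheet, here is a small Python sketch (not from the thread, which uses PHP and XSLT). It declares `ldquo` and `rdquo` in the document's internal DTD subset, mapping each name to its numeric code, and then parses the document with the standard library. The sample document is invented for this example.

```python
import xml.etree.ElementTree as ET

# Each named entity is declared once and mapped to its numeric code.
DOC = """<!DOCTYPE p [
  <!ENTITY ldquo "&#8220;">
  <!ENTITY rdquo "&#8221;">
]>
<p>&ldquo;hello&rdquo;</p>"""

root = ET.fromstring(DOC)

# The parser has expanded the shorthand to the real characters.
print(ascii(root.text))
```

This only helps for entity names known in advance, which is the limitation the original question raises.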
Thanks to everyone. All is well. I'm marking scottreynen's answer as best because his link has some great resources and his functions worked perfectly.
miniape