How do I get around the Python error "UnicodeEncodeError: 'ascii' codec can't encode character..." when using a Python script on the command line?
-
I am retrieving some text content from the web, then displaying it in a shell using a Python script. However, the print() command is throwing the following exception: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1029: ordinal not in range(128). My platform is Linux. Is there a simple method for replacing the offending Unicode character so that I can print the string? I am not concerned with the absolute correctness of the printed content in this case; this is merely a quick hack.
-
Answer:
Sorry, there is no simple one-line answer to this issue. Imagine a string comes into a Python program via an I/O operation (read from the terminal, from a file, or from the network), makes its way around the program getting copied from place to place, and finally gets output via an I/O operation. At any step along the way, if a unicode value is implicitly converted to a str, you'll see the dreaded ascii codec can't encode... error. Unfortunately, there is no easy way to fix it other than to go through your code fixing all the spots. For example, consider:

    f_in = open('filein.txt')
    line = f_in.read()
    out_msg = "The input line was: {line}".format(line=line)
    f_out = open('fileout.txt', 'w')

Unless you know what encoding was used when storing filein.txt, you are in for some bizarre behavior and some characters in your line will look like garbage. (Note: you won't get the ascii codec can't encode... error, but the results will be bad nevertheless.) So you need to use something like this (with import codecs at the top of the script):

    f_in = codecs.open('filein.txt', 'rb', 'utf-8')

and then hope desperately that whoever stored filein.txt stored it in utf-8 and not in one of the other UTF encodings. (Note: utf-8 is backward compatible with ascii, so a file that was stored using regular ascii will open fine using the utf-8 encoding.)

By now you'll have realized that unless you open fileout.txt using an appropriate unicode encoding, you are again going to run into trouble and get the ascii codec can't encode... error (if your out_msg contains any non-ascii character). So you need to do:

    f_out = codecs.open('fileout.txt', 'wb', 'utf-8')

(Note again: using the utf-8 encoding to store regular ascii is not a problem. If your text contains no non-ascii characters, the utf-8 encoded file is identical to a regular ascii file.)

The thing that tripped me up was that doing these two things is not good enough.
Consider this updated code:

    f_in = codecs.open('filein.txt', 'rb', 'utf-8')
    line = f_in.read()
    out_msg = "The input line was: {line}".format(line=line)
    f_out = codecs.open('fileout.txt', 'wb', 'utf-8')

This could still give you the same dreaded error on the third line. That's right: the problem is that "The input line was: {line}" is a str, and if line contains any unicode characters (by which I mean characters that cannot be encoded into ascii) you are in trouble. The fix for that is:

    out_msg = u"The input line was: {line}".format(line=line)

(Yes, this line is different from line 3 in the code sample. You just have to look carefully: the only change is the u prefix on the string literal.)

So there you have it. You need to go through every line of your code, find the places where unicode is being assigned to a str (or is being sent to a method that expects a str), and fix the destination to be unicode instead of str.

To understand unicode in Python better, take a look at: http://farmdev.com/thoughts/23/what-i-thought-i-knew-about-unicode-in-python-amounted-to-nothing/

To understand unicode in general (and I would highly recommend that you do this), read Joel Spolsky's 'The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)': http://www.joelonsoftware.com/articles/Unicode.html

And while you're about it, you might as well read:

Quick 'n Dirty Hack: If you just want to get some output printed in approximate ascii, you can do:

    import unicodedata
    line = unicodedata.normalize('NFKD', line).encode('ascii', 'ignore')

This decomposes accented characters into a base letter plus combining marks and then drops everything that is not ascii, so each character is replaced by its closest ascii equivalent, or simply removed if nothing is appropriate. That is good enough for many purposes.
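A minimal sketch of the quick hack, wrapped in a function so it can be tried on a sample string. The helper name to_approx_ascii is made up for illustration, and the final decode step is an addition so that the result is text on both Python 2 and Python 3.

```python
import unicodedata


def to_approx_ascii(text):
    """Return an ASCII-only approximation of a Unicode string.

    Hypothetical helper for illustration; not a library function.
    """
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' silently drops every character that ASCII cannot represent,
    # including the combining marks produced by the step above.
    ascii_bytes = decomposed.encode('ascii', 'ignore')
    # Decode again so the caller gets text rather than bytes.
    return ascii_bytes.decode('ascii')


print(to_approx_ascii(u'caf\xe9 na\xefve'))  # prints: cafe naive
```

Characters that have no ASCII base letter, such as CJK ideographs, are removed entirely, so this is only suitable when lossy output is acceptable.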
Navin Kabra at Quora
Other answers
You should convert your Unicode strings (which are made of characters, a unit decoupled from memory size) to bytes using the proper encoding before doing any I/O with them. When Python cannot determine the encoding of stdout (for example, when the output is piped or the locale is C/POSIX), it tries to encode your Unicode string using the ASCII encoding when writing to stdout (i.e., using print). ASCII can't represent every Unicode character, which is why you are getting that error: "'ascii' codec can't encode character". Pretty explicit. You should pick a proper encoding and encode your Unicode string using that. For example, UTF-8 is an efficient encoding that can handle every Unicode character. Assuming foo is a Unicode string, you could (and should) do: print foo.encode('utf-8') instead of just print foo. Just make sure your terminal, or whatever consumes the output, understands the encoding you pick. Again: UTF-8 is the most widely used encoding for scenarios like this, and you will probably want to use it unless you have very specific needs.
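The print foo.encode('utf-8') form is Python 2 syntax. As a sketch of the same idea that runs on both Python 2 and Python 3, the text can be encoded explicitly and the bytes written to a binary stream. The helper name write_encoded is hypothetical, and an in-memory buffer stands in for the terminal here.

```python
import io


def write_encoded(text, binary_stream, encoding='utf-8'):
    """Encode text explicitly and write the resulting bytes to a binary stream."""
    data = text.encode(encoding)
    binary_stream.write(data)
    return data


# An in-memory byte buffer stands in for the real output stream.
buf = io.BytesIO()
write_encoded(u'caf\xe9\n', buf)
print(repr(buf.getvalue()))  # shows the UTF-8 bytes that were written
```

To write to the real terminal, the binary stream would be sys.stdout.buffer on Python 3, while on Python 2 sys.stdout itself accepts bytes.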
Renzo Carbonara
You usually need to take the following steps: third-party string -> decode -> now work with it in Python -> encode -> now you can output it to a file. So it sounds like you're reading in text that already has an encoding, which you need to decode so Python can work with it natively.
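The decode, work, encode steps can be sketched as a single function. The name annotate_line and the default encodings are assumptions made for this example; the message text is borrowed from the first answer.

```python
def annotate_line(raw_bytes, in_encoding='utf-8', out_encoding='utf-8'):
    """Decode incoming bytes, work on text, and encode the result for output."""
    # 1. Decode at the boundary: bytes from outside become text.
    line = raw_bytes.decode(in_encoding)
    # 2. Work with text only. The u prefix keeps the template a text string
    #    on Python 2 (it is harmless on Python 3).
    out_msg = u'The input line was: {line}'.format(line=line)
    # 3. Encode on the way out: text becomes bytes again.
    return out_msg.encode(out_encoding)


incoming = b'caf\xc3\xa9'            # UTF-8 bytes, as read from a file or socket
outgoing = annotate_line(incoming)   # UTF-8 bytes, ready to be written out
```

If the source uses a different encoding, only the boundary changes: annotate_line(b'caf\xe9', in_encoding='latin-1') yields the same UTF-8 output.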
Jim Plush
Try this:

    import codecs, locale, sys
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
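A sketch of what such a writer does, using an in-memory byte buffer instead of sys.stdout so that it behaves the same on Python 2 and Python 3. The helper name make_text_writer and the choice of errors='replace' are assumptions for this example; 'replace' substitutes a question mark for any character the target encoding cannot represent, which matches the "just let me print it" goal of the question.

```python
import codecs
import io


def make_text_writer(binary_stream, encoding='utf-8', errors='replace'):
    """Wrap a binary stream so that text written to it is encoded on the fly."""
    writer_class = codecs.getwriter(encoding)  # a StreamWriter subclass
    return writer_class(binary_stream, errors)


# With an ASCII target, the unencodable character becomes '?' instead of
# raising UnicodeEncodeError.
ascii_buf = io.BytesIO()
make_text_writer(ascii_buf, 'ascii').write(u'caf\xe9')

# With a UTF-8 target, the character is encoded faithfully.
utf8_buf = io.BytesIO()
make_text_writer(utf8_buf, 'utf-8').write(u'caf\xe9')
```

On Python 2 the wrapped stream would be sys.stdout, as in the one-liner above. On Python 3 the writer needs a binary stream such as sys.stdout.buffer.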
Evan Klitzke
I really like using this particular solution:

    def byteify(input):
        if isinstance(input, dict):
            return {byteify(key): byteify(value) for key, value in input.iteritems()}
        elif isinstance(input, list):
            return [byteify(element) for element in input]
        elif isinstance(input, unicode):
            return input.encode('utf-8')
        else:
            return input

which takes all input and turns it into its byte equivalent.
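The function above relies on iteritems() and the unicode type, which exist only on Python 2. A variant that should run on both Python 2 and Python 3 might look like the following sketch; the names byteify_compat and TEXT_TYPE are made up for this example.

```python
TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


def byteify_compat(value, encoding='utf-8'):
    """Recursively encode every text string inside nested dicts and lists."""
    if isinstance(value, dict):
        return {byteify_compat(k, encoding): byteify_compat(v, encoding)
                for k, v in value.items()}
    if isinstance(value, list):
        return [byteify_compat(item, encoding) for item in value]
    if isinstance(value, TEXT_TYPE):
        return value.encode(encoding)
    # Numbers, None, and already-encoded bytes pass through unchanged.
    return value


result = byteify_compat({u'name': [u'caf\xe9', 1]})
```

This is typically applied to structures returned by a JSON parser, where every key and string value arrives as text.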
Blair Gemmer
Before executing the script, run: export PYTHONIOENCODING=utf-8
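The effect of PYTHONIOENCODING can be checked from Python itself by launching a child interpreter with the variable set and capturing its output. This is only a sketch: the helper name run_child_with_io_encoding is hypothetical, and the child is whatever interpreter sys.executable points to.

```python
import os
import subprocess
import sys


def run_child_with_io_encoding(code, encoding='utf-8'):
    """Run a Python snippet in a child process with PYTHONIOENCODING set."""
    env = dict(os.environ)
    env['PYTHONIOENCODING'] = encoding
    # The child's stdout is a pipe, not a terminal, which is exactly the
    # situation where the encoding would otherwise depend on the locale.
    return subprocess.check_output([sys.executable, '-c', code], env=env)


# The child prints a non-ASCII character; the parent receives raw bytes.
raw = run_child_with_io_encoding("print(u'caf\\xe9')")
```

With the variable set to utf-8, the captured bytes are the UTF-8 encoding of the printed text followed by a line ending.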
Wubao Li