UnicodeDecodeError when writing .xlsx file using xlsxwriter - python

I am trying to write about 1000 rows to a .xlsx file from my Python application. The data is basically a combination of integers and strings. I am getting an intermittent error while running the wbook.close() command. The error is the following:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15:
ordinal not in range(128)
My data does not have anything in unicode. I am wondering why the decoder is being invoked at all. Has anyone noticed this problem?

0xc3 is "Ã" in Latin-1, and it is also the lead byte of a two-byte UTF-8 sequence, so your data does contain non-ASCII text. What you need to do is decode it to unicode before writing it. Use the decode() method:
string.decode('utf-8')
Also, depending on your needs and usage, you could add
# -*- coding: utf-8 -*-
at the beginning of your script, but only if you are sure that the encoding will not interfere and break something else.
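For illustration, a minimal Python 2 sketch of that fix, assuming xlsxwriter is installed; the file name and row data here are made up, and the only important part is the decode('utf-8') call before worksheet.write():
# -*- coding: utf-8 -*-
import xlsxwriter

wbook = xlsxwriter.Workbook('demo.xlsx')
wsheet = wbook.add_worksheet()

rows = [(1, 'São Paulo'), (2, 'Kraków')]  # byte strings holding UTF-8 data

for row_num, (count, city) in enumerate(rows):
    wsheet.write(row_num, 0, count)
    # Decode the byte string to unicode so the writer never has to guess.
    wsheet.write(row_num, 1, city.decode('utf-8'))

wbook.close()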

As Alex Hristov points out, you have some non-ASCII data that needs to be encoded as UTF-8 for Excel.
See the following examples from the docs which each have instructions on handling UTF-8 with XlsxWriter in different scenarios:
Example: Simple Unicode with Python 2
Example: Simple Unicode with Python 3
Example: Unicode - Polish in UTF-8

Related

'ascii' codec can't encode character when redirecting Python script to file through Bash [duplicate]

I have a python script that grabs a bunch of recent tweets from the twitter API and dumps them to screen. It works well, but when I try to direct the output to a file something strange happens and a print statement causes an exception:
> ./tweets.py > tweets.txt
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2018' in position 61: ordinal not in range(128)
I understand that the problem is with a UTF-8 character in one of the tweets that doesn't translate well to ASCII, but what is a simple way to dump the output to a file? Do I fix this in the python script or is there a way to coerce it at the commandline?
BTW, the script was written in Python 2.
Without modifying the script, you can just set the environment variable PYTHONIOENCODING=utf8 and Python will assume that encoding when redirecting to a file.
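For example, using the script name from the question:
$ PYTHONIOENCODING=utf8 ./tweets.py > tweets.txt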
References:
https://docs.python.org/2.7/using/cmdline.html#envvar-PYTHONIOENCODING
https://docs.python.org/3.3/using/cmdline.html#envvar-PYTHONIOENCODING
You may need to encode the unicode object with .encode('utf-8').
In your Python file, add this as the first line:
# -*- coding: utf-8 -*-
If your script runs standalone (with a shebang line), put it on the second line instead:
#!/usr/local/bin/python
# -*- coding: utf-8 -*-
Here is the document: PEP 0263
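Putting the two together, a minimal Python 2 sketch of the encode approach; the tweet text here is invented, since in the real script it would come from the Twitter API:
#!/usr/local/bin/python
# -*- coding: utf-8 -*-
tweets = [u'\u2018Hello\u2019 from the API', u'caf\xe9 con leche']
for tweet in tweets:
    # Encode explicitly so that redirecting stdout to a file no longer
    # falls back to the ascii codec.
    print tweet.encode('utf-8')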

UnicodeDecodeError for Reading files in Python

pythonNotes = open('E:\\Python Notes.docx','r')
read_it_now = pythonNotes.read()
print(read_it_now.encode('utf-16'))
When I try this code, I get:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 591: character maps to <undefined>
I am running this in visual studio with python tools - starting without debugging.
I have tried putting enc='utf-8' at the top and passing it in as a parameter; I've looked at other questions and just couldn't find a solution to this simple issue.
Please assist.
This error can occur when text that is already in UTF-8 format is read in as an 8-bit encoding and Python tries to "decode" it to Unicode: bytes that have no meaning in the assumed encoding throw a UnicodeDecodeError. Conversely, you will usually get an error if you try to read a file as UTF-8 when it is not actually UTF-8 encoded.
In your case, the problem is that a docx file is not a regular text file; no single text encoding can meaningfully import it. See this SO answer for directions on how to read it on a low level, or use python-docx to get access to the document in a way that resembles what you see in Word.
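If you go the python-docx route, a minimal sketch, assuming the package is installed and reusing the path from your code, would look roughly like this:
from docx import Document

# A .docx file is a zipped XML container, not plain text, so open it with
# a library that understands the format instead of plain open()/read().
doc = Document('E:\\Python Notes.docx')
for paragraph in doc.paragraphs:
    print(paragraph.text)  # paragraph.text is already a proper text string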

Standard Python libraries and Unicode

I have been reading left right and centre about unicode and python. I think I understand what encoding/decoding is, yet as soon as I try to use a standard library method manipulating a file name, I get the infamous:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 19:
ordinal not in range(128)
In this case \xe9 stands for 'é', and it doesn't matter whether I use it in an os.path.join() or a shutil.copy(); it throws the same error. From what I understand it has to do with the default encoding of Python. I tried to change it with:
# -*- coding: utf-8 -*-
Nothing changes. If I type:
sys.setdefaultencoding('utf-8')
it tells me:
ImportError: cannot import name setdefaultencoding
What I really don't understand is why it works when I type it in the terminal, '\xe9' and all. Could someone please explain to me why this is happening/how to get around it?
Thank you
Filenames on *nix cannot be manipulated as unicode. The filename must be encoded to match the charset of the filesystem and then used.
You should manually decode the filename with the correct encoding (latin-1?) before calling os.path.join.
btw: # -*- coding: utf-8 -*- only applies to the string literals in your .py file
effbot has some good info on this.
You should not touch the default encoding. It is best practice, and highly recommended, to keep it at 'ascii' and convert your data properly to UTF-8 on the output side.
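A minimal Python 2 sketch of that approach, with a hypothetical directory and filename: encode the unicode path components to the filesystem's encoding before handing them to the standard library:
# -*- coding: utf-8 -*-
import os
import sys

fs_encoding = sys.getfilesystemencoding()  # usually 'UTF-8' on modern *nix

name = u'caf\xe9.txt'  # unicode filename containing é (u'\xe9')

# Encode before calling os.path.join / shutil.copy so Python never attempts
# the implicit ascii conversion that raises the error.
path = os.path.join('/tmp', name.encode(fs_encoding))
print path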

error :: UnicodeDecodeError

I am getting
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 104: ordinal not in range(128)
I am using IntegerProperty, StringProperty, DateTimeProperty.
That's because 0xb0 (decimal 176) is not a valid character code in ASCII (which defines only values between 0 and 127).
Check where you got that string from and use the proper encoding.
If you need further help, post the code.
You are trying to put Unicode data (probably text with accents) into an ASCII string.
You can use Python's codecs module to open a text file with UTF-8 encoding and write the Unicode data to it.
The .encode method may also help (u"õ".encode('utf-8') for example)
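For example, a small Python 2 sketch using codecs; the file name and text are made up, and 0xb0 happens to be the degree sign in Latin-1, so that character is used here:
# -*- coding: utf-8 -*-
import codecs

temperature = u'20\xb0C'  # u'\xb0' is the degree sign

# codecs.open returns a file object that encodes unicode to UTF-8 on write.
out = codecs.open('output.txt', 'w', encoding='utf-8')
out.write(temperature + u'\n')
out.close()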
Python defaults to ASCII encoding - if you are dealing with chars outside of the ASCII range, you need to specify that in your code.
One way to do this is to define the encoding at the top of your code.
This snippet sets the encoding at the top of the file to Latin-1 (which includes 0xb0):
#!/usr/bin/python
# -*- coding: latin-1 -*-
import os, sys
...
See PEP 0263 for more info on encoding.
When I write my foreign language "flashcard" programs, I always use Python 3.x, as its default encoding is UTF-8. Your encoding problems will generally be far less frequent.
If you're working on a program that many people will share, however, you may want to consider using encode and decode with Python 2.x, but only when storing and retrieving data elements in persistent storage. Encode your non-ASCII characters, work with the encoded representations of those unicode strings in memory, and save them in that encoded form. Finally, use decode when fetching unicode strings from persistent storage, but for end-user display only. This will eliminate the need to constantly encode and decode your strings throughout your program.
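A rough Python 2 sketch of that store-encoded / decode-for-display pattern; the file name and card text are invented, and the final print assumes a UTF-8 capable terminal:
# -*- coding: utf-8 -*-
card = u'\xe9cole = school'  # unicode in memory

# Store: encode to UTF-8 bytes only at the persistence boundary.
f = open('cards.txt', 'wb')
f.write(card.encode('utf-8') + '\n')
f.close()

# Fetch: decode back to unicode only when displaying to the end user.
f = open('cards.txt', 'rb')
for line in f:
    print line.rstrip('\n').decode('utf-8')
f.close()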
@jcoon also has a pretty standard response to this problem.

Diacritic signs

How should I write "mąka" in Python without an exception?
I've tried var= u"mąka" and var= unicode("mąka") etc... nothing helps
I have the coding declaration in the first line of my file, and I still get this exception:
'utf8' codec can't decode byte 0xb1 in position 0: unexpected code byte
Save the following 2 lines into write_mako.py:
# -*- encoding: utf-8 -*-
open(u"mąka.txt", 'w').write("mąka\n")
Run:
$ python write_mako.py
mąka.txt file that contains the word mąka should be created in the current directory.
If it doesn't work then you can use chardet to detect the actual encoding of the file (see chardet example usage):
import chardet
print chardet.detect(open('write_mako.py', 'rb').read())
In my case it prints:
{'confidence': 0.75249999999999995, 'encoding': 'utf-8'}
The # -*- coding: -*- line must specify the encoding the source file is actually saved in. This error message:
'utf8' codec can't decode byte 0xb1 in position 0: unexpected code byte
indicates you aren't saving the source file in UTF-8. You can save your source file in any encoding that supports the characters you are using in the source code; just make sure you know what it is and have a matching coding line.
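For instance, if your editor actually saved the file as ISO-8859-2 (a common Polish encoding in which "ą" is exactly the 0xb1 byte from your traceback), the coding line should name that encoding instead of utf-8; this is only a guess about how the file was saved:
# -*- coding: iso-8859-2 -*-
v = u"mąka"
print repr(v)  # u'm\u0105ka'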
What exception are you getting?
You might try saving your source code file as UTF-8, and putting this at the top of the file:
# coding=utf-8
That tells Python that the file’s saved as UTF-8.
This code works for me, saving the file as UTF-8:
v = u"mąka"
print repr(v)
The output I get is:
u'm\u0105ka'
Please copy and paste the exact error you are getting. If you are getting this error:
UnicodeEncodeError: 'charmap' codec can't encode character ... in position ...: character maps to <undefined>
Then you are trying to output the character somewhere that does not support UTF-8 (e.g. your shell's character encoding is set to something other than UTF-8).
