UnicodeEncodeError when using scipy.io.savemat - python

I have a DataFrame that contains Chinese characters.
I want to save it as a .mat file using:
datanew = r'data/newmat.mat'
scio.savemat(datanew,{'date':df})
But I get this error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
I have tried adding # -*- coding: utf-8 -*- as the first line, and also import sys with
default_encoding = 'utf-8', but neither worked.

Related

How to solve 'charmap' codec can't encode character '\u300b'?

I'm trying to start a script, but I have run into a problem.
[ERROR] 'charmap' codec can't encode character '\u300b' in position 11: character maps to <undefined>
Your code is printing a character that the console's output encoding cannot represent. Try changing the output encoding to "utf-8", or put the code below at the very top of your script (reconfigure is available in Python 3.7 and later):
import sys
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
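To see what the reconfigure call changes, here is a minimal sketch that reproduces the error on an in-memory stream instead of the real console. The cp1252 encoding and the variable names are assumptions chosen for illustration; your console may use a different code page.

```python
import io

# An in-memory text stream standing in for a console that uses cp1252
# (an assumed code page, picked only for this demonstration).
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1252")

try:
    stream.write("\u300b")  # cp1252 has no mapping for this character
    failed = False
except UnicodeEncodeError:
    failed = True

# After switching the stream to UTF-8, the same write succeeds.
stream.reconfigure(encoding="utf-8")
stream.write("\u300b")
stream.flush()
utf8_bytes = raw.getvalue()
```

Setting the PYTHONIOENCODING environment variable to utf-8 before starting Python is another way to get the same effect without editing the script.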

Python SyntaxError: Non-ASCII character

I have a string column in my pandas DataFrame. I get an error when I try to export this column to an Excel file. I can export the DataFrame if I delete that column.
I tried to decode all the values in that string column and got the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 16: ordinal not in range(128)
I debugged a little and found the column value that causes the problem. I am not able to decode this value ('Einen Use case fr Brems-').
I took that value and tried the following code, but got the error again.
#!/usr/bin/python
# -*- coding: utf-8 -*-
text = 'Einen Use case fr Brems-'
print text.decode()
1) I have also tried adding the following lines:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
2) I tried this on my system:
import sys
print sys.getdefaultencoding()
I got 'ascii' as output.
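The error message says the ascii codec hit byte 0x81, which means the string holds bytes in some other encoding and has to be decoded with that encoding explicitly. A minimal Python 3 sketch follows; the byte string and the choice of cp850 are assumptions made for illustration (0x81 is "ü" in cp850), not something the question confirms.

```python
# Hypothetical bytes: the letters "f" and "r" around a single 0x81 byte.
raw = b"f\x81r"

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)  # same kind of message as in the question

# Decoding with the encoding the bytes were written in works.
# cp850 is an assumption here; check where the data really came from.
text = raw.decode("cp850")

# If the real encoding is unknown, errors="replace" avoids the exception
# at the cost of losing the undecodable character.
lossy = raw.decode("ascii", errors="replace")
```

Changing the interpreter-wide default with sys.setdefaultencoding is not needed for this; naming the codec at the point of decoding is enough.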

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128) chinese characters

I'm trying to write Chinese characters into a text file from SQL output stored in a variable called result.
result looks like this:
[('你好吗', 345re4, '2015-07-20'), ('我很好',45dde2, '2015-07-20').....]
This is my code:
#result is a list of tuples
file = open("my.txt", "w")
for row in result:
    print >> file, row[0].encode('utf-8')
file.close()
row[0] contains Chinese text like this: 你好吗
I also tried:
print >> file, str(row[0]).encode('utf-8')
and
print >> file, 'u'+str(row[0]).encode('utf-8')
but both gave the same error.
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128)
I found a simpler solution than encoding and decoding by hand: open the file as "utf-8" from the beginning using codecs.
import codecs
file = codecs.open("my.txt", "w", "utf-8")
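For readers on Python 3, the same idea can be sketched with the built-in open and its encoding parameter. The sample rows and the temporary path below are stand-ins for the real SQL result and output file.

```python
import os
import tempfile

# Stand-in for the SQL result: a list of tuples whose first item is text.
result = [("你好吗", "2015-07-20"), ("我很好", "2015-07-20")]

# Temporary location used only so the sketch can run anywhere.
path = os.path.join(tempfile.mkdtemp(), "my.txt")

# Opening the file as UTF-8 lets us write str objects directly,
# with no manual encode or decode calls.
with open(path, "w", encoding="utf-8") as f:
    for row in result:
        f.write(row[0] + "\n")

# Read the file back with the same encoding to check the round trip.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```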
Don't forget to add the UTF-8 BOM at the beginning of the file if you want a text editor to display it correctly:
file = open(...)
file.write("\xef\xbb\xbf")
for row in result:
    print >> file, u""+row[0].decode("mbcs").encode("utf-8")
file.close()
I think you'll have to decode from your machine's default encoding to unicode first, then encode the result as UTF-8.
mbcs represents (or at least it did ages ago) the default encoding on Windows.
But do not rely on that.
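On Python 3, the BOM suggestion above can be sketched without writing the three bytes by hand: the standard utf-8-sig codec emits the BOM when writing and strips it when reading. The temporary path is an assumption used only to keep the example runnable.

```python
import os
import tempfile

# Temporary location used only so the sketch can run anywhere.
path = os.path.join(tempfile.mkdtemp(), "my.txt")

# "utf-8-sig" writes the BOM (EF BB BF) once, before the first character.
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("你好吗\n")

# Inspect the raw bytes to confirm the BOM is present.
with open(path, "rb") as f:
    has_bom = f.read().startswith(b"\xef\xbb\xbf")

# Reading with the same codec removes the BOM again.
with open(path, encoding="utf-8-sig") as f:
    text = f.read()
```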
Did you try the codecs module?

codec can't encode character: character maps to <undefined>

I'm trying to read a docx file in Python 2.7 with this code:
import docx
document = docx.Document('sim_dir_administrativo.docx')
docText = '\n\n'.join([
    paragraph.text.encode('utf-8') for paragraph in document.paragraphs])
And then I'm trying to decode the string inside the file with this code, because I have some special characters (e.g. ã):
print docText.decode("utf-8")
But, I'm getting this error:
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position 494457: character maps to <undefined>
How can I solve this?
The print function can only print characters that are in your local encoding. You can find out what that is with sys.stdout.encoding. To print with special characters you must first encode to your local encoding.
# -*- coding: utf-8 -*-
import sys
print sys.stdout.encoding
print u"Stöcker".encode(sys.stdout.encoding, errors='replace')
print u"Стоескер".encode(sys.stdout.encoding, errors='replace')
This code snippet was taken from this stackoverflow response.
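The same approach in Python 3 can be sketched as a small helper that replaces whatever the target encoding cannot represent. The helper name printable and the cp437 code page are assumptions for illustration; in a real script you would pass sys.stdout.encoding.

```python
def printable(text, encoding):
    """Return text with characters the encoding lacks replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# cp437 (an assumed console code page) has no en dash, U+2013.
dash_result = printable("2010\u20132015", "cp437")

# Characters the code page does contain pass through unchanged.
umlaut_result = printable("Stöcker", "cp437")

# cp437 has no Cyrillic letters, so every character is replaced.
cyrillic_result = printable("Стоескер", "cp437")
```

Passing errors="backslashreplace" instead keeps the information by printing escapes such as \u2013 rather than question marks.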

Diacritic signs

How should I write "mąka" in Python without an exception?
I've tried var = u"mąka" and var = unicode("mąka") and so on, but nothing helps.
I have a coding declaration on the first line of my file, and I still get this exception:
'utf8' codec can't decode byte 0xb1 in position 0: unexpected code byte
Save the following 2 lines into write_mako.py:
# -*- encoding: utf-8 -*-
open(u"mąka.txt", 'w').write("mąka\n")
Run:
$ python write_mako.py
A file named mąka.txt that contains the word mąka should be created in the current directory.
If it doesn't work, you can use chardet to detect the actual encoding of the file (see chardet example usage):
import chardet
print chardet.detect(open('write_mako.py', 'rb').read())
In my case it prints:
{'confidence': 0.75249999999999995, 'encoding': 'utf-8'}
The # -*- coding: -*- line must specify the encoding the source file is saved in. This error message:
'utf8' codec can't decode byte 0xb1 in position 0: unexpected code byte
indicates you aren't saving the source file in UTF-8. You can save your source file in any encoding that supports the characters you are using in the source code, just make sure you know what it is and have an appropriate coding line.
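A short Python 3 sketch of why byte 0xb1 shows up: ISO-8859-2 encodes "ą" as the single byte 0xb1, which is not valid at the start of a UTF-8 sequence. That the asker's file was saved as ISO-8859-2 is an assumption; the error only proves it was not UTF-8.

```python
word = "mąka"

# The same text produces different bytes under different encodings.
latin2_bytes = word.encode("iso-8859-2")  # "ą" becomes the byte 0xb1
utf8_bytes = word.encode("utf-8")         # "ą" becomes the bytes 0xc4 0x85

# Reading ISO-8859-2 bytes as UTF-8 fails, as in the reported error.
try:
    latin2_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Decoding with the encoding that was actually used recovers the word.
recovered = latin2_bytes.decode("iso-8859-2")
```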
What exception are you getting?
You might try saving your source code file as UTF-8, and putting this at the top of the file:
# coding=utf-8
That tells Python that the file’s saved as UTF-8.
This code works for me, saving the file as UTF-8:
v = u"mąka"
print repr(v)
The output I get is:
u'm\u0105ka'
Please copy and paste the exact error you are getting. If you are getting this error:
UnicodeEncodeError: 'charmap' codec can't encode character ... in position ...: character maps to <undefined>
Then you are trying to output the character somewhere that does not support UTF-8 (e.g. your shell's character encoding is set to something other than UTF-8).
