UnicodeEncodeError: 'charmap' codec can't encode character: character maps to <undefined> - python

I have a problem with writing Unicode to a file. I am using Python 2.7.3, and it gives me this error:
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 1006: character maps to <undefined>
Here is a sample of my code; the error is raised on the line f3.write(text):
f = codecs.open("PopupMessages.strings", encoding='utf-16')
text = f.read()
print text
f.close()
f3 = codecs.open("3.txt", encoding='utf-16', mode='w')
f3.write(text)
f3.close()
I tried 'utf-8' and 'utf-8-sig' as well, but they didn't help. My source file contains symbols such as ['\",;?*&$##%] as well as text in different languages.
How can I solve this issue? I searched Stack Overflow first, but nothing I found helped.

Delete this line:
print text
and it should work. The utf-16 write itself succeeds; it is the print that raises the error, because your console's 'charmap' codec cannot represent u'\u2019'.
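To illustrate, here is a minimal sketch of the same copy with the console echo made safe instead of removed (Python 3 syntax; the sample text is hypothetical and stands in for the file contents):

```python
# The write to the utf-16 file is fine; it is `print` that raises, because
# a Windows console using the 'charmap' codec cannot represent u'\u2019'.
import codecs

text = u"It\u2019s done"  # sample containing the offending character

# Echo safely: escape characters the console may not be able to encode.
safe = text.encode("ascii", "backslashreplace").decode("ascii")
print(safe)

with codecs.open("3.txt", "w", encoding="utf-16") as f3:
    f3.write(text)  # utf-16 can encode U+2019, so this never raises
```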

Related

UnicodeEncodeError: 'latin-1' when reading a textfile

I am new to reading text files. I run the following code:
with open('sometext.txt', 'rb') as xy: txt = xy.read().decode('utf-8')
and I get this error:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u201e' in position 137: ordinal not in range(256)
I have already tried playing around with encoding and decoding, but without success. The text in the file is German, so the error may be related to that. Thanks for any help.
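Note that this is an *encode* error, not a decode error: the decode('utf-8') itself apparently succeeds, and the 'latin-1' failure usually comes later, when the decoded text is printed to a latin-1 terminal that cannot show U+201E („). A minimal sketch, assuming hypothetical German sample text:

```python
# The German low-9 quote U+201E decodes fine from UTF-8 bytes, but a
# latin-1 terminal cannot display it; replace such characters when printing.
raw = b"\xe2\x80\x9eGuten Tag\xe2\x80\x9c"  # UTF-8 bytes for „Guten Tag“
txt = raw.decode("utf-8")                   # this step does not fail

printable = txt.encode("latin-1", "replace").decode("latin-1")
print(printable)  # the quotes become ? instead of raising
```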

I get a char error while trying to parse a CSV file into pandas

I am practicing pandas and have the following issue:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 7190: invalid start byte
It is a simple CSV readout:
csvfile = open('file.csv', 'r', encoding="UTF-8")
csv_pandas = pd.read_csv(csvfile, sep=",")
print(csv_pandas)
However, it works properly with the csv module; with csv.reader I don't get the same error.
What's going on? And where can I learn more about charmap and encodings in Python?
P.S. I also tried removing encoding="UTF-8" and got a similar error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 140378: character maps to <undefined>
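Bytes like 0xa0 and 0x9d suggest the file is in a Windows code page (e.g. cp1252) rather than UTF-8. One approach is to pass the encoding to read_csv directly and fall back to latin-1, which accepts every byte value. A sketch with a hypothetical sample file:

```python
import pandas as pd

# Create a small sample whose bytes are valid cp1252 but not UTF-8.
with open("sample.csv", "wb") as fh:
    fh.write(b"name,note\nAnna,caf\xe9\n")  # 0xe9 = 'é' in cp1252

try:
    df = pd.read_csv("sample.csv", encoding="cp1252")
except UnicodeDecodeError:
    # latin-1 maps every byte to a character, so it never raises here,
    # though it may produce mojibake if the guess is wrong.
    df = pd.read_csv("sample.csv", encoding="latin-1")

print(df["note"].iloc[0])
```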

Can't encode ascii character u'\xe9'

I am getting the message 'ascii' codec can't encode character u'\xe9' when writing a string to my file. Here is how I write it:
my_file = open(output_path, "w")
my_file.write(output_string)
my_file.close()
I have been searching and found answers like this: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128). The first one didn't work, and with this one I'm confused about why I am encoding data I want to be able to read:
import io
f = io.open(filename, 'w', encoding='utf8')
Thanks for the help
As mentioned, you're trying to write non-ASCII characters with the ASCII encoding. The built-in open function in Python 2.7 doesn't support the encoding parameter, so consider always using io.open instead (io.open is what the built-in open became in Python 3.x).
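A minimal sketch of the io.open fix (output_string here is hypothetical sample data containing a non-ASCII character):

```python
# io.open accepts an encoding parameter in Python 2.7 and is identical
# to the built-in open in Python 3, so this runs on both.
import io

output_string = u"r\xe9sum\xe9"

with io.open("output.txt", "w", encoding="utf8") as my_file:
    my_file.write(output_string)  # encoded as UTF-8, no ASCII error
```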

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128) chinese characters

I'm trying to write Chinese characters into a text file from a SQL output called result.
result looks like this:
[('你好吗', 345re4, '2015-07-20'), ('我很好',45dde2, '2015-07-20').....]
This is my code:
#result is a list of tuples
file = open("my.txt", "w")
for row in result:
    print >> file, row[0].encode('utf-8')
file.close()
row[0] contains Chinese text like this: 你好吗
I also tried:
print >> file, str(row[0]).encode('utf-8')
and
print >> file, 'u'+str(row[0]).encode('utf-8')
but both gave the same error.
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128)
Found a simple solution: instead of encoding and decoding each string, open the file as "utf-8" from the beginning using codecs.
import codecs
file = codecs.open("my.txt", "w", "utf-8")
Don't forget to add the UTF-8 BOM at the beginning of the file if you wish to view it correctly in a text editor:
file = open(...)
file.write("\xef\xbb\xbf")
for row in result:
    print >> file, u"" + row[0].decode("mbcs").encode("utf-8")
file.close()
I think you'll have to decode from your machine's default encoding to unicode, then encode it as UTF-8.
mbcs represents (at least it did ages ago) the default encoding on Windows, but do not rely on that.
Did you try the codecs module?
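Combining the two answers above, a sketch that avoids the manual "\xef\xbb\xbf" write: the 'utf-8-sig' codec emits the BOM automatically, and writing unicode rows avoids the implicit ASCII encode (written with out.write so it also runs on Python 3; result is hypothetical sample data from the question):

```python
import codecs

result = [(u"\u4f60\u597d\u5417", "345re4", "2015-07-20"),
          (u"\u6211\u5f88\u597d", "45dde2", "2015-07-20")]

# 'utf-8-sig' writes the UTF-8 BOM at the start of the file for you.
with codecs.open("my.txt", "w", encoding="utf-8-sig") as out:
    for row in result:
        out.write(row[0] + u"\n")  # each row's Chinese text on its own line
```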

Strange characters in console Python

Reading the word "beyoncè" from a text file, Python handles it as "beyonc\xc3\xa9".
If I write it into a file, it shows correctly, but in the console it shows like that.
Also, if I try to use it in my program I get:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)
How can I get Python to read "beyoncè" from a text file as "beyoncè" and get rid of this problem?
See if this helps:
f = open('mytextfile.txt', 'w', encoding='utf-8')
f.write(line)
Try:
string = "beyonc\xc3\xa9"
decoded = string.decode("utf-8")  # keep the decoded unicode, u'beyonc\xe9'
foo = open("foo.txt", "wb")
foo.write(string)  # the raw UTF-8 bytes
foo.close()
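In other words, "beyonc\xc3\xa9" is just the UTF-8 byte sequence for the accented name; decode it once and work with the resulting text everywhere, and no implicit ASCII decode is attempted later. A minimal sketch in Python 3 byte/str terms:

```python
raw = b"beyonc\xc3\xa9"      # the UTF-8 bytes read from the file
name = raw.decode("utf-8")   # the real string, ending in U+00E9

# Writing it back out as UTF-8 round-trips cleanly.
with open("out.txt", "w", encoding="utf-8") as f:
    f.write(name)
```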
