I am new to reading text files. I run the following code:
with open('sometext.txt', 'rb') as xy: txt = xy.read().decode('utf-8')
I get this error:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u201e' in position 137: ordinal not in range(256)
I have already tried playing around with encoding and decoding, but without success. The text in the file is German; maybe the error depends on that. Thanks for any help.
Related
I am trying to open a CSV file with pandas, but I get this error:
test_tweets = pd.read_csv(r"C:\Users\22587\Downloads\data\test_tweets.csv")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 75: invalid start byte
0xa0 is the non-breaking space in Latin-1 and cp1252. Maybe you copied your data from a website and it contained such an invisible character.
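A minimal sketch of why the byte 0xa0 trips the UTF-8 decoder, using only the standard library. The sample bytes and the choice of cp1252 are assumptions for illustration; the real file's encoding still has to be confirmed.

```python
# Hypothetical sample: "10", a non-breaking space stored as the single byte 0xa0, then "kg".
raw = b"10\xa0kg"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # In UTF-8, 0xa0 is a continuation byte and cannot start a character.
    utf8_ok = False

# In cp1252 (and Latin-1) the byte 0xa0 maps to U+00A0, NO-BREAK SPACE.
text = raw.decode("cp1252")
```

If the file really is cp1252 or Latin-1, passing that name as the `encoding` argument of `pd.read_csv` should let it load.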
I am practicing pandas and I have the following issue:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 7190: invalid start byte
It's a simple attempt at reading a CSV file:
csvfile = open('file.csv', 'r', encoding="UTF-8")
csv_pandas = pd.read_csv(csvfile, sep=",")
print(csv_pandas)
However, it works properly with the csv module. With csv.reader I don't get the same error.
What's going on? And where can I learn more about charmap and encodings in Python?
P.S. I tried removing encoding="UTF-8" and got a similar error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 140378: character maps to <undefined>
I'm using Python 2.7 and trying to work with the following code
import wikipedia
input = raw_input("Question: ")
print wikipedia.summary(input)
I see this error when the code is run:
Traceback (most recent call last):
  File "wik.py", line 5, in <module>
    print wikipedia.summary(input)
  File "C:\Anaconda2\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position 38: character maps to <undefined>
How can I fix this? Thanks in advance.
Python 2 defaults to ASCII, which only maps characters between \u0000 and \u007F. You need to use a different encoding in order to output this character (\u2013 is an en dash) and many others outside of ASCII.
Using UTF-8 should work for you, and I believe this print statement will properly output text:
print wikipedia.summary(input).encode("utf8")
For more information on this, check this similar question: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 3 2: ordinal not in range(128).
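As a rough illustration of the point above, here is a sketch of what happens when text containing an en dash is encoded with different codecs. It is written in Python 3 syntax (the question uses Python 2), and the sample string and the helper `can_encode` are hypothetical.

```python
summary = "1914\u20131918"  # hypothetical text containing an en dash (U+2013)


def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp437 is the console code page named in the traceback; it has no en dash.
cp437_ok = can_encode(summary, "cp437")

# UTF-8 can represent any Unicode character.
utf8_bytes = summary.encode("utf-8")

# errors="replace" substitutes "?" for anything the target encoding lacks.
fallback = summary.encode("cp437", errors="replace")
```

Encoding to UTF-8 always succeeds, but whether the resulting bytes display correctly still depends on the console being set to UTF-8.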
I have a problem writing Unicode text to a file. I am using Python 2.7.3. It gives me this error:
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 1006: character maps to <undefined>
Here is a sample of my code; the error is on the line f3.write(text):
f = codecs.open("PopupMessages.strings", encoding='utf-16')
text = f.read()
print text
f.close()
f3 = codecs.open("3.txt", encoding='utf-16', mode='w')
f3.write(text)
f3.close()
I also tried 'utf-8' and 'utf-8-sig', but that didn't help. My source file contains symbols such as ['\",;?*&$##%] and characters from different languages.
How can I solve this issue? Please help. I searched Stack Overflow first, but it didn't help.
Delete this line:
print text
and it should work.
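The suggestion above implies that the failure comes from printing to the console rather than from the file write. A minimal sketch of the copy without any console output follows; the sample text and temporary file names are assumptions standing in for the real PopupMessages.strings.

```python
import io
import os
import tempfile

# Hypothetical content standing in for PopupMessages.strings.
sample = u"Don\u2019t show again"

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "source.strings")
dst = os.path.join(workdir, "3.txt")

# Create the stand-in source file as UTF-16.
with io.open(src, "w", encoding="utf-16") as f:
    f.write(sample)

# Read and write with an explicit encoding; nothing is sent to the console.
with io.open(src, encoding="utf-16") as f:
    text = f.read()
with io.open(dst, "w", encoding="utf-16") as f3:
    f3.write(text)

# Read the copy back to confirm the round trip.
with io.open(dst, encoding="utf-16") as check:
    copied = check.read()
```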
I'm trying to print a string from an archived web crawl, but when I do I get this error:
print page['html']
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 17710: ordinal not in range(128)
When I try print unicode(page['html']) I get:
print unicode(page['html'],errors='ignore')
TypeError: decoding Unicode is not supported
Any idea how I can properly code this string, or at least get it to print? Thanks.
You need to encode the unicode you saved to display it, not decode it -- unicode is the unencoded form. You should always specify an encoding, so that your code will be portable. The "usual" pick is utf-8:
print page['html'].encode('utf-8')
If you don't specify an encoding, whether or not it works will depend on what you're printing to -- your editor, OS, terminal program, etc.
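A small sketch of the difference an explicit encoding makes, written in Python 3 syntax. The sample string is hypothetical, and `backslashreplace` is just one possible fallback for ASCII-only output, not something the answer above prescribes.

```python
import sys

html = u"fran\xe7ais"  # hypothetical page text containing U+00E7

# The encoding used by print depends on the environment (terminal, pipe, IDE).
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Explicit UTF-8 gives the same bytes on every machine.
utf8_bytes = html.encode("utf-8")

# backslashreplace keeps the output ASCII-only while showing which character was there.
ascii_safe = html.encode("ascii", errors="backslashreplace")
```

Inspecting `stdout_encoding` on the machine where the error occurs is a quick way to see which codec print is implicitly using.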