UnicodeEncodeError: 'latin-1' when reading a textfile - python

I am new to reading text files. I run the following code:
with open('sometext.txt', 'rb') as xy: txt = xy.read().decode('utf-8')
I get this error:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u201e' in position 137: ordinal not in range(256)
I have already tried playing around with encoding and decoding, but without success. The text in the file is German; maybe the error depends on that. Thanks for any help.

Related

How to solve UnicodeDecodeError when reading csv

I am trying to open a CSV file with pandas but I get this error:
test_tweets = pd.read_csv(r"C:\Users\22587\Downloads\data\test_tweets.csv")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 75: invalid start byte
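0xa0 is the non-breaking space in Latin-1 and Windows-1252, but on its own it is not valid UTF-8. Maybe you copied your data from a website and it contained this invisible character, or the file was saved in a single-byte encoding rather than UTF-8.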
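A small sketch of what that byte does, using only the standard library on Python 3. The sample bytes are made up for illustration, and cp1252 is an assumption about how the file was saved, not something the question confirms.

```python
# Hypothetical sample: one CSV row saved in a single-byte Windows encoding.
raw = b"id,text\n1,hello\xa0world\n"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # In UTF-8, 0xa0 is a continuation byte and cannot start a character.
    utf8_ok = False

# Both cp1252 and latin-1 map byte 0xa0 to U+00A0 (NO-BREAK SPACE).
text = raw.decode("cp1252")
has_nbsp = "\u00a0" in text

# Swap the invisible character for a normal space if it is unwanted.
cleaned = text.replace("\u00a0", " ")
```

With pandas, the same idea would be to pass encoding="cp1252" (or "latin-1") to pd.read_csv, assuming the file really is in that encoding.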

I get a char error while trying to parse a CSV file into pandas

I am practicing pandas and I have the following issue:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 7190: invalid start byte
It is a simple attempt at reading a CSV file:
csvfile = open('file.csv', 'r', encoding="UTF-8")
csv_pandas = pd.read_csv(csvfile, sep=",")
print(csv_pandas)
However, it works properly with the csv module. With csv.reader I don't get the same error.
What's going on? And where can I learn more about charmap and encodings in Python?
P.S. I tried removing encoding="UTF-8" and got a similar error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 140378: character maps to <undefined>

Error while importing Wikipedia using pip

I'm using Python 2.7 and trying to work with the following code
import wikipedia
input = raw_input("Question: ")
print wikipedia.summary(input)
I see this error when the code is run:
Traceback (most recent call last):
  File "wik.py", line 5, in <module>
    print wikipedia.summary(input)
  File "C:\Anaconda2\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position 38: character maps to <undefined>
How can I fix this? Thanks in advance.
Python 2 defaults to ASCII, which only maps characters between \u0000 and \u007F. In your traceback the console codec is cp437, which has no mapping for this character either. You need to use a different encoding in order to properly output this character (\u2013 is an en dash) and many others outside of ASCII.
Using UTF-8 should work for you, and I believe this print statement will properly output text:
print wikipedia.summary(input).encode("utf8")
For more information on this, check this similar question: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 3 2: ordinal not in range(128).
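To see which codecs can represent the en dash at all, here is a rough sketch in Python 3 syntax (the question itself uses Python 2). The helper can_encode is a made-up name for illustration, not part of any library.

```python
dash = "\u2013"  # EN DASH, the character from the traceback

def can_encode(text, encoding):
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Try the codecs mentioned above: plain ASCII, the cp437 console, and UTF-8.
results = {name: can_encode(dash, name) for name in ("ascii", "cp437", "utf-8")}

# UTF-8 can encode any Unicode character; the en dash becomes three bytes.
encoded = dash.encode("utf-8")
```

Only UTF-8 succeeds here, which is why encoding explicitly before printing avoids the error. Whether the console then displays those bytes correctly is a separate matter.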

UnicodeEncodeError: 'charmap' codec can't encode character character maps to <undefined>

I have a problem with writing Unicode text to a file. I am using Python 2.7.3. It gives me this error:
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 1006: character maps to <undefined>
Here is a sample of my code. The error is on the line f3.write(text):
f = codecs.open("PopupMessages.strings", encoding='utf-16')
text = f.read()
print text
f.close()
f3 = codecs.open("3.txt", encoding='utf-16', mode='w')
f3.write(text)
f3.close()
I also tried 'utf-8' and 'utf-8-sig', but it didn't help. My source file contains symbols such as ['\",;?*&$##%] and characters from different languages.
How can I solve this issue? Please help. I searched Stack Overflow first, but it didn't help me.
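Delete this line:
print text
and it should work. The 'charmap' error most likely comes from print, which has to encode the text for your console's code page; writing through codecs.open with utf-16 does not use a charmap codec.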
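Here is a rough sketch of the same read and write without any console output, in Python 3 syntax (the question uses Python 2.7). The sample string and the temporary directory are stand-ins, since the real PopupMessages.strings is not available.

```python
import io
import os
import tempfile

sample = "It\u2019s a test"  # contains U+2019, the character from the error

tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "PopupMessages.strings")
dst = os.path.join(tmp_dir, "3.txt")

# Create a stand-in source file for the demonstration.
with io.open(src, "w", encoding="utf-16") as f:
    f.write(sample)

# Read and rewrite the text. No console output is involved, so the
# console's code page never gets a chance to reject a character.
with io.open(src, "r", encoding="utf-16") as f:
    text = f.read()
with io.open(dst, "w", encoding="utf-16") as f3:
    f3.write(text)

# Read the copy back to confirm nothing was lost.
with io.open(dst, "r", encoding="utf-16") as check:
    round_trip = check.read()

# If the text must be shown on a limited console, escape what it cannot show.
printable = text.encode("ascii", errors="backslashreplace").decode("ascii")
```

The last line is optional: it gives an ASCII-only version that can be printed for debugging on any console, at the cost of showing escapes instead of the real characters.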

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 17710: ordinal not in range(128)

I'm trying to print a string from an archived web crawl, but when I do I get this error:
print page['html']
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 17710: ordinal not in range(128)
When I try print unicode(page['html']) I get:
print unicode(page['html'],errors='ignore')
TypeError: decoding Unicode is not supported
Any idea how I can properly code this string, or at least get it to print? Thanks.
You need to encode the unicode you saved to display it, not decode it -- unicode is the unencoded form. You should always specify an encoding, so that your code will be portable. The "usual" pick is utf-8:
print page['html'].encode('utf-8')
If you don't specify an encoding, whether or not it works will depend on what you're printing to -- your editor, OS, terminal program, etc.
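As an illustration of choosing the encoding yourself rather than leaving it to the environment, here is a minimal sketch in Python 3 syntax. The helper write_encoded and the sample string are made up for this example.

```python
import io

html = "gar\u00e7on"  # contains U+00E7, the character from the error

def write_encoded(text, stream, encoding="utf-8", errors="strict"):
    """Encode text explicitly and write the bytes to a binary stream."""
    stream.write(text.encode(encoding, errors))

# UTF-8 can represent the character, so strict encoding succeeds.
utf8_out = io.BytesIO()
write_encoded(html, utf8_out)

# ASCII cannot; errors="replace" substitutes "?" instead of raising.
ascii_out = io.BytesIO()
write_encoded(html, ascii_out, encoding="ascii", errors="replace")
```

On Python 3 the same helper could be pointed at sys.stdout.buffer to send the bytes to the terminal; whether they display correctly still depends on what the terminal expects.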
