Python 2.7: 'ascii' codec can't encode character u'\xe4'

I've run into a problem in Python 2.7. I'm already using UTF-8, but I still get this exception:
"UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 81: ordinal not in range(128)"
My files contain many characters like this, and for some reason I'm not allowed to delete them. A sample row:
desktop,[Search] Store | Automated Titles,google / cpc,Titles > Kesäkaverit,275285048,13
I have tried the methods below to avoid the error, but none of them fixed it. Can anyone help?
1. Adding "#!/usr/bin/python" to my file header
2. Calling setdefaultencoding:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
3. content = unicode(s3core.download_file_to_memory(S3_PROFILE, S3_RAW + file), "utf-8", "ignore")
My code is below:
content = unicode(s3core.download_file_to_memory(S3_PROFILE, S3_RAW + file), "utf8", "ignore")
rows = content.split('\n')[1:]
for row in rows:
    if not row:
        continue
    try:
        # fetch variables
        cols = row.rstrip('\n').split(',')
        transaction = cols[0]
        device_category = cols[1]
        campaign = cols[2]
        source = cols[3].split('/')[0].strip()
        medium = cols[3].split('/')[1].strip()
        ad_group = cols[4]
        transactions = cols[5]
        data_list.append('\t'.join(
            ['-'.join([dt[:4], dt[4:6], dt[6:]]), country, transaction, device_category, campaign, source,
             medium, ad_group, transactions]))
    except:
        print 'ignoring row: ' + row

Related

Incorrect character in saving process for excel

I'm adding a new column to an Excel file and want to save the result as a new file. However, one column in the Excel file contains an invalid character. How can I skip that row during the save, or replace the character with a valid one?
import pandas as pd

path = '/My Documents/Python/'
fileName = "test.xlsx"
# open the excel file
ef = pd.ExcelFile(path+fileName)
# read the contents
df = pd.read_excel(path+fileName, sheet_name=ef.sheet_names[0])
print(df['Content'])
print(df['Engine'])
i = 1
for test in df['Content']:
    try:
        print(i)
        print(test)
    except:
        print("An exception occurred")
        break
    i += 1
df['Test'] = 'value'
df.to_excel('My Documents/Python/Test_NEW.xlsx')
Error message
data, consumed = self.encode(object, self.errors)
UnicodeEncodeError: 'utf-8' codec can't encode character '\ude7c' in position 470: surrogates not allowed
df['Content'] = df['Content'].astype(str)
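The error message points at a lone surrogate (`'\ude7c'`), which UTF-8 cannot encode. Below is a minimal sketch of one way to clean such values before saving, assuming it is acceptable to replace the bad character with `?`. The helper name `replace_surrogates` is hypothetical and not part of pandas.

```python
def replace_surrogates(value):
    """Return value with unencodable characters (e.g. lone surrogates) replaced by '?'."""
    if not isinstance(value, str):
        return value  # leave numbers, NaN, etc. untouched
    # Encoding with errors='replace' turns each unencodable character into b'?'
    return value.encode('utf-8', errors='replace').decode('utf-8')
```

With the DataFrame from the question, this could be applied as `df['Content'] = df['Content'].map(replace_surrogates)` before calling `df.to_excel(...)`.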

Cannot parse through a file with latin-1 encoding

I'm trying to parse a large file of tweets from the Stanford Sentiment Database (see here: http://help.sentiment140.com/for-students/), with the following being my code:
def init_process(fin, fout):
    outfile = open(fout, 'a')
    with open(fin, buffering=200000, encoding='latin-1') as f:
        try:
            for line in f:
                line = line.replace('"', '')
                initial_polarity = line.split(',')[0]
                if initial_polarity == '0':
                    initial_polarity = [1, 0]
                elif initial_polarity == '4':
                    initial_polarity = [0, 1]
                tweet = line.split(',')[-1]
                outline = str(initial_polarity) + ':::' + tweet
                outfile.write(outline)
        except Exception as e:
            print(str(e))
    outfile.close()

init_process('training.1600000.processed.noemoticon.csv','train_set.csv')
I've run into the following issue:
'ascii' codec can't encode characters in position 12-14: ordinal not in range(128)
which doesn't make sense since I'm opening the file with a latin-1 encoding. How do I stop this error and successfully parse through the file?
It's probably the outfile's encoding that is still ASCII. Open it with an explicit encoding too (it doesn't have to be latin-1; utf-8 is probably more appropriate, depending on your environment).
Per a comment from Åsmund: the default file encoding is locale-specific, so you should also consider changing your locale to one that can handle non-ASCII text.
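As a rough sketch of that suggestion (Python 3, keeping the asker's parsing logic), both files can be opened with an explicit encoding so the locale default never comes into play. Using `utf-8` for the output is an assumption; any encoding that covers the data would do.

```python
def init_process(fin, fout):
    # Both files get an explicit encoding, so the locale default is never used.
    with open(fin, encoding='latin-1') as infile, \
         open(fout, 'a', encoding='utf-8') as outfile:
        for line in infile:
            line = line.replace('"', '')
            fields = line.split(',')
            if fields[0] == '0':
                polarity = [1, 0]
            elif fields[0] == '4':
                polarity = [0, 1]
            else:
                polarity = fields[0]  # unknown label: keep it as-is
            # The last field still carries the line's trailing newline
            outfile.write(str(polarity) + ':::' + fields[-1])
```

To see which encoding Python would otherwise pick for a file, `locale.getpreferredencoding(False)` reports the locale default.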

Python Encoding Issue with JSON and CSV

I get an encoding error when I run the script below.
Here is the error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 9: ordinal not in range(128)
Here is my script:
import logging
import urllib
import csv
import json
import io
import codecs

with open('/home/local/apple.csv', 'rb') as csvinput:
    reader = csv.reader(csvinput, delimiter=',')
    firstline = True
    for row in reader:
        if firstline:
            firstline = False
            continue
        address1 = row[0]
        print row[0]
        locality = row[1]
        admin_area = row[2]
        query = ' '.join(str(x) for x in (address1, locality, admin_area))
        normalized = query.replace(" ", "+")
        BaseURL = 'http://localhost:8080/verify?country=JP&freeform='
        URL = BaseURL + normalized
        print URL
        data = urllib.urlopen(URL)
        response = data.getcode()
        print response
        if response == 200:
            file= json.load(data)
            print file
            output_f=open('output.csv','wb')
            csvwriter=csv.writer(output_f)
            count = 0
            for f in file:
                if count == 0:
                    header= f.keys()
                    csvwriter.writerow(header)
                    count += 1
                csvwriter.writerow(f.values())
            output_f.close()
        else:
            print 'error'
Can anyone help me fix this? It's getting really annoying. I need to encode to UTF-8.
It looks like you are using Python 2.x. Instead of Python's standard open, use codecs.open, which lets you pass an encoding and, optionally, what to do when there are errors. This gets a little less confusing in Python 3, where the built-in open can do this itself.
So in the two lines where you open files, do:
with codecs.open('/home/local/apple.csv', 'rb', 'utf-8') as csvinput:
output_f = codecs.open('output.csv', 'wb', 'utf-8')
The optional errors parameter defaults to "strict", which raises an exception if the data can't be mapped to the given encoding. In some contexts you may want to use 'ignore' or 'replace'. Note that the Python 2 csv module does not support Unicode input directly, so values may still need to be encoded to UTF-8 before they reach csv.writer.
See the Python docs for a bit more info.
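To illustrate the three error modes mentioned above, here is a small sketch (Python 3) that decodes the same bytes with each one. The sample bytes and the helper name `try_strict` are made up for the example; the same `errors` values are accepted by `codecs.open` and by Python 3's built-in `open`.

```python
raw = b'caf\xe9 au lait'  # Latin-1 bytes; \xe9 is not valid UTF-8 here

def try_strict(data):
    """Decode with the default errors='strict'; return None on failure."""
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        return None

strict_result = try_strict(raw)                   # strict mode raises, so None
ignored = raw.decode('utf-8', errors='ignore')    # the bad byte is dropped
replaced = raw.decode('utf-8', errors='replace')  # the bad byte becomes U+FFFD
```

Both 'ignore' and 'replace' lose information, so they are best reserved for cases where the damaged characters really do not matter.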

Python/Tweepy UnicodeEncodeError

I am trying to scrape through Twitter bios using the Twitter API with Python.
However I get this error:
newFile.writerow(info)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
I assume this occurs when someone has an emoji in their bio or screen name, however none of the following solutions seem to stop the error:
.encode('unicode_escape')
.encode('UTF8')
.encode('UTF-8')
Here is the current code
for follower in followers.items():
    info=[]
    name =follower.name.encode('unicode_escape')
    screen_name = follower.screen_name.encode('unicode_escape')
    userId = userId + 1
    #add values to array
    values.append(userId)
    values.append(name)
    values.append(screen_name)
    csvFile = open('followers.csv','a')
    newFile =csv.writer(csvFile) #imported csv
    #add list of headers as a new row
    newFile.writerow(info)
    #close file
    csvFile.close()
A major problem is that Python 2's csv module is not Unicode-safe. See the warnings in https://docs.python.org/2/library/csv.html
The workaround, as you've found, is to encode all values to UTF-8 first:
name = follower.name.encode('UTF-8')
screen_name = follower.screen_name.encode('UTF-8')
The problem you're hitting now is Python is still trying to encode your values to ASCII. This is due to the way you've opened the file for writing. Add b for binary writing:
csvFile = open('followers.csv','ab')
In its complete form:
for follower in followers.items():
    info=[]
    name = follower.name.encode('UTF-8')
    screen_name = follower.screen_name.encode('UTF-8')
    userId = userId + 1
    #add values to the row that is written below
    info.append(userId)
    info.append(name)
    info.append(screen_name)
    csvFile = open('followers.csv','ab')
    newFile =csv.writer(csvFile) #imported csv
    #add the follower's values as a new row
    newFile.writerow(info)
    #close file
    csvFile.close()
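For comparison, in Python 3 the csv module works with text directly, so the manual `.encode('UTF-8')` calls are not needed. A minimal sketch, assuming Python 3; the function name `append_followers` and the row layout are hypothetical:

```python
import csv

def append_followers(path, rows):
    """Append rows (iterables of values) to a UTF-8 encoded CSV file."""
    # newline='' is what the csv docs recommend for files passed to csv.writer
    with open(path, 'a', encoding='utf-8', newline='') as csv_file:
        writer = csv.writer(csv_file)
        for row in rows:
            writer.writerow(row)
```

Opening the file once and writing all rows inside the `with` block also avoids reopening it for every follower.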

Problem encoding characters with Python 2.7

My program works fine with regular characters, but it doesn't work with accented characters like é, à, etc.
Here is the program:
def search():
    connection = sqlite3.connect('vocab.sqlite')
    cursor = connection.cursor()
    sql = "SELECT French, English value FROM Ami "
    cursor.execute(sql)
    data = cursor.fetchall()
    data=sorted(data)
    file_open=open('vraiamis.html','w')
    for i in data:
        a='<a href="'+'http://www.google.fr/#hl=fr&gs_nf=1&cp=4&gs_id=o&xhr=t&q='
        a=a+str(i[0]).encode('latin-1')+'">'+str(i[0]).encode('latin-1')+'</a>'+'<br>'
        file_open.write(a)
    file_open.close()
    webbrowser.open('vraiamis.html')
When a value in the database contains special characters like é, à, or ç, it doesn't work and I get the following error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
Thanks in advance for your help
Try
a=a+i[0].encode('latin-1')+'">' + i[0].encode('latin-1')+'</a>'+'<br>'
and so on. Your str() calls try to convert the unicode object to a bytestring with the default ASCII codec before your explicit .encode('latin-1') ever runs.
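The implicit ASCII encode is where the exception comes from. The sketch below reproduces the same failure explicitly; it is written for Python 3, where the encode step has to be spelled out, and `encodes_with` is a hypothetical helper used only for illustration.

```python
def encodes_with(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

word = '\u00e9t\u00e9'  # 'été'
ascii_ok = encodes_with(word, 'ascii')      # ASCII has no code for é
latin1_ok = encodes_with(word, 'latin-1')   # Latin-1 maps é to one byte
utf8_ok = encodes_with(word, 'utf-8')       # UTF-8 can encode any valid text
```

Here only the ASCII attempt fails, which matches the "ordinal not in range(128)" message in the question.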
Alternatively, you can write vraiamis.html in UTF-8, so that your special characters can be encoded:
def search():
    import codecs
    connection = sqlite3.connect('vocab.sqlite')
    cursor = connection.cursor()
    sql = "SELECT French, English value FROM Ami "
    cursor.execute(sql)
    data = cursor.fetchall()
    data=sorted(data)
    file_open= codecs.open('vraiamis.html', 'w', encoding='utf-8')
    for i in data:
        a=u'<a href="' + u'http://www.google.fr/#hl=fr&gs_nf=1&cp=4&gs_id=o&xhr=t&q='
        a=a + i[0] + u'">' + i[0] + u'</a>' + u'<br>'
        file_open.write(a)
    file_open.close()
    webbrowser.open('vraiamis.html')
