Unicode error reading Python log file (logging)

I am creating a log file using Python's logging library. When I try to read it with Python and print it on an HTML page (using Flask), I get:
<textarea cols="80" rows="20">{% for line in log %}{{line}}{% endfor %}
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 36: ordinal not in range(128)
I guess this has to do with the log file being written in some other encoding, but which one?
This is the line setting up the log file, if it helps:
fileLogger = logging.handlers.TimedRotatingFileHandler(filename = 'log.log', when = 'midnight', backupCount = 30)
How do I solve this problem?

The logging package's file handlers encode any Unicode object you send them to UTF-8, unless you specify a different encoding.
Use io.open() to read the file back as UTF-8; you'll get unicode objects, which are ideal for Jinja2:
import io
log = io.open('log.log', encoding='utf8')
You could also specify a different encoding for the TimedRotatingFileHandler, but UTF-8 is an excellent default. Use the encoding keyword argument if you want to pick a different encoding:
fileLogger = logging.handlers.TimedRotatingFileHandler(
    filename='log.log', when='midnight', backupCount=30,
    encoding='Latin1')
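A minimal round trip, with an illustrative file name ('demo.log'): write a non-ASCII record through the handler, then read it back with io.open. Passing encoding='utf-8' explicitly pins the on-disk encoding on any platform:

```python
# -*- coding: utf-8 -*-
import io
import logging
import logging.handlers

# Illustrative handler setup; encoding='utf-8' makes the on-disk
# encoding unambiguous regardless of the platform default.
handler = logging.handlers.TimedRotatingFileHandler(
    filename='demo.log', when='midnight', backupCount=30, encoding='utf-8')
logger = logging.getLogger('demo')
logger.addHandler(handler)
logger.warning(u'caf\xe9')   # non-ASCII message, written as UTF-8 bytes
handler.close()

# Read it back as Unicode text, ready to hand to a Jinja2 template.
with io.open('demo.log', encoding='utf8') as f:
    log = f.readlines()
```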

I'm not familiar with Flask, but if you can grab the contents of the log as a string, you can encode it to UTF-8 like so:
string = string.encode('utf-8') # string is the log's contents, now in utf-8

Related

Docx (xml) file parsing error on Python 'charmap' codec can't decode byte 0x98 in position 7618: character maps to <undefined>

I'm trying to parse a docx file. I unzipped it first, then tried to read the Document.xml file with with open(..), and it raises the error "'charmap' codec can't decode byte 0x98 in position 7618: character maps to <undefined>". The XML declares "UTF-8" encoding.
I wrote the following code:
with open(self.tempDir + self.CONFIG['main_xml']) as xml_file:
    self.dom_xml = etree.parse(xml_file)
I tried to force the encoding to UTF-8, but then etree.fromstring(..) can't read it correctly.
The symbol at position 7618 (from the error) is:
Please help me. How do I read the XML file correctly?
Thanks
This works without errors on your file:
import zipfile
import xml.etree.ElementTree as ET
zipfile.ZipFile('file.docx').extractall()
root = ET.parse('word/document.xml').getroot()
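An alternative sketch: read word/document.xml straight out of the archive without extracting, letting ElementTree decode the bytes itself from the XML declaration. Here a tiny in-memory zip stands in for the real .docx (file and element names are illustrative):

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Build a tiny stand-in .docx (a zip containing word/document.xml)
# purely for demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('word/document.xml',
               '<?xml version="1.0" encoding="UTF-8"?><doc>caf\xe9</doc>'.encode('utf-8'))
buf.seek(0)

# ZipFile.open yields a binary stream; ET.parse reads the raw bytes and
# honours the XML declaration, so no text-mode open (and no codec guess)
# is involved.
with zipfile.ZipFile(buf) as docx:
    with docx.open('word/document.xml') as xml_file:
        root = ET.parse(xml_file).getroot()
```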

import utf-8 csv in python3.x - special german characters

I have been trying to import a CSV file containing special characters (ä ö ü).
In Python 2.x all special characters were handled automatically, with no need to specify an encoding attribute in the open command.
I can't figure out how to get this to work in Python 3.x.
import csv
f = open('sample_1.csv', 'rU', encoding='utf-8')
csv_f = csv.reader(f, delimiter=';')
bla = list(csv_f)
print(type(bla))
print(bla[0])
print(bla[1])
print(bla[2])
print()
print(bla[3])
Console output (Sublime Build python3)
<class 'list'>
['\ufeffCat1', 'SEO Meta Text']
['Damen', 'Damen----']
['Damen', 'Damen-Accessoires-Beauty-Geschenk-Sets-']
Traceback (most recent call last):
File "/Users/xxx/importer_tree.py", line 13, in <module>
print(bla[3])
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 37: ordinal not in range(128)
input sample_1.csv (excel file saved as utf-8 csv)
Cat1;SEO Meta Text
Damen;Damen----
Damen;Damen-Accessoires-Beauty-Geschenk-Sets-
Damen;Damen-Accessoires-Beauty-Körperpflege-
Männer;Männer-Sport-Sportschuhe-Trekkingsandalen-
Männer;Männer-Sport-Sportschuhe-Wanderschuhe-
Männer;Männer-Sport-Sportschuhe--
Is this only an output-format issue, or am I also importing the data wrongly?
How can I print out "Männer"?
thank you for your help/guidance!
Thank you to juanpa-arrivillaga and to this answer: https://stackoverflow.com/a/44088439/9059135
The issue is due to my Sublime settings:
sys.stdout.encoding returns US-ASCII;
in the terminal, the same command returns UTF-8.
Setting up the build system in Sublime properly solves the issue.
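If you'd rather fix it at runtime than in the build system, Python 3.7+ text streams support reconfigure(). A sketch using an in-memory stream to stand in for the console (the same call works on sys.stdout):

```python
import io

# The UnicodeEncodeError comes from the stream's encoding, not the data.
# Simulate an ASCII console stream, then switch it to UTF-8 with
# reconfigure() (Python 3.7+). On a real console you would call
# sys.stdout.reconfigure(encoding='utf-8') instead.
stream = io.TextIOWrapper(io.BytesIO(), encoding='ascii')
stream.reconfigure(encoding='utf-8')
stream.write('M\xe4nner')      # would raise UnicodeEncodeError under ASCII
stream.flush()
data = stream.buffer.getvalue()  # b'M\xc3\xa4nner' -- UTF-8 bytes
```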

Error in outputting CSV with Django?

I am trying to output my model as a CSV file. It works fine with a small amount of data in the model, but is very slow with large data. Secondly, there are some errors when outputting the model as CSV. The logic I am using is:
def some_view(request):
    # Create the HttpResponse object with the appropriate CSV header.
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="news.csv"'
    writer = csv.writer(response)
    news_obj = News.objects.using('cms').all()
    for item in news_obj:
        #writer.writerow([item.newsText])
        writer.writerow([item.userId.name])
    return response
and the error I am facing is:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-6: ordinal not in range(128)
and further it says:
The string that could not be encoded/decoded was: عبدالله الحذ
Replace line
writer.writerow([item.userId.name])
with:
writer.writerow([item.userId.name.encode('utf-8')])
Before saving a unicode string to a file you must encode it in some encoding. Most systems use UTF-8 by default, so it's a safe choice.
From the error, the writer is trying to encode the CSV content with the ASCII codec. You can instead drop the characters that ASCII cannot represent:
>>> u'aあä'.encode('ascii', 'ignore')
'a'
So you can avoid the error by ignoring non-ASCII characters:
writer.writerow([item.userId.name.encode('ascii', 'ignore')])
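Note that these manual .encode() calls are only needed on Python 2. On Python 3, csv.writer writes str objects directly and the response takes care of the byte encoding. A minimal sketch, with io.StringIO standing in for the Django HttpResponse:

```python
import csv
import io

# On Python 3, csv.writer accepts str values as-is; no .encode() needed.
# io.StringIO stands in for the Django response object here.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['عبدالله الحذ', 'some news text'])  # non-ASCII works fine
data = out.getvalue()
```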

Program (twitter bot) works on Windows machine, but not on Linux machine [duplicate]

I was trying to read a file in Python 2.7, and it was read perfectly. The problem is that when I execute the same program in Python 3.4, this error appears:
'utf-8' codec can't decode byte 0xf2 in position 424: invalid continuation byte
Also, when I run the program on Windows (with Python 3.4), the error doesn't appear. The first line of the document is:
Codi;Codi_lloc_anonim;Nom
and the code of my program is:
def lectdict(filename, colkey, colvalue):
    f = open(filename, 'r')
    D = dict()
    for line in f:
        if line == '\n':
            continue
        D[line.split(';')[colkey]] = D.get(line.split(';')[colkey], []) + [line.split(';')[colvalue]]
    f.close()
    return D

Traduccio = lectdict('Noms_departaments_centres.txt', 1, 2)
In Python2,
f = open(filename,'r')
for line in f:
reads lines from the file as bytes.
In Python3, the same code reads lines from the file as strings. Python3
strings are what Python2 calls unicode objects: bytes decoded
according to some encoding. The default encoding in Python3 is utf-8.
The error message
'utf-8' codec can't decode byte 0xf2 in position 424: invalid continuation byte'
shows Python3 is trying to decode the bytes as utf-8. Since there is an error, the file apparently does not contain utf-8 encoded bytes.
To fix the problem you need to specify the correct encoding of the file:
with open(filename, encoding=enc) as f:
    for line in f:
        ...
If you do not know the correct encoding, you could run this program to simply
try all the encodings known to Python. If you are lucky there will be an
encoding which turns the bytes into recognizable characters. Sometimes more
than one encoding may appear to work, in which case you'll need to check and
compare the results carefully.
# Python3
import pkgutil
import os
import encodings

def all_encodings():
    modnames = set(
        [modname for importer, modname, ispkg in pkgutil.walk_packages(
            path=[os.path.dirname(encodings.__file__)], prefix='')])
    aliases = set(encodings.aliases.aliases.values())
    return modnames.union(aliases)

filename = '/tmp/test'
encodings = all_encodings()
for enc in encodings:
    try:
        with open(filename, encoding=enc) as f:
            # print the encoding and the first 500 characters
            print(enc, f.read(500))
    except Exception:
        pass
Ok, I did the same as #unutbu told me. The result was a lot of encodings; one of them was cp1250. For that reason I changed:
f = open(filename,'r')
to
f = open(filename,'r', encoding='cp1250')
as #triplee suggested, and now I can read my files.
In my case I can't change the encoding because my file really is UTF-8 encoded, but some rows are corrupted and cause the same error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 7092: invalid continuation byte
My solution is to open the file in binary mode:
open(filename, 'rb')
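If only a few bytes are corrupt, another option is to stay in text mode and tell the decoder what to do with them via the standard errors argument (the same errors='replace' also works with open()):

```python
# Decoding UTF-8 with a corrupt byte: errors='replace' substitutes
# U+FFFD for undecodable bytes instead of raising UnicodeDecodeError.
corrupt = b'ok \xd0 ok'   # 0xd0 starts a 2-byte sequence that never completes
text = corrupt.decode('utf-8', errors='replace')
print(text)               # 'ok \ufffd ok'

# Equivalent when reading a file:
#   open(filename, encoding='utf-8', errors='replace')
```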

python requests issues with non ascii data

I'm having a problem using non-ASCII characters in a file I'm trying to send as an attachment using requests.
The exception is raised in the httplib module, in the _send_output function.
here is my code:
response = requests.post(
    url="https://api.mailgun.net/v2/%s/messages" % utils.config.mailDomain,
    auth=("api", utils.config.mailApiKey),
    data={
        "from": me,
        "to": recepients,
        "subject": subject,
        "html" if html else "text": message,
    },
    files=[('attachment', open(f)) for f in attachments] if attachments and len(attachments) else []
)
The problem is with the files parameter, which contains non-ASCII data (Hebrew).
The exception is:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 673: ordinal not in range(128)
The open() function has an encoding parameter, used like f = open('t.txt', encoding='utf-8'), which accepts a variety of values as outlined in the docs. Find out what encoding scheme your data uses (probably UTF-8) and see if opening with that encoding works.
Don't use the encoding parameter when opening the files, because you want to open them as binary data. The calls to open should look like open(f, 'rb'). The documentation for requests only shows examples like this purposefully, and even documents this behaviour.
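A minimal sketch of that fix, with placeholder file names: write a file containing Hebrew text, then open it in binary mode the way requests expects for files=:

```python
import io

# 'hebrew.txt' is a placeholder attachment for the demo.
attachments = ['hebrew.txt']

# Create a file with non-ASCII (Hebrew) content.
with io.open('hebrew.txt', 'w', encoding='utf-8') as f:
    f.write(u'\u05e9\u05dc\u05d5\u05dd')   # "shalom"

# open(f, 'rb') -- binary mode -- hands requests raw bytes, so the body
# is assembled without any implicit ASCII decode.
files = [('attachment', open(f, 'rb')) for f in attachments]
payload = files[0][1].read()               # raw UTF-8 bytes
files[0][1].close()
```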
