UnicodeEncodeError when logging to file in PyCharm


I'm logging some Unicode characters to a file using "logging" in Python 3. The code works in the terminal, but fails with a UnicodeEncodeError in PyCharm.
I load my logging configuration using logging.config.fileConfig. In the configuration, I specify a file handler with encoding = utf-8. Logging to console works fine.
I'm using PyCharm 2019.1.1 (Community Edition). I don't think I've changed any relevant setting, but when I ran the same code in PyCharm on another computer, the error did not occur. Therefore, I suspect the problem is related to a PyCharm setting.
Here is a minimal example:
import logging
from logging.config import fileConfig
# ok
print('1. café')
# ok
logging.error('2. café')
# UnicodeEncodeError
fileConfig('logcfg.ini')
logging.error('3. café')
The content of logcfg.ini (in the same directory) is the following:
[loggers]
keys = root
[handlers]
keys = file_handler
[formatters]
keys = formatter
[logger_root]
level = INFO
handlers = file_handler
[handler_file_handler]
class = logging.handlers.RotatingFileHandler
formatter = formatter
args = ('/tmp/test.log',)
encoding = utf-8
[formatter_formatter]
format = %(levelname)s: %(message)s
I expect to see the first two logging messages in the console, and the third one in the logging file. The first two logging statements worked fine, but the third one failed. Here is the complete console output in PyCharm:
1. café
ERROR:root:2. café
--- Logging error ---
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/logging/__init__.py", line 996, in emit
stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 13: ordinal not in range(128)
Call stack:
File "/Users/klkh/test.py", line 12, in <module>
logging.error('3. café')
Message: '3. café'
Arguments: ()

Your configuration asks for utf-8, but the traceback shows the message being written with the 'ascii' codec, so the encoding setting is evidently not reaching the file handler. Rather than relying on whatever default the environment (here PyCharm) picks, it is best to state the encoding explicitly in your Python script.
Note: I couldn't find a way to do this through fileConfig from logging.config, so I'm using basicConfig.
import logging
print('1. café')
logging.error('2. café')
# encoding= needs Python 3.9+; force=True (3.8+) replaces the console handler
# that the first logging.error() call installed implicitly
logging.basicConfig(filename='/tmp/test.log', encoding='utf-8', force=True)
logging.error('3. café')
Console output:
1. café
ERROR:root:2. café
Content of /tmp/test.log:
ERROR:root:3. café

I found a solution myself. I should pass the encoding like this:
args = ('/tmp/test.log', 'a', 0, 0, 'utf-8')
instead of
args = ('/tmp/test.log',)
encoding = utf-8
However, I'm still interested in knowing why the PyCharm on the other computer uses utf-8 by default. How do I set the default encoding for non-console streams in PyCharm?
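A minimal sketch of why the args-based fix works, assuming Python 3.6 or later. A [handler_...] section read by fileConfig only consults class, level, formatter and args (plus kwargs from Python 3.7 on), so a stray encoding = utf-8 key is ignored and the file is opened with the locale's preferred encoding. That default can differ between a terminal and an IDE started from the desktop, which would be consistent with the behaviour described. The temporary directory and file names below are illustrative only.

```python
import locale
import logging
import logging.config
import os
import tempfile

# Hypothetical locations, used only for this demonstration.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "test.log")
cfg_path = os.path.join(workdir, "logcfg.ini")

# The encoding is passed positionally inside args:
# RotatingFileHandler(filename, mode, maxBytes, backupCount, encoding)
config_text = """
[loggers]
keys = root

[handlers]
keys = file_handler

[formatters]
keys = formatter

[logger_root]
level = INFO
handlers = file_handler

[handler_file_handler]
class = logging.handlers.RotatingFileHandler
formatter = formatter
args = ({path!r}, 'a', 0, 0, 'utf-8')

[formatter_formatter]
format = %(levelname)s: %(message)s
""".format(path=log_path)

with open(cfg_path, "w", encoding="utf-8") as cfg:
    cfg.write(config_text)

logging.config.fileConfig(cfg_path)
logging.error("3. café")

# What an unspecified encoding would have fallen back to in this process.
print("locale default:", locale.getpreferredencoding(False))

# Read the log back as UTF-8 to confirm the handler encoded it that way.
with open(log_path, encoding="utf-8") as f:
    logged = f.read()
print(logged)
```

Comparing the printed locale default in the terminal and inside PyCharm on both machines is a quick way to see whether the environments really differ.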

Related

How to encode all logged messages as utf-8 in Python

I have a little logger function that returns potentially two handlers to log to a RotatingFileHandler and sys.stdout simultaneously.
import os, logging, sys
from logging.handlers import RotatingFileHandler
from config import *
def get_logger(filename, log_level_stdout=logging.WARNING, log_level_file=logging.INFO, echo=True):
    logger = logging.getLogger(__name__)
    if not os.path.exists(PATH + '/Logs'):
        os.mkdir(PATH + '/Logs')
    logger.setLevel(logging.DEBUG)
    if echo:
        prn_handler = logging.StreamHandler(sys.stdout)
        prn_handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s: %(message)s'))
        prn_handler.setLevel(log_level_stdout)
        logger.addHandler(prn_handler)
    file_handler = RotatingFileHandler(PATH + '/Logs/' + filename, maxBytes=1048576, backupCount=3)
    file_handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s: %(message)s'))
    file_handler.setLevel(log_level_file)
    logger.addHandler(file_handler)
    return logger
This works fine in general, but certain strings being logged appear to be encoded in cp1252 and throw a (non-fatal) error when trying to print them to stdout via the logger function. It should be noted that the very same characters can be printed just fine in the error message. Logging them to a file also causes no issues. It's only the console - sys.stdout - that throws this error.
--- Logging error ---
Traceback (most recent call last):
File "C:\Program Files\Python38\lib\logging\__init__.py", line 1084, in emit
stream.write(msg + self.terminator)
File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ecd' in position 65: character maps to <undefined>
Call stack:
File "script.py", line 147, in <module>
logger.info(f"F-String with a name in it: '{name}'.")
Message: "F-String with a name in it: 'Heimstọð'."
Arguments: ()
A workaround has been to encode every single message as utf8 in the code that calls the logger function, like this:
logger.info((f"F-String with a name in it: '{name}'.").encode('utf8'))
However I feel like this is neither elegant nor efficient.
It should also be noted that the logging of the file works just fine and I already tried setting the PYTHONIOENCODING to utf-8 in the system variables of Windows without any noticeable effect.
Update:
Turns out I'm stupid. Just because an error message is printed in the console doesn't mean the printing to the console is the cause of the error. I was looking into the answers to the other question that has been suggested to me here and after a while realized that nothing I did to the "if echo" part of the function had any impact on the result. The last check was commenting out the whole block and I still got the error. That's when I realized that the issue was in fact caused by not enforcing UTF8 when writing to the file. Adding the simple kwarg encoding='utf-8' to the RotatingFileHandler as suggested by @michael-ruth fixed the issue for me.
P.S. I'm not sure how to handle this case because, while that answer fixed my problem, it wasn't really what I was asking for or what the question suggested because I originally misunderstood the root cause. I'll still check it as solution and upvote both answers. I'll also edit the question as to not mislead future readers into believing it would answer that question when it doesn't really.

Set the encoding while instantiating the handler instead of encoding the message explicitly.
file_handler = RotatingFileHandler(
    PATH + '/Logs/' + filename,
    maxBytes=1048576,
    backupCount=3,
    encoding='utf-8'
)
help(RotatingFileHandler) is your best friend.
Help on class RotatingFileHandler in module logging.handlers:
class RotatingFileHandler(BaseRotatingHandler)
| RotatingFileHandler(filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=False)
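A small self-contained sketch of that fix, for anyone who wants to verify it in isolation. It assumes a temporary directory in place of PATH + '/Logs', and the logger name is made up for the example; the handler arguments are the ones from the answer.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()  # stand-in for PATH + '/Logs'
log_file = os.path.join(log_dir, "app.log")

logger = logging.getLogger("utf8_file_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo from echoing through the root logger

# encoding='utf-8' makes the file independent of the system code page.
file_handler = RotatingFileHandler(
    log_file, maxBytes=1048576, backupCount=3, encoding="utf-8"
)
file_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(file_handler)

name = "Heimstọð"
logger.info(f"F-String with a name in it: '{name}'.")

# Detach and close the handler so the file is released before reading it.
logger.removeHandler(file_handler)
file_handler.close()

with open(log_file, "rb") as f:
    raw_bytes = f.read()
print(raw_bytes.decode("utf-8"))
```

Reading the file back as bytes and decoding it explicitly confirms that the handler wrote UTF-8 rather than the platform default.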

Can you check the encoding of sys.stdout (sys.stdout.encoding)?
If it's not 'utf-8', this answer may help to reconfigure the encoding.
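One way to act on that check, sketched under the assumption of Python 3.7 or later, where text streams gained a reconfigure() method. To keep the example deterministic it uses an in-memory stream opened as cp1252 in place of a real console; in a real program the same call would be made on sys.stdout. The force_utf8 helper name is invented for this illustration.

```python
import io
import logging

def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (Python 3.7+)."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream

# Stand-in for a Windows console whose encoding is cp1252.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp1252")
print("before:", console.encoding)

force_utf8(console)  # in a real program: force_utf8(sys.stdout)
print("after:", console.encoding)

logger = logging.getLogger("console_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(console))

# StreamHandler flushes after each record, so the bytes are available at once.
logger.info("F-String with a name in it: 'Heimstọð'.")
written = raw.getvalue().decode("utf-8")
print(written)
```

The hasattr guard matters because sys.stdout can be replaced by objects without reconfigure(), for example when output is captured by a test runner or an IDE.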

Rise UnicodeEncodeError in logging.StreamHandler

I migrated my Python code from a Win10 host to WS2012R2. Surprisingly, it stopped working correctly and now shows this warning message: "UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>"
I've tried to execute a command:
set PYTHONLEGACYWINDOWSSTDIO=yes
My code:
import logging
import sys
def get_console_handler():
    console_handler = logging.StreamHandler(sys.stdout)
    return console_handler
def get_logger():
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    logger.addHandler(get_console_handler())
    return logger
my_logger = get_logger()
my_logger.debug("Это отладочное сообщение".encode("cp1252"))
What should I do to get rid of this warning?
Update
Colleagues, I am sorry for misleading you! I was obviously tired after long hours of bug tracking )
The problem isn't connected with the "*.encode()" call as such; it is connected with Python's default encoding for console I/O (I suppose so)! The original code makes some requests to a DB in the cp1251 charset, but the problem appears when Python tries to convert the result to cp1252.
Here is another example of how to summon the error.
Create a plain text file, i.e. test.txt with text "Это отладочное сообщение" and save it cp1252.
Run python console and enter:
f = open("test.txt")
f.read()
Output:
f = open("test.txt")
f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\project\venv\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 29: character maps to <undefined>
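The byte offset in that traceback is informative. The sketch below rests on one assumption: that the editor actually saved the file as UTF-8 rather than in a single-byte code page. Under that assumption the letter "с" of "сообщение" is stored as the bytes D1 81, and 0x81 is one of the few byte values that cp1252 leaves undefined, which reproduces the reported position.

```python
text = "Это отладочное сообщение"
data = text.encode("utf-8")  # assumption: what the editor really wrote to disk

try:
    # Equivalent to open("test.txt").read() when the locale default is cp1252.
    data.decode("cp1252")
except UnicodeDecodeError as exc:
    failing_offset = exc.start
    failing_byte = data[exc.start]
    print(f"cannot decode byte {failing_byte:#x} in position {failing_offset}")

# Naming the encoding makes the read independent of the system locale.
decoded = data.decode("utf-8")
print(decoded)
```

For a file on disk the corresponding fix would be open("test.txt", encoding="utf-8"), or encoding="cp1251" if the file really is in that code page.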

Use encode("utf-8"). Here is a list of python encodings: https://docs.python.org/2.4/lib/standard-encodings.html
my_logger.debug("Это отладочное сообщение".encode("utf-8"))
then use .decode("utf-8") to see the printable value of your string

The problem lies in how logging.StreamHandler performs console output: unlike FileHandler, it offers no parameter for choosing the encoding, so the stream's default is used.
If the default system encoding doesn't match the one you need, you can run into this issue.
In my case, I wanted to output cp1251 lines, while the system default encoding was:
import locale
locale.getpreferredencoding()
'cp1252'
The problem was solved by changing the system locale (see https://stackoverflow.com/a/11234956/9851754). Choose "Change system locale..." for non-Unicode programs. No code changes are needed.
import locale
locale.getpreferredencoding()
'cp1251'

I have tested your code with Python 3.6.8 and it worked for me (I didn't change anything).
Python 3.6.8:
>>> python3 -V
Python 3.6.8
>>> python3 test.py
Это отладочное сообщение
But when I tested it with Python 2.7.15+, I got an error similar to yours.
Python 2.7.15+ with your implementation:
>>> python2 -V
Python 2.7.15+
>>> python2 test.py
File "test.py", line 17
SyntaxError: Non-ASCII character '\xd0' in file test.py on line 17, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
After I put the following line at the top of the file, it worked for me.
Beginning of code:
# -*- coding: utf-8 -*-
import logging
import sys
...
Output with Python 2.7.15+ and with modified code:
>>> python2 -V
Python 2.7.15+
>>> python2 test.py
Это отладочное сообщение

How to refer to a standard library in a logging configuration file?

I need to use a constant defined in the standard library socket in a logging configuration file. Problem, when reading the config file with logging.config.fileConfig() it ends with:
NameError: name 'socket' is not defined
My question is very close to this one, the difference is that if, as a workaround, I import the missing library (e.g. socket) from the main script reading this logging configuration file, it doesn't solve the problem (is this because I use python3?).
Complete logging configuration file:
[loggers]
keys=root,mainLogger
[handlers]
keys=mainHandler,nullHandler
[formatters]
keys=defaultFormatter,rawMessageFormatter
[logger_root]
level=INFO
handlers=nullHandler
[logger_mainLogger]
level=DEBUG
handlers=mainHandler
qualname=mainLogger
[handler_nullHandler]
class=NullHandler
args=(50,)
[handler_mainHandler]
class=logging.handlers.SysLogHandler
level=INFO
formatter=defaultFormatter
args=('/dev/log','myapp',socket.SOCK_STREAM)
[formatter_defaultFormatter]
format=%(asctime)s.%(msecs)d %(filename)s: %(funcName)s: %(message)s
datefmt=%Y/%m/%d %H:%M:%S
[formatter_rawMessageFormatter]
format=%(message)s
datefmt=
As another workaround I have tried the solution suggested here: How to use logging with python's fileConfig and configure the logfile filename, but this doesn't work either, since socket.SOCK_STREAM is not a string (and I can't find any type that could work in the doc: https://docs.python.org/3.4/library/string.html#formatspec).
I have also tried to replace socket.SOCK_STREAM by 1 (since socket.SOCK_STREAM == 1 is True), but that doesn't work either (socket.SOCK_STREAM not being an int...).
I would prefer to avoid converting my logging configuration file into a dictionary (but will do that if there's no other solution).

As documented in this section of the docs, the values are evaluated in the logging package's namespace. Hence, you can do something like this:
import logging
import logging.config  # needed so that logging.config.fileConfig is available
import socket
# The next line allows 'socket' in the logging package's namespace to pick up
# the stdlib socket module
logging.socket = socket
...
# when the config file is processed, it should work as expected
logging.config.fileConfig(...)
# remove the mapping from the logging package, as not needed any more
# (optional)
del logging.socket

First solution (should work but doesn't)
Well here there is a partial answer: https://docs.python.org/3.4/library/logging.config.html#access-to-external-objects
So I tried this:
args=('/dev/log','mathmaker','ext://socket.SOCK_STREAM')
But it does not work:
Traceback (most recent call last):
File "/usr/lib/python3.4/logging/__init__.py", line 1878, in shutdown
h.close()
File "/usr/lib/python3.4/logging/handlers.py", line 857, in close
self.socket.close()
AttributeError: 'SysLogHandler' object has no attribute 'socket'
It's like python expects the 'external' object to be an attribute of the class declared in the handler section (e.g. here: class=logging.handlers.SysLogHandler).
Second solution (but requires converting the config file to YAML):
Since the mechanism that seems intended to solve this problem does not work, I tried a configuration file written in YAML, and now it works. It requires adding a dependency (python-yaml or python3-yaml for Ubuntu users...) and loading the configuration file as a dictionary:
with open(settings.logging_conf_file) as f:
    # safe_load avoids constructing arbitrary Python objects from the file
    logging.config.dictConfig(yaml.safe_load(f))
and this way, it works.
Here is the same configuration file turned into working yaml (and notice that: 1. the import of socket is not required in the main script, looks like python will by itself 'magically' deal with the import; and 2. apart from the fact that yaml is easier to read than the old plain text config file, it also allows to define the keywords in a more readable way):
version: 1
formatters:
  rawMessageFormatter:
    format: '%(message)s'
    datefmt: ''
  defaultFormatter:
    format: '%(asctime)s.%(msecs)d %(filename)s: %(funcName)s: %(message)s'
    datefmt: '%Y/%m/%d %H:%M:%S'
handlers:
  nullHandler:
    class: logging.NullHandler
  mainHandler:
    class: logging.handlers.SysLogHandler
    level: INFO
    formatter: defaultFormatter
    address: '/dev/log'
    facility: 'myapp'
    socktype: ext://socket.SOCK_DGRAM
loggers:
  root:
    level: INFO
    handlers: [nullHandler]
  mainLogger:
    level: DEBUG
    handlers: [mainHandler]
    qualname: mainLogger
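For readers who would rather not add a YAML dependency, the same ext:// mechanism can be sketched with a plain Python dictionary passed to dictConfig. This is an illustration rather than a translation of the file above: it assumes a UDP syslog target at 127.0.0.1:514 so that it does not depend on /dev/log existing, it uses the standard "user" facility, and only the mainLogger name is carried over from the question.

```python
import logging
import logging.config
import socket

LOGGING = {
    "version": 1,
    "formatters": {
        "defaultFormatter": {
            "format": "%(asctime)s %(filename)s: %(funcName)s: %(message)s",
        },
    },
    "handlers": {
        "mainHandler": {
            "class": "logging.handlers.SysLogHandler",
            "level": "INFO",
            "formatter": "defaultFormatter",
            # Assumed target for the demo; a UDP socket needs no listener.
            "address": ("127.0.0.1", 514),
            "facility": "user",
            # dictConfig resolves ext:// strings by importing the named object.
            "socktype": "ext://socket.SOCK_DGRAM",
        },
    },
    "loggers": {
        "mainLogger": {"level": "DEBUG", "handlers": ["mainHandler"]},
    },
}

logging.config.dictConfig(LOGGING)

# Inspect the handler that dictConfig built to see what socktype it received.
handler = logging.getLogger("mainLogger").handlers[0]
resolved = handler.socktype
print(resolved == socket.SOCK_DGRAM)
```

Because the ext:// string is resolved by dictConfig itself, the script does not need to inject socket into the logging namespace the way the fileConfig workaround does.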

Multiline log records in syslog

So I've configured my Python application to log to syslog with Python's SysLogHandler, and everything works fine. Except for multi-line handling. Not that I need to emit multiline log records so badly (I do a little), but I need to be able to read Python's exceptions. I'm using Ubuntu with rsyslog 4.2.0. This is what I'm getting:
Mar 28 20:11:59 telemachos root: ERROR 'EXCEPTION'#012Traceback (most recent call last):#012 File "./test.py", line 22, in <module>#012 foo()#012 File "./test.py", line 13, in foo#012 bar()#012 File "./test.py", line 16, in bar#012 bla()#012 File "./test.py", line 19, in bla#012 raise Exception("EXCEPTION!")#012Exception: EXCEPTION!
Test code in case you need it:
import logging
from logging.handlers import SysLogHandler
logger = logging.getLogger()
logger.setLevel(logging.INFO)
syslog = SysLogHandler(address='/dev/log', facility='local0')
formatter = logging.Formatter('%(name)s: %(levelname)s %(message)r')
syslog.setFormatter(formatter)
logger.addHandler(syslog)
def foo():
    bar()
def bar():
    bla()
def bla():
    raise Exception("EXCEPTION!")
try:
    foo()
except:
    logger.exception("EXCEPTION")

Alternatively, if you want to keep your syslog intact on one line for parsing, you can just replace the characters when viewing the log.
tail -f /var/log/syslog | sed 's/#012/\n\t/g'

OK, figured it out finally...
rsyslog by default escapes all control characters (ASCII < 32), and this includes newlines (as well as tabs and others).
$EscapeControlCharactersOnReceive:
This directive instructs rsyslogd to replace control characters during reception of the
message. The intent is to provide a way to stop non-printable
messages from entering the syslog system as whole. If this option is
turned on, all control-characters are converted to a 3-digit octal
number and be prefixed with the $ControlCharacterEscapePrefix
character (being ‘\’ by default). For example, if the BEL character
(ctrl-g) is included in the message, it would be converted to “\007”.
You can simply add this to your rsyslog config to turn it off:
$EscapeControlCharactersOnReceive off
or, with the "new" advanced syntax:
global(parser.escapeControlCharactersOnReceive="off")

Another option would be to subclass the SysLogHandler and override emit() - you could then call the superclass emit() for each line in the text you're sent. Something like:
from logging import LogRecord
from logging.handlers import SysLogHandler
class MultilineSysLogHandler(SysLogHandler):
    def emit(self, record):
        if '\n' in record.msg:
            # LogRecord unpacks a single dict argument, so wrap it again
            record_args = [record.args] if isinstance(record.args, dict) else record.args
            for single_line in record.msg.split('\n'):
                single_line_record = LogRecord(
                    name=record.name,
                    level=record.levelno,
                    pathname=record.pathname,
                    lineno=record.lineno,  # required positional argument of LogRecord
                    msg=single_line,
                    args=record_args,
                    exc_info=record.exc_info,
                    func=record.funcName
                )
                super(MultilineSysLogHandler, self).emit(single_line_record)
        else:
            super(MultilineSysLogHandler, self).emit(record)

UTF-8 In Python logging, how?

I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:
import logging
def logging_test():
    handler = logging.FileHandler("/home/ted/logfile.txt", "w",
                                  encoding = "UTF-8")
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)
    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")
    print "printed unicode object: %s" % unicode_string
    # Explode
    root_logger.info(unicode_string)
if __name__ == "__main__":
    logging_test()
This explodes with UnicodeDecodeError on the logging.info() call.
At a lower level, Python's logging package is using the codecs package to open the log file, passing in the "UTF-8" argument as the encoding. That's all well and good, but it's trying to write byte strings to the file instead of unicode objects, which explodes. Essentially, Python is doing this:
file_handler.write(unicode_string.encode("UTF-8"))
When it should be doing this:
file_handler.write(unicode_string)
Is this a bug in Python, or am I taking crazy pills? FWIW, this is a stock Python 2.6 installation.

Having code like:
raise Exception(u'щ')
Caused:
File "/usr/lib/python2.7/logging/__init__.py", line 467, in format
s = self._fmt % record.__dict__
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
This happens because the format string is a byte string, while some of the format string arguments are unicode strings with non-ASCII characters:
>>> "%(message)s" % {'message': Exception(u'\u0449')}
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\u0449' in position 0: ordinal not in range(128)
Making the format string unicode fixes the issue:
>>> u"%(message)s" % {'message': Exception(u'\u0449')}
u'\u0449'
So, in your logging configuration, make all format strings unicode:
'formatters': {
    'simple': {
        'format': u'%(asctime)-s %(levelname)s [%(name)s]: %(message)s',
        'datefmt': '%Y-%m-%d %H:%M:%S',
    },
    ...
And patch the default logging formatter to use unicode format string:
logging._defaultFormatter = logging.Formatter(u"%(message)s")

Check that you have the latest Python 2.6 - some Unicode bugs were found and fixed since 2.6 came out. For example, on my Ubuntu Jaunty system, I ran your script copied and pasted, removing only the '/home/ted/' prefix from the log file name. Result (copied and pasted from a terminal window):
vinay@eta-jaunty:~/projects/scratch$ python --version
Python 2.6.2
vinay@eta-jaunty:~/projects/scratch$ python utest.py
printed unicode object: ô
vinay@eta-jaunty:~/projects/scratch$ cat logfile.txt
ô
vinay@eta-jaunty:~/projects/scratch$
On a Windows box:
C:\temp>python --version
Python 2.6.2
C:\temp>python utest.py
printed unicode object: ô
And the contents of the file:
This might also explain why Lennart Regebro couldn't reproduce it either.

I'm a little late, but I just came across a post that enabled me to set up logging in utf-8 very easily.
Here is the code from that post:
root_logger= logging.getLogger()
root_logger.setLevel(logging.DEBUG) # or whatever
handler = logging.FileHandler('test.log', 'w', 'utf-8') # or whatever
formatter = logging.Formatter('%(name)s %(message)s') # or whatever
handler.setFormatter(formatter) # Pass handler as a parameter, not assign
root_logger.addHandler(handler)

I had a similar problem running Django in Python3: My logger died upon encountering some Umlauts (äöüß) but was otherwise fine. I looked through a lot of results and found none working. I tried
import locale
if locale.getpreferredencoding().upper() != 'UTF-8':
    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
which I got from the comment above.
It did not work. Looking at the current locale gave me some crazy ANSI thing, which turned out to mean basically just "ASCII". That sent me into totally the wrong direction.
Changing the logging format-strings to Unicode would not help.
Setting a magic encoding comment at the beginning of the script would not help.
Setting the charset on the sender's message (the text came from an HTTP request) did not help.
What DID work was setting the encoding on the file-handler to UTF-8 in settings.py. Because I had nothing set, the default would become None. Which apparently ends up being ASCII (or as I'd like to think about: ASS-KEY)
'handlers': {
    'file': {
        'level': 'DEBUG',
        'class': 'logging.handlers.TimedRotatingFileHandler',
        'encoding': 'UTF-8', # <-- That was missing.
        ....
    },
},
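A self-contained sketch of the same idea outside Django, calling logging.config.dictConfig directly on a dictionary shaped like the LOGGING setting. The temporary file path and the logger name are placeholders. As far as I know, in Python 3 an encoding of None means the locale's preferred encoding is used, which can turn out to be ASCII (for example under a C/POSIX locale on older Python 3 versions) but is not ASCII by definition.

```python
import logging
import logging.config
import os
import tempfile

# Placeholder path for the demo; a real project would use its own log location.
log_file = os.path.join(tempfile.mkdtemp(), "django_demo.log")

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "file": {
            "level": "DEBUG",
            "class": "logging.handlers.TimedRotatingFileHandler",
            "filename": log_file,
            "encoding": "UTF-8",  # without this, the locale default is used
        },
    },
    "root": {"level": "DEBUG", "handlers": ["file"]},
}

logging.config.dictConfig(LOGGING)

# The record propagates from this (hypothetical) logger to the root handler.
logging.getLogger("umlaut_demo").debug("Umlauts: äöüß")

with open(log_file, encoding="utf-8") as f:
    contents = f.read()
print(contents)
```

Extra keys in a handler entry, such as filename and encoding here, are passed to the handler's constructor as keyword arguments, which is why the setting takes effect in the dictionary form.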

Try this:
import logging
def logging_test():
    log = open("./logfile.txt", "w")
    handler = logging.StreamHandler(log)
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)
    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")
    print "printed unicode object: %s" % unicode_string
    # Explode
    root_logger.info(unicode_string.encode("utf8", "replace"))
if __name__ == "__main__":
    logging_test()
For what it's worth I was expecting to have to use codecs.open to open the file with utf-8 encoding but either that's the default or something else is going on here, since it works as is like this.

If I understood your problem correctly, the same issue should arise on your system when you do just:
str(u'ô')
I guess automatic encoding to the locale encoding on Unix will not work until you enable the locale-aware if branch (the one that uses locale) in the setencoding function of your site module. This file usually resides in /usr/lib/python2.x; it is worth inspecting anyway. AFAIK, locale-aware setencoding is disabled by default (that is true for my Python 2.6 installation).
The choices are:
Let the system figure out the right way to encode Unicode strings to bytes or do it in your code (some configuration in site-specific site.py is needed)
Encode Unicode strings in your code and output just bytes
See also The Illusive setdefaultencoding by Ian Bicking and related links.
