How to encode all logged messages as utf-8 in Python

I have a little logger function that returns potentially two handlers to log to a RotatingFileHandler and sys.stdout simultaneously.
import os, logging, sys
from logging.handlers import RotatingFileHandler
from config import *
def get_logger(filename, log_level_stdout=logging.WARNING, log_level_file=logging.INFO, echo=True):
    logger = logging.getLogger(__name__)
    if not os.path.exists(PATH + '/Logs'):
        os.mkdir(PATH + '/Logs')
    logger.setLevel(logging.DEBUG)
    if echo:
        prn_handler = logging.StreamHandler(sys.stdout)
        prn_handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s: %(message)s'))
        prn_handler.setLevel(log_level_stdout)
        logger.addHandler(prn_handler)
    file_handler = RotatingFileHandler(PATH + '/Logs/' + filename, maxBytes=1048576, backupCount=3)
    file_handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s: %(message)s'))
    file_handler.setLevel(log_level_file)
    logger.addHandler(file_handler)
    return logger
This works fine in general, but certain strings being logged appear to be encoded in cp1252 and throw a (non-fatal) error when I try to print them to stdout via the logger function. It should be noted that the very same characters can be printed just fine in the error message. Logging them to a file also causes no issues. It's only the console - sys.stdout - that throws this error.
--- Logging error ---
Traceback (most recent call last):
File "C:\Program Files\Python38\lib\logging\__init__.py", line 1084, in emit
stream.write(msg + self.terminator)
File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ecd' in position 65: character maps to <undefined>
Call stack:
File "script.py", line 147, in <module>
logger.info(f"F-String with a name in it: '{name}'.")
Message: "F-String with a name in it: 'Heimstọð'."
Arguments: ()
A fix for this has been to encode every single message as utf8 in the code that calls the logger function, like this:
logger.info((f"F-String with a name in it: '{name}'.").encode('utf8'))
However I feel like this is neither elegant nor efficient.
It should also be noted that logging to the file works just fine, and I already tried setting PYTHONIOENCODING to utf-8 in the Windows system variables without any noticeable effect.
Update:
Turns out I'm stupid. Just because an error message is printed in the console doesn't mean the printing to the console is the cause of the error. I was looking into the answers to the other question that has been suggested to me here and after a while realized that nothing I did to the "if echo" part of the function had any impact on the result. The last check was commenting out the whole block and I still got the error. That's when I realized that the issue was in fact caused by not enforcing UTF8 when writing to the file. Adding the simple kwarg encoding='utf-8' to the RotatingFileHandler as suggested by @michael-ruth fixed the issue for me.
P.S. I'm not sure how to handle this case because, while that answer fixed my problem, it wasn't really what I was asking for or what the question suggested because I originally misunderstood the root cause. I'll still check it as solution and upvote both answers. I'll also edit the question as to not mislead future readers into believing it would answer that question when it doesn't really.

Set the encoding while instantiating the handler instead of encoding the message explicitly.
file_handler = RotatingFileHandler(
    PATH + '/Logs/' + filename,
    maxBytes=1048576,
    backupCount=3,
    encoding='utf-8'
)
help(RotatingFileHandler) is your best friend.
Help on class RotatingFileHandler in module logging.handlers:
class RotatingFileHandler(BaseRotatingHandler)
| RotatingFileHandler(filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=False)
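As a quick check of what the encoding argument does, here is a minimal sketch that writes the name from the question through a RotatingFileHandler opened with encoding='utf-8' and reads it back. The temporary directory, the file name demo.log and the logger name utf8_demo are assumptions made only for this illustration; they are not from the original code.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

# Hypothetical location and logger name, chosen only for this demo.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "demo.log")

logger = logging.getLogger("utf8_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output away from any root handlers

file_handler = RotatingFileHandler(
    log_path, maxBytes=1048576, backupCount=3, encoding="utf-8"
)
file_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(file_handler)

name = "Heimstọð"
logger.info(f"F-String with a name in it: '{name}'.")

# Close the handler so the file is flushed and released before reading.
logger.removeHandler(file_handler)
file_handler.close()

with open(log_path, encoding="utf-8") as fh:
    written = fh.read()
```

With encoding left at None, the file is opened with the platform's preferred encoding instead, which on the Windows machine in the question was cp1252.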

Can you check the encoding of sys.stdout (sys.stdout.encoding)?
If it's not 'utf-8', this answer may help to reconfigure the encoding.
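One possible way to do that reconfiguration, sketched under the assumption of Python 3.7 or later (where text streams have a reconfigure() method). The helper name ensure_utf8 is made up for this example, and the in-memory stream only stands in for a cp1252 console.

```python
import io

def ensure_utf8(stream):
    """Switch a text stream to UTF-8 if it currently uses another encoding.

    Relies on TextIOWrapper.reconfigure() (Python 3.7+). Streams that do
    not offer that method are returned unchanged.
    """
    current = (getattr(stream, "encoding", None) or "").lower().replace("-", "")
    if current != "utf8" and hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream

# Demo on an in-memory stand-in for a console that uses cp1252.
fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
ensure_utf8(fake_console)
fake_console.write("Heimstọð")
fake_console.flush()
```

In a real program the call would be ensure_utf8(sys.stdout), made before logging.StreamHandler(sys.stdout) is created.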

Related

Python logging fails with log-file on network drive (windows 10)

I want to log using python's logging module to a file on a network drive. My problem is that the logging fails at some random point giving me this error:
--- Logging error ---
Traceback (most recent call last):
File "c:\programme\anaconda3\lib\logging\__init__.py", line 1085, in emit
self.flush()
File "c:\programme\anaconda3\lib\logging\__init__.py", line 1065, in flush
self.stream.flush()
OSError: [Errno 22] Invalid argument
Call stack:
File "log_test.py", line 67, in <module>
logger_root.error('FLUSH!!!'+str(i))
Message: 'Minute:120'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "c:\programme\anaconda3\lib\logging\__init__.py", line 1085, in emit
self.flush()
File "c:\programme\anaconda3\lib\logging\__init__.py", line 1065, in flush
self.stream.flush()
OSError: [Errno 22] Invalid argument
Call stack:
File "log_test.py", line 67, in <module>
logger_root.error('FLUSH!!!'+str(i))
Message: 'FLUSH!!!120'
Arguments: ()
I am on a virtual machine with Windows 10 (Version 1909) and I am using Python 3.8.3 and logging 0.5.1.2. The script runs in a virtual environment on a network drive, where the log files are stored.
I am writing a script that automates some data quality control tasks and I am not 100% sure where (network drive, local drive, etc.) the script will end up, so it should be able to log in every possible situation. The error does not appear at the same position/line in the script but randomly. Sometimes the program (~120 minutes in total) finishes without the error appearing at all.
What I tried so far:
I believe that the logfile is closed at some point so that no new logging messages can be written to it. I wrote a simple script that basically only logs, to check whether the problem is related to my original script or to the logging process itself. Since the "only-logs-script" also fails randomly when running on the network drive, but not when it is running on my local drive, I assume that it is related to the connection to the network drive. I thought about keeping the whole log in memory and then writing it to the file, but the MemoryHandler will also open the file at the beginning of the script and therefore fail at some point.
Here is my code for the "only-logs-script" (log_test.py):
import logging
import logging.handlers
import os
import datetime
import time
##################################################################
# setting up a logger to create a log file with information about this program
logfile_dir = 'logfiles_test'
CHECK_FOLDER = os.path.isdir(logfile_dir)
# if folder doesn't exist, create it
if not CHECK_FOLDER:
    os.makedirs(logfile_dir)
    print("created folder : ", logfile_dir)
log_path = '.\\'+logfile_dir+'\\'
Current_Date = datetime.datetime.today().strftime ('%Y-%m-%d_')
log_filename = log_path+Current_Date+'logtest.log'
print(log_filename)
# Create a root logger
logger_root = logging.getLogger()
# Create handlers
f1_handler = logging.FileHandler(log_filename, mode='w+')
f2_handler = logging.StreamHandler()
f1_handler.setLevel(logging.INFO)
f2_handler.setLevel(logging.INFO)
# Create formatters and add it to handlers
f1_format = logging.Formatter('%(asctime)s | %(name)s | %(levelname)s | %(message)s \n')
f2_format = logging.Formatter('%(asctime)s | %(name)s | %(levelname)s | %(message)s \n')
f1_handler.setFormatter(f1_format)
f2_handler.setFormatter(f2_format)
# create a memory handler
memoryhandler = logging.handlers.MemoryHandler(
    capacity=1024*100,
    flushLevel=logging.ERROR,
    target=f1_handler,
    flushOnClose=True
)
# Add handlers to the logger
logger_root.addHandler(memoryhandler)
logger_root.addHandler(f2_handler)
logger_root.setLevel(logging.INFO)
logger_root.info('Log-File initiated.')
fname = log_path+'test.log'
open(fname, mode='w+')
for i in range(60*4):
    print(i)
    logger_root.warning('Minute:'+str(i))
    print('Write access:', os.access(fname, os.W_OK))
    if(i%10==0):
        logger_root.error('FLUSH!!!'+str(i))
    time.sleep(60)
Is there something horribly wrong with my logging process or is it because of the network drive? And does anyone of you have any ideas on how to tackle this issue? Would storing the whole information in the memory and writing it to a file in the end solve the problem? How would I best achieve this?
Another idea would be to log on the local drive and then automatically copy the file to the network drive, when the script is done. Any help is strongly appreciated as I have tried to identify and solve this problem for several days now.
Thank you!
Since this is not really going anywhere atm I will post what I did to "solve" my problem. It is not a satisfactory solution as it fails when the code fails but it is better than not logging at all.
The solution is inspired by the answer to this question: log messages to an array/list with logging
So here is what I did:
import io
#####################################
# first create an in-memory file-like object to save the logs to
log_messages = io.StringIO()
# create a stream handler that saves the log messages to that object
s1_handler = logging.StreamHandler(log_messages)
s1_handler.setLevel(logging.INFO)
# create a file handler just in case
f1_handler = logging.FileHandler(log_filename, mode='w+')
f1_handler.setLevel(logging.INFO)
# set the format for the log messages
log_format = '%(asctime)s | %(name)s | %(levelname)s | %(message)s \n'
f1_format = logging.Formatter(log_format)
s1_handler.setFormatter(f1_format)
f1_handler.setFormatter(f1_format)
# add the handler to the logger
logger_root.addHandler(s1_handler)
logger_root.addHandler(f1_handler)
#####################################
# here would be the main code ...
#####################################
# at the end of my code I added this to write the in-memory-message to the file
contents = log_messages.getvalue()
# opening a file in 'w'
file = open(log_filename, 'w')
# write log message to file
file.write("{}\n".format(contents))
# closing the file and the in-memory object
file.close()
log_messages.close()
Obviously this fails when the code fails but the code tries to catch most errors, so I hope it will work. I got rid of the Memory handler but kept a file handler so that in case of a real failure at least some of the logs are recorded until the file handler fails. It is far from ideal but it works for me atm. If you have some other suggestions/improvements I would be happy to hear them!
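The other idea from the question, logging to the local drive and copying the file to the network drive at the end, could look roughly like the sketch below. The helper name run_with_local_log and the demo destination are assumptions for illustration; the destination here is a temporary directory standing in for the network share. Because the copy sits in a finally block, it is attempted even when the main code raises.

```python
import logging
import os
import shutil
import tempfile

def run_with_local_log(final_log_path, work):
    """Log to a local temp file while work() runs, then copy it to final_log_path."""
    fd, local_path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    handler = logging.FileHandler(local_path, mode="w", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")
    )
    root = logging.getLogger()
    root.addHandler(handler)
    try:
        work()
    finally:
        # Release the local file first, then copy it to its final location.
        root.removeHandler(handler)
        handler.close()
        shutil.copyfile(local_path, final_log_path)
        os.remove(local_path)

# Demo: a temp directory stands in for the network drive.
def demo_work():
    logging.getLogger("demo").warning("Minute:0")

destination = os.path.join(tempfile.mkdtemp(), "logtest.log")
run_with_local_log(destination, demo_work)

with open(destination, encoding="utf-8") as fh:
    copied = fh.read()
```

If the copy itself fails (for example because the share is unreachable), the exception propagates before os.remove() runs, so the local file is still there to be copied later.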

UnicodeEncodeError when logging to file in PyCharm

I'm logging some Unicode characters to a file using "logging" in Python 3. The code works in the terminal, but fails with a UnicodeEncodeError in PyCharm.
I load my logging configuration using logging.config.fileConfig. In the configuration, I specify a file handler with encoding = utf-8. Logging to console works fine.
I'm using PyCharm 2019.1.1 (Community Edition). I don't think I've changed any relevant setting, but when I ran the same code in the PyCharm on another computer, the error was not reproduced. Therefore, I suspect the problem is related to a PyCharm setting.
Here is a minimal example:
import logging
from logging.config import fileConfig
# ok
print('1. café')
# ok
logging.error('2. café')
# UnicodeEncodeError
fileConfig('logcfg.ini')
logging.error('3. café')
The content of logcfg.ini (in the same directory) is the following:
[loggers]
keys = root
[handlers]
keys = file_handler
[formatters]
keys = formatter
[logger_root]
level = INFO
handlers = file_handler
[handler_file_handler]
class = logging.handlers.RotatingFileHandler
formatter = formatter
args = ('/tmp/test.log',)
encoding = utf-8
[formatter_formatter]
format = %(levelname)s: %(message)s
I expect to see the first two logging messages in the console, and the third one in the logging file. The first two logging statements worked fine, but the third one failed. Here is the complete console output in PyCharm:
1. café
ERROR:root:2. café
--- Logging error ---
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/logging/__init__.py", line 996, in emit
stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 13: ordinal not in range(128)
Call stack:
File "/Users/klkh/test.py", line 12, in <module>
logging.error('3. café')
Message: '3. café'
Arguments: ()
You seem to use utf-8 as the encoding in your config file. In your case Python (when run from PyCharm) raises a UnicodeEncodeError instead of guessing the encoding, because with a wrong guess the whole file would be decoded differently from the original. The best thing to do is to specify the encoding explicitly in your Python script.
Note: I couldn't find documentation for fileConfig in logging.config, so I'm using basicConfig.
import logging
from logging.config import fileConfig
print('1. café')
logging.error('2. café')
logging.basicConfig(filename='your config', encoding='utf-8') # in your case the encoding is utf-8; basicConfig accepts encoding= from Python 3.9 on
logging.error('3. café')
output:
1. café
ERROR:root:2. café
ERROR:root:3. café
I found a solution myself. I should pass the encoding like this:
args = ('/tmp/test.log', 'a', 0, 0, 'utf-8')
instead of
args = ('/tmp/test.log',)
encoding = utf-8
However, I'm still interested in knowing why the PyCharm on the other computer uses utf-8 by default. How do I set the default encoding for non-console streams in PyCharm?
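To make the working variant easy to try out, here is a small self-contained sketch that writes the same configuration to a temporary directory, with the encoding passed positionally in args, and then logs the third message. The temporary paths are assumptions for the demo. As far as I can tell, fileConfig builds the handler from args and does not read a separate encoding key in a handler section, which would explain why the original setting had no effect.

```python
import logging
import logging.config
import os
import tempfile

work_dir = tempfile.mkdtemp()
log_path = os.path.join(work_dir, "test.log")
cfg_path = os.path.join(work_dir, "logcfg.ini")

# Same layout as logcfg.ini above, with the encoding passed positionally:
# RotatingFileHandler(filename, mode, maxBytes, backupCount, encoding)
config_text = f"""
[loggers]
keys = root

[handlers]
keys = file_handler

[formatters]
keys = formatter

[logger_root]
level = INFO
handlers = file_handler

[handler_file_handler]
class = logging.handlers.RotatingFileHandler
formatter = formatter
args = ({log_path!r}, 'a', 0, 0, 'utf-8')

[formatter_formatter]
format = %(levelname)s: %(message)s
"""
with open(cfg_path, "w", encoding="utf-8") as fh:
    fh.write(config_text)

logging.config.fileConfig(cfg_path)
logging.error("3. café")

# Detach and close the handler so the file is released before reading.
root = logging.getLogger()
for handler in root.handlers[:]:
    root.removeHandler(handler)
    handler.close()

with open(log_path, encoding="utf-8") as fh:
    logged = fh.read()
```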

Logging basicConfig not creating log file when I run in PyCharm?

When I run the code below in a terminal, it creates a log file:
import logging
logging.basicConfig(filename='ramexample.log',level=logging.DEBUG)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
but when I run the same code (with different filename='ram.log') in PyCharm it's not creating any log file. Why?
import logging
logging.basicConfig(filename='ram.log',level=logging.DEBUG)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
What do I have to do to create a log file with PyCharm?
I encountered the same issue and found that none of the answers previously provided here worked. Maybe Ramnath Reddy solved this long ago, but I could not find the correct answer anywhere online.
Luckily, I found a solution in a colleague's code: add the following lines before logging.basicConfig().
# Remove all handlers associated with the root logger object.
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)
Try it and see if it helps anyone who has the same issue.
Python 3.8: A new option, force, has been made available to automatically remove the root handlers while calling basicConfig().
For example:
logging.basicConfig(filename='ramexample.log', level=logging.DEBUG, force=True)
See logging.basicConfig parameters:
force: If this keyword argument is specified as true, any existing handlers attached to the root logger are removed and closed, before carrying out the configuration as specified by the other arguments.
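The following sketch shows both behaviours side by side: a basicConfig() call that is ignored because the root logger already has a handler, and the same call with force=True. It assumes Python 3.8 or later; the temporary path is only there to make the demo self-contained.

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "ramexample.log")

# An earlier logging call attaches a default handler to the root logger
# if it does not have one yet.
logging.warning("early call before basicConfig")

# Without force, this call does nothing because a root handler exists.
logging.basicConfig(filename=log_path, level=logging.DEBUG)
created_without_force = os.path.exists(log_path)

# With force=True, existing root handlers are removed and closed first.
logging.basicConfig(filename=log_path, level=logging.DEBUG, force=True)
logging.debug("This message should go to the log file")
created_with_force = os.path.exists(log_path)

with open(log_path, encoding="utf-8") as fh:
    content = fh.read()
```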
I can't remember where I got this, otherwise I would have provided a link. But I had the same problem some time ago in Jupyter notebooks and this fixed it:
import logging
logger = logging.getLogger()
fhandler = logging.FileHandler(filename='mylog.log', mode='a')
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fhandler.setFormatter(formatter)
logger.addHandler(fhandler)
logger.setLevel(logging.DEBUG)
The reason this error happens is this:
The call to basicConfig() should come before any calls to debug(), info() etc.
Otherwise basicConfig() cannot create and write a new file: those functions configure the root logger themselves if it has no handler yet, and a later basicConfig() call then does nothing.
Here I called logging.info() right before logging.basicConfig().
Don't:
import logging
logging.info("root") # call to info too early
logging.basicConfig(filename="rec/test.log", level=logging.DEBUG) # no file created
Maximas is right. File path is relative to execution environment. However instead of writing down the absolute path you could try the dynamic path resolution approach:
filename = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'ram.log')
logging.basicConfig(filename=filename, level=logging.DEBUG)
This assumes that ram.log should reside in the same directory as the file containing the above code (that is what __file__ is used for).
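The same idea can be written with pathlib. This is only a sketch; the helper name log_path_next_to and the example path scripts/job.py are made up for illustration.

```python
from pathlib import Path

def log_path_next_to(script_file, log_name):
    """Return an absolute path for log_name in the directory of script_file."""
    return Path(script_file).resolve().parent / log_name

# In a real script, pass __file__ as script_file, for example:
#   logging.basicConfig(filename=log_path_next_to(__file__, "ram.log"),
#                       level=logging.DEBUG)
example = log_path_next_to("scripts/job.py", "ram.log")
```

The result no longer depends on the working directory that PyCharm or the terminal happens to use when the script is started.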
Using force solves it,
like logging.basicConfig(filename="test.log", force=True)
This does create a log file when run from the terminal built into PyCharm. You need to check where that terminal is located (try dir on Windows or pwd on Linux/Mac). Instead of just putting in ram.log, use the full file path of where you would like the file to appear.
E.G.
logging.basicConfig(filename='/Users/Donkey/Test/ram.log', level=logging.DEBUG)
I used to get this error, but I solved by adding force=True in basicConfig:
logging.basicConfig(level=logging.INFO,filename='C:\\Users\\sukal\\PycharmProjects\\Test\\Logs\\Automation.log',format=Log_Format,force=True)
import logging
class LogGen:
    @staticmethod
    def loggen():
        logger = logging.getLogger()
        fhandler = logging.FileHandler(filename='.\\logs\\automation.log', mode='a')
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        fhandler.setFormatter(formatter)
        logger.addHandler(fhandler)
        logger.setLevel(logging.INFO)
        return logger

Multiline log records in syslog

So I've configured my Python application to log to syslog with Python's SysLogHandler, and everything works fine. Except for multi-line handling. Not that I need to emit multiline log records so badly (I do a little), but I need to be able to read Python's exceptions. I'm using Ubuntu with rsyslog 4.2.0. This is what I'm getting:
Mar 28 20:11:59 telemachos root: ERROR 'EXCEPTION'#012Traceback (most recent call last):#012 File "./test.py", line 22, in <module>#012 foo()#012 File "./test.py", line 13, in foo#012 bar()#012 File "./test.py", line 16, in bar#012 bla()#012 File "./test.py", line 19, in bla#012 raise Exception("EXCEPTION!")#012Exception: EXCEPTION!
Test code in case you need it:
import logging
from logging.handlers import SysLogHandler
logger = logging.getLogger()
logger.setLevel(logging.INFO)
syslog = SysLogHandler(address='/dev/log', facility='local0')
formatter = logging.Formatter('%(name)s: %(levelname)s %(message)r')
syslog.setFormatter(formatter)
logger.addHandler(syslog)
def foo():
    bar()
def bar():
    bla()
def bla():
    raise Exception("EXCEPTION!")
try:
    foo()
except:
    logger.exception("EXCEPTION")
Alternatively, if you want to keep your syslog intact on one line for parsing, you can just replace the characters when viewing the log.
tail -f /var/log/syslog | sed 's/#012/\n\t/g'
OK, figured it out finally...
rsyslog by default escapes all weird characters (ASCII < 32), and this includes newlines (as well as tabs and others).
$EscapeControlCharactersOnReceive:
This directive instructs rsyslogd to replace control characters during reception of the
message. The intent is to provide a way to stop non-printable
messages from entering the syslog system as whole. If this option is
turned on, all control-characters are converted to a 3-digit octal
number and be prefixed with the $ControlCharacterEscapePrefix
character (being ‘\’ by default). For example, if the BEL character
(ctrl-g) is included in the message, it would be converted to “\007”.
You can simply add this to your rsyslog config to turn it off:
$EscapeControlCharactersOnReceive off
or, with the "new" advanced syntax:
global(parser.escapeControlCharactersOnReceive="off")
Another option would be to subclass the SysLogHandler and override emit() - you could then call the superclass emit() for each line in the text you're sent. Something like:
from logging import LogRecord
from logging.handlers import SysLogHandler
class MultilineSysLogHandler(SysLogHandler):
    def emit(self, record):
        if '\n' in record.msg:
            record_args = [record.args] if isinstance(record.args, dict) else record.args
            for single_line in record.msg.split('\n'):
                single_line_record = LogRecord(
                    name=record.name,
                    level=record.levelno,
                    pathname=record.pathname,
                    lineno=record.lineno,  # LogRecord requires lineno
                    msg=single_line,
                    args=record_args,
                    exc_info=record.exc_info,
                    func=record.funcName
                )
                super(MultilineSysLogHandler, self).emit(single_line_record)
        else:
            super(MultilineSysLogHandler, self).emit(record)
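A variation on the same idea, as a sketch rather than a drop-in replacement: format the record first, so that the traceback is part of the text, and then forward one record per line to a target handler. LineSplittingHandler and ListHandler are hypothetical names. ListHandler only stands in for the SysLogHandler so the example runs without /dev/log.

```python
import logging

class LineSplittingHandler(logging.Handler):
    """Format each record (traceback included), then forward one record per line."""

    def __init__(self, target):
        super().__init__()
        self.target = target  # e.g. a SysLogHandler in the real application

    def emit(self, record):
        try:
            text = self.format(record)
            for line in text.splitlines():
                fields = dict(record.__dict__)
                # Replace the message and clear the traceback fields so the
                # target does not append the traceback to every line again.
                fields.update(msg=line, args=None, exc_info=None,
                              exc_text=None, stack_info=None)
                self.target.handle(logging.makeLogRecord(fields))
        except Exception:
            self.handleError(record)


class ListHandler(logging.Handler):
    """Stand-in target for the demo; collects formatted lines in a list."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


collector = ListHandler()
collector.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))

logger = logging.getLogger("multiline_demo")
logger.propagate = False
logger.addHandler(LineSplittingHandler(collector))

try:
    raise Exception("EXCEPTION!")
except Exception:
    logger.exception("EXCEPTION")
```

Each entry in collector.lines carries the logger name and level prefix, so every syslog line stays parseable on its own.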

UTF-8 In Python logging, how?

I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:
import logging
def logging_test():
    handler = logging.FileHandler("/home/ted/logfile.txt", "w",
                                  encoding = "UTF-8")
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)
    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")
    print "printed unicode object: %s" % unicode_string
    # Explode
    root_logger.info(unicode_string)
if __name__ == "__main__":
    logging_test()
This explodes with UnicodeDecodeError on the logging.info() call.
At a lower level, Python's logging package is using the codecs package to open the log file, passing in the "UTF-8" argument as the encoding. That's all well and good, but it's trying to write byte strings to the file instead of unicode objects, which explodes. Essentially, Python is doing this:
file_handler.write(unicode_string.encode("UTF-8"))
When it should be doing this:
file_handler.write(unicode_string)
Is this a bug in Python, or am I taking crazy pills? FWIW, this is a stock Python 2.6 installation.
Having code like:
raise Exception(u'щ')
Caused:
File "/usr/lib/python2.7/logging/__init__.py", line 467, in format
s = self._fmt % record.__dict__
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
This happens because the format string is a byte string, while some of the format string arguments are unicode strings with non-ASCII characters:
>>> "%(message)s" % {'message': Exception(u'\u0449')}
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\u0449' in position 0: ordinal not in range(128)
Making the format string unicode fixes the issue:
>>> u"%(message)s" % {'message': Exception(u'\u0449')}
u'\u0449'
So, in your logging configuration make all format string unicode:
'formatters': {
    'simple': {
        'format': u'%(asctime)-s %(levelname)s [%(name)s]: %(message)s',
        'datefmt': '%Y-%m-%d %H:%M:%S',
    },
...
And patch the default logging formatter to use unicode format string:
logging._defaultFormatter = logging.Formatter(u"%(message)s")
Check that you have the latest Python 2.6 - some Unicode bugs were found and fixed since 2.6 came out. For example, on my Ubuntu Jaunty system, I ran your script copied and pasted, removing only the '/home/ted/' prefix from the log file name. Result (copied and pasted from a terminal window):
vinay@eta-jaunty:~/projects/scratch$ python --version
Python 2.6.2
vinay@eta-jaunty:~/projects/scratch$ python utest.py
printed unicode object: ô
vinay@eta-jaunty:~/projects/scratch$ cat logfile.txt
ô
vinay@eta-jaunty:~/projects/scratch$
On a Windows box:
C:\temp>python --version
Python 2.6.2
C:\temp>python utest.py
printed unicode object: ô
And the contents of the file:
This might also explain why Lennart Regebro couldn't reproduce it either.
I'm a little late, but I just came across this post, which enabled me to set up logging in utf-8 very easily.
Here is the code from it:
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG) # or whatever
handler = logging.FileHandler('test.log', 'w', 'utf-8') # or whatever
formatter = logging.Formatter('%(name)s %(message)s') # or whatever
handler.setFormatter(formatter) # Pass handler as a parameter, not assign
root_logger.addHandler(handler)
I had a similar problem running Django in Python3: My logger died upon encountering some Umlauts (äöüß) but was otherwise fine. I looked through a lot of results and found none working. I tried
import locale
if locale.getpreferredencoding().upper() != 'UTF-8':
    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
which I got from the comment above.
It did not work. Looking at the current locale gave me some crazy ANSI thing, which turned out to mean basically just "ASCII". That sent me into totally the wrong direction.
Changing the logging format-strings to Unicode would not help.
Setting a magic encoding comment at the beginning of the script would not help.
Setting the charset on the sender's message (the text came from an HTTP request) did not help.
What DID work was setting the encoding on the file handler to UTF-8 in settings.py. Because I had nothing set, the encoding defaulted to None, which apparently ends up being ASCII (or as I'd like to think about it: ASS-KEY).
'handlers': {
    'file': {
        'level': 'DEBUG',
        'class': 'logging.handlers.TimedRotatingFileHandler',
        'encoding': 'UTF-8', # <-- That was missing.
        ....
    },
},
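For anyone who wants to try this outside Django, here is a minimal sketch using logging.config.dictConfig directly (Django hands its LOGGING setting to dictConfig by default). The file path and the logger name umlaut_demo are assumptions for the demo. Extra keys in a handler entry, such as filename and encoding, are passed on to the handler's constructor.

```python
import logging
import logging.config
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "django_demo.log")

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "file": {
            "level": "DEBUG",
            "class": "logging.handlers.TimedRotatingFileHandler",
            "filename": log_path,  # hypothetical path for this demo
            "encoding": "UTF-8",   # passed through to the handler constructor
        },
    },
    "loggers": {
        "umlaut_demo": {"handlers": ["file"], "level": "DEBUG", "propagate": False},
    },
}
logging.config.dictConfig(LOGGING)

demo_logger = logging.getLogger("umlaut_demo")
demo_logger.info("äöüß")

# Inspect the encoding the handler actually opened the file with.
file_handler = demo_logger.handlers[0]
stream_encoding = file_handler.stream.encoding

# Detach and close the handler so the file is released before reading.
demo_logger.removeHandler(file_handler)
file_handler.close()

with open(log_path, encoding="utf-8") as fh:
    logged = fh.read()
```

When encoding is left at None, the file is opened with the locale's preferred encoding (locale.getpreferredencoding(False)), so the result depends on how the server process was started.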
Try this:
import logging
def logging_test():
    log = open("./logfile.txt", "w")
    handler = logging.StreamHandler(log)
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)
    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")
    print "printed unicode object: %s" % unicode_string
    # Explode
    root_logger.info(unicode_string.encode("utf8", "replace"))
if __name__ == "__main__":
    logging_test()
For what it's worth I was expecting to have to use codecs.open to open the file with utf-8 encoding but either that's the default or something else is going on here, since it works as is like this.
If I understood your problem correctly, the same issue should arise on your system when you do just:
str(u'ô')
I guess automatic encoding to the locale encoding on Unix will not work until you have enabled the locale-aware if branch in the setencoding function of your site module. This file usually resides in /usr/lib/python2.x and is worth inspecting anyway. AFAIK, locale-aware setencoding is disabled by default (that's true for my Python 2.6 installation).
The choices are:
Let the system figure out the right way to encode Unicode strings to bytes or do it in your code (some configuration in site-specific site.py is needed)
Encode Unicode strings in your code and output just bytes
See also The Illusive setdefaultencoding by Ian Bicking and related links.
