Python File Handler with Rotating Content of File

I have written a simple logging function that appends anything I send to it to a file:
def log(message):
    with open("log.txt", 'a+') as f:
        f.write(message + "\n")
However, I would like to limit how big this file gets. When it gets to the maximum size, I would like for it to remove the first lines and append at the bottom.
Is this possible with a file handler or do I need to code it myself? I am also fine using a rotating file handler, but all the examples I have seen have the logging machinery write messages (e.g. exceptions) to the file automatically once a level is set, and I need to control exactly what is written to the file.
Many thanks in advance!

This is an example of using Python's built-in RotatingFileHandler:
import logging
from logging.handlers import RotatingFileHandler

# change to a file you want to log to
logFile = 'log_r.log'

my_handler = RotatingFileHandler(logFile, mode='a', maxBytes=5*1024*1024,
                                 backupCount=2, encoding=None, delay=0)
my_handler.setLevel(logging.INFO)

app_log = logging.getLogger('root')
app_log.setLevel(logging.INFO)
app_log.addHandler(my_handler)

def bad():
    raise Exception("Something bad")

if __name__ == "__main__":
    app_log.info("something")
    try:
        app_log.info("trying to run bad")
        bad()
    except Exception as e:
        app_log.info("That was bad...")
    finally:
        app_log.info("Ran bad...")
The behaviour is slightly different from what you proposed: instead of deleting lines from the start of the file, it moves the full file to a backup filename and starts a fresh one.
Note that the only things that show in the log file when you run this are the pieces of text we're logging explicitly - i.e. no system junk you don't want.
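If you do want the exact behaviour from the question (a single file that drops its oldest lines once it hits a size limit), nothing built in does that, but a small hand-rolled version is easy enough. A minimal sketch, with the limit and filename as assumptions and sizes counted in characters rather than bytes:

import os

MAX_BYTES = 5 * 1024 * 1024  # assumed size limit
LOG_FILE = "log.txt"         # assumed filename

def log(message):
    with open(LOG_FILE, "a+") as f:
        f.write(message + "\n")
    # If the file grew past the limit, drop lines from the top until it
    # fits again. This rewrites the whole file on every trim, so it is
    # only reasonable for small logs.
    if os.path.getsize(LOG_FILE) > MAX_BYTES:
        with open(LOG_FILE, "r") as f:
            lines = f.readlines()
        total = sum(len(l) for l in lines)
        while lines and total > MAX_BYTES:
            total -= len(lines.pop(0))
        with open(LOG_FILE, "w") as f:
            f.writelines(lines)

For anything beyond small log files, the RotatingFileHandler approach above is the better fit.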

Related

Python logging fails with log-file on network drive (Windows 10)

I want to log using python's logging module to a file on a network drive. My problem is that the logging fails at some random point giving me this error:
--- Logging error ---
Traceback (most recent call last):
  File "c:\programme\anaconda3\lib\logging\__init__.py", line 1085, in emit
    self.flush()
  File "c:\programme\anaconda3\lib\logging\__init__.py", line 1065, in flush
    self.stream.flush()
OSError: [Errno 22] Invalid argument
Call stack:
  File "log_test.py", line 67, in <module>
    logger_root.error('FLUSH!!!'+str(i))
Message: 'Minute:120'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "c:\programme\anaconda3\lib\logging\__init__.py", line 1085, in emit
    self.flush()
  File "c:\programme\anaconda3\lib\logging\__init__.py", line 1065, in flush
    self.stream.flush()
OSError: [Errno 22] Invalid argument
Call stack:
  File "log_test.py", line 67, in <module>
    logger_root.error('FLUSH!!!'+str(i))
Message: 'FLUSH!!!120'
Arguments: ()
I am on a virtual machine with Windows 10 (Version 1909) and I am using Python 3.8.3 and logging 0.5.1.2. The script runs in a virtual environment on a network drive, where the log files are stored.
I am writing a script that automates some data quality control tasks, and I am not 100% sure where (network drive, local drive, etc.) the script will end up, so it should be able to log in every possible situation. The error does not appear at the same position/line in the script but at random points. Sometimes the program (~120 minutes in total) finishes without the error appearing at all.
What I tried so far:
I believe that the logfile is closed at some point so that no new logging messages can be written to it. I wrote a simple script that basically only does logging, to check whether the problem is related to my original script or to the logging process itself. Since the "only-logs-script" also fails randomly when running on the network drive, but not when it runs on my local drive, I assume that it is related to the connection to the network drive. I thought about keeping the whole log in memory and writing it to the file at the end, but the MemoryHandler will also open the file at the beginning of the script and therefore fail at some point.
Here is my code for the "only-logs-script" (log_test.py):
import logging
import logging.handlers
import os
import datetime
import time

##################################################################
# setting up a logger to create a log file with information about this program
logfile_dir = 'logfiles_test'

CHECK_FOLDER = os.path.isdir(logfile_dir)
# if folder doesn't exist, create it
if not CHECK_FOLDER:
    os.makedirs(logfile_dir)
    print("created folder : ", logfile_dir)

log_path = '.\\'+logfile_dir+'\\'
Current_Date = datetime.datetime.today().strftime('%Y-%m-%d_')
log_filename = log_path+Current_Date+'logtest.log'
print(log_filename)

# Create a root logger
logger_root = logging.getLogger()

# Create handlers
f1_handler = logging.FileHandler(log_filename, mode='w+')
f2_handler = logging.StreamHandler()
f1_handler.setLevel(logging.INFO)
f2_handler.setLevel(logging.INFO)

# Create formatters and add them to the handlers
f1_format = logging.Formatter('%(asctime)s | %(name)s | %(levelname)s | %(message)s \n')
f2_format = logging.Formatter('%(asctime)s | %(name)s | %(levelname)s | %(message)s \n')
f1_handler.setFormatter(f1_format)
f2_handler.setFormatter(f2_format)

# create a memory handler
memoryhandler = logging.handlers.MemoryHandler(
    capacity=1024*100,
    flushLevel=logging.ERROR,
    target=f1_handler,
    flushOnClose=True
)

# Add handlers to the logger
logger_root.addHandler(memoryhandler)
logger_root.addHandler(f2_handler)
logger_root.setLevel(logging.INFO)

logger_root.info('Log-File initiated.')

fname = log_path+'test.log'
open(fname, mode='w+')
for i in range(60*4):
    print(i)
    logger_root.warning('Minute:'+str(i))
    print('Write access:', os.access(fname, os.W_OK))
    if i % 10 == 0:
        logger_root.error('FLUSH!!!'+str(i))
    time.sleep(60)
Is there something horribly wrong with my logging setup, or is it because of the network drive? Does anyone have ideas on how to tackle this issue? Would storing all the information in memory and writing it to a file at the end solve the problem? How would I best achieve this?
Another idea would be to log on the local drive and then automatically copy the file to the network drive when the script is done. Any help is strongly appreciated, as I have tried to identify and solve this problem for several days now.
Thank you!
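The copy-to-the-network-drive idea mentioned at the end of the question could look roughly like the sketch below; both paths are made-up placeholders, and logging.shutdown() makes sure everything is flushed before the copy:

import logging
import os
import shutil
import tempfile

# Log to a file on the local disk first (placeholder paths).
local_log = os.path.join(tempfile.gettempdir(), "quality_check.log")
network_log = r"\\server\share\logfiles_test\quality_check.log"

logging.basicConfig(
    filename=local_log,
    level=logging.INFO,
    format="%(asctime)s | %(name)s | %(levelname)s | %(message)s",
)

try:
    logging.info("main work would run here")
finally:
    # Copy the finished log to the network drive once, at the very end,
    # even if the main work raised an exception.
    logging.shutdown()
    shutil.copy2(local_log, network_log)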
Since this is not really going anywhere at the moment, I will post what I did to "solve" my problem. It is not a satisfactory solution, as it fails when the code itself fails, but it is better than not logging at all.
The solution is inspired by the answer to this question: log messages to an array/list with logging
So here is what I did:
import io

#####################################
# first create an in-memory file-like object to save the logs to
log_messages = io.StringIO()

# create a stream handler that saves the log messages to that object
s1_handler = logging.StreamHandler(log_messages)
s1_handler.setLevel(logging.INFO)

# create a file handler just in case
f1_handler = logging.FileHandler(log_filename, mode='w+')
f1_handler.setLevel(logging.INFO)

# set the format for the log messages
log_format = '%(asctime)s | %(name)s | %(levelname)s | %(message)s \n'
f1_format = logging.Formatter(log_format)
s1_handler.setFormatter(f1_format)
f1_format = logging.Formatter(log_format)
f1_handler.setFormatter(f1_format)  # the file handler needs its formatter set as well

# add the handlers to the logger
logger_root.addHandler(s1_handler)
logger_root.addHandler(f1_handler)

#####################################
# here would be the main code ...
#####################################

# at the end of my code I added this to write the in-memory messages to the file
contents = log_messages.getvalue()
# opening a file in 'w'
file = open(log_filename, 'w')
# write log messages to the file
file.write("{}\n".format(contents))
# closing the file and the in-memory object
file.close()
log_messages.close()
Obviously this fails when the code fails, but the code tries to catch most errors, so I hope it will work. I got rid of the MemoryHandler but kept a file handler, so that in case of a real failure at least some of the logs are recorded until the file handler fails. It is far from ideal, but it works for me at the moment. If you have some other suggestions/improvements I would be happy to hear them!
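One small improvement (my suggestion, not part of the original post): registering the final write with atexit means the buffered messages still reach the file after an unhandled exception, since atexit handlers run during normal interpreter shutdown, including after an uncaught error (though not after os._exit() or a hard kill). Reusing log_messages and log_filename from the snippet above:

import atexit

def write_log_buffer():
    # Flush whatever has accumulated in the in-memory buffer to the file,
    # even if the main code raised an unhandled exception.
    with open(log_filename, 'w') as f:
        f.write(log_messages.getvalue())

atexit.register(write_log_buffer)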

Python Log File is not created unless basicConfig is called on top before any functions

I have a script that processes CSVs and loads them into a database. My intern mentor wanted us to use a log file to capture what's going on, and he wanted it to be flexible, so that one can use a config.ini file to edit where the log file is created. As a result I did just that, using a config file whose key-value pairs are read into a dict from which I can extract the path to the log file. These are excerpts from my code where the log file is created and used:
dirconfig_file = r"C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\venv\odfs_tester_history_dirs.ini"
start_time = datetime.now()

def process_dirconfig_file(config_file_from_sysarg):
    try:
        if Path.is_file(dirconfig_file_Pobj):
            parseddict = {}
            configsects_set = set()
            for sect in config.sections():
                configsects_set.add(sect)
                for k, v in config.items(sect):
                    # print('{} = {}'.format(k, v))
                    parseddict[k] = v
            print(parseddict)
            try:
                if ("log_dir" not in parseddict or parseddict["log_dir"] == "" or "log_dir" not in configsects_set):
                    raise Exception(f"Error: Your config file is missing 'logfile path' or properly formatted [log_file] section for this script to run. Please edit config file to include logfile path to capture errors")
    except Exception as e:
        #raise Exception(e)
        logging.exception(e)
        print(e)

parse_dict = process_dirconfig_file(dirconfig_file)
logfilepath = parse_dict["log_dir"]
log_file_name = start_time.strftime(logfilepath)
print(log_file_name)

logging.basicConfig(
    filename=log_file_name,
    level=logging.DEBUG,
    format='[Probe Data Quality] %(asctime)s - %(name)s %(levelname)-7.7s %(message)s'
    # can you explain this Tenzin?
)

if __name__ == '__main__':
    try:
        startTime = datetime.now()
        db_instance = dbhandler(parse_dict["db_string"])
        odfs_tabletest_dict = db_instance['odfs_tester_history_files']
        odf_history_from_csv_to_dbtable(db_instance)
        #print("test exception")
        print(datetime.now() - startTime)
    except Exception as e:
        logging.exception(e)
        print(e)
Doing this, no log file is created. The script runs with no errors, but no log file appears. I've tried several things, including using a hardcoded log file name instead of reading it from the config file, but it didn't work.
The only thing that works is when the log file is created up top before any method. Why is this?
When you are calling your process_dirconfig_file function, the logging configuration has not been set yet, so no file could have been created. The script executes top to bottom. It would be similar to doing something like this:
import sys
# default logging points to stdout/stderr kind of like this
my_logger = sys.stdout
my_logger.write("Something")
# Then you've pointed logging to a file
my_logger = open("some_file.log", 'w')
my_logger.write("Something else")
Only Something else would be written to our some_file.log, because my_logger pointed somewhere else beforehand.
Much the same is happening here. By default, the logging.<debug/info> functions do nothing, because without additional configuration the root logger only handles messages at WARNING level and above. logging.error, logging.warning, and logging.exception will always at least write to stderr out of the box (via the handler of last resort).
Also, I don't think the inner try is valid Python, you need a matching except. And I wouldn't just print an exception raised by that function, I'd probably raise and have the program crash:
def process_dirconfig_file(config_file_from_sysarg):
    try:
        # Don't use logging.<anything> yet
        ~snip~
    except Exception as e:
        # Just raise or don't use try/except at all until
        # you have a better idea of what you want to do in this circumstance
        raise
Especially since you are trying to use the logger while validating that its configuration is correct.
The fix? Don't use the logger until after you've determined it's ready.
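Concretely, the ordering the answer describes boils down to this; the filename here is just a stand-in:

import logging

# 1. Configure logging first, before any code tries to log.
logging.basicConfig(
    filename="probe_data_quality.log",  # stand-in path; read it from the config file in practice
    level=logging.DEBUG,
    format="[Probe Data Quality] %(asctime)s - %(name)s %(levelname)-7.7s %(message)s",
)

# 2. Only log after that; basicConfig has already created and opened the file,
#    so every record from here on ends up in it.
logging.info("config loaded, starting run")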

In the logging module's RotatingFileHandler, how to set the backupCount to a practically infinite number

I'm writing a program which backs up a database using Python's RotatingFileHandler. This has two parameters, maxBytes and backupCount: the former is the maximum size of each log file, and the latter the maximum number of log files.
I would like to effectively never delete data, but still have each log file a certain size (say, 2 kB for the purpose of illustration). So I tried to set the backupCount parameter to sys.maxint:
import msgpack
import json
from faker import Faker
import logging
from logging.handlers import RotatingFileHandler
import os, glob
import itertools
import sys

fake = Faker()
fake.seed(0)

data_file = "my_log.log"

logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
handler = RotatingFileHandler(data_file, maxBytes=2000, backupCount=sys.maxint)
logger.addHandler(handler)

fake_dicts = [{'name': fake.name(), 'email': fake.email()} for _ in range(100)]

def dump(item, mode='json'):
    if mode == 'json':
        return json.dumps(item)
    elif mode == 'msgpack':
        return msgpack.packb(item)

mode = 'json'

# Generate the archive log
for item in fake_dicts:
    dump_string = dump(item, mode=mode)
    logger.debug(dump_string)
However, this leads to several MemoryErrors which look like this:
Traceback (most recent call last):
File "/usr/lib/python2.7/logging/handlers.py", line 77, in emit
self.doRollover()
File "/usr/lib/python2.7/logging/handlers.py", line 129, in doRollover
for i in range(self.backupCount - 1, 0, -1):
MemoryError
Logged from file json_logger.py, line 37
It seems like making this parameter large causes the system to use lots of memory, which is not desirable. Is there any way around this trade-off?
An improvement to the solution suggested by @Asiel:
Instead of using itertools and os.path.exists to determine what the nextName should be in doRollover, the solution below simply remembers the number of the last backup and increments it to get the nextName.
from logging.handlers import RotatingFileHandler
import os

class RollingFileHandler(RotatingFileHandler):
    def __init__(self, filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=False):
        self.last_backup_cnt = 0
        super(RollingFileHandler, self).__init__(filename=filename,
                                                 mode=mode,
                                                 maxBytes=maxBytes,
                                                 backupCount=backupCount,
                                                 encoding=encoding,
                                                 delay=delay)

    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # my code starts here
        self.last_backup_cnt += 1
        nextName = "%s.%d" % (self.baseFilename, self.last_backup_cnt)
        self.rotate(self.baseFilename, nextName)
        # my code ends here
        if not self.delay:
            self.stream = self._open()
This class will still save your backups in ascending order (e.g. the first backup will end with ".1", the second one with ".2", and so on). Modifying it to also gzip the backups is straightforward.
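For illustration, combining this counter approach with the compression shown in the answer below might look like the following sketch (my adaptation; the class name is made up and it is untested):

from logging.handlers import RotatingFileHandler
import gzip
import os
import shutil

class RollingGzipCounterHandler(RotatingFileHandler):
    def __init__(self, *args, **kwargs):
        self.last_backup_cnt = 0
        super(RollingGzipCounterHandler, self).__init__(*args, **kwargs)

    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # Compress the current log into the next numbered ".N.gz" backup.
        self.last_backup_cnt += 1
        next_name = "%s.%d.gz" % (self.baseFilename, self.last_backup_cnt)
        with open(self.baseFilename, 'rb') as original_log:
            with gzip.open(next_name, 'wb') as gzipped_log:
                shutil.copyfileobj(original_log, gzipped_log)
        os.remove(self.baseFilename)
        if not self.delay:
            self.stream = self._open()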
The problem here is that RotatingFileHandler is intended to... well, rotate. If you set its backupCount to a huge number, the RotatingFileHandler.doRollover method will loop over a backward range from backupCount - 1 down to zero trying to find the last created backup, so the bigger the backupCount, the slower the rollover (when you only have a small number of backups). On Python 2, range() also builds that entire list in memory, which is exactly where the MemoryError in your traceback comes from.
Also, RotatingFileHandler keeps renaming your backups, which isn't necessary for what you want and is actually an overhead: instead of simply writing the latest backup with the next ".n+1" extension, it renames all your backups and writes the latest one with the extension ".1" (it shifts all the backup names).
Solution:
You could write the following class (probably with a better name):
from logging.handlers import RotatingFileHandler
import itertools
import os

class RollingFileHandler(RotatingFileHandler):
    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # my code starts here
        for i in itertools.count(1):
            nextName = "%s.%d" % (self.baseFilename, i)
            if not os.path.exists(nextName):
                self.rotate(self.baseFilename, nextName)
                break
        # my code ends here
        if not self.delay:
            self.stream = self._open()
This class will save your backups in ascending order (e.g. the first backup will end with ".1", the second one with ".2", and so on).
Since RollingFileHandler extends RotatingFileHandler, you can simply replace RotatingFileHandler with RollingFileHandler in your code; you don't need to provide the backupCount argument, since it is ignored by this new class.
Bonus solution: (compressing backups)
Since you will have an ever growing amount of log backups, you may want to compress them to save disk space. So you could create a class similar to RollingFileHandler:
from logging.handlers import RotatingFileHandler
import gzip
import itertools
import os
import shutil

class RollingGzipFileHandler(RotatingFileHandler):
    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # my code starts here
        for i in itertools.count(1):
            nextName = "%s.%d.gz" % (self.baseFilename, i)
            if not os.path.exists(nextName):
                with open(self.baseFilename, 'rb') as original_log:
                    with gzip.open(nextName, 'wb') as gzipped_log:
                        shutil.copyfileobj(original_log, gzipped_log)
                os.remove(self.baseFilename)
                break
        # my code ends here
        if not self.delay:
            self.stream = self._open()
This class will save your compressed backups with extensions ".1.gz", ".2.gz", etc. Also there are other compression algorithms available in the standard library if you don't want to use gzip.
This is an old question, but I hope this helps.
Try using a different base filename every time, like the following, and set backupCount to 0.
import datetime
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S%f")
data_file = "my_log_%s.log" % timestamp

Logging module: too many open file descriptors

I am using the Python logging module to print logs to a file, but I encountered the issue of "too many open file descriptors". I did remember to close the log file handlers, but the issue was still there.
Below is my code
class LogService(object):
    __instance = None

    def __init__(self):
        self.__logger = logging.getLogger('ddd')
        self.__handler = logging.FileHandler('/var/log/ddd/ddd.log')
        self.__formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
        self.__handler.setFormatter(self.__formatter)
        #self.__logger.addHandler(self.__handler)

    @classmethod
    def getInstance(cls):
        if cls.__instance == None:
            cls.__instance = LogService()
        return cls.__instance

    # log Error
    def logError(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.ERROR)
        self.__logger.error(msg)
        # Remember to close the file handler
        self.closeHandler()

    # log Warning
    def logWarning(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.WARNING)
        self.__logger.warn(msg)
        # Remember to close the file handler
        self.closeHandler()

    # log Info
    def logInfo(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.INFO)
        self.__logger.info(msg)
        # Remember to close the file handler
        self.closeHandler()

    def closeHandler(self):
        self.__logger.removeHandler(self.__handler)
        self.__handler.close()
And after running this code for a while, the following showed that there were too many open file descriptors.
[root@my-centos ~]# lsof | grep ddd | wc -l
11555
No, no. The usage is far simpler:
import logging
logging.basicConfig()
logger = logging.getLogger("mylogger")
logger.info("test")
logger.debug("test")
In your case you are adding the handler on every logging operation, which is overkill at the very least.
Check the documentation https://docs.python.org/2/library/logging.html
Each time you log anything, you add another instance of the handler.
Yes, you close it every time. But this just means it takes slightly longer to blow up. Closing it doesn't remove it from the logger.
The first message, you have one handler, so you open one file descriptor and then close it.
The next message, you have two handlers, so you open two file descriptors and close them.
The next message, you open three file descriptors and close them.
And so on, until you're opening more file descriptors than you're allowed to, and you get an error.
The solution is simply not to do that: add the handler once, when the logger is set up, and leave it attached.
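Spelled out, a minimal reworking of the class from the question along those lines (same names, trimmed to the essentials) might look like this:

import logging

class LogService(object):
    __instance = None

    def __init__(self):
        self.__logger = logging.getLogger('ddd')
        self.__logger.setLevel(logging.INFO)
        # Attach the file handler exactly once, at construction time.
        handler = logging.FileHandler('/var/log/ddd/ddd.log')
        handler.setFormatter(
            logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
        self.__logger.addHandler(handler)

    @classmethod
    def getInstance(cls):
        if cls.__instance is None:
            cls.__instance = LogService()
        return cls.__instance

    def logError(self, msg):
        self.__logger.error(msg)

    def logWarning(self, msg):
        self.__logger.warning(msg)

    def logInfo(self, msg):
        self.__logger.info(msg)

The handler is now attached once per LogService instance (and getInstance guards against creating more than one), so the open file descriptor count stays constant no matter how many messages are logged.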

Read from a log file as it's being written using python

I'm trying to find a nice way to read a log file in real time using python. I'd like to process lines from a log file one at a time as it is written. Somehow I need to keep trying to read the file until it is created and then continue to process lines until I terminate the process. Is there an appropriate way to do this? Thanks.
Take a look at this PDF starting at page 38, ~slide I-77 and you'll find all the info you need. Of course the rest of the slides are amazing, too, but those specifically deal with your issue:
import time

def follow(thefile):
    thefile.seek(0, 2)  # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)  # Sleep briefly
            continue
        yield line
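A possible way to drive that generator, with the filename as an example:

if __name__ == "__main__":
    # Runs until interrupted (Ctrl-C); each line appended to the file is
    # handed to you roughly 0.1 s after it is written.
    with open("access.log") as logfile:  # example filename
        for line in follow(logfile):
            print(line, end="")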
You could try with something like this:
import time

while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print line,  # already has newline
Example was extracted from here.
As this is tagged Python and logging, there is another possibility to do this.
I assume this is based on a Python logger built on logging.Handler.
You can just create a class that gets the (named) logger instance and overrides the emit method to push each record to a GUI (if you also need console output, just add a console handler alongside the file handler).
Example:
import logging
class log_viewer(logging.Handler):
""" Class to redistribute python logging data """
# have a class member to store the existing logger
logger_instance = logging.getLogger("SomeNameOfYourExistingLogger")
def __init__(self, *args, **kwargs):
# Initialize the Handler
logging.Handler.__init__(self, *args)
# optional take format
# setFormatter function is derived from logging.Handler
for key, value in kwargs.items():
if "{}".format(key) == "format":
self.setFormatter(value)
# make the logger send data to this class
self.logger_instance.addHandler(self)
def emit(self, record):
""" Overload of logging.Handler method """
record = self.format(record)
# ---------------------------------------
# Now you can send it to a GUI or similar
# "Do work" starts here.
# ---------------------------------------
# just as an example what e.g. a console
# handler would do:
print(record)
I am currently using similar code to add a TkinterTreectrl.Multilistbox for viewing logger output at runtime.
Side note: the logger only gets data from the moment it is initialized, so if you want all of your data available, you need to initialize it at the very beginning. (I know this is what is expected, but I think it is worth mentioning.)
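Hooking the handler up is then a single line, because the constructor registers it on the named logger itself; the names below match the example above and the message is arbitrary:

import logging

# Creating the handler attaches it to "SomeNameOfYourExistingLogger".
viewer = log_viewer(format=logging.Formatter("%(asctime)s %(message)s"))

logging.getLogger("SomeNameOfYourExistingLogger").warning("hello from the viewer")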
Maybe you could do a system call to tail -f using os.system().
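If you go that route, subprocess is the friendlier option, since it hands the lines back to Python instead of just blocking in a shell; a rough sketch with a placeholder path:

import subprocess

# Follow the file with tail -f and process its output line by line.
proc = subprocess.Popen(
    ["tail", "-f", "/var/log/example.log"],  # placeholder path
    stdout=subprocess.PIPE,
    text=True,
)
for line in proc.stdout:
    print(line, end="")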
