Python logging fails with log file on network drive (Windows 10) - python

I want to log using python's logging module to a file on a network drive. My problem is that the logging fails at some random point giving me this error:
--- Logging error ---
Traceback (most recent call last):
  File "c:\programme\anaconda3\lib\logging\__init__.py", line 1085, in emit
    self.flush()
  File "c:\programme\anaconda3\lib\logging\__init__.py", line 1065, in flush
    self.stream.flush()
OSError: [Errno 22] Invalid argument
Call stack:
  File "log_test.py", line 67, in <module>
    logger_root.error('FLUSH!!!'+str(i))
Message: 'Minute:120'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "c:\programme\anaconda3\lib\logging\__init__.py", line 1085, in emit
    self.flush()
  File "c:\programme\anaconda3\lib\logging\__init__.py", line 1065, in flush
    self.stream.flush()
OSError: [Errno 22] Invalid argument
Call stack:
  File "log_test.py", line 67, in <module>
    logger_root.error('FLUSH!!!'+str(i))
Message: 'FLUSH!!!120'
Arguments: ()
I am on a virtual machine with Windows 10 (Version 1909), using Python 3.8.3 and logging 0.5.1.2. The script runs in a virtual environment on a network drive, where the log files are stored.
I am writing a script that automates some data quality control tasks, and I am not 100% sure where the script will end up (network drive, local drive, etc.), so it should be able to log in every possible situation. The error does not appear at the same position/line in the script but randomly. Sometimes the program (~120 minutes in total) finishes without the error appearing at all.
What I tried so far:
I believe that the log file is closed at some point, so that no new logging messages can be written to it. I wrote a simple script that basically does nothing but log, to check whether the problem is related to my original script or to the logging process itself. Since this "only-logs" script also fails randomly when running on the network drive, but not when running on my local drive, I assume the problem is related to the connection to the network drive. I thought about keeping the whole log in memory and only writing it to the file at the end, but the MemoryHandler will also open the file at the beginning of the script and therefore fail at some point.
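One variation I have not tested on the network share: logging.FileHandler accepts delay=True, which postpones opening the file until the first record is actually written to it. Combined with a MemoryHandler that only flushes at shutdown, the log file would be touched only near the end of the run. A minimal sketch, with a placeholder file name:
import logging
import logging.handlers

# delay=True: the file is not opened until the first record is written
file_handler = logging.FileHandler('network_drive.log', mode='w', delay=True)
file_handler.setLevel(logging.INFO)

# buffer records in memory; flush to the file handler only when the buffer
# fills up, a CRITICAL record arrives, or the handler is closed at shutdown
memory_handler = logging.handlers.MemoryHandler(
    capacity=1024*100,
    flushLevel=logging.CRITICAL,
    target=file_handler,
    flushOnClose=True,
)

logger_root = logging.getLogger()
logger_root.setLevel(logging.INFO)
logger_root.addHandler(memory_handler)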
Here is my code for the "only-logs-script" (log_test.py):
import logging
import logging.handlers
import os
import datetime
import time

##################################################################
# set up a logger to create a log file with information about this program
logfile_dir = 'logfiles_test'
CHECK_FOLDER = os.path.isdir(logfile_dir)

# if the folder doesn't exist, create it
if not CHECK_FOLDER:
    os.makedirs(logfile_dir)
    print("created folder : ", logfile_dir)

log_path = '.\\'+logfile_dir+'\\'
Current_Date = datetime.datetime.today().strftime('%Y-%m-%d_')
log_filename = log_path+Current_Date+'logtest.log'
print(log_filename)

# create a root logger
logger_root = logging.getLogger()

# create handlers
f1_handler = logging.FileHandler(log_filename, mode='w+')
f2_handler = logging.StreamHandler()
f1_handler.setLevel(logging.INFO)
f2_handler.setLevel(logging.INFO)

# create formatters and add them to the handlers
f1_format = logging.Formatter('%(asctime)s | %(name)s | %(levelname)s | %(message)s \n')
f2_format = logging.Formatter('%(asctime)s | %(name)s | %(levelname)s | %(message)s \n')
f1_handler.setFormatter(f1_format)
f2_handler.setFormatter(f2_format)

# create a memory handler
memoryhandler = logging.handlers.MemoryHandler(
    capacity=1024*100,
    flushLevel=logging.ERROR,
    target=f1_handler,
    flushOnClose=True
)

# add the handlers to the logger
logger_root.addHandler(memoryhandler)
logger_root.addHandler(f2_handler)
logger_root.setLevel(logging.INFO)

logger_root.info('Log-File initiated.')

fname = log_path+'test.log'
open(fname, mode='w+')

for i in range(60*4):
    print(i)
    logger_root.warning('Minute:'+str(i))
    print('Write access:', os.access(fname, os.W_OK))
    if(i%10==0):
        logger_root.error('FLUSH!!!'+str(i))
    time.sleep(60)
Is there something horribly wrong with my logging process, or is it because of the network drive? Does anyone have ideas on how to tackle this issue? Would storing the whole log in memory and writing it to a file at the end solve the problem? How would I best achieve this?
Another idea would be to log on the local drive and then automatically copy the file to the network drive when the script is done (a rough sketch of this idea follows below). Any help is strongly appreciated, as I have been trying to identify and solve this problem for several days now.
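A rough sketch of that copy-at-the-end idea, using atexit so the copy also happens after an unhandled exception (though not after a hard crash); both paths are placeholders:
import atexit
import logging
import shutil

LOCAL_LOG = 'C:\\Temp\\run.log'              # placeholder local path
NETWORK_LOG = '\\\\server\\share\\run.log'   # placeholder network path

logging.basicConfig(filename=LOCAL_LOG, level=logging.INFO)

def copy_log_to_network():
    # best effort: if the share is unreachable, the local copy survives
    try:
        shutil.copyfile(LOCAL_LOG, NETWORK_LOG)
    except OSError:
        pass

atexit.register(copy_log_to_network)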
Thank you!

Since this is not really going anywhere atm, I will post what I did to "solve" my problem. It is not a satisfactory solution, as it fails when the code fails, but it is better than not logging at all.
The solution is inspired by the answer to this question: log messages to an array/list with logging
So here is what I did:
import io

#####################################
# first create an in-memory file-like object to save the logs to
log_messages = io.StringIO()

# create a stream handler that saves the log messages to that object
s1_handler = logging.StreamHandler(log_messages)
s1_handler.setLevel(logging.INFO)

# create a file handler just in case
f1_handler = logging.FileHandler(log_filename, mode='w+')
f1_handler.setLevel(logging.INFO)

# set the format for the log messages on both handlers
log_format = '%(asctime)s | %(name)s | %(levelname)s | %(message)s \n'
f1_format = logging.Formatter(log_format)
s1_handler.setFormatter(f1_format)
f1_handler.setFormatter(f1_format)

# add the handlers to the logger
logger_root.addHandler(s1_handler)
logger_root.addHandler(f1_handler)

#####################################
# here would be the main code ...
#####################################

# at the end of my code I added this to write the in-memory messages to the file
contents = log_messages.getvalue()

# open the file in 'w' mode and write the log messages to it
with open(log_filename, 'w') as file:
    file.write("{}\n".format(contents))

# close the in-memory object
log_messages.close()
Obviously this fails when the code fails, but the code tries to catch most errors, so I hope it will work. I got rid of the MemoryHandler but kept a file handler so that, in case of a real failure, at least some of the logs are recorded until the file handler fails. It is far from ideal, but it works for me atm. If you have other suggestions/improvements, I would be happy to hear them!
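One possible improvement I am considering (an untested sketch, reusing log_messages and log_filename from above): register the final write with atexit, so the buffered messages still reach the file when the main code dies with an unhandled exception. If you use this, drop the manual write/close at the end, since getvalue() fails on a closed StringIO:
import atexit

def dump_log_messages():
    # write whatever has accumulated in the in-memory buffer; atexit
    # runs this at interpreter shutdown, even after an unhandled
    # exception (but not after a hard crash or kill)
    with open(log_filename, 'w') as f:
        f.write(log_messages.getvalue())

atexit.register(dump_log_messages)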

Related

Python execution log

I'd like to create a log for a Python script execution. For example:
import pandas as pd

data = pd.read_excel('example.xlsx')
data.head()
How can I create a log for this script in order to know who ran the script, when it was executed and when it finished? And if, for example, I take a sample of the df, how can I create a seed so I can share it with another person, who can then execute the script and get the same result?
You could use the logging module that comes by default with Python.
You'll have to add a few extra lines of code to configure it to log the information you require (time of execution and the user executing the script) and to specify a file name where the log messages should be stored.
With respect to adding the information of "who" ran the script, it depends on how you want to differentiate users. If your script is intended to be executed on some server, you might want to differentiate users by their IP addresses. Another solution is to use the getpass module, like I did in the example below.
Finally, when generating a sample from data, you can set an integer as seed to the parameter random_state to make the sample always contain the same rows.
Here's a modified version of your script with the previously mentioned changes:
# == Necessary Imports =========================================================
import logging
import pandas as pd
import getpass
# == Script Configuration ======================================================
# Set a seed to enable reproducibility
SEED = 1
# Get the username of the person who is running the script.
USERNAME = getpass.getuser()
# Set a format to the logs.
LOG_FORMAT = '[%(levelname)s | ' + USERNAME + ' | %(asctime)s] - %(message)s'
# Name of the file to store the logs.
LOG_FILENAME = 'script_execution.log'
# Level in which messages are to be logged. Logging, by default has the
# following levels, ordered by ranking of severity:
# 1. DEBUG: detailed information, useful only when diagnosing a problem.
# 2. INFO: message that confirms that everything is working as it should.
# 3. WARNING: message with information that requires user attention
# 4. ERROR: an error has occurred and script is unable to perform some function.
# 5. CRITICAL: serious error occurred and script may stop running properly.
LOG_LEVEL = logging.INFO
# When you set the level, all messages from a higher level of severity are also
# logged. For example, when you set the log level to `INFO`, all `WARNING`,
# `ERROR` and `CRITICAL` messages are also logged, but `DEBUG` messages are not.
# == Set up logging ============================================================
logging.basicConfig(
    level=LOG_LEVEL,
    format=LOG_FORMAT,
    force=True,
    datefmt="%Y-%m-%d %H:%M:%S",
    handlers=[logging.FileHandler(LOG_FILENAME, "a", "utf-8"),
              logging.StreamHandler()]
)
# == Script Start ==============================================================
# Log the script execution start
logging.info('Script started execution!')
# Read data from the Excel file
data = pd.read_excel('example.xlsx')
# Retrieve a sample with 50% of the rows from `data`.
# When a `random_state` is set, `pd.DataFrame.sample` will always return
# the same dataframe, given that `data` doesn't change.
sample_data = data.sample(frac=0.5, random_state=SEED)
# Other stuff
# ...
# Log when the script finishes execution
logging.info('Script finished execution!')
Running the above code prints to the console the following messages:
[INFO | erikingwersen | 2023-02-13 23:17:14] - Script started execution!
[INFO | erikingwersen | 2023-02-13 23:17:14] - Script finished execution!
It also creates or updates a file named 'script_execution.log', located in the same directory as the script, with the same information that gets printed to the console.
1. To create a log
You could use Python's standard logging module.
Logging HOWTO — Python 3.11.2 documentation
import logging
logging.basicConfig(filename='example.log', encoding='utf-8', level=logging.DEBUG)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
logging.error('And non-ASCII stuff, too, like Øresund and Malmö')
1.1 To know who ran the script
import getpass
getpass.getuser()
1.2 To know when it ran
FORMAT = '%(asctime)s %(clientip)-15s %(user)-8s %(message)s'
logging.basicConfig(format=FORMAT)
d = {'clientip': '192.168.0.1', 'user': 'fbloggs'}
logger = logging.getLogger('tcpserver')
logger.warning('Protocol problem: %s', 'connection reset', extra=d)
2. Create a seed so you can share it with another person to execute it and have the same result
You can use a parameter random_state
df['one_col'].sample(n=10, random_state=1)

Python File Handler with Rotating Content of File

I have written a simple logging program that attaches anything I send to it to a file:
def log(message):
    with open("log.txt", 'a+') as f:
        f.write(message + "\n")
However, I would like to limit how big this file gets. When it gets to the maximum size, I would like for it to remove the first lines and append at the bottom.
Is this possible with a file handler or do I need to code it myself? I am also fine using a rotating file handler, but all the examples I have seen let the environment write exceptions automatically after setting a level, and I need to control what is written to the file.
Many thanks in advance!
This is an example of using Python's built-in RotatingFileHandler:
import logging
from logging.handlers import RotatingFileHandler

# change to a file you want to log to
logFile = 'log_r.log'

my_handler = RotatingFileHandler(logFile, mode='a', maxBytes=5*1024*1024,
                                 backupCount=2, encoding=None, delay=0)
my_handler.setLevel(logging.INFO)

app_log = logging.getLogger('root')
app_log.setLevel(logging.INFO)
app_log.addHandler(my_handler)

def bad():
    raise Exception("Something bad")

if __name__ == "__main__":
    app_log.info("something")
    try:
        app_log.info("trying to run bad")
        bad()
    except Exception as e:
        app_log.info("That was bad...")
    finally:
        app_log.info("Ran bad...")
The behaviour is slightly different from what you proposed: instead of deleting lines from the start of the file, it moves the full file to a different filename and starts from scratch.
Note that the only things that show in the log file when you run this are the pieces of text we're logging explicitly - i.e. no system junk you don't want.
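If you really do want the literal delete-from-the-top behaviour, here is a naive sketch (my own illustration, not a standard logging handler; it rewrites the whole file whenever it trims and is not safe with concurrent writers):
import os

MAX_BYTES = 1024 * 1024  # placeholder cap of 1 MiB

def log(message, path="log.txt", max_bytes=MAX_BYTES):
    # append the new message first
    with open(path, "a", encoding="utf-8") as f:
        f.write(message + "\n")
    # if the file grew past the cap, drop whole lines from the top
    if os.path.getsize(path) > max_bytes:
        with open(path, "r", encoding="utf-8") as f:
            lines = f.readlines()
        size = sum(len(line.encode("utf-8")) for line in lines)
        start = 0
        while size > max_bytes and start < len(lines):
            size -= len(lines[start].encode("utf-8"))
            start += 1
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(lines[start:])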

Create log file named after filename of caller script

I have a logger.py file which initialises logging.
import logging

logger = logging.getLogger(__name__)

def logger_init():
    import os
    import inspect
    global logger

    logger.setLevel(logging.DEBUG)

    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    logger.addHandler(ch)

    fh = logging.FileHandler(os.getcwd() + os.path.basename(__file__) + ".log")
    fh.setLevel(level=logging.DEBUG)
    logger.addHandler(fh)

    return None

logger_init()
I have another script caller.py that calls the logger.
from logger import *
logger.info("test log")
What happens is a log file called logger.log will be created containing the logged messages.
What I want is the name of this log file to be named after the caller script filename. So, in this case, the created log file should have the name caller.log instead.
I am using python 3.7
It is immensely helpful to consolidate logging to one location. I learned this the hard way. It is easier to debug when events are sorted by time and it is thread-safe to log to the same file. There are solutions for multiprocessing logging.
The log format can, then, contain the module name, function name and even line number from where the log call was made. This is invaluable. You can find a list of attributes you can include automatically in a log message here.
Example format:
format='[%(asctime)s] [%(module)s.%(funcName)s] [%(levelname)s] %(message)s'
Example log message
[2019-04-03 12:29:48,351] [caller.work_func] [INFO] Completed task 1.
You can get the filename of the main script from the first item in sys.argv, but if you want to get the caller module not the main script, check the answers on this question.
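A minimal sketch of the sys.argv approach (assuming the script is launched directly, e.g. python caller.py, not via an entry point):
import logging
import os
import sys

def logger_init():
    # sys.argv[0] is the path of the script that was executed,
    # so "python caller.py" yields a log file named "caller.log"
    script_name = os.path.splitext(os.path.basename(sys.argv[0]))[0]
    logger = logging.getLogger(script_name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler())
    logger.addHandler(logging.FileHandler(script_name + ".log"))
    return logger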

python: logging all errors into single log file

I am coding a tool in Python and I want to put all the errors, and only the errors (computations which didn't go through as they should have), into a single log file. Additionally, I would like to have a different text in the error log file for each section of my code, in order to make the error log file easy to interpret. How do I code this? Much appreciation to whoever can help with this!
Check out the python module logging. This is a core module for unifying logging not only in your own project but potentially in third party modules too.
For a minimal logging file example, this is taken directly from the documentation:
import logging
logging.basicConfig(filename='example.log',level=logging.DEBUG)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
Which results in the contents of example.log:
DEBUG:root:This message should go to the log file
INFO:root:So should this
WARNING:root:And this, too
However, I personally recommend using the yaml configuration method (requires pyyaml):
#logging_config.yml
version: 1
disable_existing_loggers: False
formatters:
  standard:
    format: '%(asctime)s [%(levelname)s] %(name)s - %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: standard
    stream: ext://sys.stdout
  file:
    class: logging.FileHandler
    level: DEBUG
    formatter: standard
    filename: output.log
  email:
    class: logging.handlers.SMTPHandler
    level: WARNING
    mailhost: smtp.gmail.com
    fromaddr: to@address.co.uk
    toaddrs: to@address.co.uk
    subject: Oh no, something's gone wrong!
    credentials: [email, password]
    secure: []
root:
  level: DEBUG
  handlers: [console, file, email]
  propagate: True
Then to use, for example:
import logging.config
import yaml

with open('logging_config.yml', 'r') as config:
    logging.config.dictConfig(yaml.safe_load(config))

logger = logging.getLogger(__name__)

logger.info("This will go to the console and the file")
logger.debug("This will only go to the file")
logger.error("This will go everywhere")

try:
    list = [1, 2, 3]
    print(list[10])
except IndexError:
    logger.exception("This will also go everywhere")
This prints:
2018-07-18 13:29:21,434 [INFO] __main__ - This will go to the console and the file
2018-07-18 13:29:21,434 [ERROR] __main__ - This will go everywhere
2018-07-18 13:29:21,434 [ERROR] __main__ - This will also go everywhere
Traceback (most recent call last):
  File "C:/Users/Chris/Desktop/python_scratchpad/a.py", line 16, in <module>
    print(list[10])
IndexError: list index out of range
While the contents of the log file are:
2018-07-18 13:35:55,423 [INFO] __main__ - This will go to the console and the file
2018-07-18 13:35:55,424 [DEBUG] __main__ - This will only go to the file
2018-07-18 13:35:55,424 [ERROR] __main__ - This will go everywhere
2018-07-18 13:35:55,424 [ERROR] __main__ - This will also go everywhere
Traceback (most recent call last):
  File "C:/Users/Chris/Desktop/python_scratchpad/a.py", line 15, in <module>
    print(list[10])
IndexError: list index out of range
Of course, you can add or remove handlers, formatters, etc, or do all of this in code (see the Python documentation) but this is my starting point whenever I use logging in a project. I find it helpful to have the configuration in a dedicated config file rather than polluting my project with defining logging in code.
If I understand the question correctly, the request was to capture only the errors in a dedicated log file, and I would do that differently.
I would stick to the BKM that all modules in the package define their own logger objects (logger = logging.getLogger(__name__)).
I'd let them be without any handlers and whenever they will emit they will look up the hierarchy tree for handlers to actually take care of the emitted messages.
At the root logger, I would add a dedicated FileHandler(filename='errors.log') and I would set the log level of that handler to logging.ERROR.
That means, whenever a logger from the package will emit something, this dedicated file-handler will discard anything below ERROR and will log into the files only ERROR and CRITICAL messages.
You could still add global StreamHandler and regular FileHandler to your root logger. Since you'll not change their log levels, they will be set to logging.NOTSET and will log everything that is emitted from the loggers in the package.
And to answer the second part of the question, the logger handlers can define their own formatting. So for the handler that handles only the errors, you could set the formatter to something like this: %(name)s::%(funcName)s:%(lineno)d - %(message)s, which basically means it will print:
the logger name (and if you used the convention to define loggers in every *.py file using __name__, then name will actually hold the hierarchical path to your module file (e.g. my_pkg.my_sub_pkg.module))
the funcName will hold the function from where the log was emitted and
lineno is the line number in the module file where the log was emitted.
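Putting both parts together, a minimal sketch of this setup (file name and formatter are placeholders taken from the description above):
import logging

# in every module of the package: a handler-less logger
logger = logging.getLogger(__name__)

# in the application entry point: configure the root logger once
root = logging.getLogger()
root.setLevel(logging.DEBUG)

# dedicated errors-only file handler
error_handler = logging.FileHandler('errors.log')
error_handler.setLevel(logging.ERROR)
error_handler.setFormatter(logging.Formatter(
    '%(name)s::%(funcName)s:%(lineno)d - %(message)s'))
root.addHandler(error_handler)

# a regular console handler; its level stays NOTSET, so it
# logs everything the root logger passes to it
root.addHandler(logging.StreamHandler())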

Logging module: too many open file descriptors

I am using the Python logging module to write logs to a file, but I ran into the issue of "too many open file descriptors". I did remember to close the log file handlers, but the issue was still there.
Below is my code
class LogService(object):
    __instance = None

    def __init__(self):
        self.__logger = logging.getLogger('ddd')
        self.__handler = logging.FileHandler('/var/log/ddd/ddd.log')
        self.__formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
        self.__handler.setFormatter(self.__formatter)
        #self.__logger.addHandler(self.__handler)

    @classmethod
    def getInstance(cls):
        if cls.__instance == None:
            cls.__instance = LogService()
        return cls.__instance

    # log Error
    def logError(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.ERROR)
        self.__logger.error(msg)
        # Remember to close the file handler
        self.closeHandler()

    # log Warning
    def logWarning(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.WARNING)
        self.__logger.warn(msg)
        # Remember to close the file handler
        self.closeHandler()

    # log Info
    def logInfo(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.INFO)
        self.__logger.info(msg)
        # Remember to close the file handler
        self.closeHandler()

    def closeHandler(self):
        self.__logger.removeHandler(self.__handler)
        self.__handler.close()
And after running this code for a while, the following showed that there were too many open file descriptors.
[root@my-centos ~]# lsof | grep ddd | wc -l
11555
No, no. The usage is far simpler:
import logging
logging.basicConfig()
logger = logging.getLogger("mylogger")
logger.info("test")
logger.debug("test")
In your case you are appending the handler in every logging operation, which is at least overkill.
Check the documentation https://docs.python.org/2/library/logging.html
Each time you log anything, you add another instance of the handler.
Yes, you close it every time. But this just means it takes slightly longer to blow up. Closing it doesn't remove it from the logger.
The first message, you have one handler, so you open one file descriptor and then close it.
The next message, you have two handlers, so you open two file descriptors and close them.
The next message, you open three file descriptors and close them.
And so on, until you're opening more file descriptors than you're allowed to, and you get an error.
The solution is just to not do that.
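For illustration, here is one way to restructure the class from the question so the handler is attached exactly once (a sketch keeping the original names and log path):
import logging

class LogService(object):
    __instance = None

    def __init__(self):
        self.__logger = logging.getLogger('ddd')
        self.__logger.setLevel(logging.INFO)
        handler = logging.FileHandler('/var/log/ddd/ddd.log')
        handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
        # attach the handler once, when the singleton is created,
        # so repeated log calls never open new file descriptors
        self.__logger.addHandler(handler)

    @classmethod
    def getInstance(cls):
        if cls.__instance is None:
            cls.__instance = LogService()
        return cls.__instance

    def logError(self, msg):
        self.__logger.error(msg)

    def logWarning(self, msg):
        self.__logger.warning(msg)

    def logInfo(self, msg):
        self.__logger.info(msg)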
