I'd like to create a log for a Python script execution. For example:
import pandas as pd
data = pd.read_excel('example.xlsx')
data.head()
How can I create a log for this script in order to know who ran the script, when it was executed, and when it finished? And if, for example, I take a sample of the df, how can I create a seed so I can share it with another person, who can then execute the script and get the same result?
You could use the logging module that comes by default with Python.
You'll have to add a few extra lines of code to configure it to log the information you require (time of execution and the user executing the script) and to specify a file name where the log messages should be stored.
With respect to logging who ran the script, it depends on how you want to differentiate users. If your script is intended to be executed on a server, you might want to differentiate users by their IP addresses. Another solution is to use the getpass module, as I did in the example below.
Finally, when generating a sample from data, you can pass an integer seed to the random_state parameter to make the sample always contain the same rows.
Here's a modified version of your script with the previously mentioned changes:
# == Necessary Imports =========================================================
import logging
import pandas as pd
import getpass
# == Script Configuration ======================================================
# Set a seed to enable reproducibility
SEED = 1
# Get the username of the person who is running the script.
USERNAME = getpass.getuser()
# Set a format to the logs.
LOG_FORMAT = '[%(levelname)s | ' + USERNAME + ' | %(asctime)s] - %(message)s'
# Name of the file to store the logs.
LOG_FILENAME = 'script_execution.log'
# Level at which messages are to be logged. Logging, by default, has the
# following levels, ordered by increasing severity:
# 1. DEBUG: detailed information, useful only when diagnosing a problem.
# 2. INFO: message that confirms that everything is working as it should.
# 3. WARNING: message with information that requires user attention.
# 4. ERROR: an error has occurred and the script is unable to perform some function.
# 5. CRITICAL: a serious error occurred and the script may stop running properly.
LOG_LEVEL = logging.INFO
# When you set the level, all messages from a higher level of severity are also
# logged. For example, when you set the log level to `INFO`, all `WARNING`,
# `ERROR` and `CRITICAL` messages are also logged, but `DEBUG` messages are not.
# == Set up logging ============================================================
logging.basicConfig(
    level=LOG_LEVEL,
    format=LOG_FORMAT,
    force=True,
    datefmt="%Y-%m-%d %H:%M:%S",
    handlers=[logging.FileHandler(LOG_FILENAME, "a", "utf-8"),
              logging.StreamHandler()],
)
# == Script Start ==============================================================
# Log the script execution start
logging.info('Script started execution!')
# Read data from the Excel file
data = pd.read_excel('example.xlsx')
# Retrieve a sample with 50% of the rows from `data`.
# When a `random_state` is set, `pd.DataFrame.sample` will always return
# the same dataframe, given that `data` doesn't change.
sample_data = data.sample(frac=0.5, random_state=SEED)
# Other stuff
# ...
# Log when the script finishes execution
logging.info('Script finished execution!')
Running the above code prints the following messages to the console:
[INFO | erikingwersen | 2023-02-13 23:17:14] - Script started execution!
[INFO | erikingwersen | 2023-02-13 23:17:14] - Script finished execution!
It also creates or updates a file named 'script_execution.log', located in the same directory as the script, containing the same information that gets printed to the console.
1. To create a log
You could use Python's standard logging module.
Logging HOWTO — Python 3.11.2 documentation
import logging
logging.basicConfig(filename='example.log', encoding='utf-8', level=logging.DEBUG)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
logging.error('And non-ASCII stuff, too, like Øresund and Malmö')
1.1 To know who ran the script
import getpass
getpass.getuser()
1.2 To know when it ran
FORMAT = '%(asctime)s %(clientip)-15s %(user)-8s %(message)s'
logging.basicConfig(format=FORMAT)
d = {'clientip': '192.168.0.1', 'user': 'fbloggs'}
logger = logging.getLogger('tcpserver')
logger.warning('Protocol problem: %s', 'connection reset', extra=d)
2. Create a seed so you can share it with another person to execute it and have the same result
You can use the random_state parameter:
df['one_col'].sample(n=10, random_state=1)
Related
So, I am doing an analysis where we process a lot of different files using multiprocessing.Pool in order to speed up the process:
with multiprocessing.Pool(processes=num_cores) as p:
    output_mp = p.map(clean_file, doc_mp_list_xlsx)
Here clean_file is the function that processes a file and doc_mp_list_xlsx is a list of the complete paths.
Logging is set up using the logging module, configured so that it tracks which worker is used:
import logging # for logging
import multiprocessing # for getting cpu count
from utils import read_config # for reading config
config = read_config.config
num_cores = multiprocessing.cpu_count()
# CONFIGURE LOGGING FOR EACH POOL
for process in range(1, num_cores+1):
    handler = logging.FileHandler(config['logging_path'] +
                                  '\\' +
                                  str(process) + '.log')
    handler.setFormatter(logging.Formatter(
        '%(asctime)s*|*%(levelname)s*|*%(message)s*|*%(module)s*|*%(funcName)s*|*%(lineno)d*|*%(name)s'))
    logger = logging.getLogger("SpawnPoolWorker-" + str(process))
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
def getLogger(name):
    """
    Function to return a logger for the current process

    Arguments:
        name: ignored argument for backwards compatibility

    Returns:
        logger: logger for current process
    """
    return logging.getLogger(str(multiprocessing.current_process().name))
And this works correctly. However, logging creates duplicate values:
date and time           | type | message                       | module    | function  | line | worker
2022-12-05 16:42:31,199 | INFO | Beginning to clean file x.pdf | clean_pdf | clean_pdf | 22   | SpawnPoolWorker-3
2022-12-05 16:42:30,400 | INFO | Beginning to clean file x.pdf | clean_pdf | clean_pdf | 22   | SpawnPoolWorker-4
I do not understand why this happens. Also, it does not create multiple outputs; it just seems to be a duplicate message (but with a different worker attached to it).
Does anybody have a clue why this happens? Is the configuration of mp.logging incorrect? Or is it the result of something else?
Thanks in advance.
I think the problem comes from a misunderstanding of how logging works with multiprocessing, and in general.
Each process has its own Python interpreter, which includes its own logging module. In the main process you configured it, but not in the others: their logging never got configured. What you did instead was configure the main process's logging many times, each time adding a new handler under a different logger name, so that when a message was received in the main process it was handled many times, writing different names. But a log message emitted in the other processes did not get handled at all.
And you sometimes use logging.getLogger("SpawnPoolWorker-" + str(process)), other times logging.getLogger(str(multiprocessing.current_process().name)), so you may not even be using the same logger objects.
I am not surprised it does not work. Multi-process code is harder than usually expected.
What you need to do instead is include the setup of the logging in each process (for example at the start of clean_file), in a multiprocess-compatible way; see the sketch after the reproduction code below.
The code I used to reproduce the issue (it does not include the code for the solution):
import logging
import multiprocessing

num_cores = multiprocessing.cpu_count()

for process in range(1, num_cores+1):
    handler = logging.FileHandler('log.log')  # modified the file path
    handler.setFormatter(logging.Formatter(
        '%(asctime)s*|*%(levelname)s*|*%(message)s*|*%(module)s*|*%(funcName)s*|*%(lineno)d*|*%(name)s'))
    logger = logging.getLogger("SpawnPoolWorker-" + str(process))
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

def getLogger():
    """
    Function to return a logger for the current process

    Arguments:
        name: ignored argument for backwards compatibility

    Returns:
        logger: logger for current process
    """
    return logging.getLogger(str(multiprocessing.current_process().name))

def clean_file(something):  # FAKE
    getLogger().debug(f"calling clean_file with {something=!r}")

doc_mp_list_xlsx = [1, 2, 3, 4, 5]  # FAKE
with multiprocessing.Pool(processes=num_cores) as p:
    output_mp = p.map(clean_file, doc_mp_list_xlsx)
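For reference, here is a minimal sketch of that fix using the Pool's initializer argument, so every worker process configures its own logging when it starts. The setup_worker_logging function and the per-worker file names are assumptions for illustration, not code from the question:

import logging
import multiprocessing

def setup_worker_logging():
    # Runs once inside each worker process, so each worker configures the
    # logging module that lives in its own interpreter.
    worker_name = multiprocessing.current_process().name
    handler = logging.FileHandler(worker_name + '.log')  # assumed file name
    handler.setFormatter(logging.Formatter(
        '%(asctime)s*|*%(levelname)s*|*%(message)s*|*%(module)s*|*%(funcName)s*|*%(lineno)d*|*%(name)s'))
    logger = logging.getLogger(worker_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

def clean_file(something):  # FAKE, same placeholder as above
    logging.getLogger(multiprocessing.current_process().name).info(
        f"calling clean_file with {something=!r}")

if __name__ == '__main__':
    doc_mp_list_xlsx = [1, 2, 3, 4, 5]  # FAKE
    with multiprocessing.Pool(processes=multiprocessing.cpu_count(),
                              initializer=setup_worker_logging) as p:
        output_mp = p.map(clean_file, doc_mp_list_xlsx)

With this arrangement each worker writes to its own file with its own name, and no handler configuration is shared across process boundaries.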
I'm using the Python logging module. How can I get all of the previously outputted logs that have been written out by the logger since the application started?
Let's say I have some large application. When the application first starts, it sets-up logging with something like this
import logging

logging.basicConfig(
    filename = '/path/to/log/file.log',
    filemode = 'a',
    format = '%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s',
    datefmt = '%H:%M:%S',
    level = logging.DEBUG
)

logging.info("===============================================================================")
logging.info("INFO: Starting Application Logging")
A few hours or days later I want to be able to get the history of all of the log messages ever written by logging and store that to a variable.
How can I get the history of messages written by the python logging module since the application started?
If you're writing to a log file, then you can simply get the path to the log file and read its contents:
import logging
logger = logging.getLogger( __name__ )
# attempt to get debug log contents
if logger.root.hasHandlers():
    logfile_path = logger.root.handlers[0].baseFilename
    with open(logfile_path) as log_file:
        log_history = log_file.read()
I made a little game as my school project & I want to save critical errors to the .log file that I created using the logging module, but I just can't figure out how to log them.
Logging configuration:
logging.basicConfig(level=logging.DEBUG, filename='log.log', filemode='w', format='%(asctime)s - %(name)s:%(levelname)s: %(message)s')
What I want to do:
Every time the program encounters an exception that can't be handled (like TypeError or SyntaxError), it should be saved in the log.log file, and then the program should exit. But it usually just closes the program and prints the error in the terminal, which is not intended.
I have tried to use:
import sys
class ShutdownHandler(logging.Handler):
    def emit(self, record):
        logging.critical(record.msg)
        logging.shutdown()
        sys.exit(1)
logging.basicConfig(level=logging.DEBUG, filename='log.log', filemode='w', format='%(asctime)s - %(name)s:%(levelname)s: %(message)s')
But it didn't save the output to the log.log file. Instead I got an error message in the terminal.
As discussed above, you can use the following to record any errors, which will be returned to you in the terminal:
import subprocess
a = subprocess.run(["python3","hello.py"], stdout=subprocess.PIPE)
print(a.stdout.decode('utf-8'))
Once you have the output, you can search for errors and log them accordingly.
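If you go that route, here is a minimal sketch of the idea (the script name hello.py is the same placeholder as above, and the log configuration is just an assumption): capture stderr as well as stdout, since unhandled tracebacks are printed to stderr, and write them to the log file when the child process exits with a non-zero code.

import logging
import subprocess

logging.basicConfig(level=logging.DEBUG, filename='log.log', filemode='w',
                    format='%(asctime)s - %(name)s:%(levelname)s: %(message)s')

# Run the game as a child process and capture both output streams.
result = subprocess.run(["python3", "hello.py"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# A non-zero return code usually means the script died with an unhandled
# exception, so record the traceback that was printed to stderr.
if result.returncode != 0:
    logging.critical("Script crashed:\n%s", result.stderr.decode('utf-8'))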
How can I log everything using Python 'logging' to 1 text file, over multiple modules?
Main.py:
import logging
import logging.handlers  # handlers is a submodule and must be imported explicitly

logging.basicConfig(format='localhost - - [%(asctime)s] %(message)s', level=logging.DEBUG)
log_handler = logging.handlers.RotatingFileHandler('debug.out', maxBytes=2048576)
log = logging.getLogger('logger')
log.addHandler(log_handler)

import test
Test.py:
import logging
log = logging.getLogger('logger')
log.error('test')
debug.out stays empty. I'm not sure what to try next, even after reading the logging documentation.
Edit: Fixed with the code above.
Set the correct logging level (at least ERROR if you want to get all messages with level ERROR or higher) and add a handler to write all messages into a file. For more details have a look at https://docs.python.org/2/howto/logging-cookbook.html.
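A minimal sketch of that advice, assuming Python 3 (the module names mirror the question; everything else is illustrative): configure the root logger once in the entry script and let every other module simply request its own logger, whose messages propagate to the root handler.

# Main.py
import logging
import logging.handlers

log_handler = logging.handlers.RotatingFileHandler('debug.out', maxBytes=2048576)
log_handler.setFormatter(logging.Formatter('localhost - - [%(asctime)s] %(message)s'))

# Attach the file handler to the *root* logger and set a level low enough
# to let everything through; child loggers propagate to it by default.
root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(log_handler)

import test

# Test.py
import logging

log = logging.getLogger(__name__)
log.error('test')  # ends up in debug.out via the root handler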
I wrote a small function to log events to a file. This Python script is imported in the main script. The main script runs as a daemon (actually, it is polling a database).
MainScript.py:
import logger
logger.logmessage(module = module, message = "SomeMessage")
logger.py:
import datetime
import logging

def logmessage(message, module, level = 'INFO'):
    today = str(datetime.date.today())
    logFile = '/path/to/log/myapplog.' + today + '.log'
    logging.basicConfig(format='%(asctime)s - %(levelname)s - ' + module + ' - %(message)s',
                        level=logging.INFO, filename=logFile)
    if level == "INFO":
        logging.info(message)
    elif level == "WARNING":
        logging.warning(message)
    elif level == "CRITICAL":
        logging.critical(message)
My intention: get logfiles like myapplog.2014-01-23.log, 2014-01-24.log, ...
My problem: the log file stays the same. It constantly logs to myapplog.2014-01-23.log, and only after a restart of the daemon is the proper log with the correct date created and used.
It sounds like you need to use TimedRotatingFileHandler as documented here.
Also, you shouldn't call basicConfig() more than once (I presume you're calling logmessage more than once). As documented, basicConfig() won't do anything except set up a basic configuration if there is none (so only the first call does anything - subsequent calls find there is a configuration, so don't do anything).
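A minimal sketch of that approach (the file name, rollover time, and format are illustrative assumptions), configuring the handler once at import time instead of calling basicConfig() on every call to logmessage:

# logger.py
import logging
from logging.handlers import TimedRotatingFileHandler

# Rotate the file at midnight; rotated files get a date suffix appended,
# e.g. myapplog.log.2014-01-23, which matches the intent of daily log files.
handler = TimedRotatingFileHandler('/path/to/log/myapplog.log',
                                   when='midnight', backupCount=30)
handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))

logger = logging.getLogger('myapp')
logger.setLevel(logging.INFO)
logger.addHandler(handler)

def logmessage(message, module, level='INFO'):
    # Map the textual level ('INFO', 'WARNING', 'CRITICAL') onto the
    # corresponding logger method and include the module in the message.
    getattr(logger, level.lower())('%s - %s', module, message)

Because the handler does the date-based rollover itself, the daemon no longer needs to be restarted for a new day's log file to be created.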