Python logging with rsyslog

I've inherited the following python file:
import logging
from logging.handlers import SysLogHandler

class Logger(object):
    # Return a logging instance used throughout the library
    def __init__(self):
        self.logger = logging.getLogger('my_daemon')

    # Log an info message
    def info(self, message, *args, **kwargs):
        self.__log(logging.INFO, message, *args, **kwargs)

    # Configure the logger to log to syslog
    def log_to_syslog(self):
        formatter = logging.Formatter('my_daemon: [%(levelname)s] %(message)s')
        handler = SysLogHandler(address='/dev/log', facility=SysLogHandler.LOG_DAEMON)
        handler.setFormatter(formatter)
        self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)
I see that the init method looks for a logger called my_daemon, which I can't find anywhere on my system. Do I have to manually create the file and if so where should I put it?
Also, log_to_syslog appears to listen to socket /dev/log, and when I run sudo lsof /dev/log I get the following:
[vagrant@server]$ sudo lsof /dev/log
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rsyslogd 989 root 0u unix 0xffff880037a880c0 0t0 8099 /dev/log
When I look at /etc/rsyslog.conf I see the following:
# rsyslog v5 configuration file
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none /var/log/messages
So I'm a bit lost here. My init function seems to be instructing python to use a logfile called my_daemon, which I can't find anywhere, and /etc/rsyslog.conf seems to be telling the machine to use /var/log/messages, which does not contain any logs from my app.
Update: here's how I'm trying to log messages
import os
from logger import Logger

class Server(object):
    def __init__(self, options):
        self.logger = Logger()

    def write(self, data):
        self.logger.info('Received new data from controller, applying')
        print 'hello'
The write method from server.py does print 'hello' to the screen, so I know we're getting close; it's just the logger that's not doing what I would expect.

my_daemon is not a log file - it is just a developer-specified name indicating an "area" in an application. See this information on what loggers are.
log_to_syslog is not listening on a socket - the rsyslog daemon is doing that; the SysLogHandler merely writes to /dev/log. Also, I don't see where log_to_syslog is ever called - is it? If it isn't, no handler is ever attached to the logger, which would explain why no messages show up in the syslog.
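If it isn't, a minimal fix is to attach the handler when the Server builds its logger; a sketch, assuming the Logger class shown above:
from logger import Logger

class Server(object):
    def __init__(self, options):
        self.logger = Logger()
        # Without this call the logger has no handlers,
        # so nothing ever reaches rsyslog.
        self.logger.log_to_syslog()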

Python module "logging" double output

I want to use the logging module, but I'm having some trouble because it is outputting twice. I've read a lot of posts with people having the same issue, and `log.propagate = False` or `log.handlers.pop()` fixed it for them. This doesn't work for me.
I have a file called logger.py that looks like this:
import logging

def __init__():
    log = logging.getLogger("output")
    filelog = logging.FileHandler("output.log")
    formatlog = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    filelog.setFormatter(formatlog)
    log.addHandler(filelog)
    log.setLevel(logging.INFO)
    return log
so that, from multiple files, I can write:
import logger
log = logger.__init__()
But this is giving me issues. So I've seen several solutions, but I don't know how to incorporate it in multiple scripts without defining the logger in all of them.
I found a solution which was really simple. All I needed to do was to add an if statement checking if the handlers already existed. So my logger.py file now looks like this:
import logging

def __init__():
    log = logging.getLogger("output")
    if not log.handlers:
        filelog = logging.FileHandler("output.log")
        formatlog = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        filelog.setFormatter(formatlog)
        log.addHandler(filelog)
        log.setLevel(logging.INFO)
    return log
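With the guard in place, every module that imports it shares the single handler; a hypothetical consumer module (the file name is illustrative):
# consumer.py - reuses the already-configured logger
import logger

log = logger.__init__()
log.info("written to output.log exactly once")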
Multiple scripts lead to multiple processes, so you unluckily create multiple logger objects returned from the logger.__init__() function.
Usually you have one script which creates the logger and the different processes, but as I understand it, you would like multiple scripts logging to the same destination.
If you want multiple processes to log to the same destination, I recommend using inter-process communication such as named pipes, or a UDP/TCP port for logging.
There are also queue modules available in Python for sending an (atomic) logging entry to be logged in one place (compared to appending to a file from multiple processes, which may turn 111111\n and 22222\n into 11212121221\n\n in the file).
Otherwise, think about named pipes...
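For completeness, a minimal sketch of the queue approach within one process, using the standard library's QueueHandler and QueueListener (for separate processes you would swap in a multiprocessing.Queue); none of these names come from the snippets above:
import logging
import logging.handlers
import queue

# One queue and one listener own the file; producers only enqueue records.
log_queue = queue.Queue()
file_handler = logging.FileHandler("output.log")
listener = logging.handlers.QueueListener(log_queue, file_handler)
listener.start()

log = logging.getLogger("output")
log.addHandler(logging.handlers.QueueHandler(log_queue))
log.setLevel(logging.INFO)
log.info("each record passes through the queue atomically")
listener.stop()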
Code snippet for logging server
Note: I just assumed everything should be logged as an error...
import socket
import logging

class mylogger():
    def __init__(self, port=5005):
        log = logging.getLogger("output")
        filelog = logging.FileHandler("output.log")
        formatlog = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        filelog.setFormatter(formatlog)
        log.addHandler(filelog)
        log.setLevel(logging.INFO)
        self.log = log
        UDP_IP = "127.0.0.1"  # localhost
        self.port = port
        self.UDPClientSocket = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)
        self.UDPClientSocket.bind((UDP_IP, self.port))

    def pollReceive(self):
        data, addr = self.UDPClientSocket.recvfrom(1024)  # buffer size is 1024 bytes
        print("received message:", data)
        self.log.error(data)

log = mylogger()
while True:
    log.pollReceive()
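A matching client sketch (my assumption: messages are sent as plain UTF-8 bytes, which the server above logs verbatim):
import socket

sock = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)
# Each datagram arrives at the logging server as one atomic entry.
sock.sendto(b"something went wrong", ("127.0.0.1", 5005))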
You have to be very careful when adding a new handler to your logger with:
log.addHandler(...)
If you add more than one handler to your logger, you will get the output more than once. Be aware that this only holds for loggers running in the same process: a class derived from Thread runs in the same process and shares its loggers, but a class derived from Process runs in a separate process with its own logging state. To ensure that you only have one handler for your logger according to this thread, you should use the following code (this is an example for SocketHandler):
logger_root = logging.getLogger()
if not logger_root.hasHandlers():
    socket_handler = logging.handlers.SocketHandler('localhost', logging.handlers.DEFAULT_TCP_LOGGING_PORT)
    logger_root.addHandler(socket_handler)

How to define a logger in python once for the whole program?

I want to set up my logger once in my Python project and use that throughout the project.
main.py:
import logging
import socket
import test

hostname = {"hostname": socket.gethostname()}
logger = logging.getLogger()
syslog = logging.StreamHandler()
formatter = logging.Formatter("{\"label\":\"%(name)s\", \"level\":\"%(levelname)s\", \"hostname\":\"%(hostname)s\", \"logEntry\": %(message)s, \"timestamp\", \"%(asctime)s\"}")
syslog.setFormatter(formatter)
logger.setLevel(logging.DEBUG)
logger.addHandler(syslog)
logger = logging.LoggerAdapter(logger, hostname)

def entry_point():
    logger.debug("entry_point")
    test.test_function()

entry_point()
test.py:
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def test_function():
    logger.debug("test_function")
This should give me:
entry_point
test_function
... both formatted with the format provider.
However I actually get an error KeyError: 'hostname' because it would seem the second logger does not know about the hostname format provider. I've also tried initialising both loggers with __name__ but then I get No handlers could be found for logger "test".
Is there a way I can define my logging configuration once and re-use it throughout my application?
A LoggerAdapter is a separate object; it does not replace the result of logging.getLogger() calls. You'll need to store it somewhere that it can be shared between different modules. You could use a separate module that everything else in your project imports from, for example.
I'll detail below how to handle this, but there is also an alternative that doesn't involve adapters at all, and instead uses a filter which is attached to the handler you created, letting you avoid having to deal with adapters altogether. See further down.
I'd separate out configuration and log object handling too; have the main module call the 'setup' function to configure the handlers and log levels, and set up the adapter in the module for others to import:
log.py:
import logging
import socket

def setup_logging():
    """Adds a configured stream handler to the root logger"""
    syslog = logging.StreamHandler()
    formatter = logging.Formatter(
        '{"label":"%(name)s", "level":"%(levelname)s",'
        ' "hostname":"%(hostname)s", "logEntry": %(message)s,'
        ' "timestamp", "%(asctime)s"}')
    syslog.setFormatter(formatter)
    logger = logging.getLogger()
    logger.addHandler(syslog)
    logger.setLevel(logging.DEBUG)

def host_log_adapter(logger):
    hostname = {"hostname": socket.gethostname()}
    return logging.LoggerAdapter(logger, hostname)

logger = host_log_adapter(logging.getLogger())
Then in main do:
import log
import test

log.setup_logging()

def entry_point():
    log.logger.debug("entry_point")
    test.test_function()

entry_point()
and in test:
from log import logger

def test_function():
    logger.debug("test_function")
Alternatively, rather than use an adapter, you could add information to log records by (ab)using a filter:
import logging
import socket

class HostnameInjectingFilter(logging.Filter):
    def __init__(self):
        self.hostname = socket.gethostname()

    def filter(self, record):
        record.hostname = self.hostname
        return True
This adds the extra field directly on the record object and always returns True. Then just add this filter to the handler that needs to have the hostname available:
syslog = logging.StreamHandler()
formatter = logging.Formatter(
    '{"label":"%(name)s", "level":"%(levelname)s",'
    ' "hostname":"%(hostname)s", "logEntry": %(message)s,'
    ' "timestamp", "%(asctime)s"}')
syslog.setFormatter(formatter)
syslog.addFilter(HostnameInjectingFilter())
This removes the need to use an adapter entirely, so you can remove the host_log_adapter() function entirely and just use logging.getLogger() everywhere.

Logging with WSGI server and flask application

I am using a WSGI server to spawn the servers for my web application, and I am having a problem with logging.
This is how I am running the app
from gevent import monkey; monkey.patch_all()
from gevent import wsgi  # assumed import; newer gevent exposes this as gevent.pywsgi
from logging.handlers import RotatingFileHandler
import logging
from app import app  # this imports app

# create a file to store weblogs
log = open(ERROR_LOG_FILE, 'w'); log.seek(0); log.truncate();
log.write("Web Application Log\n"); log.close();

log_handler = RotatingFileHandler(ERROR_LOG_FILE, maxBytes=1000000, backupCount=1)
formatter = logging.Formatter(
    "[%(asctime)s] {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s"
)
log_handler.setFormatter(formatter)
app.logger.setLevel(logging.DEBUG)
app.logger.addHandler(log_handler)

# run the application
server = wsgi.WSGIServer(('0.0.0.0', 8080), app)
server.serve_forever()
However, on running the application it is not logging anything. I guess it must be because of the WSGI server, since app.logger works in the absence of WSGI. How can I log information when using WSGI?
According to the gevent pywsgi documentation, you need to pass your log handler object to the WSGIServer object at creation:
log – If given, an object with a write method to which request (access) logs will be written. If not given, defaults to sys.stderr. You may pass None to disable request logging. You may use a wrapper, around e.g., logging, to support objects that don’t implement a write method. (If you pass a Logger instance, or in general something that provides a log method but not a write method, such a wrapper will automatically be created and it will be logged to at the INFO level.)
error_log – If given, a file-like object with write, writelines and flush methods to which error logs will be written. If not given, defaults to sys.stderr. You may pass None to disable error logging (not recommended). You may use a wrapper, around e.g., logging, to support objects that don’t implement the proper methods. This parameter will become the value for wsgi.errors in the WSGI environment (if not already set). (As with log, wrappers for Logger instances and the like will be created automatically and logged to at the ERROR level.)
so you should be able to do wsgi.WSGIServer(('0.0.0.0', 8080), app, log=app.logger)
You can log like this -
import logging
import logging.handlers as handlers
.
.
.
logger = logging.getLogger('MainProgram')
logger.setLevel(10)
logHandler = handlers.RotatingFileHandler('filename.log', maxBytes=1000000, backupCount=1)
logger.addHandler(logHandler)
logger.info("Logging configuration done")
.
.
.
# run the application
server = wsgi.WSGIServer(('0.0.0.0', 8080), app, log=logger)
server.serve_forever()

How do I log from my Python Spark script

I have a Python Spark program which I run with spark-submit. I want to put logging statements in it.
logging.info("This is an informative message.")
logging.debug("This is a debug message.")
I want to use the same logger that Spark is using so that the log messages come out in the same format and the level is controlled by the same configuration files. How do I do this?
I've tried putting the logging statements in the code and starting out with a logging.getLogger(). In both cases I see Spark's log messages but not mine. I've been looking at the Python logging documentation, but haven't been able to figure it out from there.
Not sure if this is something specific to scripts submitted to Spark or just me not understanding how logging works.
You can get the logger from the SparkContext object:
log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger(__name__)
LOGGER.info("pyspark script logger initialized")
You need to get the logger for Spark itself; by default getLogger() will return the logger for your own module. Try something like:
logger = logging.getLogger('py4j')
logger.info("My test info statement")
It might also be 'pyspark' instead of 'py4j'.
If the function that you use in your Spark program (and which does some logging) is defined in the same module as the main function, it will give a serialization error.
This is explained here and an example by the same person is given here
I also tested this on spark 1.3.1
EDIT:
To change logging from STDERR to STDOUT you will have to remove the current StreamHandler and add a new one.
Find the existing Stream Handler (This line can be removed when finished)
print(logger.handlers)
# will look like [<logging.StreamHandler object at 0x7fd8f4b00208>]
There will probably only be a single one, but if not you will have to update the position.
logger.removeHandler(logger.handlers[0])
Add new handler for sys.stdout
import sys # Put at top if not already there
sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
logger.addHandler(sh)
We needed to log from the executors, not from the driver node, so we did the following:
We created an /etc/rsyslog.d/spark.conf on all of the nodes (using a bootstrap action with Amazon Elastic MapReduce) so that the core nodes forwarded syslog local1 messages to the master node.
On the master node, we enabled the UDP and TCP syslog listeners, and we set it up so that all local messages got logged to /var/log/local1.log.
We created a Python logging module SysLogHandler logger in our map function.
Now we can log with logging.info(). ...
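A sketch of that map-function logger (assumptions: rsyslog's local socket is /dev/log and the local1 facility is forwarded as configured above; the logger name is illustrative):
import logging
from logging.handlers import SysLogHandler

def get_map_logger():
    logger = logging.getLogger('spark-executor')
    # Executors may reuse the interpreter between tasks; guard against
    # attaching the handler twice.
    if not logger.handlers:
        handler = SysLogHandler(address='/dev/log', facility=SysLogHandler.LOG_LOCAL1)
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger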
One of the things we discovered is that the same partition is being processed simultaneously on multiple executors. Apparently Spark does this all the time, when it has extra resources. This handles the case when an executor is mysteriously delayed or fails.
Logging in the map functions has taught us a lot about how Spark works.
In my case, I am just happy to get my log messages added to the workers' stderr, along with the usual Spark log messages.
If that suits your needs, then the trick is to redirect the particular Python logger to stderr.
For example, the following, inspired by this answer, works fine for me:
import logging
import sys

def getlogger(name, level=logging.INFO):
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if logger.handlers:
        # or else, as I found out, we keep adding handlers and duplicate messages
        pass
    else:
        ch = logging.StreamHandler(sys.stderr)
        ch.setLevel(level)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        ch.setFormatter(formatter)
        logger.addHandler(ch)
    return logger
Usage:
def tst_log():
    logger = getlogger('my-worker')
    logger.debug('a')
    logger.info('b')
    logger.warning('c')
    logger.error('d')
    logger.critical('e')
    ...
...
Output (plus a few surrounding lines for context):
17/05/03 03:25:32 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 5.8 KB, free 319.2 MB)
2017-05-03 03:25:32,849 - my-worker - INFO - b
2017-05-03 03:25:32,849 - my-worker - WARNING - c
2017-05-03 03:25:32,849 - my-worker - ERROR - d
2017-05-03 03:25:32,849 - my-worker - CRITICAL - e
17/05/03 03:25:32 INFO PythonRunner: Times: total = 2, boot = -40969, init = 40971, finish = 0
17/05/03 03:25:32 INFO Executor: Finished task 7.0 in stage 20.0 (TID 213). 2109 bytes result sent to driver
import logging
# Logger
logging.basicConfig(format='%(asctime)s %(filename)s %(funcName)s %(lineno)d %(message)s')
logger = logging.getLogger('driver_logger')
logger.setLevel(logging.DEBUG)
Simplest way to log from pyspark!
The key to interacting between pyspark and Java's log4j is the JVM gateway.
Below is the Python code; the conf is missing the connection URL, but this is about logging.
import os

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

my_jars = os.environ.get("SPARK_HOME")
myconf = SparkConf()
myconf.setMaster("local").setAppName("DB2_Test")
myconf.set("spark.jars", "%s/jars/log4j-1.2.17.jar" % my_jars)
spark = SparkSession\
    .builder\
    .appName("DB2_Test")\
    .config(conf=myconf)\
    .getOrCreate()

Logger = spark._jvm.org.apache.log4j.Logger
mylogger = Logger.getLogger(__name__)
mylogger.error("some error trace")
mylogger.info("some info trace")
You can implement the logging.Handler interface in a class that forwards log messages to log4j under Spark. Then use logging.root.addHandler() (and, optionally, logging.root.removeHandler()) to install that handler.
The handler should have a method like the following:
def emit(self, record):
    """Forward a log message for log4j."""
    Logger = self.spark_session._jvm.org.apache.log4j.Logger
    logger = Logger.getLogger(record.name)
    if record.levelno >= logging.CRITICAL:
        # Fatal and critical seem about the same.
        logger.fatal(record.getMessage())
    elif record.levelno >= logging.ERROR:
        logger.error(record.getMessage())
    elif record.levelno >= logging.WARNING:
        logger.warn(record.getMessage())
    elif record.levelno >= logging.INFO:
        logger.info(record.getMessage())
    elif record.levelno >= logging.DEBUG:
        logger.debug(record.getMessage())
    else:
        pass
Installing the handler should go immediately after you initialise your Spark session:
spark = SparkSession.builder.appName("Logging Example").getOrCreate()
handler = CustomHandler(spark)

# Replace the default handlers with the log4j forwarder.
root_handlers = logging.root.handlers[:]
for h in root_handlers:
    logging.root.removeHandler(h)
logging.root.addHandler(handler)

# Now you can log stuff.
logging.debug("Installed log4j log handler.")
There's a more complete example here: https://gist.github.com/thsutton/65f0ec3cf132495ef91dc22b9bc38aec
You need to make the Spark log reachable from the driver and all executors, so we created a logging class, shipped it as a job dependency, and loaded it on each executor.
class Log4j:
    def __init__(self, spark_session):
        conf = spark_session.sparkContext.getConf()
        app_id = conf.get('spark.app.id')
        app_name = conf.get('spark.app.name')
        log4jlogger = spark_session._jvm.org.apache.log4j
        prefix_msg = '<' + app_id + ' : ' + app_name + '> '
        print(prefix_msg)
        self.logger = log4jlogger.LogManager.getLogger(prefix_msg)

    def warn(self, msg):
        # log warning
        self.logger.warn(msg)

    def error(self, msg):
        # log error
        self.logger.error(msg)

    def info(self, msg):
        # log information message
        self.logger.info(msg)
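Hypothetical usage on the driver (assumes an active SparkSession named spark):
logger = Log4j(spark)
logger.info('job started')
logger.error('something went wrong')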

Python logging module doesn't work within installed windows service

Why is it that calls to the logging framework within a Python service do not produce output to the log (file, stdout, ...)?
My python service has the general form:
import logging
import servicemanager
import win32event
import win32service
import win32serviceutil

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
fh = logging.FileHandler('out.log')
logger.addHandler(fh)
logger.error("OUTSIDE")

class Service(win32serviceutil.ServiceFramework):
    _svc_name_ = "example"
    _svc_display_name_ = "example"
    _svc_description_ = "example"

    def __init__(self, args):
        logger.error("NOT LOGGED")
        win32serviceutil.ServiceFramework.__init__(self, args)
        self.hWaitStop = win32event.CreateEvent(None, 0, 0, None)
        servicemanager.LogMsg(servicemanager.EVENTLOG_INFORMATION_TYPE,
                              servicemanager.PYS_SERVICE_STARTED,
                              (self._svc_name_, ''))

    def SvcStop(self):
        self.ReportServiceStatus(win32service.SERVICE_STOP_PENDING)
        win32event.SetEvent(self.hWaitStop)
        self.stop = True

    def SvcDoRun(self):
        self.ReportServiceStatus(win32service.SERVICE_RUNNING)
        self.main()

    def main(self):
        # Service Logic
        logger.error("NOT LOGGED EITHER")
        pass
The first call to logger.error produces output, but not the two inside the service class (even after installing the service and making sure it is running).
I've found that only logging within the actual service loop works with the logging module and the log file ends up in something like C:\python27\Lib\site-packages\win32.
I abandoned logging with the logging module for Windows as it didn't seem very effective. Instead I started to use the Windows logging service, e.g. servicemanager.LogInfoMsg() and related functions. This logs events to the Windows Application log, which you can find in the Event Viewer (start->run->Event Viewer, Windows Logs folder, Application log).
You have to write the full path of the log file.
e.g.
fh = logging.FileHandler('C:\\out.log')
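A slightly more robust sketch (assumption: you want the log next to the service script rather than in the service's working directory, which typically sits under the Windows system tree):
import logging
import os

# Build an absolute path anchored at this file's directory.
log_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'out.log')
fh = logging.FileHandler(log_path)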
Actually, the "OUTSIDE" logger is initialized twice.
Those two initializations happen in different processes: one is the plain Python process and the other is the Windows service process.
For some reason the second one is not configured successfully, and the loggers inside the service class live in that same process - that's why you can't find the "inside" log messages.
