python logging.Formatter adding additional values to message

I've created a method that logs to my syslog server and am using 'logging.Formatter' to format the message. However, it seems that Formatter is adding additional text to my syslog message and I am unsure of how to get rid of it.
Here's the method located in logger.py:
import logging
import logging.handlers
import os

def sendLog(msg):
    address = "192.168.1.200"
    port = 514
    syslogger = logging.getLogger("app_processing LEVEL='{}'".format(os.environ['ENV']))
    syslogger.setLevel(logging.DEBUG)
    # f = ContextFilter()
    # syslogger.addFilter(f)
    handler = logging.handlers.SysLogHandler(address=(address, port))
    formatter = logging.Formatter("%(asctime)s %(name)s %(message)s", datefmt='%m/%d/%Y %H:%M:%S')
    handler.setFormatter(formatter)
    syslogger.addHandler(handler)
    syslogger.debug(msg)
And I'm calling it from other modules with:
logger.sendLog("FUNCTION='get_inventory_db' MESSAGE='Retrieving inventory information'")
Yet, when it's logged to the server the result is:
2017-10-20 14:48:35 -04:00 192.168.1.101 192.168.1.101 user.debug 10/20/2017 18:48:35 app_processing LEVEL='development' FUNCTION='get_inventory_db' MESSAGE='Retrieving inventory information'
Note the 2017-10-20 14:48:35 -04:00 192.168.1.101 192.168.1.101 user.debug prefix that appears before what should be the start of the syslog message.
I'm expecting output like:
10/20/2017 18:48:35 app_processing LEVEL='development' FUNCTION='get_inventory_db' MESSAGE='Retrieving inventory information'
Am I not using the Formatter correctly?

You are doing everything correctly with the formatters and the logging library.
The prefix you mention has nothing to do with Python's logging library. The whole line starting with "10/20/2017 18:48:35 …" is what Python sends to syslog.
Whatever is added on top of that comes from the syslog configuration and should be investigated on the syslog server side. It cannot be controlled from the app or its logging setup.
The only way you can "fix" these lines is by removing any datetimes from your own log format and relying on syslog's own prefix.
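For instance, a minimal sketch of that change applied to the sendLog() function above, keeping only the logger name and the message and letting the syslog server supply the timestamp:

import logging
import logging.handlers
import os

def sendLog(msg):
    syslogger = logging.getLogger("app_processing LEVEL='{}'".format(os.environ['ENV']))
    syslogger.setLevel(logging.DEBUG)
    handler = logging.handlers.SysLogHandler(address=("192.168.1.200", 514))
    # No %(asctime)s here: the syslog server already prefixes its own timestamp.
    handler.setFormatter(logging.Formatter("%(name)s %(message)s"))
    syslogger.addHandler(handler)
    syslogger.debug(msg)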

Related

Python module "logging" double output

I want to use the logging module, but I'm having some trouble because it is outputting everything twice. I've read a lot of posts from people having the same issue, and log.propagate = False or log.handlers.pop() fixed it for them. Neither works for me.
I have a file called logger.py that looks like this:
import logging

def __init__():
    log = logging.getLogger("output")
    filelog = logging.FileHandler("output.log")
    formatlog = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    filelog.setFormatter(formatlog)
    log.addHandler(filelog)
    log.setLevel(logging.INFO)
    return log
So that from multiple files I can write:
import logger
log = logger.__init__()
But this is giving me issues. I've seen several solutions, but I don't know how to incorporate them in multiple scripts without defining the logger in all of them.
I found a solution which was really simple. All I needed to do was to add an if statement checking if the handlers already existed. So my logger.py file now looks like this:
import logging

def __init__():
    log = logging.getLogger("output")
    if not log.handlers:
        filelog = logging.FileHandler("output.log")
        formatlog = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        filelog.setFormatter(formatlog)
        log.addHandler(filelog)
        log.setLevel(logging.INFO)
    return log
Multiple scripts lead to multiple processes, so unfortunately you create multiple logger objects, one per call to logger.__init__().
Usually you have one script which creates the logger and the different processes, but as I understand it, you want multiple scripts logging to the same destination.
If you want multiple processes to log to the same destination, I recommend using inter-process communication such as named pipes, or a UDP/TCP port for logging.
There are also queue modules available in Python to send an (atomic) logging entry to be written in one piece, compared to multiple processes appending to a file directly, which may turn 111111\n and 22222\n into 11212121221\n\n in the file (see the sketch below).
Otherwise, think about named pipes...
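As a rough illustration of the queue-based approach (a sketch only, not part of the original answer; it assumes all writers are processes started from one parent, e.g. via multiprocessing):

import logging
import logging.handlers
import multiprocessing

def worker(queue):
    # Each worker only puts records on the shared queue; it never touches the file.
    log = logging.getLogger("output")
    log.setLevel(logging.INFO)
    log.addHandler(logging.handlers.QueueHandler(queue))
    log.info("hello from %s", multiprocessing.current_process().name)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    # A single listener in the parent process owns the file handler,
    # so each entry is written in one piece.
    filelog = logging.FileHandler("output.log")
    filelog.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    listener = logging.handlers.QueueListener(queue, filelog)
    listener.start()

    procs = [multiprocessing.Process(target=worker, args=(queue,)) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()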
Code snippet for logging server
Note: I just log everything as an error here...
import socket
import logging

class mylogger():
    def __init__(self, port=5005):
        log = logging.getLogger("output")
        filelog = logging.FileHandler("output.log")
        formatlog = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        filelog.setFormatter(formatlog)
        log.addHandler(filelog)
        log.setLevel(logging.INFO)
        self.log = log
        UDP_IP = "127.0.0.1"  # localhost
        self.port = port
        self.UDPClientSocket = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)
        self.UDPClientSocket.bind((UDP_IP, self.port))

    def pollReceive(self):
        data, addr = self.UDPClientSocket.recvfrom(1024)  # buffer size is 1024 bytes
        print("received message:", data)
        self.log.error(data)

log = mylogger()

while True:
    log.pollReceive()
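A matching client-side sketch (an assumption, not part of the original answer): each script sends its log line to the server over UDP instead of opening the log file itself.

import socket

sock = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)
# The server above binds 127.0.0.1:5005 and writes whatever it receives to output.log.
sock.sendto(b"my log line from another script", ("127.0.0.1", 5005))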
You have to be very careful when adding a new handler to your logger with:
log.addHandler(...)
If you add more than one handler to your logger, you will get more than one output. Be aware that this only holds within a single process: a class derived from Thread shares the same logging setup, but a class derived from Process gets its own. To ensure that you only have one handler for your logger within a given process, you should use the following code (this is an example for SocketHandler):
logger_root = logging.getLogger()
if not logger_root.hasHandlers():
    socket_handler = logging.handlers.SocketHandler('localhost', logging.handlers.DEFAULT_TCP_LOGGING_PORT)
    logger_root.addHandler(socket_handler)

How do I log from my Python Spark script

I have a Python Spark program which I run with spark-submit. I want to put logging statements in it.
logging.info("This is an informative message.")
logging.debug("This is a debug message.")
I want to use the same logger that Spark is using so that the log messages come out in the same format and the level is controlled by the same configuration files. How do I do this?
I've tried putting the logging statements in the code and starting out with a logging.getLogger(). In both cases I see Spark's log messages but not mine. I've been looking at the Python logging documentation, but haven't been able to figure it out from there.
Not sure if this is something specific to scripts submitted to Spark or just me not understanding how logging works.
You can get the logger from the SparkContext object:
log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger(__name__)
LOGGER.info("pyspark script logger initialized")
You need to get the logger for Spark itself; by default getLogger() will return the logger for your own module. Try something like:
logger = logging.getLogger('py4j')
logger.info("My test info statement")
It might also be 'pyspark' instead of 'py4j'.
If the function you use in your Spark program (and which does some logging) is defined in the same module as the main function, you will get a serialization error.
This is explained here, and an example by the same person is given here.
I also tested this on spark 1.3.1
EDIT:
To change logging from STDERR to STDOUT you will have to remove the current StreamHandler and add a new one.
Find the existing Stream Handler (This line can be removed when finished)
print(logger.handlers)
# will look like [<logging.StreamHandler object at 0x7fd8f4b00208>]
There will probably only be a single one, but if not you will have to adjust the index.
logger.removeHandler(logger.handlers[0])
Add new handler for sys.stdout
import sys # Put at top if not already there
sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
logger.addHandler(sh)
We needed to log from the executors, not from the driver node. So we did the following:
We created /etc/rsyslog.d/spark.conf on all of the nodes (using a bootstrap action with Amazon Elastic Map Reduce) so that the core nodes forwarded syslog local1 messages to the master node.
On the master node, we enabled the UDP and TCP syslog listeners, and we set it up so that all local messages got logged to /var/log/local1.log.
We created a syslog logger, using the Python logging module, in our map function (a sketch follows below).
Now we can log with logging.info(). ...
One of the things we discovered is that the same partition is being processed simultaneously on multiple executors. Apparently Spark does this all the time, when it has extra resources. This handles the case when an executor is mysteriously delayed or fails.
Logging in the map functions has taught us a lot about how Spark works.
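A rough sketch of what such a per-executor syslog logger could look like (assumptions: the master node's address is exposed as MASTER_HOST and rsyslog listens on UDP 514; this is not the original authors' code):

import logging
import logging.handlers

MASTER_HOST = "master.internal"  # assumption: resolvable address of the master node

def get_executor_logger():
    logger = logging.getLogger("spark_executor")
    if not logger.handlers:
        handler = logging.handlers.SysLogHandler(
            address=(MASTER_HOST, 514),
            facility=logging.handlers.SysLogHandler.LOG_LOCAL1,
        )
        handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

def process_partition(rows):
    # used e.g. with rdd.mapPartitions(process_partition)
    logger = get_executor_logger()
    for row in rows:
        logger.info("processing %s", row)
        yield row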
In my case, I am just happy to get my log messages added to the workers' stderr, along with the usual Spark log messages.
If that suits your needs, then the trick is to redirect the particular Python logger to stderr.
For example, the following, inspired from this answer, works fine for me:
import logging
import sys

def getlogger(name, level=logging.INFO):
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if logger.handlers:
        # or else, as I found out, we keep adding handlers and duplicate messages
        pass
    else:
        ch = logging.StreamHandler(sys.stderr)
        ch.setLevel(level)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        ch.setFormatter(formatter)
        logger.addHandler(ch)
    return logger
Usage:
def tst_log():
    logger = getlogger('my-worker')
    logger.debug('a')
    logger.info('b')
    logger.warning('c')
    logger.error('d')
    logger.critical('e')
    ...
Output (plus a few surrounding lines for context):
17/05/03 03:25:32 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 5.8 KB, free 319.2 MB)
2017-05-03 03:25:32,849 - my-worker - INFO - b
2017-05-03 03:25:32,849 - my-worker - WARNING - c
2017-05-03 03:25:32,849 - my-worker - ERROR - d
2017-05-03 03:25:32,849 - my-worker - CRITICAL - e
17/05/03 03:25:32 INFO PythonRunner: Times: total = 2, boot = -40969, init = 40971, finish = 0
17/05/03 03:25:32 INFO Executor: Finished task 7.0 in stage 20.0 (TID 213). 2109 bytes result sent to driver
import logging
# Logger
logging.basicConfig(format='%(asctime)s %(filename)s %(funcName)s %(lineno)d %(message)s')
logger = logging.getLogger('driver_logger')
logger.setLevel(logging.DEBUG)
Simplest way to log from pyspark!
The key to making pyspark interact with Java log4j is the JVM gateway.
Below is Python code; the conf is missing the URL, but this is about logging.
import os

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

my_jars = os.environ.get("SPARK_HOME")
myconf = SparkConf()
myconf.setMaster("local").setAppName("DB2_Test")
myconf.set("spark.jars", "%s/jars/log4j-1.2.17.jar" % my_jars)

spark = SparkSession\
    .builder\
    .appName("DB2_Test")\
    .config(conf=myconf)\
    .getOrCreate()

Logger = spark._jvm.org.apache.log4j.Logger
mylogger = Logger.getLogger(__name__)
mylogger.error("some error trace")
mylogger.info("some info trace")
You can implement the logging.Handler interface in a class that forwards log messages to log4j under Spark. Then use logging.root.addHandler() (and, optionally, logging.root.removeHandler()) to install that handler.
The handler should have a method like the following:
def emit(self, record):
    """Forward a log message for log4j."""
    Logger = self.spark_session._jvm.org.apache.log4j.Logger
    logger = Logger.getLogger(record.name)
    if record.levelno >= logging.CRITICAL:
        # Fatal and critical seem about the same.
        logger.fatal(record.getMessage())
    elif record.levelno >= logging.ERROR:
        logger.error(record.getMessage())
    elif record.levelno >= logging.WARNING:
        logger.warn(record.getMessage())
    elif record.levelno >= logging.INFO:
        logger.info(record.getMessage())
    elif record.levelno >= logging.DEBUG:
        logger.debug(record.getMessage())
    else:
        pass
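For context, a minimal sketch of how such a handler class might be put together (the class name CustomHandler and the spark_session attribute are assumptions, chosen to match the install snippet below):

import logging

class CustomHandler(logging.Handler):
    """Forwards Python logging records to the JVM-side log4j logger."""

    def __init__(self, spark_session):
        super().__init__()
        self.spark_session = spark_session  # needed by emit() to reach spark_session._jvm

    def emit(self, record):
        # body as shown above: map record.levelno to the matching log4j method
        ...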
Installing the handler should go immediately after you initialise your Spark session:
spark = SparkSession.builder.appName("Logging Example").getOrCreate()
handler = CustomHandler(spark)

# Replace the default handlers with the log4j forwarder.
root_handlers = logging.root.handlers[:]
for h in root_handlers:
    logging.root.removeHandler(h)
logging.root.addHandler(handler)

# Now you can log stuff.
logging.debug("Installed log4j log handler.")
There's a more complete example here: https://gist.github.com/thsutton/65f0ec3cf132495ef91dc22b9bc38aec
You need to make the Spark log reachable from the driver and all executors, so we created a logging class, treated it as a job dependency, and load it on each executor.
class Log4j:
    def __init__(self, spark_session):
        conf = spark_session.sparkContext.getConf()
        app_id = conf.get('spark.app.id')
        app_name = conf.get('spark.app.name')
        log4jlogger = spark_session._jvm.org.apache.log4j
        prefix_msg = '<' + app_id + ' : ' + app_name + '> '
        print(prefix_msg)
        self.logger = log4jlogger.LogManager.getLogger(prefix_msg)

    def warn(self, msg):
        # log warning
        self.logger.warn(msg)

    def error(self, msg):
        # log error
        self.logger.error(msg)

    def info(self, msg):
        # log information message
        self.logger.info(msg)
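A hypothetical driver-side usage sketch (the app name and messages are illustrative, not from the original answer):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("my_job").getOrCreate()
log = Log4j(spark)

log.info("job started")
log.warn("something looks odd")
log.error("something failed")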

Unable to log debug messages to the system log in Python

I'm using Python 3.4 on Mac OSX. I have the following code to setup a logger:
LOGGER = logging.getLogger(PROGRAM_NAME)
LOGGER.setLevel(logging.DEBUG)
LOGGER.propagate = False
LOGGER_FH = logging.FileHandler(WORKING_DIR + "/syslog.log", 'a')
LOGGER_FH.setLevel(logging.DEBUG)
LOGGER_FH.setFormatter(logging.Formatter('%(name)s: [%(levelname)s] %(message)s'))
LOGGER.addHandler(LOGGER_FH)
LOGGER_SH = logging.handlers.SysLogHandler(address='/var/run/syslog',
                                           facility=logging.handlers.SysLogHandler.LOG_USER)
LOGGER_SH.setLevel(logging.DEBUG)
LOGGER_SH.setFormatter(logging.Formatter('%(name)s: [%(levelname)s] %(message)s'))
LOGGER.addHandler(LOGGER_SH)
The FileHandler works perfectly, and I'm able to see all expected messages at all logging levels show up in the log. The SysLogHandler doesn't work correctly: I'm unable to see any LOGGER.info() or LOGGER.debug() messages in the syslog output. I can see error and warning messages, but not info or debug. Even tweaking the /etc/syslog.conf file has no effect (even after explicitly reloading the syslog daemon with launchctl). What am I missing here?
Try: address='/dev/log'
It's a bit confusing in the docs, but "address" is expected to be a Unix domain socket, not a regular file path.
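Applied to the snippet above, that suggestion would look like this (a sketch only; /dev/log is the usual socket on Linux, while macOS exposes /var/run/syslog):

LOGGER_SH = logging.handlers.SysLogHandler(address='/dev/log',
                                           facility=logging.handlers.SysLogHandler.LOG_USER)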

Try to write into syslog

I am working on Linux and the rsyslogd process is listening on port 514.
The following code can't write into /var/log/syslog.
Does anybody know what the problem is?
import logging
import logging.handlers
root_logger = logging.getLogger()
root_logger.setLevel(config.get_value("log_level"))
syslog_hdlr = SysLogHandler(address='/dev/log', facility=SysLogHandler.LOG_DAEMON)
syslog_hdlr.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(name)s: %(levelname)s %(message)s')
syslog_hdlr.setFormatter(formatter)
root_logger.addHandler(syslog_hdlr)
logger = logging.getLogger("imapcd.daemon")
logger.debug('test')
This code works fine on my system if I make some changes:
import logging.handlers as sh
syslog_hdlr = sh.SysLogHandler(address='/dev/log', facility=sh.SysLogHandler.LOG_DAEMON)
and
root_logger.setLevel(logging.DEBUG)
So check that the logging level you are getting from config is not more restrictive than DEBUG (e.g. if it is set to INFO, no debug messages are printed).
If you still don't see anything in syslog, try the syslog module and see if you get anything from there:
import syslog
syslog.syslog(syslog.LOG_ERR, "MY MESSAGE")

Syslog messages show up as "Unknown" when I use Python's logging.handlers.SysLogHandler

When I run this on my mac:
import logging.handlers
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
syslog_address = '/var/run/syslog'
logger.addHandler(logging.handlers.SysLogHandler(syslog_address))
logger.error("What the crap?")
It shows up like this in the syslog:
Oct 18 19:02:06 nick Unknown[4294967295] <Error>: What the crap?
Why is it Unknown? Shouldn't it be smart enough to name itself after the script's name?
I think the APP-NAME (which indicates the source) is an optional component in the syslog header. The latest version of SysLogHandler (for Python 3.3) includes support for an APP-NAME (called ident as per the C syslog API), but it's not available in earlier versions. See this Python issue.
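On Python 3.3+, a minimal sketch of using that ident attribute (the 'myscript' name is just an illustration):

import logging
import logging.handlers

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

handler = logging.handlers.SysLogHandler('/var/run/syslog')
# ident is prepended verbatim to every message, so include the trailing ': ' yourself
handler.ident = 'myscript: '
logger.addHandler(handler)

logger.error("What the crap?")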
If you prepend your script name to all messages, you will get the desired effect. For example,
logger.error('foo: What\'s up?')
will display e.g.
19/10/2011 13:51:17 foo[2147483647] What's up?
in the log.
To get what you need, you should add a formatter to the handler so you don't have to manually format every message yourself.
import logging.handlers
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
syslog_address = '/var/run/syslog'
handler = logging.handlers.SysLogHandler(syslog_address)
# create the formatter
formatter = logging.Formatter('%(name)s: [%(levelname)s] %(message)s')
# plug this formatter into your handler(s)
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.error("What the crap?")
You should now find that you see entries in syslog as you'd expect:
Jul 4 14:34:40 ip-127-0-0-1 script_name.py: [ERROR] What the crap?
You can find the full list of which keyword matches what at this link.
If you need more, you can have a look at a further example:
from logging.handlers import SysLogHandler
import logging

def log(self, severity=logging.DEBUG, message=None):
    """
    Log utility for system wide logging needs
    #param severity Log severity
    #param message Log message
    #return
    """
    logger = logging.getLogger()
    logger.setLevel(severity)
    syslog = SysLogHandler(address="/dev/log")
    syslog.setFormatter(
        logging.Formatter("%(module)s-%(processName)s[%(process)d]: %(name)s: %(message)s")
    )
    logger.addHandler(syslog)
    logger.log(severity, message)
It is quite simple and I use this method as a global logging package in my projects.
