I'm relatively new to confluent-kafka python and I am having an hard time figuring out how to send the logs from the kafka library to the program logger.
I am using a SerializingProducer and, according to the docs here: https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#serde-producer I should be able to pass a logger as config entry.
Something like:
logger = logging.getLogger()
producer_config = {
"bootstrap.servers" : "https....",
...,
"logger" : logger
}
However this does not seem to work as intended. The only log visible is the one producer directly by the program. Removing the line
"logger" : logger
allows the kafka library to correctly show the logs in the console.
My producer looks as follow:
log = logging.getLogger(__name__)
class MyProducer(confluent_kafka.SerializingProducer):
def __init__(self, topic, producer_config, source_folder) -> None:
producer_config["logger"] = log
super(MyProducer, self).__init__(producer_config)
log.info(f"Producer initialized with the following conf::\n{producer_config}")
Any idea on why it happens and how to fix this?
Other information available:
kafka library logs to stderr
passing directly an Handler rather than a Logger does not fix the problem, but rather duplicates the output of the program
Thanks in advance.
Related
I deploy greengrass components into my EC2 instance. The deploy greengrass components have been generating logs which wraps around my python log.
what is causing the "wrapping" around it? how can I remove these wraps.
For example, the logs in bold are wraps the original python log.
The log in emphasis is generated by my python log formatter.
2022-12-13T23:59:56.926Z [INFO] (Copier) com.bolt-data.iot.RulesEngineCore: stdout. [2022-12-13 23:59:56,925][DEBUG ][iot-ipc] checking redis pub-sub health (io_thread[140047617824320]:pub_sub_redis:_connection_health_nanny_task:61).
{scriptName=services.com.bolt-data.iot.RulesEngineCore.lifecycle.Run, serviceName=com.bolt-data.iot.RulesEngineCore, currentState=RUNNING}
The following is my python log formatter.
formatter = logging.Formatter(
fmt="[%(asctime)s][%(levelname)-7s][%(name)s] %(message)s (%(threadName)s[%(thread)d]:%(module)s:%(funcName)s:%(lineno)d)"
)
# TODO: when we're running in the lambda function, don't stream to stdout
_handler = logging.StreamHandler(stream=stdout)
_handler.setLevel(get_level_from_environment())
_handler.setFormatter(formatter)
by default Greengrass Nucleus captures the stdoud and stderr streams from the processes it manages, including custom components. It then outputs each line of the logs with the prefix and suffix you have highlighted in bold. This cannot be changed. You can switch the format from TEXT to JSON which can make the log easier to parse by a machine (check greengrass-nucleus-component-configuration - logging.format)
import logging
from logging.handlers import RotatingFileHandler
logger = logging.Logger(__name__)
_handler = RotatingFileHandler("mylog.log")
# additional _handler configuration
logger.addHandler(_handler)
If you want to output a log containing only what your application generates, change the logger configuration output to file. You can write the file in the work folder of the component or in another location, such as /var/log
I set up a basic python logger that writes to a log file and to stdout. When I run my python program locally, log messages with logging.info appear as expected in the file and in the console. However, when I run the same program remotely via ssh -n user#server python main.py neither the console nor the file show any logging.info messages.
This is the code used to set up the logger:
def setup_logging(model_type, dataset):
file_name = dataset + "_" + model_type + time.strftime("_%Y%m%d_%H%M")
logging.basicConfig(
level=logging.INFO,
format="[%(levelname)-5.5s %(asctime)s] %(message)s",
datefmt='%H:%M:%S',
handlers=[
logging.FileHandler("log/{0}.log".format(file_name)),
logging.StreamHandler()
])
I already tried the following things:
Sending a message to logging.warning: Those appear as expected on the root logger. However, even without setting up the logger and falling back to the default logging.info messages do not show up.
The file and folder permissions seem to be alright and an empty file is created on disk.
Using print works as usual as well
If you look into the source code of basicConfig function, you will see that the function is applied only when there are no handlers on the root logger:
_acquireLock()
try:
force = kwargs.pop('force', False)
if force:
for h in root.handlers[:]:
root.removeHandler(h)
h.close()
if len(root.handlers) == 0:
handlers = kwargs.pop("handlers", None)
if handlers is None:
...
I think, one of the libraries you use configures logging on import. And as you see from the sample above, one of the solutions is to use force=True argument.
A possible disadvantage is that several popular data-science libraries keep a reference to the loggers they configure, so that when you reconfigure logging yourselves their old loggers with the handlers are still there and do not see your changes. In which case you will also need to clean the handlers for those loggers as well.
I've had to extend the existing logging library to add some features. One of these is a feature which lets a handler listen to any existing log without knowing if it exists beforehand. The following allows the handler to listen, but does not check if the log exists:
def listen_to_log(target, handler):
logging.getLogger(target).addHandler(handler)
The issue is that I don't want any handler to listen to a log which isn't being logged to, and want to raise a ValueError if the log doesn't already exist. Ideally I'd do something like the following:
def listen_to_log(target, handler):
if not logging.logExists(target):
raise ValueError('Log not found')
logging.getLogger(target).addHandler(handler)
So far I have been unable to find a function like logging.logExists, is there such a function, or a convenient workaround?
Workaround that works for me and possibly you:
When you create a logger for your own code, you will almost certainly create a logger with handlers (file handler and/or console handler).
When you have not yet created a logger and you get the 'root' logger by
logger = logging.getLogger()
then this logger will have no handlers yet.
Therefore, I always check whether the logger above results in a logger that has handlers by
logging.getLogger().hasHandlers()
So I create a logger only if the root logger does not have handlers:
logger = <create_my_logger> if not logging.getLogger().hasHandlers() else logging.getLogger()
The "create_my_logger" snippet represents my code/function that returns a logger (with handlers).
WARNING. This is not documented. It may change without notice.
The logging module internally uses a single Manager object to hold the hierarchy of loggers in a dict accessible as:
import logging
logging.Logger.manager.loggerDict
All loggers except the root logger are stored in that dict under their names.
There are few examples on the net:
http://code.activestate.com/lists/python-list/621740/
and
https://michaelgoerz.net/notes/use-of-the-logging-module-in-python.html (uses Python 2 print)
Of course, if you attach a handler to the root logger, you will (generally) get all messages (unless a library author specifically chooses not to propagate to root handlers).
Why should you care if a handler listens to a log which hasn't been created yet? You can apply filters to a handler that ignore certain events.
This doesn't specifically answer your question, because I see your question as an instance of an XY Problem.
My python code generates a log file using logging framework and all INFO messages are captured in the log file. I integrated my program with ROBOT framework and now the log file is not generated. Instead the INFO messages are printed in the log.html. I understand this is because robot existing logger is being called and hence INFO are directed to log.html. I don't want the behavior to change, I still want the user defined log file to be generated separately with just the INFO level messages.
How can I achieve this?
Python Code --> Logging Library --> "Log File"
RobotFramework --> Python Code --> Logging Library --> by default "log.html"
When you run using python code it will allow you set log file name.
But when you run using robotframework, the file is by default set to log.html (since robot uses the same logging library internally that you are using) so your logging function is overridden by that of robotframework.
That is why you see it in log.html instead of your file.
You can also refer Robot Framework not creating file or writing to it
Hope it helps!
The issue has been fixed now, which was a very minor one. But am still analyzing it deeper, will update when I am clear on the exact cause.
This was the module that I used,
def call_logger(logger_name, logFile):
level = logging.INFO
l = logging.getLogger(logger_name)
if not getattr(l, 'handler_set', None):
formatter = logging.Formatter('%(asctime)s : %(message)s')
fileHandler = logging.FileHandler(logFile, mode = 'a')
fileHandler.setFormatter(formatter)
streamHandler = logging.StreamHandler()
streamHandler.setFormatter(formatter)
l.setLevel(level)
l.addHandler(fileHandler)
l.addHandler(streamHandler)
l.handler_set = True
When I changed the parameter "logFile" to a different name "log_file" it worked.
Looks like "logFile" was a built in robot keyword.
I seem to be running into a problem when I am logging data after invoking another module in an application I am working on. I'd like assistance in understanding what may be happening here.
To replicate the issue, I have developed the following script...
#!/usr/bin/python
import sys
import logging
from oletools.olevba import VBA_Parser, VBA_Scanner
from cloghandler import ConcurrentRotatingFileHandler
# set up logger for application
dbg_h = logging.getLogger('dbg_log')
dbglog = '%s' % 'dbg.log'
dbg_rotateHandler = ConcurrentRotatingFileHandler(dbglog, "a")
dbg_h.addHandler(dbg_rotateHandler)
dbg_h.setLevel(logging.ERROR)
# read some document as a buffer
buff = sys.stdin.read()
# generate issue
dbg_h.error('Before call to module....')
vba = VBA_Parser('None', data=buff)
dbg_h.error('After call to module....')
When I run this, I get the following...
cat somedocument.doc | ./replicate.py
ERROR:dbg_log:After call to module....
For some reason, my last dbg_h logger write attempt is getting output to the console as well as getting written to my dbg.log file? This only appears to happen AFTER the call to VBA_Parser.
cat dbg.log
Before call to module....
After call to module....
Anyone have any idea as to why this might be happening? I reviewed the source code of olevba and did not see anything that stuck out to me specifically.
Could this be a problem I should raise with the module author? Or am I doing something wrong with how I am using the cloghandler?
The oletools codebase is littered with calls to the root logger though calls to logging.debug(...), logging.error(...), and so on. Since the author didn't bother to configure the root logger, the default behavior is to dump to sys.stderr. Since sys.stderr defaults to the console when running from the command line, you get what you're seeing.
You should contact the author of oletools since they're not using the logging system effectively. Ideally they would use a named logger and push the messages to that logger. As a work-around to suppress the messages you could configure the root logger to use your handler.
# Set a handler
logger.root.addHandler(dbg_rotateHandler)
Be aware that this may lead to duplicated log messages.