Celery is a great library, but its documentation for logging isn't the best, which brings me here to ask for help.
My script, in summary, currently looks like this:
import logging

import timber  # timber.io logging handler
from celery import Celery
from celery.utils.log import get_logger

from task import process
import config

logger = get_logger(__name__)
timber_handler = timber.TimberHandler(api_key=config.key,
                                      level=logging.INFO)
logger.addHandler(timber_handler)

app = Celery('task',
             broker=config.url,
             backend='rpc://')

@app.task
def run_task():
    status = get_status()  # get alive or dead status
    if status == 1:
        logger.info("Task is running")
        process()

@app.on_after_configure.connect
def task_periodic(**kwargs):
    app.add_periodic_task(2.0, run_task.s(), name="Run Constantly")
# More tasks
The process function in the tasks.py file is a very basic function that hits APIs and databases for some info, and I want to log that to a logger (timber.io), which attaches to the Python logging library and stores the logs online.
However, my major issue is that the logs are being sent to stdout and not to Timber. I have looked at celery.signals, but the documentation isn't great. Any assistance here would be greatly appreciated. Thank you.
Can you try this?
import logging
import os
import sys

from celery import Celery
from celery.signals import after_setup_logger

app = Celery('app')

app.conf.update({
    'broker_url': 'filesystem://',
    'broker_transport_options': {
        'data_folder_in': './broker/out',
        'data_folder_out': './broker/out',
        'data_folder_processed': './broker/processed'
    },
    'result_persistent': False,
    'task_serializer': 'json',
    'result_serializer': 'json',
    'accept_content': ['json']})

logger = logging.getLogger(__name__)

for f in ['./broker/out', './broker/processed']:
    if not os.path.exists(f):
        os.makedirs(f)

@after_setup_logger.connect
def setup_loggers(logger, *args, **kwargs):
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    # add filehandler
    fh = logging.FileHandler('logs.log')
    fh.setLevel(logging.DEBUG)
    fh.setFormatter(formatter)
    logger.addHandler(fh)

@app.task()
def add(x, y):
    logger.info('Found addition')
    logger.info('Added {0} and {1} to result, '.format(x, y))
    return x + y

if __name__ == '__main__':
    task = add.s(x=2, y=3).delay()
Start the worker like this:
celery worker --app=app.app --concurrency=1 --loglevel=INFO
And kick off the task asynchronously:
python app.py
I've changed it so it's a stand-alone script that just uses the filesystem as a message broker (also, I've deliberately replaced the timber.io handler with a filehandler).
This writes the logs to logs.log (replace the filehandler with the timber.io handler and that should solve your issue).
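For example, with the Timber handler from your question, the signal hook would look roughly like this (a sketch, assuming the timber.TimberHandler and config.key from your snippet):

import logging

import timber  # assumed: the timber.io handler used in the question
import config  # assumed: holds the API key, as in the question
from celery.signals import after_setup_logger

@after_setup_logger.connect
def setup_loggers(logger, *args, **kwargs):
    # Attach the timber.io handler to the logger Celery just configured,
    # instead of (or in addition to) the FileHandler above.
    timber_handler = timber.TimberHandler(api_key=config.key, level=logging.INFO)
    logger.addHandler(timber_handler)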
I had a bit of a hard time, as I couldn't get it working with worker_hijack_root_logger=False and a custom logger defined in setup_logging.
However, after revisiting the docs, I came to the conclusion that it's a better option not to override the logger but just to augment it:
If you’d like to augment the logging configuration setup by Celery
then you can use the after_setup_logger and after_setup_task_logger
signals.
See also: http://docs.celeryproject.org/en/latest/userguide/signals.html#after-setup-logger
Please take the time to read the full question to understand the exact issue. Thank you.
I have a runner/driver program that listens to a Kafka topic and dispatches tasks using a ThreadPoolExecutor whenever a new message is received on the topic (as shown below):
consumer = KafkaConsumer(CONSUMER_TOPIC, group_id='ME2',
                         bootstrap_servers=[f"{KAFKA_SERVER_HOST}:{KAFKA_SERVER_PORT}"],
                         value_deserializer=lambda x: json.loads(x.decode('utf-8')),
                         enable_auto_commit=False,
                         auto_offset_reset='latest',
                         max_poll_records=1,
                         max_poll_interval_ms=300000)

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = []
    for message in consumer:
        futures.append(executor.submit(SOME_FUNCTION, ARG1, ARG2))
There is a bunch of code in between, but it is not important here, so I have skipped it.
Now, SOME_FUNCTION is imported from another Python script (in fact, there is a hierarchy of imports in later stages). What is important is that at some point in these scripts I create a multiprocessing Pool, because I need to do parallel processing on data (SIMD, single instruction multiple data), and use the apply_async function to do so:
for loop_message_chunk in loop_message_chunks:
    res_list.append(self.pool.apply_async(self.one_matching.match, args=(hash_set, loop_message_chunk, fields)))
Now, I have two versions of the runner/driver program:
Kafka based (the one shown above): this version spawns threads that start multiprocessing.
Listen to Kafka -> start a thread -> start multiprocessing
REST based (using Flask to achieve the same task with a REST call): this version does not start any threads and calls multiprocessing right away.
Listen to REST endpoint -> start multiprocessing
Why two runner/driver scripts, you ask? This microservice will be used by multiple teams; some want a synchronous REST-based system, while others want a real-time, asynchronous, Kafka-based one.
When I log from the parallelized function (self.one_matching.match in the example above), it works when called through the REST version but not when called through the Kafka version (basically, when multiprocessing is kicked off by a thread, it does not work).
Also notice that only the logging from the parallelized function does not work. The rest of the scripts in the hierarchy, from the runner down to the script that calls apply_async (which includes the scripts called from within the thread), log successfully.
Other details:
I configure loggers using a YAML file.
I configure the logger in the runner script itself, for either the Kafka or the REST version.
I do a logging.getLogger in every other script called after the runner script, to get specific loggers that log to different files.
Logger config (values replaced with generic ones, since I cannot share exact names):
version: 1
formatters:
  simple:
    format: '%(asctime)s | %(name)s | %(filename)s : %(funcName)s : %(lineno)d | %(levelname)s :: %(message)s'
  custom1:
    format: '%(asctime)s | %(filename)s :: %(message)s'
  time-message:
    format: '%(asctime)s | %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: simple
    stream: ext://sys.stdout
  handler1:
    class: logging.handlers.TimedRotatingFileHandler
    when: midnight
    backupCount: 5
    formatter: simple
    level: DEBUG
    filename: logs/logfile1.log
  handler2:
    class: logging.handlers.TimedRotatingFileHandler
    when: midnight
    backupCount: 30
    formatter: custom1
    level: INFO
    filename: logs/logfile2.log
  handler3:
    class: logging.handlers.TimedRotatingFileHandler
    when: midnight
    backupCount: 30
    formatter: time-message
    level: DEBUG
    filename: logs/logfile3.log
  handler4:
    class: logging.handlers.TimedRotatingFileHandler
    when: midnight
    backupCount: 30
    formatter: time-message
    level: DEBUG
    filename: logs/logfile4.log
  handler5:
    class: logging.handlers.TimedRotatingFileHandler
    when: midnight
    backupCount: 5
    formatter: simple
    level: DEBUG
    filename: logs/logfile5.log
loggers:
  logger1:
    level: DEBUG
    handlers: [console, handler1]
    propagate: no
  logger2:
    level: DEBUG
    handlers: [console, handler5]
    propagate: no
  logger3:
    level: INFO
    handlers: [handler2]
    propagate: no
  logger4:
    level: DEBUG
    handlers: [console, handler3]
    propagate: no
  logger5:
    level: DEBUG
    handlers: [console, handler4]
    propagate: no
  kafka:
    level: WARNING
    handlers: [console]
    propagate: no
root:
  level: INFO
  handlers: [console]
  propagate: no
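For reference, a config in this shape would typically be applied in the runner with PyYAML and the standard dictConfig machinery; a minimal sketch (the file name logging.yaml is an assumption):

import logging.config
import yaml  # PyYAML

# Load the YAML shown above and hand it to the stdlib's dictConfig.
with open("logging.yaml") as f:  # assumed file name for the config above
    logging.config.dictConfig(yaml.safe_load(f))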
Possible answer: get rid of the threads and use asyncio instead
example pseudocode structure (cobbled together from these examples)
#pseudocode example structure: probably has bugs...
import asyncio
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from functools import partial

from aiokafka import AIOKafkaConsumer

async def SOME_FUNCTION_CO(executor, **kwargs):
    res_list = []
    for loop_message_chunk in loop_message_chunks:
        res_list.append(executor.submit(self.one_matching.match, hash_set, loop_message_chunk, fields))
    #call concurrent.futures.wait on res_list later, and cancel unneeded futures (regarding one of your prior questions)
    return res_list

async def consume():
    consumer = AIOKafkaConsumer(
        'my_topic', 'my_other_topic',
        bootstrap_servers='localhost:9092',
        group_id="my-group")
    # Get cluster layout and join group `my-group`
    await consumer.start()

    #Global executor:
    #I would also suggest using a "spawn" context unless you really need the
    #performance of "fork".
    ctx = multiprocessing.get_context("spawn")
    tasks = []  #similar to futures in your example (Task subclasses asyncio.Future which is similar to concurrent.futures.Future as well)
    with ProcessPoolExecutor(mp_context=ctx) as executor:
        try:
            # Consume messages
            async for msg in consumer:
                tasks.append(asyncio.create_task(SOME_FUNCTION_CO(executor, **kwargs)))
        finally:
            # Will leave consumer group; perform autocommit if enabled.
            await consumer.stop()

if __name__ == "__main__":
    asyncio.run(consume())
I keep going back and forth on how I should represent SOME_FUNCTION in this example, but the key point is that in the loop over msg in consumer you are scheduling the tasks to be completed eventually. If any of these tasks takes a long time, it could block the main loop (which is also running the async for msg in consumer line). Instead, any task that could take a long time should quickly return a future of some type, so you can simply access the result once it's ready.
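For illustration, here is one hedged sketch of that idea using loop.run_in_executor, which hands the blocking call to the process pool and gives back an awaitable future (one_matching.match, hash_set and fields are placeholder names taken from the question):

import asyncio

# Sketch: offload the CPU-bound matching to the process pool without
# blocking the event loop, then gather the results asynchronously.
async def some_function_co(executor, loop_message_chunks, hash_set, fields, one_matching):
    loop = asyncio.get_running_loop()
    futs = [
        loop.run_in_executor(executor, one_matching.match, hash_set, chunk, fields)
        for chunk in loop_message_chunks
    ]
    # Awaiting yields control back to the loop while the pool does the work.
    return await asyncio.gather(*futs)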
First of all, I'm not using exactly the same stack. I'm using FastAPI and Redis pub/sub, and it would be tedious for me to replicate it for Flask and Kafka now. I think in principle it should work the same way; at least it might point you to some misconfiguration in your code. Also, I'm hardcoding the logger config.
I'm sorry to paste a lot of code, but I want to provide a complete working example; maybe I'm missing something in your description, since you haven't provided a minimal working example.
I have four files:
app.py (fastapi application)
config.py (setup config variables and logger)
redis_ps.py (redis consumer/listener)
utils.py (processing function (some_function), redis publish function)
and redis container
docker pull redis
Run
docker run --restart unless-stopped --publish 6379:6379 --name redis -d redis
python3 app.py (will run server and pubsub listener)
python3 utils.py (will publish message over pubsub)
curl -X 'POST' \
'http://0.0.0.0:5000/sync' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '[[2,4],[6, 8]]'
Output
[2021-12-08 17:54:32,688] DEBUG in utils: Run some_function, caller: pubsub
[2021-12-08 17:54:32,688] DEBUG in utils: Run some_function, caller: pubsub
[2021-12-08 17:54:32,698] DEBUG in utils: caller: pubsub, Processing 1, result 1
[2021-12-08 17:54:32,698] DEBUG in utils: caller: pubsub, Processing 3, result 9
[2021-12-08 17:54:32,698] DEBUG in utils: caller: pubsub, Processing 5, result 25
[2021-12-08 17:54:32,698] DEBUG in utils: caller: pubsub, Processing 7, result 49
[2021-12-08 17:54:39,519] DEBUG in utils: Run some_function, caller: rest api
[2021-12-08 17:54:39,520] DEBUG in utils: Run some_function, caller: rest api
[2021-12-08 17:54:39,531] DEBUG in utils: caller: rest api, Processing 8, result 64
[2021-12-08 17:54:39,531] DEBUG in utils: caller: rest api, Processing 6, result 36
[2021-12-08 17:54:39,531] DEBUG in utils: caller: rest api, Processing 2, result 4
[2021-12-08 17:54:39,531] DEBUG in utils: caller: rest api, Processing 4, result 16
Source code
app.py
from concurrent import futures
from typing import List

import uvicorn
from fastapi import FastAPI, APIRouter

from redis_ps import PubSubWorkerThreadListen
from utils import some_function

router = APIRouter()

@router.post("/sync")
def sync_process(data: List[List[int]]):
    with futures.ThreadPoolExecutor(max_workers=2) as executor:
        future_all = [executor.submit(some_function, loop_message_chunks=d, caller="rest api") for d in data]
        return [future.result() for future in future_all]

def create_app():
    app = FastAPI(title="app", openapi_url="/openapi.json", docs_url="/")
    app.include_router(router)
    thread = PubSubWorkerThreadListen()
    thread.start()
    return app

if __name__ == "__main__":
    _app = create_app()
    uvicorn.run(_app, host="0.0.0.0", port=5000, debug=True, log_level="debug")
config.py
import sys
import logging
COMPONENT_NAME = "test_logger"
REDIS_URL = "redis://localhost:6379"
def setup_logger(logger_name: str, log_level=logging.DEBUG, fmt: logging.Formatter = None):
    fmt = fmt or logging.Formatter("[%(asctime)s] %(levelname)s in %(module)s: %(message)s")
    handler = logging.StreamHandler(sys.stdout)
    handler.name = "h_console"
    handler.setFormatter(fmt)
    handler.setLevel(log_level)
    logger_ = logging.getLogger(logger_name)
    logger_.addHandler(handler)
    logger_.setLevel(log_level)
    return logger_
setup_logger(COMPONENT_NAME)
redis_ps.py
import json
import logging
import threading
import time
from concurrent import futures
from typing import Dict, List, Union

import redis

from config import COMPONENT_NAME, REDIS_URL
from utils import some_function

logger = logging.getLogger(COMPONENT_NAME)

class PubSubWorkerThreadListen(threading.Thread):
    def __init__(self):
        super().__init__()
        self._running = threading.Event()

    @staticmethod
    def connect_pubsub() -> redis.client.PubSub:
        while True:
            try:
                r = redis.Redis.from_url(REDIS_URL)
                p = r.pubsub()
                p.psubscribe(["*:*:*"])
                logger.info("Connected to Redis")
                return p
            except Exception:
                time.sleep(0.1)

    def run(self):
        if self._running.is_set():
            return
        self._running.set()
        while self._running.is_set():
            p = self.connect_pubsub()
            try:
                listen(p)
            except Exception as e:
                logger.error(f"Failed to process Redis message or failed to connect: {e}")
                time.sleep(0.1)

    def stop(self):
        self._running.clear()

def get_data(msg) -> Union[Dict, List]:
    data = msg.get("data")
    if isinstance(data, int):
        # the first message has {'data': 1}
        return []
    try:
        return json.loads(data)
    except Exception as e:
        logger.warning("Failed to parse data in the message (%s) with error %s", msg, e)
        return []

def listen(p_):
    logger.debug("Start listening")
    while True:
        for msg_ in p_.listen():
            data = get_data(msg_)
            if data:
                with futures.ThreadPoolExecutor(max_workers=2) as executor:
                    future_all = [executor.submit(some_function, loop_message_chunks=d, caller="pubsub") for d in data]
                    [future.result() for future in future_all]
utils.py
import json
import logging
from multiprocessing import Pool
from typing import List

import redis

from config import COMPONENT_NAME, REDIS_URL

logger = logging.getLogger(COMPONENT_NAME)

def one_matching(v, caller: str = ""):
    logger.debug(f"caller: {caller}, Processing {v}, result {v*v}")
    return v * v

def some_function(loop_message_chunks: List[int], caller: str):
    logger.debug(f"Run some_function, caller: {caller}")
    with Pool(2) as pool:
        v = [pool.apply_async(one_matching, args=(i, caller)) for i in loop_message_chunks]
        res_list = [res.get(timeout=1) for res in v]
    return res_list

def publish():
    data = [[1, 3], [5, 7]]
    r_ = redis.Redis.from_url(REDIS_URL)
    logger.debug("Published message %s %s", "test", data)
    r_.publish("test:test:test", json.dumps(data).encode())

if __name__ == "__main__":
    publish()
I have a Heroku worker dyno that is not printing or logging anything to the Heroku logs.
I set up the worker in my Procfile so that all logging.info() calls should work:
worker: celery -A tasks worker -B --loglevel=info
Here is the tasks.py file:
from celery import Celery
from celery.decorators import periodic_task
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)
import json
import settings, logging
import datetime
from mongoengine import DoesNotExist

app = Celery('tasks',
             broker=settings.get('rabbitmq_bigwig_url'),
             backend='amqp')

@periodic_task(run_every=datetime.timedelta(minutes=1))
def test():
    print 'Not printing!'
    logging.info('Also not printing!')
How do I get print/logging messages to write to Heroku's logs? I've tried all the Heroku log commands (heroku logs, heroku logs --ps worker, etc.)
In the example you provided, you initialize logger = get_task_logger(__name__), but then when you mean to log something, you're using logging.info(..).
In your final line, replace logging. with logger..
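In other words, the task body would use the logger you already created with get_task_logger:

logger = get_task_logger(__name__)

@periodic_task(run_every=datetime.timedelta(minutes=1))
def test():
    logger.info('This goes through the Celery task logger')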
I have a Python Spark program which I run with spark-submit. I want to put logging statements in it.
logging.info("This is an informative message.")
logging.debug("This is a debug message.")
I want to use the same logger that Spark is using so that the log messages come out in the same format and the level is controlled by the same configuration files. How do I do this?
I've tried putting the logging statements in the code and starting out with a logging.getLogger(). In both cases I see Spark's log messages but not mine. I've been looking at the Python logging documentation, but haven't been able to figure it out from there.
Not sure if this is something specific to scripts submitted to Spark or just me not understanding how logging works.
You can get the logger from the SparkContext object:
log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger(__name__)
LOGGER.info("pyspark script logger initialized")
You need to get the logger for Spark itself; by default, getLogger() will return the logger for your own module. Try something like:
logger = logging.getLogger('py4j')
logger.info("My test info statement")
It might also be 'pyspark' instead of 'py4j'.
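If you are unsure which logger name Spark's Python side actually registered, one way to check is to dump the names known to the logging module (this peeks at the module's internal registry, so treat it as a debugging aid only):

import logging
# Print every logger name currently known to the logging module.
print(sorted(logging.root.manager.loggerDict))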
If the function that you use in your Spark program (and which does some logging) is defined in the same module as the main function, it will give a serialization error.
This is explained here, and an example by the same person is given here.
I also tested this on Spark 1.3.1.
EDIT:
To change logging from STDERR to STDOUT you will have to remove the current StreamHandler and add a new one.
Find the existing StreamHandler (this line can be removed when finished):
print(logger.handlers)
# will look like [<logging.StreamHandler object at 0x7fd8f4b00208>]
There will probably be only a single one, but if not, you will have to adjust the index.
logger.removeHandler(logger.handlers[0])
Add new handler for sys.stdout
import sys # Put at top if not already there
sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
logger.addHandler(sh)
We needed to log from the executors, not from the driver node. So we did the following:
We created an /etc/rsyslog.d/spark.conf on all of the nodes (using a Bootstrap method with Amazon Elastic MapReduce) so that the core nodes forwarded syslog local1 messages to the master node.
On the Master node, we enabled the UDP and TCP syslog listeners, and we set it up so that all local messages got logged to /var/log/local1.log.
We created a Python logging module Syslog logger in our map function.
Now we can log with logging.info(). ...
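A minimal sketch of the syslog-based logger described above (the master-node hostname, port, and logger name are placeholders; it assumes rsyslog on the master accepts UDP syslog on the local1 facility, as set up in the steps above):

import logging
import logging.handlers

def get_executor_logger(name="spark-task"):
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls in the same executor
        handler = logging.handlers.SysLogHandler(
            address=("master-node-hostname", 514),  # placeholder address
            facility=logging.handlers.SysLogHandler.LOG_LOCAL1,
        )
        handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger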
One of the things we discovered is that the same partition is being processed simultaneously on multiple executors. Apparently Spark does this all the time, when it has extra resources. This handles the case when an executor is mysteriously delayed or fails.
Logging in the map functions has taught us a lot about how Spark works.
In my case, I am just happy to get my log messages added to the worker's stderr, along with the usual Spark log messages.
If that suits your needs, then the trick is to redirect the particular Python logger to stderr.
For example, the following, inspired by this answer, works fine for me:
def getlogger(name, level=logging.INFO):
    import logging
    import sys

    logger = logging.getLogger(name)
    logger.setLevel(level)
    if logger.handlers:
        # or else, as I found out, we keep adding handlers and duplicate messages
        pass
    else:
        ch = logging.StreamHandler(sys.stderr)
        ch.setLevel(level)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        ch.setFormatter(formatter)
        logger.addHandler(ch)
    return logger
Usage:
def tst_log():
    logger = getlogger('my-worker')
    logger.debug('a')
    logger.info('b')
    logger.warning('c')
    logger.error('d')
    logger.critical('e')
    ...
Output (plus a few surrounding lines for context):
17/05/03 03:25:32 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 5.8 KB, free 319.2 MB)
2017-05-03 03:25:32,849 - my-worker - INFO - b
2017-05-03 03:25:32,849 - my-worker - WARNING - c
2017-05-03 03:25:32,849 - my-worker - ERROR - d
2017-05-03 03:25:32,849 - my-worker - CRITICAL - e
17/05/03 03:25:32 INFO PythonRunner: Times: total = 2, boot = -40969, init = 40971, finish = 0
17/05/03 03:25:32 INFO Executor: Finished task 7.0 in stage 20.0 (TID 213). 2109 bytes result sent to driver
import logging
# Logger
logging.basicConfig(format='%(asctime)s %(filename)s %(funcName)s %(lineno)d %(message)s')
logger = logging.getLogger('driver_logger')
logger.setLevel(logging.DEBUG)
Simplest way to log from PySpark!
The key to making PySpark interact with Java log4j is the JVM.
Below is the Python code; the conf is missing the URL, but this is about logging.
import os

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

my_jars = os.environ.get("SPARK_HOME")
myconf = SparkConf()
myconf.setMaster("local").setAppName("DB2_Test")
myconf.set("spark.jars", "%s/jars/log4j-1.2.17.jar" % my_jars)
spark = SparkSession\
    .builder\
    .appName("DB2_Test")\
    .config(conf=myconf) \
    .getOrCreate()

Logger = spark._jvm.org.apache.log4j.Logger
mylogger = Logger.getLogger(__name__)
mylogger.error("some error trace")
mylogger.info("some info trace")
You can implement the logging.Handler interface in a class that forwards log messages to log4j under Spark. Then use logging.root.addHandler() (and, optionally, logging.root.removeHandler()) to install that handler.
The handler should have a method like the following:
def emit(self, record):
    """Forward a log message for log4j."""
    Logger = self.spark_session._jvm.org.apache.log4j.Logger
    logger = Logger.getLogger(record.name)
    if record.levelno >= logging.CRITICAL:
        # Fatal and critical seem about the same.
        logger.fatal(record.getMessage())
    elif record.levelno >= logging.ERROR:
        logger.error(record.getMessage())
    elif record.levelno >= logging.WARNING:
        logger.warn(record.getMessage())
    elif record.levelno >= logging.INFO:
        logger.info(record.getMessage())
    elif record.levelno >= logging.DEBUG:
        logger.debug(record.getMessage())
    else:
        pass
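For context, the emit() method above would live in a small Handler subclass along these lines (a sketch; the class name matches the installation snippet below, and the method body is the one shown above):

import logging

class CustomHandler(logging.Handler):
    """Forward stdlib logging records to Spark's log4j logger (sketch)."""

    def __init__(self, spark_session):
        super().__init__()
        # emit() reads self.spark_session to reach the JVM-side Logger.
        self.spark_session = spark_session

    # def emit(self, record): ...  (the method shown above goes here)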
Installing the handler should go immediately after you initialise your Spark session:
spark = SparkSession.builder.appName("Logging Example").getOrCreate()
handler = CustomHandler(spark)

# Replace the default handlers with the log4j forwarder.
root_handlers = logging.root.handlers[:]
for h in root_handlers:
    logging.root.removeHandler(h)
logging.root.addHandler(handler)

# Now you can log stuff.
logging.debug("Installed log4j log handler.")
There's a more complete example here: https://gist.github.com/thsutton/65f0ec3cf132495ef91dc22b9bc38aec
You need to make the Spark log reachable for the driver and all executors, so we created a logging class, treated it as a job dependency, and loaded it on each executor.
class Log4j:
    def __init__(self, spark_session):
        conf = spark_session.sparkContext.getConf()
        app_id = conf.get('spark.app.id')
        app_name = conf.get('spark.app.name')
        log4jlogger = spark_session._jvm.org.apache.log4j
        prefix_msg = '<' + app_id + ' : ' + app_name + '> '
        print(prefix_msg)
        self.logger = log4jlogger.LogManager.getLogger(prefix_msg)

    def warn(self, msg):
        # log warning
        self.logger.warn(msg)

    def error(self, msg):
        # log error
        self.logger.error(msg)

    def info(self, msg):
        # log information message
        self.logger.info(msg)
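Usage would then look something like this (a sketch, assuming an existing SparkSession named spark):

log = Log4j(spark)
log.info("job started")
log.warn("something looks off")
log.error("something went wrong")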
How does one turn on celery logging programmatically?
From the terminal, this works fine:
celery worker -l DEBUG
When I call get_task_logger(__name__).debug('hello'), I can see the message come up in the terminal (stdout and stderr are being displayed). I can even import logging and call logger.info('hi') and see that too (both work).
However, while developing a task, I prefer to use a test module and call the task function directly rather than firing up a whole worker. But I can't see the log messages. I understand that Celery is redirecting everything to its internal apparatus, but I want to see the log messages on the stdout too.
How do I tell Celery to send a copy of the log messages back to stdout?
I've read a bunch of online articles about logging but it seems that a number of logging-related configuration vars from celery have been deprecated and it's unclear to me from the docs what is the supported path today.
Here is an example module that creates a celery object and attempts to log output. Nothing shows in the terminal.
example mymodule.py
from celery import Celery
import logging
from celery.utils.log import get_task_logger
app = Celery('test')
app.config_from_object('myfile', True)
get_task_logger(__name__).warn('hello world')
logging.getLogger(__name__).warn('hello world 2')
EDIT
I know that I can redirect some of the output back to the terminal by adding a handler:
import sys
log = get_task_logger(__name__)
h = logging.StreamHandler(sys.stdout)
log.addHandler(h)
But is there a "Celery way" to do this? Maybe one that lets me also have the Celery formatted lines of text.
[2014-03-02 15:51:32,949: WARNING] hello world
I have been looking at the same issue...
What seems to work best is to use the signal handler, according to http://docs.celeryproject.org/en/latest/userguide/signals.html#after-setup-logger
In your celery.py file use:
from celery.signals import after_setup_logger
import logging

@after_setup_logger.connect()
def logger_setup_handler(logger, **kwargs):
    my_handler = MyLogHandler()
    my_handler.setLevel(logging.DEBUG)
    my_formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')  # custom formatter
    my_handler.setFormatter(my_formatter)
    logger.addHandler(my_handler)

    logging.info("My log handler connected -> Global Logging")

if __name__ == '__main__':
    app.start()
Then you can define MyLogHandler() as you wish.
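For example, a minimal custom handler is just a logging.Handler subclass (a sketch; adapt emit() to whatever sink you need; for plain stdout output, the StreamHandler mentioned below is simpler):

import logging
import sys

class MyLogHandler(logging.Handler):
    def emit(self, record):
        # Write the formatted record wherever you like; stdout is used here as an example.
        sys.stdout.write(self.format(record) + "\n")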
To send the logs to STDOUT you should also be able to use (I have not tested it):
my_handler = logging.StreamHandler(sys.stdout)
I know how to make twisted use python logging (docs)
But normal python logging is still swallowed. Print-statements are visible, but logger.warn('...') is not.
I use this logging setup in my library, which I want to use from Twisted:
import logging
import os
import sys

logger = logging.getLogger(os.path.basename(sys.argv[0]))

class Foo:
    def foo(self):
        logger.warn('...')
I don't want to change my library to use twisted logging, since the library is already used in a lot of projects which don't use twisted.
If I google for this problem, I only find solutions which pipe the twisted logs in to the python logging.
How can I see the logging of my library (without changing it)?
The thing is that Twisted is asynchronous, and avoids doing blocking I/O wherever possible. However, stdlib logging is not asynchronous, and so it does blocking I/O, and the two can't easily mix because of this. You may be able to achieve some measure of cooperation between them if you e.g. use a QueueHandler (introduced in the stdlib in Python 3.2 and mentioned here, but available to earlier versions through the logutils project). You can use this handler (and this handler only) to deal with events sent using stdlib logging, and your corresponding QueueListener can dispatch the events received using Twisted (non-blocking) I/O. It should work, as the queue handler shouldn't block if created with no finite capacity, and assuming that the I/O sinks can get rid of the events quickly enough (otherwise, memory would fill up).
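A minimal sketch of that queue-based decoupling using only the stdlib (the QueueListener here drains to a plain StreamHandler on its own thread; in a Twisted app you would swap that sink for whatever non-blocking dispatch you prefer):

import logging
import logging.handlers
import queue
import sys

log_queue = queue.Queue(-1)  # unbounded, so the producing side never blocks

# The library's records are only enqueued here; no I/O happens on the caller's thread.
logging.getLogger().addHandler(logging.handlers.QueueHandler(log_queue))
logging.getLogger().setLevel(logging.DEBUG)

# The listener drains the queue on a background thread and does the real I/O.
listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler(sys.stderr))
listener.start()
# ... run the reactor / application ...
# listener.stop()  # on shutdown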
utils/log.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
import logging.handlers
from logging.config import dictConfig
from twisted.python.failure import Failure
from twisted.python import log as twisted_log
logger = logging.getLogger(__name__)
DEFAULT_LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'loggers': {
        'twisted': {
            'level': 'ERROR',
        }
    }
}

def failure_to_exc_info(failure):
    """Extract exc_info from Failure instances"""
    if isinstance(failure, Failure):
        return (failure.type, failure.value, failure.getTracebackObject())

def configure_logging(logfile_path):
    """
    Initialize logging defaults for Project.

    :param logfile_path: path to the logfile
    :type logfile_path: string

    This function does:
    - Assign INFO and DEBUG level to logger file handler and console handler
    - Route warnings and twisted logging through Python standard logging
    """
    observer = twisted_log.PythonLoggingObserver('twisted')
    observer.start()

    dictConfig(DEFAULT_LOGGING)

    default_formatter = logging.Formatter(
        "[%(asctime)s] [%(levelname)s] [%(name)s] [%(funcName)s():%(lineno)s] [PID:%(process)d TID:%(thread)d] %(message)s",
        "%d/%m/%Y %H:%M:%S")

    file_handler = logging.handlers.RotatingFileHandler(logfile_path, maxBytes=10485760, backupCount=300, encoding='utf-8')
    file_handler.setLevel(logging.INFO)

    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.DEBUG)

    file_handler.setFormatter(default_formatter)
    console_handler.setFormatter(default_formatter)

    logging.root.setLevel(logging.DEBUG)
    logging.root.addHandler(file_handler)
    logging.root.addHandler(console_handler)
hello.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
logger = logging.getLogger(__name__)
from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.resource import Resource
class BasicPage(Resource):
    isLeaf = True

    def render_GET(self, request):
        logger.info("<html><body><h1>Basic Test</h1><p>This is a basic test page.</p></body></html>")
        return "<html><body><h1>Basic Test</h1><p>This is a basic test page.</p></body></html>"

def hello():
    logger.info("Basic web server started. Visit http://localhost:8000.")
    root = BasicPage()
    factory = Site(root)
    reactor.listenTCP(8000, factory)
    reactor.run()
    exit()
main.py
from utils.log import configure_logging
import hello

def main():
    configure_logging('logfilepath')
    hello.hello()

if __name__ == '__main__':
    main()