Celery logging to file from inside the Django modules - python

I have the following in my tasks.py
from celery.utils.log import get_task_logger
logger = get_task_logger("celery.task")
I have set up logging for celery.task in my settings.py, and all the logs from my tasks.py file are properly written to that file.
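For reference, a settings.py LOGGING configuration along these lines would produce that behaviour (the handler names, file paths and levels here are illustrative placeholders, not taken from the original setup):
# settings.py (sketch)
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "celery_file": {"class": "logging.FileHandler", "filename": "celery_tasks.log"},
        "django_file": {"class": "logging.FileHandler", "filename": "django.log"},
    },
    "loggers": {
        "celery.task": {"handlers": ["celery_file"], "level": "INFO", "propagate": False},
        "app": {"handlers": ["django_file"], "level": "INFO"},
    },
}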
I have Django modules. These modules can be called directly from Django or by a Celery task. What I want is: if a module is called by Django, its logs should go to the Django log file; if it is called by a task, they should go to the Celery task logger.
Example:
# tasks.py
from celery import task

from app.foo import MyFoo

@task(name="bar")
def boo():
    MyFoo.someFoo()

# app/foo.py
import logging

log = logging.getLogger(__name__)
I want the log messages inside MyFoo to go to celery log when run by a worker task.
Any ideas?

You should be able to configure loggers separately for the Django process and the Celery process, since by definition they run as separate processes.
Actually, I am surprised the log output does not already go to separate log files; it might make sense to share the logging configuration you already have in place.
Where Python logging output goes is defined by logging handlers.
Configuring logger for Django: https://docs.djangoproject.com/en/dev/topics/logging/
For Celery, there doesn't seem to be a straightforward way to override the logging settings. It does provide the celery.signals.setup_logging signal, where you can hook in and use the logging API to reconfigure the handlers so output goes to a separate log file.
http://docs.celeryproject.org/en/latest/configuration.html#logging
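A minimal sketch of hooking that signal, assuming you simply want the worker's logging to go to its own file (the file name, format and level are placeholders):
# celery_logging.py (sketch): connecting a receiver to setup_logging stops Celery
# from configuring logging itself, so we attach our own file handler here.
import logging

from celery.signals import setup_logging

@setup_logging.connect
def configure_celery_logging(**kwargs):
    handler = logging.FileHandler("celery_worker.log")  # placeholder path
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)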
Alternatively, you can just pull out a different logger object at the task level. When you execute a task, you know whether it is executed from Django (eagerly) or by a Celery worker.
I have never done this myself, but apparently Celery tasks expose the task execution context as a self.request attribute (nothing to do with Python classes).
http://docs.celeryproject.org/en/latest/userguide/tasks.html#context
So at the beginning of your task function you could switch between the loggers:
# Define web_worker_logger
# Define task_logger

@task(name='example', bind=True)
def moobar(self):
    logger = web_worker_logger if self.request.is_eager else task_logger
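For completeness, the two loggers referenced above could just be ordinary logging loggers pointed at whatever names your handlers are configured for (the "django" name below is an assumption; "celery.task" matches the logger used in tasks.py):
import logging
from celery.utils.log import get_task_logger

web_worker_logger = logging.getLogger("django")  # assumed Django-side logger name
task_logger = get_task_logger("celery.task")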

Related

Send all output from a specific celery task to a file

from celery import Celery
from celery.schedules import crontab

tasks = Celery("tasks")

@tasks.on_after_configure.connect
def setup_periodic_tasks(sender: Celery, **kwargs) -> None:
    """Set up periodic tasks."""
    sender.add_periodic_task(crontab(minute="*/15"), my_task.s())

@tasks.task
def my_task():
    SomeModule.do_something()
How do I redirect everything that my_task outputs to a single specific file? Just using a logger won't work, because that module might be doing all kinds of weird things like print(), using different loggers in nested threads, and other things I have no control over. There might also be different tasks running different functions from the same module at the same time.
Ideally it'd be something simple like
@tasks.task
@logs_to("path/to/file.txt")
def my_task():
    ...
Is this possible?
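There is no built-in decorator that does exactly this, but a hypothetical logs_to could be sketched by attaching a temporary root-logger handler and redirecting stdout/stderr for the duration of the task (logs_to is a made-up name; print() from other threads and output written at the C level would still escape this):
import contextlib
import functools
import logging

def logs_to(path):  # hypothetical decorator, not part of Celery
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with open(path, "a") as fh:
                handler = logging.StreamHandler(fh)
                root = logging.getLogger()
                root.addHandler(handler)  # catch logging from any module
                try:
                    # catch bare print() calls made in this thread as well
                    with contextlib.redirect_stdout(fh), contextlib.redirect_stderr(fh):
                        return func(*args, **kwargs)
                finally:
                    root.removeHandler(handler)
        return wrapper
    return decorator
Applied under @tasks.task, as in the snippet above, the redirection would wrap the task body each time the worker runs it.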

How to propagate errors in python rq worker tasks to Sentry

I have a Flask app with Sentry error tracking. Now I created some tasks with rq, but their errors do not show up in Sentry Issues stream. I can tell the issues aren't filtered out, because the number of filtered issues doesn't increase. The errors show up in heroku logs --tail.
I run the worker with rq worker homework-fetcher -c my_app.rq_sentry
my_app/rq_sentry.py:
import os
import sentry_sdk
from sentry_sdk.integrations.rq import RqIntegration
dsn = os.environ["SENTRY_DSN"]
print(dsn) # I confirmed this appears in logs, so it is initialized
sentry_sdk.init(dsn=dsn, integrations=[RqIntegration()])
Am I doing something wrong, or should I set up a minimal app reproducing this and file a bug report?
Also, a bit of a side question:
Should I include RqIntegration and RedisIntegration in the Sentry settings of the app itself? What is the benefit of these?
Thanks a lot for any help.
Edit 1: when I schedule the task my_app.nonexistent_module, the worker correctly raises an error, which is caught by Sentry.
So let me maybe change my question: how do I propagate exceptions raised in rq worker tasks to Sentry?
So after 7 months, I figured it out.
The Flask app uses the app factory pattern. In the worker, I need to access the database the same way the app does, and for that I need the app context. So I do from app_factory import create_app and then create_app().app_context().push(). And that's the issue: in the factory function I also initialize Sentry for the app itself, so I end up with Sentry initialized twice, once for the app and once for the worker. Combined with the fact that I called the app initialization in the worker tasks file (not in the worker config), that probably overrode the correct Sentry initialization and prevented errors from the tasks from being reported correctly.
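A sketch of what the fix amounts to, assuming an app-factory project like the one described (create_app comes from the question; the init_sentry flag is a hypothetical parameter used here only to illustrate skipping the second initialization):
# my_app/rq_sentry.py (sketch)
import os

import sentry_sdk
from sentry_sdk.integrations.rq import RqIntegration

from app_factory import create_app

# Initialize Sentry exactly once, in the module the worker loads via `-c`.
sentry_sdk.init(dsn=os.environ["SENTRY_DSN"], integrations=[RqIntegration()])

# Push the app context for database access without letting the factory call
# sentry_sdk.init() a second time (init_sentry is hypothetical).
create_app(init_sentry=False).app_context().push()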

What is the difference between the get logger functions from celery.utils.log and logging?

I have been trying to figure out the difference between the python logger and the celery logger, specifically the difference between the commands below, but cannot find a good answer.
I am using celery v3, with django 1.10.
from celery.utils.log import get_logger
logger = get_logger(__name__)
...
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)
...
import logging
logger = logging.getLogger(__name__)
The celery documentation (latest, v3.1) is very lacking on this topic.
I have looked at similar questions such as this one, but it is still unclear which one to use, why to use it, and specifically what the differences are. I am looking for a clear, concise answer.
I am also using Sentry in my production environment. How does this choice affect the Sentry logs, i.e. with these common settings?
From experience, using get_task_logger gets you a few things of importance, especially with Sentry:
- It automatically prepends the task name to your log output.
- It lets you set log handling rules at a higher level than just the module (I believe it actually sets the logger's parent to celery.task).
- Probably most importantly for a Sentry setup, it hooks the logging into the log handlers that Sentry makes use of.
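A quick way to see the second point, assuming a stock Celery setup: get_task_logger returns a logger whose parent is the celery.task logger, so a handler attached at that name covers every task logger.
import logging
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)
print(logger.parent)  # <Logger celery.task ...>, whatever module this runs in

# Attaching a handler at the parent level therefore applies to all task loggers:
logging.getLogger("celery.task").addHandler(logging.FileHandler("tasks.log"))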
Important: There is a bit of extra config that needs to go into Celery registration for Sentry:
https://docs.sentry.io/clients/python/integrations/celery/
You may be able to get errors to flow into Sentry without some of this setup, but I think this will give you the best traces and details, and ensure that things like expected exceptions declared via throws are properly ignored.

How to log to different files depending on which Python process the logging call comes from?

I am working on a test framework. Each test is launched as a new python multiprocessing process.
There is one master log file and individual log files corresponding to each test.
There is a master logger created when the framework launches, and a new logger created in each test process. A test logger logs to both its own log file and the master log file.
There are multiple libraries that can be used by any of the tests.
Currently there is no logging in the library functions. To add logging to them, a logger object would need to be passed as a parameter to each library function. That would mean modifying every function signature and call in the library modules, which is not practical.
As I understand it, I cannot use a module-level logger, because a module-level logger would log to a different file per module, not per test process.
Can you suggest a solution where I don't have to pass logger objects around, and log statements end up in the right file based on which process is calling the function?
The threading module has a get_ident function which could be used to index a dictionary of loggers, something like:
from threading import get_ident

loggers[get_ident()].error('blah blah blah')
However, once you have all of this test logging throughout your libraries, how will that impact your production performance?
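Since each test here is a separate multiprocessing process rather than a thread, a per-process variant of the same idea could key the loggers by os.getpid() instead; the helper name below is made up for illustration.
import logging
import os

def get_test_logger():
    # Hypothetical helper: each test process lazily configures a logger named
    # after its own PID, so library code can log without a logger being passed in.
    logger = logging.getLogger(f"test.{os.getpid()}")
    if not logger.handlers:
        logger.addHandler(logging.FileHandler(f"test_{os.getpid()}.log"))
        logger.setLevel(logging.DEBUG)
    return logger

# In a library function:
get_test_logger().error('blah blah blah')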

python testfixtures.LogCapture not capturing logs

I have a multi-module package in python. One of the modules is essentially a command line application. We'll call that one the "top level" module. The other module has three classes in it, which are essentially the backend of the application.
The top-level module, in the __init__ of its class, calls logging.basicConfig to log debug to a file, then adds a console handler for info and above. The backend classes just use getLogger(classname), because when the application runs in full, the backend is called by the top-level command line frontend, so logging is already configured.
In the test class for the top-level module (subclassed from unittest.TestCase and run via nose), I simply run testfixtures.LogCapture() in setUp and testfixtures.LogCapture.uninstall_all() in tearDown, and all the logging is captured just fine, no effort.
In the backend test file, I tried to do the same thing: testfixtures.LogCapture in setUp, uninstall_all in tearDown. However, all the INFO-level log messages still print when I'm running the unit tests for the backend.
Any help on
1) why log capture works for the frontend but not the backend, and
2) an elegant way to log and capture logs in my backend classes without explicitly setting up logging in those files
would be amazing.
I fixed the same issue by setting 'disable_existing_loggers' to False when reconfiguring logging: the previously created logger was being disabled, which prevented it from propagating its records to the root logger.
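In dictConfig terms, the fix described above looks something like this (the handler setup is a placeholder; the important line is disable_existing_loggers):
import logging.config

logging.config.dictConfig({
    "version": 1,
    # Keep loggers created earlier (e.g. module-level getLogger() calls) enabled
    # so their records still propagate to the root logger and can be captured.
    "disable_existing_loggers": False,
    "handlers": {"console": {"class": "logging.StreamHandler"}},
    "root": {"handlers": ["console"], "level": "DEBUG"},
})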
