Send all output from a specific celery task to a file - python

from celery import Celery
from celery.schedules import crontab

tasks = Celery("tasks")

@tasks.on_after_configure.connect
def setup_periodic_tasks(sender: Celery, **kwargs) -> None:
    """Setup periodic tasks."""
    sender.add_periodic_task(crontab(minute="*/15"), my_task.s())

@tasks.task
def my_task():
    SomeModule.do_something()
How do I redirect everything that my_task outputs to a single specific file? Just using a logger won't work, because that module might be doing all kinds of weird things like print(), different loggers inside nested threads, and other things I have no control over. There might also be different tasks running different functions from the same module at the same time.
Ideally it'd be something simple like
@tasks.task
@logs_to("path/to/file.txt")
def my_task():
    ...
Is this possible?
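One possible direction (a rough sketch, not a confirmed solution): a decorator that redirects sys.stdout and sys.stderr to a file for the duration of the call, using contextlib.redirect_stdout/redirect_stderr. The logs_to name is hypothetical, and the redirection is process-wide, so concurrent tasks in the same worker process, or loggers that grabbed a handle to the original streams earlier, may not be fully isolated.
import contextlib
import functools

def logs_to(path):
    """Hypothetical decorator: send stdout/stderr of the wrapped callable
    to `path` while it runs. Note the redirection is process-wide."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with open(path, "a") as fh, \
                    contextlib.redirect_stdout(fh), \
                    contextlib.redirect_stderr(fh):
                return func(*args, **kwargs)
        return wrapper
    return decorator

@tasks.task
@logs_to("path/to/file.txt")
def my_task():
    SomeModule.do_something()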

Related

Django celery redis remove a specific periodic task from queue

There is a specific periodic task that needs to be removed from the message queue. I am using Redis and Celery here.
tasks.py
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    """
    some operations here
    """
There are other periodic tasks in the project as well, but I need this specific task to stop from now on.
As explained in this answer, will the following code work?
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    pass
In this example the periodic task schedule is defined directly in code, meaning it is hard-coded and cannot be altered dynamically without a code change and an app re-deploy.
The provided code, with the task logic deleted or with a simple return at the beginning, will work, but it does not really answer the question: the task will still run, there is just no code left for it to execute.
Also, it is recommended NOT to use @periodic_task, since it is deprecated:
"""Deprecated decorator, please use :setting:`beat_schedule`."""
First, change the method from a @periodic_task to a regular Celery @task, and because you are using Django it is better to go straight for @shared_task:
from celery import shared_task

@shared_task
def task_abcd():
    ...
Now this is just a regular Celery task that needs to be called explicitly, or it can be run periodically if added to the celery beat schedule.
For production, and especially when using multiple workers, it is not recommended to run the celery worker with embedded beat (-B); run a separate instance of the celery beat scheduler.
The schedule can be specified in celery.py or in the Django project settings (settings.py), as in the sketch below. This is still not very dynamic, since the app needs to be reloaded for the settings to be re-read.
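For illustration only, a minimal sketch of such a static schedule entry (the entry name and the dotted task path are assumptions, not taken from the question):
# celery.py -- app is the Celery instance created there
from celery.schedules import crontab

app.conf.beat_schedule = {
    "run-task-abcd-every-6-minutes": {        # arbitrary human-readable name
        "task": "myproject.tasks.task_abcd",  # assumed dotted path to the task
        "schedule": crontab(minute="*/6"),
    },
}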
Then, use the Database Scheduler (django-celery-beat), which allows creating schedules dynamically: which tasks need to run, when, and with what arguments. It even provides nice Django admin views for administration!
That code will work but I'd go for something that doesn't force you to update your code every time you need to disable/enable the task.
What you could do is use a configurable flag whose value could come from an admin panel, a configuration file, or whatever you want, and return before your code runs if the task is in disabled mode.
For instance:
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    config = load_config_for_task_abcd()
    if not config.is_enabled:
        return
    # some operations here
In this way, even if your task is scheduled, its operations won't be executed.
If you simply want to remove the periodic task, have you tried removing the function and then restarting your Celery service? You can restart your Redis service as well as your Django server for good measure.
Make sure that the function you removed is not referenced anywhere else.

Difference between different ways to create celery task

I am very confused by the different ways of creating a Celery task. On the surface they all work the same, so can someone explain the difference between these?
1.
from myproject.tasks import app

@app.task
def foo():
    pass
2.
from celery import task

@task
def foo():
    pass
3.
from celery import shared_task

@shared_task
def foo():
    pass
I know from a little bit of googling that the difference between the 1st and 3rd one is that shared_task is used when you don't have a concrete app instance. Can someone elaborate on that, and when is the second one used?
Don't use #2 unless you are using Celery v3. If you are using Celery v4, use #1.
Use #3 in instances where you are writing a reusable library or Django app. For example, if you are writing an open source set of tasks that let you manage AWS EC2 instances using Celery, you would use shared_task so that the tasks could be run on Celery, but you would leave it to the person using your library to configure Celery for themselves.
Use #1 if you are writing for your own project and there is no concern for re-use.
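As a rough illustration of the reusable-library case (the module names are made up for the example): the library declares its tasks with @shared_task without configuring Celery, and the consuming project binds them to its own app instance.
# reusable_library/tasks.py  (hypothetical reusable Django app / library)
from celery import shared_task

@shared_task
def stop_instance(instance_id):
    ...  # library logic; no Celery app is configured here

# myproject/celery.py  (the user of the library configures Celery)
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

app = Celery("myproject")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()  # picks up tasks.py from installed apps, including the library's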

Celery's inspect unstable behaviour

I have a Celery project with a RabbitMQ backend that relies heavily on inspecting scheduled tasks. I found that the following code returns nothing most of the time (even though there are scheduled tasks):
i = app.control.inspect()
scheduled = i.scheduled()
if scheduled:
    # do something
This code also runs from one of the tasks, but I think that doesn't matter; I get the same result from the interactive Python command line (with some exceptions, see below).
At the same time, the celery -A <proj> inspect scheduled command never fails. Also, I noticed that when called from the interactive Python command line for the first time, this call also never fails. Most of the subsequent i.scheduled() calls return nothing.
Does i.scheduled() guarantee a result only when called for the first time?
If so, why, and how can I then inspect scheduled tasks from a task? Run a dedicated worker and restart it after every task? That seems like overkill for such a trivial job.
Please explain how to use this feature the right way.
This is caused by some weird issue inside the Celery app. To call the Inspect methods repeatedly, you have to create a new Celery app instance each time.
Here is a small snippet which can help you:
from celery import Celery

def inspect(method):
    app = Celery('app', broker='amqp://')
    return getattr(app.control.inspect(), method)()

print(inspect('scheduled'))
print(inspect('active'))

Celery logging to file from inside the Django modules

I have the following in my tasks.py
from celery.utils.log import get_task_logger
logger = get_task_logger("celery.task")
I have set up logging for celery.task in my settings.py, and all the logs from my tasks.py file are properly written to the file.
I have Django modules. These modules can be called directly from Django or by a Celery task. What I want is: if the module is called by Django, the logs should go to the Django log file; if the module is called by a task, the logs should go to the Celery task logger.
Example:
# tasks.py
from app.foo import MyFoo

@task(name="bar")
def boo():
    MyFoo.someFoo()

# app/foo.py
import logging

log = logging.getLogger(__name__)
I want the log messages inside MyFoo to go to celery log when run by a worker task.
Any ideas?
You should be able to configure loggers separately for the Django process and the Celery process, since by definition they run as separate processes.
Actually, I am surprised that the log output does not already go to separate log files; it might make sense for you to show what logging configuration you already have in place.
Where Python logging output goes is defined by logging handlers.
Configuring logger for Django: https://docs.djangoproject.com/en/dev/topics/logging/
For Celery, there doesn't seem to be a straightforward way to override the logging settings. It does provide a celery.signals.setup_logging signal where you can hook in and use the logging API to reconfigure the logging handlers to write to a separate log file; see the sketch below.
http://docs.celeryproject.org/en/latest/configuration.html#logging
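A minimal sketch of that approach (the file name and log level are assumptions): connecting a receiver to setup_logging stops Celery from installing its own handlers, leaving you free to route the worker's output to one file.
import logging.config

from celery.signals import setup_logging

@setup_logging.connect
def configure_celery_logging(**kwargs):
    # Because this signal has a receiver, Celery skips its default logging
    # setup, so we can send everything from the worker to a single file.
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {
            "celery_file": {
                "class": "logging.FileHandler",
                "filename": "celery_worker.log",  # assumed path
            },
        },
        "root": {"handlers": ["celery_file"], "level": "INFO"},
    })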
Alternatively, you can just pull out a different logger object at the task level. When you execute a task you know whether it is executed from Django (eagerly) or by Celery.
I have never done this myself, but apparently Celery tasks expose the task execution context as the self.request attribute (nothing to do with Python classes).
http://docs.celeryproject.org/en/latest/userguide/tasks.html#context
So at the beginning of your task function you could switch between the loggers:
# Define web_worker_logger
# Define task_logger
#task(name='example')
def moobar():
logger = web_worker_logger if self.request.is_eager else task_logger

Celery task timeout/time limit for windows?

I have a web app written in Flask that is currently running on IIS on Windows (don't ask...).
I'm using Celery to handle some asynchronous processing (accessing a slow database and generating a report).
However, when trying to set up some behavior for error handling, I came across this in the docs:
"Time limits do not currently work on Windows and other platforms that do not support the SIGUSR1 signal."
Since the DB can get really slow, I would really like to be able to specify a timeout behavior for my tasks, and have them retry later when the DB might not be so tasked. Given that the app, for various reasons, has to be served from Windows, is there any workaround for this?
Thanks so much for your help.
If you really need to set a task timeout, you can use a child process to achieve it; the code is as follows:
import json
from multiprocessing import Process

from celery import current_app
from celery.exceptions import SoftTimeLimitExceeded

soft_time_limit = 60

@current_app.task(name="task_name", bind=True)  # bind=True so the task receives `self`
def task_worker(self, *args, **kwargs):
    def on_failure():
        pass  # placeholder, unused in this snippet

    # Run the real work in a child process and give it a deadline.
    worker = Process(target=do_working, args=args, kwargs=kwargs, name='worker')
    worker.daemon = True
    worker.start()
    worker.join(soft_time_limit)
    if worker.is_alive():
        # Still running after the deadline: kill it and report a timeout.
        worker.terminate()
        raise SoftTimeLimitExceeded
    return json.dumps(dict(message="ok"))

def do_working(*args, **kwargs):
    pass  # do something
It doesn't look like there is any built-in workaround for this in Celery. Could you perhaps code this into your task directly? In other words, in your Python code, start a timer when you begin the task; if the task takes too long to complete, raise an exception and resubmit the job to the queue again.
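A rough sketch of that idea (the task name, the run_slow_query helper, and the timeout values are made up for illustration): wait on the slow call with a deadline, and if it misses the deadline, put the job back on the queue with Celery's retry mechanism.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

from celery import shared_task

TIMEOUT_SECONDS = 120  # assumed deadline for the slow DB call

@shared_task(bind=True, max_retries=3)
def generate_report(self, report_id):
    # run_slow_query is a hypothetical blocking call against the slow database.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(run_slow_query, report_id)
    try:
        return future.result(timeout=TIMEOUT_SECONDS)
    except FutureTimeout:
        # Give up waiting (the query thread itself keeps running until the
        # driver returns) and resubmit the job for later, when the database
        # may be less busy.
        raise self.retry(countdown=600)
    finally:
        pool.shutdown(wait=False)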
