Celery task timeout/time limit for Windows? - python

I have a web app written in Flask that is currently running on IIS on Windows (don't ask...).
I'm using Celery to handle some asynchronous processing (accessing a slow database and generating a report).
However, when trying to set up some behavior for error handling, I came across this in the docs:
"Time limits do not currently work on Windows and other platforms that do not support the SIGUSR1 signal."
Since the DB can get really slow, I would really like to be able to specify a timeout behavior for my tasks and have them retry later, when the DB might not be so taxed. Given that the app, for various reasons, has to be served from Windows, is there any workaround for this?
Thanks so much for your help.

If you really need to set a task time limit on Windows, you can run the work in a child process and enforce the limit yourself. The code is as follows:
import json
from multiprocessing import Process

from celery import current_app
from celery.exceptions import SoftTimeLimitExceeded

soft_time_limit = 60  # seconds

@current_app.task(name="task_name", bind=True)
def task_worker(self, *args, **kwargs):
    # Run the real work in a child process so it can be killed on timeout.
    worker = Process(target=do_working, args=args, kwargs=kwargs, name='worker')
    worker.daemon = True
    worker.start()
    # Wait at most soft_time_limit seconds for the child to finish.
    worker.join(soft_time_limit)
    if worker.is_alive():
        worker.terminate()
        raise SoftTimeLimitExceeded
    return json.dumps(dict(message="ok"))

def do_working(*args, **kwargs):
    pass  # do something

It doesn't look like there is any built-in workaround for this in Celery. Could you perhaps code it into your task directly? In other words, in your Python code, note the time when you begin the task; if the task takes too long to complete, raise an exception and resubmit the job to the queue.
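A minimal sketch of that idea, assuming a bound task and a hypothetical run_report_chunks generator that yields the slow DB work in pieces (so there is a chance to check the clock between pieces):

import time
from celery import shared_task

TASK_BUDGET = 60  # seconds; illustrative soft limit

@shared_task(bind=True, max_retries=3)
def generate_report(self, *args, **kwargs):
    started = time.monotonic()
    for _chunk in run_report_chunks(*args, **kwargs):  # hypothetical generator
        if time.monotonic() - started > TASK_BUDGET:
            # Over budget: give up for now and requeue the task for later.
            raise self.retry(countdown=300)
    return "ok"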

Related

Send all output from a specific celery task to a file

from celery import Celery
from celery.schedules import crontab

tasks = Celery("tasks")

@tasks.on_after_configure.connect
def setup_periodic_tasks(sender: Celery, **kwargs) -> None:
    """Set up periodic tasks."""
    sender.add_periodic_task(crontab(minute="*/15"), my_task.s())

@tasks.task
def my_task():
    SomeModule.do_something()
How do I redirect everything that my_task outputs to a single specific file? Just using a logger won't work, because that module might be using all kinds of weird things like print(), different loggers inside nested threads, and other things that I have no control over. There might also be different tasks running a different function from the same module at the same time.
Ideally it'd be something simple like:
@tasks.task
@logs_to("path/to/file.txt")
def my_task():
    ...
Is this possible?
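One way a logs_to decorator like that could be built is with OS-level file-descriptor redirection, which catches print(), arbitrary loggers, and output from nested threads alike, because descriptors 1 and 2 are process-wide. A sketch (the caveat being that the redirection is process-wide too, so concurrent tasks in the same worker process would interleave in the same file):

import functools
import os
import sys

def logs_to(path):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            sys.stdout.flush()
            sys.stderr.flush()
            saved_out, saved_err = os.dup(1), os.dup(2)  # keep the originals
            with open(path, "ab", buffering=0) as f:
                os.dup2(f.fileno(), 1)  # route fd 1 (stdout) into the file
                os.dup2(f.fileno(), 2)  # route fd 2 (stderr) into the file
                try:
                    return func(*args, **kwargs)
                finally:
                    sys.stdout.flush()
                    sys.stderr.flush()
                    os.dup2(saved_out, 1)  # restore original stdout
                    os.dup2(saved_err, 2)  # restore original stderr
                    os.close(saved_out)
                    os.close(saved_err)
        return wrapper
    return decorator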

How to kill a running task in celery?

I am using Celery in Python. I have the following task:
@app.task
def data():
    while 1:
        response = requests.get(url, timeout=300).json()
        db.collectionName.insert_many(response)
        sleep(10000)
This task gets data from a web server and saves it to MongoDB in a loop.
I have called it with the following code:
data.delay()
It works fine, but I want to kill it programmatically. I tried data.AsyncResult(task_id).revoke(), but it does not work.
How can I kill a running task in celery?
Why are you doing data.AsyncResult? You can try doing something like this:
from celery.task.control import revoke  # Celery 3.x import path; newer versions use app.control.revoke
revoke(task_id, terminate=True)
Also, further details can be found here:
http://docs.celeryproject.org/en/latest/userguide/workers.html#revoke-revoking-tasks
You can try:
res = data.delay()
res.revoke()
but I don't understand the point of using Celery in your scenario. How does Celery help you if you're running a while 1 loop?
Consider breaking it down into a small task that performs a single HTTP request, and adding Celery beat to call it every 10 seconds (see the sketch below).
You can stop beat whenever you like.
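A minimal sketch of that refactor, reusing the question's url and db (the broker URL, module name, and task name are illustrative, and the beat_schedule key assumes Celery 4+):

import requests
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")  # illustrative broker

@app.task
def poll_once():
    # One HTTP request per task invocation instead of an endless loop.
    response = requests.get(url, timeout=300).json()
    db.collectionName.insert_many(response)

app.conf.beat_schedule = {
    "poll-every-10-seconds": {
        "task": "tasks.poll_once",
        "schedule": 10.0,  # seconds
    },
}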
You can use the following code:
revoke(task_id, terminate=True)

How to run recurring task in the Python Flask framework?

I'm building a website which provides some information to visitors. This information is aggregated in the background by polling a couple of external APIs every 5 seconds. The way I have it working now is that I use APScheduler jobs. I initially preferred APScheduler because it makes the whole system easier to port (since I don't need to set up cron jobs on the new machine). I start the polling functions as follows:
from apscheduler.scheduler import Scheduler

@app.before_first_request
def initialize():
    apsched = Scheduler()
    apsched.start()
    apsched.add_interval_job(checkFirstAPI, seconds=5)
    apsched.add_interval_job(checkSecondAPI, seconds=5)
    apsched.add_interval_job(checkThirdAPI, seconds=5)
This kinda works, but there's some trouble with it:
For starters, the interval jobs run outside the Flask context. So far this hasn't been much of a problem, but when calling an endpoint fails I want the system to send me an email (saying "hey, calling API X failed"). Because it doesn't run within the Flask context, however, it complains that flask-mail cannot be executed (RuntimeError('working outside of application context')).
Secondly, I wonder how this is going to behave when I no longer use the Flask built-in debug server, but a production server with, let's say, 4 workers. Will it start every job four times then?
All in all I feel that there should be a better way of running these recurring tasks, but I'm unsure how. Does anybody out there have an interesting solution to this problem? All tips are welcome!
[EDIT]
I've just been reading about Celery and its schedules. Although I don't really see how Celery differs from APScheduler, and whether it could thus solve my two points, I wonder if anyone reading this thinks I should investigate Celery further?
[CONCLUSION]
About two years later I'm reading this, and I thought I could let you guys know what I ended up with. I figured that @BluePeppers was right in saying that I shouldn't be tied so closely to the Flask ecosystem. So I opted for regular cron jobs running every minute, which are set up using Ansible. Although this makes it a bit more complex (I needed to learn Ansible and convert some code so that running it every minute would be enough), I think this is more robust.
I'm currently using the awesome python-rq for queueing async jobs (checking APIs and sending emails). I just found out about rq-scheduler. I haven't tested it yet, but it seems to do precisely what I needed in the first place. So maybe this is a tip for future readers of this question (a sketch follows below).
For the rest, I just wish all of you a beautiful day!
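For those future readers, a minimal sketch of what rq-scheduler usage looks like, going by the library's documented API (check_first_api stands in for one of the polling functions):

from datetime import datetime
from redis import Redis
from rq_scheduler import Scheduler

scheduler = Scheduler(connection=Redis())
scheduler.schedule(
    scheduled_time=datetime.utcnow(),  # first run: immediately
    func=check_first_api,              # hypothetical polling function
    interval=5,                        # then repeat every 5 seconds
    repeat=None,                       # forever
)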
(1)
You can use the app.app_context() context manager to set the application context. I imagine usage would go something like this:
from apscheduler.scheduler import Scheduler

def checkSecondAPI():
    with app.app_context():
        pass  # Do whatever you were doing to check the second API

@app.before_first_request
def initialize():
    apsched = Scheduler()
    apsched.start()
    apsched.add_interval_job(checkFirstAPI, seconds=5)
    apsched.add_interval_job(checkSecondAPI, seconds=5)
    apsched.add_interval_job(checkThirdAPI, seconds=5)
Alternatively, you could use a decorator:
import functools

def with_application_context(app):
    def inner(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with app.app_context():
                return func(*args, **kwargs)
        return wrapper
    return inner

@with_application_context(app)
def checkFirstAPI():
    pass  # Check the first API as before
(2)
Yes, it will still work. The sole (significant) difference is that your application will not be communicating directly with the world; it will be going through a reverse proxy or something via fastcgi/uwsgi/whatever. The only concern is that if you have multiple instances of the app starting, then multiple schedulers will be created. To manage this, I would suggest you move your backend tasks out of the Flask application and use a tool designed for running tasks regularly (i.e. Celery). The downside to this is that you won't be able to use things like Flask-Mail, but imo it's not too good to be so closely tied to the Flask ecosystem; what are you gaining by using Flask-Mail over a standard, non-Flask mail library?
Also, breaking up your application makes it much easier to scale individual components up as capacity is required, compared to having one monolithic web application.

Bottle: execute a long running function asynchronously and send an early response to the client?

The Bottle app (behind CherryPy) that I'm working on receives a request for a resource from an HTTP client, which results in the execution of a task that can take a few hours to finish. I'd like to send an early HTTP response (e.g., 202 Accepted) and continue processing the task. Is there a way to achieve this without using MQ libraries, using Python/Bottle alone?
For example:
from bottle import route, HTTPResponse

@route('/task')
def f():
    longRunningTask()  # <-- Any way to make this asynchronous?
    return HTTPResponse(status=202)
I know this question is several years old, but I found @ahmed's answer so unbelievably unhelpful that I thought I would at least share how I solved this problem in my application.
All I did was make use of Python's existing threading libraries, as below:
from bottle import route, HTTPResponse
from threading import Thread

@route('/task')
def f():
    # Create a thread that will execute your longRunningTask() function.
    # daemon=True so the thread won't keep the process alive at shutdown.
    task_thread = Thread(target=longRunningTask, daemon=True)
    task_thread.start()  # launch the thread and return immediately
    return HTTPResponse(status=202)
Using threads allows you to maintain a consistent response time while still running relatively complex or time-consuming functions.
I used uWSGI, so if that's the way you went, do make sure you enable threading in your uWSGI application config (see the snippet below).
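For reference, the relevant uWSGI setting is enable-threads; a minimal ini snippet:

[uwsgi]
enable-threads = true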

Having error queues in celery

Is there any way in Celery by which, if a task execution fails, I can automatically put it into another queue?
For example, if the task is running in a queue x, on exception enqueue it to another queue named error_x.
Edit:
Currently I am using celery==3.0.13 along with Django 1.4 and RabbitMQ as the broker.
Sometimes the task fails. Is there a way in Celery to add messages to an error queue and process them later?
The problem when a Celery task fails is that I don't have access to the message queue name, so I can't use self.retry to put it on a different error queue.
Well, you cannot use the retry mechanism if you want to route the task to another queue. From the docs:
retry() can be used to re-execute the task, for example in the event
of recoverable errors.
When you call retry it will send a new message, using the same
task-id, and it will take care to make sure the message is delivered
to the same queue as the originating task.
You'll have to relaunch the task yourself and route it manually to your desired queue in the event of any exception raised. It seems a good job for error callbacks.
The main issue is that we need the task name in the error callback to be able to relaunch it. Also, we may not want to add the callback each time we launch a task. Thus a decorator is a good way to automatically add the right callback.
from functools import partial, wraps

import celery


@celery.shared_task
def error_callback(task_id, task_name, retry_queue, retry_routing_key):
    # We must retrieve the task object itself.
    # `tasks` is a dict of 'task_name': celery_task_object.
    task = celery.current_app.tasks[task_name]
    # Relaunch the task in the specified queue.
    task.apply_async(queue=retry_queue, routing_key=retry_routing_key)


def retrying_task(retry_queue, retry_routing_key):
    """Decorates a function to automatically add the error callback."""
    def retrying_decorator(func):
        @celery.shared_task
        @wraps(func)  # just to keep the original task name
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        # Monkey-patch the apply_async method to add the callback.
        wrapper.apply_async = partial(
            wrapper.apply_async,
            link_error=error_callback.s(wrapper.name, retry_queue, retry_routing_key),
        )
        return wrapper
    return retrying_decorator


# Usage:
@retrying_task(retry_queue='another_queue', retry_routing_key='another_routing_key')
def failing_task():
    print("Hi, I will fail!")
    raise Exception("I'm failing!")

failing_task.apply_async()
You can adjust the decorator to pass whatever parameters you need.
I had a similar problem and solved it, maybe not in the most efficient way, but my solution is as follows:
I created a Django model to keep all my Celery task ids, which is capable of checking each task's state.
Then I created another Celery task that runs in an infinite cycle, checks the actual state of all tasks marked 'RUNNING', and if a state is 'FAILED' it just reruns the task. I'm not actually changing the queue for the tasks I rerun, but I think you can implement some custom logic this way to decide where to put every task you rerun (see the sketch below).
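A rough sketch of that approach with illustrative names (the model fields, the states checked, and the rerun policy are assumptions; a real version would also need to pass the original task's arguments along):

from celery import current_app, shared_task
from celery.result import AsyncResult
from django.db import models

class TrackedTask(models.Model):
    task_id = models.CharField(max_length=255)
    task_name = models.CharField(max_length=255)
    done = models.BooleanField(default=False)

@shared_task
def rerun_failed_tasks():
    for tracked in TrackedTask.objects.filter(done=False):
        result = AsyncResult(tracked.task_id)
        if result.state == 'FAILURE':
            # Re-submit the task; custom routing logic could go here.
            task = current_app.tasks[tracked.task_name]
            new_result = task.apply_async()
            tracked.task_id = new_result.id
            tracked.save()
        elif result.state == 'SUCCESS':
            tracked.done = True
            tracked.save()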
