Having error queues in celery - python

Is there any way in Celery to automatically put a task into another queue if its execution fails?
For example, if the task is running in a queue x, on exception enqueue it to another queue named error_x.
Edit:
Currently I am using celery==3.0.13 along with Django 1.4 and RabbitMQ as the broker.
Sometimes a task fails. Is there a way in Celery to add messages to an error queue and process them later?
The problem when a Celery task fails is that I don't have access to the message queue name, so I can't use self.retry to put it into a different error queue.

Well, you cannot use the retry mechanism if you want to route the task to another queue. From the docs:
retry() can be used to re-execute the task, for example in the event
of recoverable errors.
When you call retry it will send a new message, using the same
task-id, and it will take care to make sure the message is delivered
to the same queue as the originating task.
You'll have to relaunch the task yourself and route it manually to the queue you want when an exception is raised. This seems like a good job for error callbacks.
The main issue is that we need the task name in the error callback to be able to launch it. Also, we may not want to add the callback every time we launch a task, so a decorator is a good way to add the right callback automatically.
from functools import partial, wraps

import celery


@celery.shared_task
def error_callback(task_id, task_name, retry_queue, retry_routing_key):
    # We must retrieve the task object itself.
    # `tasks` is a dict of 'task_name': celery_task_object
    task = celery.current_app.tasks[task_name]
    # Relaunch the task in the specified queue.
    task.apply_async(queue=retry_queue, routing_key=retry_routing_key)


def retrying_task(retry_queue, retry_routing_key):
    """Decorates a function to automatically add the error callback."""
    def retrying_decorator(func):
        @celery.shared_task
        @wraps(func)  # just to keep the original task name
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        # Monkey patch the apply_async method to add the callback.
        wrapper.apply_async = partial(
            wrapper.apply_async,
            link_error=error_callback.s(wrapper.name, retry_queue, retry_routing_key)
        )
        return wrapper
    return retrying_decorator


# Usage:
@retrying_task(retry_queue='another_queue', retry_routing_key='another_routing_key')
def failing_task():
    print("Hi, I will fail!")
    raise Exception("I'm failing!")

failing_task.apply_async()
You can adjust the decorator to pass whatever parameters you need.

I had a similar problem and solved it, maybe not in the most efficient way, but my solution is as follows:
I created a Django model to keep all my Celery task ids, which is capable of checking the task state.
Then I created another Celery task that runs in an infinite cycle and checks the actual state of all tasks that are marked as 'RUNNING'; if the state is 'FAILED' it just reruns them. I'm not actually changing the queue for the tasks I rerun, but I think you can implement some custom logic to decide where to put every task you rerun this way.
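A rough sketch of that approach, assuming a hypothetical TaskRecord model that stores the task id, name, arguments, and status, and a periodic task standing in for the infinite loop; only AsyncResult and current_app.tasks come from Celery itself:
import celery
from celery import shared_task
from celery.result import AsyncResult

from myapp.models import TaskRecord  # hypothetical model: task_id, task_name, task_args, status


@shared_task
def rerun_failed_tasks():
    # Check the real state of every task we still consider running.
    for record in TaskRecord.objects.filter(status='RUNNING'):
        result = AsyncResult(record.task_id)
        if result.state == 'FAILURE':
            task = celery.current_app.tasks[record.task_name]
            # Custom routing logic could go here, e.g. apply_async(queue='error_x').
            new_result = task.apply_async(args=record.task_args)
            record.task_id = new_result.id
            record.save()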

Related

nidaqmx: prevent task from closing after being altered in function

I am trying to write an API that takes advantage of the Python wrapper for NI-DAQmx, and I need to have a global list of tasks that can be edited across the module.
Here is what I have tried so far:
1) Created an importable dictionary of tasks which is updated whenever a call is made to nidaqmx. The endpoint functions process data from HTTPS requests; I promise it's not just a pointless wrapper around the nidaqmx library itself.
e.g., on startup, the following is created:
# ./daq/__init__.py
import nidaqmx
# ... other stuff ...
TASKS = {}
then, the user can create a task by calling this endpoint
# ./daq/task/task.py
import nidaqmx

from daq import TASKS
# ...

def api_create_task_endpoint(task_id):
    try:
        task = nidaqmx.Task(new_task_name=task_id)
        TASKS[task_id] = task
    except Exception:
        # handle it
        pass
Everything up to here works as it should. I can get the task list, and the task stays open. I also tried explicitly calling task.control(nidaqmx.constants.TaskMode.TASK_RESERVE), but the following code gives me the same issue no matter what.
When I try to add channels to the task, it closes at the end of the function call no matter how I set the state.
# ./daq/task/channels.py
from daq import TASKS

def api_add_channel_task_endpoint(task_id, channel_type, function):
    # channel_type corresponds to ni-daqmx channel modules (e.g. ai_channels).
    # function corresponds to callable functions (e.g. add_ai_voltage_chan)
    # do some preliminary checks (e.g. task exists, channel type valid)
    channels = get_chans_from_json_post()
    with TASKS[task_id] as task:
        getattr(getattr(task, channel_type), function)(channels)
        # e.g. task.ai_channels.add_ai_voltage_chan("Dev1/ai0")
This is apparently closing the task. When I call api_create_task_endpoint(task_id) again, I receive the DaqResourceWarning that the task has been closed, and no longer exists.
I similarly tried setting the TaskMode using task.control here, to no avail.
I would like to be able to make edits to the task by storing it in the module-wide TASKS dict, but cannot keep the Task open long enough to do so.
2) I also tried implementing this using the NI-MAX save feature. The issue with this is that tasks cannot be saved unless they already contain channels, which I don't necessarily want to do immediately after creating the task.
I attempted to work around this by adding to the api_create_task_endpoint() some default behavior which just adds a random channel that is removed on the first channel added by the user.
Problem is, I can't find any documentation for a way to remove channels from a task after adding them without a GUI (this is running on CentOS, so a GUI is a non-starter).
Thank you so much for any help!
I haven't used the Python bindings for NI-DAQmx, but
with TASKS[task_id] as task:
looks like it would stop and clear the task immediately after updating it because the program flow leaves the with block and Task.__exit__() executes.
Because you expect these tasks to live while the Python module is in use, my recommendation is to only use task.control() when you need to change a task's state.
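A minimal sketch of what that suggestion could look like, reusing the TASKS dict and the get_chans_from_json_post helper from the question; the explicit close() call in a separate delete endpoint is an assumption on my part:
# ./daq/task/channels.py
from daq import TASKS

def api_add_channel_task_endpoint(task_id, channel_type, function):
    channels = get_chans_from_json_post()
    # Look the task up directly instead of entering a `with` block,
    # so Task.__exit__() never runs and the task stays open.
    task = TASKS[task_id]
    getattr(getattr(task, channel_type), function)(channels)
    # The task remains in TASKS until it is explicitly closed,
    # e.g. with task.close() in a separate delete endpoint.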

How to retry a celery task without duplicating it - SQS

I have a Celery task that takes a message from an SQS queue and tries to run it. If it fails, it is supposed to retry every 10 minutes for up to 144 times (about 24 hours). What I think is happening is that when it fails it goes back into the queue, and at the same time the retry creates a new message, duplicating it to 2. These 2 fail again and follow the same pattern, creating 2 new messages and becoming 4 in total. So if I let it run for some time the queue gets clogged.
What I can't figure out is the proper way to retry without duplicating. Below is the code that retries; please see if someone can guide me here.
from celery import shared_task
from celery.exceptions import MaxRetriesExceededError


@shared_task
def send_br_update(bgc_id, xref_id, user_id, event):
    from myapp.models.mappings import BGC
    try:
        bgc = BGC.objects.get(pk=bgc_id)
        return bgc.send_br_update(user_id, event)
    except BGC.DoesNotExist:
        pass
    except MaxRetriesExceededError:
        pass
    except Exception as exc:
        # retry every 10 minutes for at least 24 hours
        raise send_br_update.retry(exc=exc, countdown=600, max_retries=144)
Update:
More explanation of the issue...
A user creates an object in my database. Other users act upon that object, and as they change its state, my code emits signals. The signal handler then initiates a Celery task, which means it connects to the desired SQS queue and submits the message to the queue. The Celery server running the workers sees the new message and tries to execute the task. This is where it fails and the retry logic comes in.
According to the Celery documentation, to retry a task all we need to do is raise self.retry() with a countdown and/or max_retries. If a Celery task raises an exception it is considered failed. I am not sure how SQS handles this. All I know is that one task fails and there are two in the queue, both of these fail and then there are 4 in the queue, and so on...
This is NOT a Celery or SQS issue.
The real issue is the workflow, i.e. the way you send messages to the MQ service and handle them, which causes the duplication. You will face the same problem using any other MQ service.
Imagine your flow:
Script: read task message. MQ message: locked for 30 seconds.
Script: task fails. MQ message: lock times out, the message is now free to be grabbed again.
Script: create another task message.
Script: repeat step 1. MQ: 2 messages with the same task, so step 1 will launch 2 tasks.
So if the task keeps failing, the messages keep multiplying: 2, 4, 8, 16, 32...
If the Celery script is meant to "recreate the failed task and send it to the message queue", you want to make sure these messages can only be read ONCE. You MUST discard the task message after it has been read once, even if the task failed.
There are at least 2 ways to do this; choose one.
Delete the message before recreating the task. OR
In SQS, you can enforce this by creating a dead-letter queue and configuring the redrive policy with Maximum Receives set to 1. This makes sure a message whose task has already been read once is never recycled.
You may prefer method 2, because method 1 requires you to configure Celery to "consume" (read and delete) the message as soon as it reads it, which is not very practical (and you must make sure you delete it before creating a new message for the failed task).
The dead-letter queue is also a way for you to check whether Celery crashed, i.e. a message that has been read once but never consumed (deleted) means the program stopped somewhere.
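For reference, a sketch of how the redrive policy in method 2 might be set up with boto3; the queue URLs are placeholders, and the same policy can just as well be configured in the AWS console:
import json
import boto3

sqs = boto3.client('sqs')

# Placeholder queue URLs.
main_queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/celery'
dlq_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/celery-dead-letter'

# The redrive policy needs the dead-letter queue's ARN, not its URL.
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=['QueueArn'])['Attributes']['QueueArn']

sqs.set_queue_attributes(
    QueueUrl=main_queue_url,
    Attributes={'RedrivePolicy': json.dumps({
        'deadLetterTargetArn': dlq_arn,
        'maxReceiveCount': '1',  # Maximum Receives = 1: a message is delivered at most once
    })},
)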
This is probably a little late, but I have written a backoff policy for Celery + SQS as a patch.
You can see how it is implemented in this repository
https://github.com/galCohen88/celery_sqs_retry_policy/blob/master/svc/celery.py

How to do parallel execution inside flask view function

In my python-flask web app, which runs at port 5001, I have a scenario where I need to create an endpoint from which all the other endpoints' view functions are executed in parallel, and all the individual responses are aggregated and returned within the same request life cycle.
For example,
The routes in the Flask app contain the following view functions:
@app.route('/amazon')
def amazon():
    return "amazon"

@app.route('/flipkart')
def flipkart():
    return "flipkart"

@app.route('/snapdeal')
def sd():
    return "snapdeal"
Note: in the above three endpoints, a significant amount of network IO is involved.
I am creating another endpoint which has to call all the other endpoint implementations collectively.
# This is my endpoint
@app.route('/all')
def do_all():
    # execute all amazon, flipkart, snapdeal implementations
    pass
I am considering two approaches for the above scenario.
Approach 1 (multiprocessing):
Write the worker task as a separate function, call each worker via the python multiprocessing module, and collect the responses.
import multiprocessing

def do_all():
    def worker(name):
        # doing the network IO task for the given source name
        pass
    jobs = []
    for name in ['amazon', 'flipkart', 'snapdeal']:
        p = multiprocessing.Process(target=worker, args=(name,))
        jobs.append(p)
        p.start()
    # Terminating all the processes explicitly, since they freeze after execution completes
    for j in jobs:
        j.terminate()
    return 200
Here I am invoking each child process to call the worker; finally all child processes are terminated explicitly, since these are also WSGI threads, I presume.
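As a side note, bare Process objects do not return the workers' results; a hedged variation of approach 1 that actually collects them could use a process pool instead (worker here is just a placeholder, not the real IO code):
import multiprocessing

def worker(name):
    # network IO task for the given source name (placeholder)
    return name

def do_all():
    # Pool.map blocks until every worker finishes and returns their results in order.
    pool = multiprocessing.Pool(processes=3)
    try:
        results = pool.map(worker, ['amazon', 'flipkart', 'snapdeal'])
    finally:
        pool.close()
        pool.join()
    return ' '.join(results)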
Approach 2 (grequests):
Call each endpoint explicitly using python-grequests, so each endpoint that resides in the same app is called in parallel, and collect the responses.
def do_all():
    grequests.post("http://localhost:5001/amazon", data={})
    grequests.post("http://localhost:5001/flipkart", data={})
    grequests.post("http://localhost:5001/snapdeal", data={})
Each of these calls will be handled by a WSGI thread spawned for that request. Here I have no idea whether multiple processes get spawned and fail to terminate after execution.
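A note on approach 2: grequests.post only builds the request objects; it is grequests.map that actually sends them in parallel and returns the responses to aggregate, roughly like this sketch:
import grequests

def do_all():
    reqs = [
        grequests.post("http://localhost:5001/amazon", data={}),
        grequests.post("http://localhost:5001/flipkart", data={}),
        grequests.post("http://localhost:5001/snapdeal", data={}),
    ]
    # map() sends the requests concurrently and returns the responses in order.
    responses = grequests.map(reqs)
    return ' '.join(r.text for r in responses if r is not None)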
Both could work, but which one would be more seamless to implement? Please also point me to any alternative way of solving this scenario, and explain why.
Maybe you could simplify it using another approach:
Return an immediate response to the user:
def do_all():
    return 'amazon, ... being processed'
Invoke all relevant methods in the background.
Let the invoked background methods send signals
from blinker import Namespace

background = Namespace()
amazon_finished = background.signal('amazon-finished')

def amazon_bg():
    amazon_finished.send(**interesting)
Subscribe to the signals from the background workers.
def do_all():
    amazon_finished.connect(amazon_logic)
Display the return values of the background workers to the user. This would be a new request (maybe in the same route).
def amazon_logic(sender, **extra):
    sender
The advantages would be that the user gets, on each request, an immediate response about the status of the background workers, and error handling would be much easier.
I have not tested this, so you should look up the blinker API yourself.

How to create a shared counter in Celery?

Is there a way to have a shared counter (shared between workers) in Celery? I am also open to other ideas on how to solve my problem, but would like to stick to Celery. Here is my problem:
I have a task that depends on an index passed to it. These tasks can pass or fail, but I need to hit a target number of passed tasks. If a job fails, it should kick off a new job with the next available index.
I can of course do this through a function that tracks the active jobs and initiates the new jobs, but if there were something built in, that would be great.
You can use the task_failure Celery signal.
from celery.signals import task_failure

@task_failure.connect
def fail_task_handler(sender=None, body=None, **kwargs):
    print('a task has failed')
    # start a new task or do something else
More at http://celery.readthedocs.org/en/latest/userguide/signals.html#task-failure
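Building on that, one hypothetical way to chain the next index from the failure handler; indexed_task, its module path, and the assumption that the index is the first positional argument are mine, not from the question:
from celery.signals import task_failure

from myapp.tasks import indexed_task  # hypothetical task that takes an index


@task_failure.connect
def retry_with_next_index(sender=None, args=None, kwargs=None, **extra):
    # Only react to failures of the indexed task.
    if sender is None or sender.name != 'myapp.tasks.indexed_task':
        return
    failed_index = args[0] if args else kwargs.get('index')
    # Kick off a new job with the next available index.
    indexed_task.delay(failed_index + 1)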

Celery: standard method for querying pending tasks?

Is there any standard/backend-independent method for querying pending tasks based on certain fields?
For example, I have a task which needs to run once after the “last user interaction”, and I'd like to implement it something like:
def user_changed_content():
    task = find_task(name="handle_content_change")
    if task is None:
        task = queue_task("handle_content_change")
    task.set_eta(datetime.now() + timedelta(minutes=5))
    task.save()
Or is it simpler to hook directly into the storage backend?
No, this is not possible.
Even if some transports may support accessing the "queue" out of order (e.g. Redis), it is not a good idea.
The task may not be on the queue anymore, and may instead be reserved by a worker.
See this part in the documentation: http://docs.celeryproject.org/en/latest/userguide/tasks.html#state
Given that, a better approach would be for the task to check whether it should reschedule itself when it starts:
@task
def reschedules():
    # `redis` here is assumed to be a configured Redis client.
    new_eta = redis.get(".".join([reschedules.request.id, "new_eta"]))
    if new_eta:
        return reschedules.retry(eta=new_eta)
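A matching sketch of the producer side under the same assumptions (the same redis client and the task-id based key); the postpone helper name is mine, and the task would need to parse the stored value back into a datetime before retrying:
from datetime import datetime, timedelta

def postpone(task_id):
    # Store the new ETA under the key the task reads when it starts.
    new_eta = datetime.utcnow() + timedelta(minutes=5)
    redis.set(".".join([task_id, "new_eta"]), new_eta.isoformat())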
