After finishing a task, Celery writes the result to Redis. However, the key of the entry is determined by Celery itself (I think it is the task id). I want to give the entries specific names, because that is how other services know where to find them.
In my code there is one producer that creates the tasks and many workers.
Right now, after all the tasks have completed, I call .get() on each result and create new Redis entries that follow my naming convention.
But that seems unnecessary and slow; Celery should just use my convention directly.
I am also thinking about renaming the keys afterwards, if that is feasible, but then how can I get the task ids from the producer?
Maybe some custom RedisBackend class, but that feels like digging too deep.
You can give the task a custom ID using the task_id parameter when submitting the task. It is supported by the apply and apply_async methods of a Celery task and by the send_task method of the Celery class.
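For example, a minimal sketch (the task name fetch_data, the Redis URLs, and the id are made up for illustration; the Redis result backend stores results under its default 'celery-task-meta-' prefix followed by the task id):

from celery import Celery

app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task
def fetch_data(url):
    ...  # hypothetical task body

# Submit the task under an id of your own choosing:
fetch_data.apply_async(args=('http://example.com',), task_id='report:2024-01-01')

# The same works without importing the task object, via send_task:
app.send_task('tasks.fetch_data', args=('http://example.com',),
              task_id='report:2024-01-01')

# The result is then stored under the Redis key
# 'celery-task-meta-report:2024-01-01', so other services can derive the key
# from the agreed-upon id instead of you copying results afterwards.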
I am trying to find a way to add a new task to Celery after the worker has been started and the Celery app has been instantiated. I basically want to add a new task with a dynamic name based on user input. I will also want to set a rate limit for that new task.
I have not been able to find any documentation on this and no examples in my Google searches. All I have been able to find is dynamically adding periodic tasks with celery beat.
Is there any way to do what I am looking to do?
What you want to achieve is not trivial. I am not aware of any distributed system similar to Celery that allows such a thing.
The only way to do it, perhaps, is to dynamically create and run a new Celery worker with the new task added and configured the way you prefer...
You can register a task as:
def dynamic_task():
    return "Hi"

dynamic_task = app.task(dynamic_task, name='my_name')
Note that you can register a list of functions this way, but then you will need to restart the worker; alternatively, you can use the signals https://docs.celeryproject.org/en/stable/userguide/signals.html as well.
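A hedged sketch of that approach, wrapping the registration in a helper so the task name (and the rate limit asked about above) can come from user input; all names here are made up, and a worker started before this runs will not know about the task until it is restarted or re-imports the module:

from celery import Celery

app = Celery('proj', broker='amqp://localhost//')

def register_dynamic_task(name, rate_limit=None):
    def body(*args, **kwargs):
        return "Hi"
    # app.task() can wrap a plain function at runtime; name and rate_limit
    # are ordinary task options.
    return app.task(body, name=name, rate_limit=rate_limit)

dynamic_task = register_dynamic_task('my_name', rate_limit='10/m')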
Is there any common solution to store and reuse Celery task results without executing the tasks again? I have many HTTP fetch tasks in my metasearch project and want to reduce the number of useless HTTP requests (they can take a long time and return the same results) by storing the result of the first one and firing it back without doing the real fetch. It would also be very useful not to start a new fetch task when the same one is already in progress; instead of running a new job, the app should return the AsyncResult by id (the id is unique and generated from the task call args) of the already pending task.
Looks like I need to define new apply_async (Celery.send_task) behavior for tasks with the same task_id:
if a task with the given task_id hasn't started yet, start it
if a task with the given task_id has already started, return AsyncResult(task_id) without actually running the task
The @task decorator should accept a new ttl kwarg to determine the cache time (only for the Redis backend?)
Looks like the simplest answer is to store your results in a cache (like a database) and first ask the cache for the result, only firing the HTTP request if it isn't there.
I don't think there's anything specific to Celery that can perform this.
Edit:
To cope with the fact that the tasks are sent at the same time, an additional step would be to build a lock for the Celery task (see the Celery Task Lock recipe).
In your case you want to give the lock a name containing the task name and the URL. And you can use whatever system you want for the cache, as long as it is visible by all your workers (Redis in your case?).
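A rough sketch of that cache-plus-lock idea, assuming Redis is used for both the cache and the lock; the task name, key prefixes, and ttl default are made up for illustration:

import hashlib
import redis
from urllib.request import urlopen
from celery import Celery

app = Celery('metasearch',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')
r = redis.Redis()

def cache_key(url):
    return 'fetch-cache:' + hashlib.sha1(url.encode()).hexdigest()

@app.task(bind=True)
def fetch_page(self, url, ttl=300):
    key = cache_key(url)
    cached = r.get(key)
    if cached is not None:
        return cached.decode()             # reuse the stored result, no HTTP request
    lock = r.lock('lock:' + key, timeout=60)
    if not lock.acquire(blocking=False):
        # The same fetch is already in progress elsewhere; retry shortly.
        raise self.retry(countdown=5)
    try:
        result = urlopen(url).read().decode('utf-8', 'replace')   # the actual fetch
        r.setex(key, ttl, result)          # cache the body for ttl seconds
        return result
    finally:
        lock.release()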
I've started a new Python 3 project in which my goal is to download tweets and analyze them. As I'll be downloading tweets about different subjects, I want a pool of workers that download Twitter statuses matching the given keywords and store them in a database. I call these workers fetchers.
The other kind of worker is the analyzer, whose job is to analyze tweet contents and extract information from them, also storing the result in a database. As I'll be analyzing a lot of tweets, it would be a good idea to have a pool of these workers too.
I've been thinking of using RabbitMQ and Celery for this, but I have some questions:
General question: Is this really a good approach to solve this problem?
I need at least one fetcher worker per downloading task, and this could be running for a whole year (actually it is a 15-minute cycle that repeats and lasts for a year). Is it appropriate to define an "infinite" task?
I've been trying Celery and I used delay to launch some example tasks. The thing is that I don't want to call the ready() method constantly to check whether the task is completed. Is it possible to define a callback? I'm not talking about a Celery task callback, just a function defined by myself. I've been searching for this and I can't find anything.
I want to have a single RabbitMQ + Celery server with workers in different networks. Is it possible to define remote workers?
Yeah, it looks like a good approach to me.
There is no such thing as an infinite task. You might reschedule a task to run once in a while. Celery has periodic tasks, so you can schedule a task to run at particular times. You don't necessarily need Celery for this; you can also use a cron job if you want.
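A minimal sketch of that periodic-task approach for the 15-minute fetch cycle (the task, keyword, and broker URL are placeholders; on older Celery versions the setting is called CELERYBEAT_SCHEDULE instead of beat_schedule):

from celery import Celery

app = Celery('fetcher', broker='amqp://localhost//')

@app.task
def fetch_tweets(keyword):
    ...  # download statuses for `keyword` and store them in the database

# Let celery beat fire the task every 15 minutes instead of running one
# "infinite" task for a year.
app.conf.beat_schedule = {
    'fetch-python-tweets': {
        'task': fetch_tweets.name,
        'schedule': 15 * 60,   # seconds
        'args': ('python',),
    },
}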
You can call a function once a task has successfully completed:
from celery.signals import task_success

@task_success.connect(sender='task_i_am_waiting_to_complete')
def call_me_when_my_task_is_done(sender=None, result=None, **kwargs):
    pass
Yes, you can have remote workers on different networks.
We have an application that uses a Celery instance in two ways: the instance's .task attribute is used as our task decorator, and when we invoke Celery workers, we pass the instance as the -A (--app) argument. This has worked, but it means we use the same Celery instance for both producers (the tasks) and consumers (the Celery workers).
Now we are considering using Bigwig RabbitMQ, an AMQP service provider, which publishes two different URLs: one optimized for message producers, the other optimized for message consumers.
What's the best way for us to modify our setup in order to take advantage of the separate broker endpoints? I'm assuming a single Celery instance can only use a single broker URL (via the BROKER_URL setting). Should we use two distinct Celery instances configured identically except for the BROKER_URL setting?
This feature will be available in Celery 4.0: http://docs.celeryproject.org/en/master/whatsnew-4.0.html#configure-broker-url-for-read-write-separately
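With 4.0 that looks roughly like this (the URLs are placeholders for the two Bigwig endpoints):

from celery import Celery

app = Celery('proj')
# Consumer-optimized endpoint for the workers, producer-optimized for publishing:
app.conf.broker_read_url = 'amqp://user:pass@consumer.bigwig.example:5672//'
app.conf.broker_write_url = 'amqp://user:pass@producer.bigwig.example:5672//'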
Yes, you are right: one Celery instance can use only one broker URL. As you said, the only way is to use two Celery instances that differ only in their BROKER_URL, one for consuming and one for producing.
Technically this is trivial; you can take advantage of config_from_object (http://celery.readthedocs.org/en/latest/reference/celery.html#celery.Celery.config_from_object). Of course you will have two instances running, but I don't think that introduces any problem.
There is also another option explained here, but I would avoid it.
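A rough sketch of that two-instance setup using pre-4.0 setting names (the Bigwig URLs and config classes are placeholders):

from celery import Celery

class BaseConfig:
    CELERY_TASK_SERIALIZER = 'json'   # ...plus whatever settings you already share...

class ProducerConfig(BaseConfig):
    BROKER_URL = 'amqp://user:pass@producer.bigwig.example:5672//'   # publish-optimized

class ConsumerConfig(BaseConfig):
    BROKER_URL = 'amqp://user:pass@consumer.bigwig.example:5672//'   # consume-optimized

producer_app = Celery('proj')
producer_app.config_from_object(ProducerConfig)

consumer_app = Celery('proj')
consumer_app.config_from_object(ConsumerConfig)

# Use producer_app where tasks are published; pass consumer_app to the worker
# via -A / --app.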
I'm using Celery (3.0.15) with Redis as a broker.
Is there a straightforward way to query the number of tasks with a given name that exist in a Celery queue?
And, as a followup, is there a way to cancel all tasks with a given name that exist in a Celery queue?
I've been through the Monitoring and Management Guide and don't see a solution there.
# Retrieve tasks
# Reference: http://docs.celeryproject.org/en/latest/reference/celery.events.state.html
query = celery.events.state.tasks_by_type(your_task_name)
# Kill tasks
# Reference: http://docs.celeryproject.org/en/latest/userguide/workers.html#revoking-tasks
for uuid, task in query:
    celery.control.revoke(uuid, terminate=True)
There is one issue that earlier answers have not addressed and that may throw people off if they are not aware of it.
Among the solutions already posted, I'd use Danielle's with one minor modification: I'd import the task into my file and use its .name attribute to get the task name to pass to .tasks_by_type().
app.control.revoke(
    [uuid for uuid, _ in
     celery.events.state.State().tasks_by_type(task.name)])
However, this solution will ignore tasks that have been scheduled for future execution. Like some people who commented on other answers, when I checked what .tasks_by_type() returned, I got an empty list. And indeed my queues were empty. But I knew that there were tasks scheduled to be executed in the future, and those were my primary target. I could see them by executing celery -A [app] inspect scheduled, but they were unaffected by the code above.
I managed to revoke the scheduled tasks by doing this:
from itertools import chain

app.control.revoke(
    [scheduled["request"]["id"] for scheduled in
     chain.from_iterable(app.control.inspect().scheduled()
                         .itervalues())])
app.control.inspect().scheduled() returns a dictionary whose keys are worker names and values are lists of scheduling information (hence, the need for chain.from_iterable which is imported from itertools). The task information is in the "request" field of the scheduling information and "id" contains the task id. Note that even after revocation, the scheduled task will still show among the scheduled tasks. Scheduled tasks that are revoked won't get removed from the list of scheduled tasks until their timers expire or until Celery performs some cleanup operation. (Restarting workers triggers such cleanup.)
You can do this in one request:
app.control.revoke([
    uuid
    for uuid, _ in
    celery.events.state.State().tasks_by_type(task_name)
])
As usual with Celery, none of the answers here worked for me at all, so I did my usual thing and hacked together a solution that just inspects redis directly. Here we go...
# First, get a list of queued task messages from redis:
import redis, json
from django.conf import settings  # assuming the REDIS_* values live in Django settings

r = redis.Redis(
    host=settings.REDIS_HOST,
    port=settings.REDIS_PORT,
    db=settings.REDIS_DATABASES['CELERY'],
)
l = r.lrange('celery', 0, -1)

# Now import the task you want so you can get its name
from my_django.tasks import my_task

# Now, import your celery app and iterate over all tasks
# from redis and nuke the ones that have a matching name.
from my_django.celery_init import app

for task in l:
    task_headers = json.loads(task)['headers']
    task_name = task_headers["task"]
    if task_name == my_task.name:
        task_id = task_headers['id']
        print("Terminating: %s" % task_id)
        app.control.revoke(task_id, terminate=True)
Note that revoking in this way might not revoke prefetched tasks, so you might not see results immediately.
Also, this answer doesn't support prioritized tasks. If you want to modify it to do that, you'll want some of the tips in my other answer that hacks redis.
It looks like flower provides monitoring:
https://github.com/mher/flower
Real-time monitoring using Celery Events
    Task progress and history
    Ability to show task details (arguments, start time, runtime, and more)
    Graphs and statistics
Remote Control
    View worker status and statistics
    Shutdown and restart worker instances
    Control worker pool size and autoscale settings
    View and modify the queues a worker instance consumes from
    View currently running tasks
    View scheduled tasks (ETA/countdown)
    View reserved and revoked tasks
    Apply time and rate limits
    Configuration viewer
    Revoke or terminate tasks
HTTP API
OpenID authentication