How to see all Celery tasks pushed into a RabbitMQ queue - Python

I've got a Django project with Celery 2.5.5 and a RabbitMQ backend on Debian 6. There are over 6000 tasks of different types in one queue. There was a bug in the code, so I need to list all tasks in that queue and pull some of them out. All I really need is to find all task ids in the RabbitMQ queue. I can't figure out how to connect to a RabbitMQ queue and list its contents, ideally without starting the management plugin.
Something Pythonic like this would be great:
import somelib
conn = somelib.server(credentials, vhost)
queue = conn.get_queue(queue_name)
messages = queue.get_messages()
But any other tool to list such a queue would help. I found one tool that installs via npm, but Debian 6 doesn't have npm, and building it from source is not a pleasant path.
Something to back up RabbitMQ queues in a human-readable form would also be appreciated.
Thanks for ideas
Pavel

You can use the Flower library for Celery to do that.
It provides features like task progress and history, task details, and graphs and statistics in a pretty dashboard-style web interface.
(The original answer included screenshots of the task dashboard, per-worker task lists, and individual task details.)

If you are up for a premade interface, you will like Flower. It shows you all tasks in a nice web view.
If you are, however, trying to process each task programmatically, Flower isn't the right thing, since it doesn't support that. You would then have to use a RabbitMQ/AMQP library for Python, which has been discussed before, e.g. here: Good Python library for AMQP
With that it should definitely be possible to do what your imagined code does in one way or another, but you'll have to read into it yourself, since I've been fine with Celery and Flower so far.
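To give an idea of that route, here is a rough sketch with pika (1.x). The queue name 'celery' (Celery's default), the guest credentials and the JSON body layout are assumptions; Celery 2.5 defaults to the pickle serializer and the exact fields depend on the message protocol, so adjust the deserialization to your setup.
import json
import pika

# connect with the same credentials/vhost your Celery workers use (assumed values)
params = pika.ConnectionParameters(
    host='localhost',
    virtual_host='/',
    credentials=pika.PlainCredentials('guest', 'guest'),
)
conn = pika.BlockingConnection(params)
channel = conn.channel()

while True:
    method, header, body = channel.basic_get(queue='celery', auto_ack=False)
    if method is None:
        break  # no more messages waiting in the queue
    payload = json.loads(body)  # assumes the json task serializer
    print(payload.get('id'), payload.get('task'))

# Nothing was acked above, so closing the connection requeues every message
# and your workers can still consume them afterwards.
conn.close()
Kombu, the messaging library Celery itself builds on, offers a similar but somewhat higher-level way to do the same thing.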

Related

Using celery to send tasks from component A to component B

The technology I would like to use in this example is Celery for queueing and Python for the component implementation.
Imagine a simple project that consists of two components. One is a web app that connects to an API and gathers data. The second is a processor that can then process the data. When the web app has gotten a piece of data from the API, it is supposed to send a task into a task queue, including the just-crawled data, which is then consumed by the processor to process the data.
Whether or not this is a sensible way to go about a task like this is debatable and not the point of my question.
My question is: the tasks that process things are defined within the processor, since they state which processing function shall be executed, and the definition of that function obviously lives in the processor. Given that the web app doesn't have access to the task definitions, how does it communicate the task to the processor?
Do you have to hold a copy of the source code of the processor within the web app?
Do you make the processor a dependency of the web app?
What is the best practice approach to handle such a scenario?
What you are describing is probably one of the most common use-cases for Celery. Just look at how many people are asking Django/Flask + Celery questions here on StackOverflow... If you are a Django user, there is an entire section in the Celery documentation describing how to do exactly what you want. Things should be similar with other frameworks.
Do you have to hold a copy of the source code of the processor within the web app?
As far as I know you do not have to (I do not use any web framework), but it could be that you need to for some deeper integration with Celery. If your web application knows the Celery task name and its parameters, it can schedule it to run without actually having access to the Python code. This is accomplished using send_task(task_name, ...).
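To make that concrete, here is a rough sketch of the send_task() approach; the broker URL and the task name 'processor.tasks.process_data' are placeholders for whatever the processor actually registers:
from celery import Celery

# The web app only needs a Celery app pointed at the same broker the
# processor's workers use; none of the processor's code is imported here.
app = Celery('webapp', broker='amqp://guest:guest@localhost//')

def handle_new_data(data):
    # Schedule the task by its registered name and pass the crawled data along.
    app.send_task('processor.tasks.process_data', args=[data])
If you also want the task's return value back in the web app, both sides additionally need to share a result backend.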
Do you make the processor a dependency of the web app?
As I wrote above, there are several ways to use it. If you want tighter integration then yes. If you just want to run tasks and get results using send_task(), then your web application only needs to depend on Celery.
What is the best practice approach to handle such a scenario?
Follow the Django guide. I advise you to run Celery independently first and run some tasks, just so you learn the basic principles of how it distributes the work, etc.
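For that kind of experimenting, a minimal standalone setup could look like this (broker and backend URLs are assumptions; adjust them to your environment):
# tasks.py - a minimal sketch for playing with Celery outside any web framework
from celery import Celery

app = Celery('tasks', broker='amqp://guest:guest@localhost//', backend='rpc://')

@app.task
def add(x, y):
    return x + y
Start a worker with celery -A tasks worker --loglevel=info, then from a Python shell add.delay(2, 3).get() should return 5 once the worker has picked the task up.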

Celery with (bind=True) in dask or dramatiq?

I have been using Celery for a while but am looking for an alternative due to the lack of Windows support.
The top competitors seem to be Dask and Dramatiq. What I'm really looking for is something that can distribute 1000 long-running tasks onto 10 machines. Each machine should pick up the next job when it has completed its current task, and give a callback with updates (in Celery this can be achieved nicely with @task(bind=True), as the task instance itself can be accessed and I can send status updates back to the caller).
Is there a similar functionality available in dramatiq or dask? Any suggestions would be appreciated.
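For reference, the Celery pattern described above looks roughly like this (a sketch; the broker/backend URLs are placeholders, and the 'PROGRESS' state with its meta dict is a convention, not a Celery built-in):
from celery import Celery

app = Celery('work', broker='amqp://guest:guest@localhost//', backend='redis://localhost/0')

@app.task(bind=True)
def long_job(self, item):
    total = 100
    for i in range(total):
        # ... one slice of the long-running work on item ...
        # push a progress update through the result backend
        self.update_state(state='PROGRESS', meta={'done': i + 1, 'total': total})
    return 'finished'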
On the Dask side you're probably looking for the futures interface: https://docs.dask.org/en/latest/futures.html
Futures have a basic status like "finished" or "pending" or "error" that you can check any time. If you want more complex messages then you should look into Dask Queues, PubSub, or other intertask communication mechanisms, also available from that doc page.
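A rough equivalent with Dask futures might look like this (the scheduler address and process_item are placeholders; as_completed hands back each future as soon as a worker finishes it, so idle workers immediately pick up the next pending task):
from dask.distributed import Client, as_completed

def process_item(i):
    # stand-in for one of the 1000 long-running jobs
    return i * i

# address of your Dask scheduler; Client() with no argument starts a local cluster instead
client = Client('tcp://scheduler-host:8786')
futures = client.map(process_item, range(1000))

for future in as_completed(futures):
    print(future.status, future.result())  # status is 'pending', 'finished', 'error', ...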

Status of Python Celery tasks

I'm wondering what kind of options there are for monitoring celery tasks from a browser, after they have been deployed to a worker?
My current application stack is a Flask app running inside Twisted, using Celery to run dozens to thousands of small background tasks (updating metadata in a repository, creating image derivatives, etc.). I'm envisioning using AJAX long-polling to monitor the status of the Celery tasks initiated by the user. I'm using Redis as the broker and result backend.
I see Celery has some command-line ways to monitor tasks, and Flower for a web dashboard. But if I wanted to see more detailed status from a particular task sent to Celery, would it make more sense for that task to print / write to a log file, then long-poll that file for changes from the Flask front-end?
At this point a user can say, "update these 10,000 items", the tasks are sent to celery, and the front-end very quickly says, "job sent!". And the tasks do complete. But I'd like to have the user navigate to "/status" and see the status of those 10,000 small jobs - even a scrolling log file would probably work.
Any suggestions would be greatly appreciated. Took a lot of head scratching to make it this far sketching things out, but I'm spinning my wheels figuring out exactly WHAT to long-poll from the user front-end.
Try Jobtastic, which extends Celery.
From project description:
Jobtastic gives you goodies like:
Easy progress estimation/reporting
Job status feedback
Helper methods for gracefully handling a dead task broker (delay_or_eager and delay_or_fail)
Super-easy result caching
Thundering herd avoidance
Integration with a celery jQuery plugin for easy client-side progress display
Memory leak detection in a task run
Jobtastic was a great idea, but not quite what worked for us. In the end, we decided to create an incrementing job number (stored in Redis alongside the results and broker), push all the Celery task ids associated with that job number into a Python object, then pickle that and store it in Redis. We can then use it later to see whether the entire "job" is complete, or what its status is. For our purposes, it works just lovely.
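A sketch of that bookkeeping, in case it helps anyone (key names are made up; it assumes the Celery app and result backend are configured in the process that checks the status):
import pickle
import redis
from celery.result import AsyncResult

r = redis.Redis()  # same Redis instance that holds the broker/results

def record_job(task_ids):
    # allocate the next job number and remember which task ids belong to it
    job_no = r.incr('job:counter')
    r.set('job:%d:tasks' % job_no, pickle.dumps(list(task_ids)))
    return job_no

def job_status(job_no):
    # summarise how many of the job's tasks have finished so far
    task_ids = pickle.loads(r.get('job:%d:tasks' % job_no))
    done = sum(1 for tid in task_ids if AsyncResult(tid).ready())
    return {'done': done, 'total': len(task_ids)}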

Simple approach to launching background task in Django

I have a Django website, and one page has a button (or link) that when clicked will launch a somewhat long running task. Obviously I want to launch this task as a background task and immediately return a result to the user. I want to implement this using a simple approach that will not require me to install and learn a whole new messaging architecture like Celery for example. I do not want to use Celery! I just want to use a simple approach that I can set up and get running over the next half hour or so. Isn't there a simple way to do this in Django without having to add (yet another) 3rd party package?
Just use a thread.
import threading
from django.http import HttpResponse

# inside your view: long_process is your own function, args/kwargs its arguments
t = threading.Thread(target=long_process,
                     args=args,
                     kwargs=kwargs)
t.daemon = True  # don't let this thread block process shutdown
t.start()
return HttpResponse()
See this question for more details:
Can Django do multi-thread works?
Have a look at django-background-tasks - it does exactly what you need and doesn't require any additional services to be running, like RabbitMQ or Redis. It manages a task queue in the database and ships with a Django management command which you can run once or as a cron job.
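A sketch of how that looks (the function name and arguments are made up):
# tasks.py in one of your Django apps
from background_task import background

@background(schedule=0)  # run as soon as a worker process picks it up
def rebuild_thumbnails(item_id):
    # the long-running work goes here
    ...
Calling rebuild_thumbnails(item.id) from a view just inserts a row into the task table; python manage.py process_tasks, run once or from cron, is what actually executes it.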
If you're willing to install a 3rd party library, but you want something a whole lot simpler than Celery, check out Redis Queue. It does require Redis, which is pretty easy in itself, but that can provide a lot of other benefits as well.
RQ itself has almost zero configuration. It's startlingly simple.
References:
http://python-rq.org/
http://nvie.com/posts/introducing-rq/
https://devcenter.heroku.com/articles/python-rq (RQ on Heroku)
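A minimal RQ sketch (count_words is a stand-in for the real work and must live in a module the worker can import; a Redis server on localhost is assumed):
from redis import Redis
from rq import Queue

def count_words(text):
    # stand-in for the real background work
    return len(text.split())

q = Queue(connection=Redis())
job = q.enqueue(count_words, 'some long document ...')

# later, e.g. from the web process:
print(job.get_status())  # 'queued', 'started', 'finished', ...
print(job.result)        # None until a worker has finished the job
An rq worker process, started separately, does the actual execution.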

Django, Signals and another process

I've got my Django project running well, and a separate background process which will collect data from various sources and store that data in an index.
I've got a model in a Django app called Sources which contains, essentially, a list of sources that data can come from! I've successfully managed to create a signal that is activated/called when a new entry is put in the Sources model.
My question is, is there a simple way that anybody knows of whereby I can send some form of signal/message to my background process indicating that the Sources model has been changed? Or should I just resort to polling for changes every x seconds, because it's so much simpler?
Many thanks for any help received.
It's unclear how you are running the background process you're talking about.
Anyway, I'd suggest that your background task use the Sources model directly. There are convenient ways to run the task without leaving the realm of Django (so as to have access to your models). You can use Celery [1], for example, or RQ [2].
Then you won't need to pass any messages; any changes to the Sources model will take effect the next time your task is run (a sketch follows the links below).
[1] Celery is an open source asynchronous task queue/job queue, it isn't hard to set up and integrates with Django well.
Celery: general introduction
Django with celery introduction
[2] RQ means "Redis Queue", it is ‘a simple Python library for queueing jobs and processing them in the background with workers’.
Introductory post
GitHub repository
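To make that concrete, here is a sketch of a collector task that reads the model directly on every run (the "sources" app path, the broker URL and fetch_and_index are assumptions, and the worker needs Django configured, e.g. via the standard Django/Celery integration):
# tasks.py - reads Sources each time, so model changes need no extra signalling
from celery import Celery
from sources.models import Sources

app = Celery('collector', broker='redis://localhost:6379/0')

@app.task
def collect():
    for source in Sources.objects.all():
        fetch_and_index(source)  # hypothetical helper doing the actual collection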
Polling is probably the easiest if you don't need split-second latency.
If you do, however, then you'll probably want to look into either, say,
sending a UNIX signal (or using another method of IPC, depending on platform) to the process - sketched after this list
having the background process listen on a simple socket that you just send, say, a byte to (which is, admittedly, also a form of IPC), and that triggers the action you want to trigger
or some sort of task/message queue. Celery or ZeroMQ come to mind.
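A sketch of the UNIX-signal option, showing both halves in one place (SIGUSR1 and the pidfile path are arbitrary choices for this sketch):
import os
import signal

# In the background process: refresh the source list when SIGUSR1 arrives.
def reload_sources(signum, frame):
    pass  # re-read the Sources model / refresh the cached list here

signal.signal(signal.SIGUSR1, reload_sources)

# On the Django side, e.g. in the post_save handler for Sources:
with open('/var/run/collector.pid') as f:
    os.kill(int(f.read().strip()), signal.SIGUSR1)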
