I'm using RabbitMQ to make my tasks pool run sequentially one by one. But how can add a time parameter to make a task only run at the defined time in the future (like a scheduled tasks).
RabbitMQ is not a task scheduler, even though the documentation talks about "scheduling" a task. You might consider using something like cron. You could also use a library like sched to build a scheduler in a Python process.
FYI It looks like this question has already been answered:
Delayed message in RabbitMQ
RabbitMQ has a plugin for delayed messages.
Using this plugin, the messages can be delivered to the respective queues after a certain delay. Thanks to this plugin, you can use RabbitMQ as a scheduler, even though it's not a task scheduler by nature.
You can use celery along with rabbitmq as broker for task scheduling. Here is the celery documentation http://docs.celeryproject.org/en/master/index.html
Related
I am trying to run multiple rabbitMQ consumers in a single call. But in python, other consumers won't start until first one is closed, due to its synchronous nature. Can I use asyncio library to run multiple consumers in parallel? Is yes, how can I do that?
I am currently using threads to run multiple consumers and using pika 1.2.0.
Thanks :)
In my Django project I implemented Celery, which is running using RabbitMQ backend.
What celery does:
Celery puts my tasks into queue, and then under certain conditions runs them. That being said, I basically interact with RabbitMQ message queue exclusively using Celery python interface.
Problem:
I want just to push simple string message into RabbitMQ queue, which should be consumed by 3rd party application.
What I tried:
There is a way how to directly connect to RabbitMQ using Pika library. However I would find it a little clunky - If I have already Celery connected to RabbitMQ, why not use it (if possible) to send simple messages to a specific queue, instead opening another connection using mentioned Pika library.
Any insights appreciated.
You cannot use Celery to send arbitrary messages to your RabbitMQ server.
However, considering that you already use RabbitMQ as a broker, which means you already have all the necesary RabbitMQ support (py-amqp supports it either directly, or via librabbitmq), you can easily send messages to the MQ server from your Celery tasks. If you for whatever reason do not like py-amqp, you may use Pika as you mentioned already.
I use celery, with python 3 and supervisor in Ubuntu.
I've been working to make a new API, which will get an image from the internet using PIL(Pillow) and save it in a server.
However the problem is that I use Celery as scheduler and in the original API it returns the result in a milisecond, but when I use PIL, the wait becomes almost a second.
So as a solution, I am looking for a way to make the Celery worker run in the background.
Is it possible?
What you probably want is to daemonize your Celery worker.
If you follow the steps provided in the Celery running the worker as a daemon documentation you will be able to do that.
It is a bit of a complicated process, but it will allow the Celery worker to run in the background
I need to run some tasks in background of web app (checking the code out, etc) without blocking the views.
The twist in typical Queue/Celery scenario is that I have to ensure that the tasks will complete, surviving even web app crash or restart until those tasks complete, whatever their final result.
I was thinking about recording parameters for multiprocessing.Pool in a database and starting all the incomplete tasks at webapp restart. It's doable, but I'm wondering if there's a simpler or more cost-effective aproach?
UPDATE: Why not Celery itself? Well, I used Celery in some projects and it's really a great solution, but for this task it's on the big side: it requires a separate server, communication, etc., while all I need is spawning a few processes/threads, doing some work in them (git clone ..., svn co ...) and checking whether they succeeded or failed. Another issue is that I need the solution to be as small as possible since I have to make it follow elaborate corporate guidelines, procedures, etc., and the human administrative and bureaucratic overhead I'd have to go through to get Celery onboard is something I'd prefer to avoid if I can.
I would suggest you to use Celery.
Celery does not require its own server, you can have a worker running on the same machine. You can also have a "poor man's queue" using an SQL database instead of a "real" queue/messaging server such as RabbitMQ - this setup would look very much like what you're describing, only with a separate process doing the long-running tasks.
The problem with starting long-running tasks from the webserver process is that in the production environment the web "workers" are normally managed by the webserver - multiple workers can be spawned or killed at any time. The viability of your approach would highly depend on the web server you're using and its configuration. Also, with multiple workers each trying to do a task you may have some concurrency issues.
Apart from Celery, another option is to look at UWSGI's spooler subsystem, especially if you're already using UWSGI.
I have a webapp that's built on python/Flask and it has a corresponding background job that runs continuously, periodically polling for data for each registered user.
I would like this background job to start when the system starts and keep running til it shuts down. Instead of setting up /etc/rc.d scripts, I just had the flask app spawn a new process (using the multiprocessing module) when the app starts up.
So with this setup, I only have to deploy the Flask app and that will get the background worker running as well.
What are the downsides of this? Is this a complete and utter hack that is fragile in some way or a nice way to set up a webapp with corresponding background task?
The downside of your approach is that there are many ways it could fail especially around stopping and restarting your flask application.
You will have to deal with graceful shutdown to give your worker a chance to finish its current task.
Sometime your worker won't stop on time and might linger while you start another one when you reboot your flask application.
Here are some approches I would suggest depending on your constraints:
script + crontab
You only have to write a script that does whatever task you want and cron will take care of running it for you every few minutes. Advantages: cron will run it for you periodically and will start when the system starts. Disadvantages: if the task takes too long, you might have multiple instances of your script running at the same time. You can find some solutions for this problem here.
supervisord
supervisord is a neat way to deal with different daemons. You can set it to run your app, your background script or both and have them start with the server. Only downside is that you have to install supervisord and make sure its daemon is running when the server starts.
uwsgi
uwsgi is a very common way for deploying flask applications. It has few features that might come in handy for managing background workers.
Celery
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well. I think this is the best solution for scheduling background tasks for a flask application or any other python based application. But using it comes with some extra bulk. You will be introducing at least the following processes:
- a broker (rabbitmq or redis)
- a worker
- a scheduler
You can also get supervisord to manage all of the processes above and get them to start when the server starts.
Conclusion
In your quest of reducing the number of processes, I would highly suggest the crontab based solution as it can get you a long way. But please make sure your background script leaves an execution trace or logs of some sort.