Sometimes I have a situation where the Celery queue fills up with accidentally created, unnecessary tasks that clog the server. For example, the code fires off 20,000 tasks instead of 1.
How can I inspect which Python tasks the Celery queue contains and then selectively get rid of certain tasks?
Tasks are defined and started with the standard Celery decorators (if that matters):
@task()
def update_foobar(foo, bar):
    # Some heavy action here
    pass

update_foobar.delay(foo, bar)
Stack: Django + Celery + RabbitMQ.
Maybe you can use Flower. It's a real-time monitor for Celery with a nice web interface, and I think you can shut down tasks from there. In any case, I would try to avoid queuing those unnecessary tasks in the first place.
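If you prefer to do it programmatically, here is a minimal sketch (the import path for the Celery app instance and the task name are assumptions) that uses Celery's inspect/revoke API to drop unwanted tasks that workers have already reserved:

from proj.celery import app  # hypothetical import path for your Celery app instance

# Ask every worker which tasks it has reserved (prefetched) but not yet started.
reserved = app.control.inspect().reserved() or {}

for worker_name, tasks in reserved.items():
    for task in tasks:
        # Each entry is a dict with keys such as 'id', 'name' and 'args'.
        if task['name'] == 'myapp.tasks.update_foobar':
            # Tell the workers to discard this task instead of running it.
            app.control.revoke(task['id'])

Note that inspect() only sees tasks that workers have already pulled from RabbitMQ; messages still sitting in the broker can be dropped wholesale with app.control.purge(), but that is not selective.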
Related
Is there a way to get all tasks being added to Celery to perform one-after-the-next?
I have a bunch of Celery tasks that can happen at any time (they're triggered by users), and I would like them to not all run at the same time, so as to lighten the load on my server.
The simplest way to achieve this is to have a dedicated worker with concurrency set to 1, subscribed to a "special" queue. Then you send your tasks that you want to run sequentially to this queue. celery multi (creates multiple workers on the same node) is especially useful for such use-cases.
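As a sketch (the queue and module names are made up), routing the tasks to such a queue and starting a single-process worker for it could look like this:

from celery import Celery

app = Celery('proj', broker='amqp://localhost')

@app.task
def sequential_job(x):
    # Work that must not overlap with other runs of this task.
    return x * 2

# Send the task to the dedicated queue instead of the default one.
sequential_job.apply_async(args=[42], queue='sequential')

# Start one worker that only consumes that queue, one task at a time:
#   celery -A proj worker -Q sequential --concurrency=1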
I have a Flask app that uses external scripts to perform certain actions. In one of the scripts, I am using the threading module to run the work in several threads.
I am using the following code for the actual threading:
for a_device in get_devices:
    my_thread = threading.Thread(target=DMCA.do_connect, args=(self, a_device, cmd))
    my_thread.start()

main_thread = threading.currentThread()
for some_thread in threading.enumerate():
    if some_thread != main_thread:
        some_thread.join()
However, when this script gets run (from a form), the process hangs and I get a continuous loading cycle on the webpage.
Is there another way to use multithreading within the app?
Implementing threading myself in a Flask app has always ended in some kind of disaster for me. You might want to use a distributed task queue such as Celery instead. Even though it might be tempting to spin off threads yourself to get things done faster, you will run into all kinds of problems along the way and just end up wasting a lot of time (IMHO).
Celery is an asynchronous task queue/job queue based on distributed
message passing. It is focused on real-time operation, but supports
scheduling as well.
The execution units, called tasks, are executed concurrently on a
single or more worker servers using multiprocessing, Eventlet, or
gevent. Tasks can execute asynchronously (in the background) or
synchronously (wait until ready).
Here are some good resources that you can use to get started
Using Celery With Flask - Miguel Grinberg
Celery Background Tasks - Flask Documentation
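As a rough sketch (the broker URL, module layout, and function body are assumptions based on the snippet above), the per-device work could become a Celery task so the request returns immediately:

# tasks.py -- a minimal sketch, not the exact DMCA code
from celery import Celery

celery_app = Celery('tasks', broker='redis://localhost:6379/0')

@celery_app.task
def connect_to_device(a_device, cmd):
    # Put the slow per-device work here (what DMCA.do_connect did in the thread).
    print('connecting to', a_device, 'with', cmd)

In the Flask view you would then replace the threading loop with connect_to_device.delay(a_device, cmd) for each device, and a worker started with celery -A tasks worker does the connecting in the background.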
In my application, I have Python Celery tasks that connect to a REST API. Simple.
The problem I have is that the API does not allow multiple requests with the same credentials.
Is there a way to have these API tasks block in the queue? Meaning, if multiple requests are made around the same time, can I have the tasks sit in the queue and execute one by one, each waiting for the one before it to finish?
Currently, in the RabbitMQ message queue (with one worker), I see the tasks go through (spawned) and not wait.
I looked over the documentation but could not find a simple solution.
Thanks.
With one worker it's impossible for Celery to do more than one task at a time. What you may be seeing is called prefetching, which allows the worker to reserve tasks ahead of time.
http://docs.celeryproject.org/en/latest/userguide/optimizing.html#prefetch-limits
The default prefetch multiplier is 4; turn it down to 1 and see if that fixes it.
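If you set this in code rather than on the command line, a minimal sketch (using Celery 4+'s lowercase setting names) would be:

from celery import Celery

app = Celery('proj', broker='amqp://localhost')

# Fetch only one message per worker process at a time, and acknowledge it
# only after the task finishes, so tasks are truly handled one by one.
app.conf.worker_prefetch_multiplier = 1
app.conf.task_acks_late = True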
I am new to this, so bear with me if I'm asking something completely stupid.
I am developing a basic web app and using Heroku+flask+python.
For the background tasks, Heroku recommends using a worker. I wonder if I could just create new threads for those background tasks? Or is there a reason why a worker+redis is a better solution?
Those background tasks are not critical, really.
The main benefit to doing this in a separate worker is you'd be completely decoupling your app from your background tasks, so if one breaks it can't affect the other. That said, if you don't care about that, or need your background tasks more tightly coupled to your app for whatever reason, you can use APScheduler to have the background tasks run as separate threads without spinning up another worker. A simple example of that to run a background job every 10 seconds is as follows:
from apscheduler.schedulers.background import BackgroundScheduler

def some_job():
    print("successfully finished job!")

apsched = BackgroundScheduler()
apsched.start()
apsched.add_job(some_job, 'interval', seconds=10)
If you want tasks run asynchronously instead of on a schedule, you can use RQ, which has great examples of how to use it on its homepage. RQ is backed by Redis, but you don't need to run it in a separate worker process, although you can if you like.
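For completeness, a minimal RQ sketch (assuming a Redis server running on localhost; the function name is made up) looks like this:

from redis import Redis
from rq import Queue

# In practice this function must live in a module the worker can import.
def send_welcome_email(user_id):
    print('sending email to user', user_id)

q = Queue(connection=Redis())            # Redis on localhost by default
job = q.enqueue(send_welcome_email, 42)  # runs asynchronously in an RQ worker

# Start the worker in a separate process with:  rq worker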
We plan to have Celery running on an app that is replicated as multiple instances.
Will the scheduler still be able to handle a periodic task (e.g. every 30 minutes) across those multiple instances without the task being run too many times by mistake?
Thanks in advance
If by scheduler you mean celery beat, it doesn't handle the tasks itself.
Instead, it sends a new message to your message queue that your workers will then process.
If you configure your workers to listen on the same queue and beat to send to that same queue, everything should work fine.
PS: You can have multiple workers, but you should have only one beat. Check https://github.com/celery/celery/issues/251 for progress on supporting multiple beat instances.
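For reference, a minimal sketch of such a periodic task defined via the beat schedule (the task name is a placeholder) might look like this:

from celery import Celery
from celery.schedules import crontab

app = Celery('proj', broker='amqp://localhost')

app.conf.beat_schedule = {
    'refresh-every-30-minutes': {
        'task': 'proj.tasks.refresh_cache',  # hypothetical task name
        'schedule': crontab(minute='*/30'),  # run every 30 minutes
    },
}

# Run exactly one beat process:      celery -A proj beat
# Run as many workers as you like:   celery -A proj worker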