Worker does not consume tasks after Celery add_consumer is called - Python

I would like to use Celery (with RabbitMQ as the message broker) to execute tasks of varying flavors via different queues. One requirement is that consumption from a particular queue (by the workers) can be paused and resumed.
Celery seems to offer this capability via add_consumer and cancel_consumer. While I was able to cancel consumption of tasks from a queue for a particular worker, I cannot get the worker to resume consumption by calling add_consumer. The code to reproduce the issue is provided here. My guess is that I'm missing some parameter, either in the celeryconfig or in the arguments used when starting the workers.
It would be great to get a fresh pair of eyes on this. There is not much discussion of add_consumer on Stack Overflow or on GitHub, so I'm hoping there are some experts here willing to share their thoughts/experience.
--
I am running the below:
Windows OS, RabbitMQ 3.5.6, Erlang 18.1, Python 3.3.5, celery 3.1.15

To resume consuming from a queue, you need to specify the queue name as well as the target workers. Here is how to do it:
app.control.add_consumer(queue='high', destination=['celery@asus'])
Here is the add_consumer signature:
def add_consumer(state, queue, exchange=None, exchange_type=None,
                 routing_key=None, **options):
In your case, you are calling
app.control.add_consumer('high', destination=['celery@high1woka'])
so 'high' gets passed to state and queue stays empty, which is why consumption is not resumed.
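For completeness, a minimal sketch of pausing and resuming consumption via remote control commands; the broker URL, the queue name 'high' and the worker node name 'celery@high1woka' are taken from the question and are placeholders:
from celery import Celery

# Broker URL is a placeholder; adjust to your RabbitMQ setup.
app = Celery('tasks', broker='amqp://guest@localhost//')

# Pause: tell the named worker to stop consuming from the 'high' queue.
app.control.cancel_consumer('high', destination=['celery@high1woka'])

# Resume: pass the queue by keyword and name the target worker explicitly.
app.control.add_consumer(queue='high', destination=['celery@high1woka'])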

To get the Celery worker to resume working on Windows, my workaround is listed below.
Update Celery: pip install celery==4.1.0
Update billiard/spawn.py: wrap lines 338 to 339 in try: ... except: pass
(Optional) install eventlet: pip install eventlet==0.22.1
Add --pool=eventlet or --pool=solo when starting workers, per the comment in https://github.com/celery/celery/issues/4178

Related

Concurrency within redis queue

I'm working with a Django application hosted on Heroku with the Redis To Go add-on (nano plan). I'm using rq to execute tasks in the background; the tasks are initiated by online users. I have a constraint on the number of connections I can open, so resources are limited.
I currently have a single worker running over 'n' queues. Each queue uses a connection instance from the connection pool to handle 'n' different types of task. For instance, if 4 users initiate the same type of task, I would like my main worker to create child processes dynamically to handle them. Is there a way to achieve the required multiprocessing and concurrency?
I tried the multiprocessing module, initially without introducing Lock(); but that overwrites the data the user passed to the initiating function with the previous request's data. After applying locks, the second user is blocked from initiating requests and gets a server error (500).
GitHub link #1: Looks like the team is working on the PR; not yet released though!
GitHub link #2: This post helps to explain creating more workers at runtime.
This solution, however, also overwrites the data. The new request is again processed with the previous request's data.
Let me know if you need to see some code. I'll try to post a minimal reproducible snippet.
Any thoughts/suggestions/guidelines?
Did you get a chance to try AutoWorker?
Spawn RQ Workers automatically.
from autoworker import AutoWorker
aw = AutoWorker(queue='high', max_procs=6)
aw.work()
It makes use of multiprocessing with StrictRedis from the redis module and the following imports from rq:
from rq.contrib.legacy import cleanup_ghosts
from rq.queue import Queue
from rq.worker import Worker, WorkerStatus
After looking under the hood, I realised the Worker class already implements multiprocessing.
The work function internally calls execute_job(job, queue), which, as quoted in the module,
"Spawns a work horse to perform the actual work and passes it a job. The worker will wait for the work horse and make sure it executes within the given timeout bounds, or will end the work horse with SIGALRM."
The execute_job() function implicitly calls fork_work_horse(job, queue), which spawns a work horse to perform the actual work and passes it a job, as per the following logic:
def fork_work_horse(self, job, queue):
    child_pid = os.fork()
    os.environ['RQ_WORKER_ID'] = self.name
    os.environ['RQ_JOB_ID'] = job.id
    if child_pid == 0:
        self.main_work_horse(job, queue)
    else:
        self._horse_pid = child_pid
        self.procline('Forked {0} at {1}'.format(child_pid, time.time()))
main_work_horse makes an internal call to perform_job(job, queue), which makes a few other calls to actually perform the job.
All the steps of The Worker Lifecycle described in rq's official documentation are taken care of within these calls.
It's not the multiprocessing I was expecting, but I guess they have their own way of doing things. However, my original question is still not answered by this, and I'm still not sure about concurrency.
The documentation still needs work, since it hardly covers the finer details of this library!
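For the concurrency part, one pattern worth trying is to spawn several rq worker processes yourself, each with its own Redis connection. This is only a rough sketch under those assumptions; the queue name 'high' and the process count are placeholders, and note that every extra process opens an extra Redis connection, which matters on the nano plan:
from multiprocessing import Process

from redis import Redis
from rq import Queue, Worker


def run_worker(queue_name):
    # Each child process gets its own connection, so workers don't share state.
    connection = Redis()
    worker = Worker([Queue(queue_name, connection=connection)], connection=connection)
    worker.work()


if __name__ == '__main__':
    processes = [Process(target=run_worker, args=('high',)) for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()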

Celery with (bind=True) in Dask or Dramatiq?

I have been using Celery for a while but am looking for an alternative due to the lack of Windows support.
The top competitors seem to be Dask and Dramatiq. What I'm really looking for is something that can distribute 1000 long-running tasks onto 10 machines. Each machine should pick up the next job when it has completed its current task, and give a callback with updates (in Celery this can be nicely achieved with @task(bind=True), as the task instance itself can be accessed and I can send the status back to the instance that sent it with an update).
Is there a similar functionality available in dramatiq or dask? Any suggestions would be appreciated.
On the Dask side you're probably looking for the futures interface: https://docs.dask.org/en/latest/futures.html
Futures have a basic status like "finished", "pending" or "error" that you can check at any time. If you want more complex messages, then you should look into Dask Queues, PubSub, or other inter-task communication mechanisms, also described on that doc page.
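As a rough illustration rather than a definitive implementation, a sketch of the futures interface; the scheduler address and the task function below are placeholders:
from dask.distributed import Client, as_completed


def long_running_task(i):
    # Placeholder for the real work.
    return i * 2


client = Client('tcp://scheduler-address:8786')  # or Client() for a local test cluster

# Submit 1000 tasks; the scheduler hands them out to whichever workers are free.
futures = client.map(long_running_task, range(1000))

# Each future exposes a simple status: 'pending', 'finished', 'error', ...
print(futures[0].status)

# Collect results (or react to progress) as tasks complete across the machines.
for future in as_completed(futures):
    print(future.result())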

Why does Celery + RabbitMQ generate a new queue each time?

abmp.py:
from celery import Celery

app = Celery('abmp', backend='amqp://guest@localhost', broker='amqp://guest@localhost')

@app.task(bind=True)
def add(self, a, b):
    return a + b
execute_test.py
from abmp import add

add.apply_async(
    args=(5, 7),
    queue='push_tasks',
    exchange='push_tasks',
    routing_key='push_tasks'
)
Start the worker:
celery -A abmp worker -E -Q push_tasks -l info
Run execute_test.py:
python2.7 execute_test.py
Finally, looking at the RabbitMQ management view, I found that each run of execute_test.py generates a new queue, rather than putting the task into the push_tasks queue.
You are using AMQP as the result backend. Celery stores each task's result as a new queue, named after the task's ID. Use a better-suited backend (Redis, for example) to avoid spamming new queues.
When you use AMQP as the result backend for Celery, the default behaviour is to store every task result (for 1 day, per the FAQ at http://docs.celeryproject.org/en/latest/faq.html).
As per the documentation for the current stable version (4.1), this backend is deprecated and should not be used.
Your options are:
Use the result_expires setting if you plan to go ahead with amqp as the backend.
Use a different backend (like Redis).
If you don't need the results at all, use the ignore_result setting.
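A minimal sketch of those options, assuming Celery 4.x setting names; the broker and Redis URLs are placeholders:
from celery import Celery

app = Celery('abmp', broker='amqp://guest@localhost')

# Option 1: stay on the amqp backend but let result queues expire quickly.
# app.conf.result_backend = 'amqp://guest@localhost'
# app.conf.result_expires = 3600  # seconds

# Option 2: switch to a backend that does not create a queue per result.
app.conf.result_backend = 'redis://localhost:6379/0'

# Option 3: don't store results for tasks that never need them.
@app.task(ignore_result=True)
def add(a, b):
    return a + b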

Inspecting Celery queues for tasks from Python

I have a Django+Celery application, where Celery is used to push (and pull) Django model instances to (and from) a third-party SOAP service.
My Django models have dependencies between them and also a simple hash like this:
from django.db import models

class MyModel(models.Model):
    def get_dependencies(self):
        # ...
        return [...]

    def __hash__(self):
        return hash(self.__class__.__name__ + str(self.pk))
This hash came in handy in my own implementation, which I had to drop due to stability issues. Celery is much sounder ground.
When I push an instance over to the SOAP service, I must make sure that its dependencies have been pushed first. This is done by checking all related instances for a pushed_ok timestamp field.
The difficult part is when an instance a, which depends on a list of instances deps (all instances of MyModel subclasses), is being pushed. I cannot push a unless all instances in deps have been processed by Celery. In other words, I need to serialize tasks so that the dependency order is respected.
Celery is run like this:
celery -A server worker -P eventlet -c 100
How can I prevent one of the eventlets (processes/threads) from running a before its dependencies, if any, have finished being run by the other eventlets?
Thank you for your help.
I went for a pragmatic solution: moving all dependency checking of a resource (which includes pushing out-of-sync dependencies to the SOAP server) inside the Celery task, instead of trying to serialize tasks according to the resources' dependencies.
The upside is that it keeps things simple and I could implement it rapidly.
The downside is that I'm locking a worker for a moment, potentially running many synchronous SOAP operations in it, instead of dispatching those operations across workers.
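A rough sketch of that approach, only to illustrate the idea; push_to_soap is a hypothetical helper for the actual SOAP call, the import paths are placeholders, and the pushed_ok handling follows the question:
from celery import shared_task
from django.utils import timezone

from myapp.models import MyModel          # hypothetical app layout
from myapp.soap import push_to_soap       # hypothetical SOAP helper


@shared_task(bind=True)
def push_instance(self, pk):
    instance = MyModel.objects.get(pk=pk)

    # Push any dependency that has not been synced yet, before the instance itself.
    for dep in instance.get_dependencies():
        if dep.pushed_ok is None:
            push_to_soap(dep)
            dep.pushed_ok = timezone.now()
            dep.save(update_fields=['pushed_ok'])

    push_to_soap(instance)
    instance.pushed_ok = timezone.now()
    instance.save(update_fields=['pushed_ok'])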

How to see all Celery tasks pushed into a RabbitMQ queue

I've got a Django project with Celery 2.5.5 and a RabbitMQ backend on Debian 6. I've got over 6000 tasks of different types in one queue. There was a bug in the code and I need to list all tasks in that queue and pull some of them out. All I need is to find out all task ids in the RabbitMQ queue. I can't find a way to connect to a RabbitMQ queue and list its contents, ideally without starting up the management plugin.
Something Pythonic like this would be great:
import somelib
conn = somelib.server(credentials, vhost)
queue = conn.get_queue(queue_name)
messages = queue.get_messages()
But any other tool that can list such a queue would help. I found a tool installable via npm, but Debian 6 doesn't have npm, and building it from source is not a pleasant path.
Something to back up RabbitMQ queues in human-readable form would also be appreciated.
Thanks for any ideas,
Pavel
You can use the Celery Flower library to do that.
It provides multiple features, such as displaying task progress and history, and showing task details, graphs and statistics in a pretty dashboard-style interface.
[Screenshots: task dashboard, worker tasks, task info]
If you are happy with a premade interface, you will like Flower. It shows you all tasks in a nice web view.
If, however, you are trying to process each task programmatically, Flower isn't the right tool, since it doesn't support that. You would then have to use a RabbitMQ/AMQP library for Python, which has been discussed elsewhere, e.g. here: Good Python library for AMQP.
With that it should definitely be possible to implement the code you imagined in one way or another, but you'll have to read into it yourself, since Celery and Flower have been enough for me so far.
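If you do want to inspect the queue programmatically, a rough sketch with the pika library (pika 1.x, not part of Celery) might look like the following. It assumes the default guest credentials on localhost, a queue named 'celery', and that the tasks were published with the json serializer (older Celery versions default to pickle, in which case the body would need unpickling instead):
import json

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

task_ids = []
delivery_tags = []

while True:
    method, properties, body = channel.basic_get(queue='celery', auto_ack=False)
    if method is None:
        break  # no more messages waiting in the queue
    payload = json.loads(body)
    # With Celery's old (pre-4.0) message protocol, the task name and id are in the body.
    task_ids.append((payload.get('task'), payload.get('id')))
    delivery_tags.append(method.delivery_tag)

print(task_ids)

# Reject without acking so the messages go back onto the queue.
for tag in delivery_tags:
    channel.basic_nack(delivery_tag=tag, requeue=True)

connection.close()
The messages stay unacknowledged only for the duration of the inspection, so this is reasonable for a one-off audit of the queue.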
