Django + Celery tasks on multiple worker nodes - python

I've deployed a Django (1.10) + Celery (4.x) application on a single VM, with RabbitMQ as the broker (on the same machine).
I want to run the same application on a multi-node architecture, where I can simply replicate a number of worker nodes and scale out so the tasks run quickly.
Here,
How do I configure Celery with RabbitMQ for this architecture?
On the other worker nodes, what should the setup be?

You should have the broker on one node and configure it so that workers from other nodes can access it.
For that, you can create a new user/vhost on rabbitmq.
# add new user
sudo rabbitmqctl add_user <user> <password>
# add new virtual host
sudo rabbitmqctl add_vhost <vhost_name>
# set permissions for user on vhost
sudo rabbitmqctl set_permissions -p <vhost_name> <user> ".*" ".*" ".*"
# restart rabbitmq
sudo service rabbitmq-server restart
From other nodes, you can queue up tasks or you can just run workers to consume tasks.
from celery import Celery

app = Celery('tasks', backend='amqp',
             broker='amqp://<user>:<password>@<ip>/<vhost>')

@app.task
def add(x, y):
    return x + y
If you have a file (say task.py) like this, you can queue up tasks using add.delay().
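For example, a producer script on any other node only needs this same task module plus access to the broker; a minimal sketch (the timeout value is arbitrary):
from task import add

# publishes the task message to the RabbitMQ broker configured above
result = add.delay(4, 4)

# optional: wait for the return value (works here because backend='amqp' is set)
print(result.get(timeout=10))  # -> 8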
You can also start a worker with
celery worker -A task -l info
You can see my answer here to get a brief idea of how to run tasks on remote machines. For a step-by-step process, you can check out a post I have written on scaling Celery.

Related

Django celery monitor doesn't show any tasks

I cannot see the tasks in admin.
I followed the steps in https://github.com/jezdez/django-celery-monitor
I used
celery==4.1.1
django-celery-results==1.0.1
django-celery-beat==1.0.1
django_celery_monitor==1.1.2
I ran manage.py migrate celery_monitor and the migrations went well. Then I ran celery -A lbb events -l info --camera django_celery_monitor.camera.Camera --frequency=2.0 and celery -A lbb worker -l info in separate shells. But I still cannot see the tasks I ran in the celery-monitor > tasks table.
Running the celery worker command with -E to force task events worked for me.
celery -A proj worker -l info -E
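Alternatively, the same thing can be turned on in the Celery configuration so you don't have to remember the flag; a sketch using the Celery 4 lowercase setting names:
# in the module where the Celery app is defined (e.g. proj/celery.py)
app.conf.worker_send_task_events = True  # equivalent to passing -E / --task-events to the worker
app.conf.task_send_sent_event = True     # also emit task-sent events from the publisher side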

Two applications using celery scheduled tasks: "Received unregistered task" errors in one of the workers

The scenario:
Two unrelated web apps with Celery background tasks running on the same server.
One RabbitMQ instance
Each web app has its own virtualenv (including celery). Same celery version in both virtualenvs.
I use the following command lines to start a worker and a beat process for each application.
celery -A firstapp.tasks worker
celery -A firstapp.tasks beat
celery -A secondapp.tasks worker --hostname foobar
celery -A secondapp.tasks beat
Now everything seems to work OK, but in the worker process of secondapp I get the following error:
Received unregistered task of type 'firstapp.tasks.do_something'
Is there a way to isolate the two Celery instances from each other?
I'm using Celery version 3.1.16, BTW.
I believe I fixed the problem by creating a RabbitMQ vhost and configuring the second app to use that one.
Create vhost (and set permissions):
sudo rabbitmqctl add_vhost /secondapp
sudo rabbitmqctl set_permissions -p /secondapp guest ".*" ".*" ".*"
And then change the command lines for the second app:
celery -A secondapp.tasks -b amqp://localhost//secondapp worker
celery -A secondapp.tasks -b amqp://localhost//secondapp beat
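Equivalently, instead of passing -b on every invocation, the broker URL could live in the second app's own settings; a sketch assuming the default guest credentials and the Celery 3.x uppercase setting name (loaded e.g. via app.config_from_object):
# secondapp settings sketch
BROKER_URL = 'amqp://guest:guest@localhost:5672//secondapp'  # note the '/secondapp' vhost at the end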

How to invoke celery task using celery daemon

I have written this task in a tasks.py file, which is under my Django app directory myapp.
import logging

from celery.schedules import crontab
from celery.task import periodic_task

logger = logging.getLogger(__name__)

# The Url model and the crawler() helper are assumed to be imported elsewhere in this module.

# periodic task that runs every minute
@periodic_task(run_every=crontab(hour="*", minute="*", day_of_week="*"))
def news():
    '''
    Grab url
    '''
    logger.info("Start task")
    urls = []
    urls.append(crawler())  # crawler returns a dict of {title: url}
    for url_dict in reversed(urls):
        for title, url in url_dict.items():
            # Save all the scraped urls in the database
            Url.objects.create(title=title, url=url)
            logger.info("Task finished: result = %s" % url)
The main objective of this task is to push the URL and title to the Django database every minute.
To run this Celery task, we need to invoke these commands using Django's ./manage.py utility. I am planning to host this app on Heroku:
python manage.py celeryd --verbosity=2 --loglevel=DEBUG
python manage.py celerybeat --verbosity=2 --loglevel=DEBUG
but I need to run these two commands as daemons in the background. How can I run them as daemons so that my Celery tasks can run?
A quick fix is to put "&" after your commands, i.e.
python manage.py celeryd --verbosity=2 --loglevel=DEBUG &
python manage.py celerybeat --verbosity=2 --loglevel=DEBUG &
After hitting enter, these processes will run in the background and still print out useful debug info. So this is fine for the initial stage and sometimes for small applications that do not rely heavily on Celery.
For development purposes I suggest using supervisor. See THIS POST, which gives really nice info on Celery, Django and supervisor integration. Read the "Running Celery workers as daemons" part of the post.

Celery Cloudamqp creates new connection for each task

I am currently using nitrous.io, running Django with Celery, and CloudAMQP as my broker on the free plan (max 3 connections). I'm able to connect just fine and start up a periodic task just fine.
When I run
celery -A proj worker -l info
2 connections are created immediately on CloudAMQP, and I am able to manually create multiple tasks on a 3rd connection and all is well. However, when I run celery beat with
celery -A proj worker -B -l info
all 3 connections are used, and if celery beat creates 1 or more new tasks, a 4th connection will be created, thus going over the maximum number of connections allowed.
I've tried and currently have set
BROKER_POOL_LIMIT = 1
but that doesn't seem to limit the connections.
I've also tried
celery -A proj worker -B -l info
celery -A proj worker -B -l info -c 1
celery -A proj worker -B -l info --autoscale=1,1 -c 1
with no luck.
Why are 2 connections made immediately that are doing nothing?
Is there some way to limit the initial Celery connections to 0 or 1, or to have the tasks share/run on the celery beat connection?
While it does not actually limit connections, another user found that disabling the connection pool reduced the number of connections in practice:
https://stackoverflow.com/a/23563018/1867779
BROKER_POOL_LIMIT = 0
The Redis and Mongo backends have their own connection limit parameters.
http://docs.celeryproject.org/en/master/configuration.html#celery-redis-max-connections
http://docs.celeryproject.org/en/master/configuration.html#celery-mongodb-backend-settings (using the max_pool_size parameter)
The AMQP backend does not have such a setting.
http://docs.celeryproject.org/en/master/configuration.html#amqp-backend-settings
Given that, I'm not sure what BROKER_POOL_LIMIT is meant to do, but I'd really like to see CELERY_AMQP_MAX_CONNECTIONS.
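As a practical stopgap on a connection-capped plan, these are the settings I would experiment with (standard Celery 3.x uppercase names; the values are guesses to tune, not recommendations from the docs):
# settings sketch for a broker plan with very few allowed connections
BROKER_POOL_LIMIT = 0            # disable the producer connection pool; open connections on demand
BROKER_CONNECTION_TIMEOUT = 30   # don't hold connection attempts open for long
BROKER_HEARTBEAT = 30            # let CloudAMQP reap dead connections sooner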
Here's a related, unanswered question: How can I minimise connections with django-celery when using CloudAMQP through dotcloud?

Practical examples on redis and celery

I am new to Redis and Celery. I have gone through the basic tutorials for both, but I do not understand how to use them for task scheduling jobs.
I am unable to start with the scripting part. I do not understand how to write a script to create a queue, run the workers, etc. I need a practical example.
So here's a canonical example of how Celery can run with Redis (let the script filename be mytasks.py):
from celery import Celery

celery = Celery('tasks', broker='redis://localhost:6379/0')

@celery.task
def add(x, y):
    return x + y
As you can see, the broker argument is set to use the Redis instance installed on your local machine. The next thing is to start the Celery worker:
$ celery -A mytasks worker --loglevel=info
Once your Celery worker has started, you can run your task just by importing the mytasks script, e.g. from the Python interpreter's interactive mode:
>>> from mytasks import add
>>> add.delay(1, 1)
<AsyncResult: ...>
delay() returns an AsyncResult handle; after a moment the computed result, '2', will appear in the worker's console output.
That's a basic example of how you can set up your task execution environment.
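If you also want to read the result back in the calling process (instead of only seeing it in the worker log), a result backend is needed as well; a minimal sketch reusing the same local Redis instance as the backend:
from celery import Celery

# same broker as above, with Redis doubling as the result backend
celery = Celery('tasks',
                broker='redis://localhost:6379/0',
                backend='redis://localhost:6379/0')

@celery.task
def add(x, y):
    return x + y

# from the interpreter, with a worker running:
#   >>> from mytasks import add
#   >>> add.delay(1, 1).get(timeout=10)
#   2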
