I have a Python big data project in which I'm trying to run my tasks using Celery with a Redis broker.
The thing is, I need three different queues for the three different tasks I'm sending to Celery.
This is the configuration I've made to run the three tasks at once, but they all use a single queue and run one after the other, so it takes a lot of time.
from celery import Celery
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)
app = Celery("tasks", broker="redis://localhost:6379")
@app.task()
def main(capital, gamma, alpha):
    ...

@app.task()
def gain(capital, gamma, alpha):
    ...

@app.task()
def lain(capital, gamma, alpha):
    ...
And to start the Celery workers I'm using these lines of code:
("celery -A task worker --loglevel=info -P eventlet --concurrency=10 -n worker1#%h" , shell=True)
("celery -A task worker --loglevel=info -P eventlet --concurrency=10 -n worker2#%h" , shell=True)
("celery -A task worker --loglevel=info -P eventlet --concurrency=10 -n worker3#%h" , shell=True)
This code runs my app with the three tasks just fine, but I need three independent queues, one for each task, so that all three tasks can run concurrently (in parallel).
This is what my Celery app looks like with the three tasks, but it's using only one queue, the default queue named celery.
-------------- worker1@EC2AMAZ-RTM8UD8 v5.2.3 (dawn-chorus)
--- ***** -----
-- ******* ---- Windows-10-10.0.20348-SP0 2022-03-10 12:59:07
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: tasks:0x248adf81e80
- ** ---------- .> transport: redis://localhost:6379//
- ** ---------- .> results: disabled://
- *** --- * --- .> concurrency: 10 (eventlet)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. task.gain
. task.lain
. task.main
[2022-03-10 12:59:07,461: INFO/MainProcess] Connected to redis://localhost:6379//
[2022-03-10 12:59:07,477: INFO/MainProcess] mingle: searching for neighbors
[2022-03-10 12:59:08,493: INFO/MainProcess] mingle: all alone
[2022-03-10 12:59:08,493: INFO/MainProcess] pidbox: Connected to redis://localhost:6379//.
[2022-03-10 12:59:08,493: INFO/MainProcess] worker1@EC2AMAZ-RTM8UD8 ready.
So please, can anyone help me define three different queues for the three different tasks in my Celery app? Any help would be very much appreciated :)
It should be as simple as adding -Q <queue name> to those three Celery workers that you run.
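For example, here is a minimal sketch, assuming your tasks live in a module named task (as the [tasks] list in your banner suggests) and using hypothetical queue names main_queue, gain_queue and lain_queue:
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379")

# Route each task to its own queue so they no longer share the default "celery" queue.
app.conf.task_routes = {
    "task.main": {"queue": "main_queue"},
    "task.gain": {"queue": "gain_queue"},
    "task.lain": {"queue": "lain_queue"},
}
Then start one worker per queue, each consuming only its own queue:
celery -A task worker --loglevel=info -P eventlet --concurrency=10 -Q main_queue -n worker1@%h
celery -A task worker --loglevel=info -P eventlet --concurrency=10 -Q gain_queue -n worker2@%h
celery -A task worker --loglevel=info -P eventlet --concurrency=10 -Q lain_queue -n worker3@%h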
Related
I have a Celery worker running with Redis as the broker.
Starting the worker process gives me this:
celery -A celeryworker worker --loglevel=INFO
-------------- celery@cd38f5e26c28 v5.2.1 (dawn-chorus)
--- ***** -----
-- ******* ---- Linux-5.10.25-linuxkit-x86_64-with-glibc2.28 2021-12-14 00:22:02
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: myapp:0x7f96dd51af10
- ** ---------- .> transport: redis://redis-container:6379/1
- ** ---------- .> results: disabled://
- *** --- * --- .> concurrency: 6 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> 0 exchange=0(direct) key=0
[tasks]
. app.tasks.bye
. app.tasks.printme
[2021-12-14 00:22:02,708: INFO/MainProcess] Connected to redis://redis-container:6379/1
[2021-12-14 00:22:02,717: INFO/MainProcess] mingle: searching for neighbors
[2021-12-14 00:22:03,740: INFO/MainProcess] mingle: all alone
[2021-12-14 00:22:03,762: INFO/MainProcess] celery@cd38f5e26c28 ready.
[2021-12-14 00:22:23,332: INFO/MainProcess] Task app.task.bye[7e28e6a0-8aaa-4609-bd85-9312e91cb355] received
[2021-12-14 00:23:23,326: INFO/ForkPoolWorker-3] Task app.tasks.bye[7e28e6a0-8aaa-4609-bd85-9312e91cb355] succeeded in 60.061842500006605s: 'the text was byebye!!'
This is what I can see in redis right after starting the celery workers:
127.0.0.1:6379[1]> keys *
1) "_kombu.binding.0"
2) "_kombu.binding.celery.pidbox"
3) "_kombu.binding.celeryev"
Even if I set a long timer on my tasks (sleep(60)), so that each task takes 60 seconds to run, I still don't see anything in my Redis container.
mget <key> returns nil for all keys above.
I was expecting to see incoming messages in the form of an ID or something in Redis (I can see messages when I use SQS as the broker, but not with Redis).
Your messages are picked up immediately by your worker.
To actually see where Redis stores them, stop your worker process and then publish a task (you can execute task.delay(*args, **kwargs) from a Python shell).
You'll find your messages under the celery key in Redis.
(screenshot: Celery keys in Redis)
Note: check your Redis broker URL and which logical database it is using.
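For instance, a minimal sketch with redis-py (assuming, as in your banner, the broker is logical database 1 on a host named redis-container) to inspect pending messages while the worker is stopped:
import redis

# Connect to the same logical database your broker URL points at.
r = redis.Redis(host="redis-container", port=6379, db=1)

# With the worker stopped, queued messages sit in a Redis list named after
# the queue (the default queue is "celery").
print(r.llen("celery"))           # number of pending messages
print(r.lrange("celery", 0, -1))  # raw message payloads
Once the worker starts again, that list is drained almost immediately, which is why it usually looks empty.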
I want to create multiple queues for different tasks, for example an emailqueue for sending emails and a pipedrivequeue to sync tasks with the Pipedrive API, so emails don't have to wait until all Pipedrive syncs are done, and vice versa.
I'm new to routing and I tried two approaches to create the queues, but neither of them seems to work.
This is my preferred approach. I tried to define the queue inside the @task decorator:
@task(bind=True, queue='pipedrivequeue')
def backsync_lead(self, lead_id):
    ...
settings.py
CELERY_ROUTES = { # tried CELERY_TASK_ROUTES too
'pipedrive.tasks.*': {'queue': 'pipedrivequeue'},
...
}
In both cases, when I run celery worker manually, I see only one default celery queue.
(project) milano@milano-PC:~/PycharmProjects/project$ celery -A project.celery worker -l info
-------------- celery@milano-PC v4.2.2 (windowlicker)
---- **** -----
--- * *** * -- Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic 2019-04-12 17:17:05
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: project:0x7f3b47f66cf8
- ** ---------- .> transport: redis://localhost:6379//
- ** ---------- .> results: redis://localhost/
- *** --- * --- .> concurrency: 12 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. project.apps.apis.pipedrive.tasks.backsync_all_stages
. project.apps.apis.pipedrive.tasks.backsync_lead
As you can see in these lines:
-------------- [queues]
.> celery exchange=celery(direct) key=celery
There is apparently just one queue. I want to use this default queue only for tasks that have no queue specified.
Do you know where the problem is?
EDIT
(project) milano@milano-PC:~/PycharmProjects/project$ celery inspect active_queues
Error: No nodes replied within time constraint.
You need to run a worker with the queue named explicitly; then Django will be able to feed tasks into that queue:
celery worker -A project.celery -l info # Default queue worker
celery worker -A project.celery -l info -Q pipedrivequeue # pipedrivequeue worker
celery worker -A project.celery -l info -Q testqueue # testqueue worker
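Once those workers are running, tasks routed to pipedrivequeue via your routes setting will only be picked up by the pipedrivequeue worker. As a quick check you can also name the queue at call time; a sketch, assuming the task from your first approach:
from pipedrive.tasks import backsync_lead

# Send this call explicitly to the pipedrive worker's queue,
# bypassing the routing table for this one invocation.
# 42 is just a placeholder lead id.
backsync_lead.apply_async(args=[42], queue='pipedrivequeue')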
According to the Celery documentation, the -Q/--queues command line option can be used for:
-Q, --queues
List of queues to enable for this worker, separated by comma. By default all configured queues are enabled. Example: -Q video,image
However, I don't understand what is meant by configured queues here. Does this mean all queues known to Celery, including the default one? Or only the ones defined in the task_queues setting? Does the task_create_missing_queues option affect this?
If you haven't configured anything, the worker will consume from the default celery queue, as you can see from the logs:
celery worker -A t
-------------- celery@pavilion v4.0.2 (latentcall)
---- **** -----
--- * *** * -- Linux-4.4.0-79-generic-x86_64-with-Ubuntu-16.04-xenial 2017-06-09 10:39:14
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: tasks:0x7f15cf9cdfd0
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: rpc://
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
You can also configure Celery to consume from a set of queues by default, like this:
from celery import Celery
from kombu import Queue
app = Celery(broker='amqp://guest@localhost//', backend='rpc')
app.conf.task_queues = (Queue('foo'), Queue('bar'))
Now all workers will consume from the foo and bar queues by default.
-------------- [queues]
.> bar exchange=celery(direct) key=celery
.> foo exchange=celery(direct) key=celery
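To actually send work to one of those queues, you can name the queue at call time; a sketch, assuming a task called add is registered on this app:
# Each call goes to the named queue and is handled by any worker consuming it.
add.apply_async(args=(2, 2), queue='foo')
add.apply_async(args=(3, 3), queue='bar')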
I was facing an issue where, when I executed
$ celery -A myCeleryConfig worker -Q myQueue2
I would get this error:
celery.exceptions.ImproperlyConfigured: Trying to select queue subset of ['myQueue2'], but queue 'myQueue2' isn't defined in the `task_queues` setting.
The documentation for the task_queues setting was unclear to me. It does state
If you really want to configure advanced routing, this setting should be a list of kombu.Queue objects the worker will consume from.
I wasn't sure what it meant by this. No code examples are provided in the documentation. But thanks to @Chillar's response, I found that configuring
app.conf.task_queues = (Queue('myQueue1'), Queue('myQueue2'))
solved the issue. I now see
-------------- [queues]
.> myQueue1 exchange=myQueue1 key=myQueue1
.> myQueue2 exchange=myQueue2 key=myQueue2
when I start the worker, indicating the queues are now registered.
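For completeness, a minimal sketch of the whole configuration (the broker URL here is an assumption):
from celery import Celery
from kombu import Queue

app = Celery('myCeleryConfig', broker='redis://localhost:6379/0')

# Declare both queues explicitly so -Q can select a subset of them.
app.conf.task_queues = (Queue('myQueue1'), Queue('myQueue2'))
With that in place, celery -A myCeleryConfig worker -Q myQueue2 starts without the ImproperlyConfigured error.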
I'm new to Celery. I'm trying to properly configure Celery with my Django project. To test whether Celery works, I've created a periodic task which should print "periodic_task" every 2 seconds. Unfortunately it doesn't work, but there is no error.
1. Installed rabbitmq
2. Project/project/celery.py
from __future__ import absolute_import
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project.settings')
from django.conf import settings # noqa
app = Celery('project')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
@app.task(bind=True)
def myfunc():
print 'periodic_task'
@app.task(bind=True)
def debudeg_task(self):
print('Request: {0!r}'.format(self.request))
3. Project/project/__init__.py
from __future__ import absolute_import
from .celery import app as celery_app
4. Settings.py
INSTALLED_APPS = [
'djcelery',
...]
...
...
CELERYBEAT_SCHEDULE = {
'schedule-name': {
'task': 'project.celery.myfunc', # We are going to create a email_sending_method later in this post.
'schedule': timedelta(seconds=2),
},
}
And before running python manage.py, I run celery -A project worker -l info.
I still can't see any "periodic_task" printed in the console every 2 seconds... Do you know what to do?
EDIT CELERY CONSOLE:
-------------- celery@Milwou_NB v3.1.23 (Cipater)
---- **** -----
--- * *** * -- Windows-8-6.2.9200
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: dolava:0x33d1350
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: disabled://
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> celery exchange=celery(direct) key=celery
[tasks]
. project.celery.debudeg_task
. project.celery.myfunc
EDIT:
After changing worker to beat, it seems to work. Something is happening every 2 seconds (I changed it to 5 seconds), but I can't see the results of the task. (I can put anything into CELERYBEAT_SCHEDULE, even a wrong path, and it doesn't raise any error..)
I changed the myfunc code to:
@app.task(bind=True)
def myfunc():
# notifications.send_message_to_admin('sdaa','dsadasdsa')
with open('text.txt','a') as f:
f.write('sa')
But I can't see text.txt anywhere.
> celery -A dolava beat -l info
celery beat v3.1.23 (Cipater) is starting.
__ - ... __ - _
Configuration ->
. broker -> amqp://guest:**@localhost:5672//
. loader -> celery.loaders.app.AppLoader
. scheduler -> djcelery.schedulers.DatabaseScheduler
. logfile -> [stderr]@%INFO
. maxinterval -> now (0s)
[2016-10-26 17:46:50,135: INFO/MainProcess] beat: Starting...
[2016-10-26 17:46:50,138: INFO/MainProcess] Writing entries...
[2016-10-26 17:46:51,433: INFO/MainProcess] DatabaseScheduler: Schedule changed.
[2016-10-26 17:46:51,433: INFO/MainProcess] Writing entries...
[2016-10-26 17:46:51,812: INFO/MainProcess] Scheduler: Sending due task schedule-name (dolava_app.tasks.myfunc)
[2016-10-26 17:46:51,864: INFO/MainProcess] Writing entries...
[2016-10-26 17:46:57,138: INFO/MainProcess] Scheduler: Sending due task schedule-name (dolava_app.tasks.myfunc)
[2016-10-26 17:47:02,230: INFO/MainProcess] Scheduler: Sending due task schedule-name (dolava_app.tasks.myfunc)
Try to run
$ celery -A project beat -l info
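Note that beat only schedules and publishes the tasks; a worker still has to be running to execute them, so keep your celery -A project worker -l info process alive as well. For development you can run both in a single process with the embedded beat option:
celery -A project worker -B -l info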
After looking at a lot of articles about chord callbacks not executing and trying their solutions, I am still unable to get it to work. In fact, the chord_unlock method is also not getting executed for some reason.
celery.py
from __future__ import absolute_import
from celery import Celery
app = Celery('sophie',
broker='redis://localhost:6379/2',
backend='redis://localhost:6379/2',
include=['sophie.lib.chord_test'])
app.conf.update(
CELERY_ACCEPT_CONTENT=["json"],
CELERY_TASK_SERIALIZER="json",
CELERY_TRACK_STARTED=True,
CELERYD_PREFETCH_MULTIPLIER=1, # NO PREFETCHING OF TASKS
BROKER_TRANSPORT_OPTIONS = {
'priority_steps': [0, 1] # ALLOW ONLY 2 TASK PRIORITIES
}
)
if __name__ == '__main__':
app.start()
chord_test.py
from __future__ import absolute_import
from sophie.celery import app
from celery import chord
@app.task(name='sophie.lib.add')
def add(x, y):
return x + y
@app.task(name='sophie.lib.tsum')
def tsum(numbers):
return sum(numbers)
if __name__ == '__main__':
tasks = [add.s(100, 100), add.s(200, 200)]
chord(tasks, tsum.s()).apply_async()
The output of my worker logfile is as follows
$ celery worker -l info --app=sophie.celery -n worker1.%h
-------------- celery@worker1.vagrant-ubuntu-11 v3.1.6 (Cipater)
---- **** -----
--- * *** * -- Linux-3.0.0-12-server-x86_64-with-Ubuntu-11.10-oneiric
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> broker: redis://localhost:6379/2
- ** ---------- .> app: sophie:0x3554250
- ** ---------- .> concurrency: 1 (prefork)
- *** --- * --- .> events: OFF (enable -E to monitor this worker)
-- ******* ----
--- ***** ----- [queues]
-------------- .> celery exchange=celery(direct) key=celery
[tasks]
. sophie.lib.add
. sophie.lib.tsum
[2013-12-12 19:37:26,499: INFO/MainProcess] Connected to redis://localhost:6379/2
[2013-12-12 19:37:26,506: INFO/MainProcess] mingle: searching for neighbors
[2013-12-12 19:37:27,512: INFO/MainProcess] mingle: all alone
[2013-12-12 19:37:27,527: WARNING/MainProcess] celery@worker1.vagrant-ubuntu-11 ready.
[2013-12-12 19:37:29,723: INFO/MainProcess] Received task: sophie.lib.add[b7d504c1-217f-43a9-b57e-86f0fcbdbe22]
[2013-12-12 19:37:29,734: INFO/MainProcess] Task sophie.lib.add[b7d504c1-217f-43a9-b57e-86f0fcbdbe22] succeeded in 0.00976990400522s: 200
[2013-12-12 19:37:29,735: INFO/MainProcess] Received task: sophie.lib.add[eb01a73e-f6c8-401d-8049-6cdbc5f0bd90]
[2013-12-12 19:37:29,737: INFO/MainProcess] Task sophie.lib.add[eb01a73e-f6c8-401d-8049-6cdbc5f0bd90] succeeded in 0.00144650500442s: 400
There is no chord_unlock being called at all. Some more output to give further context:
$ sudo pip freeze | egrep 'celery|kombu|billiard'
billiard==3.3.0.12
celery==3.1.6
kombu==3.0.7
$ uname -a
Linux vagrant-ubuntu-11 3.0.0-12-server #20-Ubuntu SMP Fri Oct 7 16:36:30 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
$ redis-server --version
Redis server version 2.2.11 (00000000:0)
The chords documentation uses the following syntax for chord:
result = chord(header)(callback)
So you could do:
chord(tasks)(tsum.s())
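For example, a small sketch using the tasks from your chord_test.py (assuming a working result backend, see the note below):
from celery import chord
from sophie.lib.chord_test import add, tsum

# The header tasks run first; tsum is then called with the list of their results.
result = chord([add.s(100, 100), add.s(200, 200)])(tsum.s())
print(result.get(timeout=10))  # -> 600 once the callback has run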
There is an Important Notes section under the chords documentation that says:
Tasks used within a chord must not ignore their results. In practice this means that you must enable a CELERY_RESULT_BACKEND in order to use chords. Additionally, if CELERY_IGNORE_RESULT is set to True in your configuration, be sure that the individual tasks to be used within the chord are defined with ignore_result=False. This applies to both Task subclasses and decorated tasks.
Even when I followed these recommendations the chord callback was not executed, but this might help others.
EDIT:
Found what it was in my case! The chord doesn't fully support specifying a queue name in all cases:
https://github.com/celery/celery/issues/2085
This happened to me when my task signatures did not match the arguments provided by the flow; accepting *args might help debug such an issue.
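For example, a hypothetical debugging callback that accepts anything and logs what it actually received can make such a mismatch visible:
from sophie.celery import app

@app.task(name='sophie.lib.tsum_debug')
def tsum_debug(*args, **kwargs):
    # Log whatever the chord delivers; if this differs from what the real
    # callback's signature expects, that explains why it never fires.
    print('tsum_debug got args=%r kwargs=%r' % (args, kwargs))
    return args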