celery with redis: mixing workers with different concurrency - python

I use the following scripts to distribute tasks to two workers on different nodes.
celery_call.py:
from celery_test import add
import time
results = []
for i in range(20):
    results.append(add.delay(i))
for result in results:
    timeStart = time.time()
    resultValue = result.get(timeout=10)
    timePassed = time.time() - timeStart
    print(timePassed, resultValue)
celery_test.py:
from celery import Celery
app = Celery('celery_test', backend='redis://ip', broker='redis://ip')
@app.task
def add(x):
    import time
    time.sleep(2)
    return x
I have two workers running on two different nodes. First node:
celery -A celery_test worker --concurrency 1 -l INFO
and second node:
celery -A celery_test worker --concurrency 10 -l INFO
The tasks are distributed and completed within a few seconds.
In 10 seconds on first node:
[2017-05-27 13:46:22,529: INFO/MainProcess] Received task: celery_test.add[2d9d592f-391f-4e1f-8dd2-5e50e5977c81]
[2017-05-27 13:46:22,531: INFO/MainProcess] Received task: celery_test.add[e5c20ae9-92d7-4811-9b54-4efd3707bb5f]
[2017-05-27 13:46:22,533: INFO/MainProcess] Received task: celery_test.add[bb9b50bc-1c5d-4ede-abfb-9591f53b3912]
[2017-05-27 13:46:22,535: INFO/MainProcess] Received task: celery_test.add[b195fbb4-2683-461e-aee7-4715a1387eb6]
[2017-05-27 13:46:22,537: INFO/MainProcess] Received task: celery_test.add[c8ce9d10-f03a-4585-9ff8-0d49cfd2e8b2]
[2017-05-27 13:46:24,538: INFO/PoolWorker-1] Task celery_test.add[2d9d592f-391f-4e1f-8dd2-5e50e5977c81] succeeded in 2.0077112819999456s: 1
[2017-05-27 13:46:26,543: INFO/PoolWorker-1] Task celery_test.add[e5c20ae9-92d7-4811-9b54-4efd3707bb5f] succeeded in 2.0030374974012375s: 3
[2017-05-27 13:46:28,547: INFO/PoolWorker-1] Task celery_test.add[bb9b50bc-1c5d-4ede-abfb-9591f53b3912] succeeded in 2.0030434764921665s: 5
[2017-05-27 13:46:30,551: INFO/PoolWorker-1] Task celery_test.add[b195fbb4-2683-461e-aee7-4715a1387eb6] succeeded in 2.0029842611402273s: 7
[2017-05-27 13:46:32,555: INFO/PoolWorker-1] Task celery_test.add[c8ce9d10-f03a-4585-9ff8-0d49cfd2e8b2] succeeded in 2.0029691718518734s: 9
In 4 seconds on second node:
[2017-05-27 13:46:22,528: INFO/MainProcess] Received task: celery_test.add[402fb858-d9ec-4565-ab71-fbf4ec531787]
[2017-05-27 13:46:22,530: INFO/MainProcess] Received task: celery_test.add[311fc7ed-e44a-4119-a0fa-c6849574723e]
[2017-05-27 13:46:22,532: INFO/MainProcess] Received task: celery_test.add[af54e423-651b-4b01-a3d1-26ead5ae6af1]
[2017-05-27 13:46:22,534: INFO/MainProcess] Received task: celery_test.add[29234f7f-f841-44c2-94fb-b491a074318b]
[2017-05-27 13:46:22,537: INFO/MainProcess] Received task: celery_test.add[1a638710-810a-422d-8f2f-d554af3c5a92]
[2017-05-27 13:46:22,538: INFO/MainProcess] Received task: celery_test.add[5b1a6863-1b62-4927-849b-e04e2d34ce7c]
[2017-05-27 13:46:22,540: INFO/MainProcess] Received task: celery_test.add[f4cd393a-2f02-48dd-b27f-d73806e154da]
[2017-05-27 13:46:22,543: INFO/MainProcess] Received task: celery_test.add[da8241bf-ce4e-4fe6-bd65-91350da8b163]
[2017-05-27 13:46:22,544: INFO/MainProcess] Received task: celery_test.add[4892b31a-e488-4011-86e7-d55eb941cf1f]
[2017-05-27 13:46:22,545: INFO/MainProcess] Received task: celery_test.add[c883b6ec-3842-4b50-bfab-35613f1724ed]
[2017-05-27 13:46:22,548: INFO/MainProcess] Received task: celery_test.add[1b021f4c-b41d-46a8-8548-7016539e8a8b]
[2017-05-27 13:46:22,549: INFO/MainProcess] Received task: celery_test.add[ae5d9b7d-0fa2-493b-aa0b-0b13b3764fdf]
[2017-05-27 13:46:22,551: INFO/MainProcess] Received task: celery_test.add[af0d27fe-394f-4fbb-821e-95acbb99324c]
[2017-05-27 13:46:22,552: INFO/MainProcess] Received task: celery_test.add[0b91cc9d-63c4-4a39-9d03-35ac474252bc]
[2017-05-27 13:46:22,553: INFO/MainProcess] Received task: celery_test.add[1be5a881-064c-4cf8-8a18-10c2fe0400ed]
[2017-05-27 13:46:24,538: INFO/PoolWorker-1] Task celery_test.add[402fb858-d9ec-4565-ab71-fbf4ec531787] succeeded in 2.0080740470439196s: 0
[2017-05-27 13:46:24,540: INFO/PoolWorker-3] Task celery_test.add[311fc7ed-e44a-4119-a0fa-c6849574723e] succeeded in 2.007692627608776s: 2
[2017-05-27 13:46:24,543: INFO/PoolWorker-6] Task celery_test.add[29234f7f-f841-44c2-94fb-b491a074318b] succeeded in 2.007782282307744s: 6
[2017-05-27 13:46:24,543: INFO/PoolWorker-5] Task celery_test.add[af54e423-651b-4b01-a3d1-26ead5ae6af1] succeeded in 2.007910629734397s: 4
[2017-05-27 13:46:24,546: INFO/PoolWorker-8] Task celery_test.add[1a638710-810a-422d-8f2f-d554af3c5a92] succeeded in 2.0075698774307966s: 8
[2017-05-27 13:46:24,548: INFO/PoolWorker-2] Task celery_test.add[f4cd393a-2f02-48dd-b27f-d73806e154da] succeeded in 2.0072230715304613s: 11
[2017-05-27 13:46:24,548: INFO/PoolWorker-10] Task celery_test.add[5b1a6863-1b62-4927-849b-e04e2d34ce7c] succeeded in 2.007256705313921s: 10
[2017-05-27 13:46:24,552: INFO/PoolWorker-7] Task celery_test.add[da8241bf-ce4e-4fe6-bd65-91350da8b163] succeeded in 2.0082139261066914s: 12
[2017-05-27 13:46:24,554: INFO/PoolWorker-9] Task celery_test.add[c883b6ec-3842-4b50-bfab-35613f1724ed] succeeded in 2.0077442210167646s: 14
[2017-05-27 13:46:24,554: INFO/PoolWorker-4] Task celery_test.add[4892b31a-e488-4011-86e7-d55eb941cf1f] succeeded in 2.007783567532897s: 13
[2017-05-27 13:46:26,542: INFO/PoolWorker-1] Task celery_test.add[1b021f4c-b41d-46a8-8548-7016539e8a8b] succeeded in 2.002950184047222s: 15
[2017-05-27 13:46:26,544: INFO/PoolWorker-3] Task celery_test.add[ae5d9b7d-0fa2-493b-aa0b-0b13b3764fdf] succeeded in 2.002891855314374s: 16
[2017-05-27 13:46:26,547: INFO/PoolWorker-5] Task celery_test.add[af0d27fe-394f-4fbb-821e-95acbb99324c] succeeded in 2.002899706363678s: 17
[2017-05-27 13:46:26,547: INFO/PoolWorker-6] Task celery_test.add[0b91cc9d-63c4-4a39-9d03-35ac474252bc] succeeded in 2.002899182960391s: 18
[2017-05-27 13:46:26,550: INFO/PoolWorker-8] Task celery_test.add[1be5a881-064c-4cf8-8a18-10c2fe0400ed] succeeded in 2.0029856264591217s: 19
However, the retrieval of the results is delayed and takes in total 20 seconds:
1.9911870956420898 0
0.0006098747253417969 1
0.0011210441589355469 2
2.003366231918335 3
0.0006439685821533203 4
2.0034918785095215 5
1.0012683868408203 6
1.00254487991333 7
1.001213788986206 8
1.002840518951416 9
1.0012362003326416 10
1.001204490661621 11
1.00126314163208 12
1.0012261867523193 13
1.0012249946594238 14
1.0012695789337158 15
1.0013458728790283 16
1.0013868808746338 17
1.0014445781707764 18
1.001399278640747 19
I have the following questions:
1. How do I set up Celery for an optimal distribution of jobs? E.g. a good solution would be to assign 1+1 tasks to the first node and 10+8 tasks to the second node.
2. IMHO the retrieval of results should take as long as the slowest worker, i.e. 10 seconds. Why does it take much longer? How can I speed it up?
3. With the first worker turned off, the second worker needs 4 seconds for 10+10 tasks (ok), and retrieving the results takes 5 seconds. Why do I still lose a second?
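The distribution suggested in the first question can be checked with a little arithmetic. The sketch below is not Celery code: `node_time` and `total_time` are hypothetical helpers, and they assume that every task takes exactly 2 seconds and that a node works through its tasks in batches of `concurrency`.

```python
import math

TASK_SECONDS = 2  # each add() call sleeps for 2 seconds


def node_time(tasks, concurrency, task_seconds=TASK_SECONDS):
    """Seconds a node needs to finish `tasks` tasks in batches of `concurrency`."""
    return math.ceil(tasks / concurrency) * task_seconds


def total_time(split):
    """`split` is a list of (tasks, concurrency) pairs; the slowest node decides."""
    return max(node_time(tasks, conc) for tasks, conc in split)


observed = [(5, 1), (15, 10)]  # distribution seen in the logs above
wished = [(2, 1), (18, 10)]    # the 1+1 / 10+8 distribution from the question

print(total_time(observed))  # 10
print(total_time(wished))    # 4
```

Under these assumptions the observed 5/15 split is bound by the single-process node (10 seconds), while the 2/18 split would let both nodes finish after 4 seconds.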

The advice in "Understanding celery task prefetching" reduces the runtime to 4 or 5 seconds. That is still sometimes 1 second too long, but questions 1 and 2 are solved.
Since it seems to be a different problem, I asked question 3 again separately: "python celery - get() is delayed".
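For readers who do not follow the link, prefetching is usually limited with the two settings below. This is a sketch of a configuration module, assuming Celery 4 or later (lowercase setting names); the module name `celeryconfig` is chosen here for illustration, and the values are a common starting point rather than something quoted from the linked question.

```python
# celeryconfig.py (hypothetical module name)
# Load it with: app.config_from_object('celeryconfig')

# Each worker process reserves at most one task beyond the one it is running
# (the default multiplier is 4), so a slow node holds fewer tasks back.
worker_prefetch_multiplier = 1

# Acknowledge a task only after it has finished; together with the multiplier
# above, a worker then reserves only as many tasks as it has processes.
# Tasks should be idempotent, because an unacknowledged task can be redelivered.
task_acks_late = True
```

With fewer tasks reserved per process, the node started with `--concurrency 1` can no longer queue up several 2-second tasks while the larger node has idle processes.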

Related

Sending a chain tasks will run out of order when I want to send task to startup of a worker

I want to send a chain of tasks at worker startup, as in this answer: https://stackoverflow.com/a/14589445/3922534, but the tasks run out of order.
Logs from worker
[2022-07-12 20:51:47,369: INFO/MainProcess] Task task.add_newspapers[5de1f446-65af-472a-a4b6-d9752142b588] received
[2022-07-12 20:51:47,372: WARNING/MainProcess] Now Runing Newspaper Function
[2022-07-12 20:51:47,408: INFO/MainProcess] Task task.check_tasks_are_created[33d6b9d1-660b-4a80-a726-6f167e246480] received
[2022-07-12 20:51:47,412: WARNING/MainProcess] Now Runing Podcast Function
[2022-07-12 20:51:47,427: INFO/MainProcess] Task task.add_newspapers[5de1f446-65af-472a-a4b6-d9752142b588] succeeded in 0.0470000000204891s: 'Now Runing Podcast Function'
[2022-07-12 20:51:47,432: INFO/MainProcess] Task task.add_yt_channels[26179491-2632-46bd-95c1-9e9dbb9e8130] received
[2022-07-12 20:51:47,433: WARNING/MainProcess] None
[2022-07-12 20:51:47,457: INFO/MainProcess] Task task.check_tasks_are_created[33d6b9d1-660b-4a80-a726-6f167e246480] succeeded in 0.0470000000204891s: None
[2022-07-12 20:51:47,463: INFO/MainProcess] Task task.add_podcasts[ad94a119-c6b2-475a-807b-b1a73bef589e] received
[2022-07-12 20:51:47,468: WARNING/MainProcess] Now Runing Check Tasks are Created Function
[2022-07-12 20:51:47,501: INFO/MainProcess] Task task.add_yt_channels[26179491-2632-46bd-95c1-9e9dbb9e8130] succeeded in 0.06299999984912574s: 'Now Runing Check Tasks are Created Function'
[2022-07-12 20:51:47,504: INFO/MainProcess] Task task.add_podcasts[ad94a119-c6b2-475a-807b-b1a73bef589e] succeeded in 0.030999999959021807s: 'Now Runing Yotube Channels Function'
This is how I send the task:
@worker_ready.connect
def at_start(sender, **k):
    with sender.app.connection() as conn:
        #sender.app.send_task(name='task.print_word', args=["I Send Task On Startup"],connection=conn,)
        #ch = [add_newspapers.s(),add_podcasts.s(),add_yt_channels.s(),check_tasks_are_created.s()]
        ch = [
            signature("task.add_podcasts"),
            signature("task.add_yt_channels"),
            signature("task.check_tasks_are_created"),
        ]
        sender.app.send_task(name='task.add_newspapers',chain=ch,connection=conn,)
Then I tried running the chain the usual way, with apply_async(), but then it runs on every worker. I want it to run just once, on one worker.
@worker_ready.connect
def at_start(sender, **k):
    chain(add_newspapers.s(),add_podcasts.s(),add_yt_channels.s(),check_tasks_are_created.s()).apply_async()
Then I tried to identify the worker before calling .apply_async(), but the if statement never matches.
Documentation: https://docs.celeryq.dev/en/latest/userguide/signals.html#celeryd-init
celery -A celery_app.celery worker --loglevel=INFO -P gevent --concurrency=40 -n celeryworker1
@worker_ready.connect
def at_start(sender, **k):
    print("This is host name ", sender.hostname)
    if sender == "celery@celeryworker1":
        with sender.app.connection() as conn:
            chain(add_newspapers.s(),add_podcasts.s(),add_yt_channels.s(),check_tasks_are_created.s()).apply_async()
Am I doing something wrong or is it just a bug?
Since a task doesn't need the return value of the previous task, you can run it as:
chain(add_newspapers.si(),add_podcasts.si(),add_yt_channels.si(),check_tasks_are_created.si()).apply_async()
(change the calls from s() to si()).
You can read about immutability here.
The @worker_ready.connect handler runs on every worker. So, if you have 10 workers, you will send the same task 10 times, once for each worker that emits the "worker_ready" signal. Is this intentional?
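One way to send the chain only once is to compare the worker's node name, rather than the sender object itself, with the expected name. The sketch below assumes the worker was started with `-n celeryworker1`, so that `sender.hostname` is `celery@celeryworker1`. `is_designated` and `send_startup_chain` are hypothetical names, and a stand-in object replaces the real sender so that the snippet runs without a broker.

```python
from types import SimpleNamespace

# Assumption: node name produced by "-n celeryworker1" on the worker command line.
DESIGNATED_WORKER = "celery@celeryworker1"


def is_designated(sender, designated=DESIGNATED_WORKER):
    """True only for the worker whose node name matches `designated`."""
    return getattr(sender, "hostname", None) == designated


def at_start(sender, send_startup_chain, **kwargs):
    """Body of a worker_ready handler: dispatch once, from one worker only."""
    if is_designated(sender):
        # In the real handler this would be the chain(...).apply_async() call.
        send_startup_chain()


# Stand-ins for two workers receiving the worker_ready signal.
sent = []
at_start(SimpleNamespace(hostname="celery@celeryworker1"), lambda: sent.append("chain"))
at_start(SimpleNamespace(hostname="celery@celeryworker2"), lambda: sent.append("chain"))
print(sent)  # ['chain']
```

Comparing `sender` directly with a string never matches, because the sender is an object and not its name.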

How can I run multiple Celery tasks in parallel (by using group)?

I am new to Celery. I want to run demo_task in parallel, but it runs tasks sequentially instead of in parallel. Please let me know if I did something wrong.
import time
from celery import Celery
from celery import chain, group, chord, chunks
import pandas as pd
CONFIG = {
    'BROKER_URL': 'redis://localhost:6379/0',
    'CELERY_RESULT_BACKEND': 'redis://localhost:6379/0',
}
app = Celery()
app.config_from_object(CONFIG)
@app.task(name='demo_task')
def demo_task(x, y):
    print("demo_task", x, y)
    pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]}).to_csv(f"demo{x}.csv", index=False)
    print("saved")
    time.sleep(8)
def run_task():
    print("start chain_call")
    t = group(*[demo_task.signature((3, 3)),
                demo_task.signature((4, 4)),
                demo_task.signature((5, 5))]
              ).apply_async()
if __name__ == '__main__':
    run_task()
[Command]
celery -A celery_demo worker -l info --pool=solo --purge
[Log]
[2022-04-22 16:29:51,668: WARNING/MainProcess] Please run `celery upgrade settings path/to/settings.py` to avoid these warnings and to allow a smoother upgrade to Celery 6.0.
[2022-04-22 16:29:51,668: INFO/MainProcess] Connected to redis://localhost:6379/0
[2022-04-22 16:29:51,668: INFO/MainProcess] mingle: searching for neighbors
[2022-04-22 16:29:52,672: INFO/MainProcess] mingle: all alone
[2022-04-22 16:30:05,602: WARNING/MainProcess]
[2022-04-22 16:30:05,602: WARNING/MainProcess] 4
[2022-04-22 16:30:05,602: WARNING/MainProcess]
[2022-04-22 16:30:05,602: WARNING/MainProcess] 4
[2022-04-22 16:30:05,602: WARNING/MainProcess] saved
[2022-04-22 16:30:13,614: INFO/MainProcess] Task demo_task[c017c03e-b49d-4d54-85c5-4af57dd55908] succeeded in 8.016000000061467s: None
[2022-04-22 16:30:13,614: INFO/MainProcess] Task demo_task[d60071c6-4332-4ec1-88fd-3fce79c06ab5] received
[2022-04-22 16:30:13,614: WARNING/MainProcess] demo_task
[2022-04-22 16:30:13,614: WARNING/MainProcess]
[2022-04-22 16:30:13,614: WARNING/MainProcess] 5
[2022-04-22 16:30:13,614: WARNING/MainProcess]
[2022-04-22 16:30:13,614: WARNING/MainProcess] 5
[2022-04-22 16:30:13,614: WARNING/MainProcess] saved
[2022-04-22 16:30:21,634: INFO/MainProcess] Task demo_task[d60071c6-4332-4ec1-88fd-3fce79c06ab5] succeeded in 8.015000000130385s: None
How do you expect tasks to run in parallel if you use the "solo" pool?
Instead, start with the prefork concurrency (the default): celery -A celery_demo worker -l info -c 8
This will make Celery worker spawn 8 worker processes that can execute tasks in parallel. If your machine has more than 8 cores then you could increase that number from 8 to N where N is number of cores available on the host machine. I always go for N-1 to let the system have one more spare core for some other stuff.
Prefork concurrency is great for CPU-bound tasks. If your tasks are more about I/O, then give the "gevent" or "eventlet" concurrency type a try.
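The N-1 rule of thumb from this answer can be computed instead of hard-coded. A minimal sketch, assuming the `celery_demo` module name from the command above; `suggested_concurrency` is a hypothetical helper, not part of Celery.

```python
import os


def suggested_concurrency(cores=None):
    """Return cores - 1, but never less than 1."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, cores - 1)


# Build the worker command from the answer with the computed value.
command = f"celery -A celery_demo worker -l info -c {suggested_concurrency()}"
print(command)
```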
Modify your run_task function:
async def run_task():
    print("start chain_call")
    t = await group(*[demo_task.signature((3, 3)),
                      demo_task.signature((4, 4)),
                      demo_task.signature((5, 5))]
                    ).apply_async()

How to run a task periodically in celery?

I want to run a task every 10 seconds by celery periodic task. This is my code in celery.py:
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'DjangoCelery1.settings')
app = Celery('DjangoCelery1')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
@app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(10, test.s('hello'), name='add every 10')
@app.task
def test(arg):
    print(arg)
    with open("test.txt", "w") as myfile:
        myfile.write(arg)
Then I run it by the following command:
celery -A DjangoCelery1 beat -l info
It seems to run, and in the terminal I get the following output:
celery beat v4.4.2 (cliffs) is starting.
__ - ... __ - _
LocalTime -> 2020-04-26 15:56:48
Configuration ->
. broker -> amqp://guest:**@localhost:5672//
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@%INFO
. maxinterval -> 5.00 minutes (300s)
[2020-04-26 15:56:48,483: INFO/MainProcess] beat: Starting...
[2020-04-26 15:56:48,499: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:56:53,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:56:58,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:57:03,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:57:08,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:57:13,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
However, the task does not run: no message is printed and no text file is created.
What is the problem?
This is the beat process - now you need to run another process:
celery -A tasks worker ...
so that a worker can consume the tasks you're triggering via beat and handle them.
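The division of labour between the two processes can be pictured with a plain queue. This is a toy model, not Celery code: `beat_tick` and `worker_drain` are hypothetical names, and a `queue.Queue` stands in for the broker.

```python
import queue

broker = queue.Queue()  # stand-in for the message broker


def beat_tick(arg):
    """What beat does: publish a message saying the task is due."""
    broker.put(("test", arg))


def worker_drain():
    """What a worker does: take messages off the queue and run the task."""
    results = []
    while not broker.empty():
        name, arg = broker.get()
        results.append(f"{name} ran with {arg!r}")
    return results


# Beat alone: messages pile up, nothing is executed.
for _ in range(3):
    beat_tick("hello")
print(broker.qsize())  # 3

# Once a worker runs, the queued tasks are executed.
print(worker_drain())
print(broker.qsize())  # 0
```

This mirrors the log in the question: beat reports "Sending due task" again and again, but without a worker nothing ever calls the task function.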

Celery duplicate workers on long tasks

Each task is started once, but long Celery tasks (5-6 hours long) start to duplicate themselves approximately every hour, up to 4 copies (the concurrency parameter).
Logs:
[2016-08-19 07:43:08,505: INFO/MainProcess] Received task: doit[ed09d5fd-ba07-4cd5-96eb-7ae546bf94db]
[2016-08-19 07:45:44,067: INFO/MainProcess] Received task: doit[7cbc4633-0687-499f-876c-3298ffdf90f9]
[2016-08-19 08:41:16,611: INFO/MainProcess] Received task: doit[ed09d5fd-ba07-4cd5-96eb-7ae546bf94db]
[2016-08-19 08:48:36,623: INFO/MainProcess] Received task: doit[7cbc4633-0687-499f-876c-3298ffdf90f9]
Task code:
@task()
def doit(company_id, cls):
    p = cls.objects.get(id=company_id)
The Celery worker is started with --concurrency=4 -Ofair; the broker is Redis 3.0.5.
Python package versions:
Django==1.8.14
celery==3.1.18
redis==2.10.3

Task dependency in celery

Is such a task dependency possible? Tasks 1 and 2 can be executed in parallel. 1a can only be executed when 1 is finished, but 12b can be executed only when both 1 and 2 are finished.
I know that I can make 1 and 2 a group, and then group(1, 2) | 12b can be a chain, but how do I make 1a start right after 1 is finished, no matter what is going on with 2?
Yes, it is possible. Here is one way to do it: I used the Celery signal task_success to connect a function that triggers a Celery task.
my_tasks.py
from celery import Celery, task
from celery.signals import task_success
c = Celery('my_tasks')
@task
def t1():
    print('t1')
@task
def t2():
    print('t2')
@task
def t11():
    print('t11')
@task
def t12():
    print('t12')
def trigger_task(*args, **kwargs):
    # Called whenever t1 succeeds; queues t11 as the follow-up task.
    t11.s().delay()
task_success.connect(trigger_task, sender=t1)
Testing the task:
In [6]: complex_task = chain(group(t1.s(), t2.s())(), t12.si().delay())
Here is the log.
[2014-10-10 12:31:05,082: INFO/MainProcess] Received task: my_tasks.t1[25dc70d2-263b-4e70-b9f2-56478bfedab5]
[2014-10-10 12:31:05,083: INFO/MainProcess] Received task: my_tasks.t2[0b0c5eb6-78fa-4900-a605-5bfd55c0d309]
[2014-10-10 12:31:05,084: INFO/MainProcess] Received task: my_tasks.t12[b08c616d-7a2d-4f7b-9298-2c8324b747ff]
[2014-10-10 12:31:05,084: WARNING/Worker-1] t1
[2014-10-10 12:31:05,084: WARNING/Worker-4] t2
[2014-10-10 12:31:05,085: WARNING/Worker-3] t12
[2014-10-10 12:31:05,086: INFO/MainProcess] Task my_tasks.t2[0b0c5eb6-78fa-4900-a605-5bfd55c0d309] succeeded in 0.00143978099914s: None
[2014-10-10 12:31:05,086: INFO/MainProcess] Task my_tasks.t1[25dc70d2-263b-4e70-b9f2-56478bfedab5] succeeded in 0.00191083699974s: None
[2014-10-10 12:31:05,087: INFO/MainProcess] Task my_tasks.t12[b08c616d-7a2d-4f7b-9298-2c8324b747ff] succeeded in 0.00184817300033s: None
[2014-10-10 12:31:05,087: INFO/MainProcess] Received task: my_tasks.t11[a3e3f0c6-ac1f-4888-893a-02eee3b29585]
[2014-10-10 12:31:05,088: WARNING/Worker-2] t11
[2014-10-10 12:31:05,089: INFO/MainProcess] Task my_tasks.t11[a3e3f0c6-ac1f-4888-893a-02eee3b29585] succeeded in 0.000978848000159s: None
I tried to connect directly to task but it is throwing some error.
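Independently of how it is wired up in Celery, the ordering the question asks for can be written down as a small dependency graph. The sketch below uses the standard library's `graphlib` (Python 3.9+) only to show which tasks become runnable as others finish. The task names follow the question, and nothing here dispatches Celery tasks.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it has to wait for.
dependencies = {
    "1a": {"1"},        # 1a needs only 1
    "12b": {"1", "2"},  # 12b needs both 1 and 2
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

first = set(sorter.get_ready())
print(first)      # tasks 1 and 2 can start in parallel

sorter.done("1")  # task 1 finishes while 2 is still running
after_1 = set(sorter.get_ready())
print(after_1)    # 1a is released; 12b still waits for 2

sorter.done("2")
after_2 = set(sorter.get_ready())
print(after_2)    # now 12b is released
```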
