Celery duplicate workers on long tasks - Python

A task is run once, and long Celery tasks (5-6 hours long) start to duplicate themselves approximately every hour, up to 4 copies (the concurrency parameter).
Logs:
[2016-08-19 07:43:08,505: INFO/MainProcess] Received task: doit[ed09d5fd-ba07-4cd5-96eb-7ae546bf94db]
[2016-08-19 07:45:44,067: INFO/MainProcess] Received task: doit[7cbc4633-0687-499f-876c-3298ffdf90f9]
[2016-08-19 08:41:16,611: INFO/MainProcess] Received task: doit[ed09d5fd-ba07-4cd5-96eb-7ae546bf94db]
[2016-08-19 08:48:36,623: INFO/MainProcess] Received task: doit[7cbc4633-0687-499f-876c-3298ffdf90f9]
Task code:
@task()
def doit(company_id, cls):
    p = cls.objects.get(id=company_id)
The Celery worker is started with --concurrency=4 -Ofair; the broker is Redis 3.0.5.
Python package versions:
Django==1.8.14
celery==3.1.18
redis==2.10.3
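Given the Redis broker, a likely culprit is the transport's visibility timeout: Redis redelivers any message that is not acknowledged within visibility_timeout, which defaults to one hour, so messages held for a 5-6 hour task can be delivered again roughly every hour until every concurrency slot holds a copy. A minimal sketch of the usual mitigation, assuming Django-style settings for Celery 3.x (set the timeout above the longest expected task runtime):
# settings.py: give the broker longer before it redelivers unacknowledged messages
BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 8 * 3600}  # seconds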

Related

Sending a chain of tasks at worker startup runs them out of order

I want to send a chain of tasks at worker startup, as in this answer: https://stackoverflow.com/a/14589445/3922534, but the tasks run out of order.
Logs from the worker:
[2022-07-12 20:51:47,369: INFO/MainProcess] Task task.add_newspapers[5de1f446-65af-472a-a4b6-d9752142b588] received
[2022-07-12 20:51:47,372: WARNING/MainProcess] Now Runing Newspaper Function
[2022-07-12 20:51:47,408: INFO/MainProcess] Task task.check_tasks_are_created[33d6b9d1-660b-4a80-a726-6f167e246480] received
[2022-07-12 20:51:47,412: WARNING/MainProcess] Now Runing Podcast Function
[2022-07-12 20:51:47,427: INFO/MainProcess] Task task.add_newspapers[5de1f446-65af-472a-a4b6-d9752142b588] succeeded in 0.0470000000204891s: 'Now Runing Podcast Function'
[2022-07-12 20:51:47,432: INFO/MainProcess] Task task.add_yt_channels[26179491-2632-46bd-95c1-9e9dbb9e8130] received
[2022-07-12 20:51:47,433: WARNING/MainProcess] None
[2022-07-12 20:51:47,457: INFO/MainProcess] Task task.check_tasks_are_created[33d6b9d1-660b-4a80-a726-6f167e246480] succeeded in 0.0470000000204891s: None
[2022-07-12 20:51:47,463: INFO/MainProcess] Task task.add_podcasts[ad94a119-c6b2-475a-807b-b1a73bef589e] received
[2022-07-12 20:51:47,468: WARNING/MainProcess] Now Runing Check Tasks are Created Function
[2022-07-12 20:51:47,501: INFO/MainProcess] Task task.add_yt_channels[26179491-2632-46bd-95c1-9e9dbb9e8130] succeeded in 0.06299999984912574s: 'Now Runing Check Tasks are Created Function'
[2022-07-12 20:51:47,504: INFO/MainProcess] Task task.add_podcasts[ad94a119-c6b2-475a-807b-b1a73bef589e] succeeded in 0.030999999959021807s: 'Now Runing Yotube Channels Function'
How I send the task:
@worker_ready.connect
def at_start(sender, **k):
    with sender.app.connection() as conn:
        # sender.app.send_task(name='task.print_word', args=["I Send Task On Startup"], connection=conn)
        # ch = [add_newspapers.s(), add_podcasts.s(), add_yt_channels.s(), check_tasks_are_created.s()]
        ch = [
            signature("task.add_podcasts"),
            signature("task.add_yt_channels"),
            signature("task.check_tasks_are_created"),
        ]
        sender.app.send_task(name='task.add_newspapers', chain=ch, connection=conn)
Then I tried running the chain the normal way with apply_async(), but it runs on every worker. I want it to run just once, on one worker:
@worker_ready.connect
def at_start(sender, **k):
    chain(add_newspapers.s(), add_podcasts.s(), add_yt_channels.s(), check_tasks_are_created.s()).apply_async()
Then I tried recognizing the worker before calling apply_async(), but execution never enters the if statement.
Documentation https://docs.celeryq.dev/en/latest/userguide/signals.html#celeryd-init
celery -A celery_app.celery worker --loglevel=INFO -P gevent --concurrency=40 -n celeryworker1
@worker_ready.connect
def at_start(sender, **k):
    print("This is host name ", sender.hostname)
    if sender == "celery@celeryworker1":
        with sender.app.connection() as conn:
            chain(add_newspapers.s(), add_podcasts.s(), add_yt_channels.s(), check_tasks_are_created.s()).apply_async()
Am I doing something wrong or is it just a bug?
Since the tasks don't need the return value of the previous task, you can run them as:
chain(add_newspapers.si(),add_podcasts.si(),add_yt_channels.si(),check_tasks_are_created.si()).apply_async()
(change the calls from s() to si())
You can read about immutability here.
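As a quick illustration of the difference (a minimal sketch; add is a hypothetical task):
@app.task
def add(x, y):
    return x + y

# .s() is a partial signature: in a chain, the previous task's result is
# prepended to the arguments, so the second call below runs as add(4, 4).
chain(add.s(2, 2), add.s(4))
# .si() is an immutable signature: the previous result is ignored, and the
# task runs with exactly the arguments given.
chain(add.si(2, 2), add.si(4, 4))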
The @worker_ready.connect handler will run on every worker. So, if you have 10 workers, you will send the same task 10 times when they each fire the worker_ready signal. Is this intentional?
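If it is not intentional, one fix is the hostname guard attempted above; note that sender is the worker instance, so the comparison should be against sender.hostname, not sender itself. A sketch, assuming the -n celeryworker1 name from the command above:
@worker_ready.connect
def at_start(sender, **k):
    # sender is the worker instance; compare its hostname string
    if sender.hostname == "celery@celeryworker1":
        chain(add_newspapers.si(), add_podcasts.si(),
              add_yt_channels.si(), check_tasks_are_created.si()).apply_async()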

How can I run multiple Celery tasks in parallel (using group)?

I am new to Celery. I want to run demo_task in parallel, but it runs tasks sequentially instead of in parallel. Please let me know if I did something wrong.
import time
from celery import Celery
from celery import chain, group, chord, chunks
import pandas as pd

CONFIG = {
    'BROKER_URL': 'redis://localhost:6379/0',
    'CELERY_RESULT_BACKEND': 'redis://localhost:6379/0',
}

app = Celery()
app.config_from_object(CONFIG)

@app.task(name='demo_task')
def demo_task(x, y):
    print("demo_task", x, y)
    pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]}).to_csv(f"demo{x}.csv", index=False)
    print("saved")
    time.sleep(8)

def run_task():
    print("start chain_call")
    t = group(*[demo_task.signature((3, 3)),
                demo_task.signature((4, 4)),
                demo_task.signature((5, 5))]
              ).apply_async()

if __name__ == '__main__':
    run_task()
[Command]
celery -A celery_demo worker -l info --pool=solo --purge
[Log]
[2022-04-22 16:29:51,668: WARNING/MainProcess] Please run `celery upgrade settings path/to/settings.py` to avoid these warnings and to allow a smoother upgrade to Celery 6.0.
[2022-04-22 16:29:51,668: INFO/MainProcess] Connected to redis://localhost:6379/0
[2022-04-22 16:29:51,668: INFO/MainProcess] mingle: searching for neighbors
[2022-04-22 16:29:52,672: INFO/MainProcess] mingle: all alone
[2022-04-22 16:30:05,602: WARNING/MainProcess]
[2022-04-22 16:30:05,602: WARNING/MainProcess] 4
[2022-04-22 16:30:05,602: WARNING/MainProcess]
[2022-04-22 16:30:05,602: WARNING/MainProcess] 4
[2022-04-22 16:30:05,602: WARNING/MainProcess] saved
[2022-04-22 16:30:13,614: INFO/MainProcess] Task demo_task[c017c03e-b49d-4d54-85c5-4af57dd55908] succeeded in 8.016000000061467s: None
[2022-04-22 16:30:13,614: INFO/MainProcess] Task demo_task[d60071c6-4332-4ec1-88fd-3fce79c06ab5] received
[2022-04-22 16:30:13,614: WARNING/MainProcess] demo_task
[2022-04-22 16:30:13,614: WARNING/MainProcess]
[2022-04-22 16:30:13,614: WARNING/MainProcess] 5
[2022-04-22 16:30:13,614: WARNING/MainProcess]
[2022-04-22 16:30:13,614: WARNING/MainProcess] 5
[2022-04-22 16:30:13,614: WARNING/MainProcess] saved
[2022-04-22 16:30:21,634: INFO/MainProcess] Task demo_task[d60071c6-4332-4ec1-88fd-3fce79c06ab5] succeeded in 8.015000000130385s: None
How do you expect tasks to run in parallel if you use the "solo" pool?
Instead, start with the prefork concurrency (the default): celery -A celery_demo worker -l info -c 8
This will make the Celery worker spawn 8 worker processes that can execute tasks in parallel. If your machine has more than 8 cores, you can increase that number to N, where N is the number of cores available on the host. I usually go for N-1 to leave the system one spare core for other work.
Prefork concurrency is great for CPU-bound tasks. If your tasks are more about I/O, then give the "gevent" or "eventlet" concurrency type a try.
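For example, an I/O-bound worker could be started like this (a sketch; the gevent pool requires the gevent package to be installed):
celery -A celery_demo worker -l info -P gevent -c 100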
Alternatively, modify your run_task function:
async def run_task():
    print("start chain_call")
    t = await group(*[demo_task.signature((3, 3)),
                      demo_task.signature((4, 4)),
                      demo_task.signature((5, 5))]
                    ).apply_async()

How to run a task periodically in Celery?

I want to run a task every 10 seconds using a Celery periodic task. This is my code in celery.py:
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'DjangoCelery1.settings')

app = Celery('DjangoCelery1')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

@app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(10, test.s('hello'), name='add every 10')

@app.task
def test(arg):
    print(arg)
    with open("test.txt", "w") as myfile:
        myfile.write(arg)
Then I run it with the following command:
celery -A DjangoCelery1 beat -l info
It seems to run, and the terminal shows the following:
celery beat v4.4.2 (cliffs) is starting.
LocalTime -> 2020-04-26 15:56:48
Configuration ->
. broker -> amqp://guest:**@localhost:5672//
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@%INFO
. maxinterval -> 5.00 minutes (300s)
[2020-04-26 15:56:48,483: INFO/MainProcess] beat: Starting...
[2020-04-26 15:56:48,499: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:56:53,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:56:58,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:57:03,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:57:08,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:57:13,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
But the task does not run: there is no printed message and no text file is created.
What is the problem?
This is the beat process - now you need to run another process:
celery -A tasks worker ...
so that a worker can consume and handle the tasks that you're triggering via beat.
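For the project in this question, that would be something like:
celery -A DjangoCelery1 worker -l info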

Why do the tasks in a Celery chain execute out of order?

I have four time-consuming tasks which should execute one by one, with the result of the previous task as the input of the next one. So I chose a Celery chain, as in the following code:
mychain = chain(task1.s({'a': 1}), task2.s(), task3.s(), task4.s())
mychain.apply_async()
But the execution order of the tasks is:
task1() ---> task4() ---> task3() ---> task2()
I don't know what happened.
I run a web server on Tornado, which kicks off the tasks via the chain.
logging:
[2018-07-23 18:34:12,816][pid:25557][tid:140228657469056][util.py:109] DEBUG: chain: fetch({}) | callback() | convert() | format()
The other tasks run in Celery.
logging:
[2018-07-23 18:34:12,816: INFO/MainProcess] Received task: fetch[045acf81-274b-457c-8bb5-6d0248264b76]
[2018-07-23 18:34:17,786: INFO/MainProcess] Received task: format[103b4ffa-57db-4b04-a745-7dfee5786695]
[2018-07-23 18:34:18,227: INFO/MainProcess] Received task: convert[81ddbaf9-37b3-406a-b608-a05affa97f45]
[2018-07-23 18:34:20,942: INFO/MainProcess] Received task: callback[b1ea7c70-db45-4501-9859-7ad22532c38a]
The reason was that the Celery versions on the two machines were different!
Once we installed the same Celery version on both, it worked!
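If you suspect a mismatch like this, it is worth comparing the installed version on each machine, for example:
pip show celery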

Task dependency in Celery

Is such a task dependency possible? Tasks 1 and 2 can be executed in parallel. Task 1a can only be executed when 1 is finished, but 12b can only be executed when both 1 and 2 are finished.
I know that I can make 1 and 2 a group, and then group(1, 2) | 12b can be a chain, but how do I make 1a start just after 1 finishes, no matter what is going on with 2?
Yes, it is possible. Here is one way to do it: I used the Celery signal task_success to connect a function which triggers a Celery task.
my_tasks.py
from celery import Celery, task
from celery.signals import task_success

c = Celery('my_tasks')

@task
def t1():
    print('t1')

@task
def t2():
    print('t2')

@task
def t11():
    print('t11')

@task
def t12():
    print('t12')

def trigger_task(*args, **kwargs):
    t11.s().delay()

task_success.connect(trigger_task, sender=t1)
Testing the task:
In [6]: complex_task = chain(group(t1.s(), t2.s())(), t12.si().delay())
Here is the log.
[2014-10-10 12:31:05,082: INFO/MainProcess] Received task: my_tasks.t1[25dc70d2-263b-4e70-b9f2-56478bfedab5]
[2014-10-10 12:31:05,083: INFO/MainProcess] Received task: my_tasks.t2[0b0c5eb6-78fa-4900-a605-5bfd55c0d309]
[2014-10-10 12:31:05,084: INFO/MainProcess] Received task: my_tasks.t12[b08c616d-7a2d-4f7b-9298-2c8324b747ff]
[2014-10-10 12:31:05,084: WARNING/Worker-1] t1
[2014-10-10 12:31:05,084: WARNING/Worker-4] t2
[2014-10-10 12:31:05,085: WARNING/Worker-3] t12
[2014-10-10 12:31:05,086: INFO/MainProcess] Task my_tasks.t2[0b0c5eb6-78fa-4900-a605-5bfd55c0d309] succeeded in 0.00143978099914s: None
[2014-10-10 12:31:05,086: INFO/MainProcess] Task my_tasks.t1[25dc70d2-263b-4e70-b9f2-56478bfedab5] succeeded in 0.00191083699974s: None
[2014-10-10 12:31:05,087: INFO/MainProcess] Task my_tasks.t12[b08c616d-7a2d-4f7b-9298-2c8324b747ff] succeeded in 0.00184817300033s: None
[2014-10-10 12:31:05,087: INFO/MainProcess] Received task: my_tasks.t11[a3e3f0c6-ac1f-4888-893a-02eee3b29585]
[2014-10-10 12:31:05,088: WARNING/Worker-2] t11
[2014-10-10 12:31:05,089: INFO/MainProcess] Task my_tasks.t11[a3e3f0c6-ac1f-4888-893a-02eee3b29585] succeeded in 0.000978848000159s: None
I tried connecting the signal directly to the task, but it threw an error.
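For reference, newer Celery versions can express a similar dependency purely with canvas primitives. A sketch, with the caveat that here t12 also waits for t11, which is slightly stronger than the original requirement:
from celery import chain, chord

# Header branches run in parallel: (t1 -> t11) and t2.
# The chord body t12 fires only after every header branch has finished.
chord([chain(t1.si(), t11.si()), t2.si()])(t12.si())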
