I am new to Celery. I want to run demo_task in parallel, but the tasks run sequentially instead. Please let me know if I did something wrong.
import time
from celery import Celery
from celery import chain, group, chord, chunks
import pandas as pd
CONFIG = {
    'BROKER_URL': 'redis://localhost:6379/0',
    'CELERY_RESULT_BACKEND': 'redis://localhost:6379/0',
}
app = Celery()
app.config_from_object(CONFIG)
@app.task(name='demo_task')
def demo_task(x, y):
    print("demo_task", x, y)
    pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]}).to_csv(f"demo{x}.csv", index=False)
    print("saved")
    time.sleep(8)

def run_task():
    print("start chain_call")
    t = group(*[demo_task.signature((3, 3)),
                demo_task.signature((4, 4)),
                demo_task.signature((5, 5))]
              ).apply_async()

if __name__ == '__main__':
    run_task()
[Command]
celery -A celery_demo worker -l info --pool=solo --purge
[Log]
[2022-04-22 16:29:51,668: WARNING/MainProcess] Please run `celery upgrade settings path/to/settings.py` to avoid these warnings and to allow a smoother upgrade to Celery 6.0.
[2022-04-22 16:29:51,668: INFO/MainProcess] Connected to redis://localhost:6379/0
[2022-04-22 16:29:51,668: INFO/MainProcess] mingle: searching for neighbors
[2022-04-22 16:29:52,672: INFO/MainProcess] mingle: all alone
[2022-04-22 16:30:05,602: WARNING/MainProcess]
[2022-04-22 16:30:05,602: WARNING/MainProcess] 4
[2022-04-22 16:30:05,602: WARNING/MainProcess]
[2022-04-22 16:30:05,602: WARNING/MainProcess] 4
[2022-04-22 16:30:05,602: WARNING/MainProcess] saved
[2022-04-22 16:30:13,614: INFO/MainProcess] Task demo_task[c017c03e-b49d-4d54-85c5-4af57dd55908] succeeded in 8.016000000061467s: None
[2022-04-22 16:30:13,614: INFO/MainProcess] Task demo_task[d60071c6-4332-4ec1-88fd-3fce79c06ab5] received
[2022-04-22 16:30:13,614: WARNING/MainProcess] demo_task
[2022-04-22 16:30:13,614: WARNING/MainProcess]
[2022-04-22 16:30:13,614: WARNING/MainProcess] 5
[2022-04-22 16:30:13,614: WARNING/MainProcess]
[2022-04-22 16:30:13,614: WARNING/MainProcess] 5
[2022-04-22 16:30:13,614: WARNING/MainProcess] saved
[2022-04-22 16:30:21,634: INFO/MainProcess] Task demo_task[d60071c6-4332-4ec1-88fd-3fce79c06ab5] succeeded in 8.015000000130385s: None
How do you expect tasks to run in parallel if you use the "solo" pool?
Instead, start with the prefork concurrency (the default): celery -A celery_demo worker -l info -c 8
This will make the Celery worker spawn 8 worker processes that can execute tasks in parallel. If your machine has more than 8 cores, you can increase that number from 8 to N, where N is the number of cores available on the host machine. I always go for N-1, leaving one spare core for whatever else the system needs to do.
Prefork concurrency is great for CPU-bound tasks. If your tasks are mostly I/O-bound, give the "gevent" or "eventlet" pool a try.
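For example, assuming gevent is installed in the worker's environment, you can start a greenlet-based pool that interleaves large numbers of I/O-bound tasks:
celery -A celery_demo worker -l info -P gevent -c 100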
As for run_task, it does not need async/await: apply_async returns a GroupResult handle immediately rather than an awaitable, so keep the function synchronous:
def run_task():
    print("start chain_call")
    t = group(demo_task.signature((3, 3)),
              demo_task.signature((4, 4)),
              demo_task.signature((5, 5))
              ).apply_async()
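If you want the calling script to block until all three tasks finish (for example, to time them and confirm they ran in parallel), you can wait on the returned GroupResult; this sketch assumes the Redis result backend from your config is reachable:
results = t.get(timeout=30)  # blocks until every task in the group completes
print("all done:", results)  # [None, None, None], since demo_task returns nothing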
Related
I want to send a chain task at startup of a worker, like in this https://stackoverflow.com/a/14589445/3922534 question, but the tasks run out of order.
Logs from the worker:
[2022-07-12 20:51:47,369: INFO/MainProcess] Task task.add_newspapers[5de1f446-65af-472a-a4b6-d9752142b588] received
[2022-07-12 20:51:47,372: WARNING/MainProcess] Now Runing Newspaper Function
[2022-07-12 20:51:47,408: INFO/MainProcess] Task task.check_tasks_are_created[33d6b9d1-660b-4a80-a726-6f167e246480] received
[2022-07-12 20:51:47,412: WARNING/MainProcess] Now Runing Podcast Function
[2022-07-12 20:51:47,427: INFO/MainProcess] Task task.add_newspapers[5de1f446-65af-472a-a4b6-d9752142b588] succeeded in 0.0470000000204891s: 'Now Runing Podcast Function'
[2022-07-12 20:51:47,432: INFO/MainProcess] Task task.add_yt_channels[26179491-2632-46bd-95c1-9e9dbb9e8130] received
[2022-07-12 20:51:47,433: WARNING/MainProcess] None
[2022-07-12 20:51:47,457: INFO/MainProcess] Task task.check_tasks_are_created[33d6b9d1-660b-4a80-a726-6f167e246480] succeeded in 0.0470000000204891s: None
[2022-07-12 20:51:47,463: INFO/MainProcess] Task task.add_podcasts[ad94a119-c6b2-475a-807b-b1a73bef589e] received
[2022-07-12 20:51:47,468: WARNING/MainProcess] Now Runing Check Tasks are Created Function
[2022-07-12 20:51:47,501: INFO/MainProcess] Task task.add_yt_channels[26179491-2632-46bd-95c1-9e9dbb9e8130] succeeded in 0.06299999984912574s: 'Now Runing Check Tasks are Created Function'
[2022-07-12 20:51:47,504: INFO/MainProcess] Task task.add_podcasts[ad94a119-c6b2-475a-807b-b1a73bef589e] succeeded in 0.030999999959021807s: 'Now Runing Yotube Channels Function'
Code showing how I send the task:
@worker_ready.connect
def at_start(sender, **k):
    with sender.app.connection() as conn:
        # sender.app.send_task(name='task.print_word', args=["I Send Task On Startup"], connection=conn,)
        # ch = [add_newspapers.s(), add_podcasts.s(), add_yt_channels.s(), check_tasks_are_created.s()]
        ch = [
            signature("task.add_podcasts"),
            signature("task.add_yt_channels"),
            signature("task.check_tasks_are_created"),
        ]
        sender.app.send_task(name='task.add_newspapers', chain=ch, connection=conn)
Then I tried to run the chain the way apply_async() is normally used, but it runs on every worker. I want it to run just once, on one worker:
@worker_ready.connect
def at_start(sender, **k):
    chain(add_newspapers.s(), add_podcasts.s(), add_yt_channels.s(), check_tasks_are_created.s()).apply_async()
Then I tried to recognize the worker and only then call .apply_async(), but the if statement never matches.
Documentation https://docs.celeryq.dev/en/latest/userguide/signals.html#celeryd-init
celery -A celery_app.celery worker --loglevel=INFO -P gevent --concurrency=40 -n celeryworker1
@worker_ready.connect
def at_start(sender, **k):
    print("This is host name ", sender.hostname)
    if sender == "celery@celeryworker1":
        with sender.app.connection() as conn:
            chain(add_newspapers.s(), add_podcasts.s(), add_yt_channels.s(), check_tasks_are_created.s()).apply_async()
Am I doing something wrong or is it just a bug?
Since a task doesn't need the return value of the previous task, you can run it as:
chain(add_newspapers.si(), add_podcasts.si(), add_yt_channels.si(), check_tasks_are_created.si()).apply_async()
(change the calls from s() to si())
You can read about immutability here.
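To illustrate the difference with two hypothetical tasks, fetch and process (not from your code): with .s(), each task's return value is passed as the first argument to the next task in the chain, while .si() makes the signature immutable, so the upstream result is ignored:
from celery import chain

# fetch and process are hypothetical tasks, for illustration only
chain(fetch.s(), process.s()).apply_async()    # process(fetch_result) is called
chain(fetch.si(), process.si()).apply_async()  # process() ignores fetch's result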
The @worker_ready.connect handler runs on every worker. So, if you have 10 workers, you will send the same task 10 times when they broadcast the worker_ready signal. Is this intentional?
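If the broadcast is not intentional and you want exactly one worker to send the chain, one sketch (assuming a worker started with -n celeryworker1, as in your command) is to compare sender.hostname, since sender is the worker's consumer object rather than a string:
from celery import chain
from celery.signals import worker_ready

@worker_ready.connect
def at_start(sender, **kwargs):
    # sender == "celery@celeryworker1" never matches because sender is an object;
    # compare its hostname attribute instead
    if sender.hostname == "celery@celeryworker1":
        chain(add_newspapers.s(), add_podcasts.s(),
              add_yt_channels.s(), check_tasks_are_created.s()).apply_async()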
I want to run a task every 10 seconds as a Celery periodic task. This is my code in celery.py:
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'DjangoCelery1.settings')
app = Celery('DjangoCelery1')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
@app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(10, test.s('hello'), name='add every 10')

@app.task
def test(arg):
    print(arg)
    with open("test.txt", "w") as myfile:
        myfile.write(arg)
Then I run it by the following command:
celery -A DjangoCelery1 beat -l info
It seems to run, and in the terminal I get the following output:
celery beat v4.4.2 (cliffs) is starting.
LocalTime -> 2020-04-26 15:56:48
Configuration ->
. broker -> amqp://guest:**@localhost:5672//
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@%INFO
. maxinterval -> 5.00 minutes (300s)
[2020-04-26 15:56:48,483: INFO/MainProcess] beat: Starting...
[2020-04-26 15:56:48,499: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:56:53,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:56:58,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:57:03,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:57:08,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
[2020-04-26 15:57:13,492: INFO/MainProcess] Scheduler: Sending due task add every 10 (DjangoCelery1.celery.test)
But the task does not run: nothing is printed and no text file is created. What is the problem?
This is the beat process. Now you need to run a second process, the worker:
celery -A DjangoCelery1 worker ...
so that a worker can consume and handle the tasks you're triggering via beat.
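For local development, you can also run a single process with embedded beat via the -B flag (convenient, but not recommended for production):
celery -A DjangoCelery1 worker -B -l info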
A task runs once, and then long Celery tasks (5-6 hours long) start duplicating themselves roughly every hour, up to 4 copies (the concurrency parameter).
Logs:
[2016-08-19 07:43:08,505: INFO/MainProcess] Received task: doit[ed09d5fd-ba07-4cd5-96eb-7ae546bf94db]
[2016-08-19 07:45:44,067: INFO/MainProcess] Received task: doit[7cbc4633-0687-499f-876c-3298ffdf90f9]
[2016-08-19 08:41:16,611: INFO/MainProcess] Received task: doit[ed09d5fd-ba07-4cd5-96eb-7ae546bf94db]
[2016-08-19 08:48:36,623: INFO/MainProcess] Received task: doit[7cbc4633-0687-499f-876c-3298ffdf90f9]
Task code
@task()
def doit(company_id, cls):
    p = cls.objects.get(id=company_id)
The Celery worker is started with --concurrency=4 -Ofair; the broker is Redis 3.0.5.
Python package versions:
Django==1.8.14
celery==3.1.18
redis==2.10.3
When I run celery -A tasks2.celery worker -B I want to see "celery task" printed every second. Currently nothing is printed. Why isn't this working?
from app import app
from celery import Celery
from datetime import timedelta
celery = Celery(app.name, broker='amqp://guest:@localhost/', backend='amqp://guest:@localhost/')
celery.conf.update(CELERY_TASK_RESULT_EXPIRES=3600,)

@celery.task
def add(x, y):
    print "celery task"
    return x + y

CELERYBEAT_SCHEDULE = {
    'add-every-30-seconds': {
        'task': 'tasks2.add',
        'schedule': timedelta(seconds=1),
        'args': (16, 16)
    },
}
This is the only output after starting the worker and beat:
[tasks]
. tasks2.add
[INFO/Beat] beat: Starting...
[INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[INFO/MainProcess] mingle: searching for neighbors
[INFO/MainProcess] mingle: all alone
You wrote the schedule, but didn't add it to the celery config. So beat saw no scheduled tasks to send. The example below uses celery.config_from_object(__name__) to pick up config values from the current module, but you can use any other config method as well.
Once you configure it properly, you will see messages from beat about sending scheduled tasks, as well as the output from those tasks as the worker receives and runs them.
from celery import Celery
from datetime import timedelta

celery = Celery(__name__)
celery.config_from_object(__name__)

@celery.task
def say_hello():
    print('Hello, World!')

CELERYBEAT_SCHEDULE = {
    'every-second': {
        'task': 'example.say_hello',
        'schedule': timedelta(seconds=5),
    },
}
$ celery -A example.celery worker -B -l info
[tasks]
. example.say_hello
[2015-07-15 08:23:54,350: INFO/Beat] beat: Starting...
[2015-07-15 08:23:54,366: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2015-07-15 08:23:54,377: INFO/MainProcess] mingle: searching for neighbors
[2015-07-15 08:23:55,385: INFO/MainProcess] mingle: all alone
[2015-07-15 08:23:55,411: WARNING/MainProcess] celery@netsec-ast-15 ready.
[2015-07-15 08:23:59,471: INFO/Beat] Scheduler: Sending due task every-second (example.say_hello)
[2015-07-15 08:23:59,481: INFO/MainProcess] Received task: example.say_hello[2a9d31cb-fe11-47c8-9aa2-51690d47c007]
[2015-07-15 08:23:59,483: WARNING/Worker-3] Hello, World!
[2015-07-15 08:23:59,484: INFO/MainProcess] Task example.say_hello[2a9d31cb-fe11-47c8-9aa2-51690d47c007] succeeded in 0.0012782540870830417s: None
In version 4.1.0 you have to add a logger to your task.py file, like so:
import random

from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@task(name="multiply_two_numbers")
def mul(x, y):
    total = x * (y * random.randint(3, 100))
    # HERE:
    logger.info('Multiplying {0} and {1}'.format(x, y))
    return total
It's stated halfway down in the docs here, if you want more info:
http://docs.celeryproject.org/en/latest/userguide/tasks.html
Make sure you run the celery beat scheduler so the periodic tasks get sent:
celery beat --app app.celery
Check the docs here: http://celery.readthedocs.org/en/latest/userguide/periodic-tasks.html#starting-the-scheduler
I actually had issues printing mine in the command prompt because I was using the wrong command, but I found a link to a project, which I forked: Project
(If on Mac ) celery -A Project worker --loglevel=info
(If on Windows) celery -A Project worker -l info --pool=solo
Is such a task dependency possible? 1 and 2 can be executed in parallel. 1a can only be executed when 1 is finished, but 12b can only be executed once both 1 and 2 are finished.
I know that I can make 1 and 2 a group, and then group(1, 2) | 12b can be a chain, but how do I make 1a start right after 1 finishes, no matter what is going on with 2?
Yes, it is possible. Here is one way to do it: I used the Celery signal task_success to connect a function that triggers the follow-up task.
my_tasks.py
from celery import Celery, task
from celery.signals import task_success

c = Celery('my_tasks')

@task
def t1():
    print('t1')

@task
def t2():
    print('t2')

@task
def t11():
    print('t11')

@task
def t12():
    print('t12')

def trigger_task(*args, **kwargs):
    t11.s().delay()

task_success.connect(trigger_task, sender=t1)
Testing the task:
In [6]: complex_task = chain(group(t1.s(), t2.s())(), t12.si().delay())
Here is the log.
[2014-10-10 12:31:05,082: INFO/MainProcess] Received task: my_tasks.t1[25dc70d2-263b-4e70-b9f2-56478bfedab5]
[2014-10-10 12:31:05,083: INFO/MainProcess] Received task: my_tasks.t2[0b0c5eb6-78fa-4900-a605-5bfd55c0d309]
[2014-10-10 12:31:05,084: INFO/MainProcess] Received task: my_tasks.t12[b08c616d-7a2d-4f7b-9298-2c8324b747ff]
[2014-10-10 12:31:05,084: WARNING/Worker-1] t1
[2014-10-10 12:31:05,084: WARNING/Worker-4] t2
[2014-10-10 12:31:05,085: WARNING/Worker-3] t12
[2014-10-10 12:31:05,086: INFO/MainProcess] Task my_tasks.t2[0b0c5eb6-78fa-4900-a605-5bfd55c0d309] succeeded in 0.00143978099914s: None
[2014-10-10 12:31:05,086: INFO/MainProcess] Task my_tasks.t1[25dc70d2-263b-4e70-b9f2-56478bfedab5] succeeded in 0.00191083699974s: None
[2014-10-10 12:31:05,087: INFO/MainProcess] Task my_tasks.t12[b08c616d-7a2d-4f7b-9298-2c8324b747ff] succeeded in 0.00184817300033s: None
[2014-10-10 12:31:05,087: INFO/MainProcess] Received task: my_tasks.t11[a3e3f0c6-ac1f-4888-893a-02eee3b29585]
[2014-10-10 12:31:05,088: WARNING/Worker-2] t11
[2014-10-10 12:31:05,089: INFO/MainProcess] Task my_tasks.t11[a3e3f0c6-ac1f-4888-893a-02eee3b29585] succeeded in 0.000978848000159s: None
I tried to connect the signal directly to the task, but it threw an error.
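If connecting signals per task gets awkward, a canvas-only sketch is also possible (using the same t1, t2, t11, t12 as above, and depending on your Celery version's support for chains inside a chord header). The trade-off is that t12 then also waits for t11, which is stricter than the stated dependency but still respects it:
from celery import chain, chord, group

# 1 -> 1a runs as a chain inside the chord header, alongside 2.
# The chord body (12b) fires once everything in the header is done,
# so here it also waits for 1a.
workflow = chord(
    group(
        chain(t1.s(), t11.si()),  # 1, then 1a immediately after
        t2.s(),                   # 2 in parallel with the chain
    )
)(t12.si())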