I'm using Celery on Heroku with RabbitMQ and the following settings:
# settings.py
DEFAULT_AMQP = "amqp://guest:guest@localhost//"
BROKER_URL = os.getenv('CLOUDAMQP_URL', DEFAULT_AMQP)
CELERY_TASK_SERIALIZER = 'pickle'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ENABLE_UTC = True
CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True
CELERY_RESULT_BACKEND = False
BROKER_POOL_LIMIT = 5
# trying to clean up this memory leak
CELERYD_MAX_TASKS_PER_CHILD = 5
CELERYD_TASK_TIME_LIMIT = 60*10 # time limit in seconds--watch if we end up throwing big tasks onto this
I generally call tasks using .delay().
On Heroku, I'd have to provision a separate worker dyno to handle these queued tasks. On the development/staging server, I'd still like to call my delayed tasks, but have them run in the same thread rather than be deferred to a worker queue. In other words, every time I call .delay() there, I want the task to execute immediately, as if I had invoked it with the normal .__call__() method.
Is there a Celery setting or some other way that I can basically turn Celery off and have calls to .delay() pass to .__call__() instead?
Set CELERY_ALWAYS_EAGER to True for the dev environment: http://celery.readthedocs.org/en/latest/configuration.html#celery-always-eager
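For illustration, a minimal sketch of how the dev/staging settings might toggle this (using DEBUG to mark the non-production environment is just an assumption; use whatever flag distinguishes your environments):
# settings.py -- dev/staging sketch
CELERY_ALWAYS_EAGER = DEBUG  # .delay()/.apply_async() run the task in-process, like __call__()
CELERY_EAGER_PROPAGATES_EXCEPTIONS = True  # re-raise task exceptions instead of swallowing them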
Related
I'm hosting a Django web app on Heroku, and it uses Redis to run a task in the background (this gets kicked off by the user when they submit a form). The task often has to make multiple queries to an API using pexpect, sequentially. When I had it all set up locally, I had no problems with timeouts, but now I regularly run into two problems: TIMEOUT in the pexpect function and "max client connections reached".
I'm using the free tier of Heroku Redis, so my connection limit is 20. This should be more than enough for what I'm doing, but I notice on the Redis dashboard that when I run the program, the connections stay open for a while afterwards (which is what I suspect is causing the "max connections" error: the old connections never closed). I used the heroku-redis CLI to set idle connections to close after 15 seconds, but they don't seem to be closing. I also should only have 4 concurrent connections (per the setting in Django's settings.py), but I regularly see 15 or so spawned for a single run.
How do I actually limit the number of redis connections generated for my program and make sure they actually close when they finish their job? And how do I prevent TIMEOUTs (which I suspect happen because no connection is available to finish the query)? I don't understand why so many Redis connections are necessary for a single background task.
This is the relevant content in my Django settings.py file. I use celery to run my background task, so I don't directly interface with Redis.
CACHES = {
    "default": {
        "BACKEND": "redis_cache.RedisCache",
        "LOCATION": "redis:// censored",
        "OPTIONS": {
            'CONNECTION_POOL_CLASS': 'redis.BlockingConnectionPool',
            'CONNECTION_POOL_CLASS_KWARGS': {
                'max_connections': 4,
                'timeout': 15,
                'retry_on_timeout': True
            },
        },
    }
}
# the line below lets it run locally
# BROKER_URL and CELERY_RESULT_BACKEND = 'amqp://guest:guest@rabbit:5672/%2F'
# I'm not sure what the difference between BROKER_* and REDIS_* are,
# so I've included them both for good measure
BROKER_BACKEND = "redis"
BROKER_HOST = "amazonaws.com censored"
BROKER_PORT = 20789
BROKER_PASSWORD = "censored"
BROKER_VHOST = "0"
BROKER_POOL_LIMIT = 4 # limit # of celery workers
REDIS_URL = "redis://censored"
REDIS_HOST = "amazonaws.com censored"
REDIS_PORT = 20789
REDIS_PASSWORD = "censored"
REDIS_DB = 0
REDIS_CONNECT_RETRY = True
CELERY_RESULT_BACKEND = REDIS_URL
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_REDIS_MAX_CONNECTIONS = 4
CELERYD_TASK_SOFT_TIME_LIMIT = 30
DATABASE_URL = "postgres:// censored"
# I added this for Heroku
django_heroku.settings(locals())
This is where I initialize Celery. I use a @shared_task(bind=True) decorator for my background task function.
app = Celery('aircraftToAirQuality', backend=settings.CELERY_RESULT_BACKEND, broker=settings.REDIS_URL)
app.conf.broker_pool_limit = 0
app.conf.broker_transport_options = {"visibility_timeout": 60}
My UI regularly checks the progress of the task to update a progress bar. This is just a simple HTTP POST call.
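For what it's worth, here is a minimal sketch (not the poster's actual code) of how a bound task can report progress through the result backend and how a Django view can poll it; the names air_quality_task and check_progress, and the JSON payload, are assumptions:
from celery import shared_task
from celery.result import AsyncResult
from django.http import JsonResponse

@shared_task(bind=True)
def air_quality_task(self, queries):
    results = []
    for i, query in enumerate(queries):
        results.append(query)  # placeholder: the real pexpect query would run here
        # publish progress so the UI can poll it
        self.update_state(state='PROGRESS', meta={'done': i + 1, 'total': len(queries)})
    return results

def check_progress(request):
    # the UI POSTs back the task id it received when the task was kicked off
    result = AsyncResult(request.POST['task_id'])
    return JsonResponse({'state': result.state, 'info': result.info})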
And this is the pexpect code, in case that's relevant. Since I'm sending sensitive content (passwords and stuff), I've removed much of the text.
def use_pexpect(self, query):
    cmd = 'censored'
    pex = pexpect.spawn(cmd)
    pex.timeout = 30
    index = pex.expect(censored)
    time.sleep(0.1)
    pex.sendline(censored)
    pex.sendline(query)
    pex.expect(censored)
    result = pex.before
    pex.sendline('quit;')
    pex.close()
    return result
Right now, I can only run my task successfully some of the time. The longer the task has to run (making 10 queries instead of just 3), the more likely it is to fail before finishing. My background task should be able to make queries sequentially, and my UI thread should be able to check its progress, without any connection-limit or TIMEOUT problems.
I'm unable to run periodic tasks and asynchronous tasks together. If I comment out the periodic task, the asynchronous tasks execute fine; otherwise the asynchronous tasks get stuck.
Running: celery==4.0.2, Django==2.0, django-celery-beat==1.1.0, django-celery-results==1.0.1
I referred to https://github.com/celery/celery/issues/4184 when choosing celery==4.0.2, as that version seems to work.
Seems to be a known issue
https://github.com/celery/django-celery-beat/issues/27
I've also done some digging; the ONLY way I've found to get it back to
normal is to remove all periodic tasks and restart celery beat. ~ rh0dium
celery.py
import django
import os
from celery import Celery
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'bid.settings')
# Setup django project
django.setup()
app = Celery('bid')
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
settings.py
INSTALLED_APPS = (
    ...
    'django_celery_results',
    'django_celery_beat',
)
# Celery related settings
CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 43200, }
CELERY_RESULT_BACKEND = 'django-db'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_SERIALIZER = 'json'
CELERY_CONTENT_ENCODING = 'utf-8'
CELERY_ENABLE_REMOTE_CONTROL = False
CELERY_SEND_EVENTS = False
CELERY_TIMEZONE = 'Asia/Kolkata'
CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'
Periodic task
@periodic_task(run_every=crontab(hour=7, minute=30), name="send-vendor-status-everyday")
def send_vendor_status():
    return timezone.now()
Async task
@shared_task
def vendor_creation_email(id):
    return "Email Sent"
Async task caller
vendor_creation_email.apply_async(args=[instance.id, ]) # main thread gets stuck here, if periodic jobs are scheduled.
Running the worker, with beat, as follows:
celery worker -A bid -l debug -B
Please help.
Here are a few observations, resulting from multiple trials and errors and from diving into Celery's source code.
@periodic_task is deprecated, hence it will not work.
from their source code:
#venv36/lib/python3.6/site-packages/celery/task/base.py
def periodic_task(*args, **options):
    """Deprecated decorator, please use :setting:`beat_schedule`."""
    return task(**dict({'base': PeriodicTask}, **options))
Use UTC as the base timezone to avoid timezone-related confusion later on. Configure the periodic task to fire at times calculated with respect to UTC; e.g. for 'Asia/Calcutta' (UTC+5:30), subtract 5 hours 30 minutes, so a task meant for 07:30 IST is scheduled at 02:00 UTC.
Create a celery.py as follows:
celery.py
import django
import os
from celery import Celery
# set the default Django settings module for the 'celery' program.
from celery.schedules import crontab
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')
# Setup django project
django.setup()
app = Celery('proj')
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
app.conf.beat_schedule = {
    'test_task': {
        'task': 'test_task',
        'schedule': crontab(hour=2, minute=0),  # 02:00 UTC == 07:30 IST
    }
}
and the task could live in tasks.py under any app, as follows:
@shared_task(name="test_task")
def test_add():
    print("Testing beat service")
Run celery worker -A proj -l info for the worker and celery beat -A proj -l info for beat, along with a broker (e.g. Redis), and this setup should work fine.
Whenever I run the celery worker with
./manage.py celery worker -l info --concurrency=8
I get a warning, and if I ignore the warning my celery worker does not receive the celery beat tasks.
After some googling I also changed the worker name; this time I don't get the warning, but the celery worker is still not receiving the celery beat scheduled tasks.
I have checked the celery beat logs, and celery beat is scheduling the tasks on time.
I have also checked Celery Flower, and it shows two workers; the first worker is receiving the tasks but not executing them. How do I send all tasks to the second worker, or how can I disable the first (kombu) worker? What django-celery setting am I missing?
My Django settings.py:
RABBITMQ_USERNAME = "guest"
RABBITMQ_PASSWORD = "guest"
BROKER_URL = 'amqp://%s:%s@localhost:5672//' % (RABBITMQ_USERNAME,
                                                RABBITMQ_PASSWORD)
CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE = 'default'
CELERY_DEFAULT_ROUTING_KEY = 'default'
CELERY_IGNORE_RESULT = True
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
celery_enable_utc=True
import djcelery
djcelery.setup_loader()
You have only enabled the worker. For a task to be executed, you must call it with the your_task.delay() function.
For example, open another terminal, enter your project, and run python manage.py shell. Once inside your Django project's shell, import your task and call your_task.delay().
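A minimal sketch of such a session (the app and task names below are placeholders):
$ python manage.py shell
>>> from myapp.tasks import my_task  # hypothetical module and task
>>> my_task.delay()  # sends the task to the broker; the running worker picks it up and executes it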
The following link has an example of Celery code with a RabbitMQ broker; I advise you to study it:
https://github.com/celery/celery/tree/master/examples/django
I'm posting data to a web service from Celery. Sometimes the data is not posted because the internet connection is down, and the task is retried endlessly until it succeeds. Retrying the task like that is unnecessary, because the network was down and retrying immediately doesn't help.
I thought of a better solution: if a task fails three times (retrying a minimum of 3 times), it is shifted to another queue. This queue contains the list of all failed tasks.
Then, when the internet is up and data is being posted over the network, i.e. tasks are completing from the normal queue, it starts processing the tasks from the failed-task queue.
This avoids wasting CPU and memory on retrying the task again and again.
Here's my code. As of right now, I'm just retrying the task, but I doubt that's the right way of doing it.
@shared_task(default_retry_delay=1 * 60, max_retries=10)
def post_data_to_web_service(data, url):
    try:
        client = SoapClient(
            location=url,
            action='http://tempuri.org/IService_1_0/',
            namespace="http://tempuri.org/",
            soap_ns='soap', ns=False
        )
        response = client.UpdateShipment(
            Weight=Decimal(data['Weight']),
            Length=Decimal(data['Length']),
            Height=Decimal(data['Height']),
            Width=Decimal(data['Width']),
        )
    except Exception as exc:
        raise post_data_to_web_service.retry(exc=exc)
How do I maintain 2 queues simultaneously and execute tasks from both of them?
Settings.py
BROKER_URL = 'redis://localhost:6379/0'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
By default, Celery adds all tasks to a queue named celery. So you can run your task there; when an exception occurs it retries, and once it reaches the maximum number of retries you can shift it to a new queue, say foo.
from celery.exceptions import MaxRetriesExceededError
@shared_task(default_retry_delay=1 * 60, max_retries=10)
def post_data_to_web_service(data, url):
    try:
        ...  # do something with the given args
    except Exception as exc:
        try:
            raise post_data_to_web_service.retry(exc=exc)
        except MaxRetriesExceededError:
            # out of retries: re-dispatch the same task onto the 'foo' queue
            post_data_to_web_service.apply_async(args=(data, url), queue='foo')
When you start your worker, this task will try to do something with the given data. If it fails, it will retry 10 times with a delay of 60 seconds. When it finally hits MaxRetriesExceededError, it posts the same task to the new queue foo.
To consume these tasks you have to start a new worker
celery worker -l info -A my_app -Q foo
or you can consume these tasks from the default worker if you start it with
celery worker -l info -A my_app -Q celery,foo
I followed celery docs to define 2 queues on my dev machine.
My celery settings:
CELERY_ALWAYS_EAGER = True
CELERY_TASK_RESULT_EXPIRES = 60 # 1 mins
CELERYD_CONCURRENCY = 2
CELERYD_MAX_TASKS_PER_CHILD = 4
CELERYD_PREFETCH_MULTIPLIER = 1
CELERY_CREATE_MISSING_QUEUES = True
CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('feeds', Exchange('feeds'), routing_key='arena.social.tasks.#'),
)
CELERY_ROUTES = {
    'arena.social.tasks.Update': {
        'queue': 'fs_feeds',
    },
}
I opened two terminal windows in my project's virtualenv and ran the following commands:
terminal_1$ celery -A arena worker -Q default -B -l debug --purge -n deafult_worker
terminal_2$ celery -A arena worker -Q feeds -B -l debug --purge -n feeds_worker
What I get is that all tasks are being processed by both workers.
My goal is to have one queue process only the one task defined in CELERY_ROUTES, and the default queue process all other tasks.
I also followed this SO question; rabbitmqctl list_queues returns celery 0, and running rabbitmqctl list_bindings returns exchange celery queue celery [] twice. Restarting the RabbitMQ server didn't change anything.
OK, so I figured it out. Below is my whole setup, settings, and how to run Celery, for those who might be wondering about the same thing my question did.
Settings
CELERY_TIMEZONE = TIME_ZONE
CELERY_ACCEPT_CONTENT = ['json', 'pickle']
CELERYD_CONCURRENCY = 2
CELERYD_MAX_TASKS_PER_CHILD = 4
CELERYD_PREFETCH_MULTIPLIER = 1
# celery queues setup
from kombu import Exchange, Queue  # needed for the queue definitions below

CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE_TYPE = 'topic'
CELERY_DEFAULT_ROUTING_KEY = 'default'

CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('feeds', Exchange('feeds'), routing_key='long_tasks'),
)

CELERY_ROUTES = {
    'arena.social.tasks.Update': {
        'queue': 'feeds',
        'routing_key': 'long_tasks',
    },
}
How to run celery?
terminal - tab 1:
celery -A proj worker -Q default -l debug -n default_worker
this will start the first worker, which consumes tasks from the default queue. NOTE! -n default_worker is not a must for the first worker, but it is a must if you have any other Celery instances up and running. Setting -n worker_name is the same as --hostname=default@%h.
terminal - tab 2:
celery -A proj worker -Q feeds -l debug -n feeds_worker
this will start the second worker, which consumes tasks from the feeds queue. Notice -n feeds_worker; if you are running with -l debug (log level = debug), you will see that both workers are syncing between themselves.
terminal - tab 3:
celery -A proj beat -l debug
this will start the beat, executing tasks according to the schedule in your CELERYBEAT_SCHEDULE.
I didn't have to change the task, or the CELERYBEAT_SCHEDULE.
For example, this is how my CELERYBEAT_SCHEDULE looks for the task that should go to the feeds queue:
CELERYBEAT_SCHEDULE = {
    ...
    'update_feeds': {
        'task': 'arena.social.tasks.Update',
        'schedule': crontab(minute='*/6'),
    },
    ...
}
As you can see, there is no need to add 'options': {'routing_key': 'long_tasks'} or to specify which queue the task should go to. Also, if you were wondering why Update is capitalized: it's because it's a custom task, and custom tasks are defined as subclasses of celery.Task.
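For context, a class-based task looks roughly like the sketch below (the body is hypothetical; only the name matches the routing above). On Celery 3.x a named Task subclass is registered automatically; on Celery 4+ you would register it explicitly:
# arena/social/tasks.py -- rough sketch, not the author's actual implementation
from celery import Task
from arena.celery import app  # assumed location of the Celery app instance

class Update(Task):
    name = 'arena.social.tasks.Update'

    def run(self, *args, **kwargs):
        ...  # fetch and update the social feeds

app.register_task(Update())  # only needed on Celery 4+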
Update: Celery 5.0+
Celery has made a couple of changes since version 5; here is an updated setup for routing tasks.
How to create the queues?
Celery can create the queues automatically. This works perfectly for simple cases where Celery's default routing values are OK.
Set task_create_missing_queues=True or, if you're using Django settings and namespacing all Celery config under the CELERY_ prefix, CELERY_TASK_CREATE_MISSING_QUEUES=True. Note that it is on by default.
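A quick sketch of both forms (they do the same thing, and since the option defaults to True you usually don't need either):
# celery.py
app.conf.task_create_missing_queues = True

# or in Django settings.py, using the CELERY_ namespace
CELERY_TASK_CREATE_MISSING_QUEUES = True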
Automatic scheduled task routing
After configuring celery app:
celery_app.conf.beat_schedule = {
    "some_scheduled_task": {
        "task": "module.path.some_task",
        "schedule": crontab(minute="*/10"),
        "options": {"queue": "queue1"},
    }
}
Automatic task routing
The Celery app still has to be configured first, and then:
app.conf.task_routes = {
    "module.path.task2": {"queue": "queue2"},
}
Manual routing of tasks
In case you want to route tasks dynamically, specify the queue when sending the task:
from module import task

def do_work():
    # do some work and launch the task
    task.apply_async(args=(arg1, arg2), queue="queue3")
More details on routing can be found here:
https://docs.celeryproject.org/en/stable/userguide/routing.html
And regarding calling tasks here:
https://docs.celeryproject.org/en/stable/userguide/calling.html
In addition to the accepted answer, if anyone comes here and still wonders why their settings aren't working (as I did just moments ago), here's why: the Celery documentation doesn't list the setting names consistently.
For Celery 5.0.5, the settings CELERY_DEFAULT_QUEUE, CELERY_QUEUES and CELERY_ROUTES should be named CELERY_TASK_DEFAULT_QUEUE, CELERY_TASK_QUEUES and CELERY_TASK_ROUTES instead. These are the settings that I've tested; my guess is the same rule applies to exchange and routing key as well.
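As a sketch, the renamed settings in a Django settings.py would look like this (values mirror the examples above):
# settings.py, Celery 5.x, CELERY_ namespace
from kombu import Exchange, Queue

CELERY_TASK_DEFAULT_QUEUE = 'default'
CELERY_TASK_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('feeds', Exchange('feeds'), routing_key='long_tasks'),
)
CELERY_TASK_ROUTES = {
    'arena.social.tasks.Update': {'queue': 'feeds', 'routing_key': 'long_tasks'},
}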