How to get Celery to load the config from command line? - python

I am attempting to use celery worker to load a config file at the command line:
celery worker --config=my_settings_module
This doesn't appear to work. celery worker starts and uses its default settings (which include assuming that there is a RabbitMQ server available at localhost:5672). In my config, I would like to point Celery to a different broker. When I changed the amqp settings in the config file, Celery didn't appear to care; it still showed the default RabbitMQ settings.
I also tried something bogus:
celery worker --config=this_file_does_not_exist
And Celery once again did not care. The worker started and attached to the default RabbitMQ. It's not even looking at the --config setting.
I read about how Celery lazy loads. I'm not sure that has anything to do with this.
How do I get celery worker to honor the --config setting?

If you give an invalid module name, or the name of a module that is not on PYTHONPATH, say celery worker --config=invalid_foo, celery will ignore it.
You can verify this by creating a simple config file.
$ celery worker -h
--config=CONFIG Name of the configuration module
As the celery worker help says, you should pass the name of a configuration module, not a file name. A file name will raise an error.
If you just run
celery worker
it will start a worker whose output is colored.
In the same directory, create a file called c.py with this line:
CELERYD_LOG_COLOR = False
Now run
celery worker --config=c
it will start a worker whose output is not colored.
If you run celery worker --config=c.py, it will raise an error:
celery.utils.imports.NotAPackage: Error: Module 'c.py' doesn't exist, or it's not a valid Python module name.
Did you mean 'c'?
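The difference between a module name and a file name can be reproduced with plain importlib, outside of Celery. This is only a sketch of the general Python import behaviour, not of Celery's own loader; the module name demo_celery_settings is made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway settings module; the name demo_celery_settings is made up.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_celery_settings.py").write_text("CELERYD_LOG_COLOR = False\n")
sys.path.insert(0, tmp_dir)      # make the directory importable
importlib.invalidate_caches()    # make sure the new file is seen


def try_import(name):
    """Return the imported module, or None if `name` is not importable."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# A dotted module name works; a file name with ".py" is read as
# "submodule py of package demo_celery_settings" and fails.
by_module_name = try_import("demo_celery_settings")
by_file_name = try_import("demo_celery_settings.py")
```

Here by_module_name is a module object carrying CELERYD_LOG_COLOR, while by_file_name is None.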

I had the exact same error, but I eventually figured out that I had made simple option naming mistakes in the configuration module itself which were not obvious at all.
You see, when you start out and follow the tutorial, you will end up with something that looks like this in your main module:
app = Celery('foo', broker='amqp://user:pass@example.com/vsrv', backend='rpc://')
Which works fine, but then later as you add more and more configuration options you decide to move the options to a separate file, at which point you go ahead and just copy+paste and split the options into lines until it looks like this:
Naïve my_settings.py:
broker='amqp://user:pass@example.com/vsrv'
backend='rpc://'
result_persistent=True
task_acks_late=True
# ... etc. etc.
And there you just fooled yourself! Because in a settings module the options are called broker_url and result_backend instead of just broker and backend as they would be called in the instantiation above.
Corrected my_settings.py:
broker_url='amqp://user:pass@example.com/vsrv'
result_backend='rpc://'
result_persistent=True
task_acks_late=True
# ... etc. etc.
And all of a sudden, your worker boots up just fine with all settings in place.
I hope this will cure a few headaches of fellow celery newbies like us.
Further note:
You can test that celery in fact does not ignore your file by placing a print-statement (or print function call if you're on Py3) into the settings module.
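If you want to catch this naming mistake automatically, a small check along these lines might help. It is a sketch only: find_misnamed_settings is a hypothetical helper, and the mapping covers just the two names mentioned in this answer (broker and backend), not every Celery option.

```python
from types import SimpleNamespace

# Constructor keyword -> name expected in a settings module (from this answer).
KWARG_TO_SETTING = {"broker": "broker_url", "backend": "result_backend"}


def find_misnamed_settings(settings):
    """Return {wrong_name: suggested_name} for names found on `settings`."""
    return {
        wrong: right
        for wrong, right in KWARG_TO_SETTING.items()
        if hasattr(settings, wrong) and not hasattr(settings, right)
    }


# Stand-ins for the two versions of my_settings.py shown above.
naive = SimpleNamespace(
    broker="amqp://user:pass@example.com/vsrv", backend="rpc://"
)
corrected = SimpleNamespace(
    broker_url="amqp://user:pass@example.com/vsrv", result_backend="rpc://"
)

naive_problems = find_misnamed_settings(naive)
corrected_problems = find_misnamed_settings(corrected)
```

For a real settings module you would pass the imported module object instead of the SimpleNamespace stand-ins.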

Related

How can I load my Django project into celery workers AFTER they fork?

This question might seem odd to folks, but it's actually a creative question.
I'm using Django (v3.2.3) and celery (v5.2.3) for a project. I've noticed that the workers and the master process all share the same code (probably because celery loads my app modules before it forks the child processes, for configuration reasons). While this would normally be fine, I want to do something more unreasonable: I want each celery worker to load my project code after it forks (similar to what uwsgi does with its lazy-apps configuration).
Some responses here will ask why, but let's not focus on that (remember, I'm being unreasonable). Let's just assume I don't want to write thread-safe code. The risks are understood, namely that each child worker would load more memory and be slow at restart.
It's not clear to me from reading the celery code how this would be possible. I've tried this to no avail:
listen to the worker_process_init signal
then use my project's instantiated app reference to reach the DjangoFixup interface via app._fixups[0]
and try to manually call all the registered signal callbacks for the DjangoWorkerFixup
Any ideas on the steps to get this to work would be much appreciated.

Django celery redis remove a specific periodic task from queue

There is a specific periodic task that needs to be removed from message queue. I am using the configuration of Redis and celery here.
tasks.py
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    """
    some operations here
    """
There are other periodic tasks in the project as well, but I need to stop this specific task from running from now on.
Will the following code, as explained in this answer, work?
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    pass
In this example the periodic task's schedule is defined directly in code, meaning it is hard-coded and cannot be altered dynamically without a code change and an app re-deploy.
The provided code, with the task logic deleted or with a simple return at the beginning, will work, but it does not answer the question: the task will still run, there just is no code that runs with it.
Also, it is recommended NOT to use @periodic_task. Its docstring says:
"""Deprecated decorator, please use :setting:`beat_schedule`."""
First, change the function from a @periodic_task to a regular Celery @task; because you are using Django, it is better to go straight for @shared_task:
from celery import shared_task

@shared_task
def task_abcd():
    ...
Now this is just one of the Celery tasks, which needs to be called explicitly. Or it can be run periodically if it is added to the celery beat schedule.
For production, and if using multiple workers, it is not recommended to run the celery worker with embedded beat (-B); run a separate instance of the celery beat scheduler.
The schedule can be specified in celery.py or in the Django project settings (settings.py).
It is still not very dynamic, as the app needs to be reloaded to re-read the settings.
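As a rough illustration, a schedule entry in settings.py could look like the fragment below. It assumes the app is configured with config_from_object('django.conf:settings', namespace='CELERY'), so the setting is spelled CELERY_BEAT_SCHEDULE; the entry name and the dotted task path proj.tasks.task_abcd are assumptions for this example.

```python
from datetime import timedelta

# settings.py fragment; assumes namespace='CELERY' in config_from_object.
CELERY_BEAT_SCHEDULE = {
    "task-abcd-every-6-minutes": {          # entry name is arbitrary
        "task": "proj.tasks.task_abcd",     # assumed dotted path of the task
        "schedule": timedelta(minutes=6),   # fixed interval between runs
    },
}
```

A timedelta runs the task at a fixed interval counted from when beat starts. To keep the wall-clock alignment of the original decorator, crontab(minute='*/6') from celery.schedules can be used as the schedule value instead. Removing the entry and restarting beat stops the periodic runs without touching the task function.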
Then, use the Database Scheduler, which allows creating schedules dynamically: which tasks need to be run, when, and with what arguments. It even provides nice Django admin web views for administration!
That code will work but I'd go for something that doesn't force you to update your code every time you need to disable/enable the task.
What you could do is to use a configurable variable whose value could come from an admin panel, a configuration file, or whatever you want, and use that to return before your code runs if the task is in disabled mode.
For instance:
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    config = load_config_for_task_abcd()
    if not config.is_enabled:
        return
    # some operations here
In this way, even if your task is scheduled, its operations won't be executed.
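The answer leaves load_config_for_task_abcd undefined. One possible way to back the flag, sketched here with an environment variable, is shown below. The variable name TASK_ABCD_ENABLED and the helper is_task_enabled are assumptions made up for this example, and the Celery decorator is left out so the snippet runs on its own.

```python
import os


def is_task_enabled(name, environ=os.environ):
    """Read TASK_<NAME>_ENABLED; a missing variable means enabled."""
    raw = environ.get("TASK_{}_ENABLED".format(name.upper()), "1")
    return raw.strip().lower() not in {"0", "false", "no", "off"}


def task_abcd(environ=os.environ):
    # Return early when the flag is switched off (hypothetical flag name).
    if not is_task_enabled("abcd", environ):
        return "skipped"
    # some operations here
    return "ran"


when_disabled = task_abcd({"TASK_ABCD_ENABLED": "0"})
when_unset = task_abcd({})
```

The same early-return shape works if the flag comes from a database row or an admin panel instead of the environment.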
If you simply want to remove the periodic task, have you tried removing the function and then restarting your celery service? You can restart your Redis service as well as your Django server for good measure.
Make sure that the function you removed is not referenced anywhere else.

Django Celery delay() always pushing to default 'celery' queue

I'm ripping my hair out with this one.
The crux of my issue is that using the Django CELERY_DEFAULT_QUEUE setting in my settings.py does not force my tasks to go to the particular queue that I've set up. They always go to the default celery queue in my broker.
However, if I specify queue=proj:dev in the shared_task decorator, it goes to the correct queue. It behaves as expected.
My setup is as follows:
Django code on my localhost (for testing and stuff). Executing task .delay()'s via Django's shell (manage.py shell)
a remote Redis instance configured as my broker
2 celery workers configured on a remote machine setup and waiting for messages from Redis (On Google App Engine - irrelevant perhaps)
NB: For the pieces of code below, I've obscured the project name and used proj as a placeholder.
celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery, shared_task
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')
app = Celery('proj')
app.config_from_object('django.conf:settings', namespace='CELERY', force=True)
app.autodiscover_tasks()
@shared_task
def add(x, y):
    return x + y
settings.py
...
CELERY_RESULT_BACKEND = 'django-db'
CELERY_BROKER_URL = 'redis://:{}@{}:6379/0'.format(
    os.environ.get('REDIS_PASSWORD'),
    os.environ.get('REDIS_HOST', 'alice-redis-vm'))
CELERY_DEFAULT_QUEUE = os.environ.get('CELERY_DEFAULT_QUEUE', 'proj:dev')
The idea is that, for right now, I'd like to have different queues for the different environments that my code exists in: dev, staging, prod. Thus, on Google App Engine, I define an environment variable that is passed based on the individual App Engine service.
Steps
So, with the above configuration, I fire up the shell using ./manage.py shell and run add.delay(2, 2). I get an AsyncResult back but Redis monitor clearly shows a message was sent to the default celery queue:
1497566026.117419 [0 155.93.144.189:58887] "LPUSH" "celery"
...
What am I missing?
Not to throw a spanner in the works, but I feel like there was a point today at which this was actually working. But for the life of me, I can't think what part of my brain is failing me here.
Stack versions:
python: 3.5.2
celery: 4.0.2
redis: 2.10.5
django: 1.10.4
This issue is far simpler than I thought: incorrect documentation!
The Celery documentation asks us to use CELERY_DEFAULT_QUEUE to set the task_default_queue configuration on the celery object.
Ref: http://docs.celeryproject.org/en/latest/userguide/configuration.html#new-lowercase-settings
We should currently use CELERY_TASK_DEFAULT_QUEUE. The documented name is inconsistent with how all the other settings are named. It was raised on GitHub here - https://github.com/celery/celery/issues/3772
Solution summary
Using CELERY_DEFAULT_QUEUE in a configuration module (using config_from_object) has no effect on the queue.
Use CELERY_TASK_DEFAULT_QUEUE instead.
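The reason one name works and the other does not becomes clearer if you look at how a namespaced key relates to the lowercase setting name. The function below is a simplified illustration of that naming convention only; it is not Celery's implementation, and new_style_name is a made-up helper.

```python
def new_style_name(key, namespace="CELERY"):
    """Strip the namespace prefix and lowercase the rest.

    Simplified illustration of the naming convention, not Celery's code.
    """
    prefix = namespace + "_"
    if not key.startswith(prefix):
        return None  # key is outside the namespace
    return key[len(prefix):].lower()


working = new_style_name("CELERY_TASK_DEFAULT_QUEUE")
ignored = new_style_name("CELERY_DEFAULT_QUEUE")
```

Here working is "task_default_queue", which is the lowercase setting named in the Celery docs, while ignored is "default_queue", which does not correspond to that setting.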
If you are here because you're trying to implement a predefined queue using SQS in Celery and find that Celery creates a new queue called "celery" in SQS regardless of what you say, you've reached the end of your journey friend.
Before passing broker_transport_options to Celery, change your default queue and/or specify the queues you will use explicitly. In my case, I need just the one queue so doing the following worked:
celery.conf.task_default_queue = "<YOUR_PREDEFINED_QUEUE_NAME_IN_SQS>"

Celery worker hangs without any error

I have a production setup running Celery workers that make POST / GET requests to a remote service and store the results. It handles a load of around 20k tasks per 15 minutes.
The problem is that the workers go numb for no reason: no errors, no warnings.
I have also tried adding multiprocessing, with the same result.
In the log I see the task execution time increasing, as in succeeded in s
For more details look at https://github.com/celery/celery/issues/2621
If your celery worker sometimes gets stuck, you can use strace and lsof to find out at which system call it is stuck.
For example:
$ strace -p 10268 -s 10000
Process 10268 attached - interrupt to quit
recvfrom(5,
10268 is the pid of the celery worker; recvfrom(5 means the worker is blocked receiving data from file descriptor 5.
Then you can use lsof to check what file descriptor 5 is in this worker process.
lsof -p 10268
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
......
celery 10268 root 5u IPv4 828871825 0t0 TCP 172.16.201.40:36162->10.13.244.205:wap-wsp (ESTABLISHED)
......
This indicates that the worker is stuck on a TCP connection (you can see 5u in the FD column).
Some Python packages, such as requests, block while waiting for data from the peer, which may cause the celery worker to hang. If you are using requests, make sure to set the timeout argument.
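To see why a timeout matters, here is a small self-contained sketch of the same situation: a peer that accepts the connection and then never sends anything. It deliberately uses the standard-library socket module instead of requests so that it runs without any network access; recv_with_timeout is a made-up helper name. With requests, the corresponding measure is passing the timeout argument to the request call.

```python
import socket


def recv_with_timeout(host, port, timeout_s):
    """Return received bytes, or None if the peer stays silent too long."""
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        try:
            return conn.recv(1024)  # without a timeout this blocks forever
        except socket.timeout:
            return None


# A local listener that lets clients connect but never sends anything.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
host, port = server.getsockname()

result = recv_with_timeout(host, port, timeout_s=0.2)
server.close()
```

Without the timeout, the recv call would sit in exactly the kind of blocked receive that strace shows above.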
Have you seen this page:
https://www.caktusgroup.com/blog/2013/10/30/using-strace-debug-stuck-celery-tasks/
I also faced this issue when calling delay() on a @shared_task with celery, kombu, amqp and billiard installed. Everything else in the API call worked, but the call hung as soon as it reached delay().
The cause was that the following lines were missing from the main application's __init__.py. They make sure the app is always imported when Django starts, so that shared_task will use this app.
In __init__.py:
from __future__ import absolute_import, unicode_literals
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celeryApp
#__all__ = ('celeryApp',)
__all__ = ['celeryApp']
Note 1: In place of celeryApp, use whatever name you prefer for the app object imported from celery.py, and list that same name in __all__.
Note 2: If hanging in a shared task is your only problem, the solution above may be enough and you can ignore the rest.
One more issue worth mentioning: if you get an Error 111 connection error, check that your versions of amqp, billiard, celery and kombu are compatible with each other (for example amqp==2.2.2, billiard==3.5.0.3, celery==4.1.0, kombu==4.1.0; these versions are just an example). Also check that Redis is installed on your system, if you are using Redis.
Also make sure you are using Kombu 4.1.0, because the latest version of Kombu renames async to asynchronous.
Follow this tutorial
Celery Django Link
Add the following to the settings.
NB: Install Redis for both the transport and the result backend.
# TRANSPORT
CELERY_BROKER_TRANSPORT = 'redis'
CELERY_BROKER_HOST = 'localhost'
CELERY_BROKER_PORT = '6379'
CELERY_BROKER_VHOST = '0'
# RESULT
CELERY_RESULT_BACKEND = 'redis'
CELERY_REDIS_HOST = 'localhost'
CELERY_REDIS_PORT = '6379'
CELERY_REDIS_DB = '1'

Where should I place a one-time operation in the Django framework?

I want to perform some one-time operations as an initialization step when the Django server starts, such as starting a background thread that populates a cache every 30 minutes, so that they do not block users from visiting the website. Where should I place all this code in Django?
Putting it into the settings.py file does not work. It seems to cause a circular dependency.
Putting it into the __init__.py file does not work either. The Django server calls it many times (what is the reason?)
I just create standalone scripts and schedule them with cron. Admittedly it's a bit low-tech, but It Just Works. Just place this at the top of a script in your project's top-level directory and call as needed.
#!/usr/bin/env python
from django.core.management import setup_environ
import settings
setup_environ(settings)
from django.db import transaction
# random interesting things
# If you change the database, make sure you use this next line
transaction.commit_unless_managed()
We put one-time startup scripts in the top-level urls.py. This is often where your admin bindings go -- they're one-time startup, also.
Some folks like to put these things in settings.py but that seems to conflate settings (which don't do much) with the rest of the site's code (which does stuff).
For a one-time operation at server start you can use custom commands, or if you want a periodic task or a queue of tasks you can use Celery.
__init__.py will be called every time the app is imported. So if you're using mod_wsgi with Apache for instance with the prefork method, then every new process created is effectively 'starting' the project thus importing __init__.py. It sounds like your best method would be to create a new management command, and then cron that up to run every so often if that's an option. Either that, or run that management command before starting the server. You could write up a quick script that runs that management command and then starts the server for instance.
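The "quick script" idea could be sketched roughly as follows. This is only an outline under stated assumptions: populate_cache is a hypothetical custom management command that you would have to write yourself, manage.py is assumed to be in the current directory, and runserver stands in for whatever actually starts your server.

```python
import subprocess
import sys


def startup_commands(manage_py="manage.py", warmup_command="populate_cache"):
    """Build the commands: one-time warm-up first, then the server."""
    return [
        [sys.executable, manage_py, warmup_command],  # hypothetical command
        [sys.executable, manage_py, "runserver"],
    ]


def run_in_order(commands, runner=subprocess.check_call):
    """Run each command in turn; check_call raises if one exits non-zero."""
    for command in commands:
        runner(command)


# Usage from the project directory:
#     run_in_order(startup_commands())
```

Because the warm-up runs once in the launcher script and not in __init__.py, it is not repeated for every process that imports the project.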
