Can't set airflow's celery_result_backend setting to 'rpc://'

Is 'rpc://' a valid value for the 'celery_result_backend' setting in the Airflow config? It doesn't seem to work.
I assumed it would, as it's a valid value in the core Celery config.

Since we're using Celery on Redis, the URLs both start with redis://.
If you were using Celery with RabbitMQ, the URLs would start with amqp://.
The AWS SQS ones would start with sqs://.
I don't see any broker URL that starts with rpc:// in the broker documentation.
I do see that the result backend for RabbitMQ could start with rpc://. Since it's just a string passed to the library in question, did you install with celery[librabbitmq], and are you sure you're not mixing up the two settings like I almost did?
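To make the broker / result backend distinction concrete, here is a minimal sketch that looks only at the URL scheme. The helper name and the scheme lists are assumptions for illustration, built from the schemes mentioned in this answer; they are not exhaustive and are not part of Celery or Airflow.

```python
from urllib.parse import urlsplit

# Illustrative scheme lists taken from the answer above; not exhaustive.
# Some broker schemes (e.g. redis) can also be used for the result backend.
BROKER_SCHEMES = {"redis", "amqp", "sqs"}
RESULT_BACKEND_ONLY_SCHEMES = {"rpc"}


def describe_url(url):
    """Hypothetical helper: say which setting a URL scheme is plausible for."""
    scheme = urlsplit(url).scheme
    if scheme in BROKER_SCHEMES:
        return "broker"
    if scheme in RESULT_BACKEND_ONLY_SCHEMES:
        return "result backend only"
    return "unknown"


for candidate in ("redis://localhost:6379/0", "amqp://guest@localhost//", "rpc://"):
    print(candidate, "->", describe_url(candidate))
```

If 'rpc://' ends up in the broker setting rather than the result backend setting, a check like this would flag it before the worker starts.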

Related

Celery CELERY_DEFAULT_EXCHANGE_TYPE doesn't work when I run a worker

I want to run some workers in topic mode, so I change the CELERY_DEFAULT_EXCHANGE_TYPE setting.
CELERY_DEFAULT_EXCHANGE = 'interesting_exchange'
CELERY_DEFAULT_EXCHANGE_TYPE = 'topic'
When I publish messages to a clean RabbitMQ (assume the producer started before the consumer), e.g. simple_task.apply_async(args=[1, 2, 3]), the exchange is declared correctly: interesting_exchange is declared in topic mode.
But when I run a worker against a clean RabbitMQ (assume the consumer started before the producer), e.g. celery worker -A celery_app.app, interesting_exchange is declared in direct mode.
Do I need to specify other options? Or is it just a bug?

This is a known Celery issue and the fix has already been merged. Upgrading to Celery 4.4 should solve the problem.

Django Celery delay() always pushing to default 'celery' queue

I'm ripping my hair out with this one.
The crux of my issue is that, using the Django CELERY_DEFAULT_QUEUE setting in my settings.py is not forcing my tasks to go to that particular queue that I've set up. It always goes to the default celery queue in my broker.
However, if I specify queue=proj:dev in the shared_task decorator, it goes to the correct queue. It behaves as expected.
My setup is as follows:
Django code on my localhost (for testing and stuff). Executing task .delay()'s via Django's shell (manage.py shell)
a remote Redis instance configured as my broker
2 celery workers configured on a remote machine setup and waiting for messages from Redis (On Google App Engine - irrelevant perhaps)
NB: For the pieces of code below, I've obscured the project name and used proj as a placeholder.
celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery, shared_task
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')
app = Celery('proj')
app.config_from_object('django.conf:settings', namespace='CELERY', force=True)
app.autodiscover_tasks()
@shared_task
def add(x, y):
    return x + y
settings.py
...
CELERY_RESULT_BACKEND = 'django-db'
CELERY_BROKER_URL = 'redis://:{}@{}:6379/0'.format(
    os.environ.get('REDIS_PASSWORD'),
    os.environ.get('REDIS_HOST', 'alice-redis-vm'))
CELERY_DEFAULT_QUEUE = os.environ.get('CELERY_DEFAULT_QUEUE', 'proj:dev')
The idea is that, for right now, I'd like to have different queues for the different environments that my code exists in: dev, staging, prod. Thus, on Google App Engine, I define an environment variable that is passed based on the individual App Engine service.
Steps
So, with the above configuration, I fire up the shell using ./manage.py shell and run add.delay(2, 2). I get an AsyncResult back but Redis monitor clearly shows a message was sent to the default celery queue:
1497566026.117419 [0 155.93.144.189:58887] "LPUSH" "celery"
...
What am I missing?
Not to throw a spanner in the works, but I feel like there was a point today at which this was actually working. But for the life of me, I can't think what part of my brain is failing me here.
Stack versions:
python: 3.5.2
celery: 4.0.2
redis: 2.10.5
django: 1.10.4

This issue is far simpler than I thought: incorrect documentation!
The Celery documentation asks us to use CELERY_DEFAULT_QUEUE to set the task_default_queue configuration on the celery object.
Ref: http://docs.celeryproject.org/en/latest/userguide/configuration.html#new-lowercase-settings
We should currently use CELERY_TASK_DEFAULT_QUEUE. The documented name is inconsistent with the naming of all the other settings. It was raised on GitHub here - https://github.com/celery/celery/issues/3772
Solution summary
Using CELERY_DEFAULT_QUEUE in a configuration module (using config_from_object) has no effect on the queue.
Use CELERY_TASK_DEFAULT_QUEUE instead.
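One way to see why the name matters: with namespace='CELERY', the documented convention is that the prefix is stripped and the rest is lowercased to get the Celery setting name. The sketch below is a simplified model of that convention, not Celery's actual implementation, and to_celery_key is a hypothetical helper.

```python
def to_celery_key(django_name, namespace="CELERY"):
    """Simplified model: strip the namespace prefix and lowercase the rest."""
    prefix = namespace + "_"
    if not django_name.startswith(prefix):
        return None  # not a namespaced Celery setting in this model
    return django_name[len(prefix):].lower()


for name in ("CELERY_DEFAULT_QUEUE", "CELERY_TASK_DEFAULT_QUEUE"):
    print(name, "->", to_celery_key(name))
```

Under this model only CELERY_TASK_DEFAULT_QUEUE maps to task_default_queue; CELERY_DEFAULT_QUEUE maps to default_queue, which is not the setting the answer refers to.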

If you are here because you're trying to implement a predefined queue using SQS in Celery and find that Celery creates a new queue called "celery" in SQS regardless of what you say, you've reached the end of your journey friend.
Before passing broker_transport_options to Celery, change your default queue and/or specify the queues you will use explicitly. In my case, I need just the one queue so doing the following worked:
celery.conf.task_default_queue = "<YOUR_PREDEFINED_QUEUE_NAME_IN_SQS>"

Using celery, send async_apply to specific vhost?

I have a use case where I'd like to be able to have many clients connect to RabbitMQ but they cannot see each other's messages. I believe using vhosts is the best way to keep privacy between the workers?
I thought I'd be able to pass a virtual_host argument to apply_async but that's not going to work, I believe I have to make a custom connection like so:
from kombu import Connection
my_connection = Connection(virtual_host='new_virtual_host')
task.apply_async(connection=my_connection)
However, I bet there's a built in way to do that inside Celery using the settings I already have configured and going through the proper channels in case I switch backends. What is that internal "get connection" function?
This is using Celery 3.1
EDIT:
Current attempt, not working in that it seems to just return a regular connection not using the specified virtual host...
from celery.app import app_or_default
app = app_or_default()
with app.broker_connection(virtual_host='other') as new_connection:
    task.apply_async((data,), connection=new_connection)
If I check new_connection, the virtual_host kwarg has been ignored... hmm...

Aha! So, it turns out Celery accepts the broker_url and then ignores virtual_host, since broker_url is set. It appears to work fine doing it this way, manually setting the property we want:
from celery.app import app_or_default
app = app_or_default()
with app.connection() as new_connection:
    # setting here instead of kwargs above
    new_connection.virtual_host = 'other'
    task.apply_async((data,), connection=new_connection)
Doing it this way, any change to the regular CELERY or BROKER settings will apply to these new connections as well. Yay!
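Since the broker URL takes precedence over the virtual_host keyword, another option is to derive a URL that already points at the other vhost. This is only a sketch of the string manipulation, using the standard library; with_vhost and the example URL are hypothetical, and whether a given Celery version accepts the resulting URL for a one-off connection is not verified here.

```python
from urllib.parse import quote, urlsplit


def with_vhost(broker_url, vhost):
    """Hypothetical helper: return broker_url with its vhost (path) replaced."""
    parts = urlsplit(broker_url)
    # Percent-encode the vhost so characters such as "/" survive in the path.
    return parts._replace(path="/" + quote(vhost, safe="")).geturl()


# Example URL is an assumption for illustration only.
print(with_vhost("amqp://guest:guest@localhost:5672/main", "other"))
```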

Celery worker hangs without any error

I have a production setup running Celery workers that make POST / GET requests to a remote service and store the results. It handles a load of around 20k tasks per 15 min.
The problem is that the workers go numb for no reason: no errors, no warnings.
I have also tried adding multiprocessing, with the same result.
In the log I see the task execution time increasing, like succeeded in s
For more details look at https://github.com/celery/celery/issues/2621

If your celery worker gets stuck sometimes, you can use strace and lsof to find out which system call it is stuck on.
For example:
$ strace -p 10268 -s 10000
Process 10268 attached - interrupt to quit
recvfrom(5,
10268 is the pid of the celery worker; recvfrom(5 means the worker is blocked receiving data from file descriptor 5.
Then you can use lsof to check what file descriptor 5 is in this worker process.
lsof -p 10268
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
......
celery 10268 root 5u IPv4 828871825 0t0 TCP 172.16.201.40:36162->10.13.244.205:wap-wsp (ESTABLISHED)
......
It indicates that the worker is stuck on a TCP connection (you can see 5u in the FD column).
Some Python packages, such as requests, block while waiting for data from the peer, which may cause the celery worker to hang. If you are using requests, please make sure to set the timeout argument.
Have you seen this page:
https://www.caktusgroup.com/blog/2013/10/30/using-strace-debug-stuck-celery-tasks/
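The timeout advice above can be illustrated without any third-party package. The sketch below uses the standard library socket module against a local listener that never sends data, which mimics a silent peer; recv_with_timeout is a hypothetical helper. With requests, the equivalent is passing timeout=... to the call, e.g. requests.get(url, timeout=5).

```python
import socket


def recv_with_timeout(address, timeout_seconds):
    """Hypothetical helper: read from a peer, giving up after timeout_seconds."""
    with socket.create_connection(address, timeout=timeout_seconds) as conn:
        try:
            return conn.recv(1024)
        except socket.timeout:
            # Without a timeout this recv() would block forever, which is
            # the recvfrom(5, ... hang that strace shows above.
            return None


# Local listener that completes the TCP handshake but never sends anything.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

result = recv_with_timeout(server.getsockname(), 0.2)
server.close()

print("timed out" if result is None else result)
```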

I also faced this issue when calling delay() on a shared_task with celery, kombu, amqp and billiard. After calling the API, everything worked until the call to delay() on the @shared_task, where it hung.
The cause was that the settings below were missing from the main application's __init__.py.
In __init__.py:
from __future__ import absolute_import, unicode_literals
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celeryApp
#__all__ = ('celeryApp',)
__all__ = ['celeryApp']
Note 1: In place of celery_app, use your own application name, i.e. import the app defined in celery.py and list it here.
Note 2: If you are only facing the hang in a shared task, the solution above may solve your issue and you can ignore what follows.
I also want to mention another issue: if anyone is facing an Error 111 connection issue, please check whether your versions of amqp, billiard, celery and kombu are compatible with each other (for example amqp==2.2.2, billiard==3.5.0.3, celery==4.1.0, kombu==4.1.0; these versions are just an example). Also check whether Redis is installed on your system (if you are using Redis).
Also make sure you are using Kombu 4.1.0. The latest version of Kombu renames async to asynchronous.

Follow this tutorial
Celery Django Link
Add the following to the settings.
NB: Install Redis for both the transport and the result backend.
# TRANSPORT
CELERY_BROKER_TRANSPORT = 'redis'
CELERY_BROKER_HOST = 'localhost'
CELERY_BROKER_PORT = '6379'
CELERY_BROKER_VHOST = '0'
# RESULT
CELERY_RESULT_BACKEND = 'redis'
CELERY_REDIS_HOST = 'localhost'
CELERY_REDIS_PORT = '6379'
CELERY_REDIS_DB = '1'

How can I ensure a Celery task runs with the right settings?

I have two sites running essentially the same codebase, with only slight differences in settings. Each site is built in Django, with a WordPress blog integrated.
Each site needs to import blog posts from WordPress and store them in the Django database. When a user publishes a post, WordPress hits a webhook URL on the Django side, which kicks off a Celery task that grabs the JSON version of the post and imports it.
My initial thought was that each site could run its own instance of manage.py celeryd, each is in its own virtualenv, and the two sites would stay out of each other's way. Each is daemonized with a separate upstart script.
But it looks like they're colliding somehow. I can run one at a time successfully, but if both are running, one instance won't receive tasks, or tasks will run with the wrong settings (in this case, each has a WORDPRESS_BLOG_URL setting).
I'm using a Redis queue, if that makes a difference. What am I doing wrong here?

Have you specified the name of the default queue that celery should use? If you haven't set CELERY_DEFAULT_QUEUE, then both sites will be using the same queue and getting each other's messages. You need to set this setting to a different value for each site to keep the messages separate.
Edit
You're right, CELERY_DEFAULT_QUEUE is only for backends like RabbitMQ. I think you need to set a different database number for each site, using a different number at the end of your broker url.
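As a rough sketch of the "different database number per site" idea: each site gets its own Redis database index at the end of the broker URL. The site names, host, port and helper name below are all assumptions for illustration.

```python
# Hypothetical mapping of site name to Redis database number.
SITE_REDIS_DB = {"site_a": 0, "site_b": 1}


def broker_url_for(site, host="localhost", port=6379):
    """Build a Redis broker URL whose trailing number is the site's database."""
    return "redis://{}:{}/{}".format(host, port, SITE_REDIS_DB[site])


for site in sorted(SITE_REDIS_DB):
    print(site, "->", broker_url_for(site))
```

Each site's settings module would then use its own URL as the broker URL, so the two workers no longer read from the same Redis database.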

If you are using django-celery then make sure you don't have an instance of celery running outside of your virtualenvs. Then start the celery instance within your virtualenvs using manage.py celeryd like you have done. I recommend setting up supervisord to keep track of your instances.
