Using celery, send async_apply to specific vhost?

Using celery, send async_apply to specific vhost? - python

I have a use case where I'd like to be able to have many clients connect to RabbitMQ but they cannot see each other's messages. I believe using vhosts is the best way to keep privacy between the workers?
I thought I'd be able to pass a virtual_host argument to apply_async but that's not going to work, I believe I have to make a custom connection like so:
from kombu import Connection
my_connection = Connection(virtual_host='new_virtual_host')
task.apply_async(connection=my_connection)
However, I bet there's a built in way to do that inside Celery using the settings I already have configured and going through the proper channels in case I switch backends. What is that internal "get connection" function?
This is using Celery 3.1
EDIT:
Current attempt, not working in that it seems to just return a regular connection not using the specified virtual host...
from celery.app import app_or_default
app = app_or_default()
with app.broker_connection(virtual_host='other') as new_connection:
task.apply_async((data,), connection=new_connection)
If I check new_connection the virtual_host kwarg has been ignored.. hmm...

A ha! So, it turns out Celery accepts the broker_url then ignores virtual_host since broker_url is set. It appears to work fine doing it this way, manually setting the property we want:
from celery.app import app_or_default
app = app_or_default()
with app.connection() as new_connection:
# setting here instead of kwargs above
new_connection.virtual_host = 'other'
task.apply_async((data,), connection=new_connection)
Doing it this way when I change any regular CELERY or BROKER settings, it will apply to these new connections as well -- yay!

Related

Django Celery delay() always pushing to default 'celery' queue

I'm ripping my hair out with this one.
The crux of my issue is that, using the Django CELERY_DEFAULT_QUEUE setting in my settings.py is not forcing my tasks to go to that particular queue that I've set up. It always goes to the default celery queue in my broker.
However, if I specify queue=proj:dev in the shared_task decorator, it goes to the correct queue. It behaves as expected.
My setup is as follows:
Django code on my localhost (for testing and stuff). Executing task .delay()'s via Django's shell (manage.py shell)
a remote Redis instance configured as my broker
2 celery workers configured on a remote machine setup and waiting for messages from Redis (On Google App Engine - irrelevant perhaps)
NB: For the pieces of code below, I've obscured the project name and used proj as a placeholder.
celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery, shared_task
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')
app = Celery('proj')
app.config_from_object('django.conf:settings', namespace='CELERY', force=True)
app.autodiscover_tasks()
#shared_task
def add(x, y):
return x + y
settings.py
...
CELERY_RESULT_BACKEND = 'django-db'
CELERY_BROKER_URL = 'redis://:{}#{}:6379/0'.format(
os.environ.get('REDIS_PASSWORD'),
os.environ.get('REDIS_HOST', 'alice-redis-vm'))
CELERY_DEFAULT_QUEUE = os.environ.get('CELERY_DEFAULT_QUEUE', 'proj:dev')
The idea is that, for right now, I'd like to have different queues for the different environments that my code exists in: dev, staging, prod. Thus, on Google App Engine, I define an environment variable that is passed based on the individual App Engine service.
Steps
So, with the above configuration, I fire up the shell using ./manage.py shell and run add.delay(2, 2). I get an AsyncResult back but Redis monitor clearly shows a message was sent to the default celery queue:
1497566026.117419 [0 155.93.144.189:58887] "LPUSH" "celery"
...
What am I missing?
Not to throw a spanner in the works, but I feel like there was a point today at which this was actually working. But for the life of me, I can't think what part of my brain is failing me here.
Stack versions:
python: 3.5.2
celery: 4.0.2
redis: 2.10.5
django: 1.10.4

This issue is far more simple than I thought - incorrect documentation!!
The Celery documentation asks us to use CELERY_DEFAULT_QUEUE to set the task_default_queue configuration on the celery object.
Ref: http://docs.celeryproject.org/en/latest/userguide/configuration.html#new-lowercase-settings
We should currently use CELERY_TASK_DEFAULT_QUEUE. This is an inconsistency in the naming of all the other settings' names. It was raised on Github here - https://github.com/celery/celery/issues/3772
Solution summary
Using CELERY_DEFAULT_QUEUE in a configuration module (using config_from_object) has no effect on the queue.
Use CELERY_TASK_DEFAULT_QUEUE instead.

If you are here because you're trying to implement a predefined queue using SQS in Celery and find that Celery creates a new queue called "celery" in SQS regardless of what you say, you've reached the end of your journey friend.
Before passing broker_transport_options to Celery, change your default queue and/or specify the queues you will use explicitly. In my case, I need just the one queue so doing the following worked:
celery.conf.task_default_queue = "<YOUR_PREDEFINED_QUEUE_NAME_IN_SQS">

Detect if flask is being run via gunicorn?

Is there a way I can check if my flask app is being run instide a gunicorn container? Currently I set an enviroment variable to tell my application this, but I'd prefer that it be automatic. Additionally, is there someway I can check what worker class is being used?
I need to detect this for a few different reasons. Note that typically I use gunicorn, but during testing I won't sometimes.
Logging: I attach to a gunicorn info log when run in gunicorn, otherwise to a stdout log.
Eventlet/subprocess: Since I use subprocesses I need to ensure that the proper monkey_patch'ing is done when using eventlet, otherwise it doesn't behave correctly. (I call many subprocesses).

Late to the party, but a very hacky solution that seems to work:
is_gunicorn = "gunicorn" in os.environ.get("SERVER_SOFTWARE", "")

Is it possible to use django-celery models with RabbitMQ?

I'm interested in using the django-celery models to create and monitor recurring tasks. In particular, I am looking at creating recurring cron-like actions and starting/stopping them from the admin.
As I understand it, it is possible to use this only if I am also using Django's default DB as the celery broker. Is it ever going to be possible to use those models with a non-DB broker?
EDIT: To clarify, I am already using RabbitMQ as the broker. My question is: can I, while using RabbigMQ, still somehow use django-celery's models to dynamically create and manage recurring/scheduled tasks?

If you have AMQP installed you can just set in celeryconfig:
BROKER_URL = 'amqp://127.0.0.1//'
Or replace the ip above with the ip where the RabbitMQ server is running.

How can I ensure a Celery task runs with the right settings?

I have two sites running essentially the same codebase, with only slight differences in settings. Each site is built in Django, with a WordPress blog integrated.
Each site needs to import blog posts from WordPress and store them in the Django database. When a user publishes a post, WordPress hits a webhook URL on the Django side, which kicks off a Celery task that grabs the JSON version of the post and imports it.
My initial thought was that each site could run its own instance of manage.py celeryd, each is in its own virtualenv, and the two sites would stay out of each other's way. Each is daemonized with a separate upstart script.
But it looks like they're colliding somehow. I can run one at a time successfully, but if both are running, one instance won't receive tasks, or tasks will run with the wrong settings (in this case, each has a WORDPRESS_BLOG_URL setting).
I'm using a Redis queue, if that makes a difference. What am I doing wrong here?

Have you specified the name of the default queue that celery should use? If you haven't set CELERY_DEFAULT_QUEUE the both sites will be using the same queue and getting each other's messages. You need to set this setting to a different value for each site to keep the message separate.
Edit
You're right, CELERY_DEFAULT_QUEUE is only for backends like RabbitMQ. I think you need to set a different database number for each site, using a different number at the end of your broker url.

If you are using django-celery then make sure you don't have an instance of celery running outside of your virtualenvs. Then start the celery instance within your virtualenvs using manage.py celeryd like you have done. I recommend setting up supervisord to keep track of your instances.

Communicating with a running python daemon

I wrote a small Python application that runs as a daemon. It utilizes threading and queues.
I'm looking for general approaches to altering this application so that I can communicate with it while it's running. Mostly I'd like to be able to monitor its health.
In a nutshell, I'd like to be able to do something like this:
python application.py start # launches the daemon
Later, I'd like to be able to come along and do something like:
python application.py check_queue_size # return info from the daemonized process
To be clear, I don't have any problem implementing the Django-inspired syntax. What I don't have any idea how to do is to send signals to the daemonized process (start), or how to write the daemon to handle and respond to such signals.
Like I said above, I'm looking for general approaches. The only one I can see right now is telling the daemon constantly log everything that might be needed to a file, but I hope there's a less messy way to go about it.
UPDATE: Wow, a lot of great answers. Thanks so much. I think I'll look at both Pyro and the web.py/Werkzeug approaches, since Twisted is a little more than I want to bite off at this point. The next conceptual challenge, I suppose, is how to go about talking to my worker threads without hanging them up.
Thanks again.

Yet another approach: use Pyro (Python remoting objects).
Pyro basically allows you to publish Python object instances as services that can be called remotely. I have used Pyro for the exact purpose you describe, and I found it to work very well.
By default, a Pyro server daemon accepts connections from everywhere. To limit this, either use a connection validator (see documentation), or supply host='127.0.0.1' to the Daemon constructor to only listen for local connections.
Example code taken from the Pyro documentation:
Server
import Pyro.core
class JokeGen(Pyro.core.ObjBase):
def __init__(self):
Pyro.core.ObjBase.__init__(self)
def joke(self, name):
return "Sorry "+name+", I don't know any jokes."
Pyro.core.initServer()
daemon=Pyro.core.Daemon()
uri=daemon.connect(JokeGen(),"jokegen")
print "The daemon runs on port:",daemon.port
print "The object's uri is:",uri
daemon.requestLoop()
Client
import Pyro.core
# you have to change the URI below to match your own host/port.
jokes = Pyro.core.getProxyForURI("PYROLOC://localhost:7766/jokegen")
print jokes.joke("Irmen")
Another similar project is RPyC. I have not tried RPyC.

What about having it run an http server?
It seems crazy but running a simple web server for administrating your
server requires just a few lines using web.py
You can also consider creating a unix pipe.

Use werkzeug and make your daemon include an HTTP-based WSGI server.
Your daemon has a collection of small WSGI apps to respond with status information.
Your client simply uses urllib2 to make POST or GET requests to localhost:somePort. Your client and server must agree on the port number (and the URL's).
This is very simple to implement and very scalable. Adding new commands is a trivial exercise.
Note that your daemon does not have to respond in HTML (that's often simple, though). Our daemons respond to the WSGI-requests with JSON-encoded status objects.

I would use twisted with a named pipe or just open up a socket. Take a look at the echo server and client examples. You would need to modify the echo server to check for some string passed by the client and then respond with whatever requested info.
Because of Python's threading issues you are going to have trouble responding to information requests while simultaneously continuing to do whatever the daemon is meant to do anyways. Asynchronous techniques or forking another processes are your only real option.

# your server
from twisted.web import xmlrpc, server
from twisted.internet import reactor
class MyServer(xmlrpc.XMLRPC):
def xmlrpc_monitor(self, params):
return server_related_info
if __name__ == '__main__':
r = MyServer()
reactor.listenTCP(8080, Server.Site(r))
reactor.run()
client can be written using xmlrpclib, check example code here.

Assuming you're under *nix, you can send signals to a running program with kill from a shell (and analogs in many other environments). To handle them from within python check out the signal module.

You could associate it with Pyro (http://pythonhosted.org/Pyro4/) the Python Remote Object. It lets you remotely access python objects. It's easily to implement, has low overhead, and isn't as invasive as Twisted.

You can do this using multiprocessing managers (https://docs.python.org/3/library/multiprocessing.html#managers):
Managers provide a way to create data which can be shared between different processes, including sharing over a network between processes running on different machines. A manager object controls a server process which manages shared objects. Other processes can access the shared objects by using proxies.
Example server:
from multiprocessing.managers import BaseManager
class RemoteOperations:
def add(self, a, b):
print('adding in server process!')
return a + b
def multiply(self, a, b):
print('multiplying in server process!')
return a * b
class RemoteManager(BaseManager):
pass
RemoteManager.register('RemoteOperations', RemoteOperations)
manager = RemoteManager(address=('', 12345), authkey=b'secret')
manager.get_server().serve_forever()
Example client:
from multiprocessing.managers import BaseManager
class RemoteManager(BaseManager):
pass
RemoteManager.register('RemoteOperations')
manager = RemoteManager(address=('localhost', 12345), authkey=b'secret')
manager.connect()
remoteops = manager.RemoteOperations()
print(remoteops.add(2, 3))
print(remoteops.multiply(2, 3))

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Using celery, send async_apply to specific vhost? - python

Related

Django Celery delay() always pushing to default 'celery' queue

Detect if flask is being run via gunicorn?

Is it possible to use django-celery models with RabbitMQ?

How can I ensure a Celery task runs with the right settings?

Communicating with a running python daemon

Categories

Resources