To my knowledge, Celery acts as both the producer and consumer of messages. This is not what I want to achieve. I want Celery to act as the consumer only, to fire certain tasks based on messages that I send to my AMQP broker of choice. Is this possible?
Or do I need to make soup by adding carrot to my stack?
A Celery broker acts as a message store and delivers messages to one or more workers that subscribe to them.
So: Celery publishes messages to a broker (RabbitMQ, Redis, Celery itself through the Django DB, etc.); the broker stores them (usually they are persistent, but that may depend on your broker); and the workers retrieve and execute them, following the protocol of the broker.
Task results are produced by the worker that executed the task; you can configure where those results are stored, and you can retrieve them with the get() method shown below.
You publish tasks with Celery by passing parameters to your "receiver function" (the task you define; the documentation has some examples). Usually you do not want to pass big things here (say, a queryset), only the minimal information that lets you retrieve what you need when executing the task.
One easy example could be:
You register a task
@task
def add(x, y):
    return x + y
and you call it from another module with:
from mytasks import add

metadata1 = 1
metadata2 = 2

myasyncresult = add.delay(metadata1, metadata2)
myasyncresult.get()  # == 3
EDIT
After your edit I see that you probably want to construct messages from sources other than Celery. Have a look at the message format in the documentation: by default messages are pickled objects that respect that format, so if you post such messages to the right queue of your RabbitMQ broker, your workers can pick them up and execute them.
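As a rough sketch of what that could look like with kombu (Celery's messaging library), assuming the default 'celery' queue, the json serializer, and the old (pre-3.1) task message layout; the broker URL and task name are placeholders:

import uuid

from kombu import Connection

# Hypothetical broker URL and task name; adjust to your setup.
with Connection('amqp://guest:guest@localhost//') as conn:
    producer = conn.Producer(serializer='json')
    producer.publish(
        # Old-style task body: the worker reads the task name, id,
        # args and kwargs straight from the payload.
        {'task': 'mytasks.add',
         'id': str(uuid.uuid4()),
         'args': [1, 2],
         'kwargs': {}},
        exchange='',            # default exchange
        routing_key='celery',   # default Celery queue
    )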
Celery uses the message broker architectural pattern. A number of implementations / broker transports can be used with Celery including RabbitMQ and a Django database.
From Wikipedia:
A message broker is an architectural pattern for message validation, message transformation and message routing. It mediates communication amongst applications, minimizing the mutual awareness that applications should have of each other in order to be able to exchange messages, effectively implementing decoupling.
Keeping results is optional and requires a result backend. You can use different broker and result backends. The Celery Getting Started guide contains further information.
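As an illustration, picking a broker and a result backend is just configuration on the app; the URLs below are placeholders for your own infrastructure:

from celery import Celery

# A minimal sketch: RabbitMQ as the broker, Redis as the result backend.
app = Celery('myapp',
             broker='amqp://guest:guest@localhost:5672//',
             backend='redis://localhost:6379/0')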
The answer to your question is: yes, you can fire specific tasks, passing arguments, without adding Carrot to the mix.
Custom consumers will be a feature of Celery 3.1 and are currently under development; you can read about it at http://docs.celeryproject.org/en/master/userguide/extending.html.
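The extending guide's example boils down to registering a ConsumerStep on the app; roughly like the sketch below (queue name and broker URL are illustrative):

from celery import Celery, bootsteps
from kombu import Consumer, Queue

# Illustrative queue; an external producer would publish plain messages here.
my_queue = Queue('custom', routing_key='custom')


class MyConsumerStep(bootsteps.ConsumerStep):

    def get_consumers(self, channel):
        return [Consumer(channel,
                         queues=[my_queue],
                         callbacks=[self.handle_message],
                         accept=['json'])]

    def handle_message(self, body, message):
        # React to the raw message, e.g. dispatch one of your own tasks.
        print('Received message: {0!r}'.format(body))
        message.ack()


app = Celery(broker='amqp://guest:guest@localhost//')
app.steps['consumer'].add(MyConsumerStep)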
In order to have Celery consume a message produced outside of Celery, you need to create a message that Celery can consume. You can create such a message as follows:
def get_celery_worker_message(task_name, args, kwargs, routing_key, id,
                              exchange=None, exchange_type=None):
    message = (args, kwargs, None)
    application_headers = {
        'lang': 'py',
        'task': task_name,
        'id': id,
        'argsrepr': repr(args),
        'kwargsrepr': repr(kwargs),
        # 'origin': '@'.join([str(os.getpid()), socket.gethostname()])
    }
    properties = {
        'correlation_id': id,
        'content_type': 'application/json',
        'content_encoding': 'utf-8',
    }
    body, content_type, content_encoding = prepare(
        message, 'json', 'application/json', 'utf-8', None, application_headers)
    prep_message = prepare_message(body, None, content_type, content_encoding,
                                   application_headers, properties)
    inplace_augment_message(prep_message, exchange, exchange_type, routing_key, id)
    # dump_json = json.dumps(prep_message)
    # print(f"json encoder: {dump_json}")
    return prep_message
You need to prep the message by first defining the serializer, content_type, content_encoding, compression, and headers, based on the consumer.
from kombu.compression import compress
from kombu.serialization import dumps


def prepare(body, serializer=None, content_type=None,
            content_encoding=None, compression=None, headers=None):
    # No content_type? Then we're serializing the data internally.
    if not content_type:
        (content_type, content_encoding,
         body) = dumps(body, serializer=serializer)
    else:
        # If the programmer doesn't want us to serialize,
        # make sure content_encoding is set.
        if isinstance(body, str):
            if not content_encoding:
                content_encoding = 'utf-8'
            body = body.encode(content_encoding)
        # If they passed in a string, we can't know anything
        # about it. So assume it's binary data.
        elif not content_encoding:
            content_encoding = 'binary'
    if compression:
        body, headers['compression'] = compress(body, compression)
    return body, content_type, content_encoding
def prepare_message(body, priority=None, content_type=None,
                    content_encoding=None, headers=None, properties=None):
    """Prepare message data."""
    properties = properties or {}
    properties.setdefault('delivery_info', {})
    properties.setdefault('priority', priority)
    return {'body': body,
            'content-encoding': content_encoding,
            'content-type': content_type,
            'headers': headers or {},
            'properties': properties}
Once the message is created, you need to augment it with the fields that make it readable by the Celery consumer.
import json


def inplace_augment_message(message, exchange, exchange_type, routing_key,
                            next_delivery_tag):
    body_encoding_64 = 'base64'
    message['body'], body_encoding = encode_body(
        str(json.dumps(message['body'])), body_encoding_64
    )
    props = message['properties']
    props.update(
        body_encoding=body_encoding,
        delivery_tag=next_delivery_tag,
    )
    if exchange and exchange_type:
        props['delivery_info'].update(
            exchange=exchange,
            exchange_type=exchange_type,
            routing_key=routing_key,
        )
    elif exchange:
        props['delivery_info'].update(
            exchange=exchange,
            routing_key=routing_key,
        )
    else:
        props['delivery_info'].update(
            exchange=None,
            routing_key=routing_key,
        )
import base64

from kombu.utils.encoding import bytes_to_str, str_to_bytes


class Base64:
    """Base64 codec."""

    def encode(self, s):
        return bytes_to_str(base64.b64encode(str_to_bytes(s)))

    def decode(self, s):
        return base64.b64decode(str_to_bytes(s))


def encode_body(body, encoding=None):
    codecs = {'base64': Base64()}
    if encoding:
        return codecs.get(encoding).encode(body), encoding
    return body, encoding
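A hypothetical call tying the helpers above together (task name, routing key and id are illustrative); publishing the resulting dict to your broker is left to whatever AMQP/Redis client you already use:

import uuid

# Build a message for a task named 'mytasks.add' with args (1, 2); the
# resulting dict mirrors what a Celery worker expects to pull off the queue.
msg = get_celery_worker_message(
    task_name='mytasks.add',
    args=(1, 2),
    kwargs={},
    routing_key='celery',
    id=str(uuid.uuid4()),
)
print(msg['headers']['task'], msg['properties']['delivery_info'])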
Related
I'm losing my mind trying to find a reliable and testable way to get the number of tasks contained in a given Celery queue.
I've already read these two related discussions:
Django Celery get task count
Retrieve list of tasks in a queue in Celery
(Note: I'm not using Django nor any other Python web framework.)
But I have not been able to solve my issue using the methods described in those threads.
I'm using Redis as the backend, but I would like to have a backend-independent and flexible solution, especially for tests.
This is my current situation: I've defined an EnhancedCelery class which inherits from Celery and adds a couple of methods; specifically, get_queue_size() is the one I'm trying to properly implement/test.
The following is the code in my test case:
celery_test_app = EnhancedCelery(__name__)
# this is needed to avoid exception for ping command
# which is automatically triggered by the worker once started
celery_test_app.loader.import_module('celery.contrib.testing.tasks')
# in memory backend
celery_test_app.conf.broker_url = 'memory://'
celery_test_app.conf.result_backend = 'cache+memory://'
# We have to setup queues manually,
# since it seems that auto queue creation doesn't work in tests :(
celery_test_app.conf.task_create_missing_queues = False
celery_test_app.conf.task_default_queue = 'default'
celery_test_app.conf.task_queues = (
    Queue('default', routing_key='task.#'),
    Queue('queue_1', routing_key='q1'),
    Queue('queue_2', routing_key='q2'),
    Queue('queue_3', routing_key='q3'),
)
celery_test_app.conf.task_default_exchange = 'tasks'
celery_test_app.conf.task_default_exchange_type = 'topic'
celery_test_app.conf.task_default_routing_key = 'task.default'
celery_test_app.conf.task_routes = {
    'sample_task': {
        'queue': 'default',
        'routing_key': 'task.default',
    },
    'sample_task_in_queue_1': {
        'queue': 'queue_1',
        'routing_key': 'q1',
    },
    'sample_task_in_queue_2': {
        'queue': 'queue_2',
        'routing_key': 'q2',
    },
    'sample_task_in_queue_3': {
        'queue': 'queue_3',
        'routing_key': 'q3',
    },
}
@celery_test_app.task()
def sample_task():
    return 'sample_task_result'


@celery_test_app.task(queue='queue_1')
def sample_task_in_queue_1():
    return 'sample_task_in_queue_1_result'


@celery_test_app.task(queue='queue_2')
def sample_task_in_queue_2():
    return 'sample_task_in_queue_2_result'


@celery_test_app.task(queue='queue_3')
def sample_task_in_queue_3():
    return 'sample_task_in_queue_3_result'
class EnhancedCeleryTest(TestCase):
    def test_get_queue_size_returns_expected_value(self):
        def add_task(task):
            task.apply_async()

        with start_worker(celery_test_app):
            for _ in range(7):
                add_task(sample_task_in_queue_1)
            for _ in range(4):
                add_task(sample_task_in_queue_2)
            for _ in range(2):
                add_task(sample_task_in_queue_3)

            self.assertEqual(celery_test_app.get_queue_size('queue_1'), 7)
            self.assertEqual(celery_test_app.get_queue_size('queue_2'), 4)
            self.assertEqual(celery_test_app.get_queue_size('queue_3'), 2)
Here are my attempts to implement get_queue_size():
This always returns zero (jobs == 0):
def get_queue_size(self, queue_name: str) -> Optional[int]:
    with self.connection_or_acquire() as connection:
        channel = connection.default_channel
        try:
            name, jobs, consumers = channel.queue_declare(queue=queue_name, passive=True)
            return jobs
        except (ChannelError, NotFound):
            pass
This also always returns zero:
def get_queue_size(self, queue_name: str) -> Optional[int]:
    inspection = self.control.inspect()
    return inspection.active()  # zero!
    # or:
    return inspection.scheduled()  # zero!
    # or:
    return inspection.reserved()  # zero!
This works by returning the expected number for each queue, but only in the test environment, because the channel.queues property does not exist when using the redis backend:
def get_queue_size(self, queue_name: str) -> Optional[int]:
    with self.connection_or_acquire() as connection:
        channel = connection.default_channel
        if hasattr(channel, 'queues'):
            queue = channel.queues.get(queue_name)
            if queue is not None:
                return queue.unfinished_tasks
None of the solutions you mentioned are entirely correct, in my humble opinion. As you already mentioned, this is backend-specific, so you would have to wrap handlers for all backends supported by Celery to provide backend-agnostic queue inspection. In the Redis case you have to connect directly to Redis and LLEN the queue you want to inspect. In the case of RabbitMQ you find this information in a completely different way. Same story with SQS...
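For the Redis transport, that boils down to something like the following minimal sketch, assuming Celery's default Redis layout where each queue is a Redis list named after the queue; the connection parameters and queue name are illustrative:

import redis

# Each Celery queue on the Redis transport is a plain Redis list, so LLEN
# returns the number of messages still waiting to be picked up by a worker.
r = redis.Redis(host='localhost', port=6379, db=0)
print(r.llen('queue_1'))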
This has all been discussed in the Retrieve list of tasks in a queue in Celery thread...
Finally, there is a reason why Celery does not provide this functionality out of box - the information is, I believe, useless. By the time you get what is in the queue it may already be empty!
If you want to monitor what is going on with your queues, I suggest another approach: write your own real-time monitor. The documentation example just captures task-failed events, but you should be able to modify it easily to capture all the events you care about, and to gather data about those tasks (queue, time, host it was executed on, etc.). Flower is an example of how this is done in a more serious project.
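A rough sketch of such a monitor, following the pattern from the Celery monitoring guide; the broker URL is a placeholder, the set of captured events is up to you, and workers must be started with task events enabled (the -E flag):

from celery import Celery

app = Celery(broker='amqp://guest:guest@localhost//')
state = app.events.State()


def on_event(event):
    # Keep an in-memory view of tasks/workers, then record whatever you need
    # (queue, timestamps, hostname, ...) in your own store.
    state.event(event)
    task = state.tasks.get(event.get('uuid'))
    print(event['type'], task.name if task else None)


with app.connection() as connection:
    recv = app.events.Receiver(connection, handlers={
        'task-sent': on_event,
        'task-received': on_event,
        'task-succeeded': on_event,
        'task-failed': on_event,
    })
    recv.capture(limit=None, timeout=None, wakeup=True)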
You can see how it's implemented in Flower (a real-time monitor for Celery); they have different Broker class implementations for Redis and RabbitMQ.
Another way is to use Celery's task events: count how many tasks were sent and how many succeeded or failed.
From what I understood of Celery's documentation, when publishing tasks you send them to an exchange first, and the exchange then delegates them to queues. Now I want to send a task to a specific, custom-made exchange which will delegate all tasks it receives to 3 different queues, which will have different consumers in the background performing different tasks.
class Tasks(object):

    def __init__(self, config_object={}):
        self.celery = Celery()
        self.celery.config_from_object(config_object)
        self.task_publisher = task_publisher

    def publish(self, task_name, job_id=None, params={}):
        if not job_id:
            job_id = uuid.uuid4()
        self.celery.send_task(task_name, [job_id, params], queue='new_queue')


class config_object(object):
    CELERY_IGNORE_RESULT = True
    BROKER_PORT = 5672
    BROKER_URL = 'amqp://guest:guest@localhost'
    CELERY_RESULT_BACKEND = 'amqp'


tasks_service = Tasks(config_object)
tasks_service.publish('logger.log_event', params={'a': 'b'})
This is how I can send a task to a specific queue; if I don't define the queue, it gets sent to a default one. But my question is: how do I define the exchange to send to?
Not sure if you have solved this problem; I came across the same thing last week.
I am on Celery 4.1, and the solution I came up with was to just define the exchange name and the routing_key.
So in your publish method, you would do something like:
def publish(self, task_name, job_id=None, params={}):
    if not job_id:
        job_id = uuid.uuid4()
    self.celery.send_task(
        task_name,
        [job_id, params],
        exchange='name.of.exchange',
        routing_key='*'
    )
I am using Celery with RabbitMQ as broker.
The code that creates Celery app instance is
from celery import Celery
import requests

name = __file__.split('.')[0]

app = Celery(name)
app.config_from_object('celery_config')


@app.task
def fetch_url(url):
    resp = requests.get(url)
    print(resp.status_code)


@app.task
def post(url, **kwargs):
    body = kwargs.get('payload')
    auth = kwargs.get('auth')
    resp = requests.put(url, data=body, auth=auth)
Now I want to have 2 separate Queues, one for GET and one for POST.
Now I know that I must define the 2 queues in celery config module like
CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('get', Exchange('get')),
    Queue('post', Exchange('post')),
)
What I don't get is exactly what string to specify for the 'routing_key' option. Should it be the name of the tasks (get & post in this case), or are there rules for defining the routing_key?
No need to define queues or deal with routing keys, bindings, or exchanges for simple task routing like in your case. This is much simpler now (version 4.1) using automatic routing (http://docs.celeryproject.org/en/latest/userguide/routing.html#automatic-routing).
Name your tasks. For example, let's say you give them names as below.
@app.task(name='get_task')
def fetch_url(url):
    ...

@app.task(name='post_task')
def post(url):
    ...
Add the lines below to your celery config file to route tasks to the proper queues.
task_routes = {
    'get_task': {'queue': 'get_queue'},
    'post_task': {'queue': 'post_queue'}
}
Since the task_create_missing_queues setting is enabled by default, Celery will take care of creating the queues for you.
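For completeness, a quick sketch (task names taken from the snippet above, URLs and payloads illustrative) of what calling the routed tasks looks like; the worker commands in the comment are the usual way to have dedicated workers consume each queue:

# With the task_routes above in place, plain .delay()/.apply_async() calls are
# enough; Celery looks up the route by task name and publishes to the matching
# queue (creating it if missing, since task_create_missing_queues is on).
fetch_url.delay('https://example.com')                # ends up in 'get_queue'
post.delay('https://example.com', payload={'k': 1})   # ends up in 'post_queue'

# You would then start workers bound to those queues, e.g.:
#   celery -A <your_app> worker -Q get_queue
#   celery -A <your_app> worker -Q post_queue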
I am new to backend development using Flask and I am stuck on a confusing problem. I am trying to send data to an endpoint whose timeout is 3000 ms. My code for the server is as follows.
from flask import Flask, request
from gitStat import getGitStat
import requests

app = Flask(__name__)


@app.route('/', methods=['POST', 'GET'])
def handle_data():
    params = request.args["text"].split(" ")
    user_repo_path = "https://api.github.com/users/{}/repos".format(params[0])
    first_response = requests.get(
        user_repo_path, auth=(
            'Your Github Username', 'Your Github Password'))
    repo_commits_path = "https://api.github.com/repos/{}/{}/commits".format(
        params[0], params[1])
    second_response = requests.get(
        repo_commits_path, auth=(
            'Your Github Username', 'Your Github Password'))
    if (first_response.status_code == 200 and params[2] < params[3]
            and second_response.status_code == 200):
        values = getGitStat(params[0], params[1], params[2], params[3])
        response_url = request.args["response_url"]
        payload = {
            "response_type": "in_channel",
            "text": "Github Repo Commits Status",
            "attachments": [
                {
                    "text": values
                }
            ]
        }
        headers = {'Content-Type': 'application/json',
                   'User-Agent': 'Mozilla/5.0 (Compatible MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)'}
        response = requests.post(response_url, json=payload, headers=headers)
    else:
        return "Please enter correct details. Check if the username or reponame exists, and/or Starting date < End date. \
            Also, date format should be MM-DD"
My server code takes arguments from the request it receives, and from that request's JSON object it extracts parameters for the code. It then executes the getGitStat function and sends the JSON payload defined in the server code back to the endpoint it received the request from.
My problem is that I need to send a text confirmation to the endpoint saying that I have received the request and that data will be coming soon. The problem here is that the getGitStat function takes more than a minute to fetch and parse data from the GitHub API.
I searched the internet and found that I need to make this call asynchronous, and that I can do that using queues. I tried to understand the approach using RQ and RabbitMQ, but I neither understood it fully nor was I able to convert my code to an asynchronous format. Can somebody give me pointers or any idea on how I can achieve this?
Thank you.
------------Update------------
Threading was able to solve this problem. Create another thread and call the function in that thread.
If you are trying to run an async task inside a request, you have to decide whether you want the result/progress or not:
1. You don't care about the result of the task or whether there were any errors while processing it. You can just process it in a thread and forget about the result.
2. You just want to know whether the task succeeded or failed. You can store the state of the task in a database and query it when needed.
3. You want progress of the task (20% done ... 40% done). You have to use something more sophisticated, like Celery with RabbitMQ.
For you, I think option #2 fits better. You can create a simple table, GitTasks.
GitTasks
------------------------
Id(PK) | Status | Result
------------------------
1 |Processing| -
2 | Done | <your result>
3 | Error | <error details>
You have to create a simple Threaded object in python to processing.
import threading
class AsyncGitTask(threading.Thread):
def __init__(self, task_id, params):
self.task_id = task_id
self.params = params
def run():
## Do processing
## store the result in table for id = self.task_id
You have to create another endpoint to query the status of your task.
@app.route('/TaskStatus/<int:task_id>')
def task_status(task_id):
    ## query GitTasks table and return data accordingly
    pass
Now that we have built all the components, we have to put them together in your main request.
from flask import url_for


@app.route('/', methods=['POST', 'GET'])
def handle_data():
    .....
    ## create a new row in the GitTasks table, and use its PK(id) as task_id
    task_id = create_new_task_row()
    async_task = AsyncGitTask(task_id=task_id, params=params)
    async_task.start()
    task_status_url = url_for('task_status', task_id=task_id)
    ## In this request you can return text saying
    ## "Your task is being processed. To see the progress
    ## go to <task_status_url>"
I am implementing a SOAP web service using tornado (and the third party tornadows module). One of the operations in my service needs to call another so I have the chain:
External request in (via SOAPUI) to operation A
Internal request (via requests module) in to operation B
Internal response from operation B
External response from operation A
Because it is all running in one service it is being blocked somewhere though. I'm not familiar with tornado's async functionality.
There is only one request handling method (post) because everything comes in on the single URL, and then the specific operation (the method doing the processing) is called based on the SOAPAction request header value. I have decorated my post method with @tornado.web.asynchronous and called self.finish() at the end, but no dice.
Can tornado handle this scenario and if so how can I implement it?
EDIT (added code):
class SoapHandler(tornado.web.RequestHandler):

    @tornado.web.asynchronous
    def post(self):
        """ Method post() to process of requests and responses SOAP messages """
        try:
            self._request = self._parseSoap(self.request.body)
            soapaction = self.request.headers['SOAPAction'].replace('"', '')
            self.set_header('Content-Type', 'text/xml')
            for operations in dir(self):
                operation = getattr(self, operations)
                method = ''
                if callable(operation) and hasattr(operation, '_is_operation'):
                    num_methods = self._countOperations()
                    if hasattr(operation, '_operation') and soapaction.endswith(getattr(operation, '_operation')) and num_methods > 1:
                        method = getattr(operation, '_operation')
                        self._response = self._executeOperation(operation, method=method)
                        break
                    elif num_methods == 1:
                        self._response = self._executeOperation(operation, method='')
                        break
            soapmsg = self._response.getSoap().toprettyxml()
            self.write(soapmsg)
            self.finish()
        except Exception as detail:
            # traceback.print_exc(file=sys.stdout)
            wsdl_nameservice = self.request.uri.replace('/', '').replace('?wsdl', '').replace('?WSDL', '')
            fault = soapfault('Error in web service : {fault}'.format(fault=detail), wsdl_nameservice)
            self.write(fault.getSoap().toxml())
            self.finish()
This is the post method from the request handler. It's from the web services module I'm using (so not my code) but I added the async decorator and self.finish(). All it basically does is call the correct operation (as dictated in the SOAPAction of the request).
class CountryService(soaphandler.SoapHandler):

    @webservice(_params=GetCurrencyRequest, _returns=GetCurrencyResponse)
    def get_currency(self, input):
        result = db_query(input.country, 'currency')
        get_currency_response = GetCurrencyResponse()
        get_currency_response.currency = result
        headers = None
        return headers, get_currency_response

    @webservice(_params=GetTempRequest, _returns=GetTempResponse)
    def get_temp(self, input):
        get_temp_response = GetTempResponse()
        curr = self.make_curr_request(input.country)
        get_temp_response.temp = curr
        headers = None
        return headers, get_temp_response

    def make_curr_request(self, country):
        soap_request = """<soapenv:Envelope xmlns:soapenv='http://schemas.xmlsoap.org/soap/envelope/' xmlns:coun='CountryService'>
           <soapenv:Header/>
           <soapenv:Body>
              <coun:GetCurrencyRequestget_currency>
                 <country>{0}</country>
              </coun:GetCurrencyRequestget_currency>
           </soapenv:Body>
        </soapenv:Envelope>""".format(country)
        headers = {'Content-Type': 'text/xml;charset=UTF-8', 'SOAPAction': '"http://localhost:8080/CountryService/get_currency"'}
        r = requests.post('http://localhost:8080/CountryService', data=soap_request, headers=headers)
        try:
            tree = etree.fromstring(r.content)
            currency = tree.xpath('//currency')
            message = currency[0].text
        except:
            message = "Failure"
        return message
These are two of the operations of the web service (get_currency & get_temp). So SOAPUI hits get_temp, which makes a SOAP request to get_currency (via make_curr_request and the requests module). Then the results should just chain back and be sent back to SOAPUI.
The actual operation of the service makes no sense (returning the currency when asked for the temperature), but I'm just trying to get the functionality working and these are the operations I have.
I don't think that your SOAP module or requests is asynchronous.
I believe adding the @asynchronous decorator is only half the battle. Right now you aren't making any async requests inside your function (every request is blocking, which ties up the server until your method finishes).
You can switch this up by using Tornado's AsyncHTTPClient. This can be used pretty much as an exact replacement for requests. From the documentation example:
class MainHandler(tornado.web.RequestHandler):

    @tornado.web.asynchronous
    def get(self):
        http = tornado.httpclient.AsyncHTTPClient()
        http.fetch("http://friendfeed-api.com/v2/feed/bret",
                   callback=self.on_response)

    def on_response(self, response):
        if response.error:
            raise tornado.web.HTTPError(500)
        json = tornado.escape.json_decode(response.body)
        self.write("Fetched " + str(len(json["entries"])) + " entries "
                   "from the FriendFeed API")
        self.finish()
Their method is decorated with async AND they are making async HTTP requests. This is where the flow gets a little strange. When you use the AsyncHTTPClient it doesn't lock up the event loop (please, I just started using Tornado this week, take it easy if all of my terminology isn't correct yet). This allows the server to freely process incoming requests. When your async HTTP request is finished, the callback method will be executed, in this case on_response.
Here you can replace requests with the Tornado async HTTP client relatively easily. For your SOAP service, though, things might be more complicated. You could make a local web service around your SOAP client and make async requests to it using the Tornado async HTTP client?
This will create some complex callback logic, which can be cleaned up using the gen decorator.
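For example, the documentation snippet above could be rewritten as a coroutine so the callback disappears; this is just a sketch using the same URL as the example:

import tornado.escape
import tornado.httpclient
import tornado.web
from tornado import gen


class MainHandler(tornado.web.RequestHandler):

    @gen.coroutine
    def get(self):
        http = tornado.httpclient.AsyncHTTPClient()
        # yield suspends the handler without blocking the IOLoop; the coroutine
        # resumes here once the response arrives. fetch() raises an HTTPError
        # itself if the request fails.
        response = yield http.fetch("http://friendfeed-api.com/v2/feed/bret")
        json = tornado.escape.json_decode(response.body)
        self.write("Fetched " + str(len(json["entries"])) + " entries "
                   "from the FriendFeed API")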
This issue was fixed since yesterday.
Pull request:
https://github.com/rancavil/tornado-webservices/pull/23
Example: here is a simple web service that doesn't take arguments and returns the version.
Notice you should:
Method declaration: decorate the method with #gen.coroutine
Returning results: use raise gen.Return(data)
Code:
from tornado import gen
from tornadows.soaphandler import SoapHandler
...

class Example(SoapHandler):

    @gen.coroutine
    @webservice(_params=None, _returns=Version)
    def Version(self):
        _version = Version()
        # async stuff here; let's suppose you ask another REST service or resource for the version details.
        # ...
        # returns the result.
        raise gen.Return(_version)
Cheers!