Route different tasks to different Queues - python

I am using Celery with RabbitMQ as broker.
The code that creates the Celery app instance is:
import requests
from celery import Celery

name = __file__.split('.')[0]
app = Celery(name)
app.config_from_object('celery_config')

@app.task
def fetch_url(url):
    resp = requests.get(url)
    print(resp.status_code)

@app.task
def post(url, **kwargs):
    body = kwargs.get('payload')
    auth = kwargs.get('auth')
    resp = requests.put(url, data=body, auth=auth)
Now I want to have 2 separate queues, one for GET and one for POST.
I know that I must define the 2 queues in the celery config module like:
from kombu import Queue, Exchange

CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('get', Exchange('get')),
    Queue('post', Exchange('post')),
)
What I don't get is exactly what string to specify for the 'routing_key' option. Should it be the name of the tasks (get & post in this case), or are there rules for defining the routing_key?

There is no need to define queues or deal with routing keys, bindings or exchanges for simple task routing like in your case. This is much simpler now (version 4.1) using automatic routing (http://docs.celeryproject.org/en/latest/userguide/routing.html#automatic-routing).
Name your tasks. For example, let's say you give names as below.
@app.task(name='get_task')
def fetch_url(url):
    ...

@app.task(name='post_task')
def post(url):
    ...
Add the lines below to your celery config file to route tasks to the proper queues:
task_routes = {
    'get_task': {'queue': 'get_queue'},
    'post_task': {'queue': 'post_queue'},
}
Since the task_create_missing_queues setting is enabled by default, Celery will take care of creating the queues for you.
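To make this concrete, here is a minimal sketch of what the config file and the worker invocations might look like; only task_routes is strictly required for the routing, and the broker URL and the -A module name are assumptions about your setup, not part of the original answer.
# celery_config.py -- minimal sketch
broker_url = 'amqp://guest:guest@localhost//'   # assumed RabbitMQ broker URL
task_create_missing_queues = True               # shown explicitly; defaults to True
task_routes = {
    'get_task': {'queue': 'get_queue'},
    'post_task': {'queue': 'post_queue'},
}

# Then start one worker per queue (possibly on different machines):
#   celery -A tasks worker -Q get_queue  -n get_worker@%h
#   celery -A tasks worker -Q post_queue -n post_worker@%h
# The -Q option makes each worker consume only from its own queue.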


How to send a progress of operation in a FastAPI app?

I have deployed a FastAPI endpoint:
from fastapi import FastAPI, UploadFile
from typing import List

app = FastAPI()

@app.post('/work/test')
async def testing(files: List[UploadFile]):
    for i in files:
        .......
        # do a lot of operations on each file

        # after that I am just writing the processed data into a mysql database
        # cur.execute(...)
        # cur.commit()
        .......
    # just returning "OK" to confirm data is written into mysql
    return {"response": "OK"}
I can request output from the API endpoint and it is working perfectly fine for me.
Now, the biggest challenge for me is to know how much time each iteration is taking. Because in the UI part (for those who are accessing my API endpoint) I want to show them a progress bar (TIME TAKEN) for each iteration/file being processed.
Is there any possible way for me to achieve this? If so, please help me out on how I can proceed further.
Thank you.
Approaches
Polling
The most common approach to track the progress of a task is polling:
After receiving a request to start a task on a backend:
Create a task object in the storage (e.g. in memory, Redis, etc.). The task object must contain the following data: task ID, status (pending, completed), result, and others.
Run the task in the background (coroutines, threading, multiprocessing, a task queue like Celery, arq, aio-pika, dramatiq, etc.).
Respond immediately with 202 (Accepted), returning the previously received task ID.
Update the task status:
This can be done from within the task itself, if it knows about the task store and has access to it: periodically, the task updates the information about itself.
Or use a task monitor (Observer, producer-consumer pattern), which will monitor the status of the task and its result, and update the information in the storage.
On the client side (front-end), start a polling cycle for the task status at the endpoint /task/{ID}/status, which takes information from the task storage (a minimal client-side sketch follows these steps).
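As a minimal sketch of that client-side polling loop (the /task/{id}/status path, the 'status' field and its values are assumptions matching the steps above; requests is just an example HTTP client):
import time
from typing import Dict

import requests  # assumed HTTP client; any client works

def poll_task(base_url: str, task_id: str, interval: float = 1.0) -> Dict:
    """Poll /task/{id}/status until the task reports a terminal state."""
    while True:
        resp = requests.get(f"{base_url}/task/{task_id}/status")
        resp.raise_for_status()
        job = resp.json()
        # 'status' and its values depend on your task store schema.
        if job.get("status") in ("completed", "error"):
            return job
        time.sleep(interval)  # back off between polls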
Streaming response
Streaming is a less convenient way of getting the status of request processing periodically: we gradually push responses without closing the connection. It has a number of significant disadvantages; for example, if the connection is broken, you can lose information. A streaming API is a different approach than a REST API.
Websockets
You can also use websockets for real-time notifications and bidirectional communication.
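A minimal sketch of pushing progress over a websocket in FastAPI is below; the /ws/progress path, the number of steps and the message format are assumptions for illustration, not part of the original answer.
import asyncio
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/progress")
async def progress_ws(websocket: WebSocket):
    await websocket.accept()
    total = 10  # assumed number of work steps
    for step in range(1, total + 1):
        await asyncio.sleep(1)          # pretend to do one unit of work
        await websocket.send_json({"done": step, "total": total})
    await websocket.close()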
Links:
Examples of the polling approach for the progress bar and a more detailed description for Django + Celery can be found at these links:
https://www.dangtrinh.com/2013/07/django-celery-display-progress-bar-of.html
https://buildwithdjango.com/blog/post/celery-progress-bars/
I have provided simplified examples of running background tasks in FastAPI using multiprocessing here:
https://stackoverflow.com/a/63171013/13782669
Old answer:
You could run a task in the background, return its id, and provide a /status endpoint that the front end would periodically call. In the status response, you could return what state your task is in now (for example, pending with the number of the currently processed file). I provided a few simple examples here.
Demo
Polling
Demo of the approach using asyncio tasks (single worker solution):
import asyncio
from http import HTTPStatus
from typing import Dict, Optional
from uuid import UUID, uuid4

import uvicorn
from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel, Field


class Job(BaseModel):
    uid: UUID = Field(default_factory=uuid4)
    status: str = "in_progress"
    progress: int = 0
    result: Optional[int] = None


app = FastAPI()
jobs: Dict[UUID, Job] = {}  # Dict as job storage


async def long_task(queue: asyncio.Queue, param: int):
    for i in range(1, param):  # do work and return our progress
        await asyncio.sleep(1)
        await queue.put(i)
    await queue.put(None)


async def start_new_task(uid: UUID, param: int) -> None:
    queue = asyncio.Queue()
    task = asyncio.create_task(long_task(queue, param))
    while progress := await queue.get():  # monitor task progress
        jobs[uid].progress = progress
    jobs[uid].status = "complete"


@app.post("/new_task/{param}", status_code=HTTPStatus.ACCEPTED)
async def task_handler(background_tasks: BackgroundTasks, param: int):
    new_task = Job()
    jobs[new_task.uid] = new_task
    background_tasks.add_task(start_new_task, new_task.uid, param)
    return new_task


@app.get("/task/{uid}/status")
async def status_handler(uid: UUID):
    return jobs[uid]
Adapted example for the loop from the question
The background processing function is defined with def (not async def), so FastAPI runs it in a thread pool.
import time
from http import HTTPStatus
from typing import Dict, List
from uuid import UUID, uuid4

from fastapi import BackgroundTasks, FastAPI, File, UploadFile
from pydantic import BaseModel, Field


class Job(BaseModel):
    uid: UUID = Field(default_factory=uuid4)
    status: str = "in_progress"
    processed_files: List[str] = Field(default_factory=list)


app = FastAPI()
jobs: Dict[UUID, Job] = {}


def process_files(task_id: UUID, files: List[UploadFile]):
    for i in files:
        time.sleep(5)  # pretend long task
        # ...
        # do a lot of operations on each file
        # then append the processed file to a list
        # ...
        jobs[task_id].processed_files.append(i.filename)
    jobs[task_id].status = "completed"


@app.post('/work/test', status_code=HTTPStatus.ACCEPTED)
async def work(background_tasks: BackgroundTasks, files: List[UploadFile] = File(...)):
    new_task = Job()
    jobs[new_task.uid] = new_task
    background_tasks.add_task(process_files, new_task.uid, files)
    return new_task


@app.get("/work/{uid}/status")
async def status_handler(uid: UUID):
    return jobs[uid]
Streaming
# Builds on the previous example (same imports, app, File and UploadFile).
from fastapi.responses import StreamingResponse


async def process_files_gen(files: List[UploadFile]):
    for i in files:
        time.sleep(5)  # pretend long task
        # ...
        # do a lot of operations on each file
        # then append the processed file to a list
        # ...
        yield f"{i.filename} processed\n"
    yield "OK\n"


@app.post('/work/stream/test', status_code=HTTPStatus.ACCEPTED)
async def work_stream(files: List[UploadFile] = File(...)):
    return StreamingResponse(process_files_gen(files))
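As a usage sketch (the requests-based client below is an assumption for illustration, not part of the original answer), the streamed progress lines can be consumed as they arrive:
from typing import List

import requests  # assumed HTTP client

def stream_progress(url: str, paths: List[str]) -> None:
    """POST files and print each progress line as the server yields it."""
    files = [("files", open(p, "rb")) for p in paths]
    try:
        with requests.post(url, files=files, stream=True) as resp:
            resp.raise_for_status()
            for line in resp.iter_lines():
                if line:
                    print(line.decode())  # e.g. "report.pdf processed"
    finally:
        for _, f in files:
            f.close()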
Below is a solution which uses unique identifiers and a globally available dictionary which holds information about the jobs.
NOTE: The code below is safe to use as long as you use dynamic key values (the sample uses a uuid) and keep the application within a single process.
1. To start the app, create a file main.py
2. Run uvicorn main:app --reload
3. Create a job entry by accessing http://127.0.0.1:8000/
4. Repeat step 3 to create multiple jobs
5. Go to the http://127.0.0.1:8000/status page to see the statuses of all jobs
6. Go to http://127.0.0.1:8000/status/{identifier} to see the progression of a job by its job id
Code of the app:
import asyncio
import uuid
from typing import List

from fastapi import FastAPI, UploadFile

context = {'jobs': {}}

app = FastAPI()


async def do_work(job_key, files=None):
    iter_over = files if files else range(100)
    for file_number, file in enumerate(iter_over):
        jobs = context['jobs']
        job_info = jobs[job_key]
        job_info['iteration'] = file_number
        job_info['status'] = 'inprogress'
        await asyncio.sleep(1)
    context['jobs'][job_key]['status'] = 'done'


@app.post('/work/test')
async def testing(files: List[UploadFile]):
    identifier = str(uuid.uuid4())
    context['jobs'][identifier] = {}
    asyncio.run_coroutine_threadsafe(do_work(identifier, files), loop=asyncio.get_running_loop())
    return {"identifier": identifier}


@app.get('/')
async def get_testing():
    identifier = str(uuid.uuid4())
    context['jobs'][identifier] = {}
    asyncio.run_coroutine_threadsafe(do_work(identifier), loop=asyncio.get_running_loop())
    return {"identifier": identifier}


@app.get('/status')
def status():
    return {
        'all': list(context['jobs'].values()),
    }


@app.get('/status/{identifier}')
async def status_by_id(identifier):
    return {
        "status": context['jobs'].get(identifier, 'job with that identifier is undefined'),
    }

How to send task to exchange rather than queue in celery python

What I understood from celery's documentation is that when publishing tasks, you send them to an exchange first and then the exchange delegates them to queues. Now I want to send a task to a specific custom-made exchange which will delegate all tasks it receives to 3 different queues, which will have different consumers in the background, performing different tasks.
import uuid

from celery import Celery


class Tasks(object):
    def __init__(self, config_object={}):
        self.celery = Celery()
        self.celery.config_from_object(config_object)
        self.task_publisher = task_publisher

    def publish(self, task_name, job_id=None, params={}):
        if not job_id:
            job_id = uuid.uuid4()
        self.celery.send_task(task_name, [job_id, params], queue='new_queue')


class config_object(object):
    CELERY_IGNORE_RESULT = True
    BROKER_PORT = 5672
    BROKER_URL = 'amqp://guest:guest@localhost'
    CELERY_RESULT_BACKEND = 'amqp'


tasks_service = Tasks(config_object)
tasks_service.publish('logger.log_event', params={'a': 'b'})
This is how I can send a task to a specific queue; if I don't define the queue it gets sent to a default one. But my question is: how do I define the exchange to send to?
Not sure if you have solved this problem; I came across the same thing last week.
I am on Celery 4.1 and the solution I came up with was to just define the exchange name and the routing_key.
So in your publish method, you would do something like:
def publish(self, task_name, job_id=None, params={}):
    if not job_id:
        job_id = uuid.uuid4()
    self.celery.send_task(
        task_name,
        [job_id, params],
        exchange='name.of.exchange',
        routing_key='*'
    )
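For the message to actually reach your three consumers, the exchange and the queue bindings still have to exist on the broker. A minimal sketch of declaring them with kombu entities in the Celery config is below; the queue names and the fanout type are assumptions based on "delegate all tasks it receives to 3 different queues", not something the original answer spells out.
from kombu import Exchange, Queue

# A fanout exchange copies every published message to all bound queues.
my_exchange = Exchange('name.of.exchange', type='fanout')

task_queues = (
    Queue('queue_one',   my_exchange, routing_key='*'),
    Queue('queue_two',   my_exchange, routing_key='*'),
    Queue('queue_three', my_exchange, routing_key='*'),
)

# Each consumer then starts a worker bound to its own queue, e.g.:
#   celery -A yourapp worker -Q queue_one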

How to send asynchronous request using flask to an endpoint with small timeout session?

I am new to backend development using Flask and I am getting stuck on a confusing problem. I am trying to send data to an endpoint whose timeout is 3000 ms. My code for the server is as follows.
from flask import Flask, request
from gitStat import getGitStat
import requests

app = Flask(__name__)

@app.route('/', methods=['POST', 'GET'])
def handle_data():
    params = request.args["text"].split(" ")
    user_repo_path = "https://api.github.com/users/{}/repos".format(params[0])
    first_response = requests.get(
        user_repo_path, auth=(
            'Your Github Username', 'Your Github Password'))
    repo_commits_path = "https://api.github.com/repos/{}/{}/commits".format(
        params[0], params[1])
    second_response = requests.get(
        repo_commits_path, auth=(
            'Your Github Username', 'Your Github Password'))
    if(first_response.status_code == 200 and params[2] < params[3] and second_response.status_code == 200):
        values = getGitStat(params[0], params[1], params[2], params[3])
        response_url = request.args["response_url"]
        payload = {
            "response_type": "in_channel",
            "text": "Github Repo Commits Status",
            "attachments": [
                {
                    "text": values
                }
            ]
        }
        headers = {'Content-Type': 'application/json',
                   'User-Agent': 'Mozilla /5.0 (Compatible MSIE 9.0;Windows NT 6.1;WOW64; Trident/5.0)'}
        response = requests.post(response_url, json=payload, headers=headers)
    else:
        return "Please enter correct details. Check if the username or reponame exists, and/or Starting date < End date. \
Also, date format should be MM-DD"
My server code takes arguments from the request it receives and, from that request's JSON object, it extracts parameters for the code. This code executes the getGitStat function and sends the JSON payload as defined in the server code to the endpoint it received the request from.
My problem is that I need to send a text confirmation to the endpoint saying that I have received the request and that the data will be coming soon. The problem here is that the getGitStat function takes more than a minute to fetch and parse data from the Github API.
I searched the internet and found that I need to make this call asynchronous, and that I can do that using queues. I tried to understand the approach using RQ and RabbitMQ, but I neither understood it nor was I able to convert my code to an asynchronous format. Can somebody give me pointers or any idea on how I can achieve this?
Thank you.
------------Update------------
Threading was able to solve this problem. Create another thread and call the function in that thread.
If you are trying to have an async task in a request, you have to decide whether you want the result/progress or not.
1. You don't care about the result of the task or whether there were any errors while processing it. You can just process this in a thread and forget about the result.
2. You just want to know about success/failure of the task. You can store the state of the task in a database and query it when needed.
3. You want progress of the task (like 20% done ... 40% done). You have to use something more sophisticated like Celery or RabbitMQ.
For you I think option 2 fits better. You can create a simple table GitTasks.
GitTasks
----------------------------------
Id(PK) | Status     | Result
----------------------------------
1      | Processing | -
2      | Done       | <your result>
3      | Error      | <error details>
You have to create a simple threaded object in Python to do the processing.
import threading

class AsyncGitTask(threading.Thread):
    def __init__(self, task_id, params):
        super().__init__()
        self.task_id = task_id
        self.params = params

    def run(self):
        ## Do the processing
        ## store the result in the table for id = self.task_id
        ...
You have to create another endpoint to query the status of your task.
@app.route('/TaskStatus/<int:task_id>')
def task_status(task_id):
    ## query the GitTasks table and return data accordingly
    ...
Now that we have built all the components, we have to put them together in your main request.
from flask import url_for

@app.route('/', methods=['POST', 'GET'])
def handle_data():
    .....
    ## create a new row in the GitTasks table, and use its PK (id) as task_id
    task_id = create_new_task_row()
    async_task = AsyncGitTask(task_id=task_id, params=params)
    async_task.start()
    task_status_url = url_for('task_status', task_id=task_id)
    ## In this request you can return text saying
    ## "Your task is being processed. To see the progress
    ## go to <task_status_url>"

Celery - Chaining remote callbacks

I have three Celery tasks that run on three different servers respectively.
tasks.send_push_notification
tasks.send_sms
tasks.send_email
I want to set up a workflow such that if sending the push notification fails, I should try sending an SMS, and if sending the SMS fails, I should send an email.
If those 3 tasks and their code base were on the same server, I would have followed the example on chained tasks and done something like
from celery import chain
from tasks import send_push_notification, send_sms, send_email
import json

# some payload
payload = json.dumps({})

res = chain(
    send_push_notification.subtask(payload),
    send_sms.subtask(payload),
    send_email.subtask(payload)
)()
But the tasks are kept on 3 different servers!
I have tried
# 1
from celery import chain
from my_celery_app import app

res = chain(
    app.send_task('tasks.send_push_notification', payload),
    app.send_task('tasks.send_sms', payload),
    app.send_task('tasks.send_email', payload)
)()
# Which fails because I am chaining tasks not subtasks
and
# 2
from celery import chain, subtask

res = chain(
    subtask('tasks.send_push_notification', payload),
    subtask('tasks.send_sms', payload),
    subtask('tasks.send_email', payload)
)()
# fails because I am not adding the tasks on the broker
How can this be done?
Update:
I can do it using link NOT chain.
from celery import subtask

res = app.send_task(
    'tasks.send_push_notification', (payload, ),
    link=subtask(
        'tasks.send_sms', (payload, ),
        link=subtask(
            'tasks.send_email', (payload, ),
        )
    )
)
There is a lot of nesting. And because I actually need to create a database-driven workflow, it will be complicated to create it this way.
Why not handle it in your tasks?
@app.task
def push_notification_task(payload):
    if not send_push_notification(payload):
        sms_notification_task.delay(payload)

@app.task
def sms_notification_task(payload):
    if not send_sms_notification(payload):
        email_notification_task.delay(payload)

@app.task
def email_notification_task(payload):
    send_email_notification(payload)
Moreover, chain will execute all of your tasks in the given order, whereas you want the next task to run only if the first one failed.
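Since the three tasks live on different servers, the fallback inside each task would have to be published by name rather than by calling .delay on a locally registered function. A hedged sketch of that variant is below; the task names follow the question, while the broker URL and the deliver_* helpers are assumptions for illustration only.
from celery import Celery

app = Celery('notifications', broker='amqp://guest:guest@localhost//')  # assumed broker URL

@app.task(name='tasks.send_push_notification')
def send_push_notification(payload):
    try:
        deliver_push(payload)  # hypothetical helper doing the actual delivery
    except Exception:
        # Fallback: publish by name to the task registered on the SMS server.
        app.send_task('tasks.send_sms', args=(payload,))

# tasks.send_sms (on the SMS server) would do the same, falling back to
# app.send_task('tasks.send_email', args=(payload,)) when SMS delivery fails.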

Celery and custom consumers

To my knowledge, Celery acts as both the producer and consumer of messages. This is not what I want to achieve. I want Celery to act as the consumer only, to fire certain tasks based on messages that I send to my AMQP broker of choice. Is this possible?
Or do I need to make soup by adding carrot to my stack?
Celery brokers act as message stores and publish messages to one or more workers that subscribe to them.
So: Celery publishes messages to a broker (RabbitMQ, Redis, Celery itself through the Django DB, etc.); those messages are retrieved by a worker following the protocol of the broker, which stores them (usually they are persistent, but it may depend on your broker), and they get executed by your workers.
Task results are available on the worker that executed the task; you can configure where to store those results, and you can retrieve them through the result backend (e.g. AsyncResult.get()).
You can publish tasks with Celery, passing parameters to your "receiver function" (the task you define; the documentation has some examples). Usually you do not want to pass big things here (say, a queryset), but only the minimal information that permits you to retrieve what you need when executing the task.
One easy example could be:
You register a task:
# assuming a task decorator from your Celery setup (e.g. @app.task)
@task
def add(x, y):
    return x + y
and you call it from another module with:
from mytasks import add

metadata1 = 1
metadata2 = 2

myasyncresult = add.delay(metadata1, metadata2)
myasyncresult.get() == 3
EDIT
After your edit I saw that you probably want to construct messages from sources other than Celery. You can check the message format in the Celery documentation; messages default to pickled objects that respect that format, so if you post such messages to the right queue of your RabbitMQ broker, your workers will be able to retrieve and execute them.
Celery uses the message broker architectural pattern. A number of implementations / broker transports can be used with Celery including RabbitMQ and a Django database.
From Wikipedia:
A message broker is an architectural pattern for message validation, message transformation and message routing. It mediates communication amongst applications, minimizing the mutual awareness that applications should have of each other in order to be able to exchange messages, effectively implementing decoupling.
Keeping results is optional and requires a result backend. You can use different broker and result backends. The Celery Getting Started guide contains further information.
The answer to your question is: yes, you can fire specific tasks, passing arguments, without adding Carrot to the mix.
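For example, a minimal sketch of configuring a broker together with a separate result backend (the URLs below are placeholder assumptions, not from the original answer):
from celery import Celery

# The broker carries the task messages; the result backend stores return values.
app = Celery(
    'myapp',
    broker='amqp://guest:guest@localhost//',   # assumed RabbitMQ broker
    backend='redis://localhost:6379/0',        # assumed Redis result backend
)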
Celery Custom Consumer is a feature that will be released in v3.1 and is now under development; you can read about it at http://docs.celeryproject.org/en/master/userguide/extending.html.
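The extending guide linked above builds custom consumers from bootsteps; a minimal sketch along those lines is below. The queue/exchange names and the message handler are assumptions for illustration.
from celery import Celery, bootsteps
from kombu import Consumer, Exchange, Queue

# Assumed queue fed by an external publisher (not by Celery itself).
my_queue = Queue('custom', Exchange('custom'), routing_key='custom')

app = Celery(broker='amqp://guest:guest@localhost//')  # assumed broker URL


class MyConsumerStep(bootsteps.ConsumerStep):
    """Attach an extra consumer to every Celery worker."""

    def get_consumers(self, channel):
        return [Consumer(channel,
                         queues=[my_queue],
                         callbacks=[self.handle_message],
                         accept=['json'])]

    def handle_message(self, body, message):
        # Fire whatever task or logic you want based on the external message.
        print('Received message: {0!r}'.format(body))
        message.ack()


app.steps['consumer'].add(MyConsumerStep)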
In order for Celery to consume your message, you need to create a message in the format Celery expects. You can create such a message as follows:
def get_celery_worker_message(task_name, args, kwargs, routing_key, id, exchange=None, exchange_type=None):
    message = (args, kwargs, None)
    application_headers = {
        'lang': 'py',
        'task': task_name,
        'id': id,
        'argsrepr': repr(args),
        'kwargsrepr': repr(kwargs)
        # , 'origin': '@'.join([str(os.getpid()), socket.gethostname()])
    }
    properties = {
        'correlation_id': id,
        'content_type': 'application/json',
        'content_encoding': 'utf-8',
    }
    body, content_type, content_encoding = prepare(
        message, 'json', 'application/json', 'utf-8', None, application_headers)
    prep_message = prepare_message(body, None, content_type, content_encoding, application_headers, properties)
    inplace_augment_message(prep_message, exchange, exchange_type, routing_key, id)
    # dump_json = json.dumps(prep_message)
    # print(f"json encoder:- {dump_json}")
    return prep_message
You need to prepare the message by first defining the serializer, content_type, content_encoding, compression and headers based on the consumer.
from kombu.serialization import dumps   # assuming kombu's serialization helper
from kombu.compression import compress  # assuming kombu's compression helper


def prepare(body, serializer=None, content_type=None,
            content_encoding=None, compression=None, headers=None):
    # No content_type? Then we're serializing the data internally.
    if not content_type:
        serializer = serializer
        (content_type, content_encoding,
         body) = dumps(body, serializer=serializer)
    else:
        # If the programmer doesn't want us to serialize,
        # make sure content_encoding is set.
        if isinstance(body, str):
            if not content_encoding:
                content_encoding = 'utf-8'
            body = body.encode(content_encoding)
        # If they passed in a string, we can't know anything
        # about it. So assume it's binary data.
        elif not content_encoding:
            content_encoding = 'binary'
    if compression:
        body, headers['compression'] = compress(body, compression)
    return body, content_type, content_encoding
def prepare_message(body, priority=None, content_type=None,
                    content_encoding=None, headers=None, properties=None):
    """Prepare message data."""
    properties = properties or {}
    properties.setdefault('delivery_info', {})
    properties.setdefault('priority', priority)
    return {'body': body,
            'content-encoding': content_encoding,
            'content-type': content_type,
            'headers': headers or {},
            'properties': properties or {}}
Once the message is created, you need to add arguments to make it readable by the Celery consumer.
import base64
import json

from kombu.utils.encoding import bytes_to_str, str_to_bytes  # assuming kombu's encoding helpers


def inplace_augment_message(message, exchange, exchange_type, routing_key, next_delivery_tag):
    body_encoding_64 = 'base64'
    message['body'], body_encoding = encode_body(
        str(json.dumps(message['body'])), body_encoding_64
    )
    props = message['properties']
    props.update(
        body_encoding=body_encoding,
        delivery_tag=next_delivery_tag,
    )
    if exchange and exchange_type:
        props['delivery_info'].update(
            exchange=exchange,
            exchange_type=exchange_type,
            routing_key=routing_key,
        )
    elif exchange:
        props['delivery_info'].update(
            exchange=exchange,
            routing_key=routing_key,
        )
    else:
        props['delivery_info'].update(
            exchange=None,
            routing_key=routing_key,
        )


class Base64:
    """Base64 codec."""

    def encode(self, s):
        return bytes_to_str(base64.b64encode(str_to_bytes(s)))

    def decode(self, s):
        return base64.b64decode(str_to_bytes(s))


def encode_body(body, encoding=None):
    codecs = {'base64': Base64()}
    if encoding:
        return codecs.get(encoding).encode(body), encoding
    return body, encoding
