How to retrieve meta from celery backend storage - python

I am using Celery with a Flask app to launch some background tasks, and I am using MongoDB as the result backend.
I would like to store some information about the task being launched in the backend and then be able to retrieve it.
I believe the key is the use of self.update_state(state=..., meta={}), where meta is my custom information. However, I have not been able to get anything working.

Assume you have a task like this:
@celery.task(bind=True)
def counter(self):
    for i in range(100):
        time.sleep(1)
        self.update_state(state='PROGRESS', meta={'current': i})
    return {'status': 'complete'}
and you have a Flask route like this:
@app.route('/count/')
def count_100():
    """
    This starts a counter task and returns a response immediately.
    """
    task = counter.delay()
    # this returns an empty JSON object with a 202 status code,
    # which means the request is still in progress, plus a Location header
    return jsonify({}), 202, dict(Location=url_for('status', task_id=task.id))
and finally your task status route is something like this:
from celery.result import AsyncResult
...
@app.route('/status/<task_id>/')
def status(task_id):
    task = AsyncResult(task_id)  # retrieve the task we started
    if task.state == 'PROGRESS':
        response = {
            'state': task.state,
            'current': task.info.get('current', 0)
        }
    else:
        response = {'state': task.state}
    return jsonify(response)
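The route above only reports the PROGRESS state; once the task finishes, task.info holds the dict returned by the task. A fuller version of that route might look like the sketch below. It binds the result to the task via counter.AsyncResult so the configured backend is used; that binding is an assumption, not part of the original answer.

@app.route('/status/<task_id>/')
def status(task_id):
    task = counter.AsyncResult(task_id)  # assumption: bind the result to the task so its backend is used
    if task.state == 'PENDING':
        response = {'state': task.state, 'current': 0}
    elif task.state == 'PROGRESS':
        response = {'state': task.state, 'current': task.info.get('current', 0)}
    elif task.state == 'SUCCESS':
        response = {'state': task.state, 'result': task.info}  # e.g. {'status': 'complete'}
    else:  # FAILURE
        response = {'state': task.state, 'error': str(task.info)}
    return jsonify(response)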

Related

Celery response to be displayed in flask app

Views.py
class UserCreate(Resource):
    def post(self):
        # try:
        celery = create_user_employee.delay(auth_header, form_data, employee_id, employee_response)
        # from celery import current_task
        if "exc" in celery.get:
            # print(celery)
            return deliver_response(False, 'Celery task failed'), 400
        return deliver_response(True, message="User Creation in Progress", status=200, data=data), 200
Task.py
@celery.task(name='accounts.tasks.create_user_employee')
def create_user_employee(auth_header, form_data, employee_id, employee_response):
    try:
        # add_employee(form_data, employee_id, eid)
        if "streetLane" in form_data:
            user_id = form_data.get("personalEmail", "")
            employee_address = address_post__(form_data, auth_header, user_id)
        return deliver_response(True, message="success", status=200), 200
    except Exception as e:
        return deliver_response(False, str(e), status=500), 500
Note: I am not able to return the response from tasks.py to the Flask app. The objective is to break out of the views.py function if there is an error response from tasks.py, but I am not able to import or print the result in views.py. Any help would be great.
Since a Celery task runs asynchronously, you will not get an immediate response from the job.
So this line
celery = create_user_employee.delay(auth_header, form_data, employee_id, employee_response)
does not give you the result of the task, and the celery variable does not tell you whether the task has completed. Also, don't use celery as a variable name.
To fetch the result of a Celery task:
from celery import result
res = result.AsyncResult(job_id)
The res object has res.status and res.result, which give you the status and the result of the task with that job_id. You can set your own job id when you call the Celery task.
When to fetch those results is up to you: either write a function that checks the job status periodically, or check the job status whenever the result is required.
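A minimal polling sketch (assuming the Celery app is configured in the calling process; the interval and timeout values are illustrative):

import time
from celery.result import AsyncResult


def wait_for_task(job_id, poll_interval=2, timeout=60):
    """Check the task status periodically until it finishes or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        res = AsyncResult(job_id)
        if res.status in ('SUCCESS', 'FAILURE'):
            return res.status, res.result
        time.sleep(poll_interval)
    return 'PENDING', None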
I would expect an Error to be thrown at the place where you have if "exc" in celery.get: ... something like TypeError: argument of type 'function' is not iterable.
Did you try to slightly change that line and have if "exc" in celery.get(): (notice the parenthesis)?
Also, giving the result of delay() the name "celery" is misleading: whoever reads your code may think it is an instance of the Celery class, which it is not. It is an instance of the AsyncResult type, and get() is just one of its methods.
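For illustration, here is a sketch of the question's handler with both suggestions applied (the variable renamed and .get() actually called). Note that .get() blocks until the task finishes, which defeats the purpose of running it asynchronously; it is shown only to illustrate the API, and all the other names come from the question's code.

class UserCreate(Resource):
    def post(self):
        async_result = create_user_employee.delay(
            auth_header, form_data, employee_id, employee_response)
        # get(propagate=False) blocks until the task finishes and returns the
        # exception instance instead of raising it if the task failed
        task_output = async_result.get(propagate=False)
        if isinstance(task_output, Exception):
            return deliver_response(False, 'Celery task failed'), 400
        return deliver_response(True, message="User created", status=200, data=data), 200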

How to send a progress of operation in a FastAPI app?

I have deployed a FastAPI endpoint:
from fastapi import FastAPI, UploadFile
from typing import List

app = FastAPI()


@app.post('/work/test')
async def testing(files: List[UploadFile]):
    for i in files:
        .......
        # do a lot of operations on each file
        # after that I am just writing the processed data into a MySQL database
        # cur.execute(...)
        # cur.commit()
        .......
    # just returning "OK" to confirm data is written into MySQL
    return {"response": "OK"}
I can request output from the API endpoint and it is working perfectly for me.
Now, the biggest challenge for me is to know how much time each iteration is taking. In the UI (for those who access my API endpoint) I want to show a progress bar (time taken) for each iteration/file being processed.
Is there any possible way for me to achieve this? If so, please help me out on how I can proceed further.
Thank you.
Approaches
Polling
The most common approach to track the progress of a task is polling. After receiving a request to start a task on the backend:
1. Create a task object in storage (e.g. in-memory, Redis, etc.). The task object must contain the following data: task ID, status (pending, completed), result, and so on.
2. Run the task in the background (coroutines, threading, multiprocessing, or a task queue like Celery, arq, aio-pika, dramatiq, etc.).
3. Respond immediately with 202 (Accepted), returning the ID of the task created in step 1.
4. Update the task status:
   - This can be done from within the task itself, if it knows about the task store and has access to it; the task periodically updates information about itself.
   - Or use a task monitor (observer, producer-consumer pattern) that watches the status of the task and its result and updates the information in the storage.
5. On the client side (front end), start a polling loop against the task status endpoint /task/{ID}/status, which reads the information from the task storage.
Streaming response
Streaming is a less convenient way of getting the status of request processing: the server gradually pushes responses without closing the connection. It has a number of significant disadvantages; for example, if the connection is broken, you can lose information. A streaming API is a different approach from a REST API.
Websockets
You can also use websockets for real-time notifications and bidirectional communication.
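A minimal websocket sketch in FastAPI (the endpoint path and the fake work loop are illustrative assumptions):

import asyncio

from fastapi import FastAPI, WebSocket

app = FastAPI()


@app.websocket("/ws/progress")
async def progress_ws(websocket: WebSocket):
    await websocket.accept()
    for i in range(1, 101):
        await asyncio.sleep(0.1)  # pretend work
        await websocket.send_json({"progress": i})  # push progress to the client
    await websocket.close()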
Links:
Examples of the polling approach for a progress bar, and a more detailed description for Django + Celery, can be found at these links:
https://www.dangtrinh.com/2013/07/django-celery-display-progress-bar-of.html
https://buildwithdjango.com/blog/post/celery-progress-bars/
I have provided simplified examples of running background tasks in FastAPI using multiprocessing here:
https://stackoverflow.com/a/63171013/13782669
Old answer:
You could run a task in the background, return its id, and provide a /status endpoint that the front end would call periodically. In the status response you could return the current state of your task (for example, pending, with the number of the currently processed file). I provided a few simple examples here.
Demo
Polling
Demo of the approach using asyncio tasks (single worker solution):
import asyncio
from http import HTTPStatus
from typing import Dict, Optional
from uuid import UUID, uuid4

from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel, Field


class Job(BaseModel):
    uid: UUID = Field(default_factory=uuid4)
    status: str = "in_progress"
    progress: int = 0
    result: Optional[int] = None


app = FastAPI()
jobs: Dict[UUID, Job] = {}  # dict as job storage


async def long_task(queue: asyncio.Queue, param: int):
    for i in range(1, param):  # do work and report our progress
        await asyncio.sleep(1)
        await queue.put(i)
    await queue.put(None)  # signal completion


async def start_new_task(uid: UUID, param: int) -> None:
    queue = asyncio.Queue()
    task = asyncio.create_task(long_task(queue, param))
    while progress := await queue.get():  # monitor task progress
        jobs[uid].progress = progress
    jobs[uid].status = "complete"


@app.post("/new_task/{param}", status_code=HTTPStatus.ACCEPTED)
async def task_handler(background_tasks: BackgroundTasks, param: int):
    new_task = Job()
    jobs[new_task.uid] = new_task
    background_tasks.add_task(start_new_task, new_task.uid, param)
    return new_task


@app.get("/task/{uid}/status")
async def status_handler(uid: UUID):
    return jobs[uid]
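For completeness, a hypothetical client-side polling loop for the demo above, using requests (it assumes the app is served on localhost:8000):

import time

import requests

job = requests.post("http://localhost:8000/new_task/10").json()
uid = job["uid"]
while True:
    info = requests.get(f"http://localhost:8000/task/{uid}/status").json()
    print(info["progress"], info["status"])
    if info["status"] == "complete":
        break
    time.sleep(1)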
Example adapted for the loop from the question
The background processing function is declared with def (not async def), so FastAPI runs it in a thread pool.
import time
from http import HTTPStatus
from typing import Dict, List
from uuid import UUID, uuid4

from fastapi import BackgroundTasks, FastAPI, File, UploadFile
from pydantic import BaseModel, Field


class Job(BaseModel):
    uid: UUID = Field(default_factory=uuid4)
    status: str = "in_progress"
    processed_files: List[str] = Field(default_factory=list)


app = FastAPI()
jobs: Dict[UUID, Job] = {}


def process_files(task_id: UUID, files: List[UploadFile]):
    for i in files:
        time.sleep(5)  # pretend long task
        # ...
        # do a lot of operations on each file
        # then append the processed file to a list
        # ...
        jobs[task_id].processed_files.append(i.filename)
    jobs[task_id].status = "completed"


@app.post('/work/test', status_code=HTTPStatus.ACCEPTED)
async def work(background_tasks: BackgroundTasks, files: List[UploadFile] = File(...)):
    new_task = Job()
    jobs[new_task.uid] = new_task
    background_tasks.add_task(process_files, new_task.uid, files)
    return new_task


@app.get("/work/{uid}/status")
async def status_handler(uid: UUID):
    return jobs[uid]
Streaming
from fastapi.responses import StreamingResponse


async def process_files_gen(files: List[UploadFile]):
    for i in files:
        time.sleep(5)  # pretend long task (blocking call, shown for brevity)
        # ...
        # do a lot of operations on each file
        # then append the processed file to a list
        # ...
        yield f"{i.filename} processed\n"
    yield "OK\n"


@app.post('/work/stream/test', status_code=HTTPStatus.ACCEPTED)
async def work_stream(files: List[UploadFile] = File(...)):
    return StreamingResponse(process_files_gen(files))
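A hypothetical client for the streaming endpoint above, reading lines as they arrive (the file name and host are assumptions):

import requests

with open("file1.txt", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/work/stream/test",
        files=[("files", ("file1.txt", f))],
        stream=True,
    )
    for line in resp.iter_lines():
        if line:
            print(line.decode())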
Below is a solution that uses unique identifiers and a globally available dictionary holding information about the jobs:
Note: the code below is safe to use as long as you use dynamic key values (a UUID in the sample) and keep the application within a single process.
1. To start the app, create a file main.py.
2. Run uvicorn main:app --reload.
3. Create a job entry by accessing http://127.0.0.1:8000/.
4. Repeat step 3 to create multiple jobs.
5. Go to http://127.0.0.1:8000/status to see all job statuses.
6. Go to http://127.0.0.1:8000/status/{identifier} to see the progress of a job by its id.
Code of app:
from fastapi import FastAPI, UploadFile
import uuid
from typing import List
import asyncio

context = {'jobs': {}}

app = FastAPI()


async def do_work(job_key, files=None):
    iter_over = files if files else range(100)
    for file_number, file in enumerate(iter_over):
        jobs = context['jobs']
        job_info = jobs[job_key]
        job_info['iteration'] = file_number
        job_info['status'] = 'inprogress'
        await asyncio.sleep(1)
    context['jobs'][job_key]['status'] = 'done'


@app.post('/work/test')
async def testing(files: List[UploadFile]):
    identifier = str(uuid.uuid4())
    context['jobs'][identifier] = {}
    asyncio.run_coroutine_threadsafe(do_work(identifier, files), loop=asyncio.get_running_loop())
    return {"identifier": identifier}


@app.get('/')
async def get_testing():
    identifier = str(uuid.uuid4())
    context['jobs'][identifier] = {}
    asyncio.run_coroutine_threadsafe(do_work(identifier), loop=asyncio.get_running_loop())
    return {"identifier": identifier}


@app.get('/status')
def status():
    return {
        'all': list(context['jobs'].values()),
    }


@app.get('/status/{identifier}')
async def status_by_id(identifier):
    return {
        "status": context['jobs'].get(identifier, 'job with that identifier is undefined'),
    }

How to trigger a function after return statement in Flask

I have 2 functions.
The first function stores the received data in a list, and the second function writes the data into a CSV file.
I'm using Flask. Whenever the web service is called it should store the data and send a response, and as soon as it sends the response it should trigger the second function.
My Code:
from flask import Flask, flash, request, redirect, url_for, session
import json
import pandas as pd

app = Flask(__name__)
arr = []


@app.route("/test", methods=['GET', 'POST'])
def check():
    arr.append(request.form['a'])
    arr.append(request.form['b'])
    res = {'Status': True}
    return json.dumps(res)


def trigger():
    df = pd.DataFrame({'x': arr})
    df.to_csv("docs/xyz.csv", index=False)
    return
Obviously the 2nd function is not called.
Is there a way to achieve this?
P.S.: My real-life problem is different: the trigger function is time-consuming and I don't want the user to wait for it to finish executing.
One solution would be to have a background thread that watches a queue. You put your CSV data in the queue and the background thread consumes it. You can start such a thread before the first request:
import threading
import time
from queue import Queue  # queue.Queue supports task_done()/join(); multiprocessing.Queue does not

import pandas as pd


class CSVWriterThread(threading.Thread):
    def __init__(self, *args, **kwargs):
        threading.Thread.__init__(self, *args, **kwargs)
        self.input_queue = Queue()

    def send(self, item):
        self.input_queue.put(item)

    def close(self):
        self.input_queue.put(None)
        self.input_queue.join()

    def run(self):
        while True:
            csv_array = self.input_queue.get()
            if csv_array is None:
                break
            # Do something here ...
            df = pd.DataFrame({'x': csv_array})
            df.to_csv("docs/xyz.csv", index=False)
            self.input_queue.task_done()
            time.sleep(1)
        # Done
        self.input_queue.task_done()
        return


@app.before_first_request
def activate_job_monitor():
    thread = CSVWriterThread()
    app.csvwriter = thread
    thread.start()
And in your code put the message in the queue before returning:
@app.route("/test", methods=['GET', 'POST'])
def check():
    arr.append(request.form['a'])
    arr.append(request.form['b'])
    res = {'Status': True}
    app.csvwriter.send(arr)
    return json.dumps(res)
P.S.: My real-life problem is different: the trigger function is time-consuming and I don't want the user to wait for it to finish executing.
Consider using celery which is made for the very problem you're trying to solve. From docs:
Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system.
I recommend you integrate Celery with your Flask app as described here. Your trigger method would then become a straightforward Celery task that you can execute without having to worry about long response times.
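A minimal sketch of what that could look like (the broker URL and the Celery wiring are assumptions; the article linked above shows the full Flask integration):

import pandas as pd
from celery import Celery

celery = Celery(__name__, broker='redis://localhost:6379/0')


@celery.task
def trigger(arr):
    # write the collected form data to a CSV file in the background
    df = pd.DataFrame({'x': arr})
    df.to_csv("docs/xyz.csv", index=False)

In check() you would then call trigger.delay(arr) just before returning, so the response is sent immediately while the CSV write happens in a worker.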
I'm actually working on another interesting case on my side, where I pass the work off to a Python worker that sends the job to a Redis queue. There are some great blogs on using Redis with Flask; you basically need to ensure Redis is running (able to connect on port 6379).
The worker would look something like this:
import os

import redis
from rq import Worker, Queue, Connection

listen = ['default']
redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(list(map(Queue, listen)))
        worker.work()
In my example I have a function that queries a database for usage, and since it might be a lengthy process I pass it off to the worker (running as a separate script):
def post(self):
    data = Task.parser.parse_args()
    job = q.enqueue_call(
        func=migrate_usage, args=(my_args),
        result_ttl=5000
    )
    print("Job ID is: {}".format(job.get_id()))
    job_key = job.get_id()
    print(str(Job.fetch(job_key, connection=conn).result))
    if job:
        return {"message": "Job : {} added to queue".format(job_key)}, 201
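The snippet above assumes that a queue q and a connection conn were created elsewhere; a minimal sketch of that setup (the Redis URL is an assumption):

import redis
from rq import Queue
from rq.job import Job  # Job.fetch in the snippet above also needs this import

conn = redis.from_url('redis://localhost:6379')
q = Queue(connection=conn)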
Credit due to the following article:
https://realpython.com/flask-by-example-implementing-a-redis-task-queue/#install-requirements
You can try using streaming. See the next example:
import time
from flask import Flask, Response

app = Flask(__name__)


@app.route('/')
def main():
    return '''<div>start</div>
    <script>
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/test', true);
        xhr.onreadystatechange = function(e) {
            var div = document.createElement('div');
            div.innerHTML = '' + this.readyState + ':' + this.responseText;
            document.body.appendChild(div);
        };
        xhr.send();
    </script>
    '''


@app.route('/test')
def test():
    def generate():
        app.logger.info('request started')
        for i in range(5):
            time.sleep(1)
            yield str(i)
        app.logger.info('request finished')
        yield ''

    return Response(generate(), mimetype='text/plain')


if __name__ == '__main__':
    app.run('0.0.0.0', 8080, True)
All the magic in this example is in the generator: you can start sending response data, then do some work, and finally yield empty data to end your stream.
For details look at http://flask.pocoo.org/docs/patterns/streaming/.
You can defer route-specific actions with limited context by combining after_this_request and response.call_on_close. Note that the request and response contexts won't be available, but the route function's context remains available, so you'll need to copy any request/response data you need into local variables for deferred access.
I moved your array to a local variable to show how the function context is preserved. You could change your CSV write to an append so you're not pushing data endlessly into memory.
import flask
from flask import Flask, flash, request, redirect, url_for, session
import json
import pandas as pd

app = Flask(__name__)


@app.route("/test", methods=['GET', 'POST'])
def check():
    arr = []
    arr.append(request.form['a'])
    arr.append(request.form['b'])
    res = {'Status': True}

    @flask.after_this_request
    def add_close_action(response):
        @response.call_on_close
        def process_after_request():
            df = pd.DataFrame({'x': arr})
            df.to_csv("docs/xyz.csv", index=False)
        return response

    return json.dumps(res)

Create multiple Google Docs using task queue

I'm trying to create multiple Google Docs in a background task.
I'm trying to use the task queue from Google App Engine, but I must be misunderstanding something, because I keep getting this message:
INFO 2016-05-17 15:38:46,393 module.py:787] default: "POST /update_docs HTTP/1.1" 302 -
WARNING 2016-05-17 15:38:46,393 taskqueue_stub.py:1981] Task task1 failed to execute. This task will retry in 0.800 seconds
Here is my code. I make multiple calls to the UpdateDocs handler, which needs to be executed from the queue.
# Create a GDoc in the queue (called by it)
class UpdateDocs(BaseHandler):
    @decorator.oauth_required
    def post(self):
        try:
            http = decorator.http()
            service = discovery.build("drive", "v2", http=http)
            # Create the file
            docs_name = self.request.get('docs_name')
            body = {
                'mimeType': DOCS_MIMETYPE,
                'title': docs_name,
            }
            service.files().insert(body=body).execute()
        except AccessTokenRefreshError:
            self.redirect("/")


# Create multiple GDocs by calling the queue
class QueueMultiDocsCreator(BaseHandler):
    def get(self):
        try:
            for i in range(5):
                name = "File_n" + str(i)
                taskqueue.add(
                    url='/update_docs',
                    params={
                        'docs_name': name,
                    })
            self.redirect('/files')
        except AccessTokenRefreshError:
            self.redirect('/')
I can see the push queue in the App Engine console, and every task is inside it, but they won't run, and I don't get why.
Kindly try to specify the worker module in your code.
As shown in Creating a new task, after calling the taskqueue.add() function, it targets the module named worker and invokes its handler by setting the URL /update_counter.
class EnqueueTaskHandler(webapp2.RequestHandler):
    def post(self):
        amount = int(self.request.get('amount'))
        task = taskqueue.add(
            url='/update_counter',
            target='worker',
            params={'amount': amount})
        self.response.write(
            'Task {} enqueued, ETA {}.'.format(task.name, task.eta))
And from what I have read in this blog, a worker is an important piece of a task queue: it is a Python process that reads jobs from the queue and executes them one at a time.
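For context, a worker module for the /update_counter example might be sketched like this (webapp2 routing; the handler body and names are illustrative assumptions):

import webapp2


class UpdateCounterHandler(webapp2.RequestHandler):
    def post(self):
        amount = int(self.request.get('amount'))
        # ... do the actual work here, e.g. update a counter in the datastore ...


# the worker module routes the task queue's POST to the handler above
app = webapp2.WSGIApplication([
    ('/update_counter', UpdateCounterHandler),
], debug=True)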
I hope that helps.

How to send asynchronous request using flask to an endpoint with small timeout session?

I am new to backend development with Flask and I am stuck on a confusing problem. I am trying to send data to an endpoint whose timeout is 3000 ms. My server code is as follows.
from flask import Flask, request
from gitStat import getGitStat
import requests

app = Flask(__name__)


@app.route('/', methods=['POST', 'GET'])
def handle_data():
    params = request.args["text"].split(" ")
    user_repo_path = "https://api.github.com/users/{}/repos".format(params[0])
    first_response = requests.get(
        user_repo_path, auth=(
            'Your Github Username', 'Your Github Password'))
    repo_commits_path = "https://api.github.com/repos/{}/{}/commits".format(params[0], params[1])
    second_response = requests.get(
        repo_commits_path, auth=(
            'Your Github Username', 'Your Github Password'))
    if first_response.status_code == 200 and params[2] < params[3] and second_response.status_code == 200:
        values = getGitStat(params[0], params[1], params[2], params[3])
        response_url = request.args["response_url"]
        payload = {
            "response_type": "in_channel",
            "text": "Github Repo Commits Status",
            "attachments": [
                {
                    "text": values
                }
            ]
        }
        headers = {'Content-Type': 'application/json',
                   'User-Agent': 'Mozilla /5.0 (Compatible MSIE 9.0;Windows NT 6.1;WOW64; Trident/5.0)'}
        response = requests.post(response_url, json=payload, headers=headers)
    else:
        return ("Please enter correct details. Check if the username or reponame exists, "
                "and/or Starting date < End date. Also, date format should be MM-DD")
My server code takes arguments from the request it receives and extracts the parameters for the code from that request's JSON object. It then executes the getGitStat function and sends the JSON payload defined in the server code back to the endpoint the request came from.
My problem is that I need to send a text confirmation to the endpoint saying that I have received the request and that the data will be coming soon. The problem is that the getGitStat function takes more than a minute to fetch and parse data from the GitHub API.
I searched the internet and found that I need to make this call asynchronous, and that I can do that using queues. I tried to understand RQ and RabbitMQ, but I neither understood them nor was I able to convert my code to an asynchronous form. Can somebody give me pointers or an idea of how I can achieve this?
Thank you.
------------Update------------
Threading was able to solve this problem. Create another thread and call the function in that thread.
If you are trying to run an async task from a request, you have to decide whether or not you need its result/progress:
1. You don't care about the result of the task or whether any errors occurred while processing it: you can just process it in a thread and forget about the result.
2. You just want to know whether the task succeeded or failed: you can store the state of the task in a database and query it when needed.
3. You want the progress of the task (e.g. 20% done ... 40% done): you have to use something more sophisticated, like Celery or RabbitMQ.
For you, I think option #2 fits better. You can create a simple table, GitTasks.
GitTasks
---------------------------------------
Id (PK) | Status     | Result
---------------------------------------
1       | Processing | -
2       | Done       | <your result>
3       | Error      | <error details>
You have to create a simple threaded object in Python to do the processing.
import threading


class AsyncGitTask(threading.Thread):
    def __init__(self, task_id, params):
        super().__init__()
        self.task_id = task_id
        self.params = params

    def run(self):
        ## do the processing
        ## store the result in the GitTasks table for id = self.task_id
        pass
You have to create another endpoint to query the status of your task.
@app.route('/TaskStatus/<int:task_id>')
def task_status(task_id):
    ## query the GitTasks table and return its data accordingly
    ...
Now that we have built all the components, we have to put them together in your main request handler.
from flask import url_for


@app.route('/', methods=['POST', 'GET'])
def handle_data():
    .....
    ## create a new row in the GitTasks table, and use its PK (id) as task_id
    task_id = create_new_task_row()
    async_task = AsyncGitTask(task_id=task_id, params=params)
    async_task.start()
    task_status_url = url_for('task_status', task_id=task_id)
    ## from this request you can return text saying
    ## "Your task is being processed. To see the progress
    ## go to <task_status_url>"
