My REST API written in Python spawns processes that take about 3 minutes to complete. I store the PID in a global array and set up a secondary check method that should confirm whether the process is still running or has finished.
The only methods I can find are to poll the subprocess (which I don't have access to in this route), or try to kill the process to see if it's alive. Is there any clean way to get a binary answer, based on the PID, of whether it's still running, and whether it completed successfully if not?
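(For reference, a hedged, POSIX-only sketch of a bare-PID liveness check: os.kill(pid, 0) sends no signal and only fails if the process is gone. It cannot report an exit status, which is why the answer below keeps the Popen object and calls poll() on it instead.)

import os

def pid_is_running(pid: int) -> bool:
    # POSIX-only sketch: signal 0 performs the existence/permission check
    # without actually delivering a signal to the process.
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process
    except PermissionError:
        return True           # process exists but is owned by another user
    return True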
from flask import Flask, jsonify, request, Response
from subprocess import Popen, PIPE
import os

app = Flask(__name__)

QUEUE_ID = 0
jobs = []

@app.route("/compile", methods=["POST"])
def compileFirmware():
    f = request.files['file']
    f.save(f.filename)
    os.chdir("/opt/src/2.0.x")
    process = Popen(['platformio', 'run', '-e', 'mega2560'], stdout=PIPE, stderr=PIPE, universal_newlines=True)
    global QUEUE_ID
    QUEUE_ID += 1
    data = {'id': QUEUE_ID, 'pid': process.pid}
    jobs.append(data)
    output, errors = process.communicate()
    print(output)
    print(errors)
    response = jsonify()
    response.status_code = 202  # accepted
    response.headers['location'] = '/queue/' + str(QUEUE_ID)
    response.headers.add('Access-Control-Allow-Origin', '*')
    return response

@app.route("/queue/<id>", methods=["GET"])
def getStatus(id):
    # CHECK PID STATUS HERE
    content = {'download_url': 'download.com'}
    response = jsonify(content)
    response.headers.add('Access-Control-Allow-Origin', '*')
    return response

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
Here is a little simulation that works:
from flask import Flask, jsonify, request, Response, abort
from subprocess import Popen, PIPE
import os

app = Flask(__name__)

QUEUE = {}

@app.route("/compile", methods=["POST"])
def compileFirmware():
    process = Popen(['python', '-c', 'import time; time.sleep(300)'], stdout=PIPE, stderr=PIPE, universal_newlines=True)
    QUEUE[str(process.pid)] = process  # String because in GET the url param will be interpreted as str
    response = jsonify()
    response.status_code = 202  # accepted
    response.headers['location'] = '/queue/' + str(process.pid)
    response.headers.add('Access-Control-Allow-Origin', '*')
    return response

@app.route("/queue/<id>", methods=["GET"])
def getStatus(id):
    process = QUEUE.get(id, None)
    if process is None:
        abort(404, description="Process not found")
    retcode = process.poll()
    if retcode is None:
        content = {'download_url': None, 'message': 'Process is still running.'}
    else:
        # QUEUE.pop(id)  # Remove reference from QUEUE ?
        content = {'download_url': 'download.com', 'message': f'process has completed with retcode: {retcode}'}
    response = jsonify(content)
    response.headers.add('Access-Control-Allow-Origin', '*')
    return response

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
There are further considerations to think about if this application is going to be used as more than a personal project.
We use the QUEUE global variable to store the state of each process. But in a real project, a deployment via WSGI / gunicorn can have multiple workers, each with its own copy of that global variable. So for scaling up, consider using a Redis / message-queue data store instead, as sketched just below.
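A minimal hedged sketch of that idea, assuming a local Redis on port 6379 and a hypothetical job:<id> key per job (not part of the code above):

import json
import redis

store = redis.Redis(host='localhost', port=6379, db=0)

def set_job_status(job_id, status, retcode=None):
    # every gunicorn worker talks to the same Redis, so state is shared
    store.set('job:' + str(job_id), json.dumps({'status': status, 'retcode': retcode}))

def get_job_status(job_id):
    raw = store.get('job:' + str(job_id))
    return json.loads(raw) if raw is not None else None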
Does QUEUE ever need to be cleaned up? Should it be cleaned up? Cleaning it up has the disadvantage that once the value has been fetched and removed, the next GET returns 404. It is a design decision whether the GET API must be idempotent (most likely yes).
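One hedged way to keep GET idempotent while still releasing the Popen object is to cache the final retcode the first time it is observed, for example:

FINISHED = {}  # pid (str) -> retcode, survives after the Popen is dropped

def check_pid(pid):
    if pid in FINISHED:
        return FINISHED[pid]          # completed earlier; answer is stable
    process = QUEUE.get(pid)
    if process is None:
        return None                   # unknown pid
    retcode = process.poll()
    if retcode is not None:
        FINISHED[pid] = retcode       # remember the result for later GETs
        QUEUE.pop(pid, None)          # release the Popen object itself
    return retcode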
Context:
I have a function in my views module where some variables are sent to a background task script after an HTTP POST from a form. This background script handles a lot of API calls and converts the results to JSON. Because this can take a long time before the next HTML page is rendered (because of the way the API calls go), I decided to make this a background task with threading. Eventually, if possible, I would like to add a queue later on.
The problem:
But the thing is, even though I clearly set it as a daemon thread and everything, the code is not being executed. The Python console log does not even show the code being executed, so what am I doing wrong here?
Just look at Order 66 if you want the background task itself.
"""
Routes and views for the flask application.
"""
from datetime import datetime
from flask import render_template, jsonify
from FlaskWebPODTracer import app
import json
from itsdangerous import TimedJSONWebSignatureSerializer as Serializer
from flask import request, redirect, url_for, flash
# The main script that has to happen in the background.
import classified.background_script.backgroundtask as Order66
from threading import Thread
import random
import string
from celery import Celery
from flask_mail import Mail, Message
import requests

API_VERSION = "v1/"

Succes_message = "U ontvangt nog een mail wanneer de data klaar is. Bedankt om PODTracer te gebruiken."
Fail_message_1 = "Wijzig de parameters van uw aanvraag en probeer opnieuw."
Fail_message_2 = "Blijft dit voorkomen contacteer dan support."
Fail_message = Fail_message_1 + '\n' + Fail_message_2

@app.route('/handle_data/')
@app.route('/handle_data/', methods=['POST', 'GET'])
def handle_data():
    # if request.method == 'GET':
    #     return render_template(
    #         "request-completed.html",
    #         title="Aanvraag ingediend",
    #         objects=jsonobjects,
    #         year=datetime.now().year,
    #         message='Uw aanvraag is voltooid.'
    #     )
    if request.method == 'POST':
        email = request.form['inputEmail']
        stringpart = request.form['city']
        sper_year = int(request.form["Speryear"])
        urlpart = request.form["api-url-input"]
        url = "Classified" + API_VERSION + urlpart
        print(url)
        response = requests.get(url)
        if response.status_code == 200:
            jsonobjects = len(response.json())
            task = Thread(group=None, target=None, name=Order66.main, args=(stringpart, url, sper_year, email), daemon=True)
            task.start()
            state = "succesvol."
            message_body = Succes_message
        else:
            jsonobjects = 0
            state = "onsuccesvol."
            message_body = Fail_message
        return render_template(
            "request-posted.html",
            respstate=state,
            body=message_body,
            title="Aanvraag ingediend",
            objects=jsonobjects,
            year=datetime.now().year,
            message='Uw aanvraag is ' + state
        )

# TODO RUN THIS IN BACKGROUND
# app.route('/request-completed')
@app.route('/handle_data_fail')
def handle_data_fail():
    jsonobjects = 0
    state = "onsuccesvol."
    message_body = Fail_message
    return render_template(
        "request-posted.html",
        respstate=state,
        body=message_body,
        title="Aanvraag ingediend",
        objects=jsonobjects,
        year=datetime.now().year,
        message='Uw aanvraag is ' + state
    )
As discussed in comments, here's an oversimplified example of an event-driven system with RabbitMQ and Flask.
Dependencies you need:
(flask-rabbitmq) ➜ flask-rabbitmq pip freeze
click==8.0.3
Flask==2.0.2
itsdangerous==2.0.1
Jinja2==3.0.3
MarkupSafe==2.0.1
pika==1.2.0
Werkzeug==2.0.2
Try to create a RabbitMQ docker container with the command below:
docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3.9-management
Your simple Flask app would look like this:
from flask import Flask
from flask import request
import json
from RabbitMQPublisher import publish

app = Flask(__name__)

@app.route('/publish', methods=['POST'])
def index():
    if request.method == 'POST':
        publish(json.dumps(request.json))
        return 'Done'

if __name__ == '__main__':
    app.run()
Your RabbitMQPublisher.py would look like this:
import pika

def publish(data: str):
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host='0.0.0.0', port=5672))
    channel = connection.channel()
    channel.exchange_declare(exchange='test', exchange_type='fanout')
    channel.queue_declare(queue='', exclusive=True)
    channel.basic_publish(exchange='test', routing_key='', body='{"data": %s}' % (data))
    connection.close()
And finally your script.py would look like this:
import pika
import json
from time import sleep

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='0.0.0.0', port=5672))
channel = connection.channel()

channel.exchange_declare(exchange='test', exchange_type='fanout')
result = channel.queue_declare(queue='', exclusive=True)
queue_name = result.method.queue  # server-generated queue name
channel.queue_bind(exchange='test', queue=queue_name)

def callback(ch, method, properties, body):
    body = json.loads(body.decode().replace("'", '"'))
    print(body)

channel.basic_consume(
    queue=queue_name, on_message_callback=callback, auto_ack=True)

channel.start_consuming()
Inside the callback function in the code above you can put your own logic, and when that logic finishes you can again use RabbitMQ to call back to the Flask side, or even make an HTTP call with requests; that choice is yours.
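For example, a hedged sketch of such a callback; do_long_running_work and the /job-finished endpoint are assumptions, not part of the example above:

import json
import requests

def do_long_running_work(payload):
    # stand-in for your actual long-running logic
    return {'processed': payload}

def callback(ch, method, properties, body):
    payload = json.loads(body.decode())
    result = do_long_running_work(payload)
    # notify the Flask side once the work is done (assumed endpoint)
    requests.post('http://localhost:5000/job-finished', json=result)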
Using Tornado, I have a POST request that takes a long time as it makes many requests to another API service and processes the data. This can take minutes to fully complete. I don't want this to block the entire web server from responding to other requests, which it currently does.
I looked at multiple threads here on SO, but they are often 8 years old and the code no longer works since Tornado removed the "engine" component from tornado.gen.
Is there an easy way to kick off this long call and not have it block the entire web server in the process? Is there anything I can put in the code to say: "submit the POST response and work on this one function without blocking any concurrent server requests from getting an immediate response"?
Example:
main.py
def make_app():
    return tornado.web.Application([
        (r"/v1", MainHandler),
        (r"/v1/addfile", AddHandler, dict(folderpaths=folderpaths)),
        (r"/v1/getfiles", GetHandler, dict(folderpaths=folderpaths)),
        (r"/v1/getfile", GetFileHandler, dict(folderpaths=folderpaths)),
    ])

if __name__ == "__main__":
    app = make_app()
    sockets = tornado.netutil.bind_sockets(8888)
    tornado.process.fork_processes(0)
    tornado.process.task_id()
    server = tornado.httpserver.HTTPServer(app)
    server.add_sockets(sockets)
    tornado.ioloop.IOLoop.current().start()
addHandler.py
class AddHandler(tornado.web.RequestHandler):

    def initialize(self, folderpaths):
        self.folderpaths = folderpaths

    def blockingFunction(self):
        time.sleep(320)
        post("AWAKE")

    def post(self):
        user = self.get_argument('user')
        folderpath = self.get_argument('inpath')
        outpath = self.get_argument('outpath')
        workflow_value = self.get_argument('workflow')

        status_code, status_text = validateInFolder(folderpath)

        if (status_code == 200):
            logging.info("Status Code 200")
            result = self.folderpaths.add_file(user, folderpath, outpath, workflow_value)
            self.write(result)
            self.finish()

            # At this point the path is validated.
            # POST response should be send out. Internal process should continue, new
            # requests should not be blocked
            self.blockingFunction()
The idea is that if the input parameters are validated, the POST response should be sent out.
Then the internal process (blockingFunction()) should be started, without blocking the Tornado server from processing another API POST request.
I tried defining blockingFunction() as async, which allows me to process multiple concurrent user requests; however, there was a warning about a missing "await" on the async method.
Any help welcome. Thank you
class AddHandler(tornado.web.RequestHandler):

    def initialize(self, folderpaths):
        self.folderpaths = folderpaths

    def blockingFunction(self):
        time.sleep(320)
        post("AWAKE")

    async def post(self):
        user = self.get_argument('user')
        folderpath = self.get_argument('inpath')
        outpath = self.get_argument('outpath')
        workflow_value = self.get_argument('workflow')

        status_code, status_text = validateInFolder(folderpath)

        if (status_code == 200):
            logging.info("Status Code 200")
            result = self.folderpaths.add_file(user, folderpath, outpath, workflow_value)
            self.write(result)
            self.finish()

            # At this point the path is validated.
            # POST response should be send out. Internal process should continue, new
            # requests should not be blocked
            loop = asyncio.get_event_loop()  # requires "import asyncio" at the top of the module
            await loop.run_in_executor(None, self.blockingFunction)
            # if this had multiple parameters it would be
            # await loop.run_in_executor(None, self.blockingFunction, param1, param2)
Thank you @xyres
Further reading: https://www.tornadoweb.org/en/stable/faq.html
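For completeness, a hedged equivalent using Tornado's own IOLoop wrapper (available in Tornado 5+), which avoids touching asyncio directly inside the handler:

import time
import tornado.ioloop
import tornado.web

class AddHandlerAlt(tornado.web.RequestHandler):
    def blockingFunction(self):
        time.sleep(320)

    async def post(self):
        self.write("accepted")
        self.finish()
        # IOLoop.run_in_executor hands the blocking call to a thread pool
        await tornado.ioloop.IOLoop.current().run_in_executor(None, self.blockingFunction)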
I need to return a response from my FastAPI path operation, but before this I want to send a slow request and I don't need to wait for result of that request, just log errors if there are any. Can I do this by means of Python and FastAPI? I would not like to add Celery to the project.
Here is what I have so far, but it runs synchronously:
import asyncio

import requests

async def slow_request(data):
    url = 'https://external.service'
    response = requests.post(
        url=url,
        json=data,
        headers={'Auth-Header': settings.API_TOKEN}
    )
    if response.status_code != 200:
        logger.error('response: %s', response.status_code)
        logger.error('data: %s', data)

@router.post('/order/')
async def handle_order(order: Order):
    json_data = {
        'order': order
    }
    task = asyncio.create_task(
        slow_request(json_data)
    )
    await task
    return {'body': {'message': 'success'}}
OK, if nobody wants to post an answer, here are the solutions:
Solution #1
We can just remove the await task line, as alex_noname suggested. It will work because create_task schedules the task and we no longer await its completion.
@router.post('/order/')
async def handle_order(order: Order):
    json_data = {
        'order': order
    }
    task = asyncio.create_task(
        slow_request(json_data)
    )
    return {'body': {'message': 'success'}}
Solution #2
I ended up with BackgroundTasks, as HTF suggested, since I'm already using FastAPI anyway, and this solution seems neater to me.
@router.post('/order/')
async def handle_order(order: Order, background_tasks: BackgroundTasks):
    json_data = {
        'order': order
    }
    background_tasks.add_task(slow_request, json_data)
    return {'body': {'message': 'success'}}
This even works without async before def slow_request(data):
The problem is really two-part:
the requests library is synchronous, so requests.post(...) will block the event loop until completed
you don't need the result of the web request to respond to the client, but your current handler cannot respond to the client until the request is completed (even if it was async)
Consider separating the request logic into another process, so it can happen at its own speed.
The key is that you can put work into a queue of some kind to complete eventually, without directly needing the result for the response to the client.
You could use an async http request library and some collection of callbacks, multiprocessing to spawn a new process(es), or something more exotic like an independent program (perhaps with a pipe or sockets to communicate).
Maybe something of this form will work for you:
import base64
import json
import multiprocessing

import requests

URL_EXTERNAL_SERVICE = "https://example.com"
TIMEOUT_REQUESTS = (2, 10)  # always set a timeout for requests

SHARED_QUEUE = multiprocessing.Queue()  # may leak as unbounded

async def slow_request(data):
    SHARED_QUEUE.put(data)
    # now returns on successful queue put, rather than request completion

def requesting_loop(logger, Q, url, token):
    while True:  # expects to be a daemon
        data = Q.get()  # block until retrieval (a non-daemon could use a sentinel here)
        response = requests.post(
            url,
            json=data,  # pass the dict; requests serializes it once
            headers={'Auth-Header': token},
            timeout=TIMEOUT_REQUESTS,
        )
        # raise_for_status() --> try/except --> log + continue
        if response.status_code != 200:
            logger.error('call to {} failed (code={}) with data: {}'.format(
                url, response.status_code,
                "base64:" + base64.b64encode(json.dumps(data).encode()).decode()
            ))

def startup():  # run me when starting
    # do whatever is needed for logger
    # create a pool instead if you may need to process a lot of requests
    p = multiprocessing.Process(
        target=requesting_loop,
        kwargs={"logger": logger, "Q": SHARED_QUEUE, "url": URL_EXTERNAL_SERVICE, "token": settings.API_TOKEN},
        daemon=True,
    )
    p.start()
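If you would rather stay in-process, here is a hedged sketch of the "async http request library" option mentioned above, swapping requests for httpx so the call no longer blocks the event loop (router, Order and the external URL come from the question):

import asyncio
import httpx

async def slow_request(data):
    async with httpx.AsyncClient(timeout=10) as client:
        response = await client.post('https://external.service', json=data)
        if response.status_code != 200:
            print('slow request failed:', response.status_code)

@router.post('/order/')
async def handle_order(order: Order):
    # fire-and-forget: schedule the coroutine, respond immediately
    asyncio.create_task(slow_request({'order': order}))
    return {'body': {'message': 'success'}}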
I have a RESTful Flask API that I am serving with gunicorn, and I'm trying to continue running parse_request() after sending a response to whoever made the POST request, so they're not left waiting for it to finish.
I'm not too sure if this will even achieve what I want, but this is the code I have so far.
from threading import Thread
import subprocess
from flask import Flask, request, jsonify
import asyncio

application = Flask(__name__)

async def parse_request(data):
    try:
        command = './webscraper.py -us "{user}" -p "{password}" -url "{url}"'.format(**data)
        output = subprocess.check_output(['bash', '-c', command])
    except Exception as e:
        print(e)

@application.route('/scraper/run', methods=['POST'])
def init_scrape():
    try:
        thread = Thread(target=parse_request, kwargs={'data': request.json})
        thread.start()
        return jsonify({'Scraping this site: ': request.json["url"]}), 201
    except Exception as e:
        print(e)

if __name__ == '__main__':
    try:
        application.run(host="0.0.0.0", port="8080")
    except Exception as e:
        print(e)
I am sending a POST request similar to this.
localhost:8080/scraper/run
data = {
"user": "username",
"password": "password",
"url": "www.mysite.com"
}
The error I get when sending a POST request is this.
/usr/lib/python3.6/threading.py:864: RuntimeWarning: coroutine 'parse_request' was never awaited
self._target(*self._args, **self._kwargs)
So first things first: why are you calling webscraper.py with subprocess? This is completely pointless. Because webscraper.py is a Python script, you should be importing the needed functions/classes from webscraper.py and using them directly. Calling it this way totally defeats what you are trying to do.
Next, to your actual question: you have mixed up async and threading. I suggest you learn more about the difference, but essentially you want something like the following using Quart, which is an async version of Flask; it would suit your situation well.
from quart import Quart, request, jsonify
import asyncio
from webscraper import <Class>, <webscraper_func>  # Import what you need or
import webscraper                                  # whatever suits your needs

app = Quart(__name__)

def parse_request(user, password, url):
    # plain (non-async) function so it can run in an executor thread
    webscraper_func(user, password, url)
    return 'Success'

@app.route('/scraper/run', methods=['POST'])
async def init_scrape():
    user = request.args.get('user')
    password = request.args.get('password')
    url = request.args.get('url')
    # hand the blocking scraper to a thread pool; the event loop stays free
    asyncio.get_running_loop().run_in_executor(
        None,
        parse_request, user, password, url
    )
    return 'Success'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port='8080')
I have 2 functions.
The 1st function stores the received data in a list and the 2nd function writes the data into a CSV file.
I'm using Flask. Whenever the web service is called it stores the data and sends a response; as soon as it sends the response it should trigger the 2nd function.
My Code:
from flask import Flask, flash, request, redirect, url_for, session
import json
import pandas as pd

app = Flask(__name__)

arr = []

@app.route("/test", methods=['GET', 'POST'])
def check():
    arr.append(request.form['a'])
    arr.append(request.form['b'])
    res = {'Status': True}
    return json.dumps(res)

def trigger():
    df = pd.DataFrame({'x': arr})
    df.to_csv("docs/xyz.csv", index=False)
    return
Obviously the 2nd function is not called.
Is there a way to achieve this?
P.S.: My real-life problem is different; the trigger function is time-consuming and I don't want the user to wait for it to finish executing.
One solution would be to have a background thread that watches a queue. You put your CSV data in the queue and the background thread consumes it. You can start such a thread before the first request:
import threading
import time
from queue import Queue  # thread-safe queue with task_done()/join()

import pandas as pd

class CSVWriterThread(threading.Thread):
    def __init__(self, *args, **kwargs):
        threading.Thread.__init__(self, *args, **kwargs)
        self.input_queue = Queue()

    def send(self, item):
        self.input_queue.put(item)

    def close(self):
        self.input_queue.put(None)
        self.input_queue.join()

    def run(self):
        while True:
            csv_array = self.input_queue.get()
            if csv_array is None:
                break

            # Do something here ...
            df = pd.DataFrame({'x': csv_array})
            df.to_csv("docs/xyz.csv", index=False)

            self.input_queue.task_done()
            time.sleep(1)
        # Done
        self.input_queue.task_done()
        return

@app.before_first_request
def activate_job_monitor():
    thread = CSVWriterThread()
    app.csvwriter = thread
    thread.start()
And in your code put the message in the queue before returning:
@app.route("/test", methods=['GET', 'POST'])
def check():
    arr.append(request.form['a'])
    arr.append(request.form['b'])
    res = {'Status': True}
    app.csvwriter.send(arr)
    return json.dumps(res)
P.S.: My real-life problem is different; the trigger function is time-consuming and I don't want the user to wait for it to finish executing.
Consider using Celery, which is made for the very problem you're trying to solve. From the docs:
Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system.
I recommend you integrate Celery with your Flask app as described here. Your trigger method would then become a straightforward Celery task that you can execute without having to worry about long response times.
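A minimal hedged sketch of what that might look like, assuming a local Redis broker (the trigger logic itself is taken from the question):

import pandas as pd
from celery import Celery

celery_app = Celery('tasks', broker='redis://localhost:6379/0')

@celery_app.task
def trigger(arr):
    # the former trigger() now runs in a celery worker, not in the request
    df = pd.DataFrame({'x': arr})
    df.to_csv("docs/xyz.csv", index=False)

# inside the Flask view, instead of calling trigger(arr) directly:
# trigger.delay(arr)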
I'm actually working on another interesting case on my side where I pass the work off to a Python worker that sends the job to a Redis queue. There are some great blogs on using Redis with Flask; you basically need to ensure Redis is running (able to connect on port 6379).
The worker would look something like this:
import os

import redis
from rq import Worker, Queue, Connection

listen = ['default']

redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')

conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(list(map(Queue, listen)))
        worker.work()
In my example I have a function that queries a database for usage, and since it might be a lengthy process I pass it off to the worker (running as a separate script).
def post(self):
    data = Task.parser.parse_args()

    job = q.enqueue_call(
        func=migrate_usage, args=(my_args,),
        result_ttl=5000
    )
    print("Job ID is: {}".format(job.get_id()))
    job_key = job.get_id()
    print(str(Job.fetch(job_key, connection=conn).result))

    if job:
        return {"message": "Job : {} added to queue".format(job_key)}, 201
Credit due to the following article:
https://realpython.com/flask-by-example-implementing-a-redis-task-queue/#install-requirements
You can try using streaming. See the next example:
import time

from flask import Flask, Response

app = Flask(__name__)

@app.route('/')
def main():
    return '''<div>start</div>
    <script>
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/test', true);
        xhr.onreadystatechange = function(e) {
            var div = document.createElement('div');
            div.innerHTML = '' + this.readyState + ':' + this.responseText;
            document.body.appendChild(div);
        };
        xhr.send();
    </script>
    '''

@app.route('/test')
def test():
    def generate():
        app.logger.info('request started')
        for i in range(5):
            time.sleep(1)
            yield str(i)
        app.logger.info('request finished')
        yield ''

    return Response(generate(), mimetype='text/plain')

if __name__ == '__main__':
    app.run('0.0.0.0', 8080, True)
All the magic in this example is in the generator: you can start sending response data, then do some other work, and finally yield empty data to end your stream.
For details look at http://flask.pocoo.org/docs/patterns/streaming/.
You can defer route-specific actions with limited context by combining after_this_request and response.call_on_close. Note that the request and response contexts won't be available, but the route function's context remains available, so you'll need to copy any request/response data you need into local variables for deferred access.
I moved your array to a local variable to show how the function context is preserved. You could change your CSV write to an append so you're not pushing data endlessly into memory (a sketch of that follows the example below).
from flask import Flask, flash, request, redirect, url_for, session, after_this_request
import json
import pandas as pd

app = Flask(__name__)

@app.route("/test", methods=['GET', 'POST'])
def check():
    arr = []
    arr.append(request.form['a'])
    arr.append(request.form['b'])
    res = {'Status': True}

    @after_this_request
    def add_close_action(response):
        @response.call_on_close
        def process_after_request():
            df = pd.DataFrame({'x': arr})
            df.to_csv("docs/xyz.csv", index=False)
        return response

    return json.dumps(res)
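As noted above, a hedged variant of the deferred write that appends rather than overwrites, writing the header only when the file does not exist yet:

import os
import pandas as pd

def append_rows(arr, path="docs/xyz.csv"):
    df = pd.DataFrame({'x': arr})
    df.to_csv(path, mode='a', header=not os.path.exists(path), index=False)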