Python asynchronous request handling while calling subprocess

I have a REST API written in Flask and using the Tornado web server.
When a request comes in, I execute a long running shell command using tornado.process.Subprocess.
Is there a way to keep handling new requests while the shell command is running, and return the result after the shell command is complete?
I am thinking of something like this:
from tornado import gen
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from tornado.process import Subprocess
from tornado.wsgi import WSGIContainer

@gen.coroutine
def callSubProcess(commandString):
    p = Subprocess(commandString)
    yield p.wait_for_exit()
    raise gen.Return(someResult)

@app.route(url, methods=['POST'])
@gen.coroutine
def process():
    result = yield callSubProcess(commandString)
    raise gen.Return(result)

http_server = HTTPServer(WSGIContainer(app))
http_server.listen(5000)
IOLoop.instance().start()
But this doesn't seem to work: when I make a request I just get "'Future' object is not callable". What is the problem here?

Celery (https://github.com/celery/celery) is exactly what you need.
Celery lets you execute the shell command asynchronously, so your web process stays free to handle all incoming requests.
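For example, a minimal sketch of offloading the shell command to a Celery worker might look like this (the broker/backend URLs, the task name, and the module layout are assumptions, not part of the original answer):
# tasks.py -- hypothetical module
import subprocess
from celery import Celery

celery_app = Celery('tasks', broker='redis://localhost:6379/0', backend='redis://localhost:6379/0')

@celery_app.task
def run_shell_command(command_string):
    # run the long shell command in a Celery worker process,
    # keeping the web process free to serve requests
    return subprocess.check_output(command_string, shell=True).decode()

# in the Flask view you would enqueue it with:
#     async_result = run_shell_command.delay(commandString)
# and fetch the output later with async_result.get()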

To use coroutines, you must use tornado.web.Application, not tornado.wsgi.WSGIContainer. Tornado does not magically give WSGI frameworks like Flask the ability to handle asynchronous operations and coroutines.
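For illustration, a minimal sketch of a native Tornado handler that runs the subprocess through the IOLoop (the URL, command, and handler name are placeholders, not from the original answer):
from tornado import gen
from tornado.ioloop import IOLoop
from tornado.process import Subprocess
from tornado.web import Application, RequestHandler

class ProcessHandler(RequestHandler):
    @gen.coroutine
    def post(self):
        # Subprocess hooks into the IOLoop, so other requests are served while it runs
        proc = Subprocess(["ls", "-l"], stdout=Subprocess.STREAM)
        output = yield proc.stdout.read_until_close()
        yield proc.wait_for_exit()
        self.write({"output": output.decode()})

if __name__ == "__main__":
    app = Application([(r"/process", ProcessHandler)])
    app.listen(5000)
    IOLoop.current().start()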

Related

Python deploying infinite while loop to production and handling failures

I have a script that waits for tasks from a task queue and then runs them. Something like this minimal example:
import time
import redis

cache = redis.Redis(host='127.0.0.1', port=6379)

def main():
    while True:
        message = cache.blpop('QUEUE', timeout=0)
        work(message)

def work(message):
    print(f"beginning work: {message}")
    time.sleep(10)

if __name__ == "__main__":
    main()
I am not using a web server because the script does not need to answer HTTP requests. However, I've confused myself a bit about how to make this script robust against errors in production.
With a web server and gunicorn, gunicorn would handle forking a process for each request. If the request causes an error, the worker dies and the request fails, but the server continues to run.
How can I achieve this if I'm not running an HTTP server? I could fork a process to perform the "work" function, but the code performing the fork would still be application code.
Is it possible to deploy a non-HTTP-server script like mine using Gunicorn? Is there something else I should be using to handle forking processes?
Or is it reasonable to fork inside the application and deploy that to production?
What about this:
while True:
    try:
        message = cache.blpop('QUEUE', timeout=0)
        work(message)
    except Exception as e:
        print(e)
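If you want gunicorn-style isolation without an HTTP server, one option (a sketch, not part of the original answer) is to run each task in a short-lived child process, so a crash only kills that child and not the dispatcher loop:
import multiprocessing
import redis

cache = redis.Redis(host='127.0.0.1', port=6379)

def work(message):
    print(f"beginning work: {message}")

def main():
    while True:
        message = cache.blpop('QUEUE', timeout=0)
        # run the task in its own process: if it crashes (unhandled exception,
        # OOM kill, segfault in a C extension), only the child dies
        p = multiprocessing.Process(target=work, args=(message,))
        p.start()
        p.join()
        if p.exitcode != 0:
            print(f"task failed with exit code {p.exitcode}")

if __name__ == "__main__":
    main()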

Python: dynamically spawn multithreaded workers with flask-socketio and python-binance

Hello fellow developers,
I'm trying to create a small webapp that lets me monitor multiple Binance accounts from a dashboard and maybe, in the future, perform some small automatic trading actions.
My frontend is implemented with Vue + Quasar and my backend server is based on Python Flask for the REST API.
What I would like to do is start a background process dynamically when a specific endpoint of my server is called. Once this process is started on the server, I would like it to communicate via websocket with my Vue client.
Right now I can spawn the worker and create the websocket communication, but I can't figure out how to make all the threads in my worker work together. Let me get a bit more specific:
Once my worker is started, I try to create at least two threads. One is the infinite loop allowing me to automate some small actions, and the other is the flask-socketio server that handles the socket connections. Here is the code of that worker:
customWorker.py
import os
import time
import json
import threading

import eventlet
from flask import Flask
from flask_socketio import SocketIO, send, emit

# custom class allowing me to communicate with my mongoDB
from db_wrap import DbWrap

from binance.client import Client
from binance.exceptions import BinanceAPIException, BinanceWithdrawException, BinanceRequestException
from binance.websockets import BinanceSocketManager

def process_message(msg):
    print('got a websocket message')
    print(msg)

class customWorker:
    def __init__(self, workerId, sleepTime, dbWrap):
        self.workerId = workerId
        self.sleepTime = sleepTime
        self.socketio = None
        self.dbWrap = DbWrap()
        # this retrieves the worker configuration from the database
        self.config = json.loads(self.dbWrap.get_worker(workerId))
        keys = self.dbWrap.get_worker_keys(workerId)
        self.binanceClient = Client(keys['apiKey'], keys['apiSecret'])

    def handle_message(self, data):
        print('My PID is {} and I received {}'.format(os.getpid(), data))
        send(os.getpid())

    def init_websocket_server(self):
        app = Flask(__name__)
        socketio = SocketIO(app, async_mode='eventlet', logger=True, engineio_logger=True, cors_allowed_origins="*")
        eventlet.monkey_patch()
        socketio.on_event('message', self.handle_message)
        self.socketio = socketio
        self.app = app

    def launch_main_thread(self):
        while True:
            print('My PID is {} and workerId {}'.format(os.getpid(), self.workerId))
            if self.socketio is not None:
                info = self.binanceClient.get_account()
                self.socketio.emit('my_account', info, namespace='/')

    def launch_worker(self):
        self.init_websocket_server()
        self.socketio.start_background_task(self.launch_main_thread)
        self.socketio.run(self.app, host="127.0.0.1", port=8001, debug=True, use_reloader=False)
Once the REST endpoint is called, the worker is spawned by calling the birth_worker() method of a "Broker" object available within my server:
from multiprocessing import Process
from custom_worker import customWorker
#...
def create_worker(self, workerid, sleepTime, dbWrap):
    worker = customWorker(workerid, sleepTime, dbWrap)
    worker.launch_worker()

def birth_worker(self, workerid, sleepTime, dbWrap):
    p = Process(target=self.create_worker, args=(workerid, sleepTime, dbWrap))
    p.start()
So when this is done, the worker is launched in a separate process that successfully creates the threads and listens for socket connections. But my problem is that I can't use my binanceClient in my main thread. I think it uses threads, and the fact that I use eventlet, and in particular the monkey_patch() function, breaks it. When I try to call the binanceClient.get_account() method I get the error AttributeError: module 'select' has no attribute 'poll'.
I'm pretty sure it comes from monkey_patch, because if I call it in the __init__() method of my worker (before patching) it works and I can get the account info. So I guess there is a conflict here that I've been trying to resolve, unsuccessfully.
I've tried using only the thread mode for my socket.io app by passing async_mode='threading', but then my flask-socketio app won't start and listen for sockets, as the line self.socketio.run(self.app, host="127.0.0.1", port=8001, debug=True, use_reloader=False) blocks everything.
I'm pretty sure I have an architecture problem here and that I shouldn't start my app by calling socketio.run. I've been unable to start it with gunicorn, for example, because I need it to be dynamic and to call it from my Python scripts. I've been struggling to find the proper way to do this, and that's why I'm here today.
Could someone please give me a hint on how this is supposed to be achieved? How can I dynamically spawn a subprocess that manages a socket server thread, an infinite loop thread, and connections with binanceClient? I've been roaming Stack Overflow without success; every piece of advice is welcome, even an architecture reforge.
Here is my environment:
Manjaro Linux 21.0.1
pip-chill:
eventlet==0.30.2
flask-cors==3.0.10
flask-socketio==5.0.1
pillow==8.2.0
pymongo==3.11.3
python-binance==0.7.11
websockets==8.1

Return HTTP status code from Flask without "returning"

Context
I have a server called "server.py" that functions as a post-commit webhook from GitLab.
Within "server.py", there is a long-running process (~40 seconds)
SSCCE
#!/usr/bin/env python
import time
from flask import Flask, abort, jsonify

debug = True
app = Flask(__name__)

@app.route("/", methods=['POST'])
def compile_metadata():
    # the long running process...
    time.sleep(40)
    # end the long running process
    return jsonify({"success": True})

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8082, debug=debug, threaded=True)
Problem Statement
GitLab's webhooks expect return codes to be returned quickly. Since my webhook only returns after around 40 seconds, GitLab sends a retry, which kicks off my long-running process again, looping until GitLab gives up after too many attempts.
Question
Am I able to return a status code from Flask back to GitLab, but still run my long running process?
I've tried adding something like:
...
def compile_metadata():
    abort(200)
    # the long running process
    time.sleep(40)
but abort() only supports failure codes.
I've also tried using @after_this_request:
@app.route("/", methods=['POST'])
def webhook():
    @after_this_request
    def compile_metadata(response):
        # the long running process...
        print("Starting long running process...")
        time.sleep(40)
        print("Process ended!")
        # end the long running process
    return jsonify({"success": True})
Normally, Flask returns a status code only from Python's return statement, but I obviously cannot use that before the long-running process, as it would exit the function.
Note: I am not actually using time.sleep(40) in my code; it is there only for the SSCCE. The result is the same.
Have compile_metadata spawn a thread to handle the long-running task, and then return the result code immediately (i.e., without waiting for the thread to complete). Make sure to include some limit on the number of simultaneous threads that can be spawned (a sketch of one way to cap this follows the example below).
For a slightly more robust and scalable solution, consider some sort of message-queue-based solution like Celery.
For the record, a simple solution might look like:
import time
import threading
from flask import Flask, abort, jsonify

debug = True
app = Flask(__name__)

def long_running_task():
    print('start')
    time.sleep(40)
    print('finished')

@app.route("/", methods=['POST'])
def compile_metadata():
    # kick off the long running process in a background thread...
    t = threading.Thread(target=long_running_task)
    t.start()
    # ...and return immediately
    return jsonify({"success": True})

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8082, debug=debug, threaded=True)
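To put a limit on the number of simultaneous threads, as suggested above, one option (a sketch, not part of the original answer) is a fixed-size pool; the pool size of 4 is an arbitrary example:
import time
from concurrent.futures import ThreadPoolExecutor
from flask import Flask, jsonify

app = Flask(__name__)
# at most 4 background tasks run at once; further submissions queue up
executor = ThreadPoolExecutor(max_workers=4)

def long_running_task():
    print('start')
    time.sleep(40)
    print('finished')

@app.route("/", methods=['POST'])
def compile_metadata():
    executor.submit(long_running_task)  # returns immediately
    return jsonify({"success": True}), 202

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8082, threaded=True)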
I was able to achieve this by using multiprocessing.dummy.Pool. threading.Thread proved unhelpful, as Flask would still wait for the thread to finish (even with t.daemon = True).
I achieved the result of returning a status code before the long-running task like so:
#!/usr/bin/env python
import time
from flask import Flask, jsonify, request
from multiprocessing.dummy import Pool

debug = True
app = Flask(__name__)
pool = Pool(10)

def compile_metadata(data):
    print("Starting long running process...")
    print(data['user']['email'])
    time.sleep(5)
    print("Process ended!")

@app.route('/', methods=['POST'])
def webhook():
    data = request.json
    pool.apply_async(compile_metadata, [data])
    return jsonify({"success": True}), 202

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8082, debug=debug, threaded=True)
When you want to return a response from the server quickly and still do some time-consuming work, you should generally use some sort of shared storage like Redis to quickly store everything you need, then return your status code, so the request gets served very quickly.
Then have a separate server routinely work that job queue, doing the time-consuming work and removing each job from the queue once it is done, perhaps storing the final result in shared storage as well. This is the normal approach, and it scales very well: if your job queue grows too fast for a single server to keep up with, you can add more servers to work the shared queue.
But even if you don't need scalability, it's a very simple design to understand, implement, and debug. If you ever get an unexpected spike in request load, it just means that your separate server will probably be chugging away all night long. And you have peace of mind that if your servers shut down, you won't lose any unfinished work, because the jobs are safe in shared storage.
But if you have one server do everything, performing the long running tasks asynchronously in the background, I guess maybe just make sure that the background work is happening like this:
------------ Serving Responses
---- Background Work
And not like this:
---- ---- Serving Responses
---- Background Work
Otherwise it would be possible that if the server is performing some block of work in the background, it might be unresponsive to a new request, depending on how long that time consuming work takes (even under very little request load). But if the client times out and retries, I think you're still safe from performing double work. But you're not safe from losing unfinished jobs.
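As an illustration of the shared-queue approach described above, a rough sketch (the queue name, payload format, and the worker's do_time_consuming_work function are all hypothetical):
# web process: enqueue and return immediately
import json
import redis
from flask import Flask, jsonify, request

app = Flask(__name__)
queue = redis.Redis(host='127.0.0.1', port=6379)

@app.route('/', methods=['POST'])
def webhook():
    queue.rpush('jobs', json.dumps(request.json))  # cheap: just persist the payload
    return jsonify({"queued": True}), 202

# worker (run as its own process/server): pop jobs and do the slow work
def run_worker():
    while True:
        _key, raw = queue.blpop('jobs')            # blocks until a job arrives
        do_time_consuming_work(json.loads(raw))    # hypothetical work function
        # blpop already removed the job; store the final result back in Redis if needed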

redis get function returns None

I am working on a Flask application that interacts with Redis. The application is deployed on Heroku, with a Redis add-on.
While testing the interaction, I am not able to get the key-value pair that I just set; instead, I always get None as the return value. Here is an example:
import os

import redis
from flask import Flask

app = Flask(__name__)
redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
redis = redis.from_url(redis_url)

@app.route('/test')
def test():
    redis.set("test", "{test1: test}")
    print(redis.get("test"))  # prints None here
    return "what the freak"

if __name__ == "__main__":
    app.run(host='0.0.0.0')
As shown above, the test route prints None, which means the value was not set. I am confused: when I test the server in my local browser it works, and when I interact with Redis using the Heroku Python shell it works too.
testing with python shell:
heroku run python
from server import redis
redis.set('test', 'i am here') # return True
redis.get('test') # return i am here
I am confused now. How should I properly interact with redis using Flask?
redis-py by default constructs a ConnectionPool client, and this is probably what the from_url helper function is doing. While Redis itself is single threaded, the commands from the connection pool have no guaranteed order of execution. For a single client, construct a redis.StrictRedis client directly, or pass the param connection_pool=None. This is preferable for simple commands that are low in number, as there is less connection-management overhead. Alternatively, you can use a pipeline in the context of a connection pool to serialise a batch operation.
https://redis-py.readthedocs.io/en/latest/#redis.ConnectionPool
https://redis-py.readthedocs.io/en/latest/#redis.Redis.pipeline
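For example, a sketch of what this answer suggests, constructing the client directly (the connection details reuse the question's URL):
import os
import redis

redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
# build a single StrictRedis client straight from the URL instead of the
# module-level from_url helper used in the question
client = redis.StrictRedis.from_url(redis_url)

client.set("test", "{test1: test}")
print(client.get("test"))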
I did more experiments on this. It seems there is an issue related to delay; the modification below makes it work:
import time

@app.route('/test')
def test():
    redis.set("test", "{test1: test}")
    time.sleep(5)  # add the delay needed to let the set finish
    print(redis.get("test"))  # now prints "{test1: test}"
    return "now it works"
I read the Redis documentation, and Redis seems to be single threaded, so I am not sure why it would execute the get call before the set has finished. Someone with more experience, please post an explanation.

using twisted adbapi in ZSI soap

I'm new to Python and currently researching its viability as a SOAP server. I currently have a very rough application that uses the blocking MySQL API, but I would like to try twisted adbapi. I've successfully used twisted adbapi in regular Twisted code using reactors, but can't seem to make it work with the code below using the ZSI framework. It doesn't return anything from MySQL. Has anyone ever used twisted adbapi with ZSI?
import os
import sys

from dpac_server import *
from ZSI.twisted.wsgi import (SOAPApplication,
                              soapmethod,
                              SOAPHandlerChainFactory)
from twisted.enterprise import adbapi
import MySQLdb

def _soapmethod(op):
    op_request = GED("http://www.example.org/dpac/", op).pyclass
    op_response = GED("http://www.example.org/dpac/", op + "Response").pyclass
    return soapmethod(op_request.typecode, op_response.typecode, operation=op, soapaction=op)

class DPACServer(SOAPApplication):
    factory = SOAPHandlerChainFactory

    @_soapmethod('GetIPOperation')
    def soap_GetIPOperation(self, request, response, **kw):
        dbpool = adbapi.ConnectionPool("MySQLdb", '127.0.0.1', 'def_user', 'def_pwd', 'def_db', cp_reconnect=True)

        def _dbSPGeneric(txn, cmts):
            txn.execute("call def_db.getip(%s)", (cmts, ))
            return txn.fetchall()

        def dbSPGeneric(cmts):
            return dbpool.runInteraction(_dbSPGeneric, cmts)

        def returnResults(results):
            response.Result = results

        def showError(msg):
            response.Error = msg

        response.Result = ""
        response.Error = ""
        d = dbSPGeneric(request.Cmts)
        d.addCallbacks(returnResults, showError)
        return request, response

def main():
    from wsgiref.simple_server import make_server
    from ZSI.twisted.wsgi import WSGIApplication
    application = WSGIApplication()
    httpd = make_server('127.0.0.1', 8080, application)
    application['dpac'] = DPACServer()
    print("listening...")
    httpd.serve_forever()

if __name__ == '__main__':
    main()
The code you posted creates a new ConnectionPool per (some kind of) request and it never stops the pool. This means you'll eventually run out of resources and you won't be able to service any more requests. "Eventually" is probably after one or two or three requests.
If you never get any responses perhaps this isn't the problem you've encountered. It will be a problem at some point though.
On closer inspection, I wonder if this code even runs the Twisted reactor at all. On first read, I thought you were using some ZSI Twisted integration to run your server. Now I see that you're using wsgiref.simple_server. I am moderately confident that this won't work.
You're already using Twisted; use Twisted's WSGI server instead.
Beyond that, verify that ZSI executes your callbacks in the correct thread. The default for WSGI applications is to run in a non-reactor thread. Twisted APIs are not thread-safe, so if ZSI doesn't do something to correct for this, you'll have bugs introduced by using un-thread-safe APIs in threads.
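For reference, a sketch of serving the same WSGI application with Twisted's own WSGI container instead of wsgiref (this reuses the DPACServer and WSGIApplication names from the question and is not part of the original answer):
from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource
from ZSI.twisted.wsgi import WSGIApplication

application = WSGIApplication()
application['dpac'] = DPACServer()  # DPACServer as defined in the question

# WSGIResource runs the WSGI app in Twisted's thread pool while the reactor
# keeps running in the main thread
resource = WSGIResource(reactor, reactor.getThreadPool(), application)
reactor.listenTCP(8080, Site(resource))
print("listening...")
reactor.run()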
