I am trying to build an application that lets you schedule and execute multiple long-running jobs in a background thread using APScheduler. To control the job schedules and view the (live) output of the jobs, I want to send messages to the Flask application that runs in the same process (using Blinker) so I can stream them to a web client using Flask-SocketIO.
I came up with the following code, but it seems send_log_update() is not being called at all. Please note that I have not yet added Flask-SocketIO to this example. I first wanted to make sure I could communicate with the Flask application before further complicating things.
Is this a sensible way to go about things? And if so: am I doing something wrong here? I am not married to any of the solutions used specifically, but I do need something like APScheduler to schedule jobs at specific times (instead of just intervals, like in this example).
I have considered the possibility of also using websockets to provide the communication between the background job and the rest of the application, but that would be too unreliable. I have to process all output coming from the background process (to send to a log ingester) in addition to streaming it to a web client, and I would like to keep the background job as agnostic of any databases and logging frameworks as possible.
# pip install flask apscheduler sqlalchemy blinker
from time import sleep

from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.schedulers.background import BackgroundScheduler
from blinker import signal
from flask import Flask
from pytz import utc

# initialize Flask+SocketIO
app = Flask(__name__)

# signal to communicate between background thread and Flask
logsignal = signal('log')


# handle signals coming from background thread and emit them
# over the websocket
@logsignal.connect_via(app)
def send_log_update(sender, log_line, context, **extra):
    # eventually I want to send this to the web client using
    # Flask-SocketIO
    print('received signal: ' + log_line)


# Background job that will run in the scheduler thread
def background_job():
    print('starting background job')
    logsignal.send('starting job')
    sleep(3)
    logsignal.send('job done')


# configure APScheduler
jobstores = {
    'default': SQLAlchemyJobStore(url='sqlite:///scheduler.sqlite')
}
job_defaults = {
    'coalesce': False,
    'max_instances': 1
}

# create and start scheduler
scheduler = BackgroundScheduler(
    job_defaults=job_defaults, jobstores=jobstores, timezone=utc)

if __name__ == '__main__':
    scheduler.add_job(background_job, 'interval', seconds=5,
                      replace_existing=True, id='sample_job',
                      args=[])
    scheduler.start()
    app.run()
The answer was quite simple: I was using @logsignal.connect_via(app), which restricts the send_log_update() handler to signals originating from the Flask app object. After switching to the regular @logsignal.connect method, the handler got executed. I made it into a fully working example with a web interface that shows the log being streamed.
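For clarity, a minimal sketch of the difference between the two (plain blinker, no Flask required; the string senders here are just stand-ins):

from blinker import signal

logsignal = signal('log')


@logsignal.connect_via('app')   # only fires when the sender is 'app'
def restricted(sender, **extra):
    print('restricted handler:', extra)


@logsignal.connect              # fires for any sender
def unrestricted(sender, **extra):
    print('unrestricted handler:', sender, extra)


logsignal.send('background thread', line='hello')  # only unrestricted runs
logsignal.send('app', line='world')                # both handlers run

The full working example follows.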
# Runs a scheduled job in a background thread using APScheduler and streams
# its output to a web client using websockets. Communication between the Flask
# thread and the APScheduler thread is done through (blinker) signals.
#
# Install dependencies (preferably in your virtualenv)
#   pip install flask apscheduler sqlalchemy blinker flask-socketio simple-websocket
# and then run with:
#   python this_script.py
from time import sleep

from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.schedulers.background import BackgroundScheduler
from blinker import signal
from flask import Flask
from flask_socketio import SocketIO
from pytz import utc

# initialize Flask+SocketIO
app = Flask(__name__)
socketio = SocketIO(app)

# signal to communicate between background thread and Flask
logsignal = signal('log')


# handle signals coming from background thread and emit them
# over the websocket
@logsignal.connect
def send_log_update(log_line):
    socketio.emit('logUpdate', log_line)


# Background job that will run in the scheduler thread
def background_job():
    logsignal.send('starting job')
    sleep(3)
    logsignal.send('job done')


# configure APScheduler
jobstores = {
    'default': SQLAlchemyJobStore(url='sqlite:///scheduler.sqlite')
}
job_defaults = {
    'coalesce': False,
    'max_instances': 1
}

# create and start scheduler
scheduler = BackgroundScheduler(
    job_defaults=job_defaults, jobstores=jobstores, timezone=utc)


# simple websocket client for testing purposes
@app.route("/")
def info():
    return """
    <html>
      <head>
        <script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/4.0.1/socket.io.js" integrity="sha512-q/dWJ3kcmjBLU4Qc47E4A9kTB4m3wuTY7vkFJDTZKjTs8jhyGQnaUrxa0Ytd0ssMZhbNua9hE+E7Qv1j+DyZwA==" crossorigin="anonymous"></script>
      </head>
      <body>
        <h1>Streaming log</h1>
        <pre id="log"></pre>
        <script type="text/javascript" charset="utf-8">
          var socket = io();
          socket.on('logUpdate', function(msg) {
            let log = document.getElementById('log');
            log.append(msg + '\\n');
          });
        </script>
      </body>
    </html>
    """


if __name__ == '__main__':
    scheduler.add_job(background_job, 'interval', seconds=5,
                      replace_existing=True, id='sample_job',
                      args=[])
    scheduler.start()
    socketio.run(app)
I'm currently following a Udemy course that has me using Flask and socketio with a neural network model to drive a simulated car. While explaining the basics of how Flask and socketio work, the instructor had us write this code:
import socketio
import eventlet
from flask import Flask

sio = socketio.Server()
app = Flask(__name__)


@sio.on('connect')
def connect(sid, environ):
    print('Connected')


if __name__ == "__main__":
    app = socketio.Middleware(sio, app)
    eventlet.wsgi.server(eventlet.listen(('', 4567)), app)
This is supposed to print "Connected" to the console when we connect to the server. Now, I get this message when I run it, so I'm pretty sure I'm connected:
(7532) accepted ('127.0.0.1', 49374)
But it's refusing to print "Connected" when I connect like it's supposed to, no matter what I try.
EDIT:
So, I'm still not sure what the root cause of this is, but I found out how to fix it:
conda install python-engineio==3.13.2
conda install python-socketio==4.6.1
(Presumably the client used in the course speaks the older Socket.IO protocol, which python-socketio 4.x still understands but 5.x does not.)
You might need to run Anaconda as an administrator. If so, search for "Anaconda Powershell Prompt" and run it as an admin.
socketio uses a special protocol, so you need a special client to use it. At a minimum:
import socketio
sio = socketio.Client()
con = sio.connect('http://0.0.0.0:4567')
sio.wait()
It will not work with a web browser; with a web browser you will only see accepted. The browser has to load a web page which uses the special JavaScript module to speak socketio.
You can find more details in the socketio documentation: Client
EDIT:
And here is a server which you can test with a web browser.
When you open http://0.0.0.0:4567 in the browser, index() sends the browser HTML with JavaScript code which loads the special library and uses socketio to send its own event. You should then see Connected and my event. When you close the page or the browser, you should see Disconnected.
It is based on an example from the documentation for flask-socketio.
import socketio
import eventlet
from flask import Flask

sio = socketio.Server()
app = Flask(__name__)


@app.route('/')
def index():
    return """
    Hello World!
    <script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/4.0.1/socket.io.js" integrity="sha512-q/dWJ3kcmjBLU4Qc47E4A9kTB4m3wuTY7vkFJDTZKjTs8jhyGQnaUrxa0Ytd0ssMZhbNua9hE+E7Qv1j+DyZwA==" crossorigin="anonymous"></script>
    <script type="text/javascript" charset="utf-8">
        var socket = io();
        socket.on('connect', function() {
            socket.emit('my event', {data: 'Im connected!'});
        });
    </script>
    """


@sio.on('connect')
def connect(sid, environ):
    print('Connected')


@sio.on('disconnect')
def disconnect(sid):  # without `environ`
    print('Disconnected')


@sio.on('my event')
def my_event(sid, data):  # custom events receive the message payload, not `environ`
    print('my event')


if __name__ == "__main__":
    app = socketio.Middleware(sio, app)
    eventlet.wsgi.server(eventlet.listen(('', 4567)), app)
Hello fellow developers,
I'm trying to create a small webapp that would allow me to monitor multiple Binance accounts from a dashboard, and maybe in the future perform some small automatic trading actions.
My frontend is implemented with Vue+Quasar and my backend server is based on Python Flask for the REST API.
What I would like to do is be able to start a background process dynamically when a specific endpoint of my server is called. Once this process is started on the server, I would like it to communicate via websocket with my Vue client.
Right now I can spawn the worker and create the websocket communication, but somehow I can't figure out how to make all the threads in my worker work together. Let me get a bit more specific:
Once my worker is started, I try to create at least two threads. One is the infinite loop allowing me to automate some small actions, and the other one is the flask-socketio server that will handle the socket connections. Here is the code of that worker:
customWorker.py
import json
import os
import threading
import time

import eventlet
from flask import Flask
from flask_socketio import SocketIO, send, emit

# custom class allowing me to communicate with my MongoDB
from db_wrap import DbWrap

from binance.client import Client
from binance.exceptions import BinanceAPIException, BinanceWithdrawException, BinanceRequestException
from binance.websockets import BinanceSocketManager


def process_message(msg):
    print('got a websocket message')
    print(msg)


class customWorker:
    def __init__(self, workerId, sleepTime, dbWrap):
        self.workerId = workerId
        self.sleepTime = sleepTime
        self.socketio = None
        self.dbWrap = DbWrap()
        # this retrieves worker configuration from database
        self.config = json.loads(self.dbWrap.get_worker(workerId))
        keys = self.dbWrap.get_worker_keys(workerId)
        self.binanceClient = Client(keys['apiKey'], keys['apiSecret'])

    def handle_message(self, data):
        print('My PID is {} and I received {}'.format(os.getpid(), data))
        send(os.getpid())

    def init_websocket_server(self):
        app = Flask(__name__)
        socketio = SocketIO(app, async_mode='eventlet', logger=True, engineio_logger=True, cors_allowed_origins="*")
        eventlet.monkey_patch()
        socketio.on_event('message', self.handle_message)
        self.socketio = socketio
        self.app = app

    def launch_main_thread(self):
        while True:
            print('My PID is {} and workerId {}'
                  .format(os.getpid(), self.workerId))
            if self.socketio is not None:
                info = self.binanceClient.get_account()
                self.socketio.emit('my_account', info, namespace='/')

    def launch_worker(self):
        self.init_websocket_server()
        self.socketio.start_background_task(self.launch_main_thread)
        self.socketio.run(self.app, host="127.0.0.1", port=8001, debug=True, use_reloader=False)
Once the REST endpoint is called, the worker is spawned by calling the birth_worker() method of a "Broker" object available within my server:
from multiprocessing import Process

from custom_worker import customWorker
# ...

def create_worker(self, workerid, sleepTime, dbWrap):
    worker = customWorker(workerid, sleepTime, dbWrap)
    worker.launch_worker()

def birth_worker(self, workerid, sleepTime, dbWrap):
    p = Process(target=self.create_worker, args=(workerid, sleepTime, dbWrap))
    p.start()
So when this is done, the worker is launched in a separate process that successfully creates threads and listens for socket connections. But my problem is that I can't use my binanceClient in my main thread. I think it uses threads internally, and the fact that I use eventlet, and in particular the monkey_patch() function, breaks it. When I try to call the binanceClient.get_account() method I get the error AttributeError: module 'select' has no attribute 'poll'.
I'm pretty sure it comes from monkey_patch, because if I call it in the __init__() method of my worker (before patching) it works and I can get the account info. So I guess there is a conflict here that I've been trying to resolve, unsuccessfully.
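For reference, eventlet's documentation recommends calling monkey_patch() as early as possible, before the modules it replaces (socket, select, threading, ...) are imported, rather than inside init_websocket_server(). Whether that resolves the conflict with python-binance's internal threads is uncertain, but a sketch of what the top of customWorker.py would look like under that assumption:

# patch first, before anything else pulls in socket/select/threading
import eventlet
eventlet.monkey_patch()

import json
import os

from flask import Flask
from flask_socketio import SocketIO

# imports that use the network/threads now see the patched modules
from binance.client import Client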
I've tried using only the thread mode for my socket.io app by setting async_mode='threading', but then my flask-socketio app won't start and listen for sockets, as the line self.socketio.run(self.app, host="127.0.0.1", port=8001, debug=True, use_reloader=False) blocks everything.
I'm pretty sure I have an architecture problem here and that I shouldn't start my app by launching socketio.run. I've been unable to start it with gunicorn, for example, because I need it to be dynamic and callable from my Python scripts. I've been struggling to find the proper way to do this, and that's why I'm here today.
Could someone please give me a hint on how this is supposed to be achieved? How can I dynamically spawn a subprocess that manages a socket server thread, an infinite loop thread, and connections with binanceClient? I've been roaming Stack Overflow without success; any advice is welcome, even an architecture overhaul.
Here is my environment:
Manjaro Linux 21.0.1
pip-chill:
eventlet==0.30.2
flask-cors==3.0.10
flask-socketio==5.0.1
pillow==8.2.0
pymongo==3.11.3
python-binance==0.7.11
websockets==8.1
I want to send a Socket.IO event from an asynchronous class in my Flask project. But when I send it, it takes an enormous amount of time before it arrives at the JavaScript client. I am sending it as:
socket_io.emit("event_name", {"foo": "bar"}, broadcast=True, namespace="/com")
App with socketio is initialised as:
app = Flask(__name__, template_folder="templates", static_folder="static", static_url_path="/static")
socketio = SocketIO(app=app, cookie="cookie_name", async_mode=None)
And it is started by this command:
socketio.run(app=app, host="0.0.0.0", port=5000, log_output=False)
My Python library versions are:
# Python == 3.8.5
Flask==1.1.2
Flask-SocketIO==5.0.1
python-engineio==4.0.0
python-socketio==5.0.4
gevent==20.12.1
gevent-websocket==0.10.1
JavaScript SocketIO: v3.0.4
When I send an event normally with the emit command in a socket_io handler, it works fine. But when I want to send the same event from an external process, it takes a long time.
Does anyone know how I can solve this problem?
Thank you
The problem was with monkey patching and 32-bit Python 3. I had to install 64-bit Python 3 and then add this as the first line:
import gevent.monkey; gevent.monkey.patch_all()
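Placement matters here: the patch has to run before Flask, Flask-SocketIO, or anything else that imports socket/select is loaded. A minimal sketch of the top of the main module, reusing the setup from the question:

# must stay the very first line, before any other import
import gevent.monkey; gevent.monkey.patch_all()

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__, template_folder="templates", static_folder="static", static_url_path="/static")
socketio = SocketIO(app=app, cookie="cookie_name", async_mode=None)  # async_mode=None autodetects gevent

if __name__ == "__main__":
    socketio.run(app=app, host="0.0.0.0", port=5000, log_output=False)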
My end goal is to have a button on my website (a dashboard created in React) which allows me to run a Selenium test (written in Python).
I am using socket.io in hopes that I can stream test results live back to dashboard, but I seem to be hitting some sort of time limit at about 29 seconds.
To debug I made this test case, which completes on the server side, but my connection is severed before emit('test_progress', 29) happens.
from flask import Flask, render_template
from flask_socketio import SocketIO, join_room, emit
import time

app = Flask(__name__)
app.config['SECRET_KEY'] = 'secret!'
socketio = SocketIO(app)


@socketio.on('run_test')
def handle_run_test(run_test):
    print('received run_test')
    for x in range(1, 30):
        time.sleep(1)
        print(x)
        emit('test_progress', x)
    time.sleep(1)
    print('TEST FINISHED')
    emit('test_finished', {'data': None})


if __name__ == '__main__':
    socketio.run(app, debug=True)
(Some of) my JavaScript
import settings from './settings.js';
import io from 'socket.io-client';
const socket = io(settings.socketio);
socket.on('test_progress', function(data){
    console.log(data);
});
My console in browser
...
App.js:154 27
App.js:154 28
polling-xhr.js:269 POST http://127.0.0.1:5000/socket.io/?EIO=3&transport=polling&t=Mbl7mEI&sid=72903901182d49eba52a4a813772eb06 400 (BAD REQUEST)
...
(reconnects)
Eventually, I'll have a test running that could take 40-60 seconds instead of the arbitrary time.sleep(1) calls, so I would like the function to be able to use more than 29 seconds. Am I going about this wrong or is there a way to change this time limit?
My solution was to use threading, as described in this question.
I also needed to use @copy_current_request_context so that the thread could communicate with the client.
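A sketch of how those two pieces fit together (assuming Flask-SocketIO's default threading mode; with eventlet or gevent you would use socketio.start_background_task instead):

import time
from threading import Thread

from flask import Flask, copy_current_request_context
from flask_socketio import SocketIO, emit

app = Flask(__name__)
app.config['SECRET_KEY'] = 'secret!'
socketio = SocketIO(app)


@socketio.on('run_test')
def handle_run_test(run_test):
    # copy the request context so emit() inside the thread still knows
    # which client and namespace it belongs to
    @copy_current_request_context
    def long_test():
        for x in range(1, 30):
            time.sleep(1)
            emit('test_progress', x)
        emit('test_finished', {'data': None})

    # the handler returns immediately, so the Socket.IO heartbeat keeps
    # running and the connection is no longer severed at ~29 seconds
    Thread(target=long_test).start()


if __name__ == '__main__':
    socketio.run(app, debug=True)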
Context
I have a server called "server.py" that functions as a post-commit webhook from GitLab.
Within "server.py", there is a long-running process (~40 seconds)
SSCCE
#!/usr/bin/env python

import time
from flask import Flask, abort, jsonify

debug = True

app = Flask(__name__)


@app.route("/", methods=['POST'])
def compile_metadata():
    # the long running process...
    time.sleep(40)
    # end the long running process
    return jsonify({"success": True})


if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8082, debug=debug, threaded=True)
Problem Statement
GitLab's webhooks expect a return code to come back quickly. Since my webhook returns only after around 40 seconds, GitLab retries it, triggering my long-running process in a loop until GitLab gives up after too many tries.
Question
Am I able to return a status code from Flask back to GitLab, but still run my long running process?
I've tried adding something like:
...
def compile_metadata():
    abort(200)
    # the long running process
    time.sleep(40)
but abort() only supports failure codes.
I've also tried using @after_this_request:
#app.route("/", methods=['POST'])
def webhook():
#after_this_request
def compile_metadata(response):
# the long running process...
print("Starting long running process...")
time.sleep(40)
print("Process ended!")
# end the long running process
return jsonify({"success": True})
Normally, Flask returns a status code only from Python's return statement, but I obviously cannot use that before the long-running process, as returning exits the function.
Note: I am not actually using time.sleep(40) in my code; it is there only as a stand-in for the SSCCE, and it produces the same result.
Have compile_metadata spawn a thread to handle the long-running task, and then return the result code immediately (i.e., without waiting for the thread to complete). Make sure to include some limit on the number of simultaneous threads that can be spawned.
For a slightly more robust and scalable solution, consider some sort of message-queue-based solution like Celery.
For the record, a simple solution might look like:
import time
import threading
from flask import Flask, abort, jsonify

debug = True

app = Flask(__name__)


def long_running_task():
    print('start')
    time.sleep(40)
    print('finished')


@app.route("/", methods=['POST'])
def compile_metadata():
    # kick off the long running process...
    t = threading.Thread(target=long_running_task)
    t.start()
    # ...and return immediately, without waiting for it
    return jsonify({"success": True})


if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8082, debug=debug, threaded=True)
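The answer's suggestion to limit the number of simultaneous threads is not shown in that snippet; one way to do it (a sketch, using the standard library's concurrent.futures, which queues submissions once all workers are busy):

import time
from concurrent.futures import ThreadPoolExecutor

from flask import Flask, jsonify

app = Flask(__name__)
executor = ThreadPoolExecutor(max_workers=4)  # at most 4 tasks run at once


def long_running_task():
    print('start')
    time.sleep(40)
    print('finished')


@app.route("/", methods=['POST'])
def compile_metadata():
    executor.submit(long_running_task)  # queued if all 4 workers are busy
    return jsonify({"success": True}), 202


if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8082, threaded=True)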
I was able to achieve this by using multiprocessing.dummy.Pool. I had first tried threading.Thread, but it proved unhelpful, as Flask would still wait for the thread to finish (even with t.daemon = True).
I achieved the result of returning a status code before the long-running task like so:
#!/usr/bin/env python

import time
from flask import Flask, jsonify, request
from multiprocessing.dummy import Pool

debug = True

app = Flask(__name__)
pool = Pool(10)


def compile_metadata(data):
    print("Starting long running process...")
    print(data['user']['email'])
    time.sleep(5)
    print("Process ended!")


@app.route('/', methods=['POST'])
def webhook():
    data = request.json
    pool.apply_async(compile_metadata, [data])
    return jsonify({"success": True}), 202


if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8082, debug=debug, threaded=True)
When you want to return a response from the server quickly, and still do some time consuming work, generally you should use some sort of shared storage like Redis to quickly store all the stuff you need, then return your status code. So the request gets served very quickly.
And have a separate server routinely work through that job queue to do the time-consuming work, removing each job from the queue once the work is done, and perhaps storing the final result in shared storage as well. This is the normal approach, and it scales very well. For example, if your job queue grows too fast for a single server to keep up with, you can add more servers to work that shared queue.
But even if you don't need scalability, it's a very simple design to understand, implement, and debug. If you ever get an unexpected spike in request load, it just means that your separate server will probably be chugging away all night long. And you have peace of mind that if your servers shut down, you won't lose any unfinished work because they're safe in the shared storage.
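A sketch of what the fast half of that might look like with Redis and the redis-py package (the queue name 'jobs' and the worker loop are placeholders):

import json

import redis
from flask import Flask, jsonify, request

app = Flask(__name__)
r = redis.Redis()  # assumes a Redis instance on localhost


@app.route('/', methods=['POST'])
def webhook():
    # store the job quickly, then return immediately
    r.lpush('jobs', json.dumps(request.json))
    return jsonify({"success": True}), 202


# In a separate worker process/server:
#     while True:
#         _, raw = r.brpop('jobs')  # blocks until a job is available
#         do_time_consuming_work(json.loads(raw))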
But if you have one server doing everything, performing the long-running tasks asynchronously in the background, I guess maybe just make sure that the background work is happening like this:
------------ Serving Responses
---- Background Work
And not like this:
---- ---- Serving Responses
---- Background Work
Otherwise, if the server performs some block of work in the background, it might be unresponsive to a new request, depending on how long that time-consuming work takes (even under very little request load). But if the client times out and retries, I think you're still safe from performing double work. You're not safe from losing unfinished jobs, though.