Embed an http server in a long running python process - python

Using python 3. Lets say I am processing a loop around a large database query. Effectively my loop around the result set cursor can be a billion iterations.
I'd like to give a user an ability to call out to an http server embedded in the same process that would give some statistics on the progress of the query.
So far I have tried this with IOLoop using Tornado http server. The problem is that I have to basically transfer the control back to IOLoop on some number of rows to get the HTTP request to get serviced. That seems wasteful. Transferring that control has a price. Tornado would let me support multiple connections, but I don't actually care for that - one connection is fine.
What I would prefer would be to simply interrupt the loop, service the HTTP request and resume.

I guess this is probably open to too many possiblities...but using tornado I've just started an instance on a thread. Threading issues with python aside, it basically does what i want.

Related

Is there any point of using connection pools with a non multi threaded web server?

I have built a webserver written in python using the flask framework and psycopg2 and I have some questions about concurrent processing as it relates to dbs and the server itself. I am using gunicorn to start my app with
web:gunicorn app:app.
From my understanding a webserver such as this processes requests one at a time. So, if someone makes a get or post request to the server the server must finish responding to that get or post request before it can then move on to another request. If this is the case, then why would I need to make more than one connection cursor object? For example, if someone were making a post request that requires me to update something in the db, then my server can't process requests until I return out of that post end point anyway so that one connection object isn't bottle necking anything is it?
Ultimately, I am trying to allow my server to process a large number of requests simultaneously. In order to do so, I think I would first have to make multiple instances of my server, and THEN the connection pool comes into play right? I think in order to make multiple instances of my server (apologies if any terminologies are being used incorrectly here), I would do one of these things:
one way would be to: I would need to use multiple threads and if the machine my application is deployed on in the cloud has multiple cpu cores, then it can do this(?). However, I have read that python does not support "True multithreading" meaning a multi threaded program is not actually running concurrently, it's just switching back and forth between those threads really quickly, so would this really be any different than my set up currently?
the second way: use multiple gunicorn workers, or use multiple dynos. I think this route is the solution here, but I don't understand the theory on how to set this up at all. If I spawn additional gunicorn workers, what is happening behind the scenes? Would this still all run on my heroku application instance? Does the amount of cores I have access to on heroku affect this in anyway? Also, regardless of which way I pick, what would I be looking to change in the app.py code or would the change solely be inside the procfile?
Assuming I manage to set up multithreading or gunicorn workers, how would this then affect the connection pool set up/what should I do in regards to the connection pool? If anyone familiar with this can help provide some theory or explanations or some resources, I would greatly appreciate it. Thanks all!
From my experience with python here's what I've learned...
If you are using multiple threads or async then you need to use a pool or an async connection
If you have multiple processes and your code is strictly synchronous with no threads then a pool is not necessary. You can reuse a single connection for each process since they are not shared between each other.
Threads dont speed up execution speed in python usually since python will only ever run one thread at a time. Though they can help speed if threads need to block.
For web servers the true bottle neck is IO usually, meaning connecting to db or read file or w.e. Multiple process and making those process async gives the greatest performance. Starlette is a async version of Flask... kinda and is usually much faster when setup properly and using async libraries

How much data can a websocket consumer handle?

I've built a simple application using Django Channels where a Consumer receives data from an external service, the consumer than sends this data to some subscribers on another consumer.
I'm new to websockets and i had a little concern: the consumer is receiving a lot of data, in the order of 100 (or more) JSON records per second. At what point should i be worried about this service crashing or running into performance issues? Is there some sort of limit for what i'm doing?
there is not explicit limit, however it is worth nothing that for each instances (open connection) of the consumer you can only process one WS message at once.
So if you have a single websocket connection and are sending lots and lots of WS messages down that connection if the consumer does work on these (eg write them to the db) the queue of messages might fill up and you will get an error.
their are a few solutions to this,
Open multiple ws connections and share out the load
in your consumer before doing any work that will take time put it onto a work queue and have some background tasks (that you do not await) consume it.
For this second option it is probably a good idea to create this background queue in your on_connect method and handle shutting it down/waiting for it to flush everything in the on disconnect method.
--
if you are expecting a massive amount of data and don't want to fork out for a costly (high memory) VM you might be better of using a server that is no written in python.
My suggestion for a python developer would be https://docs.vapor.codes/4.0/websockets/ this is a Swift server framework, Swift linguistically if very close to TypeAnotated python so is easier than other high performance options to pick up for a python dev.

Can I have Python code to continue executing after I call Flask app.run?

I have just started with Python, although I have been programming in other languages over the past 30 years. I wanted to keep my first application simple, so I started out with a little home automation project hosted on a Raspberry Pi.
I got my code to work fine (controlling a valve, reading a flow sensor and showing some data on a display), but when I wanted to add some web interactivity it came to a sudden halt.
Most articles I have found on the subject suggest to use the Flask framework to compose dynamic web pages. I have tried, and understood, the basics of Flask, but I just can't get around the issue that Flask is blocking once I call the "app.run" function. The rest of my python code waits for Flask to return, which never happens. I.e. no more water flow measurement, valve motor steering or display updating.
So, my basic question would be: What tool should I use in order to serve a simple dynamic web page (with very low load, like 1 request / week), in parallel to my applications main tasks (GPIO/Pulse counting)? All this in the resource constrained environment of a Raspberry Pi (3).
If you still suggest Flask (because it seems very close to target), how should I arrange my code to keep handling the real-world events, such as mentioned above?
(This last part might be tough answering without seeing the actual code, but maybe it's possible answering it in a "generic" way? Or pointing to existing examples that I might have missed while searching.)
You're on the right track with multithreading. If your monitoring code runs in a loop, you could define a function like
def monitoring_loop():
while True:
# do the monitoring
Then, before you call app.run(), start a thread that runs that function:
import threading
from wherever import monitoring_loop
monitoring_thread = threading.Thread(target = monitoring_loop)
monitoring_thread.start()
# app.run() and whatever else you want to do
Don't join the thread - you want it to keep running in parallel to your Flask app. If you joined it, it would block the main execution thread until it finished, which would be never, since it's running a while True loop.
To communicate between the monitoring thread and the rest of the program, you could use a queue to pass messages in a thread-safe way between them.
The way I would probably handle this is to split your program into two distinct separately running programs.
One program handles the GPIO monitoring and communication, and the other program is your small Flask server. Since they run as separate processes, they won't block each other.
You can have the two processes communicate through a small database. The GIPO interface can periodically record flow measurements or other relevant data to a table in the database. It can also monitor another table in the database that might serve as a queue for requests.
Your Flask instance can query that same database to get the current statistics to return to the user, and can submit entries to the requests queue based on user input. (If the GIPO process updates that requests queue with the current status, the Flask process can report that back out.)
And as far as what kind of database to use on a little Raspberry Pi, consider sqlite3 which is a very small, lightweight file-based database well supported as a standard library in Python. (It doesn't require running a full "database server" process.)
Good luck with your project, it sounds like fun!
Hi i was trying the connection with dronekit_sitl and i got the same issue , after 30 seconds the connection was closed.To get rid of that , there are 2 solutions:
You use the decorator before_request:in this one you define a method that will handle the connection before each request
You use the decorator before_first_request : in this case the connection will be made once the first request will be called and the you can handle the object in the other route using a global variable
For more information https://pythonise.com/series/learning-flask/python-before-after-request

Do I need celery when I am using gevent?

I am working on a django web app that has functions (say for e.g. sync_files()) that take a long time to return. When I use gevent, my app does not block when sync_file() runs and other clients can connect and interact with the webapp just fine.
My goal is to have the webapp responsive to other clients and not block. I do not expect a zillion users to connect to my webapp (perhaps max 20 connections), and I do not want to set this up to become the next twitter. My app is running on a vps, so I need something light weight.
So in my case listed above, is it redundant to use celery when I am using gevent? Is there a specific advantage to using celery? I prefer not to use celery since it is yet another service that will be running on my machine.
edit: found out that celery can run the worker pool on gevent. I think I am a litle more unsure about the relationship between gevent & celery.
In short you do need a celery.
Even if you use gevent and have concurrency, the problem becomes request timeout. Lets say your task takes 10 minutes to run however the typical request timeout is about up to a minute. So what will happen if you trigger the task directly within a view is that the server will start processing it however after a minute a client (browser) will probably disconnect the connection since it will think the server is offline. As a result, your data can become corrupt since you cannot be guaranteed what will happen when connection will close. Celery solves this because it will trigger a background process which will process the task independent of the view. So the user will get the view response right away and at the same time the server will start processing the task. That is a correct pattern to handle any scenarios which require lots of processing.

Write to CouchDB in the background to improve timing/reliability

I have a high-throughput WSGI app that receives many POSTs per second from a remote server and writes documents to a couchdb server. Currently, the writes to couchdb can only happen between request and response. That means
if writes to couchdb are slow, then the client must sit waiting for a response while we write to the database, and
if there are problems writing to the couchdb server, there is no way to wait a few minutes and retry
Are there any existing solutions to queue up the writes to couch in the background (I'm looking at something like celery, for example), or will I need to roll my own solution?
celery could do that, yes, or any other task queue of a similar nature.
(Alternatively you could go one step lower and use a any message queue server, plus your own independent worker process that consumes from the message queue.)

Categories

Resources