Python Flask website from endless data loop? - python

I've built some websites using Flask before, including one which used websockets, but this time I'm not sure how to begin.
I currently have an endless loop in Python which gets sensor data from a ZeroMQ socket. It roughly looks like this:
import zeromq
socket = zeromq.create_socket()
while True:
    data_dict = socket.receive_json()
    print data_dict  # {'temperature': 34.6, 'speed': 12.8, etc.}
I now want to create a dashboard showing the incoming sensor data in real time in some nice charts. Since it's in Python and I'm familiar with Flask and websockets I would like to use that.
The websites I built before were basic request/reply based ones though. How on earth would I create a Flask website from a continuous loop?

The web page will only be interested in the latest value within an interval that is reasonable from the user's point of view, say 3 seconds, so you can retrieve values in the background using a separate thread.
This is an example of how to use the threading module to update a latest value in the background:
import threading
import random
import time
_last_value = None
def get_last_value():
    return _last_value

def retrieve_value():
    global _last_value
    while True:
        _last_value = random.randint(1, 100)
        time.sleep(3)

threading.Thread(target=retrieve_value, daemon=True).start()

for i in range(20):
    print(i, get_last_value())
    time.sleep(1)
In your case, it would be something like:
import threading
import zeromq
_socket = zeromq.create_socket()
_last_data_dict = {}
def get_latest_data():
    return _last_data_dict

def retrieve_value():
    global _last_data_dict
    while True:
        _last_data_dict = _socket.receive_json()

threading.Thread(target=retrieve_value, daemon=True).start()
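To actually serve the dashboard, a Flask view can then hand out the latest reading on request. This is a minimal sketch (not part of the original answer; the route name is just an example) that assumes the background thread above has already been started:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/data')
def latest_data():
    # Return whatever the background thread stored most recently.
    return jsonify(get_latest_data())

if __name__ == '__main__':
    app.run()

The dashboard page can poll /data every few seconds with JavaScript, or you can switch to websockets later; the background-thread pattern stays the same.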

Basically, what you need is some form of storage that two processes can access at the same time.
If you don't want to leave the comfort of a single Python executable, you should look into threading:
https://docs.python.org/2/library/thread.html
Otherwise, you could write two different Python scripts (one for the sensor readout, one for Flask), let one write into a file and the other read from it (or use a pipe on Linux; I don't know what Windows offers), run both processes at the same time, and let your OS handle the "threading".
The second approach has the advantage that your OS takes care of performance, but you lose a lot of freedom around locking and reading the file. There may be some weird behavior if your server reads at the instant your sensor script writes, but I have done similar things without problems, and I dimly recall that an OS should keep file states consistent whenever a file is read or written.
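A rough sketch of that second approach (the file name and script names below are placeholders, not part of the original answer): the sensor script dumps the latest reading to a file, and the Flask process reads it back. Writing to a temporary file and renaming it keeps the reader from ever seeing a half-written file.

# sensor_writer.py (runs the ZeroMQ loop)
import json, os, tempfile

def write_latest(data_dict, path="latest.json"):
    fd, tmp = tempfile.mkstemp(dir=".")
    with os.fdopen(fd, "w") as f:
        json.dump(data_dict, f)
    os.replace(tmp, path)  # atomic rename: readers see the old or new file, never a partial one

# flask_reader.py (used inside a Flask view)
import json

def read_latest(path="latest.json"):
    with open(path) as f:
        return json.load(f)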

Related

speeding up urllib.urlretrieve

I am downloading pictures from the internet, and as it turns out, I need to download lots of pictures. I am using a version of the following code fragment (actually looping through the links I intend to download and downloading the pictures):
import urllib
urllib.urlretrieve(link, filename)
I am downloading roughly 1000 pictures every 15 minutes, which is awfully slow based on the number of pictures I need to download.
For efficiency, I set a timeout of 5 seconds (still, many downloads last much longer):
import socket
socket.setdefaulttimeout(5)
Besides running a job on a computer cluster to parallelize downloads, is there a way to make the picture download faster / more efficient?
My code above was very naive, as I did not take advantage of multi-threading. It obviously takes time for URL requests to be answered, but there is no reason why the computer cannot make further requests while the proxy server responds.
With the following adjustments, you can improve efficiency by 10x, and there are further ways to improve efficiency with packages such as scrapy.
To add multi-threading, do something like the following, using the multiprocessing package:
1) encapsulate the url retrieving in a function:
import urllib.request

def geturl(link, i):
    try:
        urllib.request.urlretrieve(link, str(i) + ".jpg")
    except:
        pass
2) then create a collection with all urls as well as names you want for the downloaded pictures:
urls = [url1,url2,url3,urln]
names = [i for i in range(0,len(urls))]
3) Import the Pool class from the multiprocessing package and create an object of that class (in a real program you would obviously put all imports at the top of your code):
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(100)
then use the pool.starmap() method, passing it the function and the function's arguments:
results = pool.starmap(geturl, zip(urls, names))
Note: pool.starmap() is only available in Python 3.
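Putting the steps above together, a minimal end-to-end sketch could look like this (the URLs are placeholders for your own links):

import urllib.request
from multiprocessing.dummy import Pool as ThreadPool  # thread-based Pool

def geturl(link, i):
    try:
        urllib.request.urlretrieve(link, str(i) + ".jpg")
    except Exception:
        pass

urls = ["http://example.com/a.jpg", "http://example.com/b.jpg"]  # placeholder links
names = range(len(urls))

pool = ThreadPool(100)                  # 100 worker threads
pool.starmap(geturl, zip(urls, names))  # download in parallel
pool.close()
pool.join()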
When a program enters I/O wait, the execution is paused so that the kernel can perform the low-level operations associated with the I/O request (this is called a context switch) and is not resumed until the I/O operation is completed.
Context switching is quite a heavy operation. It requires us to save the state of our program (losing any sort of caching we had at the CPU level) and give up the use of the CPU. Later, when we are allowed to run again, we must spend time reinitializing our program on the motherboard and getting ready to resume (of course, all this happens behind the scenes).
With concurrency, on the other hand, we typically have a thing called an “event loop” running that manages what gets to run in our program, and when. In essence, an event loop is simply a list of functions that need to be run. The function at the top of the list gets run, then the next, etc.
The following shows a simple example of an event loop:
from Queue import Queue
from functools import partial

eventloop = None

class EventLoop(Queue):
    def start(self):
        while True:
            function = self.get()
            function()

def do_hello():
    global eventloop
    print "Hello"
    eventloop.put(do_world)

def do_world():
    global eventloop
    print "world"
    eventloop.put(do_hello)

if __name__ == "__main__":
    eventloop = EventLoop()
    eventloop.put(do_hello)
    eventloop.start()
If the above seems like something you may use, and you'd also like to see how gevent, tornado, and asyncio can help with your issue, then head to your (university) library, check out High Performance Python by Micha Gorelick and Ian Ozsvald, and read pp. 181-202.
Note: the code and text above are from the book mentioned.

Python - Executing Functions During a raw_input() Prompt [duplicate]

I'm trying to write a simple Python IRC client. So far I can read data, and I can send data back to the server if it's automated. I'm reading the data in a while True loop, which means that I cannot enter text while reading data at the same time. How can I enter text in the console, which only gets sent when I press Enter, while an infinite loop keeps running at the same time?
Basic code structure:
while True:
    read data
    # here is where I want to write data, but only if it contains '\r'
Another way to do it involves threads.
import threading
# define a thread which takes input
class InputThread(threading.Thread):
    def __init__(self):
        super(InputThread, self).__init__()
        self.daemon = True
        self.last_user_input = None

    def run(self):
        while True:
            self.last_user_input = input('input something: ')
            # do something based on the user input here
            # alternatively, let main do something with
            # self.last_user_input

# main
it = InputThread()
it.start()
while True:
    # do something
    # do something with it.last_user_input if you feel like it
    pass
What you need is an event loop of some kind.
In Python you have a few options to do that, pick one you like:
Twisted https://twistedmatrix.com/trac/
Asyncio https://docs.python.org/3/library/asyncio.html
gevent http://www.gevent.org/
and so on; there are plenty of frameworks for this. You could also use any of the GUI frameworks, like tkinter or PyQt, to get a main event loop.
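As an illustration of the event-loop option, here is a small asyncio sketch (my own addition, not from any of the answers; the server address is a placeholder): one task reads from the network while the blocking input() call is pushed into a worker thread, so neither blocks the other.

import asyncio

async def handle_network(reader):
    while True:
        line = await reader.readline()                    # incoming server data
        print(line.decode(errors="replace").rstrip())

async def handle_console(writer):
    loop = asyncio.get_running_loop()
    while True:
        # input() blocks, so run it in the default executor (a worker thread)
        text = await loop.run_in_executor(None, input)
        writer.write((text + "\r\n").encode())
        await writer.drain()

async def main():
    reader, writer = await asyncio.open_connection("irc.example.net", 6667)  # placeholder server
    await asyncio.gather(handle_network(reader), handle_console(writer))

asyncio.run(main())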
As the comments above have said, you can use threads and a few queues to handle this, or an event-based loop, or coroutines, or a bunch of other architectures. Depending on your target platforms, one or the other might be best. For example, on Windows the console API is totally different from Unix ptys. Especially if you later need things like colour output, you might want to ask more specific questions.
You can use an async library (see schlenk's answer) or use https://docs.python.org/2/library/select.html
This module provides access to the select() and poll() functions
available in most operating systems, epoll() available on Linux 2.5+
and kqueue() available on most BSD. Note that on Windows, it only
works for sockets; on other operating systems, it also works for other
file types (in particular, on Unix, it works on pipes). It cannot be
used on regular files to determine whether a file has grown since it
was last read.
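For the select() route, a minimal sketch might look like this (my own example, not from the quoted docs; it assumes a Unix-like OS, since select() on stdin does not work on Windows, and the server address is a placeholder):

import select
import socket
import sys

irc = socket.socket()
irc.connect(("irc.example.net", 6667))   # placeholder server

while True:
    # Block until either the socket or the keyboard has something to read.
    readable, _, _ = select.select([irc, sys.stdin], [], [])
    for source in readable:
        if source is irc:
            data = irc.recv(4096)                       # incoming server data
            print(data.decode(errors="replace"), end="")
        else:
            line = sys.stdin.readline()                 # a full line typed by the user
            irc.send(line.encode())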

Python asyncio read file and execute another activity at intervals

I admit to being very lazy: I need to do this fairly quickly and cannot get my head round the Python3 asyncio module. (Funnily, I found the boost one fairly intuitive.)
I need to readline a file object (a pipe) that will block from time to time. During this, I want to be able fire off another activity at set intervals (say every 30 minutes), regardless of the availability of anything to read from the file.
Can anyone help me with a skeleton to do this using python3 asyncio? (I cannot install a third-party module such as twisted.)
asyncio (as well as other asynchronous libraries like twisted and tornado) doesn't support non-blocking IO for files -- only sockets and pipes are processed asynchronously.
The main reason is that Unix systems have no good way to process files asynchronously; on Linux, for example, any file read is a blocking operation.
See also https://groups.google.com/forum/#!topic/python-tulip/MvpkQeetWZA
Update:
To schedule the periodic activity, I suggest using asyncio.Task:
@asyncio.coroutine
def periodic(reader, delay):
    while True:
        data = yield from reader.readexactly(100)  # read 100 bytes
        yield from asyncio.sleep(delay)

task = asyncio.Task(periodic(reader, 30 * 60))
The snippet assumes reader is an asyncio.StreamReader instance.
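One way to obtain such a reader for a pipe is to wrap it with loop.connect_read_pipe(). The sketch below is my addition, written with the newer async/await syntax; the pipe path is a placeholder:

import asyncio

async def open_pipe_reader(path):
    loop = asyncio.get_running_loop()
    reader = asyncio.StreamReader()
    protocol = asyncio.StreamReaderProtocol(reader)
    pipe = open(path, 'rb')                           # e.g. a named FIFO
    await loop.connect_read_pipe(lambda: protocol, pipe)
    return reader

Once you have the reader, readline() or readexactly() can be awaited without blocking the rest of the event loop.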

Building an HTTP API for continuously running python process

TL;DR: I have a beautifully crafted, continuously running piece of Python code controlling and reading out a physics experiment. Now I want to add an HTTP API.
I have written a module which controls the hardware using USB. I can script several types of autonomously operating experiments, but I'd like to control my running experiment over the internet. I like the idea of an HTTP API, and have implemented a proof-of-concept using Flask's development server.
The experiment runs as a single process claiming the USB connection and periodically (every 16 ms) all data is read out. This process can write hardware settings and commands, and reads data and command responses.
I have a few problems choosing the 'correct' way to communicate with this process. It works if the HTTP server only has a single worker. Then, I can use python's multiprocessing.Pipe for communication. Using more-or-less low-level sockets (or things like zeromq) should work, even for request/response, but I have to implement some sort of protocol: send {'cmd': 'set_voltage', 'value': 900} instead of calling hardware.set_voltage(800) (which I can use in the stand-alone scripts). I can use some sort of RPC, but as far as I know they all (SimpleXMLRPCServer, Pyro) use some sort of event loop for the 'server', in this case the process running the experiment, to process requests. But I can't have an event loop waiting for incoming requests; it should be reading out my hardware! I googled around quite a bit, but however I try to rephrase my question, I end up with Celery as the answer, which mostly fires off one job after another, but isn't really about communicating with a long-running process.
I'm confused. I can get this to work, but I fear I'll be reinventing a few wheels. I just want to launch my app in the terminal, open a web browser from anywhere, and monitor and control my experiment.
Update: The following code is a basic example of using the module:
from pysparc.muonlab.muonlab_ii import MuonlabII
muonlab = MuonlabII()
muonlab.select_lifetime_measurement()
muonlab.set_pmt1_voltage(900)
muonlab.set_pmt1_threshold(500)
lifetimes = []
while True:
    data = muonlab.read_lifetime_data()
    if data:
        print "Muon decays detected with lifetimes", data
        lifetimes.extend(data)
The module lives at https://github.com/HiSPARC/pysparc/tree/master/pysparc/muonlab.
My current implementation of the HTTP API lives at https://github.com/HiSPARC/pysparc/blob/master/bin/muonlab_with_http_api.
I'm pretty happy with the module (with lots of tests) but the HTTP API runs using Flask's single-threaded development server (which the documentation and the internet tells me is a bad idea) and passes dictionaries through a Pipe as some sort of IPC. I'd love to be able to do something like this in the above script:
while True:
    data = muonlab.read_lifetime_data()
    if data:
        print "Muon decays detected with lifetimes", data
        lifetimes.extend(data)
    process_remote_requests()
where process_remote_requests is a fairly short function to call the muonlab instance or return data. Then, in my Flask views, I'd have something like:
muonlab = RemoteMuonlab()
@app.route('/pmt1_voltage', methods=['GET', 'PUT'])
def get_data():
    if request.method == 'PUT':
        voltage = request.form['voltage']
        muonlab.set_pmt1_voltage(voltage)
    else:
        voltage = muonlab.get_pmt1_voltage()
    return jsonify(voltage=voltage)
Getting the measurement data from the app is perhaps less of a problem, since I could store that in SQLite or something else that handles concurrent access.
But... you do have an I/O loop; it runs every 16 ms.
You can use BaseHTTPServer.HTTPServer in such a case; just set the timeout attribute to something small. Basically:
from SimpleXMLRPCServer import SimpleXMLRPCServer
from time import sleep

class XmlRPCApi:
    def do_something(self):
        print "doing something"

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_instance(XmlRPCApi())
server.timeout = 0   # handle_request() returns immediately if nothing is pending

while True:
    sleep(0.016)
    do_normal_thing()        # your regular 16 ms experiment work
    server.handle_request()
Edit: Python has a built-in WSGI server, also built on BaseHTTPServer, which is capable of serving a Flask app. Since flask.Flask() happens to be a WSGI-compliant application, your process_remote_requests() could look like this:
import wsgiref.simple_server

# app here is just your Flask() application!
remote_server = wsgiref.simple_server.make_server('localhost', 8000, app)

# as before, set the timeout to zero so that you can go right back
# to your event loop if there are no requests to handle
remote_server.timeout = 0

def process_remote_requests():
    remote_server.handle_request()
This works well enough if you have only short-running requests; but if you need to handle requests that may take longer than your event loop's normal polling interval, or if you need to handle more requests than you have polls per unit of time, then you can't use exactly this approach.
You don't necessarily need to fork off another process, though; you can potentially get by with a pool of worker threads. Roughly:
import threading
import wsgiref.simple_server

remote_server = wsgiref.simple_server.make_server('localhost', 8000, app)

POOL_SIZE = 10  # or some other value
pool = [threading.Thread(target=remote_server.serve_forever)
        for dummy in xrange(POOL_SIZE)]
for thread in pool:
    thread.daemon = True
    thread.start()

while True:
    pass  # normal experiment processing here; don't handle requests in this thread
However, this approach has one major shortcoming: you now have to deal with concurrency! It's not safe to manipulate your program state as freely as you could with the above loop, since you might be concurrently manipulating that same state in the main thread (or another HTTP server thread). It's up to you to know when this is valid, wrapping each resource with some sort of mutex lock or whatever is appropriate.
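A small sketch of that locking idea (the lock and wrapper functions are mine, not part of the original code): guard every hardware call with a single mutex so an HTTP worker thread and the main experiment loop never interleave USB commands.

import threading

hardware_lock = threading.Lock()

def set_pmt1_voltage_safe(muonlab, voltage):
    # Only one thread at a time may talk to the hardware.
    with hardware_lock:
        muonlab.set_pmt1_voltage(voltage)

def read_lifetime_data_safe(muonlab):
    with hardware_lock:
        return muonlab.read_lifetime_data()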

Pre-generating GUIDs for use in python?

I have a python program that needs to generate several guids and hand them back with some other data to a client over the network. It may be hit with a lot of requests in a short time period and I would like the latency to be as low as reasonably possible.
Ideally, rather than generating new guids on the fly as the client waits for a response, I would rather be bulk-generating a list of guids in the background that is continually replenished so that I always have pre-generated ones ready to hand out.
I am using the uuid module in Python on Linux. I understand that this is using the uuidd daemon to get uuids. Does uuidd already take care of pre-generating uuids so that it always has some ready? From the documentation it appears that it does not.
Is there some setting in Python or with uuidd to get it to do this automatically? Is there a more elegant approach than manually creating a background thread in my program that maintains a list of uuids?
Are you certain that the uuid module would in fact be too slow to handle the requests you expect in a timely manner? I would be very surprised if UUID generation accounted for a bottleneck in your application.
I would first build the application to simply use the uuid module and then if you find that this module is in fact slowing things down you should investigate a way to keep a pre-generated list of UUIDs around.
I have tested the performance of the uuid module for generating uuids:
>>> import timeit
>>> timer=timeit.Timer('uuid.uuid1()','import uuid')
>>> timer.repeat(3, 10000)
[0.84600019454956055, 0.8469998836517334, 0.84400010108947754]
How many do you need? Is 10000 per second not enough?
Suppose you have a thread to keep topping up a pool of UUIDs.
Here is a very simple version:
import uuid, threading, time

class UUID_Pool(threading.Thread):
    pool_size = 10000

    def __init__(self):
        super(UUID_Pool, self).__init__()
        self.daemon = True
        self.uuid_pool = set(uuid.uuid1() for x in range(self.pool_size))

    def run(self):
        while True:
            while len(self.uuid_pool) < self.pool_size:
                self.uuid_pool.add(uuid.uuid1())
            time.sleep(0.01)  # top up the pool 100 times/sec

uuid_pool = UUID_Pool()
uuid_pool.start()
get_uuid = uuid_pool.uuid_pool.pop  # make a local binding
uuid = get_uuid()                   # ~60x faster than uuid.uuid1() on my computer
You'd also need to handle the case where your burst empties the pool by using uuid's faster than the thread can generate them.
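A small sketch of one way to handle that (my addition, building on the pool above): fall back to direct generation when the pool runs dry, since set.pop() raises KeyError on an empty set.

import uuid

def get_uuid_safe():
    try:
        return uuid_pool.uuid_pool.pop()
    except KeyError:
        return uuid.uuid1()  # pool exhausted; generate one on the spot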
