I am trying to rewrite a Python script using asyncio (async & await).
The script reads data from a file and then checks every line against a web service using the requests library. I learned that I have to switch from requests to the async-capable 'httpx' library, but I haven't grasped how to use asyncio and hope to learn from this example.
After changing requests to httpx the script looks like this:
#!/usr/bin/env python3
import sys
import json
import httpx

def main():
    with open('output', 'w') as res:
        for rr in sys.stdin:
            r = rr.rstrip().split()
            rsp = httpx.post('http://localhost:8080/api/v1/service',
                             json={'f1': r[0], 'f2': r[1]})
            json_result = json.loads(rsp.text)
            if json_result['error']:
                print(rr.rstrip(), file=res)
                print(rsp.text, file=res, flush=True)

main()
I can run several instances of this script, as the service is capable of processing several requests in parallel, but I would like to change the script so that it can send several requests in parallel itself.
I would like to keep the overall structure of the script with regard to the for loop, as in a more complicated scenario the input data will be read from a database (for record in query:). If possible, I would like reading input data to pause when a preconfigured number of outstanding requests is reached and resume when the number of such requests drops.
Is this possible? What is the best way to accomplish it?
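A minimal sketch of one way to do this with asyncio.Semaphore and httpx.AsyncClient, keeping the for loop and pausing the read when a preconfigured number of requests is in flight (the limit MAX_IN_FLIGHT and the helper name check are assumptions, not part of the original script):

#!/usr/bin/env python3
import sys
import json
import asyncio
import httpx

MAX_IN_FLIGHT = 10  # assumed cap on outstanding requests

async def check(client, sem, line, res):
    try:
        r = line.split()
        rsp = await client.post('http://localhost:8080/api/v1/service',
                                json={'f1': r[0], 'f2': r[1]})
        if json.loads(rsp.text)['error']:
            print(line, file=res)
            print(rsp.text, file=res, flush=True)
    finally:
        sem.release()  # lets the reading loop resume

async def main():
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    tasks = []
    async with httpx.AsyncClient() as client:
        with open('output', 'w') as res:
            for rr in sys.stdin:      # the for-loop structure is kept
                await sem.acquire()   # pauses here while MAX_IN_FLIGHT requests are outstanding
                tasks.append(asyncio.create_task(
                    check(client, sem, rr.rstrip(), res)))
            await asyncio.gather(*tasks)

asyncio.run(main())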
Related
I am trying to load JSON from an HTTP address using Dask and then put it into a dataframe in order to plot some experiment data with Dash. The goal is to fetch the data in real time and show real-time plots of the machines (example data can be found here: http://aav.rz-berlin.mpg.de:17668/retrieval/data/getData.json?pv=FHIMP%3AHeDrop%3AForepressure_Droplet_Src)
This is what I tried:
import json
import dask.bag as db
mybag = db.read_text("http://aav.rz-berlin.mpg.de:17668/retrieval/data/getData.json?pv=FHIMP%3AHeDrop%3AForepressure_Droplet_Src").map(json.loads)
mybag.to_dataframe()
but mybag.to_dataframe() freezes my code.
I also tried:
import dask.dataframe as dd
dd.read_json('url')
which returned "ValueError: Expected object or value". So according to the error message, there's no JSON at all. Does the problem derive from the JSON consisting of a meta and a data field?
Side question: does my setup even make sense if I want to provide a web app for monitoring? It's my first time working with Dash and Dask. If I understood it correctly, Dask basically does the work of a backend here, and there's no need for it to stand on its own if I have an API that's sending me JSON data.
Dask is not, generally, a realtime/streaming analysis engine. Mostly, things are expected to be functional: running the same task with the same arguments is guaranteed to produce the same output, which is clearly not the case here.
Realtime analysis can be produced by the client.submit API, which creates arbitrary tasks at the time of invocation. However, it still requires that each task be finite so that other tasks can take its result and operate on it, and reading from the given URL never ends.
If you want to use dask in conjunction with streaming data, or generally want to work on streaming data in python, you might want to try streamz. The sources listed are mostly polling (repeat some action on a timer to check for new events) or driven by inbound events (like a server waiting for connections). You could easily make a source for the HTTP endpoint, though:
from streamz import Source, Stream
import aiohttp

@Stream.register_api(staticmethod)
class from_http(Source):
    def __init__(self, url, chunk_size=1024, **kwargs):
        self.url = url
        self.chunk_size = chunk_size
        super().__init__(**kwargs)

    async def run(self):
        async with aiohttp.ClientSession() as session:
            async with session.get(self.url) as resp:
                async for chunk in resp.content.iter_chunked(self.chunk_size):
                    await self.emit(chunk, asynchronous=True)
The output of this streaming node is chunks of binary data - it would be up to you to write downstream nodes which can parse this into JSON (since the chunk boundaries won't respect the JSON record terminators).
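A usage sketch (the downstream steps are only placeholders; a real pipeline would need a node that reassembles chunks into complete JSON records before calling json.loads):

source = from_http(
    "http://aav.rz-berlin.mpg.de:17668/retrieval/data/getData.json"
    "?pv=FHIMP%3AHeDrop%3AForepressure_Droplet_Src")
source.map(len).sink(print)  # placeholder: just report the size of each chunk
source.start()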
A small note on the error from the second snippet in the question: the literal string 'url' is passed to dd.read_json, which is what triggers the error:
import dask.dataframe as dd
dd.read_json('url') # note that a string is passed here
I want to use Excel as a front end that continuously updates multiple tables in real time (every 2 seconds). I have a Python program that prepares all the data tables, but it is running on another server. I am storing the Python data in a Redis cache as key-value pairs.
E.g.
'bitcoin':'bitcoin,2021-04-23 14:23:23,49788,dollars,4068890,INR,100000'
'doge':'doge,2021-04-23 14:23:23,0.2334,dollars,21,INR,1000'
But now I also want to use the same data in Excel. I found that I can use Excel RTD functions to update data in Excel in real time, but I have no idea how Python will send data to the Excel RTD function.
As I understand it, I need to set up some kind of RTD server in Python that will push data to the Excel RTD function. But how? I am not quite sure. Please help me with the required infrastructure or any code examples in Python.
Note: I cannot use xlwings or PyXLL (paid) for some reasons.
Thank you in advance.
You can do this with xlOil, which is free (disclaimer: I wrote it). It allows you to write an async generator function in Python which is presented to Excel as an RTD function.
As an example, the following code defines an RTD worksheet function pyGetUrl which will fetch a URL every N seconds. I'm not familiar with Redis, but I can see several async Python client libraries which should be able to replace aiohttp in the code below to access your data.
import aiohttp
import asyncio
import ssl
import xloil as xlo

# This is the implementation: it pulls the URL and returns the response as text
async def _getUrlImpl(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url, ssl=ssl.SSLContext()) as response:
            return await response.text()

#
# We declare an async gen function which calls the implementation either once,
# or at regular intervals
#
@xlo.func(local=False, rtd=True)
async def pyGetUrl(url, seconds=0):
    yield await _getUrlImpl(url)
    while seconds > 0:
        await asyncio.sleep(seconds)
        yield await _getUrlImpl(url)
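In a worksheet this could then be used as an ordinary formula, for example =pyGetUrl("http://localhost:8080/data", 2) to re-fetch a (hypothetical) URL every 2 seconds; the exact name under which the function is registered may depend on your xlOil setup.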
Rolling your own RTD server using Excel's COM interface and pywin32 may also be viable; you can look at this Python example to see it done. You'll need to add a ProgId and CLSID to the Windows registry so Excel can find your server; the example shows you how to do this. Fair warning: a recent questioner was unable to make the example work, and I also tried it and had even less luck, so some debugging may be required.
I've built some websites using Flask before, including one which used websockets, but this time I'm not sure how to begin.
I currently have an endless loop in Python which gets sensor data from a ZeroMQ socket. It roughly looks like this:
import zeromq

socket = zeromq.create_socket()

while True:
    data_dict = socket.receive_json()
    print data_dict  # {'temperature': 34.6, 'speed': 12.8, etc.}
I now want to create a dashboard showing the incoming sensor data in real time in some nice charts. Since it's in Python and I'm familiar with Flask and websockets, I would like to use those.
The websites I built before were basic request/reply based ones though. How on earth would I create a Flask website from a continuous loop?
The web page will only be interested in the latest value within a reasonable interval from the user's point of view, say 3 seconds, so you can retrieve values in the background using a separate thread.
This is an example of how to use the threading module to update a latest value in the background:
import threading
import random
import time

_last_value = None

def get_last_value():
    return _last_value

def retrieve_value():
    global _last_value
    while True:
        _last_value = random.randint(1, 100)
        time.sleep(3)

threading.Thread(target=retrieve_value, daemon=True).start()

for i in range(20):
    print(i, get_last_value())
    time.sleep(1)
In your case, it would be something like:
import threading
import zeromq

_socket = zeromq.create_socket()
_last_data_dict = {}

def get_latest_data():
    return _last_data_dict

def retrieve_value():
    global _last_data_dict
    while True:
        _last_data_dict = _socket.receive_json()

threading.Thread(target=retrieve_value, daemon=True).start()
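A minimal sketch of the Flask side, assuming the dashboard page simply polls an endpoint for the latest reading (the /latest route name is an assumption):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/latest')
def latest():
    # Return whatever the background thread stored most recently
    return jsonify(get_latest_data())

if __name__ == '__main__':
    app.run()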
Basically, what you need is some form of storage two processes can access at the same time.
If you don't want to leave the comfort of a single python executable, you should look into threading:
https://docs.python.org/2/library/thread.html
Otherwise, you could write two different Python scripts (one for the sensor readout, one for Flask), have one write to a file and the other read from it (or use a pipe on Linux; I don't know what Windows offers), run both processes at the same time, and let your OS handle the "threading".
The second approach has the advantage that your OS takes care of performance, but you lose a lot of freedom around locking and reading the file. There may be some odd behaviour if your server reads at the exact instant your sensor script writes, but I have done similar things without problems, and I dimly recall that the OS should keep the file in a consistent state whenever it is read or written.
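A rough sketch of the two-script variant, assuming the data is exchanged as JSON in a file named latest.json (the atomic rename via os.replace is an extra safeguard of my own on top of what is described above):

# sensor_writer.py -- sketch of the sensor-side script
import json
import os
import tempfile

def write_latest(data_dict, path='latest.json'):
    # Write to a temporary file first, then rename it into place;
    # os.replace is atomic, so the reader never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir='.')
    with os.fdopen(fd, 'w') as f:
        json.dump(data_dict, f)
    os.replace(tmp, path)

# flask_app.py -- sketch of the web-side reader
import json

def read_latest(path='latest.json'):
    with open(path) as f:
        return json.load(f)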
I have a legacy code base of many specialized web scrapers, all relying on making synchronous requests to web servers, running while True with a sleep statement at the end. This code base is in Python 2, and it's likely not feasible to move to Python 3 and take advantage of Python 3 async features.
Ideally I'd like to rewrite this set of many individual web-scraping scripts as a single pipeline featuring the following:
asynchronous web requests (in Python 2)
asynchronous writes to csv
non-blocking sleep statements so that each individual page is scraped at a set frequency
This seems like an easy problem in Python 3 with asyncio and coroutines generally. Can someone recommend how I'd do this, or some example resources for doing it in Python 2?
Thanks for any advice.
What you could do is put each scraper in a different file; then, when you want them all to run, you can do:
import os
os.system('python file1.py')
os.system('python file2.py')
os.system('python file3.py')
os.system('python file4.py')
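Note that os.system waits for each command to finish, so the scripts above actually run one after another. A sketch using subprocess instead (same hypothetical file names) launches them all concurrently:

import subprocess

# Popen returns immediately, so all four scripts run at the same time
procs = [subprocess.Popen(['python', name])
         for name in ['file1.py', 'file2.py', 'file3.py', 'file4.py']]

# Optionally wait for all of them to finish
for p in procs:
    p.wait()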
I admit to being very lazy: I need to do this fairly quickly and cannot get my head around the Python 3 asyncio module. (Funnily enough, I found the Boost one fairly intuitive.)
I need to readline a file object (a pipe) that will block from time to time. While doing this, I want to be able to fire off another activity at set intervals (say every 30 minutes), regardless of whether there is anything to read from the file.
Can anyone help me with a skeleton to do this using python3 asyncio? (I cannot install a third-party module such as twisted.)
asyncio (as well as other asynchronous libraries like twisted and tornado) doesn't support non-blocking IO for files -- only sockets and pipes are processed asynchronously.
The main reason is that Unix systems have no good way to process files asynchronously; on Linux, for example, any file read is a blocking operation.
See also https://groups.google.com/forum/#!topic/python-tulip/MvpkQeetWZA
Update:
For scheduling periodic activity I suggest using asyncio.Task:
@asyncio.coroutine
def periodic(reader, delay):
    while True:
        data = yield from reader.readexactly(100)  # read 100 bytes
        yield from asyncio.sleep(delay)

task = asyncio.Task(periodic(reader, 30 * 60))
The snippet assumes reader is an asyncio.StreamReader instance.
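For completeness, a sketch of how such a reader could be hooked up to a pipe and driven by the event loop (sys.stdin stands in for the pipe here, and the old-style @asyncio.coroutine syntax matches the snippet above):

import asyncio
import sys

loop = asyncio.get_event_loop()
reader = asyncio.StreamReader()
protocol = asyncio.StreamReaderProtocol(reader)
# Wrap an existing pipe-like file object so asyncio can read from it
loop.run_until_complete(loop.connect_read_pipe(lambda: protocol, sys.stdin))

# Runs until the task is cancelled or the pipe is exhausted
task = asyncio.Task(periodic(reader, 30 * 60))
loop.run_until_complete(task)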