I need to asynchronously read from stdin to get messages (JSON terminated by \r\n) and, after processing, asynchronously write the updated message to stdout.
At the moment I am doing it synchronously, like this:
import sys

class SyncIOStdInOut:
    def write(self, payload: str):
        sys.stdout.write(payload)
        sys.stdout.write('\r\n')
        sys.stdout.flush()

    def read(self) -> str:
        payload = sys.stdin.readline()
        return payload
How can I do the same but asynchronously?
Here's an example of echoing stdin to stdout using asyncio streams (for Unix):
import asyncio
import sys

async def connect_stdin_stdout():
    loop = asyncio.get_event_loop()
    reader = asyncio.StreamReader()
    protocol = asyncio.StreamReaderProtocol(reader)
    await loop.connect_read_pipe(lambda: protocol, sys.stdin)
    w_transport, w_protocol = await loop.connect_write_pipe(asyncio.streams.FlowControlMixin, sys.stdout)
    writer = asyncio.StreamWriter(w_transport, w_protocol, reader, loop)
    return reader, writer

async def main():
    reader, writer = await connect_stdin_stdout()
    while True:
        res = await reader.read(100)
        if not res:
            break
        writer.write(res)

if __name__ == "__main__":
    asyncio.run(main())
As a ready-to-use solution, you could use the aioconsole library. It implements a similar approach, but also provides useful asynchronous equivalents of input, print, exec and code.interact:
from aioconsole import get_standard_streams

async def main():
    reader, writer = await get_standard_streams()
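For example, the echo loop above could be rewritten with aioconsole roughly like this (a sketch; exact bytes/str handling may depend on the aioconsole version and whether stdin/stdout are TTYs):

import asyncio
from aioconsole import get_standard_streams

async def main():
    # get_standard_streams() returns a StreamReader/StreamWriter pair
    # wired to stdin/stdout, so the echo loop looks the same as above.
    reader, writer = await get_standard_streams()
    while True:
        res = await reader.read(100)
        if not res:
            break
        writer.write(res)
        await writer.drain()

asyncio.run(main())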
Update:
Let's try to figure out how the function connect_stdin_stdout works.
Get the current event loop:
loop = asyncio.get_event_loop()
Create a StreamReader instance.
reader = asyncio.StreamReader()
Generally, StreamReader/StreamWriter classes are not intended to be directly instantiated and should only be used as a result of functions such as open_connection() and start_server().
StreamReader provides a buffered asynchronous interface to some data stream. Some source (library code) calls its methods such as feed_data and feed_eof; the data is buffered and can be read using documented coroutines such as read() and readline().
Create a StreamReaderProtocol instance.
protocol = asyncio.StreamReaderProtocol(reader)
This class is derived from asyncio.Protocol and FlowControlMixin and helps adapt between Protocol and StreamReader. It overrides Protocol methods such as data_received and eof_received and calls the StreamReader methods feed_data and feed_eof.
Register the standard input stream stdin with the event loop.
await loop.connect_read_pipe(lambda: protocol, sys.stdin)
The connect_read_pipe function takes a file-like object as its pipe parameter, and stdin is a file-like object. From now on, all data read from stdin will be fed into the StreamReaderProtocol and then passed on to the StreamReader.
Register the standard output stream stdout with the event loop.
w_transport, w_protocol = await loop.connect_write_pipe(asyncio.streams.FlowControlMixin, sys.stdout)
In connect_write_pipe you need to pass a protocol factory that creates protocol instances implementing the flow-control logic for StreamWriter.drain(). This logic is implemented in the FlowControlMixin class; StreamReaderProtocol also inherits from it.
Create a StreamWriter instance.
writer = asyncio.StreamWriter(w_transport, w_protocol, reader, loop)
This class forwards data passed to methods such as write() and writelines() to the underlying transport.
The protocol is used to support the drain() coroutine, which waits until the underlying transport has flushed its internal buffer and is available for writing again.
The reader is an optional parameter and can be None; it is also used to support drain(). At the start of that coroutine, it is checked whether an exception was set on the reader, for example due to a lost connection (relevant for sockets and bidirectional connections); if so, drain() raises that exception too.
You can read more about StreamWriter and the drain() function in this great answer.
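For example, a minimal sketch of how drain() is typically used with the writer returned by connect_stdin_stdout (the send_message helper name is just for illustration):

import asyncio

async def send_message(writer: asyncio.StreamWriter, payload: bytes) -> None:
    # Queue the message on the transport, then wait until its internal
    # buffer has drained, so a slow consumer applies backpressure.
    writer.write(payload + b'\r\n')
    await writer.drain()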
Update 2:
To read lines with a \r\n separator, readuntil() can be used.
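For example (a sketch built on the reader from connect_stdin_stdout above; readuntil() raises IncompleteReadError when EOF is hit before a full separator arrives):

import asyncio

async def read_messages(reader: asyncio.StreamReader):
    # Yield one \r\n-terminated message at a time until EOF.
    while True:
        try:
            line = await reader.readuntil(b'\r\n')
        except asyncio.IncompleteReadError:
            break
        yield line.rstrip(b'\r\n')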
This is another way you can asynchronously read from stdin (it reads a single line at a time).
import asyncio
import sys

async def async_read_stdin() -> str:
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(None, sys.stdin.readline)
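A possible way to use it (a sketch; it relies on readline() returning an empty string at EOF):

import asyncio
import sys

async def main():
    # Echo stdin to stdout one line at a time until EOF
    while True:
        line = await async_read_stdin()
        if not line:
            break
        sys.stdout.write(line)

asyncio.run(main())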
Related
I need to read from a StreamReader when a new connection is made to my Server instance (created with asyncio.start_server) without consuming its internal buffer.
The first thing I tried was to create a deep copy of the StreamReader object passed to the callback that handles the connection, but it throws an error.
The second was to create a StreamReader from scratch by copying the buffer and the transport of the passed StreamReader:
async def _handle_req(self, reader: StreamReader, writer: StreamWriter):
    reader_copy = StreamReader()
    reader_copy._buffer = deepcopy(reader._buffer)
    reader_copy.set_transport(reader._transport)
However, when readline is invoked on reader_copy, it throws an IncompleteReadError exception. After some debugging I found that the exception is raised by the internal readuntil function of the StreamReader class in streams.py (line 626) because, for some reason, the internal self._eof flag is set to True. This happens at line 515 in the _wait_for_data method
...
self._waiter = self._loop.create_future()
try:
    await self._waiter
finally:
    self._waiter = None
because the await self._waiter for some reason sets it to True.
Is there an alternative solution or a way to create a copy of the StreamReader?
I have a large file, with a JSON record on each line. I'm writing a script to upload a subset of these records to CouchDB via the API, and experimenting with different approaches to see what works the fastest. Here's what I've found to work, from fastest to slowest (on a CouchDB instance on my localhost):
1. Read each needed record into memory. After all records are in memory, generate an upload coroutine for each record, and gather/run all the coroutines at once
2. Synchronously read the file and, when a needed record is encountered, synchronously upload
3. Use aiofiles to read the file and, when a needed record is encountered, asynchronously upload
Approach #1 is much faster than the other two (about twice as fast). I am confused why approach #2 is faster than #3, especially in contrast to this example here, which takes half as much time to run asynchronously as synchronously (sync code not provided, I had to rewrite it myself). Is it the context switching from file I/O to HTTP I/O, especially with file reads occurring much more often than API uploads?
For additional illustration, here's some Python pseudo-code that represents each approach:
Approach 1 - Sync File IO, Async HTTP IO
import json
import asyncio
import aiohttp

records = []

with open('records.txt', 'r') as record_file:
    for line in record_file:
        record = json.loads(line)
        if valid(record):
            records.append(record)

async def batch_upload(records):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for record in records:
            task = async_upload(record, session)
            tasks.append(task)
        await asyncio.gather(*tasks)

asyncio.run(batch_upload(records))
Approach 2 - Sync File IO, Sync HTTP IO
import json

with open('records.txt', 'r') as record_file:
    for line in record_file:
        record = json.loads(line)
        if valid(record):
            sync_upload(record)
Approach 3 - Async File IO, Async HTTP IO
import json
import asyncio
import aiohttp
import aiofiles

async def batch_upload():
    async with aiohttp.ClientSession() as session:
        async with aiofiles.open('records.txt', 'r') as record_file:
            line = await record_file.readline()
            while line:
                record = json.loads(line)
                if valid(record):
                    await async_upload(record, session)
                line = await record_file.readline()

asyncio.run(batch_upload())
The file I'm developing this with is about 1.3 GB, with 100000 records total, 691 of which I upload. Each upload begins with a GET request to see if the record already exists in CouchDB. If it does, then a PUT is performed to update the CouchDB record with any new information; if it doesn't, then the record is POSTed to the db. So, each upload consists of two API requests. For dev purposes, I'm only creating records, so I run the GET and POST requests, 1382 API calls total.
Approach #1 takes about 17 seconds, approach #2 takes about 33 seconds, and approach #3 takes about 42 seconds.
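For reference, async_upload is not shown in the pseudo-code; a hedged sketch of what it might look like with aiohttp against CouchDB's HTTP API (base_url and the use of record['_id'] are assumptions, not the actual code):

import aiohttp

async def async_upload(record, session: aiohttp.ClientSession):
    base_url = 'http://localhost:5984/mydb'  # hypothetical CouchDB database URL
    url = f"{base_url}/{record['_id']}"
    # GET to check whether the record already exists
    async with session.get(url) as resp:
        if resp.status == 200:
            existing = await resp.json()
            record['_rev'] = existing['_rev']  # CouchDB needs the current revision to update
            async with session.put(url, json=record) as put_resp:
                await put_resp.read()
        else:
            # Not found: create the record with a POST to the database
            async with session.post(base_url, json=record) as post_resp:
                await post_resp.read()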
Your code uses async, but it does the work sequentially (each upload is awaited before the next line is read), and in this case it will be slower than the sync approach. Async won't speed up the execution if it isn't constructed/used effectively.
You can create two coroutines and make them run concurrently; perhaps that speeds up the operation.
Example:
#!/usr/bin/env python3

import asyncio

async def upload(event, queue):
    # This logic is not so correct when it comes to shutdown,
    # but it gives the idea
    while not event.is_set():
        record = await queue.get()
        print(f'uploading record : {record}')
    return

async def read(event, queue):
    # dummy logic: instead, read the file here and populate the queue.
    for i in range(1, 10):
        await queue.put(i)
    # Initiate shutdown..
    event.set()

async def main():
    event = asyncio.Event()
    queue = asyncio.Queue()

    uploader = asyncio.create_task(upload(event, queue))
    reader = asyncio.create_task(read(event, queue))
    tasks = [uploader, reader]

    await asyncio.gather(*tasks)

if __name__ == '__main__':
    asyncio.run(main())
I'm setting up a Python websocket client that should send and receive requests as described:
1. Connect to the websocket.
2. Send the request to get the current timestamp.
3. Receive back the current timestamp.
4. Compare times; if the times are synced, continue, if not, reply ("not_synced!").
5. Send the machine name (in this case it is defined in the config file).
6. The server responds back with a timestamp in the future when it is expecting a ping; that time is saved in the config file.
7. Close the connection and wait for the current time to match the time in the future!
So far, I have working functions for reading/saving strings in the config file and for comparing the received time with the current time.
The only issue I can't figure out how to solve is the communication with the server; I actually want to define one function that all the communication goes through.
I tried defining the function without asyncio, but I couldn't return the received message.
While using asyncio, I couldn't pass the argument (the message string!) into the function.
import asyncio
import websockets

async def connect(msg):
    # the opencfg function reads a file, in this case line 4 of the config file where the url is stored
    async with websockets.connect("ws://connect.websocket.in/xnode?room_id=19210") as socket:
        await socket.send(msg)
        result = await socket.recv()
        return result

asyncio.get_event_loop().run_until_complete(connect())

def connect2(msg):
    soc = websockets.connect("ws://connect.websocket.in/xnode?room_id=19210")
    soc.send(msg)
    result = soc.recv()
    return result

print(connect2("gettime"))
If you try to send "gettime", you will receive back the current timestamp, and after sending "|online" you should receive back a value equal to the current timestamp + 10.
You have the websocket URL, so try it for yourself.
I changed your code to use asyncio.gather to get the return value and passed "gettime" to the function:
import asyncio
import websockets

address = "ws://connect.websocket.in/xnode?room_id=19210"

async def connect(msg):
    async with websockets.connect(address) as socket:
        await socket.send(msg)
        result = await socket.recv()
        return result

result = asyncio.get_event_loop().run_until_complete(asyncio.gather(connect("gettime")))
print(result)
Output
['1564626191']
You can reuse the code by putting it into a function definition:
def get_command(command):
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(asyncio.gather(connect(command)))
    return result

result = get_command("gettime")
print(result)
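On Python 3.7+, since gather isn't needed for a single coroutine, the same call can be written more simply with asyncio.run (a sketch reusing the connect() defined above; the result is then a plain string rather than a one-element list):

import asyncio

# Run the connect() coroutine defined above and get its return value directly
result = asyncio.run(connect("gettime"))
print(result)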
I'm experimenting with Content-Disposition in Tornado. My code for reading and writing the file looks like this:
with open(file_name, 'rb') as f:
    while True:
        data = f.read(4096)
        if not data:
            break
        self.write(data)

self.finish()
I expected the memory usage to be consistent since it is not reading everything at once. But the resource monitor shows:
In use     Available
12.7 GB    2.5 GB
Sometimes it will even BSOD my computer...
How do I download a large file (say 12GB in size)?
Tornado 6.0 provides an API to download large files, which may be used like below:
import aiofiles

async def get(self):
    self.set_header('Content-Type', 'application/octet-stream')
    # aiofiles uses a thread pool, not real asynchronous file I/O
    async with aiofiles.open(r"F:\test.xyz", "rb") as f:
        while True:
            data = await f.read(1024)
            if not data:
                break
            self.write(data)
            # the flush() call is important: it keeps memory usage low,
            # because the buffered data is sent out promptly
            self.flush()
Just using aiofiles without calling self.flush() may not solve the trouble. Just look at the self.write() method:
def write(self, chunk: Union[str, bytes, dict]) -> None:
    """Writes the given chunk to the output buffer.

    To write the output to the network, use the `flush()` method below.

    If the given chunk is a dictionary, we write it as JSON and set
    the Content-Type of the response to be ``application/json``.
    (if you want to send JSON as a different ``Content-Type``, call
    ``set_header`` *after* calling ``write()``).

    Note that lists are not converted to JSON because of a potential
    cross-site security vulnerability. All JSON output should be
    wrapped in a dictionary. More details at
    http://haacked.com/archive/2009/06/25/json-hijacking.aspx/ and
    https://github.com/facebook/tornado/issues/1009
    """
    if self._finished:
        raise RuntimeError("Cannot write() after finish()")
    if not isinstance(chunk, (bytes, unicode_type, dict)):
        message = "write() only accepts bytes, unicode, and dict objects"
        if isinstance(chunk, list):
            message += (
                ". Lists not accepted for security reasons; see "
                + "http://www.tornadoweb.org/en/stable/web.html#tornado.web.RequestHandler.write"  # noqa: E501
            )
        raise TypeError(message)
    if isinstance(chunk, dict):
        chunk = escape.json_encode(chunk)
        self.set_header("Content-Type", "application/json; charset=UTF-8")
    chunk = utf8(chunk)
    self._write_buffer.append(chunk)
At the end of the code, it just appends the data you want to send to the _write_buffer.
The data is sent when the get or post method has finished and the finish method is called.
The documentation about Tornado's handler flush() is:
http://www.tornadoweb.org/en/stable/web.html?highlight=flush#tornado.web.RequestHandler.flush
In my application, I'm trying to create a handler that streams large files out to the client. These files are created by another module (tarfile to be exact).
What I want is a file-like object that, instead of writing to a socket or an actual file on disk, proxies to the RequestHandler.write method.
Here's what my current naive implementation looks like:
import tornado.gen
import tornado.ioloop
import tornado.web

class HandlerFileObject(object):
    def __init__(self, handler):
        self.handler = handler

    @tornado.gen.coroutine
    def write(self, data):
        self.handler.write(data)
        yield self.handler.flush()

    def close(self):
        self.handler.finish()

class DownloadHandler(tornado.web.RequestHandler):
    def get(self):
        self.set_status(200)
        self.set_header("Content-Type", "application/octet-stream")

        fp = HandlerFileObject(self)
        with open('/dev/zero', 'rb') as devzero:
            for _ in range(100*1024):
                fp.write(devzero.read(1024))
        fp.close()

if __name__ == '__main__':
    app = tornado.web.Application([
        (r"/", DownloadHandler)
    ])
    app.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
It works, but the problem is that all of the data is loaded into RAM and is not released until I stop the application.
What would be a better/more idiomatic/resourceful way of going about this?
get() also needs to be a coroutine and yield when calling fp.write(). By making write a coroutine you've made your object less file-like: most callers will simply ignore its return value, masking exceptions and interfering with flow control. The file-like interface is synchronous, so you'll probably need to do these operations in other threads so you can block them as needed.
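A rough sketch of the first part of that suggestion, assuming a recent Tornado with native coroutines (the chunk size and range are taken from the example above; the blocking devzero.read() would still need to move to another thread for real files, as noted):

import tornado.web

class DownloadHandler(tornado.web.RequestHandler):
    async def get(self):
        self.set_status(200)
        self.set_header("Content-Type", "application/octet-stream")
        with open('/dev/zero', 'rb') as devzero:
            for _ in range(100 * 1024):
                self.write(devzero.read(1024))
                # Awaiting flush() waits until the chunk has been handed off to
                # the network, so the write buffer does not keep growing in RAM.
                await self.flush()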