I have a Flask server that accepts HTTP requests from a client. This HTTP server needs to delegate work to a third-party server using a websocket connection (for performance reasons).
I find it hard to wrap my head around how to create a permanent websocket connection that stays open across HTTP requests. Sending requests to the websocket server in a run-once script works fine and looks like this:
import asyncio
import json

import websockets


async def send(websocket, payload):
    await websocket.send(json.dumps(payload).encode("utf-8"))


async def recv(websocket):
    data = await websocket.recv()
    return json.loads(data)


async def main(payload):
    uri = "wss://the-third-party-server.com/xyz"
    async with websockets.connect(uri) as websocket:
        future = send(websocket, payload)
        future_r = recv(websocket)
        _, output = await asyncio.gather(future, future_r)
        return output


asyncio.run(main({...}))
Here, main() establishes a WSS connection and closes it when done. How can I keep that connection open for incoming HTTP requests, so that I can call main() for each of them without re-establishing the WSS connection?
The main problem there is that when you code a web app that responds to HTTP(S) requests, your code has a "life cycle" that is peculiar to that setting: usually you have a "view" function that gets the request data, performs all the actions needed to gather the response data, and returns it.
In most web frameworks this "view" function has to be independent from the rest of the system: it should be able to do its job relying on no data or objects other than what it receives when called, namely the request data and the system configuration. That lets the application server (the parts of the framework designed to actually connect your program to the internet) choose among a variety of ways to serve your program: it may run your view function in several parallel threads, in several parallel processes, or even in different processes in various containers or physical servers, and your application does not need to care about that.
If you want a resource that is available across calls to your view functions, you need to break out of this paradigm. For example, frameworks typically create a pool of database connections, so that views running in the same process can reuse those connections. These database connections are usually supplied by the framework itself, which implements a mechanism that allows them to be reused and to be available transparently, as needed. You have to recreate a mechanism of the same sort if you want to keep a websocket connection alive.
In a sense, you need a Python object that mediates your websocket traffic, behaving like a "server" for your web view functions.
That is simpler to do than it sounds: all you need is a special Python class designed to have a single instance per process, which keeps the connection and can send and receive data for parallel calls without mangling it. A callable that ensures this instance exists in the current process is then enough to work under any strategy configured to serve your app to the web.
If you are using Flask, which does not use asyncio, there is a further complication: you lose the async capability inside your views, which will have to wait for the websocket request to complete. It is then the job of your application server to run your views in different threads or processes to ensure availability. And it is your job to keep the asyncio loop for your websocket running in a separate thread, so that it can make the requests it needs.
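One way to keep that loop running is sketched below. It uses the standard library's asyncio.run_coroutine_threadsafe instead of the pair of queues in the example code that follows: a Flask view schedules a coroutine on a loop living in a background thread and blocks until it finishes. fake_roundtrip is a hypothetical stand-in for the websocket send/recv.

```python
import asyncio
import threading


class LoopThread:
    """Runs one asyncio event loop in a daemon thread (one per process)."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def call(self, coro, timeout=None):
        # Schedule the coroutine on the background loop and block the
        # calling (e.g. Flask view) thread until its result is ready.
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result(timeout)

    def stop(self):
        # Ask the loop to stop from outside its thread, then clean up.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def fake_roundtrip(payload):
    # Hypothetical stand-in for `await websocket.send(...)` + `await websocket.recv()`
    await asyncio.sleep(0.01)
    return {"echo": payload}
```

A view would do something like output = runner.call(fake_roundtrip(payload)). The websocket itself would have to be opened by a coroutine running on runner.loop, since asyncio connections are bound to the loop that created them. Several view threads can have calls in flight on the same loop at once, but pairing replies with requests on one shared websocket still needs some id in the messages.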
Here is some example code. Please note that, apart from using a single websocket per process, this has no provisions for failure of any kind and, most importantly, it does nothing in parallel: all send/recv pairs are blocking, since you give no clue of a mechanism that would allow one to pair each outgoing message with its response.
import asyncio
import json
import threading
from queue import Queue

import websockets


class AWebSocket:
    instance = None

    def __new__(cls, *args, **kw):
        if cls.instance is not None:
            return cls.instance
        # object.__new__ must not be passed the extra arguments
        return super().__new__(cls)

    def __init__(self, *args, **kw):
        cls = self.__class__
        if cls.instance is not None:
            # __init__ is called even when __new__ returned the existing
            # instance, so we have to check again
            return
        self.outgoing = Queue()
        self.responses = Queue()
        # one send/recv pair at a time, so a view thread never picks up
        # the response meant for another view thread
        self.lock = threading.Lock()
        self.socket_thread = threading.Thread(target=self.start_socket, daemon=True)
        self.socket_thread.start()
        cls.instance = self

    def start_socket(self):
        # starts an asyncio loop in a separate thread, and keeps
        # the websocket running in this separate thread
        asyncio.run(self.core())

    async def _send(self, websocket, payload):
        await websocket.send(json.dumps(payload).encode("utf-8"))

    async def _recv(self, websocket):
        data = await websocket.recv()
        return json.loads(data)

    async def core(self):
        uri = "wss://the-third-party-server.com/xyz"
        loop = asyncio.get_running_loop()
        async with websockets.connect(uri) as websocket:
            self.websocket = websocket
            while True:
                # This code is as you wrote it:
                # it essentially blocks until a message is sent
                # and the answer is received back.
                # You need a mechanism in your websocket messages
                # that identifies the answer to each request. With one,
                # this is trivially parallelizable simply by calling
                # asyncio.create_task instead of awaiting asyncio.gather.
                # The blocking Queue.get runs in a worker thread so the
                # event loop stays free to service the connection
                # (e.g. keepalive pings) while waiting for work.
                payload = await loop.run_in_executor(None, self.outgoing.get)
                future = self._send(websocket, payload)
                future_r = self._recv(websocket)
                _, response = await asyncio.gather(future, future_r)
                self.responses.put(response)

    def send(self, payload):
        # This is the method you call from your views;
        # simply do:
        # `output = AWebSocket().send(payload)`
        with self.lock:
            self.outgoing.put(payload)
            return self.responses.get()
Related
I'm building an async library with aiohttp. The library has a single client that, on instantiation, creates a ClientSession and uses it to make requests to an API (it's a REST API wrapper).
The problem I'm facing is how to cleanly close the client session on exit.
If the session is not explicitly closed, a whole lot of errors come out, but I can't simply use a context manager to close the session since I don't know when the program will end.
A typical use would be this:
import asyncio

from mylibrary import Client

client = Client()

async def main():
    await client.get_foo(...)
    await client.patch_bar(...)

asyncio.run(main())
I could add await client.close_session() to main, but I want to remove this responsibility from the end user, so ideally the client would automatically close the ClientSession when the program ends.
How can I do this?
I have tried using __del__ on the client to get the loop and close the session, without success, as well as using the atexit module, but it seems that by the time these run the asyncio loop has already been destroyed and I still get the warnings.
The specific error is:
Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x0000013ACFD54AF0>
transport: <_ProactorSocketTransport fd=1052 read=<_OverlappedFuture cancelled>>
I did some research on this error, and Google seems to think it's because I need to implement flow control. I have, however, and this error only occurs if I don't explicitly close the session.
Unfortunately, it seems the only clean pattern that applies here is to make your client itself an (async) context manager, and require your users to use it in an async with block.
The __del__ method could work in some cases, but it would require that your users' code never "leaks" the Client instance itself.
So the code is trivial, but the burden on your users is not zero:
class Client:
    ...

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, tb):
        await self.close_session()
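A runnable sketch of what this looks like from the user's side. FakeSession is a hypothetical stand-in for aiohttp.ClientSession that only mimics its close() coroutine and closed attribute; making close_session idempotent means users who also call it explicitly cause no harm.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession (close() and .closed only)."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


class Client:
    def __init__(self):
        self.session = FakeSession()

    async def close_session(self):
        # Idempotent: safe to call explicitly and again from __aexit__
        if not self.session.closed:
            await self.session.close()

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, tb):
        await self.close_session()


async def main():
    async with Client() as client:
        pass  # e.g. await client.get_foo(...)
    return client
```

The session is closed as soon as the async with block exits, even if the body raises, and this happens while the loop is still running, which is exactly what __del__ and atexit cannot guarantee.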
Creating a pseudo-hook on loop.stop:
Another way, though not "clean" and not guaranteed to work, could be to decorate the running loop's stop method to add a call to close_session.
If the user code just "halts" and does not tear down the loop properly, this can't help anyway, but I guess it might be an option for "well-behaved" users.
The big problem here is that this is not documented, but taking a peek at the asyncio internals, it looks like shutdown always goes through self.stop().
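import asyncio


class ShutDownCb:
    def __init__(self, cb):
        # Must be created while the loop is running:
        # get_running_loop() raises RuntimeError otherwise.
        self.cb = cb
        self.stopping = False
        loop = self.loop = asyncio.get_running_loop()
        self.original_stop = loop.stop
        loop.stop = self.new_stop

    async def _stop(self):
        try:
            # wait for the cleanup callback to actually finish
            await self.task
        finally:
            # then really stop the loop, even if the callback failed
            self.original_stop()

    def new_stop(self):
        if not self.stopping:
            self.stopping = True
            self.task = asyncio.create_task(self.cb())
            # keep a reference so the stopper task is not garbage-collected
            self.stop_task = asyncio.create_task(self._stop())
            return
        return self.original_stop()


class Client:
    def __init__(self, ...):
        ...
        ShutDownCb(self.close_session)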
Disclaimer: I'm new to using asyncio so this might be an easy fix.
I'm trying to write tests for the endpoints of an asynchronous gRPC server. The server has to regularly check something using a function that runs in an infinite loop, and still be responsive while the infinite loop is sleeping, which is why I'm using grpc-asyncio and pytest-asyncio.
example test (event_loop created by pytest-asyncio):
@pytest.mark.asyncio
async def test_endpoint(
    event_loop,
    test_client: test_pb2_grpc.TesterStub,
):
    await serve()  # THIS BLOCKS THE REST OF THE TEST
    response = await test_client.TemporaryEndpointForTesting(request=test_pb2.TmpRequest())
    assert response
client fixture:
@pytest.fixture
def test_client() -> test_pb2_grpc.TesterStub:
    port = 50551
    channel = aio.insecure_channel(f'localhost:{port}')
    stub = test_pb2_grpc.TesterStub(channel)
    return stub
server endpoints:
class Servicer(test_pb2_grpc.TesterServicer):

    # ENDPOINT
    async def TemporaryEndpointForTesting(self, request, context):
        print("request received!")
        return test_pb2.TmpResponse()

    async def infinite_loop(self):
        await asyncio.sleep(1.0)
        print("<looping>")
        return asyncio.ensure_future(self.infinite_loop())
server startup:
async def serve():
    port = 50551
    server: aio.Server = aio.server()
    servicer = Servicer()
    test_pb2_grpc.add_TesterServicer_to_server(servicer, server)
    server.add_insecure_port(f'[::]:{port}')

    task_1 = asyncio.create_task(servicer.infinite_loop())
    task_2 = asyncio.create_task(server.start())
    task_3 = asyncio.create_task(server.wait_for_termination())

    await task_1
    await task_2
    await task_3
The goal is to set up the server and then send requests to it to see whether it responds as expected. When I start the server separately using await serve() and then run my tests, it seems to work flawlessly. But when I try to start it from the test case, it gets stuck, which I sort of understand, since it's awaiting the (infinite) server tasks to finish, but I don't know how to get around this. I thought using a different event_loop for the server tasks would do the trick:
new_event_loop = asyncio.new_event_loop()
task_1 = new_event_loop.create_task(servicer.infinite_loop())
task_2 = new_event_loop.create_task(server.start())
task_3 = new_event_loop.create_task(server.wait_for_termination())
but that didn't work either.
Best-case would be a way to start up the server within a fixture so I can just pass it to all test functions. I'm guessing this could also be done using threading, but that seems a bit superfluous considering the server is already using asyncio.
I've been at this all day, any help would be well appreciated.
(using Python 3.9)
This isn't really the solution I was looking for, but I just decided to go to end-to-end tests directly rather than trying to figure this out. So I'm using the Python Docker SDK to start the server via a pytest fixture and just send client commands to it. Or I start it using a debugger if that's needed.
Not as convenient as I'd like it to be, but I spent too much time on this issue already and this way it's tested e2e right away.
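For reference, the in-process approach mostly comes down to not awaiting the server's infinite tasks inside the test: start the server, keep the periodic loop as a background task, run the test, then stop both. The sketch below shows that shape with the standard library's asyncio.start_server standing in for grpc.aio.server() and a trivial line-echo protocol standing in for the gRPC endpoint (both are substitutions). With grpc.aio the corresponding steps would presumably be await server.start() in setup and await server.stop(None) in teardown, for example inside an async generator fixture under pytest-asyncio; treat that mapping as an untested assumption.

```python
import asyncio
import contextlib


async def handle(reader, writer):
    # stands in for the gRPC endpoint: echoes one line back
    data = await reader.readline()
    writer.write(b"echo:" + data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def infinite_loop(ticks):
    # stands in for Servicer.infinite_loop: periodic work that never ends
    while True:
        ticks.append("tick")
        await asyncio.sleep(0.01)


@contextlib.asynccontextmanager
async def running_server():
    # port 0 lets the OS pick a free port
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    ticks = []
    loop_task = asyncio.create_task(infinite_loop(ticks))
    try:
        # start() has returned: the server is listening, nothing infinite is awaited
        yield server.sockets[0].getsockname()[1], ticks
    finally:
        loop_task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await loop_task
        server.close()
        await server.wait_closed()


async def run_endpoint_check():
    async with running_server() as (port, ticks):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"hello\n")
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return reply, len(ticks) > 0
```

The same event loop serves the requests, runs the periodic task and runs the test, so no second loop or thread is needed.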
I am using Python 3.6, asyncio and the websockets library. I am trying to build a client for a websocket-based service which works as follows:
The client can send JSON requests with a custom id, a method and some params. The service will reply with a JSON payload with the same id echoed, and data as a result of the method call.
I would like to have an abstraction on top of this device that would work sort of like this:
wsc = get_websocket_connection()

async def call_method(method, **params):
    packet = make_json_packet(method, params)
    await wsc.send(packet)
    resp = await wsc.recv()
    return decode_json_packet(resp)

async def working_code():
    separate_request = asyncio.ensure_future(call_method("quux"))

    first_result = await call_method("foo", x=1)
    second_result = await call_method("bar", y=first_result)
    print(second_result)

    return await separate_request
Now, I expect separate_request to wait asynchronously while first_result and second_result are processed. But I have no guarantee that the wsc.recv() call will return the matching response; in fact, I have no guarantee that the service returns the responses in the order of the requests.
I can use the id field to disambiguate the responses. But how can I write the call_method() so that it manages the requests internally and resumes the "right" coroutine when the corresponding reply is received?
When I've done this sort of thing before, I've tended to split things into two parts:
1) "sending code" (can be multiple threads): this sets up where responses should go (i.e. a dict mapping ids to functions or Futures), then sends the request and blocks waiting for the response
2) "receiving code" (probably one thread per socket): this monitors all inbound traffic and passes each response to whichever code is interested in its id. This is also a sensible place to handle the socket being closed unexpectedly, which should push an exception out to the waiting callers as appropriate
This is probably a few hundred lines of code and pretty application-specific…
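A compact sketch of those two parts with asyncio tasks instead of threads (a substitution), written for Python 3.7+ (asyncio.run, get_running_loop; on 3.6 you would use loop.create_future() and run_until_complete). It assumes conn has async send(str) and recv() methods, as a websockets connection does, and that each reply is JSON echoing the request "id" next to a "data" field; those field names are assumptions. FakeConn is a hypothetical stand-in that answers two requests in reverse order.

```python
import asyncio
import itertools
import json


class Multiplexer:
    """Shares one connection between many concurrent call_method() callers.

    Must be created inside a running event loop (it starts a reader task).
    """

    def __init__(self, conn):
        self.conn = conn
        self.pending = {}  # request id -> Future waiting for that reply
        self.ids = itertools.count(1)
        # "receiving code": a single task reads all inbound traffic
        self.reader = asyncio.create_task(self._read_loop())

    async def _read_loop(self):
        try:
            while True:
                msg = json.loads(await self.conn.recv())
                fut = self.pending.pop(msg["id"], None)
                if fut is not None and not fut.done():
                    fut.set_result(msg.get("data"))
        except Exception as exc:
            # e.g. the socket closed: fail every caller still waiting
            for fut in self.pending.values():
                if not fut.done():
                    fut.set_exception(exc)
            self.pending.clear()

    async def call_method(self, method, **params):
        # "sending code": register where the reply should go, then send
        req_id = next(self.ids)
        fut = asyncio.get_running_loop().create_future()
        self.pending[req_id] = fut
        try:
            await self.conn.send(json.dumps(
                {"id": req_id, "method": method, "params": params}))
            return await fut
        finally:
            self.pending.pop(req_id, None)


class FakeConn:
    """Hypothetical stand-in for the websocket: answers two requests in reverse order."""

    def __init__(self):
        self.inbox = asyncio.Queue()
        self.sent = []

    async def send(self, text):
        self.sent.append(json.loads(text))
        if len(self.sent) == 2:
            for req in reversed(self.sent):
                reply = {"id": req["id"], "data": req["method"].upper()}
                await self.inbox.put(json.dumps(reply))

    async def recv(self):
        return await self.inbox.get()


async def demo():
    mux = Multiplexer(FakeConn())
    results = await asyncio.gather(mux.call_method("foo", x=1),
                                   mux.call_method("bar", y=2))
    mux.reader.cancel()
    return results
```

Even though the replies arrive in reverse order, each caller resumes with its own result, because the reader routes every message through the id-to-Future dict instead of handing it to whoever called recv() next.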
I have a Python program which, on a certain event (for example a curl request), calculates a function value. What I need is that, the moment the function executes, some data gets posted to a Tornado websocket. I have looked around the internet and found examples of how to create a websocket, but all of them cover scenarios where the message is sent from inside the websocket handler itself.
Referring to this code for example:
https://github.com/benjaminmbrown/real-time-data-viz-d3-crossfilter-websocket-tutorial/blob/master/rt-data-viz/websocket_server.py
Can someone guide me on how I can post a message to a websocket? Basically, I have a Tornado API where, if a user makes a curl request, I would like to log that message to the websocket.
You can do it by creating a registry of all active websockets and use it to send messages on a certain event.
import tornado.websocket


class WebsocketRegistry:
    def __init__(self):
        self._active_websockets = []

    def add_listener(self, listener):
        self._active_websockets.append(listener)

    def remove_listener(self, listener):
        self._active_websockets.remove(listener)

    def send_messages(self, msg_txt):
        # iterate over a copy so the list can safely change meanwhile
        for ws in list(self._active_websockets):
            try:
                ws.write_message(msg_txt)
            except tornado.websocket.WebSocketClosedError:
                # connection already closing; on_close() is expected to unregister it
                pass


registry = WebsocketRegistry()


class WSHandler(tornado.websocket.WebSocketHandler):
    def open(self, *args, **kwargs):
        super(WSHandler, self).open(*args, **kwargs)
        registry.add_listener(self)

    def on_close(self):
        super(WSHandler, self).on_close()
        registry.remove_listener(self)
P.S. Note that if you plan to scale your app to 2+ instances, this won't work, and you would have to use, for example, a message queue (RabbitMQ is good) to deliver events to all open websockets. But the overall approach would be the same: the MQ would act as the registry, and websockets subscribe to messages on connection (and unsubscribe on closing).
Using Python/Tornado, I wanted to set up a little "trampoline" server that allows two devices to communicate with each other in a RESTish manner. There are probably vastly superior/simpler "off the shelf" ways to do this. I'd welcome those suggestions, but I still feel it would be educational to figure out how to do my own using Tornado.
Basically, the idea was that the device in the role of server would do a long poll with a GET. The client device would POST to the server, at which point the POST body would be transferred as the response of the blocked GET. Before the POST responded, it would block. The server side then does a PUT with the response, which is transferred to the blocked POST and returned to the device. I thought maybe I could do this with tornado.queues, but that appears not to have worked out. My code:
import tornado
import tornado.web
import tornado.httpserver
import tornado.queues

ToServerQueue = tornado.queues.Queue()
ToClientQueue = tornado.queues.Queue()

class Query(tornado.web.RequestHandler):
    def get(self):
        toServer = ToServerQueue.get()
        self.write(toServer)

    def post(self):
        toServer = self.request.body
        ToServerQueue.put(toServer)
        toClient = ToClientQueue.get()
        self.write(toClient)

    def put(self):
        ToClientQueue.put(self.request.body)
        self.write(bytes())

services = tornado.web.Application([(r'/query', Query)], debug=True)
services.listen(49009)
tornado.ioloop.IOLoop.instance().start()
Unfortunately, ToServerQueue.get() does not actually block until the queue has an item; instead it returns a tornado.concurrent.Future, which is not a legal value to pass to self.write().
I guess my general question is twofold:
1) How can one HTTP verb invocation (e.g. get, put, post, etc) block and then be signaled by another HTTP verb invocation.
2) How can I share data from one invocation to another?
I've only really scratched the surface with simple/straightforward use cases of making little REST servers with Tornado. I wonder whether coroutines are what I need, but I haven't found a good tutorial/example to help me see the light, if that's indeed the way to go.
1) How can one HTTP verb invocation (e.g. get, put, post, etc.) block and then be signaled by another HTTP verb invocation?
2) How can I share data from one invocation to another?
A new RequestHandler object is created for every request, so you need some coordinator, e.g. queues, or locks with a state object (in your case that would amount to re-implementing a queue).
tornado.queues are queues for coroutines. Queue.get, Queue.put and Queue.join return Future objects that need to be "resolved", i.e. the scheduled task finishes either with success or with an exception. To wait until a future is resolved you should yield it (just like in the doc examples of tornado.queues). The verb methods also need to be decorated with tornado.gen.coroutine.
import tornado.gen

class Query(tornado.web.RequestHandler):

    @tornado.gen.coroutine
    def get(self):
        toServer = yield ToServerQueue.get()
        self.write(toServer)

    @tornado.gen.coroutine
    def post(self):
        toServer = self.request.body
        yield ToServerQueue.put(toServer)
        toClient = yield ToClientQueue.get()
        self.write(toClient)

    @tornado.gen.coroutine
    def put(self):
        yield ToClientQueue.put(self.request.body)
        self.write(bytes())
The GET request will last (waiting in a non-blocking manner) until something is available in the queue (or until a timeout, which can be passed as an argument to Queue.get).
tornado.queues.Queue also provides get_nowait (and put_nowait), which don't have to be yielded: get_nowait returns an item from the queue immediately or raises an exception (QueueEmpty) if there is none.
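To see the hand-off without running a server, here is a sketch with the standard library's asyncio.Queue in place of tornado.queues.Queue (a substitution; both give you awaitables from get() and put()). The three coroutines are hypothetical stand-ins for the GET, POST and PUT handlers. The queues live on one shared object that outlives every per-request handler, which is the answer to question 2. Recent Tornado versions also accept async def handler methods, in which case await ToServerQueue.get() replaces the yield form.

```python
import asyncio


class Trampoline:
    """Shared state that outlives the per-request handlers (hypothetical)."""

    def __init__(self):
        self.to_server = asyncio.Queue()
        self.to_client = asyncio.Queue()

    async def handle_get(self, timeout=None):
        # server-role device long-polls until a client POST arrives
        return await asyncio.wait_for(self.to_server.get(), timeout)

    async def handle_post(self, body, timeout=None):
        # forward the body to the waiting GET, then wait for the PUT
        await self.to_server.put(body)
        return await asyncio.wait_for(self.to_client.get(), timeout)

    async def handle_put(self, body):
        # the server-role device answers; this releases the blocked POST
        await self.to_client.put(body)
        return b""


async def demo():
    hub = Trampoline()
    get_task = asyncio.create_task(hub.handle_get(timeout=5))
    post_task = asyncio.create_task(hub.handle_post(b"ping", timeout=5))
    request = await get_task               # what the GET would write back
    await hub.handle_put(b"pong:" + request)
    return request, await post_task        # what the POST would write back
```

Neither "blocked" request ties up the event loop: each just awaits its queue, so the PUT can be handled in between. In this sketch an expired timeout raises asyncio.TimeoutError, which a real handler would turn into an HTTP error response.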