Best approach to multiple websocket client connections in Python? - python

I appreciate that the question I am about to ask is rather broad but, as a newcomer to Python, I am struggling to find the [best] way of doing something which would be trivial in, say, Node.js, and pretty trivial in other environments such as C#.
Let's say that there is a warehouse full of stuff. And let's say that there is a websocket interface onto that warehouse with two characteristics: on client connection it pumps out a full list of the warehouse's current inventory, and it then follows that up with further streaming updates when the inventory changes.
The web is full of examples of how, in Python, you connect to the warehouse and respond to changes in its state. But...
What if I want to connect to two warehouses and do something based on the combined information retrieved separately from each one? And what if I want to do things based on factors such as time, rather than solely being driven by inventory changes and incoming websocket messages?
In all the examples I've seen - and it's beginning to feel like hundreds - there is, somewhere, in some form, a run() or a run_forever() or a run_until_complete() etc. In other words, the I/O may be asynchronous, but there is always a massive blocking operation in the code, and always two fundamental assumptions which don't fit my case: that there will only be one websocket connection, and that all processing will be driven by events sent out by the [single] websocket server.
It's very unclear to me whether the answer to my question is some sort of use of multiple event loops, or of multiple threads, or something else.
To date, experimenting with Python has felt rather like being on the penthouse floor, admiring the quirky but undeniably elegant decor. But then you get in the elevator, press the button marked "parallelism" or "concurrency", and the elevator goes into freefall, eventually depositing you in a basement filled with some pretty ugly and steaming pipes.
... Returning from flowery metaphors back to the technical, the key thing I'm struggling with is the Python equivalent of, say, Node.js code which could be as trivially simple as the following example [left inelegant for simplicity]:
var aggregateState = { ... some sort of representation of combined state ... };
var socket1 = new WebSocket("wss://warehouse1");
socket1.on("message", OnUpdateFromWarehouse);
var socket2 = new WebSocket("wss://warehouse2");
socket2.on("message", OnUpdateFromWarehouse);
function OnUpdateFromWarehouse(message)
{
... Take the information and use it to update aggregate state from both warehouses ...
}

Answering my own question, in the hope that it may help other Python newcomers... asyncio seems to be the way to go (though there are gotchas such as the alarming ease with which you can deadlock the event loop).
Assuming the use of an asyncio-friendly websocket module such as websockets, what seems to work is a framework along the following lines - shorn, for simplicity, of logic such as reconnects. (The premise remains a warehouse which sends an initial list of its full inventory, and then sends updates to that initial state.)
import asyncio
from websockets import connect

class Warehouse:
    def __init__(self, warehouse_url):
        self.warehouse_url = warehouse_url
        self.inventory = {}  # Some description of the warehouse's inventory

    async def destroy(self):
        if self.websocket.open:
            await self.websocket.close()  # Terminates any recv() in wait_for_incoming()
        await self.incoming_message_task  # Keep asyncio happy by awaiting the "background" task

    async def start(self):
        try:
            # Connect to the warehouse
            self.websocket = await connect(self.warehouse_url)
            # Get its initial message which describes its full state
            initial_inventory = await self.websocket.recv()
            # Store the initial inventory
            self.process_initial_inventory(initial_inventory)
            # Set up a "background" task for further streaming reads of the web socket
            self.incoming_message_task = asyncio.create_task(self.wait_for_incoming())
            # Done
            return True
        except Exception:
            # Connection failed (or some unexpected error)
            return False

    async def wait_for_incoming(self):
        while self.websocket.open:
            try:
                update_message = await self.websocket.recv()
                asyncio.create_task(self.process_update_message(update_message))
            except Exception:
                # Presumably, socket closure
                pass

    def process_initial_inventory(self, initial_inventory_message):
        # ... Process initial_inventory_message into self.inventory ...

    async def process_update_message(self, update_message):
        # ... Merge update_message into self.inventory ...
        # ... And fire some sort of event so that the object's
        # ... creator can detect the change. There seems to be no ...
        # ... consensus about what is a pythonic way of implementing events, ...
        # ... so I'll declare that - potentially trivial - element as out-of-scope ...
After completing the initial connection logic, one key thing is setting up a "background" task which repeatedly reads further update messages coming in over the websocket. The code above doesn't include any firing of events, but there are all sorts of ways in which process_update_message() could do this (many of them trivially simple), allowing the object's creator to deal with notifications whenever and however it sees fit. The streaming messages will continue to be received, and any events will continue to be fired, for as long as the object's creator continues to play nicely with asyncio and to participate in co-operative multitasking.
With that in place, a connection can be established along the following lines:
async def main():
    warehouse1 = Warehouse("wss://warehouse1")
    if await warehouse1.start():
        # ... Connection succeeded. Update messages will now be processed
        # in the "background" provided that other users of the event loop
        # yield in some way ...
    else:
        # ... Connection failed ...

asyncio.run(main())
Multiple warehouses can be initiated in several ways, including doing a create_task(warehouse.start()) on each one and then doing a gather on the tasks to ensure/check that they're all okay.
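As a rough sketch of the create_task-plus-gather idea, and of mixing in time-driven work, something like the following seems to do the job. StubWarehouse, start_all and periodic are hypothetical names invented for this illustration; the stub stands in for the real class so that the example runs without a network.

```python
import asyncio

class StubWarehouse:
    """Hypothetical stand-in for the Warehouse class; no network involved."""

    def __init__(self, url, reachable=True):
        self.url = url
        self.reachable = reachable
        self.inventory = {}

    async def start(self):
        await asyncio.sleep(0)  # pretend to connect
        if self.reachable:
            self.inventory = {"widgets": 1}
        return self.reachable

async def start_all(warehouses):
    # One task per warehouse, then gather to collect every True/False outcome
    tasks = [asyncio.create_task(w.start()) for w in warehouses]
    results = await asyncio.gather(*tasks)
    return dict(zip((w.url for w in warehouses), results))

async def periodic(interval, callback, repeats):
    # Time-driven work: sleeping yields to the event loop, so the
    # "background" readers keep running between ticks
    for _ in range(repeats):
        await asyncio.sleep(interval)
        callback()

async def demo():
    warehouses = [
        StubWarehouse("wss://warehouse1"),
        StubWarehouse("wss://warehouse2", reachable=False),
    ]
    status = await start_all(warehouses)
    ticks = []
    await periodic(0.01, lambda: ticks.append(len(ticks)), repeats=3)
    return status, ticks
```

The point of periodic() is only that a timer is just another coroutine awaiting sleep(); nothing about it needs a second event loop or a thread.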
When it's time to quit, to keep asyncio happy, and to stop it complaining about orphaned tasks, and to allow everything to shut down nicely, it's necessary to call destroy() on each warehouse.
But there's one common element which this doesn't cover. Extending the original premise above, let's say that the warehouse also accepts requests from our websocket client, such as "ship X to Y". The success/failure responses to these requests will come in alongside the general update messages; it generally won't be possible to guarantee that the first recv() after the send() of a request will be the response to that request. This complicates process_update_message().
The best answer I've found may or may not be considered "pythonic" because it uses a Future in a way which is strongly analogous to a TaskCompletionSource in .NET.
Let's invent a couple of implementation details; any real-world scenario is likely to look something like this:
We can supply a request_id when submitting an instruction to the warehouse
The success/failure response from the warehouse repeats the request_id back to us (thus also distinguishing command-response messages from inventory-update messages)
The first step is to have a dictionary which maps the ID of pending, in-progress requests to Future objects:
def __init__(self, warehouse_url):
    ...
    self.pending_requests = {}
The definition of a coroutine which sends a request then looks something like this:
async def send_request(self, some_request_definition):
    # Allocate a unique ID for the request
    request_id = <some unique request id>
    # Create a Future for the pending request
    request_future = asyncio.Future()
    # Store the map of the ID -> Future in the dictionary of pending requests
    self.pending_requests[request_id] = request_future
    # Build a request message to send to the server, somehow including the request_id
    request_msg = <some request definition, including the request_id>
    # Send the message
    await self.websocket.send(request_msg)
    # Wait for the future to complete - we're now asynchronously awaiting
    # activity in a separate function
    await asyncio.wait_for(request_future, timeout = None)
    # Return the result of the Future as the return value of send_request()
    return request_future.result()
A caller can create a request and wait for its asynchronous response using something like the following:
some_result = await warehouse.send_request(<some request def>)
The key to making this all work is then to modify and extend process_update_message() to do the following:
Distinguish between request responses versus inventory updates
For the former, extract the request ID (which our invented scenario says gets repeated back to us)
Look up the pending Future for the request
Do a set_result() on it (whose value can be anything depending on what the server's response says). This releases send_request() and causes the await from it to be resolved.
For example:
async def process_update_message(self, update_message):
    if <some test that update_message is a request response>:
        request_id = <extract the request ID repeated back in update_message>
        # Get the Future for this request ID
        request_future = self.pending_requests[request_id]
        # Create some sort of return value for send_request() based on the response
        return_value = <some result of the request>
        # Complete the Future, causing send_request() to return
        request_future.set_result(return_value)
    else:
        # ... handle inventory updates as before ...
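Pulling those pieces together, here is a self-contained sketch of the ID-to-Future pattern that runs without a server. RequestTracker, process_message and the dictionary message shape are assumptions made up for the illustration (a real warehouse protocol will look different), and the sketch additionally removes finished requests from the dictionary and uses a real timeout rather than None.

```python
import asyncio
import itertools

class RequestTracker:
    """Maps request IDs to Futures so responses can be matched to requests."""

    def __init__(self, send):
        self._send = send  # hypothetical coroutine that transmits a message
        self._ids = itertools.count(1)
        self.pending_requests = {}

    async def send_request(self, payload, timeout=None):
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self.pending_requests[request_id] = future
        try:
            await self._send({"request_id": request_id, "payload": payload})
            return await asyncio.wait_for(future, timeout)
        finally:
            # Drop the entry whether the request succeeded, failed or timed out
            self.pending_requests.pop(request_id, None)

    def process_message(self, message):
        """Return True if the message was a response to a pending request."""
        future = self.pending_requests.get(message.get("request_id"))
        if future is None or future.done():
            return False  # an inventory update, or a response nobody awaits
        future.set_result(message["result"])
        return True

async def demo():
    tracker = RequestTracker(send=None)

    async def fake_send(message):
        # Simulate the server: an unrelated inventory update arrives first,
        # then the response that echoes the request ID
        loop = asyncio.get_running_loop()
        loop.call_soon(tracker.process_message, {"inventory": {"widgets": 3}})
        loop.call_soon(tracker.process_message,
                       {"request_id": message["request_id"], "result": "shipped"})

    tracker._send = fake_send
    result = await tracker.send_request("ship X to Y", timeout=1)

    async def silent_send(message):
        pass  # the server never answers

    tracker._send = silent_send
    try:
        await tracker.send_request("ship Z to Y", timeout=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return result, timed_out, tracker.pending_requests
```

Cleaning up in a finally block is a design choice rather than a requirement; without it the dictionary grows by one entry per request that is never answered.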

I've not used sockets with asyncio, but you're likely just looking for asyncio's open_connection
async def socket_activity(host, port, callback):
    # open_connection() takes a host and a port and returns a raw TCP stream pair;
    # it does not perform the WebSocket handshake or framing
    reader, _ = await asyncio.open_connection(host, port)
    while True:
        message = await reader.read(4096)  # read() with no size would wait for EOF
        if not message:  # empty bytes on EOF
            break  # connection was closed
        await callback(message)
Then add these to the event loop
tasks = []  # keeping a reference prevents these from being garbage collected
for host, port in [("warehouse1", 80), ("warehouse2", 80)]:  # ports are placeholders
    tasks.append(asyncio.create_task(
        socket_activity(host, port, callback)
    ))
# return tasks  # or work with them
If you want to wait in a coroutine until N operations are complete, you can use .gather()
Alternatively, you may find Tornado does everything you want and more (I based my Answer off this one)
Tornado websocket client: how to async on_message? (coroutine was never awaited)

Related

redis block until key exists

I'm new to Redis and was wondering if there is a way to await getting a value by its key until the key exists. Minimal code:
async def handler(self):
    data = await self._fetch(key)

async def _fetch(self, key):
    return self.redis_connection.get(key)
As you know, if such a key doesn't exist, it returns None. But since in my project setting the key-value pair in Redis takes place in another application, I want the redis_connection get method to block until the key exists.
Is such expectation even valid?
It is not possible to do what you are trying to do without implementing some sort of polling Redis GET on your client. In that case your client would have to do something like:
async def _fetch(self, key):
    val = self.redis_connection.get(key)
    while val is None:
        # Sleep and retry here
        await asyncio.sleep(1)
        val = self.redis_connection.get(key)
    return val
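If polling is used at all, it is probably worth bounding it so that a key which never appears does not leave the coroutine waiting forever. A minimal sketch, assuming a synchronous getter: poll_until_present is a hypothetical helper, and a plain dict stands in for the Redis connection so the example runs without a server.

```python
import asyncio

async def poll_until_present(get, key, interval=1.0, timeout=None):
    """Call get(key) repeatedly until it returns something other than None."""

    async def _poll():
        while True:
            value = get(key)
            if value is not None:
                return value
            await asyncio.sleep(interval)  # yield to the event loop between tries

    # wait_for raises asyncio.TimeoutError if the key never shows up in time
    return await asyncio.wait_for(_poll(), timeout)

async def demo():
    store = {}  # in-memory stand-in for the Redis server
    # Simulate the other application setting the key a little later
    asyncio.get_running_loop().call_later(0.05, store.__setitem__, "answer", b"42")
    found = await poll_until_present(store.get, "answer", interval=0.01, timeout=1)
    try:
        await poll_until_present(store.get, "missing", interval=0.01, timeout=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return found, timed_out
```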
However, I would ask you to completely reconsider the pattern you are using for this problem.
It seems to me that what you need is something like Pub/Sub https://redis.io/topics/pubsub.
So the app that performs the SET becomes a publisher, and the app that does the GET and waits until the key is available becomes the subscriber.
I did a bit of research on this and it looks like you can do it with asyncio_redis:
Subscriber https://github.com/jonathanslenders/asyncio-redis/blob/b20d4050ca96338a129b30370cfaa22cc7ce3886/examples/pubsub/receiver.py.
Sender(Publisher): https://github.com/jonathanslenders/asyncio-redis/blob/b20d4050ca96338a129b30370cfaa22cc7ce3886/examples/pubsub/sender.py
Hope this helps.
Besides the keyspace notification method mentioned by @Itamar Haber, another solution is to use the blocking operations on LIST.
The handler method calls BRPOP on an empty LIST: BRPOP notify-list timeout, and blocks until notify-list is NOT empty.
The other application pushes the value to the LIST when it finishes setting the key-value pair as usual: SET key value; LPUSH notify-list value.
The handler wakes from the blocking operation with the value you want, and notify-list is destroyed by Redis automatically.
The advantage of this solution is that you don't need to modify your handler method too much (with the keyspace notification solution, you need to register a callback function). The disadvantage is that you have to rely on the notification from another application (with the keyspace notification solution, Redis does the notification automatically).
The closest you can get to this behavior is by enabling keyspace notifications and subscribing to the relevant channels (possibly by pattern).
Note, however, that notifications rely on PubSub that is not guaranteed to deliver messages (at-most-once semantics).
Since Redis 5.0 there is a built-in stream type that supports blocking reads. The following is sample code using redis-py.
#add value to my_stream
redis.xadd('my_stream',{'key':'str_value'})
#read from beginning of stream
last_id='0'
#blocking read until there is value
last_stream_item = redis.xread({"my_stream":last_id},block=0)
#update last_id
last_id = last_stream_item[0][1][0][0]
#wait for next value to arrive on stream
last_stream_item = redis.xread({"my_stream":last_id},block=0)

Python 3 (Bot) script stops working

I'm trying to connect to a TeamSpeak server using the QueryServer to make a bot. I've taken advice from this thread, however I still need help.
This is The TeamSpeak API that I'm using.
Before the edits, this was the summary of what actually happened in my script (1 connection):
It connects.
It checks for the channel ID (and its own client ID)
It joins the channel and starts reading everything
If someone says a specific command, it executes the command and then it disconnects.
How can I make it so it doesn't disconnect? How can I make the script stay in a "waiting" state so it can keep reading after the command is executed?
I am using Python 3.4.1.
I tried learning Threading but either I'm dumb or it doesn't work the way I thought it would. There's another "bug", once waiting for events, if I don't trigger anything with a command, it disconnects after 60 seconds.
# Libraries
import ts3
import threading
import datetime
from random import choice, sample

# Data needed #
USER = "thisisafakename"
PASS = "something"
HOST = "111.111.111.111"
PORT = 10011
SID = 1

class BotPrincipal:
    def __init__(self, manejador=False):
        self.ts3conn = ts3.query.TS3Connection(HOST, PORT)
        self.ts3conn.login(client_login_name=USER, client_login_password=PASS)
        self.ts3conn.use(sid=SID)
        channelToJoin = Bot.GettingChannelID("TestingBot")
        try: #Login with a client that is ok
            self.ts3conn.clientupdate(client_nickname="The Reader Bot")
            self.MyData = self.GettingMyData()
            self.MoveUserToChannel(ChannelToJoin, Bot.MyData["client_id"])
            self.suscribirEvento("textchannel", ChannelToJoin)
            self.ts3conn.on_event = self.manejadorDeEventos
            self.ts3conn.recv_in_thread()
        except ts3.query.TS3QueryError: #Name already exists, 2nd client connect with this info
            self.ts3conn.clientupdate(client_nickname="The Writer Bot")
            self.MyData = self.GettingMyData()
            self.MoveUserToChannel(ChannelToJoin, Bot.MyData["client_id"])

    def __del__(self):
        self.ts3conn.close()

    def GettingMyData(self):
        respuesta = self.ts3conn.whoami()
        return respuesta.parsed[0]

    def GettingChannelID(self, nombre):
        respuesta = self.ts3conn.channelfind(pattern=ts3.escape.TS3Escape.unescape(nombre))
        return respuesta.parsed[0]["cid"]

    def MoveUserToChannel(self, idCanal, idUsuario, passCanal=None):
        self.ts3conn.clientmove(cid=idCanal, clid=idUsuario, cpw=passCanal)

    def suscribirEvento(self, tipoEvento, idCanal):
        self.ts3conn.servernotifyregister(event=tipoEvento, id_=idCanal)

    def SendTextToChannel(self, idCanal, mensajito="Error"):
        self.ts3conn.sendtextmessage(targetmode=2, target=idCanal, msg=mensajito) #This works
        print("test") #PROBLEM HERE This doesn't work. Why? the line above did work

    def manejadorDeEventos(sender, event):
        message = event.parsed[0]['msg']
        if "test" in message: #This works
            Bot.SendTextToChannel(ChannelToJoin, "This is a test") #This works

if __name__ == "__main__":
    Bot = BotPrincipal()
    threadprincipal = threading.Thread(target=Bot.__init__)
    threadprincipal.start()
Prior to using 2 bots, I tested to launch the SendTextToChannel when it connects and it works perfectly, allowing me to do anything that I want after it sends the text to the channel. The bug that made entire python code stop only happens if it's triggered by the manejadorDeEventos
Edit 1 - Experimenting with threading.
I messed it up big time with threading, getting to the result where 2 clients connect at the same time. Somehow I think one of them is reading the events and the other one is answering. The script doesn't close itself anymore and that's a win, but having a clone connection doesn't look good.
Edit 2 - Updated code and actual state of the problem.
I managed to make the double connection work more or less "fine", but it disconnects if nothing happens in the room for 60 seconds. I tried using threading.Timer but I'm unable to make it work. The entire question code has been updated for it.
I would like an answer that helps me both read from the channel and answer to it without needing to connect a second bot for it (as it's currently doing). I would give extra points if the answer also helps me understand an easy way to query the server every 50 seconds so it doesn't disconnect.
From looking at the source, recv_in_thread doesn't create a thread that loops around receiving messages until quit time, it creates a thread that receives a single message and then exits:
def recv_in_thread(self):
    """
    Calls :meth:`recv` in a thread. This is useful,
    if you used ``servernotifyregister`` and you expect to receive events.
    """
    thread = threading.Thread(target=self.recv, args=(True,))
    thread.start()
    return None
That implies that you have to repeatedly call recv_in_thread, not just call it once.
I'm not sure exactly where to do so from reading the docs, but presumably it's at the end of whatever callback gets triggered by a received event; I think that's your manejadorDeEventos method? (Or maybe it's something related to the servernotifyregister method? I'm not sure what servernotifyregister is for and what on_event is for…)
That manejadorDeEventos brings up two side points:
You've declared manejadorDeEventos wrong. Every method has to take self as its first parameter. When you pass a bound method, like self.manejadorDeEventos, that bound self object is going to be passed as the first argument, before any arguments that the caller passes. (There are exceptions to this for classmethods and staticmethods, but those don't apply here.) Also, within that method, you should almost certainly be accessing self, not a global variable Bot that happens to be the same object as self.
If manejadorDeEventos is actually the callback for recv_in_thread, you've got a race condition here: if the first message comes in before your main thread finishes the on_event assignment, recv_in_thread won't be able to call your event handler. (This is exactly the kind of bug that often shows up one time in a million, making it a huge pain to debug when you discover it months after deploying or publishing your code.) So, reverse those two lines.
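To make those points concrete without depending on the real library, here is a sketch built on a stand-in connection. FakeConnection, Bot and handle_event are hypothetical names; the only things borrowed from the question are the on_event attribute, the recv_in_thread name, and the assumption that the handler is invoked with a sender and an event. The handler takes self first, is assigned before receiving starts, and re-arms the receiver after each event.

```python
import threading

class FakeConnection:
    """Stand-in for the query connection: each recv_in_thread() call
    delivers at most one event, mirroring the one-shot behaviour above."""

    def __init__(self, events):
        self._events = list(events)
        self.on_event = None
        self.done = threading.Event()

    def recv_in_thread(self):
        thread = threading.Thread(target=self._recv_one)
        thread.start()

    def _recv_one(self):
        if not self._events:
            self.done.set()  # nothing left to deliver
            return
        event = self._events.pop(0)
        self.on_event(self, event)

class Bot:
    def __init__(self, conn):
        self.conn = conn
        self.seen = []
        self.conn.on_event = self.handle_event  # assign the handler first...
        self.conn.recv_in_thread()              # ...then start receiving

    def handle_event(self, sender, event):
        # self comes first; sender and event are supplied by the caller
        self.seen.append(event)
        sender.recv_in_thread()  # re-arm: wait for the next event
```

Whether the real library is happy to have recv_in_thread called from inside its own callback is something to check against its source; the stub only shows the shape of the loop.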
One last thing: a brief glimpse at this library's code is a bit worrisome. It doesn't look like it's written by someone who really knows what they're doing. The method I copied above only has 3 lines of code, but it includes a useless return None and a leaked Thread that can never be joined, not to mention that the whole design of making you call this method (and spawn a new thread) after each event received is weird, and even more so given that it's not really explained. If this is the standard client library for a service you have to use, then you really don't have much choice in the matter, but if it's not, I'd consider looking for a different library.

Alternative to a while loop in twisted which doesn't block the reactor thread

I'm making a chat application in twisted. Suppose my server is designed in such a way that whenever it detects a client online, it sends the client all the pending-messages (those messages of that client which were cached in a python-list on the server because it was offline) one-by-one in a while loop until the list is exhausted. Something like this:
class MyChat(LineReceiver):
    def connectionMade(self):
        self.factory.clients.append(self)
        while True:
            #retrieve first message from a list of pending-messages(queue) of "self"
            msg = self.retrieveFromQueue(self)
            if msg != "empty":
                self.transport.write(msg)
            else:
                break

    def lineReceived(self, line):
        ...

    def connectionLost(self, reason):
        ...

    def retrieveFromQueue(self, who):
        msglist = []
        if who in self.factory.userMessages:
            msglist = self.factory.userMessages[who]
        if msglist != []:
            msg = msglist.pop(0) #msglist is a list of strings
            self.factory.userMessages[self] = msglist
            return msg
        else:
            return "empty"

factory.userMessages = {} #dict of list of incoming messages of users who aren't online
So according to my understanding of Twisted, the while loop will block the main reactor thread and any interaction from any other client with the server will not be registered by the server. If that's the case, I want an alternate code/method to this approach which will not block the twisted thread.
Update: There may be 2000-3000 pending messages per user because of the nature of the app.
I think that https://glyph.twistedmatrix.com/2011/11/blocking-vs-running.html addresses this point.
The answer here depends on what exactly self.retrieveFromQueue(self) does. You implied it's something like:
if self.list_of_messages:
    return self.list_of_messages.pop(0)
return b"empty"
If this is the case, then the answer is one thing. On the other hand, if the implementation is something more like:
return self.remote_mq_client.retrieve_queue_item(self.queue_identifier)
then the answer might be something else entirely. However, note that it's the implementation of retrieveFromQueue upon which the answer appears to hinge.
That there is a while loop isn't quite as important. The while loop reflects the fact that (to use Glyph's words), this code is getting work done.
You may decide that the amount of work this loop represents is too great to all get done at one time. If there are hundreds of millions of queued messages then copying them one by one into the connection's send buffer will probably use a noticeable amount of both time and memory. In this case, you may wish to consider the producer/consumer pattern and its support in Twisted. This won't make the code any less (or more) "blocking" but it will make it run for shorter periods of time at a time.
So the questions to answer here are really:
whether or not retrieveFromQueue blocks
if it does not block, whether or not there will be so many queued messages that processing them all will cause connectionMade to run for so long that other clients notice a disruption in service
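If the answer to the second question is yes, one simple alternative to Twisted's producer/consumer machinery is to write a bounded batch and then reschedule the rest. The sketch below is framework-agnostic: send_in_batches is a hypothetical helper, and call_later is whatever scheduling function you pass in. In Twisted that would presumably be reactor.callLater, which accepts a delay, a callable and its arguments.

```python
from collections import deque

def send_in_batches(pending, write, call_later, batch_size=100):
    """Write up to batch_size messages, then reschedule the remainder so
    that other events get a turn between batches."""
    for _ in range(min(batch_size, len(pending))):
        write(pending.popleft())
    if pending:
        # A delay of 0 means "as soon as the event loop is free again"
        call_later(0, send_in_batches, pending, write, call_later, batch_size)
```

With 2000-3000 pending messages per user, a batch size in the low hundreds keeps each turn short; the right number is something to measure rather than guess.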

Using DeferredQueue for inter-task communication in Twisted

I have a Client that currently does the following:
connects
collects some data locally
sends that data to a server
repeats
if disconnected, reconnects and continues the above (not shown)
Like this:
def do_send(self):
    def get_data():
        # do something
        return data
    def send_data(data):
        self.sendMessage(data)
    return deferToThread(get_data).addCallback(send_data)

def connectionMade(self):
    WebSocketClientProtocol.connectionMade(self)
    self.sender = task.LoopingCall(self.do_send)
    self.sender.start(60)
However, when disconnected, I would like the data collection to continue, probably queuing and writing to file at a certain limit. I have reviewed the DeferredQueue object which seems like what I need, but I can't seem to crack it.
In pseudo-code, it would go something like this:
queue = DeferredQueue()
# in a separate class from the client protocol
def start_data_collection(self):
    self.collecter = task.LoopingCall(self.get_data)
    self.collecter.start(60)
def get_data(self):
    # do something
    queue.put(data)
Then have the client protocol check the queue, which is where I get lost. Is DeferredQueue what I need, or is there a better way?
A list would work just as well. You'll presumably get lost in the same place - how do you have the client protocol check the list?
Either way, here's one answer:
queued = []
...
connecting = endpoint.connect(factory)
def connected(protocol):
    if queued:
        sending = protocol.sendMessage(queued.pop(0))
        sending.addCallback(sendNextMessage, protocol)
        sending.addErrback(reconnect)
connecting.addCallback(connected)
The idea here is that at some point an event happens: your connection is established. This example represents that event as the connecting Deferred. When the event happens, connected is called. This example pops the first item from the queue (a list) and sends it. It waits for the send to be acknowledged and then sends the next message. It also implies some logic about handling errors by reconnecting.
Your code could look different. You could use the Protocol.connectionMade callback to represent the connection event instead. The core idea is the same - define callbacks to handle certain events when they happen. Whether you use an endpoint's connect Deferred or a protocol's connectionMade doesn't really matter.
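One way to picture the "buffer while offline, flush on connect" behaviour is a small object that sits between the data collector and whichever protocol instance is currently connected. This is only a sketch: OutboundBuffer and its method names are hypothetical, and the protocol is assumed to expose a sendMessage method as in the question. Acknowledgement handling and spilling to a file are left out.

```python
class OutboundBuffer:
    """Holds items while there is no connection and flushes them, oldest
    first, as soon as a protocol reports that it is connected."""

    def __init__(self):
        self.queued = []
        self.protocol = None

    def put(self, item):
        # Called by the data collector, connected or not
        if self.protocol is None:
            self.queued.append(item)
        else:
            self.protocol.sendMessage(item)

    def connected(self, protocol):
        # Call this from the protocol's connectionMade (or a connect callback)
        self.protocol = protocol
        while self.queued:
            protocol.sendMessage(self.queued.pop(0))

    def disconnected(self):
        # Call this from connectionLost; later items are buffered again
        self.protocol = None
```

The collector then only ever calls put(), and does not need to know whether a connection currently exists.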

python twisted - Timeouting on a sent message that did not get a response

I am creating a sort of a client-server implementation, and I'd like to make sure that every sent message gets a response. So I want to create a timeout mechanism, which doesn't check if the message itself is delivered, but rather checks if the delivered message gets a response.
IE, for two computers 1 and 2:
1: send successfully: "hello"
2: <<nothing>>
...
1: Didn't get a response for my "hello" --> timeout
I thought of doing it by creating a big boolean array with id for each message, which will hold a "in progress" flag, and will be set when the message's response is received.
I was wondering perhaps there was a better way of doing that.
Thanks,
Ido.
There is a better way, which funnily enough I myself just implemented here. It uses the TimeoutMixin to achieve the timeout behaviour you need, and a DeferredLock to match up the correct replies with what was sent.
from twisted.internet import defer
from twisted.protocols.policies import TimeoutMixin
from twisted.protocols.basic import LineOnlyReceiver

class Timeout(Exception):
    """Delivered through the Deferred when no reply arrives in time."""

class PingPongProtocol(LineOnlyReceiver, TimeoutMixin):
    DEFAULT_TIMEOUT = 5  # seconds; placeholder value, not defined in the original

    def __init__(self):
        self.lock = defer.DeferredLock()
        self.deferred = None

    def sendMessage(self, msg):
        result = self.lock.run(self._doSend, msg)
        return result

    def _doSend(self, msg):
        assert self.deferred is None, "Already waiting for reply!"
        self.deferred = defer.Deferred()
        self.deferred.addBoth(self._cleanup)
        self.setTimeout(self.DEFAULT_TIMEOUT)
        self.sendLine(msg)
        return self.deferred

    def _cleanup(self, res):
        self.deferred = None
        return res

    def lineReceived(self, line):
        if self.deferred:
            self.setTimeout(None)
            self.deferred.callback(line)
        # If not, we've timed out or this is a spurious line

    def timeoutConnection(self):
        if self.deferred:
            self.deferred.errback(
                Timeout("Some informative message"))
I haven't tested this, it's more of a starting point. There are a few things you might want to change here to suit your purposes:
I use a LineOnlyReceiver — that's not relevant to the problem itself, and you'll need to replace sendLine/lineReceived with the appropriate API calls for your protocol.
This is for a serial connection, so I don't deal with connectionLost etc. You might need to.
I like to keep state directly in the instance. If you need extra state information, set it up in _doSend and clean it up in _cleanup. Some people don't like that — the alternative is to create nested functions inside _doSend that close over the state information that you need. You'll still need that self.deferred there though, otherwise lineReceived (or dataReceived) has no idea what to do.
How to use it
Like I said, I created this for serial communications, where I don't have to worry about factories, connectTCP, etc. If you're using TCP communications, you'll need to figure out the extra glue you need.
# Create the protocol somehow. Maybe this actually happens in a factory,
# in which case, the factory could have wrapper methods for this.
protocol = PingPongProtocol()
d = protocol.sendMessage("Hi there!")  # "def" is a reserved word, so use another name
d.addCallbacks(gotHiResponse, noHiResponse)
