Using DeferredQueue for inter-task communication in Twisted - python

I have a Client that currently does the following:
connects
collects some data locally
sends that data to a server
repeats
if disconnected, reconnects and continues the above (not shown)
Like this:
def do_send(self):
    def get_data():
        # do something
        return data
    def send_data(data):
        self.sendMessage(data)
    return deferToThread(get_data).addCallback(send_data)
def connectionMade(self):
    WebSocketClientProtocol.connectionMade(self)
    self.sender = task.LoopingCall(self.do_send)
    self.sender.start(60)
However, when disconnected, I would like the data collection to continue, probably queuing and writing to file at a certain limit. I have reviewed the DeferredQueue object which seems like what I need, but I can't seem to crack it.
In pseudo-code, it would go something like this:
queue = DeferredQueue()
# in a separate class from the client protocol
def start_data_collection(self):
    self.collector = task.LoopingCall(self.get_data)
    self.collector.start(60)
def get_data(self):
    # do something
    queue.put(data)
Then have the client protocol check the queue, which is where I get lost. Is DeferredQueue what I need, or is there a better way?

A list would work just as well. You'll presumably get lost in the same place - how do you have the client protocol check the list?
Either way, here's one answer:
queued = []
...
connecting = endpoint.connect(factory)
def connected(protocol):
    if queued:
        sending = protocol.sendMessage(queued.pop(0))
        sending.addCallback(sendNextMessage, protocol)
        sending.addErrback(reconnect)
connecting.addCallback(connected)
The idea here is that at some point an event happens: your connection is established. This example represents that event as the connecting Deferred. When the event happens, connected is called. This example pops the first item from the queue (a list) and sends it. It waits for the send to be acknowledged and then sends the next message. It also implies some logic about handling errors by reconnecting.
Your code could look different. You could use the Protocol.connectionMade callback to represent the connection event instead. The core idea is the same - define callbacks to handle certain events when they happen. Whether you use an endpoint's connect Deferred or a protocol's connectionMade doesn't really matter.

Related

Best approach to multiple websocket client connections in Python?

I appreciate that the question I am about to ask is rather broad but, as a newcomer to Python, I am struggling to find the [best] way of doing something which would be trivial in, say, Node.js, and pretty trivial in other environments such as C#.
Let's say that there is a warehouse full of stuff. And let's say that there is a websocket interface onto that warehouse with two characteristics: on client connection it pumps out a full list of the warehouse's current inventory, and it then follows that up with further streaming updates when the inventory changes.
The web is full of examples of how, in Python, you connect to the warehouse and respond to changes in its state. But...
What if I want to connect to two warehouses and do something based on the combined information retrieved separately from each one? And what if I want to do things based on factors such as time, rather than solely being driven by inventory changes and incoming websocket messages?
In all the examples I've seen - and it's beginning to feel like hundreds - there is, somewhere, in some form, a run() or a run_forever() or a run_until_complete() etc. In other words, the I/O may be asynchronous, but there is always a massive blocking operation in the code, and always two fundamental assumptions which don't fit my case: that there will only be one websocket connection, and that all processing will be driven by events sent out by the [single] websocket server.
It's very unclear to me whether the answer to my question is some sort of use of multiple event loops, or of multiple threads, or something else.
To date, experimenting with Python has felt rather like being on the penthouse floor, admiring the quirky but undeniably elegant decor. But then you get in the elevator, press the button marked "parallelism" or "concurrency", and the elevator goes into freefall, eventually depositing you in a basement filled with some pretty ugly and steaming pipes.
... Returning from flowery metaphors back to the technical, the key thing I'm struggling with is the Python equivalent of, say, Node.js code which could be as trivially simple as the following example [left inelegant for simplicity]:
var aggregateState = { ... some sort of representation of combined state ... };
var socket1 = new WebSocket("wss://warehouse1");
socket1.on("message", OnUpdateFromWarehouse);
var socket2 = new WebSocket("wss://warehouse2");
socket2.on("message", OnUpdateFromWarehouse);
function OnUpdateFromWarehouse(message)
{
    ... Take the information and use it to update aggregate state from both warehouses ...
}
Answering my own question, in the hope that it may help other Python newcomers... asyncio seems to be the way to go (though there are gotchas such as the alarming ease with which you can deadlock the event loop).
Assuming the use of an asyncio-friendly websocket module such as websockets, what seems to work is a framework along the following lines - shorn, for simplicity, of logic such as reconnects. (The premise remains a warehouse which sends an initial list of its full inventory, and then sends updates to that initial state.)
import asyncio
from websockets import connect
class Warehouse:
    def __init__(self, warehouse_url):
        self.warehouse_url = warehouse_url
        self.inventory = {}  # Some description of the warehouse's inventory
    async def destroy(self):
        if (self.websocket.open):
            await self.websocket.close()  # Terminates any recv() in wait_for_incoming()
        await self.incoming_message_task  # keep asyncio happy by awaiting the "background" task
    async def start(self):
        try:
            # Connect to the warehouse
            self.websocket = await connect(self.warehouse_url)
            # Get its initial message which describes its full state
            initial_inventory = await self.websocket.recv()
            # Store the initial inventory
            self.process_initial_inventory(initial_inventory)
            # Set up a "background" task for further streaming reads of the web socket
            self.incoming_message_task = asyncio.create_task(self.wait_for_incoming())
            # Done
            return True
        except Exception:
            # Connection failed (or some unexpected error)
            return False
    async def wait_for_incoming(self):
        while self.websocket.open:
            try:
                update_message = await self.websocket.recv()
                asyncio.create_task(self.process_update_message(update_message))
            except Exception:
                # Presumably, socket closure
                pass
    def process_initial_inventory(self, initial_inventory_message):
        ... Process initial_inventory_message into self.inventory ...
    async def process_update_message(self, update_message):
        ... Merge update_message into self.inventory ...
        ... And fire some sort of event so that the object's
        ... creator can detect the change. There seems to be no ...
        ... consensus about what is a pythonic way of implementing events, ...
        ... so I'll declare that - potentially trivial - element as out-of-scope ...
After completing the initial connection logic, one key thing is setting up a "background" task which repeatedly reads further update messages coming in over the websocket. The code above doesn't include any firing of events, but there are all sorts of ways in which process_update_message() could do this (many of them trivially simple), allowing the object's creator to deal with notifications whenever and however it sees fit. The streaming messages will continue to be received, and any events will continue to be fired, for as long as the object's creator continues to play nicely with asyncio and to participate in co-operative multitasking.
With that in place, a connection can be established along the following lines:
async def main():
    warehouse1 = Warehouse("wss://warehouse1")
    if await warehouse1.start():
        ... Connection succeeded. Update messages will now be processed
        in the "background" provided that other users of the event loop
        yield in some way ...
    else:
        ... Connection failed ...
asyncio.run(main())
Multiple warehouses can be initiated in several ways, including doing a create_task(warehouse.start()) on each one and then doing a gather on the tasks to ensure/check that they're all okay.
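The create_task-plus-gather approach could look roughly like this. `StubWarehouse` is a hypothetical stand-in for the Warehouse class, with no real network I/O, and `start_all` is an illustrative helper name rather than part of any library.

```python
import asyncio


class StubWarehouse:
    """Hypothetical stand-in for Warehouse; start() only pretends to connect."""

    def __init__(self, warehouse_url, reachable=True):
        self.warehouse_url = warehouse_url
        self.reachable = reachable

    async def start(self):
        await asyncio.sleep(0)  # yield to the event loop, as a real connect would
        return self.reachable


async def start_all(warehouses):
    """Start every warehouse concurrently; map each URL to its start() result."""
    tasks = [asyncio.create_task(w.start()) for w in warehouses]
    results = await asyncio.gather(*tasks)  # results keep the order of `tasks`
    return {w.warehouse_url: ok for w, ok in zip(warehouses, results)}


async def main():
    warehouses = [
        StubWarehouse("wss://warehouse1"),
        StubWarehouse("wss://warehouse2", reachable=False),
    ]
    return await start_all(warehouses)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because start() reports failure through its return value, the caller can inspect the resulting dictionary and decide whether to carry on with a partial set of warehouses.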
When it's time to quit, to keep asyncio happy, and to stop it complaining about orphaned tasks, and to allow everything to shut down nicely, it's necessary to call destroy() on each warehouse.
But there's one common element which this doesn't cover. Extending the original premise above, let's say that the warehouse also accepts requests from our websocket client, such as "ship X to Y". The success/failure responses to these requests will come in alongside the general update messages; it generally won't be possible to guarantee that the first recv() after the send() of a request will be the response to that request. This complicates process_update_message().
The best answer I've found may or may not be considered "pythonic" because it uses a Future in a way which is strongly analogous to a TaskCompletionSource in .NET.
Let's invent a couple of implementation details; any real-world scenario is likely to look something like this:
We can supply a request_id when submitting an instruction to the warehouse
The success/failure response from the warehouse repeats the request_id back to us (which also distinguishes command-response messages from inventory-update messages)
The first step is to have a dictionary which maps the ID of pending, in-progress requests to Future objects:
def __init__(self, warehouse_url):
    ...
    self.pending_requests = {}
The definition of a coroutine which sends a request then looks something like this:
async def send_request(self, some_request_definition):
    # Allocate a unique ID for the request
    request_id = <some unique request id>
    # Create a Future for the pending request
    request_future = asyncio.Future()
    # Store the map of the ID -> Future in the dictionary of pending requests
    self.pending_requests[request_id] = request_future
    # Build a request message to send to the server, somehow including the request_id
    request_msg = <some request definition, including the request_id>
    # Send the message
    await self.websocket.send(request_msg)
    # Wait for the future to complete - we're now asynchronously awaiting
    # activity in a separate function
    await asyncio.wait_for(request_future, timeout = None)
    # Return the result of the Future as the return value of send_request()
    return request_future.result()
A caller can create a request and wait for its asynchronous response using something like the following:
some_result = await warehouse.send_request(<some request def>)
The key to making this all work is then to modify and extend process_update_message() to do the following:
Distinguish between request responses versus inventory updates
For the former, extract the request ID (which our invented scenario says gets repeated back to us)
Look up the pending Future for the request
Do a set_result() on it (whose value can be anything depending on what the server's response says). This releases send_request() and causes the await from it to be resolved.
For example:
async def process_update_message(self, update_message):
    if <some test that update_message is a request response>:
        request_id = <extract the request ID repeated back in update_message>
        # Get the Future for this request ID
        request_future = self.pending_requests[request_id]
        # Create some sort of return value for send_request() based on the response
        return_value = <some result of the request>
        # Complete the Future, causing send_request() to return
        request_future.set_result(return_value)
    else:
        ... handle inventory updates as before ...
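Putting the two halves together, a self-contained sketch of the Future-per-request idea might look like this. The message shape (dictionaries with `request_id`, `payload` and `result` keys), the `RequestTracker` class and the `fake_send` coroutine are all assumptions made for illustration, since the real wire format is invented in the text above. Unlike the pseudo-code, this version also removes the entry from the dictionary once the request finishes.

```python
import asyncio
import itertools


class RequestTracker:
    """Matches responses to pending requests by ID (hypothetical helper)."""

    def __init__(self, send):
        self._send = send  # coroutine function that transmits one message
        self._ids = itertools.count(1)
        self.pending_requests = {}

    async def send_request(self, payload, timeout=5.0):
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self.pending_requests[request_id] = future
        try:
            await self._send({"request_id": request_id, "payload": payload})
            return await asyncio.wait_for(future, timeout)
        finally:
            # Drop the entry whether the request succeeded, failed or timed out.
            self.pending_requests.pop(request_id, None)

    def process_message(self, message):
        """Return True if the message answered a pending request."""
        future = self.pending_requests.get(message.get("request_id"))
        if future is None or future.done():
            return False  # an inventory update, or a late/unknown response
        future.set_result(message["result"])
        return True


async def demo():
    outbox = []

    async def fake_send(message):  # stands in for websocket.send()
        outbox.append(message)

    tracker = RequestTracker(fake_send)
    request = asyncio.create_task(tracker.send_request("ship X to Y"))
    await asyncio.sleep(0)  # let send_request run up to its wait on the Future

    handled_update = tracker.process_message({"inventory": {"X": 3}})
    handled_reply = tracker.process_message(
        {"request_id": outbox[0]["request_id"], "result": "ok"}
    )
    return handled_update, handled_reply, await request, tracker.pending_requests
```

Running `asyncio.run(demo())` exercises one inventory update and one matched response without any network connection.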
I've not used sockets with asyncio, but you're likely just looking for asyncio's open_connection
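async def socket_activity(address, callback):
    # note: open_connection() takes (host, port) and yields a raw TCP stream,
    # not a WebSocket, so a "wss://" URL will not work here as written
    reader, _ = await asyncio.open_connection(address)
    while True:
        message = await reader.read()
        if not message:  # empty bytes on EOF
            break  # connection was closed
        await callback(message)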
Then add these to the event loop
tasks = []  # keeping a reference prevents these from being garbage collected
for address in ["wss://warehouse1", "wss://warehouse2"]:
    tasks.append(asyncio.create_task(
        socket_activity(address, callback)
    ))
# return tasks  # or work with them
If you want to wait in a coroutine until N operations are complete, you can use .gather()
Alternatively, you may find Tornado does everything you want and more (I based my Answer off this one)
Tornado websocket client: how to async on_message? (coroutine was never awaited)

PyQt and TCP/IP

OK, so I have a pretty simple turn-based application (a game).
Each user sends a request to the server and then waits for the response. The important point is that only ONE user at a time makes a request to the server (sends his actions); all the other users just wait to see whether the server sends them some data, so they must always check (in a loop) whether something is coming from the server.
I'm using Python's built-in "socket" module, and I manage the clients like this: for every user I create one thread running an infinite loop until the application ends, which checks for a request (if it is that user's turn) or checks whether it has anything to send to the other users. Now let's move on to the clients. Every client again has one thread with an infinite loop, waiting for data from the server.
The problem is that the GUI is made in PyQt4.4, where I can't get into PyQt's own loop (I have seen that this is possible with Twisted, but then I would have to rewrite my code), so I have to use a thread. That means I can use either the classic Python threading library or QThread. Sadly, QThread doesn't have any Events, which are pretty crucial because after each message from the server I always want to wait for the program's response, so that I can send that response back to the server. On the other hand, I am not sure whether I can use a Thread from threading to emit signals. So which one is the right way to go?
By the way: is it actually OK to run an infinite loop on both the client and the server side? In every tutorial I have seen, the client closes the connection right after it gets its answer.
EDIT:
Here's some of the code.
Server side loop for connection with client:
while self.running:
if self.is_on_turn == p and self.reply is not None:
cmd = conn.recv(1024)
if cmd == '':
conn.close()
return
cmd = eval(cmd)
if self.is_on_turn != p: # User is not on turn
print "END"
conn.sendall("END")
else:
self.queue.put(cmd)
ev.wait() # Here works another program with the message and decide what to do next
ev.clear() #
conn.sendall(str(self.message))
if self.reply:
ev.wait() #
ev.clear() #
if self.reply:
r = conn.recv(1024)
if r == '':
conn.close()
return
self.queue.put(eval(r))
ev.wait() #
ev.clear() #
conn.sendall(str(self.message))
conn.close()
Client side loop:
def main_loop(self, pipe, conn, e, o): #e is event, o is bool (whether the client has to answer back to the server)
is_on_turn = conn.recv(4096)
pipe.send((is_on_turn))
while True:
if is_on_turn == h or o.value and o.value is not None:
conn.send(str(pipe.recv()))
pipe.send(eval(conn.recv(4096)))
e.wait()
e.clear()
The pipe is there because I originally wrote this with multiprocessing; the PyQt signal emit should go there instead, but as I said, I am not sure which approach to use.
So the result is that I just used QTcpServer and QTcpSocket, as suggested by ekhumoro, which resulted in much cleaner code and easier management :)

Alternative to a while loop in twisted which doesn't block the reactor thread

I'm making a chat application in twisted. Suppose my server is designed in such a way that whenever it detects a client online, it sends the client all the pending-messages (those messages of that client which were cached in a python-list on the server because it was offline) one-by-one in a while loop until the list is exhausted. Something like this:
class MyChat(LineReceiver):
    def connectionMade(self):
        self.factory.clients.append(self)
        while True:
            #retrieve first message from a list of pending-messages(queue) of "self"
            msg = self.retrieveFromQueue(self)
            if msg != "empty":
                self.transport.write(msg)
            else:
                break
    def lineReceived(self, line):
        ...
    def connectionLost(self, reason):
        ...
    def retrieveFromQueue(self, who):
        msglist = []
        if who in self.factory.userMessages:
            msglist = self.factory.userMessages[who]
        if msglist != []:
            msg = msglist.pop(0) #msglist is a list of strings
            self.factory.userMessages[self] = msglist
            return msg
        else:
            return "empty"
factory.userMessages = {} #dict of list of incoming messages of users who aren't online
So according to my understanding of Twisted, the while loop will block the main reactor thread and any interaction from any other client with the server will not be registered by the server. If that's the case, I want an alternate code/method to this approach which will not block the twisted thread.
Update: There may be 2000-3000 pending messages per user because of the nature of the app.
I think that https://glyph.twistedmatrix.com/2011/11/blocking-vs-running.html addresses this point.
The answer here depends on what exactly self.retrieveFromQueue(self) does. You implied it's something like:
if self.list_of_messages:
    return self.list_of_messages.pop(0)
return b"empty"
If this is the case, then the answer is one thing. On the other hand, if the implementation is something more like:
return self.remote_mq_client.retrieve_queue_item(self.queue_identifier)
then the answer might be something else entirely. However, note that it's the implementation of retrieveFromQueue upon which the answer appears to hinge.
That there is a while loop isn't quite as important. The while loop reflects the fact that (to use Glyph's words), this code is getting work done.
You may decide that the amount of work this loop represents is too great to all get done at one time. If there are hundreds of millions of queued messages then copying them one by one into the connection's send buffer will probably use a noticeable amount of both time and memory. In this case, you may wish to consider the producer/consumer pattern and its support in Twisted. This won't make the code any less (or more) "blocking" but it will make it run for shorter periods of time at a time.
So the questions to answer here are really:
whether or not retrieveFromQueue blocks
if it does not block, whether or not there will be so many queued messages that processing them all will cause connectionMade to run for so long that other clients notice a disruption in service

How can I create a non-http proxy with Twisted

How can I create a non-HTTP proxy with Twisted? I would like to do it for the Terraria protocol, which is made entirely of binary data. I see that Twisted has a built-in proxy for HTTP connections, but this application needs to act more like an entry point which forwards to a set server (almost like a BNC on IRC).
I can't figure out how to read the data off of one connection and send it to the other connection.
I have already tried using a socket for this task, but the blocking recv and send methods do not work well as two connections need to be live at the same time.
There are several different ways to create proxies in Twisted. The basic technique is built on peering, by taking two different protocols, on two different ports, and somehow gluing them together so that they can exchange data with each other.
The simplest proxy is a port-forwarder. Twisted ships with a port-forwarder implementation, see http://twistedmatrix.com/documents/current/api/twisted.protocols.portforward.html for the (underdocumented) classes ProxyClient and ProxyServer, although the actual source at http://twistedmatrix.com/trac/browser/tags/releases/twisted-11.0.0/twisted/protocols/portforward.py might be more useful to read through. From there, we can see the basic technique of proxying in Twisted:
def dataReceived(self, data):
    self.peer.transport.write(data)
When a proxying protocol receives data, it puts it out to the peer on the other side. That's it! Quite simple. Of course, you'll usually need some extra setup... Let's look at a couple of proxies I've written before.
This is a proxy for Darklight, a little peer-to-peer system I wrote. It is talking to a backend server, and it wants to only proxy data if the data doesn't match a predefined header. You can see that it uses ProxyClientFactory and endpoints (fancy ClientCreator, basically) to start proxying, and when it receives data, it has an opportunity to examine it before continuing, either to keep proxying or to switch protocols.
class DarkServerProtocol(Protocol):
    """
    Shim protocol for servers.
    """
    peer = None
    buf = ""
    def __init__(self, endpoint):
        self.endpoint = endpoint
        print "Protocol created..."
    def challenge(self, challenge):
        log.msg("Challenged: %s" % challenge)
        # ...omitted for brevity...
        return is_valid(challenge)
    def connectionMade(self):
        pcf = ProxyClientFactory()
        pcf.setServer(self)
        d = self.endpoint.connect(pcf)
        d.addErrback(lambda failure: self.transport.loseConnection())
        self.transport.pauseProducing()
    def setPeer(self, peer):
        # Our proxy passthrough has succeeded, so we will be seeing data
        # coming through shortly.
        log.msg("Established passthrough")
        self.peer = peer
    def dataReceived(self, data):
        self.buf += data
        # Examine whether we have received a challenge.
        if self.challenge(self.buf):
            # Excellent; change protocol.
            p = DarkAMP()
            p.factory = self.factory
            self.transport.protocol = p
            p.makeConnection(self.transport)
        elif self.peer:
            # Well, go ahead and send it through.
            self.peer.transport.write(data)
This is a rather complex chunk of code which takes two StatefulProtocols and glues them together rather forcefully. This is from a VNC proxy (https://code.osuosl.org/projects/twisted-vncauthproxy to be precise), which needs its protocols to do a lot of pre-authentication stuff before they are ready to be joined. This kind of proxy is the worst case; for speed, you don't want to interact with the data going over the proxy, but you need to do some setup beforehand.
def start_proxying(result):
    """
    Callback to start proxies.
    """
    log.msg("Starting proxy")
    client_result, server_result = result
    success = True
    client_success, client = client_result
    server_success, server = server_result
    if not client_success:
        success = False
        log.err("Had issues on client side...")
        log.err(client)
    if not server_success:
        success = False
        log.err("Had issues on server side...")
        log.err(server)
    if not success:
        log.err("Had issues connecting, disconnecting both sides")
        if not isinstance(client, Failure):
            client.transport.loseConnection()
        if not isinstance(server, Failure):
            server.transport.loseConnection()
        return
    server.dataReceived = client.transport.write
    client.dataReceived = server.transport.write
    # Replay last bits of stuff in the pipe, if there's anything left.
    data = server._sful_data[1].read()
    if data:
        client.transport.write(data)
    data = client._sful_data[1].read()
    if data:
        server.transport.write(data)
    server.transport.resumeProducing()
    client.transport.resumeProducing()
    log.msg("Proxying started!")
So, now that I've explained that...
I also wrote Bravo. As in, http://www.bravoserver.org/. So I know a bit about Minecraft, and thus about Terraria. You will probably want to parse the packets coming through your proxy on both sides, so your actual proxying might start out looking like this, but it will quickly evolve as you begin to understand the data you're proxying. Hopefully this is enough to get you started!

python twisted - Timeouting on a sent message that did not get a response

I am creating a sort of a client-server implementation, and I'd like to make sure that every sent message gets a response. So I want to create a timeout mechanism, which doesn't check if the message itself is delivered, but rather checks if the delivered message gets a response.
IE, for two computers 1 and 2:
1: send successfully: "hello"
2: <<nothing>>
...
1: Didn't get a response for my "hello" --> timeout
I thought of doing it by creating a big boolean array with id for each message, which will hold a "in progress" flag, and will be set when the message's response is received.
I was wondering perhaps there was a better way of doing that.
Thanks,
Ido.
There is a better way, which funnily enough I myself just implemented here. It uses the TimeoutMixin to achieve the timeout behaviour you need, and a DeferredLock to match up the correct replies with what was sent.
from twisted.internet import defer
from twisted.protocols.policies import TimeoutMixin
from twisted.protocols.basic import LineOnlyReceiver
class PingPongProtocol(LineOnlyReceiver, TimeoutMixin):
    DEFAULT_TIMEOUT = 5  # seconds; an assumed value, pick one that suits your protocol
    def __init__(self):
        self.lock = defer.DeferredLock()
        self.deferred = None
    def sendMessage(self, msg):
        # The lock ensures only one message is awaiting a reply at a time
        result = self.lock.run(self._doSend, msg)
        return result
    def _doSend(self, msg):
        assert self.deferred is None, "Already waiting for reply!"
        self.deferred = defer.Deferred()
        self.deferred.addBoth(self._cleanup)
        self.setTimeout(self.DEFAULT_TIMEOUT)
        self.sendLine(msg)
        return self.deferred
    def _cleanup(self, res):
        self.deferred = None
        return res
    def lineReceived(self, line):
        if self.deferred:
            self.setTimeout(None)
            self.deferred.callback(line)
        # If not, we've timed out or this is a spurious line
    def timeoutConnection(self):
        if self.deferred:
            self.deferred.errback(
                defer.TimeoutError("Some informative message"))
I haven't tested this, it's more of a starting point. There are a few things you might want to change here to suit your purposes:
I use a LineOnlyReceiver — that's not relevant to the problem itself, and you'll need to replace sendLine/lineReceived with the appropriate API calls for your protocol.
This is for a serial connection, so I don't deal with connectionLost etc. You might need to.
I like to keep state directly in the instance. If you need extra state information, set it up in _doSend and clean it up in _cleanup. Some people don't like that — the alternative is to create nested functions inside _doSend that close over the state information that you need. You'll still need that self.deferred there though, otherwise lineReceived (or dataReceived) has no idea what to do.
How to use it
Like I said, I created this for serial communications, where I don't have to worry about factories, connectTCP, etc. If you're using TCP communications, you'll need to figure out the extra glue you need.
# Create the protocol somehow. Maybe this actually happens in a factory,
# in which case, the factory could have wrapper methods for this.
protocol = PingPongProtocol()
d = protocol.sendMessage("Hi there!")
d.addCallbacks(gotHiResponse, noHiResponse)
