Twisted: callbacks design for a chatty protocol

I am using Twisted to handle a text-based protocol. Initially the client connects to the server. After connecting, the server sends commands to which the client must respond. Each type of command takes a different amount of time to answer (e.g., returning a value from a local hash versus returning the result of a complex MySQL query on a big database). The client's responses must go out in the order the commands were received. The server does not wait for the response to one command before sending the next, but it expects the responses in the order the commands were sent. The client cannot assume any particular order for the commands the server sends.
The following minimal code outlines how my program currently works.
from twisted.protocols.basic import LineReceiver
from twisted.python import log
class ExternalListener(LineReceiver):
    def connectionMade(self):
        log.msg("Listener: New connection")
    def lookupMethod(self, command):
        return getattr(self, 'do_' + command.lower(), None)
    def lineReceived(self, verb):
        method = self.lookupMethod(verb)
        if method is not None:  # ignore unknown commands
            method(verb)
    # getResult (not shown) returns a Deferred that fires with the result.
    def do_cmd1(self, verb):
        d = self.getResult(verb)
        d.addCallback(self._cbValidate1)
    def _cbValidate1(self, result):
        resp = "response"
        self.transport.write(resp)
    def do_cmd2(self, verb):
        d = self.getResult(verb)
        d.addCallback(self._cbValidate2)
    def _cbValidate2(self, result):
        resp = "response"
        self.transport.write(resp)
As can be seen, this does not take care of the ordering of responses. I cannot use DeferredList, because the Deferreds are created one at a time as commands arrive, so there is never a ready-made list of Deferreds to put in a DeferredList.
What is the Twisted way to handle this scenario?
Thanks and Regards,

One solution is to use a protocol with tags for requests and responses. This means you can generate responses in any order. See AMP (or even IMAP4) as an example of such a protocol.
However, if the protocol is out of your control and you cannot fix it, then a not-quite-as-nice solution is to buffer the responses to ensure proper ordering.
I don't think there is any particularly Twisted solution here; it's just a matter of holding responses to newer requests until all of the responses to older requests have been sent. That's probably a matter of using a counter internally to assign an ordering to responses, and then implementing the logic to either buffer a response if it needs to wait, or send it along with all appropriate buffered responses if it doesn't.
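The counter-and-buffer idea can be sketched without any Twisted code at all. This is only an illustration under assumptions: the class name OrderedResponder and its register/complete methods are made up for this example, and `send` stands in for whatever actually writes to the connection (for instance self.transport.write).

```python
class OrderedResponder:
    """Release responses in the order their requests were received."""

    def __init__(self, send):
        self._send = send        # callable that actually writes a response
        self._next_seq = 0       # sequence number for the next incoming request
        self._next_to_send = 0   # sequence number whose response must go out next
        self._pending = {}       # finished responses still waiting for older ones

    def register(self):
        """Call when a command arrives; returns its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        return seq

    def complete(self, seq, response):
        """Call when the response for `seq` is ready, in any order."""
        self._pending[seq] = response
        # Flush every response that is now unblocked, oldest first.
        while self._next_to_send in self._pending:
            self._send(self._pending.pop(self._next_to_send))
            self._next_to_send += 1


sent = []
responder = OrderedResponder(sent.append)
first = responder.register()
second = responder.register()
third = responder.register()
responder.complete(second, "resp-2")  # held back: resp-1 is not ready yet
responder.complete(first, "resp-1")   # releases resp-1, then resp-2
responder.complete(third, "resp-3")   # nothing older is pending, sent at once
```

In the protocol from the question, lineReceived would presumably call register() before dispatching, and each Deferred callback would call complete(seq, resp) instead of writing to the transport directly.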

Related

Does setting socket timeout cancel the initial request

I have a request that can only run once. At times, the request takes much longer than it should.
If I were to set a default socket timeout value (using socket.setdefaulttimeout(5)), and it took longer than 5 seconds, will the original request be cancelled so it's safe to retry (see example code below)?
If not, what is the best way to cancel the original request and retry it again ensuring it never runs more than once.
import socket
from googleapiclient.discovery import build
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type
@retry(
    retry=retry_if_exception_type(socket.timeout),
    wait=wait_fixed(4),
    stop=stop_after_attempt(3)
)
def create_file_once_only(creds, body):
    service = build('drive', 'v3', credentials=creds)
    file = service.files().create(body=body, fields='id').execute()
socket.setdefaulttimeout(5)
create_file_once_only(creds, body)

It's unlikely that this can be made to work as you hope. An HTTP POST (as with any other HTTP request) is implemented by sending a command to the web server, then receiving a response. Your HTTP client library encapsulates a lot of the tedious parts of that for you, but at the core, it's going to do a socket send followed by a socket recv (it may of course require more than one send or recv depending on the size of the data).
Now, if you were able to connect to the web server initially (again, this is taken care of for you by the library and typically only takes a few milliseconds), then it's highly likely that the data in your POST request has long since been sent. (If the data you are sending is megabytes long, it's possible that it's only been partially sent, but if it is reasonably short, it's almost certainly been sent in full.)
That in turn means that in all likelihood the server has received your entire request and is working on it or has enqueued your request to work on it eventually. In either case, even if you break the connection to the server by timing out on the recv, it's unlikely that the server will actually even notice that until it gets to the point in its execution where it would be sending its response to your request. By that point, it has probably finished doing whatever it was going to do.
In other words, your socket timeout is not going to apply to the "HTTP request" -- it applies to the underlying socket operations instead -- and almost certainly to the recv part on the tail end. And just breaking the socket connection doesn't cancel the HTTP request.
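To make this concrete, here is a small self-contained sketch using plain sockets on the loopback interface. It is not the Drive API: the "server" is a hypothetical thread that reads one request, takes half a second to process it, and records that it did the work. The client gives up after 0.1 seconds, yet the work still happens.

```python
import socket
import threading
import time

performed = []  # side effects the stand-in "server" actually carried out


def slow_server(listener):
    conn, _ = listener.accept()
    with conn:
        request = conn.recv(1024)  # the request arrives almost immediately
        time.sleep(0.5)            # simulate slow processing
        performed.append(request)  # the work is done whether or not the client waits
        try:
            conn.sendall(b"done")
        except OSError:
            pass  # the client has already gone away


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
worker = threading.Thread(target=slow_server, args=(listener,))
worker.start()

client = socket.create_connection(listener.getsockname(), timeout=0.1)
client.sendall(b"create-file")
try:
    client.recv(1024)  # this is where the timeout applies
    timed_out = False
except socket.timeout:
    timed_out = True
finally:
    client.close()

worker.join()
listener.close()
```

After this runs, the client has timed out but `performed` still contains the request, which is why a blind retry would perform the operation a second time.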
There is no reliable way to do what you want without designing a transactional protocol with the close cooperation of the HTTP server.
With the cooperation of the HTTP server, though, you could approximate it:
Create a unique ID (UUID or the like)
Send a request to the server that contains that UUID along with the other account info (name, password, whatever else)
The server then only creates the account if it hasn't already created an account with the same unique ID.
That way, you can request the operation multiple times, but know that it will only actually be implemented once. If asked to do the same operation a second time, the server would simply respond with "yep, already did that".
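The server side of that scheme can be sketched as follows. This is an in-memory illustration only: IdempotentService, its create method, and the generated file IDs are hypothetical names for this example, not part of any real API.

```python
import uuid


class IdempotentService:
    """Stand-in for a server that deduplicates requests by a client-chosen ID."""

    def __init__(self):
        self._results = {}  # request ID -> result of the first execution
        self.created = []   # what was actually created

    def create(self, request_id, body):
        # A repeated ID means a retry: answer "already did that" with the old result.
        if request_id in self._results:
            return self._results[request_id]
        file_id = "file-%d" % (len(self.created) + 1)
        self.created.append(body)
        self._results[request_id] = file_id
        return file_id


service = IdempotentService()
request_id = str(uuid.uuid4())  # generated once, reused for every retry
first = service.create(request_id, {"name": "report.txt"})
retried = service.create(request_id, {"name": "report.txt"})
```

The important detail is that the client generates the ID once, before the first attempt, and sends the same ID on every retry.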

Prevent python2's TCPThreadingServer from calling handle on the request handler, on multiple requests

I'm trying to write a Network Application using Python's socket and SocketServer Modules.
In the Network Model, there are only clients (nodes).
Each node is connected to some other nodes(Neighbours) and can interchange "messages" with them.
There are two types of messages request_data and response_data, the response_data string is a message generated based on a request_data message (messages are basically two line strings).
In order for a Node to generate a response_data message, it must send request_data messages to the nodes it's connected to, and generate the response_data based on the received data.
I'm implementing these Connections using TCP i.e: when two nodes are connected (using socket.connect() and socket.accept()) they will stay connected and will pass messages from the same connection.
Now here's the problem.
I've implemented the nodes using SocketServer.ThreadingTCPServer and a custom request handler, so when a node gets a request_data it sends request_data messages to its neighbours. But when it gets the responses, the ThreadingTCPServer might treat them as a new request (I assume that's how select.select works when there's data to be read), and I might not be able to read the response from the place where I sent the request, because the ThreadingTCPServer has instantiated a new request handler instead.
Basically I'm doing this in my request handler and I'm afraid it might not work:
# conn : a connected socket object created from socket.accept
conn.sendall(requestMessage)
# I think this will not work because it might be considered a new request by the ThreadingTCPServer
response = conn.recv(1024)
I haven't actually tried this and don't know whether it will work. Even if it works in some limited tests, I can't be sure it will always work, since the problem (if it does in fact exist) stems from a race condition.
So does this work? If not, what other approaches can I take without reinventing the wheel?

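This approach does indeed work. Each accepted TCP connection gets its own socket, separate from the listening socket, and the server only creates a new request handler when a new connection is accepted. Data arriving on an already-established connection is delivered to that connection's socket, so the recv() in your handler will receive it; it has nothing to do with the socket the server is listening on.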
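A minimal sketch of this, using plain sockets on the loopback interface rather than ThreadingTCPServer (the message contents are placeholders): once a connection has been accepted, whatever the peer sends arrives on that connection's own socket.

```python
import socket

# Listening socket: its only job is to accept new connections.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)

# A neighbour connects; accept() returns a new socket for this connection.
neighbour = socket.create_connection(listener.getsockname())
conn, _ = listener.accept()

# The handler side sends a request over the accepted connection...
conn.sendall(b"request_data\n")
received_request = neighbour.recv(1024)

# ...and the neighbour's answer comes back on that same connection.
neighbour.sendall(b"response_data\n")
response = conn.recv(1024)

for s in (conn, neighbour, listener):
    s.close()
```

Real code would still need to frame messages (for example by reading until the line terminator), since a single recv() is not guaranteed to return a whole message.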

Python HTTP client with request pipelining

The problem: I need to send many HTTP requests to a server. I can only use one connection (non-negotiable server limit). The server's response time plus the network latency is too high – I'm falling behind.
The requests typically don't change server state and don't depend on the previous request's response. So my idea is to simply send them on top of each other, enqueue the response objects, and depend on the Content-Length: of the incoming responses to feed incoming replies to the next-waiting response object. In other words: Pipeline the requests to the server.
This is of course not entirely safe (any reply without Content-Length: means trouble), but I don't care -- in that case I can always retry any queued requests. (The safe way would be to wait for the header before sending the next bit. That might help me enough. No way to test beforehand.)
So, ideally I want the following client code (which uses client delays to mimic network latency) to run in three seconds.
Now for the $64000 question: Is there a Python library which already does this, or do I need to roll my own? My code uses gevent; I could use Twisted if necessary, but Twisted's standard connection pool does not support pipelined requests. I also could write a wrapper for some C library if necessary, but I'd prefer native code.
#!/usr/bin/python
import gevent.pool
from gevent import sleep
from time import time
from geventhttpclient import HTTPClient
url = 'http://local_server/100k_of_lorem_ipsum.txt'
http = HTTPClient.from_url(url, concurrency=1)
def get_it(http):
    print time(),"Queueing request"
    response = http.get(url)
    print time(),"Expect header data"
    # Do something with the header, just to make sure that it has arrived
    # (the greenlet should block until then)
    assert response.status_code == 200
    # Header values are strings, so convert before comparing
    assert int(response["content-length"]) > 0
    for h in response.items():
        pass
    print time(),"Wait before reading body data"
    # Now I can read the body. The library should send at
    # least one new HTTP request during this time.
    sleep(2)
    print time(),"Reading body data"
    while response.read(10000):
        pass
    print time(),"Processing my response"
    # The next request should definitely be transmitted NOW.
    sleep(1)
    print time(),"Done"
# Run parallel requests
pool = gevent.pool.Pool(3)
for i in range(3):
    pool.spawn(get_it, http)
pool.join()
http.close()

Dugong is an HTTP/1.1-only client which claims to support real HTTP/1.1 pipelining. The tutorial includes several examples on how to use it, including one using threads and another using asyncio.
Be sure to verify that the server you're communicating with actually supports HTTP/1.1 pipelining—some servers claim to support HTTP/1.1 but don't implement pipelining.
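If you do end up rolling your own, the framing step the question describes (using Content-Length to hand each incoming reply to the next waiting request) might look roughly like the sketch below. The function name split_pipelined_responses is made up for this example, and it deliberately ignores chunked transfer encoding and responses that carry no body; as the question notes, a reply without Content-Length cannot be framed this way.

```python
def split_pipelined_responses(buffer):
    """Split bytes holding back-to-back HTTP responses.

    Returns (responses, remainder). Each response is a (head, body) pair;
    remainder is the trailing data that does not yet form a full response.
    """
    responses = []
    while True:
        head_end = buffer.find(b"\r\n\r\n")
        if head_end == -1:
            break  # header block not complete yet
        head = buffer[:head_end]
        length = None
        for line in head.split(b"\r\n")[1:]:  # skip the status line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("response without Content-Length cannot be framed")
        body_start = head_end + 4
        if len(buffer) < body_start + length:
            break  # body not complete yet
        responses.append((head, buffer[body_start:body_start + length]))
        buffer = buffer[body_start + length:]
    return responses, buffer


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nabc"
    b"HTTP/1.1 200 OK\r\nContent-Le"  # third response still arriving
)
complete, remainder = split_pipelined_responses(stream)
```

Each completed (head, body) pair would be handed to the oldest request still waiting, and the remainder kept until more data arrives on the connection.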

I think txrequests could get you most of what you are looking for, using the background_callback to enqueue processing of responses on a separate thread. Each request would still run on its own thread, but using a session means that by default it would reuse the same connection.
https://github.com/tardyp/txrequests#working-in-the-background

It seems you are running Python 2.
With Python 3.5 or later you could use async/await with an asyncio event loop; see the asyncio documentation.
There is also Trio, a separate async library (not built on asyncio) that aims to be easier to use; it is available on pip.
Another option I can think of is multiple threads with locks, though I am not sure whether that would work here.

Twisted, FTP, and "streaming" large files

I'm attempting to implement what can best be described as "an FTP interface to an HTTP API". Essentially, there is an existing REST API that can be used to manage a user's files for a site, and I'm building a mediator server that re-exposes this API as an FTP server. So you can login with, say, Filezilla and list your files, upload new ones, delete old ones, etc.
I'm attempting this with twisted.protocols.ftp for the (FTP) server, and twisted.web.client for the (HTTP) client.
The thing I'm running up against is, when a user tries to download a file, "streaming" that file from an HTTP response to my FTP response. Similar for uploading.
The most straightforward approach would be to download the entire file from the HTTP server, then turn around and send the contents to the user. The problem with this is that any given file could be many gigabytes large (think drive images, ISO files, etc). With this approach, though, the contents of the file would be held in memory between the time I download it from the API and the time I send it to the user - not good.
So my solution is to try to "stream" it - as I get chunks of data from the API's HTTP response, I just want to turn around and send those chunks along to the FTP user. Seems straightforward.
For my "custom FTP functionality", I'm using a subclass of ftp.FTPShell. The reading method of this, openForReading, returns a Deferred that fires with an implementation of IReadFile.
Below is my (initial, simple) implementation for "streaming HTTP". I use the fetch function to setup an HTTP request, and the callback I pass in gets called with each chunk I get from the response.
I thought I could use some sort of two-ended buffer object to transport the chunks between the HTTP and FTP, by using the buffer object as the file-like object required by ftp._FileReader, but that's quickly proving not to work, as the consumer from the send call almost immediately closes the buffer (because it's returning an empty string, because there's no data to read yet, etc). Thus, I'm "sending" empty files before I even start receiving the HTTP response chunks.
Am I close, but missing something? Am I on the wrong path altogether? Is what I want to do really impossible (I highly doubt that)?
from twisted.web import client
import urlparse
class HTTPStreamer(client.HTTPPageGetter):
    def __init__(self):
        self.callbacks = []
    def addHandleResponsePartCallback(self, callback):
        self.callbacks.append(callback)
    def handleResponsePart(self, data):
        for cb in self.callbacks:
            cb(data)
        client.HTTPPageGetter.handleResponsePart(self, data)
class HTTPStreamerFactory(client.HTTPClientFactory):
    protocol = HTTPStreamer
    def __init__(self, *args, **kwargs):
        client.HTTPClientFactory.__init__(self, *args, **kwargs)
        self.callbacks = []
    def addChunkCallback(self, callback):
        self.callbacks.append(callback)
    def buildProtocol(self, addr):
        p = client.HTTPClientFactory.buildProtocol(self, addr)
        for cb in self.callbacks:
            p.addHandleResponsePartCallback(cb)
        return p
def fetch(url, callback):
    parsed = urlparse.urlsplit(url)
    f = HTTPStreamerFactory(parsed.path)
    f.addChunkCallback(callback)
    from twisted.internet import reactor
    reactor.connectTCP(parsed.hostname, parsed.port or 80, f)
As a side note, this is only my second day with Twisted - I spent most of yesterday reading through Dave Peticolas' Twisted Introduction, which has been a great starting point, even if based on an older version of twisted.
That said, I may be doing things wrong.

I thought I could use some sort of two-ended buffer object to transport the chunks between the HTTP and FTP, by using the buffer object as the file-like object required by ftp._FileReader, but that's quickly proving not to work, as the consumer from the send call almost immediately closes the buffer (because it's returning an empty string, because there's no data to read yet, etc). Thus, I'm "sending" empty files before I even start receiving the HTTP response chunks.
Instead of using ftp._FileReader, you want something that will do a write whenever a chunk arrives from your HTTPStreamer to a callback it supplies. You never need/want to do a read from a buffer on the HTTP, because there's no reason to even have such a buffer. As soon as HTTP bytes arrive, write them to the consumer. Something like...
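from zope.interface import implements
from twisted.protocols.ftp import IReadFile
class FTPStreamer(object):
    implements(IReadFile)
    def __init__(self, url):
        self.url = url
    def send(self, consumer):
        # Forward each HTTP chunk to the FTP consumer as soon as it arrives.
        fetch(self.url, consumer.write)
        # You also need a Deferred to return here, so the
        # FTP implementation knows when you're done.
        return someDeferred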
You may also want to use Twisted's producer/consumer interface to allow the transfer to be throttled, as may be necessary if your connection to the HTTP server is faster than your user's FTP connection to you.

How can I make an http request without getting back an http response in Python?

I want to send it and forget it. The http rest service call I'm making takes a few seconds to respond. The goal is to avoid waiting those few seconds before more code can execute.
I'd rather not use python threads
I'll use twisted async calls if I must and ignore the response.

You are going to have to implement that asynchronously, because HTTP is a request/response protocol: every request gets a reply.
Another option would be to work directly with the socket, bypassing any pre-built module. This would allow you to violate protocol and write your own bit that ignores any responses, in essence dropping the connection after it has made the request.
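A rough sketch of that raw-socket approach, assuming plain HTTP (no TLS) and a simple GET. The function name fire_and_forget is made up for this example. Note that some servers may abandon a request when the client disconnects before the response is sent, so whether this is acceptable depends on the service.

```python
import socket


def fire_and_forget(host, port, path="/", timeout=5.0):
    """Send a minimal HTTP GET and close the socket without reading a reply."""
    request = (
        "GET {path} HTTP/1.1\r\n"
        "Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).format(path=path, host=host).encode("ascii")
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(request)  # blocks only until the bytes are handed to the OS
    finally:
        sock.close()           # never wait for, or read, the response
    return len(request)
```

A call such as fire_and_forget("example.com", 80, "/ping") (host and path are placeholders) still blocks for the TCP connect and the send, but it returns without waiting for the few seconds the service needs to respond.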

HTTP implies a request and a reply for that request. Go with an async approach.

You do not need Twisted for this; urllib2 will do. See http://pythonquirks.blogspot.com/2009/12/asynchronous-http-request.html
I am copying the relevant code here but the credit goes to that link:
import urllib2
class MyHandler(urllib2.HTTPHandler):
    def http_response(self, req, response):
        return response
o = urllib2.build_opener(MyHandler())
o.open('http://www.google.com/')
