Using python/tornado I wanted to set up a little "trampoline" server that allows two devices to communicate with each other in a RESTish manner. There's probably vastly superior/simpler "off the shelf" ways to do this. I'd welcome those suggestions, but I still feel it would be educational to figure out how to do my own using tornado.
Basically, the idea was that I would have the device in the role of server doing a long poll with a GET. The client device would POST to the server, at which point the POST body would be transferred as the response of the blocked GET. The POST would then block until the server side does a PUT with the response, which is transferred to the blocked POST and returned to the client device. I thought maybe I could do this with tornado.queues, but that doesn't appear to have worked out. My code:
import tornado
import tornado.web
import tornado.httpserver
import tornado.ioloop
import tornado.queues

ToServerQueue = tornado.queues.Queue()
ToClientQueue = tornado.queues.Queue()

class Query(tornado.web.RequestHandler):
    def get(self):
        toServer = ToServerQueue.get()
        self.write(toServer)

    def post(self):
        toServer = self.request.body
        ToServerQueue.put(toServer)
        toClient = ToClientQueue.get()
        self.write(toClient)

    def put(self):
        ToClientQueue.put(self.request.body)
        self.write(bytes())

services = tornado.web.Application([(r'/query', Query)], debug=True)
services.listen(49009)
tornado.ioloop.IOLoop.instance().start()
Unfortunately, ToServerQueue.get() does not actually block until the queue has an item, but rather returns a tornado.concurrent.Future, which is not a legal value to pass to the self.write() call.
I guess my general question is twofold:
1) How can one HTTP verb invocation (e.g. get, put, post, etc) block and then be signaled by another HTTP verb invocation.
2) How can I share data from one invocation to another?
I've only really scratched the simple/straightforward use cases of making little REST servers with tornado. I wonder if the coroutine stuff is what I need, but haven't found a good tutorial/example of that to help me see the light, if that's indeed the way to go.
1) How can one HTTP verb invocation (e.g. get, put, post, etc) block and then be signaled by another HTTP verb invocation.
2) How can I share data from one invocation to another?
A new RequestHandler object is created for every request, so you need some coordinator, e.g. queues or locks with a state object (in your case that would amount to re-implementing a queue).
tornado.queues are queues for coroutines. Queue.get, Queue.put and Queue.join return Future objects that need to be "resolved" - the scheduled task completes either with success or with an exception. To wait until a future is resolved you should yield it (just like in the doc examples of tornado.queues). The verb methods also need to be decorated with tornado.gen.coroutine.
import tornado.gen

class Query(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        toServer = yield ToServerQueue.get()
        self.write(toServer)

    @tornado.gen.coroutine
    def post(self):
        toServer = self.request.body
        yield ToServerQueue.put(toServer)
        toClient = yield ToClientQueue.get()
        self.write(toClient)

    @tornado.gen.coroutine
    def put(self):
        yield ToClientQueue.put(self.request.body)
        self.write(bytes())
The GET request will wait (in a non-blocking manner) until something is available on the queue (or until a timeout, which can be passed as an argument to Queue.get).
tornado.queues.Queue also provides get_nowait (and put_nowait) which does not have to be yielded - it immediately returns an item from the queue or raises an exception.
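As an illustration, here is a minimal sketch of the long-polling GET with the timeout mentioned above (the handler name and the 30-second value are mine, not from the question; tornado.gen.TimeoutError is raised when the deadline passes):

import datetime
import tornado.gen
import tornado.web

class QueryWithTimeout(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        try:
            # Wait at most 30 seconds for the client device to POST something
            toServer = yield ToServerQueue.get(timeout=datetime.timedelta(seconds=30))
            self.write(toServer)
        except tornado.gen.TimeoutError:
            # Nothing arrived in time; tell the long-polling device to retry
            self.set_status(204)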
I have a Flask server that accepts HTTP requests from a client. This HTTP server needs to delegate work to a third-party server using a websocket connection (for performance reasons).
I find it hard to wrap my head around how to create a permanent websocket connection that can stay open for HTTP requests. Sending requests to the websocket server in a run-once script works fine and looks like this:
import asyncio
import json

import websockets

async def send(websocket, payload):
    await websocket.send(json.dumps(payload).encode("utf-8"))

async def recv(websocket):
    data = await websocket.recv()
    return json.loads(data)

async def main(payload):
    uri = f"wss://the-third-party-server.com/xyz"
    async with websockets.connect(uri) as websocket:
        future = send(websocket, payload)
        future_r = recv(websocket)
        _, output = await asyncio.gather(future, future_r)
        return output

asyncio.get_event_loop().run_until_complete(main({...}))
Here, main() establishes a WSS connection and closes it when done, but how can I keep that connection open for incoming HTTP requests, such that I can call main() for each of those without re-establishing the WSS connection?
The main problem there is that when you write a web app responding to http(s), your code has a "life cycle" that is very peculiar to that: usually you have a "view" function that gets the request data, performs whatever actions are needed to gather the response data, and returns it.
This "view" function in most web frameworks has to be independent from the rest of the system - it should be able to perform its duty relying on no data or objects other than what it gets when called, which are the request data and system configuration. This means the application server (the framework parts designed to actually connect your program to the internet) can choose a variety of ways to serve your program: it may run your view function in several parallel threads, in several parallel processes, or even in different processes in various containers or physical servers; your application does not need to care about that.
If you want a resource that is available across calls to your view functions, you need to break out of this paradigm. For example, frameworks will typically create a pool of database connections, so that views in the same process can reuse those connections. These database connections are usually supplied by the framework itself, which implements a mechanism allowing them to be reused and to be available in a transparent way, as needed. You have to recreate a mechanism of the same sort if you want to keep a websocket connection alive.
In a certain way, you need a Python object that can mediate your websocket data, behaving like a "server" for your web view functions.
That is simpler to do than it sounds - a special Python class designed to have a single instance per process, which keeps the connection and is able to send and receive data from parallel calls without mangling it, is enough. A callable that ensures this instance exists in the current process is enough to work under any strategy configured to serve your app to the web.
If you are using Flask, which does not use asyncio, you get a further complication - you lose the async ability inside your views; they will have to wait for the websocket request to complete, and it is then the job of your application server to run your views in different threads or processes to ensure availability. It is also your job to run the asyncio loop for your websocket in a separate thread, so that it can make the requests it needs.
Here is some example code. Please note that, apart from using a single websocket per process, this has no provisions in case of failure of any kind. Most important: it does nothing in parallel - all send-recv pairs are blocking, as you give no clue of a mechanism that would allow one to pair each outgoing message with its response.
import asyncio
import json
import threading
from queue import Queue

import websockets

class AWebSocket:
    instance = None

    def __new__(cls, *args, **kw):
        if cls.instance:
            return cls.instance
        return super().__new__(cls)

    def __init__(self, *args, **kw):
        cls = self.__class__
        if cls.instance:
            # init will be called even if new finds the existing instance,
            # so we have to check again
            return
        cls.instance = self
        self.outgoing = Queue()
        self.responses = Queue()
        self.socket_thread = threading.Thread(target=self.start_socket)
        self.socket_thread.start()

    def start_socket(self):
        # starts an asyncio loop in this separate thread and keeps
        # the websocket running in it
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        loop.run_until_complete(self.core())

    async def _send(self, websocket, payload):
        await websocket.send(json.dumps(payload).encode("utf-8"))

    async def _recv(self, websocket):
        data = await websocket.recv()
        return json.loads(data)

    async def core(self):
        uri = f"wss://the-third-party-server.com/xyz"
        async with websockets.connect(uri) as websocket:
            self.websocket = websocket
            while True:
                # This code is as you wrote it:
                # it essentially blocks until a message is sent
                # and the answer is received back.
                # You have to have a mechanism in your websocket
                # messages allowing you to identify the corresponding
                # answer to each request. On doing so, this is trivially
                # parallelizable simply by calling asyncio.create_task
                # instead of awaiting on asyncio.gather
                payload = self.outgoing.get()
                future = self._send(websocket, payload)
                future_r = self._recv(websocket)
                _, response = await asyncio.gather(future, future_r)
                self.responses.put(response)

    def send(self, payload):
        # This is the method you call from your views -
        # simply do:
        # `output = AWebSocket().send(payload)`
        self.outgoing.put(payload)
        return self.responses.get()
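For completeness, a minimal sketch of how a Flask view could use the singleton above (the route name and payload shape are assumptions, not part of the question):

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/work", methods=["POST"])
def work():
    # Blocks this worker thread until the websocket round trip completes
    result = AWebSocket().send(request.get_json())
    return jsonify(result)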
I would like to implement a Twisted server that expects XML requests and sends XML responses in return:
<request type='type 01'><content>some request content</content></request>
<response type='type 01'><content>some response content</content></response>
<request type='type 02'><content>other request content</content></request>
<response type='type 02'><content>other response content</content></response>
I have created a Twisted client & server before that exchanged simple strings and tried to extend that to using XML, but I can't seem to figure out how to set it all up correctly.
client.py:
#!/usr/bin/env python
# encoding: utf-8
from twisted.internet import reactor
from twisted.internet.endpoints import TCP4ClientEndpoint, connectProtocol
from twisted.words.xish.domish import Element, IElement
from twisted.words.xish.xmlstream import XmlStream

class XMLClient(XmlStream):

    def sendObject(self, obj):
        if IElement.providedBy(obj):
            print "[TX]: %s" % obj.toXml()
        else:
            print "[TX]: %s" % obj
        self.send(obj)

def gotProtocol(p):
    request = Element((None, 'request'))
    request['type'] = 'type 01'
    request.addElement('content').addContent('some request content')
    p.sendObject(request)

    request = Element((None, 'request'))
    request['type'] = 'type 02'
    request.addElement('content').addContent('other request content')
    reactor.callLater(1, p.sendObject, request)

    reactor.callLater(2, p.transport.loseConnection)

endpoint = TCP4ClientEndpoint(reactor, '127.0.0.1', 12345)
d = connectProtocol(endpoint, XMLClient())
d.addCallback(gotProtocol)

from twisted.python import log
d.addErrback(log.err)

reactor.run()
As in the earlier string-based approach mentioned, the client idles until CTRL+C. Once I have this going, it will draw some / a lot of inspiration from the Twisted XMPP example.
server.py:
#!/usr/bin/env python
# encoding: utf-8
from twisted.internet import reactor
from twisted.internet.endpoints import TCP4ServerEndpoint
from twisted.words.xish.xmlstream import XmlStream, XmlStreamFactory
from twisted.words.xish.xmlstream import STREAM_CONNECTED_EVENT, STREAM_START_EVENT, STREAM_END_EVENT

REQUEST_CONTENT_EVENT = intern("//request/content")

class XMLServer(XmlStream):

    def __init__(self):
        XmlStream.__init__(self)
        self.addObserver(STREAM_CONNECTED_EVENT, self.onConnected)
        self.addObserver(STREAM_START_EVENT, self.onRequest)
        self.addObserver(STREAM_END_EVENT, self.onDisconnected)
        self.addObserver(REQUEST_CONTENT_EVENT, self.onRequestContent)

    def onConnected(self, xs):
        print 'onConnected(...)'

    def onDisconnected(self, xs):
        print 'onDisconnected(...)'

    def onRequest(self, xs):
        print 'onRequest(...)'

    def onRequestContent(self, xs):
        print 'onRequestContent(...)'

class XMLServerFactory(XmlStreamFactory):

    protocol = XMLServer

endpoint = TCP4ServerEndpoint(reactor, 12345, interface='127.0.0.1')
endpoint.listen(XMLServerFactory())
reactor.run()
client.py output:
TX [127.0.0.1]: <request type='type 01'><content>some request content</content></request>
TX [127.0.0.1]: <request type='type 02'><content>other request content</content></request>
server.py output:
onConnected(...)
onRequest(...)
onDisconnected(...)
My questions:
How do I subscribe to an event fired when the server encounters a certain XML tag? The //request/content XPath query seems ok to me, but onRequestContent(...) does not get called :-(
Is subclassing XmlStream and XmlStreamFactory a reasonable approach at all? It feels weird because XMLServer subscribes to events sent by its own base class and is then passed itself (?) as the xs parameter. Should I rather make XMLServer an ordinary class and have an XmlStream object as a class member? Is there a canonical approach?
How would I add an error handler to the server like addErrback(...) in the client? I'm worried exceptions get swallowed (happened before), but I don't see where to get a Deferred from to attach it to...
Why does the server by default close the connection after the first request ? I see XmlStream.onDocumentEnd(...) calling loseConnection(); I could override that method, but I wonder if there's a reason for the closing I don't see. Is it not the 'normal' approach to leave the connection open until all communication necessary for the moment has been carried out ?
I hope this post isn't considered too specific; talking XML over the network is commonplace, but despite searching for a day and a half, I was unable to find any Twisted XML server examples. Maybe I manage to turn this into a jumpstart for anyone in the future with similar questions...
This is mostly a guess but as far as I know you need to open the stream by sending a stanza without closing it.
In your example when you send <request type='type 01'><content>some request content</content></request> the server sees the <request> stanza as the start document but then you send </request> and the server will see that as the end document.
Basically, your server consumes <request> as the start document and that's also why your xpath, //request/content, will not match, because all that's left of the element is <content>...</content>.
Try sending something like <stream> from the client first, then the two requests and then </stream>.
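A hedged sketch of that idea, reusing the question's gotProtocol (the <stream> wrapper tag is just an example; as far as I remember XmlStream.send also accepts raw strings in addition to Elements):

def gotProtocol(p):
    p.send('<stream>')  # open the document so the server keeps parsing

    request = Element((None, 'request'))
    request['type'] = 'type 01'
    request.addElement('content').addContent('some request content')
    p.sendObject(request)

    request = Element((None, 'request'))
    request['type'] = 'type 02'
    request.addElement('content').addContent('other request content')
    reactor.callLater(1, p.sendObject, request)

    reactor.callLater(2, p.send, '</stream>')  # close the document when done
    reactor.callLater(3, p.transport.loseConnection)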
Also, subclassing XmlStream is fine as long as you make sure you don't accidentally override any of its methods.
The "only" relevant component of XmlStream is the SAX parser. Here's how I've implemented an asynchronous SAX parser using XmlStream and only the XML parsing functions:
server.py
from twisted.words.xish.domish import Element
from twisted.words.xish.xmlstream import XmlStream

class XmlServer(XmlStream):

    def __init__(self):
        XmlStream.__init__(self)  # possibly unnecessary

    def dataReceived(self, data):
        """ Overload this function to simply pass the incoming data into the XML parser """
        try:
            self.stream.parse(data)  # self.stream gets created after self._initializeStream() is called
        except Exception as e:
            self._initializeStream()  # reinit the DOM so other XML can be parsed

    def onDocumentStart(self, elementRoot):
        """ The root tag has been parsed """
        print('Root tag: {0}'.format(elementRoot.name))
        print('Attributes: {0}'.format(elementRoot.attributes))

    def onElement(self, element):
        """ Children/Body elements parsed """
        print('\nElement tag: {0}'.format(element.name))
        print('Element attributes: {0}'.format(element.attributes))
        print('Element content: {0}'.format(str(element)))

    def onDocumentEnd(self):
        """ Parsing has finished, you should send your response now """
        response = Element(('', 'response'))
        response['type'] = 'type 01'
        response.addElement('content', content='some response content')
        self.send(response.toXml())
Then you create a Factory class that will produce this Protocol (which you've demonstrated you're capable of); a minimal wiring sketch is shown below. Basically, you get all your information from the XML in the onDocumentStart and onElement functions, and when you've reached the end (i.e. onDocumentEnd) you send a response based on the parsed information. Also, be sure to call self._initializeStream() after parsing each XML message or else you'll get an exception. That should serve as a good skeleton for you.
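For reference, a minimal sketch of that wiring, assuming a plain Factory is sufficient here (names reuse the code above and the question's endpoint setup):

from twisted.internet import reactor
from twisted.internet.endpoints import TCP4ServerEndpoint
from twisted.internet.protocol import Factory

class XmlServerFactory(Factory):
    protocol = XmlServer

endpoint = TCP4ServerEndpoint(reactor, 12345, interface='127.0.0.1')
endpoint.listen(XmlServerFactory())
reactor.run()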
My answers to your questions:
Don't know :)
It's very reasonable. However I usually just subclass XmlStream (which simply inherits from Protocol) and then use a regular Factory object.
This is a good thing to worry about when using Twisted (+1 for you). Using the approach above, you could fire callbacks/errbacks as you parse and hit an element, or wait till you get to the end of the XML and then fire your callbacks to your heart's desire. I hope that makes sense :/
I've wondered this too actually. I think it has something to do with the applications and protocols that use the XmlStream object (such as Jabber and IRC). Just overload onDocumentEnd and make it do what you want it to do. That's the beauty of OOP.
Reference:
xml.sax
iterparse
twisted.web.sux: Twisted XML SAX parser. This is actually what XmlStream uses to parse XML.
xml.etree.cElementTree.iterparse: Here's another Stackoverflow question - ElementTree iterparse strategy
iterparse is throwing 'no element found: line 1, column 0' and I'm not sure why - I've asked a similar question :)
PS
Your problem is quite common and very simple to solve (at least in my opinion), so don't kill yourself trying to learn the Event Dispatcher model. Actually it seems you have a good handle on callbacks and errbacks (aka Deferreds), so I suggest you stick to those and avoid the dispatcher.
How can I 'throw' deferreds into the reactor so they get handled somewhere down the road?
Situation
I have 2 programs running on localhost.
A twisted jsonrpc service (localhost:30301)
A twisted webservice (localhost:4000)
When someone connects to the webservice, it needs to send a query to the jsonrpc service, wait for it to come back with a result, then display the result in the user's web browser (returning the value of the jsonrpc call).
I can't seem to figure out how to return the value of the deferred jsonrpc call. When I visit the webservice with my browser I get an HTTP 500 error (it did not return any bytes) and Value: <Deferred at 0x3577b48>.
It returns the deferred object and not the actual value of the callback.
Been looking around for a couple of hours and tried a lot of different variations before asking.
from txjsonrpc.web.jsonrpc import Proxy
from twisted.web import resource
from twisted.web.server import Site
from twisted.internet import reactor

class Rpc():
    def __init__(self, child):
        self._proxy = Proxy('http://127.0.0.1:30301/%s' % child)

    def execute(self, function):
        return self._proxy.callRemote(function)

class Server(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        rpc = Rpc('test').execute('test')

        def test(result):
            return '<h1>%s</h1>' % result

        rpc.addCallback(test)
        return rpc

site = Site(Server())
reactor.listenTCP(4000, site)

print 'Running'
reactor.run()
The problem you're having here is that web's IResource is a very old interface, predating even Deferred.
The quick solution to your problem is to use Klein, which provides a nice convenient high-level wrapper around twisted.web for writing web applications, among other things, adding lots of handling for Deferreds throughout the API.
The slightly more roundabout way to address it is to read the chapter of the Twisted documentation that is specifically about asynchronous responses in twisted.web.
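To make the Klein suggestion concrete, here is a minimal sketch reusing the Rpc class from the question (the route path is just an example); Klein lets a route handler return a Deferred and renders its eventual result:

from klein import Klein

app = Klein()

@app.route('/')
def home(request):
    d = Rpc('test').execute('test')
    d.addCallback(lambda result: '<h1>%s</h1>' % result)
    # Klein accepts a Deferred here and writes the result when it fires
    return d

app.run('localhost', 4000)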
Python's popular Requests library is said to be thread-safe on its home page, but no further details are given. If I call requests.session(), can I then safely pass this object to multiple threads like so:
session = requests.session()

for i in xrange(thread_count):
    threading.Thread(
        target=target,
        args=(session,),
        kwargs={}
    )
and make requests using the same connection pool in multiple threads?
If so, is this the recommended approach, or should each thread be given its own connection pool? (Assuming the total size of all the individual connection pools summed to the size of what would be one big connection pool, like the one above.) What are the pros and cons of each approach?
After reviewing the source of requests.session, I'm going to say the session object might be thread-safe, depending on the implementation of CookieJar being used.
Session.prepare_request reads from self.cookies, and Session.send calls extract_cookies_to_jar(self.cookies, ...), and that calls jar.extract_cookies(...) (jar being self.cookies in this case).
The source for Python 2.7's cookielib acquires a lock (threading.RLock) while it updates the jar, so it appears to be thread-safe. On the other hand, the documentation for cookielib says nothing about thread-safety, so maybe this feature should not be depended on?
UPDATE
If your threads are mutating any attributes of the session object such as headers, proxies, stream, etc. or calling the mount method or using the session with the with statement, etc. then it is not thread-safe.
https://github.com/psf/requests/issues/1871 implies that Session is not thread-safe, and that at least one maintainer recommends one Session per thread.
I just opened https://github.com/psf/requests/issues/2766 to clarify the documentation.
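If you follow the one-Session-per-thread recommendation above, a minimal sketch using threading.local could look like this (the helper name is mine, not from the requests docs):

import threading

import requests

_thread_local = threading.local()

def get_session():
    # Lazily create one Session per thread and reuse it on later calls
    if not hasattr(_thread_local, "session"):
        _thread_local.session = requests.Session()
    return _thread_local.session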
I also faced the same question and went to the source code to find a suitable solution. In my opinion the Session class generally has various problems:
It initializes the default HTTPAdapter in the constructor and leaks it if you mount another one to 'http' or 'https'.
The HTTPAdapter implementation maintains the connection pool, and I don't think it is something to create on each Session instantiation.
Session closes its HTTPAdapter, thus you can't reuse the connection pool between different Session instances.
The Session class doesn't seem to be thread-safe according to various discussions.
HTTPAdapter internally uses urllib3.PoolManager. I didn't find any obvious thread-safety problem in its source code, so I would rather trust the documentation, which says that urllib3 is thread-safe.
As a conclusion from the above list, I didn't find anything better than subclassing the Session class:
from collections import OrderedDict

from requests import Session
from requests.adapters import HTTPAdapter
from requests.cookies import cookiejar_from_dict
from requests.hooks import default_hooks
from requests.models import DEFAULT_REDIRECT_LIMIT
from requests.utils import default_headers

class HttpSession(Session):
    def __init__(self, adapter: HTTPAdapter):
        self.headers = default_headers()
        self.auth = None
        self.proxies = {}
        self.hooks = default_hooks()
        self.params = {}
        self.stream = False
        self.verify = True
        self.cert = None
        self.max_redirects = DEFAULT_REDIRECT_LIMIT
        self.trust_env = True
        self.cookies = cookiejar_from_dict({})
        self.adapters = OrderedDict()
        self.mount('https://', adapter)
        self.mount('http://', adapter)

    def close(self) -> None:
        # Intentionally a no-op: the shared adapter is closed by the factory
        pass
And creating a connection factory like this:
from urllib3.util.retry import Retry

# DEFAULT_CONNECTION_POOL_MAX_SIZE and DEFAULT_RETRY_POLICY are my own
# module-level constants, not part of requests
class HttpSessionFactory:
    def __init__(self,
                 pool_max_size: int = DEFAULT_CONNECTION_POOL_MAX_SIZE,
                 retry: Retry = DEFAULT_RETRY_POLICY):
        self.__http_adapter = HTTPAdapter(pool_maxsize=pool_max_size, max_retries=retry)

    def session(self) -> Session:
        return HttpSession(self.__http_adapter)

    def close(self):
        self.__http_adapter.close()
Finally, somewhere in the code I can write:
with self.__session_factory.session() as session:
    response = session.get(request_url)
And all my session instances will reuse the same connection pool.
And somewhere at the end when the application stops I can close the HttpSessionFactory.
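For instance, a small shutdown hook could look like this (the module-level name session_factory is illustrative, not from the code above):

import atexit

session_factory = HttpSessionFactory()      # illustrative module-level instance
atexit.register(session_factory.close)      # close the shared adapter at shutdown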
Hope this will help somebody.
Since nobody provided a solution to this post plus the fact that I desperately need a workaround, here is my situation and some abstract solutions/ideas for debate.
My stack:
Tornado
Celery
MongoDB
Redis
RabbitMQ
My problem: Find a way for Tornado to dispatch a celery task (solved) and then asynchronously gather the result (any ideas?).
Scenario 1: (request/response hack plus webhook)
Tornado receives a (user)request, then saves in local memory (or in Redis) a { jobID : (user)request} to remember where to propagate the response, and fires a celery task with jobID
When celery completes the task, it performs a webhook at some url and tells tornado that this jobID has finished ( plus the results )
Tornado retrieves the (user)request and forwards a response to the (user)
Can this happen? Does it have any logic?
Scenario 2: (tornado plus long-polling)
Tornado dispatches the celery task and returns some primary json data to the client (jQuery)
jQuery does some long-polling upon receipt of the primary json, say, every x microseconds, and tornado replies according to some database flag. When the celery task completes, this database flag is set to True, then jQuery "loop" is finished.
Is this efficient?
Any other ideas/schemas?
My solution involves polling from tornado to celery:
import datetime

import tornado.ioloop
import tornado.web

class CeleryHandler(tornado.web.RequestHandler):

    @tornado.web.asynchronous
    def get(self):
        task = yourCeleryTask.delay(**kwargs)

        def check_celery_task():
            if task.ready():
                self.write({'success': True})
                self.set_header("Content-Type", "application/json")
                self.finish()
            else:
                tornado.ioloop.IOLoop.instance().add_timeout(
                    datetime.timedelta(0.00001), check_celery_task)

        tornado.ioloop.IOLoop.instance().add_timeout(
            datetime.timedelta(0.00001), check_celery_task)
Here is a post about it.
Here is our solution to the problem. Since we look up results in several handlers in our application, we made the celery lookup a mixin class.
This also makes the code more readable with the tornado.gen pattern.
from functools import partial

import tornado.gen
import tornado.ioloop
import tornado.web

class CeleryResultMixin(object):
    """
    Adds a callback function which could wait for the result asynchronously
    """
    def wait_for_result(self, task, callback):
        if task.ready():
            callback(task.result)
        else:
            # TODO: Is this going to be too demanding on the result backend?
            # Probably there should be a timeout before each add_callback
            tornado.ioloop.IOLoop.instance().add_callback(
                partial(self.wait_for_result, task, callback)
            )

class ARemoteTaskHandler(CeleryResultMixin, tornado.web.RequestHandler):
    """Execute a task asynchronously over a celery worker.
    Wait for the result without blocking
    When the result is available send it back
    """

    @tornado.web.asynchronous
    @tornado.web.authenticated
    @tornado.gen.engine
    def post(self):
        """Test the provided Magento connection
        """
        task = expensive_task.delay(
            self.get_argument('somearg'),
        )

        result = yield tornado.gen.Task(self.wait_for_result, task)

        self.write({
            'success': True,
            'result': result.some_value
        })
        self.finish()
I stumbled upon this question and hitting the results backend repeatedly did not look optimal to me. So I implemented a Mixin similar to your Scenario 1 using Unix Sockets.
It notifies Tornado as soon as the task finishes (to be accurate, as soon as the next task in the chain runs) and only hits the results backend once. Here is the link.
Now, https://github.com/mher/tornado-celery comes to the rescue...
from tornado import gen, web
from tornado.web import asynchronous

import tasks  # the module holding the celery tasks assumed by this example (provides tasks.sleep)

class GenAsyncHandler(web.RequestHandler):

    @asynchronous
    @gen.coroutine
    def get(self):
        response = yield gen.Task(tasks.sleep.apply_async, args=[3])
        self.write(str(response.result))
        self.finish()