Twisted XmlStream: How to connect to events? - python

I would like to implement a Twisted server that expects XML requests and sends XML responses in return:
<request type='type 01'><content>some request content</content></request>
<response type='type 01'><content>some response content</content></response>
<request type='type 02'><content>other request content</content></request>
<response type='type 02'><content>other response content</content></response>
I have created a Twisted client & server before that exchanged simple strings and tried to extend that to using XML, but I can't seem to figure out how to set it all up correctly.
client.py:
#!/usr/bin/env python
# encoding: utf-8
from twisted.internet import reactor
from twisted.internet.endpoints import TCP4ClientEndpoint, connectProtocol
from twisted.words.xish.domish import Element, IElement
from twisted.words.xish.xmlstream import XmlStream

class XMLClient(XmlStream):
    def sendObject(self, obj):
        if IElement.providedBy(obj):
            print "[TX]: %s" % obj.toXml()
        else:
            print "[TX]: %s" % obj
        self.send(obj)

def gotProtocol(p):
    request = Element((None, 'request'))
    request['type'] = 'type 01'
    request.addElement('content').addContent('some request content')
    p.sendObject(request)
    request = Element((None, 'request'))
    request['type'] = 'type 02'
    request.addElement('content').addContent('other request content')
    reactor.callLater(1, p.sendObject, request)
    reactor.callLater(2, p.transport.loseConnection)

endpoint = TCP4ClientEndpoint(reactor, '127.0.0.1', 12345)
d = connectProtocol(endpoint, XMLClient())
d.addCallback(gotProtocol)
from twisted.python import log
d.addErrback(log.err)
reactor.run()
As in the earlier string-based version, the client idles until I press CTRL+C. Once I have this going, I plan to draw a lot of inspiration from the Twisted XMPP example.
server.py:
#!/usr/bin/env python
# encoding: utf-8
from twisted.internet import reactor
from twisted.internet.endpoints import TCP4ServerEndpoint
from twisted.words.xish.xmlstream import XmlStream, XmlStreamFactory
from twisted.words.xish.xmlstream import STREAM_CONNECTED_EVENT, STREAM_START_EVENT, STREAM_END_EVENT

REQUEST_CONTENT_EVENT = intern("//request/content")

class XMLServer(XmlStream):
    def __init__(self):
        XmlStream.__init__(self)
        self.addObserver(STREAM_CONNECTED_EVENT, self.onConnected)
        self.addObserver(STREAM_START_EVENT, self.onRequest)
        self.addObserver(STREAM_END_EVENT, self.onDisconnected)
        self.addObserver(REQUEST_CONTENT_EVENT, self.onRequestContent)

    def onConnected(self, xs):
        print 'onConnected(...)'

    def onDisconnected(self, xs):
        print 'onDisconnected(...)'

    def onRequest(self, xs):
        print 'onRequest(...)'

    def onRequestContent(self, xs):
        print 'onRequestContent(...)'

class XMLServerFactory(XmlStreamFactory):
    protocol = XMLServer

endpoint = TCP4ServerEndpoint(reactor, 12345, interface='127.0.0.1')
endpoint.listen(XMLServerFactory())
reactor.run()
client.py output:
TX [127.0.0.1]: <request type='type 01'><content>some request content</content></request>
TX [127.0.0.1]: <request type='type 02'><content>other request content</content></request>
server.py output:
onConnected(...)
onRequest(...)
onDisconnected(...)
My questions:
1) How do I subscribe to an event fired when the server encounters a certain XML tag? The //request/content XPath query seems ok to me, but onRequestContent(...) does not get called :-(
2) Is subclassing XmlStream and XmlStreamFactory a reasonable approach at all? It feels weird because XMLServer subscribes to events sent by its own base class and is then passed itself (?) as xs parameter ?!? Should I rather make XMLServer an ordinary class and have an XmlStream object as class member? Is there a canonical approach?
3) How would I add an error handler to the server like addErrback(...) in the client? I'm worried exceptions get swallowed (happened before), but I don't see where to get a Deferred from to attach it to...
4) Why does the server by default close the connection after the first request? I see XmlStream.onDocumentEnd(...) calling loseConnection(); I could override that method, but I wonder if there's a reason for the closing I don't see. Is it not the 'normal' approach to leave the connection open until all communication necessary for the moment has been carried out?
I hope this post isn't considered too specific; talking XML over the network is commonplace, but despite searching for a day and a half, I was unable to find any Twisted XML server examples. Maybe I manage to turn this into a jumpstart for anyone in the future with similar questions...

This is mostly a guess but as far as I know you need to open the stream by sending a stanza without closing it.
In your example when you send <request type='type 01'><content>some request content</content></request> the server sees the <request> stanza as the start document but then you send </request> and the server will see that as the end document.
Basically, your server consumes <request> as the start document and that's also why your xpath, //request/content, will not match, because all that's left of the element is <content>...</content>.
Try sending something like <stream> from the client first, then the two requests and then </stream>.
Also, subclassing XmlStream is fine as long as you make sure you don't accidentally override any of its methods.

The "only" relevant component of XmlStream is the SAX parser. Here's how I've implemented an asynchronous SAX parser using XmlStream and only the XML parsing functions:
server.py
from twisted.words.xish import domish
from twisted.words.xish.xmlstream import XmlStream

class XmlServer(XmlStream):
    def __init__(self):
        XmlStream.__init__(self)  # possibly unnecessary

    def dataReceived(self, data):
        """ Override this method to simply pass the incoming data into the XML parser """
        try:
            self.stream.parse(data)  # self.stream gets created after self._initializeStream() is called
        except Exception:
            self._initializeStream()  # reinit the DOM so other XML can be parsed

    def onDocumentStart(self, elementRoot):
        """ The root tag has been parsed """
        print('Root tag: {0}'.format(elementRoot.name))
        print('Attributes: {0}'.format(elementRoot.attributes))

    def onElement(self, element):
        """ Children/Body elements parsed """
        print('\nElement tag: {0}'.format(element.name))
        print('Element attributes: {0}'.format(element.attributes))
        print('Element content: {0}'.format(str(element)))

    def onDocumentEnd(self):
        """ Parsing has finished, you should send your response now """
        response = domish.Element(('', 'response'))
        response['type'] = 'type 01'
        response.addElement('content', content='some response content')
        self.send(response.toXml())
Then you create a Factory class that will produce this Protocol (which you've demonstrated you're capable of). Basically, you will get all your information from the XML in the onDocumentStart and onElement functions, and when you've reached the end (i.e. onDocumentEnd) you will send a response based on the parsed information. Also, be sure you call self._initializeStream() after parsing each XML message or else you'll get an exception. That should serve as a good skeleton for you.
My answers to your questions:
1) Don't know :)
2) It's very reasonable. However, I usually just subclass XmlStream (which simply inherits from Protocol) and then use a regular Factory object.
3) This is a good thing to worry about when using Twisted (+1 for you). Using the approach above, you could fire callbacks/errbacks as you parse and hit an element, or wait till you get to the end of the XML and then fire your callbacks to your heart's desire. I hope that makes sense :/
4) I've wondered this too actually. I think it has something to do with the applications and protocols that use the XmlStream object (such as Jabber and IRC). Just override onDocumentEnd and make it do what you want it to do. That's the beauty of OOP.
Reference:
xml.sax
iterparse
twisted.web.sux: Twisted XML SAX parser. This is actually what XmlStream uses to parse XML.
xml.etree.cElementTree.iterparse: Here's another Stackoverflow question - ElementTree iterparse strategy
iterparse is throwing 'no element found: line 1, column 0' and I'm not sure why - I've asked a similar question :)
PS
Your problem is quite common and very simple to solve (at least in my opinion), so don't kill yourself trying to learn the Event Dispatcher model. Actually it seems you have a good handle on callbacks and errbacks (aka Deferred), so I suggest you stick to those and avoid the dispatcher.
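For the standard-library route hinted at in the references, here is a minimal sketch of incremental parsing with xml.etree.ElementTree.XMLPullParser, the feed-based relative of iterparse. It is not Twisted code: stanzas is a hypothetical helper, and the <stream> root element is an assumption about how the client frames its requests.

```python
from xml.etree.ElementTree import XMLPullParser

def stanzas(chunks):
    """Yield each direct child of the root element as soon as it is complete."""
    parser = XMLPullParser(events=("start", "end"))
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a child of the root has just been closed
                    yield elem

# Data may arrive split at arbitrary points, as it would over a socket.
chunks = [
    "<stream>",
    "<request type='type 01'><content>some request content</content></request>",
    "<request type='type 02'><content>other request ",
    "content</content></request>",
    "</stream>",
]

received = [(e.get("type"), e.findtext("content")) for e in stanzas(chunks)]
print(received)
```

Each yielded element is a complete request, so a response can be produced per element without waiting for the closing root tag.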

Related

tornado one handler blocks for another

Using python/tornado I wanted to set up a little "trampoline" server that allows two devices to communicate with each other in a RESTish manner. There's probably vastly superior/simpler "off the shelf" ways to do this. I'd welcome those suggestions, but I still feel it would be educational to figure out how to do my own using tornado.
Basically, the idea was that I would have the device in the role of server doing a longpoll with a GET. The client device would POST to the server, at which point the POST body would be transferred as the response of the blocked GET. Before the POST responded, it would block. The server side then does a PUT with the response, which is transferred to the blocked POST and returned to the device. I thought maybe I could do this with tornado.queues. But that appears to not have worked out. My code:
import tornado
import tornado.web
import tornado.httpserver
import tornado.queues

ToServerQueue = tornado.queues.Queue()
ToClientQueue = tornado.queues.Queue()

class Query(tornado.web.RequestHandler):
    def get(self):
        toServer = ToServerQueue.get()
        self.write(toServer)

    def post(self):
        toServer = self.request.body
        ToServerQueue.put(toServer)
        toClient = ToClientQueue.get()
        self.write(toClient)

    def put(self):
        ToClientQueue.put(self.request.body)
        self.write(bytes())

services = tornado.web.Application([(r'/query', Query)], debug=True)
services.listen(49009)
tornado.ioloop.IOLoop.instance().start()
Unfortunately, the ToServerQueue.get() does not actually block until the queue has an item, but rather returns a tornado.concurrent.Future. Which is not a legal value to pass to the self.write() call.
I guess my general question is twofold:
1) How can one HTTP verb invocation (e.g. get, put, post, etc) block and then be signaled by another HTTP verb invocation.
2) How can I share data from one invocation to another?
I've only really scratched the simple/straightforward use cases of making little REST servers with tornado. I wonder if the coroutine stuff is what I need, but haven't found a good tutorial/example of that to help me see the light, if that's indeed the way to go.
1) How can one HTTP verb invocation (e.g. get, put, post, etc) block and then be signaled by another HTTP verb invocation.
2) How can I share data from one invocation to another?
A new RequestHandler object is created for every request, so you need some coordinator, e.g. queues or locks with a state object (which in your case would amount to re-implementing a queue).
tornado.queues are queues for coroutines. Queue.get, Queue.put, and Queue.join return Future objects that need to be "resolved", meaning the scheduled task finishes either successfully or with an exception. To wait until a future is resolved, you should yield it (just like in the doc examples of tornado.queues). The verb methods also need to be decorated with tornado.gen.coroutine.
import tornado.gen

class Query(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        toServer = yield ToServerQueue.get()
        self.write(toServer)

    @tornado.gen.coroutine
    def post(self):
        toServer = self.request.body
        yield ToServerQueue.put(toServer)
        toClient = yield ToClientQueue.get()
        self.write(toClient)

    @tornado.gen.coroutine
    def put(self):
        yield ToClientQueue.put(self.request.body)
        self.write(bytes())
The GET request will wait (in a non-blocking manner) until something is available on the queue, or until a timeout that can be passed as an argument to Queue.get.
tornado.queues.Queue also provides get_nowait (and put_nowait), which don't have to be yielded: they return an item from the queue immediately or raise an exception.
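The handoff itself can be tried without an HTTP server. The sketch below uses the standard library's asyncio.Queue as a stand-in for tornado.queues.Queue (an assumption made so the example runs without Tornado), and plain coroutines with hypothetical names in place of the three handler methods.

```python
import asyncio

async def main():
    to_server = asyncio.Queue()
    to_client = asyncio.Queue()

    async def long_poll_get():
        # Plays the role of the GET handler: waits until a POST body arrives.
        return await to_server.get()

    async def post(body):
        # Plays the role of the POST handler: hands the body over, then
        # waits for the reply that a later PUT will supply.
        await to_server.put(body)
        return await to_client.get()

    async def put(body):
        # Plays the role of the PUT handler: delivers the reply.
        await to_client.put(body)

    get_task = asyncio.ensure_future(long_poll_get())
    post_task = asyncio.ensure_future(post(b"ping"))

    request_seen = await get_task   # the GET side wakes up with the POST body
    await put(b"pong")              # the server device answers
    reply = await post_task         # the POST side wakes up with the answer
    return request_seen, reply

result = asyncio.run(main())
print(result)
```

Neither waiting coroutine blocks the event loop; each is suspended at its get() until the other side puts an item, which is the behaviour the decorated Tornado handlers rely on.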

Python server for streaming request body content

I am trying to create an intelligent Python proxy server that can stream large request bodies from the client to internal storage (which may be Amazon S3, Swift, FTP, or something similar). Before streaming, the server should query an internal API server that determines the parameters for uploading to the internal storage. The main restriction is that it must be done in one HTTP operation using the PUT method. It should also work asynchronously, because there will be a lot of file uploads.
What solution allows me to read chunks of the uploaded content and start streaming those chunks to internal storage before the user has uploaded the whole file? All the Python web applications I know of wait until the whole content has been received before handing control to the WSGI application/Python web server.
One of the solutions I found is a Tornado fork, https://github.com/nephics/tornado. But it is unofficial, and the Tornado developers are in no hurry to merge it into the main branch.
So maybe you know of some existing solutions to my problem? Tornado? Twisted? gevent?
Here's an example of a server that does streaming upload handling written with Twisted:
from twisted.internet import reactor
from twisted.internet.endpoints import serverFromString
from twisted.web.server import Request, Site
from twisted.web.resource import Resource
from twisted.application.service import Application
from twisted.application.internet import StreamServerEndpointService

# Define a Resource class that doesn't really care what requests are made of it.
# This simplifies things since it lets us mostly ignore Twisted Web's resource
# traversal features.
class StubResource(Resource):
    isLeaf = True

    def render(self, request):
        return b""

class StreamingRequestHandler(Request):
    def handleContentChunk(self, chunk):
        # `chunk` is part of the request body.
        # This method is called as the chunks are received.
        Request.handleContentChunk(self, chunk)
        # Unfortunately you have to use a private attribute to learn where
        # the content is being sent.
        path = self.channel._path
        print "Server received %d more bytes for %s" % (len(chunk), path)

class StreamingSite(Site):
    requestFactory = StreamingRequestHandler

application = Application("Streaming Upload Server")
factory = StreamingSite(StubResource())
endpoint = serverFromString(reactor, b"tcp:8080")
StreamServerEndpointService(endpoint, factory).setServiceParent(application)
This is a tac file (put it in streamingserver.tac and run twistd -ny streamingserver.tac).
Because of the need to use self.channel._path this isn't a completely supported approach. The API overall is pretty clunky as well so this is more an example that it's possible than that it's good. There has long been an intent to make this sort of thing easier (http://tm.tl/288) but it will probably be a long while yet before this is accomplished.
It seems I have a solution using gevent library and monkey patching:
from gevent.monkey import patch_all
patch_all()
from gevent.pywsgi import WSGIServer

def stream_to_internal_storage(data):
    pass

def simple_app(environ, start_response):
    bytes_to_read = 1024
    while True:
        readbuffer = environ["wsgi.input"].read(bytes_to_read)
        if not len(readbuffer) > 0:
            break
        stream_to_internal_storage(readbuffer)
    start_response("200 OK", [("Content-type", "text/html")])
    return ["hello world"]

def run():
    config = {'host': '127.0.0.1', 'port': 45000}
    server = WSGIServer((config['host'], config['port']), application=simple_app)
    server.serve_forever()

if __name__ == '__main__':
    run()
It works well when I try to upload huge file:
curl -i -X PUT --progress-bar --verbose --data-binary @/path/to/huge/file "http://127.0.0.1:45000"
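The chunked-read loop can be exercised on its own, without gevent or a real socket. In this sketch an io.BytesIO object stands in for environ["wsgi.input"], and make_app and sink are hypothetical names; whether the body really arrives incrementally still depends on the server that supplies wsgi.input.

```python
import io

CHUNK_SIZE = 1024

def make_app(sink):
    """Build a WSGI app that forwards the request body to sink chunk by chunk."""
    def app(environ, start_response):
        body = environ["wsgi.input"]
        total = 0
        while True:
            chunk = body.read(CHUNK_SIZE)
            if not chunk:  # an empty read means the body is exhausted
                break
            sink(chunk)    # e.g. hand the chunk to the storage uploader
            total += len(chunk)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [("received %d bytes" % total).encode("ascii")]
    return app

# Drive the app by hand with a fake 2500-byte upload.
chunks = []
statuses = []
app = make_app(chunks.append)
environ = {"wsgi.input": io.BytesIO(b"x" * 2500)}
response = app(environ, lambda status, headers: statuses.append(status))

print([len(c) for c in chunks], statuses, response)
```

Passing the sink in as a callable keeps the read loop independent of where the chunks end up, so the same app could be pointed at S3, Swift, or a test list.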

Twisted: Wait for a deferred to 'finish'

How can I 'throw' Deferreds into the reactor so they get handled somewhere down the road?
Situation
I have 2 programs running on localhost.
A twisted jsonrpc service (localhost:30301)
A twisted webservice (localhost:4000)
When someone connects to the webservice, it needs to send a query to the jsonrpc service, wait for it to come back with a result, then display the result in the web browser of the user (returning the value of the jsonrpc call).
I can't seem to figure out how to return the value of the deferred jsonrpc call. When I visit the webservice with my browser I get an HTTP 500 error (did not return any byte) and Value: < Deferred at 0x3577b48 >.
It returns the deferred object and not the actual value of the callback.
Been looking around for a couple of hours and tried a lot of different variations before asking.
from txjsonrpc.web.jsonrpc import Proxy
from twisted.web import resource
from twisted.web.server import Site
from twisted.internet import reactor

class Rpc():
    def __init__(self, child):
        self._proxy = Proxy('http://127.0.0.1:30301/%s' % child)

    def execute(self, function):
        return self._proxy.callRemote(function)

class Server(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        rpc = Rpc('test').execute('test')

        def test(result):
            return '<h1>%s</h1>' % result

        rpc.addCallback(test)
        return rpc

site = Site(Server())
reactor.listenTCP(4000, site)
print 'Running'
reactor.run()
The problem you're having here is that web's IResource is a very old interface, predating even Deferred.
The quick solution to your problem is to use Klein, which provides a nice convenient high-level wrapper around twisted.web for writing web applications, among other things, adding lots of handling for Deferreds throughout the API.
The slightly more roundabout way to address it is to read the chapter of the Twisted documentation that is specifically about asynchronous responses in twisted.web.

namespaces in SOAPpy not working as expected

I'm having an issue correctly interfacing with a SOAP API running on Axis2:
What happens is I should call the login method with two arguments (loginName and password) and it returns an authentication token that I will use for subsequent interaction.
#!/usr/bin/python
from SOAPpy import SOAPProxy
s_user = 'Administrator'
s_pass = 'securityThroughObscurity'
s_host = '192.168.76.130:8998'
namespace = 'http://bcc.inc.com/IncSecurity'
url = 'http://' + s_host + '/axis2/services/IncSecurityService'
DHCPServ = SOAPProxy(url, namespace)
DHCPServ.config.dumpSOAPOut = 1
DHCPServ.config.dumpSOAPIn = 1
DHCPResp = DHCPServ.login(loginName=s_user, password=s_pass)
The Axis2 server on the other side returns an XML error stating Data element of the OM Node is NULL. Looking at the Axis2 logs, I see the error is adb_login.c(383) non nillable or minOuccrs != 0 element loginName missing
I then packet captured the login XML from a known working Java client versus the XML from this client and these are the differences between the two:
SOAPpy:
<ns1:login xmlns:ns1="http://bcc.inc.com/IncSecurity" SOAP-ENC:root="1">
<password xsi:type="xsd:string">securityThroughObscurity</password>
<loginName xsi:type="xsd:string">Administrator</loginName>
</ns1:login>
Java:
<ns2:login xmlns:ns2="http://bcc.inc.com/IncSecurity">
<ns2:loginName>Administrator</ns2:loginName>
<ns2:password>securityThroughObscurity</ns2:password>
</ns2:login>
So this means that for some reason (probably related to my lack of knowledge in Python and SOAPpy) the namespace is not being applied to the variables being used in the login method, so by all accounts they don't actually exist and the error is warranted.
Also, it seems to be flipping the variables around and putting the password before loginName but I don't think that matters much.
What am I doing wrong?
Looks like it's a known bug in SOAPPy, someone has suggested a simple patch: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=523083
Alternately (assuming you have access to the service WSDL), SOAPPy lets you specify a WSDL instead of just a namespace. This looks like it will provide better namespace information to the envelope generation code. http://diveintopython.net/soap_web_services/introspection.html
Finally, if SOAPPy just isn't working for you, try Suds (it's better documented than SOAPPy).
from suds.client import Client
from suds.wsse import *
client = Client(WSDL_LOCATION)
guid = client.service.someFunctionName("a string argument", 42)
Good luck!
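The difference between the two captures in the question can be confirmed with the standard library before changing SOAP clients. This is a sketch only: the SOAPpy payload is simplified (the xsi:type and SOAP-ENC:root attributes are dropped because their prefixes are not declared in the excerpt), and child_tags is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = "http://bcc.inc.com/IncSecurity"

# Body sent by the working Java client (children carry the ns2 prefix).
java_body = (
    '<ns2:login xmlns:ns2="%s">'
    '<ns2:loginName>Administrator</ns2:loginName>'
    '<ns2:password>securityThroughObscurity</ns2:password>'
    '</ns2:login>' % NS
)

# Simplified version of the SOAPpy body (children have no prefix).
soappy_body = (
    '<ns1:login xmlns:ns1="%s">'
    '<password>securityThroughObscurity</password>'
    '<loginName>Administrator</loginName>'
    '</ns1:login>' % NS
)

def child_tags(xml_text):
    """Return the fully qualified tag of each child of the root element."""
    return [child.tag for child in ET.fromstring(xml_text)]

java_tags = child_tags(java_body)
soappy_tags = child_tags(soappy_body)
print(java_tags)
print(soappy_tags)
```

ElementTree writes a namespaced tag as {uri}name, so the unqualified children in the second payload are visibly different elements from the ones the service looks up, which matches the "element loginName missing" error.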

Pushing data once a URL is requested

Given, when a user requests /foo on my server, I send the following HTTP response (not closing the connection):
Content-Type: multipart/x-mixed-replace; boundary=-----------------------
-----------------------
Content-Type: text/html
foo
When the user goes to /bar (which will send 204 No Content so the view doesn't change), I want to send the following data in the initial response.
-----------------------
Content-Type: text/html
bar
How would I get the second request to trigger this from the initial response? I'm planning on possibly creating a fancy [engines that support multipart/x-mixed-replace (currently only Gecko)]-only email webapp that does server-push and Ajax effects without any JavaScript, just for fun.
No complete answer, but:
In your question, you're describing a Comet-style architecture. Regarding support of Comet-style techniques in Python/WSGI, there is a StackOverflow question, which talks about various Python servers with support for long-running requests a la Comet.
Also interesting is this mail thread in the Python Web-SIG: "Could WSGI handle Asynchronous response?". In May 2008, there was a broad discussion in the Web-SIG about the topic of asynchronous requests in WSGI.
A recent development is evserver, a lightweight WSGI server, which implements the Asynchronous WSGI extension proposed by Christopher Stawarz in the Web-SIG in May 2008.
Finally, the Tornado web server supports non-blocking asynchronous requests. It has a chat example application using long polling, which has similarities with your requirements.
If the problem is to pass some command from /bar application to /foo application and you are using some servlet-like approach (the Python code is loaded once and not for each request as in CGI), you can just change some class property of the /foo application and be ready to react to the change in the /foo instance (by checking the property state).
Obviously the /foo application should not return right after the first request; instead it should keep yielding content piece by piece.
Though this is just theory; I have not tried it myself.
I have created some small example (just for fun, you know :))
import threading

num = 0
cond = threading.Condition()

def app(environ, start_response):
    global num
    cond.acquire()
    num += 1
    cond.notifyAll()
    cond.release()
    start_response("200 OK", [("Content-Type", "multipart/x-mixed-replace; boundary=xxx")])
    while True:
        n = num
        s = "--xxx\r\nContent-Type: text/html\r\n\r\n%s\n" % n
        yield s
        # wait for num change:
        cond.acquire()
        while num == n:
            cond.wait()
        cond.release()

from cherrypy.wsgiserver import CherryPyWSGIServer
server = CherryPyWSGIServer(("0.0.0.0", 3000), app)
try:
    server.start()
except KeyboardInterrupt:
    server.stop()
# Now whenever you visit http://127.0.0.1:3000/, the number increases.
# It also automatically increases in all previously opened windows/tabs.
The idea of a shared variable and thread synchronization (using condition variable object) is based on the fact that WSGI server provided by CherryPyWSGIServer is threaded.
Not sure if this is quite what you're looking for, but there is a fairly old way of doing server push using a mime content of multipart/x-mixed-replace
Basically you compose the response as a mime object with content type multipart/x-mixed-replace, and send the first "version" of a document down. The browser will keep the socket open.
Then as the server decides to push more data, a new "version" of the document gets sent from the server, and the browser will intelligently replace (within whatever frame/iframe contains the content) the content.
This was an early way of doing webcams, where the server would send down (push) image after image, and the browser would just keep replacing the image in the document over and over. This is also a way of doing a "Loading..." message over a single HTTP request.
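The framing of each pushed "version" can be sketched as a small generator. This only builds the bytes of the parts; the boundary value xxx and the function name mixed_replace_parts are assumptions for illustration, and the response itself would still need the Content-Type: multipart/x-mixed-replace; boundary=xxx header.

```python
def mixed_replace_parts(documents, boundary="xxx"):
    """Yield one multipart/x-mixed-replace part per HTML document."""
    for html in documents:
        part = (
            "--%s\r\n"
            "Content-Type: text/html\r\n"
            "\r\n"
            "%s\r\n" % (boundary, html)
        )
        yield part.encode("utf-8")

parts = list(mixed_replace_parts(["foo", "bar"]))
for part in parts:
    print(part)
```

A server would write each part to the open connection whenever it decides to push a new version, rather than producing them all at once as this demonstration does.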
