Creating a processing queue in Tornado

I'm using a Tornado web server to queue up items that need to be processed outside of the request/response cycle.
In my simplified example below, every time a request comes in, I add a new string to a list called queued_items. I want to create something that will watch that list and process the items as they show up in it.
(In my real code, the items are processed and sent over a TCP socket which may or may not be connected when the web request arrives. I want the web server to keep queuing up items regardless of the socket connection)
I'm trying to keep this code simple and not use external queues/programs like Redis or Beanstalk. It's not going to have very high volume.
What's a good way using Tornado idioms to watch the client.queued_items list for new items and process them as they arrive?
import time
import tornado.ioloop
import tornado.gen
import tornado.web

class Client():
    def __init__(self):
        self.queued_items = []

    @tornado.gen.coroutine
    def watch_queue(self):
        # I have no idea what I'm doing
        items = yield client.queued_items
        # go_do_some_thing_with_items(items)

class IndexHandler(tornado.web.RequestHandler):
    def get(self):
        client.queued_items.append("%f" % time.time())
        self.write("Queued a new item")

if __name__ == "__main__":
    client = Client()
    # Watch the queue for when new items show up
    client.watch_queue()
    # Create the web server
    application = tornado.web.Application([
        (r'/', IndexHandler),
    ], debug=True)
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

There is a library called toro, which provides synchronization primitives for tornado. [Update: As of tornado 4.2, toro has been merged into tornado.]
Sounds like you could just use a toro.Queue (or tornado.queues.Queue in tornado 4.2+) to handle this:
import time
import toro
import tornado.ioloop
import tornado.gen
import tornado.web

class Client():
    def __init__(self):
        self.queued_items = toro.Queue()

    @tornado.gen.coroutine
    def watch_queue(self):
        while True:
            items = yield self.queued_items.get()
            # go_do_something_with_items(items)

class IndexHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        yield client.queued_items.put("%f" % time.time())
        self.write("Queued a new item")

if __name__ == "__main__":
    client = Client()
    # Watch the queue for when new items show up
    tornado.ioloop.IOLoop.current().add_callback(client.watch_queue)
    # Create the web server
    application = tornado.web.Application([
        (r'/', IndexHandler),
    ], debug=True)
    application.listen(8888)
    tornado.ioloop.IOLoop.current().start()
There are a few tweaks required, aside from switching the data structure from a list to a toro.Queue:
We need to schedule watch_queue to run inside the IOLoop using add_callback, rather than trying to call it directly outside of an IOLoop context.
IndexHandler.get needs to be converted to a coroutine, because toro.Queue.put is a coroutine.
I also added a while True loop to watch_queue, so that it will run forever, rather than just processing one item and then exiting.
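The consumer loop described above has the same shape on newer versions. As a minimal sketch, the block below uses the standard library's asyncio.Queue as a stand-in, an assumption made so the example runs without Tornado installed; tornado.queues.Queue (Tornado 4.2+) offers the same get, put, task_done and join methods. The names demo and processed are hypothetical, and appending to a list stands in for the real processing.

```python
import asyncio


async def watch_queue(queue, processed):
    """Consume items forever, one at a time, as they arrive."""
    while True:
        item = await queue.get()      # suspends here without blocking the loop
        try:
            processed.append(item)    # hypothetical stand-in for real processing
        finally:
            queue.task_done()         # lets queue.join() know this item is done


async def demo():
    queue = asyncio.Queue()
    processed = []
    watcher = asyncio.ensure_future(watch_queue(queue, processed))
    # A request handler would do this part: enqueue and return immediately.
    for item in ("a", "b", "c"):
        await queue.put(item)
    await queue.join()                # wait until every queued item is processed
    watcher.cancel()                  # stop the endless consumer for this demo
    return processed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The task_done and join calls are only needed if something has to wait for the queue to drain, for example during shutdown.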

Related

How to avoid ReactorNotRestartable in Autobahn Python

I have a Python-based page that receives data by POST and forwards it to the Crossbar server using Autobahn (WAMP). It works well the first one or two times, but when it's called again after that it throws ReactorNotRestartable.
Now, I need this to work whichever way possible, either by reusing this "Reactor" based on a conditional check or by stopping it properly after every run. (The first one would be preferable because it might reduce the execution time)
Thanks for your help!
Edit:
This is in a webpage (Django View) so it needs to run as many times as the page is loaded/data is sent to it via POST.
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks
from twisted.internet.endpoints import TCP4ClientEndpoint
from twisted.application.internet import ClientService
from autobahn.wamp.types import ComponentConfig
from autobahn.twisted.wamp import ApplicationSession, WampWebSocketClientFactory

class MyAppSession(ApplicationSession):
    def __init__(self, config):
        ApplicationSession.__init__(self, config)

    def onConnect(self):
        self.join(self.config.realm)

    def onChallenge(self, challenge):
        pass

    @inlineCallbacks
    def onJoin(self, details):
        yield self.call('receive_data', data=message)
        yield self.leave()

    def onLeave(self, details):
        self.disconnect()

    def onDisconnect(self):
        reactor.stop()

message = "data from POST[]"
session = MyAppSession(ComponentConfig('realm_1', {}))
transport = WampWebSocketClientFactory(session, url='ws://127.0.0.1:8080')
endpoint = TCP4ClientEndpoint(reactor, '127.0.0.1', 8080)
service = ClientService(endpoint, transport)
service.startService()
reactor.run()
I figured out a probably hacky-and-not-so-good way by using multiprocessing and putting reactor.stop() inside onJoin() right after the function call. This way I don't have to bother with the "twisted running in the main thread" thing because its process gets killed as soon as my work is done.
Is there a better way?

Handling stdin with tornado

How to listen for events that happen on stdin in Tornado loop?
In particular, in a tornado-system, I want to read from stdin, react on it, and terminate if stdin closes. At the same time, the Tornado web service is running on the same process.
While looking for this, the most similar I could find was handling streams of an externally spawned process. However, this is not what I want: I want to handle i/o stream of the current process, i.e. the one that has the web server.
Structurally, my server is pretty much hello-world tornado, so we can base the example off that. I just need to add an stdin handler.

You can use the add_handler method on the IOLoop instance to watch for events on stdin.
Here's a minimal working example:
from tornado.ioloop import IOLoop
from tornado.web import Application, RequestHandler
import sys

class MainHandler(RequestHandler):
    def get(self):
        self.finish("foo")

application = Application([
    (r"/", MainHandler),
])

def on_stdin(fd, events):
    # Called by the IOLoop whenever stdin becomes readable
    content = fd.readline()
    print("received: %s" % content)

if __name__ == "__main__":
    application.listen(8888)
    IOLoop.instance().add_handler(sys.stdin, on_stdin, IOLoop.READ)
    IOLoop.instance().start()
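The handler above does not cover the "terminate if stdin closes" part of the question. Here is a minimal sketch of one way to detect end-of-file, written against asyncio's loop.add_reader, which plays the same role as IOLoop.add_handler (Unix only). A pipe stands in for stdin so the example is self-contained; watch_fd and demo are hypothetical names, not part of Tornado or asyncio.

```python
import asyncio
import os


def watch_fd(loop, fd, on_line, on_eof):
    """Call on_line for each complete line read from fd, and on_eof at EOF."""
    pending = bytearray()

    def on_readable():
        chunk = os.read(fd, 4096)
        if not chunk:
            # An empty read means EOF. Stop watching, otherwise the
            # callback would keep firing on the closed descriptor.
            loop.remove_reader(fd)
            on_eof()
            return
        pending.extend(chunk)
        while b"\n" in pending:
            line, _, rest = bytes(pending).partition(b"\n")
            pending[:] = rest
            on_line(line.decode())

    loop.add_reader(fd, on_readable)


def demo():
    # A pipe stands in for sys.stdin (assumption for the sake of the demo).
    read_fd, write_fd = os.pipe()
    received = []
    loop = asyncio.new_event_loop()
    watch_fd(loop, read_fd, received.append, loop.stop)
    os.write(write_fd, b"hello\nworld\n")
    os.close(write_fd)      # closing the writer produces EOF for the reader
    loop.run_forever()      # returns once on_eof calls loop.stop()
    loop.close()
    os.close(read_fd)
    return received


if __name__ == "__main__":
    print(demo())
```

In a real server the call would presumably be watch_fd(loop, sys.stdin.fileno(), handle_line, loop.stop), with handle_line being whatever reacts to the input. Reading with os.read rather than a buffered readline avoids lines getting stuck in Python's buffer when several arrive at once.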

python tornado - how to return real-time data

I am using the Tornado library in Python. I have a queue that data gets added to, and I have to keep the connection open so that when a client makes a request I send out items from the queue. Here is a simple implementation. The problem is that when I add new elements to the queue, I don't see them being returned. In fact, I don't see any code executed below the IOLoop.current().start() line.
from tornado.ioloop import IOLoop
from tornado.web import RequestHandler, Application, url,asynchronous
from Queue import Queue
import json

q=Queue()
q.put("one")
q.put("two")

class HelloHandler(RequestHandler):
    def get(self):
        data=q.get()
        self.write(data)

def make_app():
    return Application([
        url(r"/", HelloHandler),
    ])

def main():
    app = make_app()
    app.listen(8888)
    IOLoop.current().start() # stops here
    time.sleep(2)
    q.put("three")
    print q

if __name__=='__main__':
    main()
The first request to http://localhost:8888/ returns "one".
The second request to http://localhost:8888/ returns "two".
The third request to http://localhost:8888/ blocks.

The problem you have is that calling IOLoop.current().start() transfers control to Tornado. It loops until IOLoop.stop() is called.
If you need to do something after the IOLoop has started, then you can use one of the callbacks. For example, here is your code modified to use IOLoop.call_later(). You could also use IOLoop.add_timeout() if you are using an earlier (<4.0) version of Tornado.
from tornado.ioloop import IOLoop
from tornado.web import RequestHandler, Application, url,asynchronous
from Queue import Queue
import json

q=Queue()
q.put("one")
q.put("two")

class HelloHandler(RequestHandler):
    def get(self):
        data=q.get()
        self.write(data)

def make_app():
    return Application([
        url(r"/", HelloHandler),
    ])

def main():
    app = make_app()
    app.listen(8888)
    IOLoop.current().call_later(2, q.put, "three")
    IOLoop.current().start()

if __name__=='__main__':
    main()
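One caveat with the code above: the standard library's Queue.get() blocks the calling thread while the queue is empty, and here that thread is the one running the IOLoop. A request that arrives while the queue is empty would therefore stall the whole server, including the scheduled call_later callback. Here is a minimal sketch of a non-blocking alternative, assuming Python 3 and an asyncio-based loop (Tornado 5+); it uses asyncio.Queue, and tornado.queues.Queue exposes the same awaitable get(). The function serve_three_requests is a hypothetical stand-in for three sequential requests, and the 0.1 second delay is arbitrary.

```python
import asyncio


async def serve_three_requests():
    """Simulate three sequential requests that each await one queue item."""
    q = asyncio.Queue()
    q.put_nowait("one")
    q.put_nowait("two")
    # Scheduled on the loop, like IOLoop.call_later in the answer above.
    asyncio.get_running_loop().call_later(0.1, q.put_nowait, "three")

    responses = []
    for _ in range(3):
        # Awaiting yields to the event loop while the queue is empty, so the
        # timer callback can still run and deliver "three".
        responses.append(await q.get())
    return responses


if __name__ == "__main__":
    print(asyncio.run(serve_three_requests()))
```

In a Tornado handler the equivalent would be an async def get(self) that awaits the queue before calling self.write.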

Tornado and Autobahn-python listening on the same port

Recently I started a small personal project. It's a realtime web system based on asyncio and autobahn-python. However I also would like to serve some static files via HTTP and do it from the same process. My HTTP server is Tornado sitting on top of asyncio event loop and everything works perfectly fine except that I have to start tornado and autobahn handlers on different ports. Here is a stripped down version of what I currently have:
import six
import datetime
import asyncio
import tornado.web
import tornado.httpserver
import tornado.netutil
from tornado.platform.asyncio import AsyncIOMainLoop
from autobahn.wamp import router
from autobahn.asyncio import wamp, websocket

# WAMP server
class MyBackendComponent(wamp.ApplicationSession):
    def onConnect(self):
        self.join(u"realm1")

    @asyncio.coroutine
    def onJoin(self, details):
        def utcnow():
            now = datetime.datetime.utcnow()
            return six.u(now.strftime("%Y-%m-%dT%H:%M:%SZ"))
        reg = yield from self.register(utcnow, 'com.timeservice.now')

# HTTP server
class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world!")

tornado_app = tornado.web.Application(
    [
        (r"/", MainHandler),
    ],
)

if __name__ == '__main__':
    router_factory = router.RouterFactory()
    session_factory = wamp.RouterSessionFactory(router_factory)
    session_factory.add(MyBackendComponent())
    transport_factory = websocket.WampWebSocketServerFactory(session_factory,
                                                             debug=True,
                                                             debug_wamp=True)
    AsyncIOMainLoop().install()
    tornado_app.listen(80, "127.0.0.1")
    loop = asyncio.get_event_loop()
    coro = loop.create_server(transport_factory, "127.0.0.1", 8080)
    server = loop.run_until_complete(coro)
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.close()
        loop.close()
Question: Is there a right way to make the autobahn-wamp and Tornado handlers listen on the same port?
My initial idea was to implement some kind of socket.socket wrapper and dispatch incoming messages there, but it turned out to be awfully messy. I don't want to use any external proxies because the backend should be as portable as possible.
Also, I'm not asking anybody to implement it for me (but of course you can if you want to!), only whether somebody has already done something similar, before I dive into the autobahn/tornado code.
Thanks in advance!
PS: Sorry for my poor English - it's not my mother tongue.

How to implement 'SomeAsyncWorker()' from Bottle Asynchronous Primer?

I have a bunch of long-running scripts which do some number crunching and write output to the console via print as they run. I want to invoke these scripts from a browser and display their progress in the browser as they run. I'm currently playing with Bottle and am working through this primer, http://bottlepy.org/docs/dev/async.html#, which is rather neat.
I'd like to try Event Callbacks (http://bottlepy.org/docs/dev/async.html#event-callbacks), as this seems to match my problem exactly: the script would run as an AsyncWorker (ideally managed by some message queue to limit the number running at any one time) and periodically write back its state. But I cannot figure out what SomeAsyncWorker() is. Is it a Tornado class, a gevent class I have to implement, or something else?
@route('/fetch')
def fetch():
    body = gevent.queue.Queue()
    worker = SomeAsyncWorker()
    worker.on_data(body.put)
    worker.on_finish(lambda: body.put(StopIteration))
    worker.start()
    return body

I've found one way of doing this using gevent.queue here: http://toastdriven.com/blog/2011/jul/31/gevent-long-polling-you/. It shouldn't be hard to adapt to work with Bottle.
# wsgi_longpolling/better_responses.py
from gevent import monkey
monkey.patch_all()
import datetime
import time
from gevent import Greenlet
from gevent import pywsgi
from gevent import queue

def current_time(body):
    # Push one timestamp per second into the response queue for a minute
    current = start = datetime.datetime.now()
    end = start + datetime.timedelta(seconds=60)
    while current < end:
        current = datetime.datetime.now()
        body.put('<div>%s</div>' % current.strftime("%Y-%m-%d %I:%M:%S"))
        time.sleep(1)
    body.put('</body></html>')
    body.put(StopIteration)

def handle(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    body = queue.Queue()
    body.put(' ' * 1000)
    body.put("<html><body><h1>Current Time:</h1>")
    g = Greenlet.spawn(current_time, body)
    return body

server = pywsgi.WSGIServer(('127.0.0.1', 1234), handle)
print("Serving on http://127.0.0.1:1234...")
server.serve_forever()
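As far as I can tell, SomeAsyncWorker in the Bottle primer is a placeholder name rather than a class shipped by Bottle, gevent or Tornado. As a sketch of the interface the primer's snippet implies (on_data, on_finish, start), here is a hypothetical worker built on the standard library's threading and queue modules instead of gevent; that swap is an assumption made so the example runs without third-party packages. The job argument, the collect_output helper and the DONE sentinel are all invented for illustration.

```python
import queue
import threading


class SomeAsyncWorker:
    """Hypothetical worker: runs a job in a thread and reports via callbacks."""

    def __init__(self, job):
        self._job = job                  # callable returning an iterable of chunks
        self._data_callbacks = []
        self._finish_callbacks = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def on_data(self, callback):
        self._data_callbacks.append(callback)

    def on_finish(self, callback):
        self._finish_callbacks.append(callback)

    def start(self):
        self._thread.start()

    def _run(self):
        # Forward every chunk the job produces, then signal completion.
        for chunk in self._job():
            for callback in self._data_callbacks:
                callback(chunk)
        for callback in self._finish_callbacks:
            callback()


def collect_output(job):
    """Mirror the primer's fetch(): feed a queue from the worker, then drain it."""
    DONE = object()                      # sentinel playing the role of StopIteration
    body = queue.Queue()
    worker = SomeAsyncWorker(job)
    worker.on_data(body.put)
    worker.on_finish(lambda: body.put(DONE))
    worker.start()
    chunks = []
    while True:
        item = body.get()                # blocks this thread until the worker reports
        if item is DONE:
            return chunks
        chunks.append(item)


if __name__ == "__main__":
    print(collect_output(lambda: ["10%", "50%", "100%"]))
```

With gevent, the same class could presumably spawn a greenlet in start() instead of a thread, and the route would return the gevent queue directly as in the primer.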

(Not exactly an answer to your question, but here's another tack you could take.)
I've cobbled together a very simple multi-threaded WSGI server that fits nicely under bottle. Here's an example:
import bottle
import time
from mtbottle import MTServer

app = bottle.Bottle()

@app.route('/')
def foo():
    time.sleep(2)
    return 'hello, world!\n'

app.run(server=MTServer, host='0.0.0.0', port=8080, thread_count=3)
# app is nonblocking; it will handle up to 3 requests concurrently.
# A 4th concurrent request will block until one of the first 3 completed.
https://github.com/RonRothman/mtwsgi
One down side is that all endpoints on that port will be asynchronous; in contrast, the gevent method (I think) gives you more control over which methods are asynchronous and which are synchronous.
Hope this helps!
