Python how can I do a multithreading/asynchronous HTTP server with twisted - python

Now I wrote ferver by this tutorial:
https://twistedmatrix.com/documents/14.0.0/web/howto/web-in-60/asynchronous-deferred.html
But it seems to be good only for delayng process, not actually concurently process 2 or more requests. My full code is:
from twisted.internet.task import deferLater
from twisted.web.resource import Resource
from twisted.web.server import Site, NOT_DONE_YET
from twisted.internet import reactor, threads
from time import sleep
class DelayedResource(Resource):
def _delayedRender(self, request):
print 'Sorry to keep you waiting.'
request.write("<html><body>Sorry to keep you waiting.</body></html>")
request.finish()
def make_delay(self, request):
print 'Sleeping'
sleep(5)
return request
def render_GET(self, request):
d = threads.deferToThread(self.make_delay, request)
d.addCallback(self._delayedRender)
return NOT_DONE_YET
def main():
root = Resource()
root.putChild("social", DelayedResource())
factory = Site(root)
reactor.listenTCP(8880, factory)
print 'started httpserver...'
reactor.run()
if __name__ == '__main__':
main()
But when I passing 2 requests console output is like:
Sleeping
Sorry to keep you waiting.
Sleeping
Sorry to keep you waiting.
But if it was concurrent it should be like:
Sleeping
Sleeping
Sorry to keep you waiting.
Sorry to keep you waiting.
So the question is how to make twisted not to wait until response is finished before processing next?
Also make_delayIRL is a large function with heavi logic. Basically I spawn lot of threads and make requests to other urls and collecting results intro response, so it can take some time and not easly to be ported

Twisted processes everything in one event loop. If somethings blocks the execution, it also blocks Twisted. So you have to prevent blocking calls.
In your case you have time.sleep(5). It is blocking. You found the better way to do it in Twisted already: deferLater(). It returns a Deferred that will continue execution after the given time and release the events loop so other things can be done meanwhile. In general all things that return a deferred are good.
If you have to do heavy work that for some reason can not be deferred, you should use deferToThread() to execute this work in a thread. See https://twistedmatrix.com/documents/15.5.0/core/howto/threading.html for details.

You can use greenlents in your code (like threads).
You need to install the geventreactor - https://gist.github.com/yann2192/3394661
And use reactor.deferToGreenlet()
Also
In your long-calculation code need to call gevent.sleep() for change context to another greenlet.
msecs = 5 * 1000
timeout = 100
for xrange(0, msecs, timeout):
sleep(timeout)
gevent.sleep()

Related

How to avoid ReactorNotRestartable in Autobahn Python

I have a python based page which recieves data by POST, which is then forwarded to the Crossbar server using Autobahn (Wamp). It works well the first 1-2 times but when it's called again after that it throws ReactorNotRestartable.
Now, I need this to work whichever way possible, either by reusing this "Reactor" based on a conditional check or by stopping it properly after every run. (The first one would be preferable because it might reduce the execution time)
Thanks for your help!
Edit:
This is in a webpage (Django View) so it needs to run as many times as the page is loaded/data is sent to it via POST.
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks
from twisted.internet.endpoints import TCP4ClientEndpoint
from twisted.application.internet import ClientService
from autobahn.wamp.types import ComponentConfig
from autobahn.twisted.wamp import ApplicationSession, WampWebSocketClientFactory
class MyAppSession(ApplicationSession):
def __init__(self, config):
ApplicationSession.__init__(self, config)
def onConnect(self):
self.join(self.config.realm)
def onChallenge(self, challenge):
pass
#inlineCallbacks
def onJoin(self, details):
yield self.call('receive_data', data=message)
yield self.leave()
def onLeave(self, details):
self.disconnect()
def onDisconnect(self):
reactor.stop()
message = "data from POST[]"
session = MyAppSession(ComponentConfig('realm_1', {}))
transport = WampWebSocketClientFactory(session, url='ws://127.0.0.1:8080')
endpoint = TCP4ClientEndpoint(reactor, '127.0.0.1', 8080)
service = ClientService(endpoint, transport)
service.startService()
reactor.run()
I figured out a probably hacky-and-not-so-good way by using multiprocessing and putting reactor.stop() inside onJoin() right after the function call. This way I don't have to bother with the "twisted running in the main thread" thing because its process gets killed as soon as my work is done.
Is there a better way?

concurrency of heavy tasks in tornado

my code:
import tornado.tcpserver
import tornado.ioloop
import itertools
import socket
import time
class Talk():
def __init__(self, id):
self.id = id
#tornado.gen.coroutine
def on_connect(self):
try:
while "connection alive":
self.said = yield self.stream.read_until(b"\n")
response = yield tornado.gen.Task(self.task) ### LINE 1
yield self.stream.write(response) ### LINE 2
except tornado.iostream.StreamClosedError:
print('error: socket closed')
return
#tornado.gen.coroutine
def task(self):
if self.id == 1:
time.sleep(3) # sometimes request is heavy blocking
return b"response"
#tornado.gen.coroutine
def on_disconnect(self):
yield []
class Server(tornado.tcpserver.TCPServer):
def __init__(self, io_loop=None, ssl_options=None, max_buffer_size=None):
tornado.tcpserver.TCPServer.__init__(self,
io_loop=io_loop,
ssl_options=ssl_options,
max_buffer_size=max_buffer_size)
self.talk_id_alloc = itertools.count(1)
return
#tornado.gen.coroutine
def handle_stream(self, stream, address):
talk_id = next(self.talk_id_alloc)
talk = Talk(talk_id)
stream.set_close_callback(talk.on_disconnect)
stream.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
stream.socket.setsockopt(socket.IPPROTO_TCP, socket.SO_KEEPALIVE, 1)
talk.stream = stream
yield talk.on_connect()
return
Server().listen(8888)
tornado.ioloop.IOLoop.instance().start()
problem:
I need a tornado as tcp server - it looks like a good choice for handling many requests with low computation.
however:
99% of requests will last less than 0,05 sec, but
1% of them can last even 3 sec (special cases).
single response must be returned at once, not partially.
what is best aproach here?
how to achieve a code where LINE #1 is never blocking more than 0.1 sec
yield tornado.gen.with_timeout(
datetime.timedelta(seconds=0.1), tornado.gen.Task(self.task))
doesnt work form me - do nothing
tornado.ioloop.IOLoop.current().add_timeout(
datetime.timedelta(seconds=0.1),
lambda: result.set_exception(TimeoutError("Timeout")))
either nothing.
looking for better solutions:
task can detect if need high computation (API ...) - using timeout?,
then run/fork to another thread or even process
and send to tornado server execption - "receive" me later from results queue (consumer/producer)
i dont want case where timeout kill heavy task without saving results, and task is reopened within special wrapper - so consumer/producer pattern should be for all tasks?
adding new ioloop when current is blocked - how detect blocking?
I dont see any solution in tornado.
task in line #1 could be simple (~99%) or complicated, which can require:
I/O:
- disk/DB access
- ram/redis access
network:
- API call
CPU:
- algorithms, regex
(the worst task will do all of above).
I know what kind of task it is (the weight) only when I start doing it,
so appriopriate is use a task queue in separate threads.
I dont want delay simple/quick tasks.
so if you manage to cancel the heavy tasks, I recommend cancelling them with a time-out and then spawning them off to another thread. Performance-wise this is not ideal (GIL) but you prevent tornado from blocking - which is your ultimate goal.
A nice write-up about how this can be done can be found here: http://lbolla.info/blog/2013/01/22/blocking-tornado.
If you want to go further you could use something like celery where you can offload to other processes transparently - though this much heavier.

Python Tornado - Asynchronous Request is blocking

The request handlers are as follows:
class TestHandler(tornado.web.RequestHandler): # localhost:8888/test
#tornado.web.asynchronous
def get(self):
t = threading.Thread(target = self.newThread)
t.start()
def newThread(self):
print "new thread called, sleeping"
time.sleep(10)
self.write("Awake after 10 seconds!")
self.finish()
class IndexHandler(tornado.web.RequestHandler): # localhost:8888/
def get(self):
self.write("It is not blocked!")
self.finish()
When I GET localhost:8888/test, the page loads 10 seconds and shows Awake after 10 seconds; while it is loading, if I open localhost:8888/index in a new browser tab, the new index page is not blocked and loaded instantly. These fit my expectation.
However, while the /test is loading, if I open another /test in a new browser tab, it is blocked. The second /test only starts processing after the first has finished.
What mistakes have I made here?
What you are seeing is actually a browser limitation, not an issue with your code. I added some extra logging to your TestHandler to make this clear:
class TestHandler(tornado.web.RequestHandler): # localhost:8888/test
#tornado.web.asynchronous
def get(self):
print "Thread starting %s" % time.time()
t = threading.Thread(target = self.newThread)
t.start()
def newThread(self):
print "new thread called, sleeping %s" % time.time()
time.sleep(10)
self.write("Awake after 10 seconds!" % time.time())
self.finish()
If I open two curl sessions to localhost/test simultaneously, I get this on the server side:
Thread starting 1402236952.17
new thread called, sleeping 1402236952.17
Thread starting 1402236953.21
new thread called, sleeping 1402236953.21
And this on the client side:
Awake after 10 seconds! 1402236962.18
Awake after 10 seconds! 1402236963.22
Which is exactly what you expect. However in Chromium, I get the same behavior as you. I think that Chromium (perhaps all browsers) will only allow one connection at a time to be opened to the same URL. I confirmed this by making IndexHandler run the same code as TestHandler, except with slightly different log messages. Here's the output when opening two browser windows, one to /test, and one to /index:
index Thread starting 1402237590.03
index new thread called, sleeping 1402237590.03
Thread starting 1402237592.19
new thread called, sleeping 1402237592.19
As you can see both ran concurrently without issue.
I think you picked the "wrong" test for checking parallel GET requests, that's because you're using a blocking function for your test: time.sleep(), which its behavior doesn't really occur when you simply render an HTML page ...
What happens is, that the def get() ( which handle all GET requests ) is actually being blocked when you use time.sleep it cannot process any new GET requests, puts them in some kind of "queue".
So if you really want to test sleep() - use the Tornado non-blocking function: tornado.gen.sleep()
Example:
from tornado import gen
#gen.coroutine
def get(self):
yield self.time_wait()
#gen.coroutine
def time_wait(self):
yield gen.sleep(15)
self.write("done")
Open multiple tabs in your browser, then you'll see that all requests are being processed when they arrive w/o "queueing" the new requests that comes in ..

Tornado adbapi callback not running

I'm afraid I'm finding it difficult to work with the adbapi interface for sqlite3 ConnectionPools in twisted.
I've initialized my pool like this in a file I've named db.py:
from twisted.enterprise import adbapi
pool = adbapi.ConnectionPool("sqlite3", db=config.db_file)
pool.start()
def last(datatype, n):
cmd = "SELECT * FROM %s ORDER BY Timestamp DESC LIMIT %i" % (datatype, n)
return pool.runQuery(cmd)
Then, I'm importing db.py and using it inside a particular route handler. Unfortunately, it appears the callback is never triggered. datatype is printed, but response is never printed.
class DataHandler(tornado.web.RequestHandler):
#tornado.web.asynchronous
def get(self, datatype):
print datatype
data = db.last(datatype, 500)
data.addCallback(self.on_response)
def on_response(self, response):
print response
self.write(json.dumps(response))
self.finish()
Any ideas?
Mixing Tornado and Twisted requires special attention. Try this, as the first lines executed in your entire program:
import tornado.platform.twisted
tornado.platform.twisted.install()
Then, to start your server:
tornado.ioloop.IOLoop.current().start()
What's happening now is, you start the Tornado IOLoop but you never start the Twisted Reactor.
Your Twisted SQLite connection begins an IO operation when you run your query, but since the Reactor isn't running, the operation never completes. In order for the IOLoop and the Reactor to share your process you must run one of them on top of the other. Tornado provides a compatibility layer that allows you to do that.

Twisted: why is it that passing a deferred callback to a deferred thread makes the thread blocking all of a sudden?

I unsuccessfully tried using txredis (the non blocking twisted api for redis) for a persisting message queue I'm trying to set up with a scrapy project I am working on. I found that although the client was not blocking, it became much slower than it could have been because what should have been one event in the reactor loop was split up into thousands of steps.
So instead, I tried making use of redis-py (the regular blocking twisted api) and wrapping the call in a deferred thread. It works great, however I want to perform an inner deferred when I make a call to redis as I would like to set up connection pooling in attempts to speed things up further.
Below is my interpretation of some sample code taken from the twisted docs for a deferred thread to illustrate my use case:
#!/usr/bin/env python
from twisted.internet import reactor,threads
from twisted.internet.task import LoopingCall
import time
def main_loop():
print 'doing stuff in main loop.. do not block me!'
def aBlockingRedisCall():
print 'doing lookup... this may take a while'
time.sleep(10)
return 'results from redis'
def result(res):
print res
def main():
lc = LoopingCall(main_loop)
lc.start(2)
d = threads.deferToThread(aBlockingRedisCall)
d.addCallback(result)
reactor.run()
if __name__=='__main__':
main()
And here is my alteration for connection pooling that makes the code in the deferred thread blocking :
#!/usr/bin/env python
from twisted.internet import reactor,defer
from twisted.internet.task import LoopingCall
import time
def main_loop():
print 'doing stuff in main loop.. do not block me!'
def aBlockingRedisCall(x):
if x<5: #all connections are busy, try later
print '%s is less than 5, get a redis client later' % x
x+=1
d = defer.Deferred()
d.addCallback(aBlockingRedisCall)
reactor.callLater(1.0,d.callback,x)
return d
else:
print 'got a redis client; doing lookup.. this may take a while'
time.sleep(10) # this is now blocking.. any ideas?
d = defer.Deferred()
d.addCallback(gotFinalResult)
d.callback(x)
return d
def gotFinalResult(x):
return 'final result is %s' % x
def result(res):
print res
def aBlockingMethod():
print 'going to sleep...'
time.sleep(10)
print 'woke up'
def main():
lc = LoopingCall(main_loop)
lc.start(2)
d = defer.Deferred()
d.addCallback(aBlockingRedisCall)
d.addCallback(result)
reactor.callInThread(d.callback, 1)
reactor.run()
if __name__=='__main__':
main()
So my question is, does anyone know why my alteration causes the deferred thread to be blocking and/or can anyone suggest a better solution?
Well, as the twisted docs say:
Deferreds do not make the code
magically not block
Whenever you're using blocking code, such as sleep, you have to defer it to a new thread.
#!/usr/bin/env python
from twisted.internet import reactor,defer, threads
from twisted.internet.task import LoopingCall
import time
def main_loop():
print 'doing stuff in main loop.. do not block me!'
def aBlockingRedisCall(x):
if x<5: #all connections are busy, try later
print '%s is less than 5, get a redis client later' % x
x+=1
d = defer.Deferred()
d.addCallback(aBlockingRedisCall)
reactor.callLater(1.0,d.callback,x)
return d
else:
print 'got a redis client; doing lookup.. this may take a while'
def getstuff( x ):
time.sleep(3)
return "stuff is %s" % x
# getstuff is blocking, so you need to push it to a new thread
d = threads.deferToThread(getstuff, x)
d.addCallback(gotFinalResult)
return d
def gotFinalResult(x):
return 'final result is %s' % x
def result(res):
print res
def aBlockingMethod():
print 'going to sleep...'
time.sleep(10)
print 'woke up'
def main():
lc = LoopingCall(main_loop)
lc.start(2)
d = defer.Deferred()
d.addCallback(aBlockingRedisCall)
d.addCallback(result)
reactor.callInThread(d.callback, 1)
reactor.run()
if __name__=='__main__':
main()
In case the redis api is not very complex it might be more natural to rewrite it using twisted.web, instead of just calling the blocking api in a lot threads.
There's also an up-to-date Redis client for twisted which already supports the new protocol and features of Redis 2.x. You should definetely give it a try. It's called txredisapi.
For the persistent message queue, I'd recommend RestMQ. A redis-based message queue system built on top of cyclone and txredisapi.
http://github.com/gleicon/restmq
Cheers
On a related note, you could probably gain a lot by using a Redis client created specifically for Twisted, such as this one: http://github.com/deldotdr/txRedis

Categories

Resources