I have a bunch of long-running scripts which do some number crunching and, as they run, write output to the console via print. I want to invoke these scripts from a browser and display their progress in the browser as they run. I'm currently playing with bottle and am working through this primer http://bottlepy.org/docs/dev/async.html# which is rather neat.
I'd like to try Event Callbacks (http://bottlepy.org/docs/dev/async.html#event-callbacks) as this seems to exactly match my problem: the script would run as an AsyncWorker (ideally managed by some message queue to limit the number running at any one instant) and periodically write back its state. But I cannot figure out what SomeAsyncWorker() is. Is it a tornado class or a gevent class I have to implement, or something else?
@route('/fetch')
def fetch():
    body = gevent.queue.Queue()
    worker = SomeAsyncWorker()
    worker.on_data(body.put)
    worker.on_finish(lambda: body.put(StopIteration))
    worker.start()
    return body
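(For what it's worth, SomeAsyncWorker isn't a class shipped by tornado or gevent; the primer leaves it as a placeholder for a worker you write yourself. Here is a minimal, purely illustrative sketch of what such a class might look like under gevent; the class name and the fake progress loop are assumptions, not bottle API:)

import gevent

class SomeAsyncWorker(object):
    def __init__(self):
        self._data_callbacks = []
        self._finish_callbacks = []

    def on_data(self, callback):
        self._data_callbacks.append(callback)

    def on_finish(self, callback):
        self._finish_callbacks.append(callback)

    def start(self):
        gevent.spawn(self._run)

    def _run(self):
        # stand-in for the real number crunching; report progress as it goes
        for i in range(1, 11):
            gevent.sleep(1)
            for callback in self._data_callbacks:
                callback('<div>progress: %d%%</div>' % (i * 10))
        for callback in self._finish_callbacks:
            callback()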
I've found one way of doing this using gevent.queue here: http://toastdriven.com/blog/2011/jul/31/gevent-long-polling-you/. It shouldn't be hard to adapt to work with bottle.
# wsgi_longpolling/better_responses.py
from gevent import monkey
monkey.patch_all()

import datetime
import time

from gevent import Greenlet
from gevent import pywsgi
from gevent import queue

def current_time(body):
    current = start = datetime.datetime.now()
    end = start + datetime.timedelta(seconds=60)

    while current < end:
        current = datetime.datetime.now()
        body.put('<div>%s</div>' % current.strftime("%Y-%m-%d %I:%M:%S"))
        time.sleep(1)

    body.put('</body></html>')
    body.put(StopIteration)

def handle(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    body = queue.Queue()
    body.put(' ' * 1000)
    body.put("<html><body><h1>Current Time:</h1>")
    g = Greenlet.spawn(current_time, body)
    return body

server = pywsgi.WSGIServer(('127.0.0.1', 1234), handle)
print "Serving on http://127.0.0.1:1234..."
server.serve_forever()
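Adapting that to bottle should indeed be straightforward; a minimal sketch, assuming bottle is run on its gevent server (the /stream route name is illustrative):

from gevent import monkey
monkey.patch_all()

import datetime

import bottle
import gevent
from gevent import queue

def current_time(body):
    # push a timestamp into the body queue once a second for a minute
    end = datetime.datetime.now() + datetime.timedelta(seconds=60)
    while datetime.datetime.now() < end:
        body.put('<div>%s</div>' % datetime.datetime.now().strftime("%Y-%m-%d %I:%M:%S"))
        gevent.sleep(1)
    body.put('</body></html>')
    body.put(StopIteration)

@bottle.route('/stream')
def stream():
    body = queue.Queue()
    body.put('<html><body><h1>Current Time:</h1>')
    gevent.spawn(current_time, body)
    return body  # bottle iterates the queue until StopIteration

bottle.run(host='127.0.0.1', port=8080, server='gevent')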
(Not exactly an answer to your question, but here's another tack you could take.)
I've cobbled together a very simple multi-threaded WSGI server that fits nicely under bottle. Here's an example:
import bottle
import time
from mtbottle import MTServer

app = bottle.Bottle()

@app.route('/')
def foo():
    time.sleep(2)
    return 'hello, world!\n'

app.run(server=MTServer, host='0.0.0.0', port=8080, thread_count=3)
# app is nonblocking; it will handle up to 3 requests concurrently.
# A 4th concurrent request will block until one of the first 3 completes.
https://github.com/RonRothman/mtwsgi
One downside is that all endpoints on that port will be asynchronous; in contrast, the gevent method (I think) gives you more control over which endpoints are asynchronous and which are synchronous.
Hope this helps!
I developed an API with Bottle, and some requests take a long time to send the response. The problem is that if during this time I send another, short request, I have to wait until the first request is finished.
Here is an example:
from gevent import monkey
monkey.patch_all()

from bottle import route, run

@route('/test', method='GET')
def test():
    return 'hello'

@route('/loop', method='GET')
def loop():
    for i in range(0, 1000000000):
        a = 0

if __name__ == '__main__':
    run(host='127.0.0.1', port=45677, debug=True, server='gevent')
If you run /loop and then /test, you will have to wait until /loop is finished to get the /test response.
I tried with many servers, always the same problem.
What am I doing wrong? Thank you for your help.
You need to understand the async approach. For instance, with gevent, async doesn't mean multithreaded, so anything that requires CPU will still block. But it is great for stuff that relies on IO, like SQL queries.
So your for loop, being purely CPU-bound, will block until it's over, unless you provide a sleep condition to allow other contexts to run during the process.
import gevent
from gevent import monkey, spawn as gspawn, sleep as gsleep, socket, signal_handler as sig
monkey.patch_all()
import signal
import bottle
from bottle import Bottle, static_file, get, post, request, response, template, redirect, hook, route, abort
from gevent.pywsgi import WSGIServer
from geventwebsocket.handler import WebSocketHandler

port = 8080  # port was not defined in the original snippet; 8080 is an arbitrary choice

def sample():
    gspawn(myfunc)  # myfunc: placeholder for whatever background task you spawn

@get('/')
def app():
    return 'Hello World!'

@get('/test')
def test():
    return 'hello'

@route('/loop')
def loop():
    for i in range(0, 1000000000):
        gsleep(0)  # yield to the hub so other greenlets can run
        a = 0

if __name__ == '__main__':
    botapp = bottle.app()
    server = WSGIServer(("0.0.0.0", int(port)), botapp, handler_class=WebSocketHandler)

    def shutdown():
        print('Shutting down ...')
        server.stop(timeout=60)
        exit(signal.SIGTERM)

    sig(signal.SIGTERM, shutdown)
    sig(signal.SIGINT, shutdown)
    server.serve_forever()
I have a project that I'm working on where I hope to be able to:
start a wsgiserver on its own thread
do stuff (some of which involves interacting with the wsgiserver)
close the thread
end the program
I can do the first two steps, but I'm having trouble with the last two. Below is a simplified version of my project that exhibits the issue: it can do the first two steps from above, just not the last two.
A couple questions:
How do I get the thread to stop the wsgi server?
Do I just need to pull out the wsgiserver code and start it on its own process?
Some details of my project that may head off some questions:
My project currently spins up other processes that are intended to talk to my wsgi server. I can spin everything up and get my processes to talk to my server, but I'm not able to get a graceful shutdown. This code sample is intended to provide a 'relatively simple' sample that can be more easily reviewed.
There are remnants of failed attempts at solving this in the code; hopefully they aren't too distracting.
# Simple echo program
# listens on port 3000 and returns anything posted by http to that port
#
# installing required libraries:
#   download/install Microsoft Visual C++ 9.0 for Python
#   https://www.microsoft.com/en-us/download/details.aspx?id=44266
#   pip install greenlet
#   pip install gevent

import sys
import threading
import urllib
import urllib2
import time
import traceback

from gevent.pywsgi import WSGIServer, WSGIHandler
from gevent import socket

server = ""

def request_error(start_response):
    global server
    # Send error to atm - must provide start_response
    start_response('500', [])
    #server.stop()
    return ['']

def handle_transaction(env, start_response):
    global server
    try:
        result = env['wsgi.input'].read()
        print("Received: " + result)
        sys.stdout.flush()
        start_response('200 OK', [])
        if (result.lower() == "exit"):
            #server.stop()
            return result
        else:
            return result
    except:
        return request_error(start_response)

class ErrorCapturingWSGIHandler(WSGIHandler):
    def read_requestline(self):
        result = None
        try:
            result = WSGIHandler.read_requestline(self)
        except:
            protocol_error()
            raise  # re-raise error, to not change WSGIHandler functionality
        return result

class ErrorCapturingWSGIServer(WSGIServer):
    handler_class = ErrorCapturingWSGIHandler

def start_server():
    global server
    server = ErrorCapturingWSGIServer(
        ('', 3000), handle_transaction, log=None)
    server.serve_forever()

def main():
    global server
    # start server on its own thread
    print("Echoing...")
    commandServerThread = threading.Thread(target=start_server)
    commandServerThread.start()

    # now that the server is started, send data
    req = urllib2.Request("http://127.0.0.1:3000", data='ping')
    response = urllib2.urlopen(req)
    reply = response.read()
    print(reply)

    # take a look at the threading info
    print(threading.active_count())

    # try to exit
    req = urllib2.Request("http://127.0.0.1:3000", data='exit')
    response = urllib2.urlopen(req)
    reply = response.read()
    print(reply)

    # Now that I'm done, exit
    #sys.exit(0)
    return

if __name__ == '__main__':
    main()
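One approach that seems consistent with the commented-out server.stop() calls above: schedule the stop on the server's own gevent hub from inside the request handler, so the 'exit' response is written before serve_forever() returns and the thread dies. A sketch of just the changed handler (the one-second delay is an arbitrary grace period, not a requirement):

import gevent

def handle_transaction(env, start_response):
    global server
    result = env['wsgi.input'].read()
    start_response('200 OK', [])
    if result.lower() == "exit":
        # stop the server *after* this response has gone out; spawn_later
        # runs on the server thread's own hub, so we never touch a gevent
        # hub from a foreign thread
        gevent.spawn_later(1, server.stop)
    return [result]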
I wrote a server following this tutorial:
https://twistedmatrix.com/documents/14.0.0/web/howto/web-in-60/asynchronous-deferred.html
But it seems to be good only for delaying the response, not for actually processing 2 or more requests concurrently. My full code is:
from twisted.internet.task import deferLater
from twisted.web.resource import Resource
from twisted.web.server import Site, NOT_DONE_YET
from twisted.internet import reactor, threads
from time import sleep

class DelayedResource(Resource):
    def _delayedRender(self, request):
        print 'Sorry to keep you waiting.'
        request.write("<html><body>Sorry to keep you waiting.</body></html>")
        request.finish()

    def make_delay(self, request):
        print 'Sleeping'
        sleep(5)
        return request

    def render_GET(self, request):
        d = threads.deferToThread(self.make_delay, request)
        d.addCallback(self._delayedRender)
        return NOT_DONE_YET

def main():
    root = Resource()
    root.putChild("social", DelayedResource())
    factory = Site(root)
    reactor.listenTCP(8880, factory)
    print 'started httpserver...'
    reactor.run()

if __name__ == '__main__':
    main()
But when I send 2 requests, the console output is like:
Sleeping
Sorry to keep you waiting.
Sleeping
Sorry to keep you waiting.
But if it were concurrent, it should be like:
Sleeping
Sleeping
Sorry to keep you waiting.
Sorry to keep you waiting.
So the question is: how do I make Twisted not wait until the current response is finished before processing the next one?
Also, make_delay in real life is a large function with heavy logic. Basically, I spawn a lot of threads, make requests to other URLs, and collect the results into the response, so it can take some time and is not easy to port.
Twisted processes everything in one event loop. If something blocks the execution, it also blocks Twisted. So you have to avoid blocking calls.
In your case you have time.sleep(5), which is blocking. You have already found the better way to do it in Twisted: deferLater(). It returns a Deferred that will continue execution after the given time and releases the event loop so other things can be done meanwhile. In general, all things that return a Deferred are good.
If you have to do heavy work that for some reason cannot be deferred, you should use deferToThread() to execute this work in a thread. See https://twistedmatrix.com/documents/15.5.0/core/howto/threading.html for details.
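Applied to your example, render_GET could use deferLater instead of sleeping in a thread; a minimal sketch:

from twisted.internet.task import deferLater
from twisted.internet import reactor
from twisted.web.resource import Resource
from twisted.web.server import NOT_DONE_YET

class DelayedResource(Resource):
    def _delayedRender(self, request):
        request.write("<html><body>Sorry to keep you waiting.</body></html>")
        request.finish()

    def render_GET(self, request):
        # deferLater hands control back to the reactor for 5 seconds
        # instead of occupying a thread pool slot with time.sleep(5)
        d = deferLater(reactor, 5, lambda: request)
        d.addCallback(self._delayedRender)
        return NOT_DONE_YET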
You can use greenlets in your code (like threads).
You need to install the geventreactor: https://gist.github.com/yann2192/3394661
And use reactor.deferToGreenlet().
Also, in your long-running calculation you need to call gevent.sleep() periodically to switch context to another greenlet:
import gevent
from time import sleep

msecs = 5 * 1000
timeout = 100
for _ in xrange(0, msecs, timeout):
    sleep(timeout / 1000.0)  # a 100 ms slice of the blocking work
    gevent.sleep()           # yield so other greenlets get a chance to run
So here's the deal: I'm writing a simple, lightweight IRC app, hosted locally, that basically does the same job as Xchat and works in your browser, just like Sabnzbd. I display search results in the browser as an HTML table, and using an AJAX GET request with an on_click event, the download is launched. I use another AJAX GET request in a 1-second loop to request the download information (status, progress, speed, ETA, etc.). I hit a bump with the simultaneous AJAX requests, since my CGI handler seems to only be able to handle one thread at a time: indeed, the main thread processes the download while requests for download status are sent too.
Since I had a Django app somewhere, I tried implementing this IRC app there, and everything works fine: simultaneous requests are handled properly.
So is there something I have to know about the HTTP handler? Is it not possible with the basic CGI handler to deal with simultaneous requests?
I use the following for my CGI IRC app :
from http.server import BaseHTTPRequestHandler, HTTPServer, CGIHTTPRequestHandler
If it's not about theory but about my code, I can gladly post various python scripts if it helps.
A little bit deeper into the documentation:
These four classes process requests synchronously; each request must be completed before the next request can be started.
TL;DR: Use a real web server.
So, after further research, here's my code, which works:
from http.server import BaseHTTPRequestHandler, HTTPServer, CGIHTTPRequestHandler
from socketserver import ThreadingMixIn
import threading
import cgitb; cgitb.enable()  # This line enables CGI error reporting
import webbrowser

class HTTPRequestHandler(CGIHTTPRequestHandler):
    """Handle requests in a separate thread."""
    def do_GET(self):
        if "shutdown" in self.path:
            self.send_head()
            print("shutdown")
            server.stop()
        else:
            self.send_head()

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    allow_reuse_address = True
    daemon_threads = True

    def shutdown(self):
        self.socket.close()
        HTTPServer.shutdown(self)

class SimpleHttpServer():
    def __init__(self, ip, port):
        self.server = ThreadedHTTPServer((ip, port), HTTPRequestHandler)
        self.status = 1

    def start(self):
        self.server_thread = threading.Thread(target=self.server.serve_forever)
        self.server_thread.daemon = True
        self.server_thread.start()

    def waitForThread(self):
        self.server_thread.join()

    def stop(self):
        self.server.shutdown()
        self.waitForThread()

if __name__ == '__main__':
    HTTPRequestHandler.cgi_directories = ["/", "/ircapp"]
    server = SimpleHttpServer('localhost', 8020)
    print('HTTP Server Running...........')
    webbrowser.open_new_tab('http://localhost:8020/ircapp/search.py')
    server.start()
    server.waitForThread()
I'm using a Tornado web server to queue up items that need to be processed outside of the request/response cycle.
In my simplified example below, every time a request comes in, I add a new string to a list called queued_items. I want to create something that will watch that list and process the items as they show up in it.
(In my real code, the items are processed and sent over a TCP socket which may or may not be connected when the web request arrives. I want the web server to keep queuing up items regardless of the socket connection.)
I'm trying to keep this code simple and not use external queues/programs like Redis or Beanstalk. It's not going to have very high volume.
What's a good way using Tornado idioms to watch the client.queued_items list for new items and process them as they arrive?
import time
import tornado.ioloop
import tornado.gen
import tornado.web

class Client():
    def __init__(self):
        self.queued_items = []

    @tornado.gen.coroutine
    def watch_queue(self):
        # I have no idea what I'm doing
        items = yield client.queued_items
        # go_do_some_thing_with_items(items)

class IndexHandler(tornado.web.RequestHandler):
    def get(self):
        client.queued_items.append("%f" % time.time())
        self.write("Queued a new item")

if __name__ == "__main__":
    client = Client()

    # Watch the queue for when new items show up
    client.watch_queue()

    # Create the web server
    application = tornado.web.Application([
        (r'/', IndexHandler),
    ], debug=True)
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
There is a library called toro, which provides synchronization primitives for tornado. [Update: As of tornado 4.2, toro has been merged into tornado.]
Sounds like you could just use a toro.Queue (or tornado.queues.Queue in tornado 4.2+) to handle this:
import time
import toro
import tornado.ioloop
import tornado.gen
import tornado.web

class Client():
    def __init__(self):
        self.queued_items = toro.Queue()

    @tornado.gen.coroutine
    def watch_queue(self):
        while True:
            items = yield self.queued_items.get()
            # go_do_something_with_items(items)

class IndexHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        yield client.queued_items.put("%f" % time.time())
        self.write("Queued a new item")

if __name__ == "__main__":
    client = Client()

    # Watch the queue for when new items show up
    tornado.ioloop.IOLoop.current().add_callback(client.watch_queue)

    # Create the web server
    application = tornado.web.Application([
        (r'/', IndexHandler),
    ], debug=True)
    application.listen(8888)
    tornado.ioloop.IOLoop.current().start()
There are a few tweaks required, aside from switching the data structure from a list to a toro.Queue:
We need to schedule watch_queue to run inside the IOLoop using add_callback, rather than trying to call it directly outside of an IOLoop context.
IndexHandler.get needs to be converted to a coroutine, because toro.Queue.put is a coroutine.
I also added a while True loop to watch_queue, so that it will run forever, rather than just processing one item and then exiting.
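As far as I can tell, tornado.queues.Queue (tornado 4.2+) exposes the same get/put coroutine interface, so the only change needed in the code above would be the import and the queue construction:

import tornado.queues

class Client():
    def __init__(self):
        self.queued_items = tornado.queues.Queue()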