I have an application written in Twisted and I want to add a web interface to control and monitor it. I'll need plenty of dynamic pages that show the current status and configuration, so I'm hoping for a framework that offers at least a templating language with inheritance and some basic routing.
Since I am using Twisted anyway, I wanted to use twisted.web, but its templating language is too basic, and the only framework built on it, Nevow, seems quite dead (it's on Launchpad, but the homepage and wiki are down and I can't find any documentation).
So what are my options?
Is there any other twisted.web based framework?
Are there other frameworks that work with twisted's reactor?
Should I just get a web framework (I'm thinking web.py or flask) and run it in a thread?
Thanks for your answers.
Since Nevow is still down and I didn't want to write routing and support for a templating lib myself, I ended up using Flask. It turned out to be quite easy:
from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource

from flask import Flask, render_template, g

# make a Flask app
app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")

# run it under Twisted through WSGI
resource = WSGIResource(reactor, reactor.getThreadPool(), app)
site = Site(resource)

# bind it etc
# ...
It works flawlessly so far.
You can bind it directly into the reactor like the example below:
reactor.listenTCP(5050, site)
reactor.run()
If you need to add children to a WSGI root visit this link for more details.
Here is an example showing how to combine WSGI Resource with a static child.
from twisted.internet import reactor
from twisted.web import static as Static, server, script
from twisted.web.resource import Resource
from twisted.web.wsgi import WSGIResource
from flask import Flask

app = Flask(__name__)  # the Flask application to serve

class Root(Resource):
    """Root resource that combines the two sites/entry points"""
    WSGI = WSGIResource(reactor, reactor.getThreadPool(), app)

    def getChild(self, child, request):
        # request.isLeaf = True
        # No registered child matched: give the consumed path segment
        # back to the request and let the WSGI app handle it.
        request.prepath.pop()
        request.postpath.insert(0, child)
        return self.WSGI

    def render(self, request):
        """Delegate to the WSGI resource"""
        return self.WSGI.render(request)

def main():
    static = Static.File("/path/folder")
    static.processors = {'.py': script.PythonScript,
                         '.rpy': script.ResourceScript}
    static.indexNames = ['index.rpy', 'index.html', 'index.htm']
    root = Root()
    root.putChild(b'static', static)  # path segments must be bytes on Python 3
    reactor.listenTCP(5050, server.Site(root))
    reactor.run()

if __name__ == '__main__':
    main()
Nevow is the obvious choice. Unfortunately the divmod web server hardware and the backup server hardware failed at the same time. They are attempting to recover the data and publish it on launchpad, but it may take a while.
You could also use basically any existing template module with twisted.web; Jinja2 comes to mind.
Dash is a dashboard-related Python library that is based on Flask. The default Dash app runs a Flask server, which is, as they state, "not recommended for production environment". I have found that the Twisted library can do decent html handling. The problem is, I know how to use Twisted to host Flask sites, but I do not know how to do the same for a Dash app. There is a nice library which integrates Flask and Twisted.
https://github.com/cravler/flask-twisted
To use it, one just needs the following lines:
import dash
import flask
from flask_twisted import Twisted

server = flask.Flask(__name__)
app = dash.Dash(__name__, server=server)
twisted = Twisted(server)
twisted.run(host='0.0.0.0', port=8050, debug=False)
Now, for learning purposes, I am trying to recreate the same functionality without flask-twisted. I tried my best to follow the source code in the module but still cannot recreate the same result. The page http://127.0.0.1:8082/my_flask/ is stuck at "Loading...". What did I do wrong?
if __name__ == '__main__':
    resource = WSGIResource(reactor, reactor.getThreadPool(), server)
    site = Site(WSGIRootResource(resource, {}))
    server.run
    root = Resource()
    root.putChild(b'my_flask', site)
    reactor.listenTCP(8082, Site(root))
    reactor.run()
My Flask app has url routing defined as
self.add_url_rule('/api/1/accounts/<id_>', view_func=self.accounts, methods=['GET'])
The problem is that one of the applications making queries to this app adds an additional / to the URL, like /api/1//accounts/id. Correcting that application is not in my control, so I can't change it.
To resolve this problem I have currently added multiple rules:
self.add_url_rule('/api/1/accounts/<id_>', view_func=self.accounts, methods=['GET'])
self.add_url_rule('/api/1//accounts/<id_>', view_func=self.accounts, methods=['GET'])
There are a number of such routes and it's an ugly workaround. Is there a way in Flask to modify the URL before it hits the routing logic?
I'd normalise the path before it gets to Flask, either by having the HTTP server that hosts the WSGI container or a proxy server that sits before your stack do this, or by using WSGI middleware.
The latter is easily written:
import re
from functools import partial

class PathNormaliser(object):
    _collapse_slashes = partial(re.compile(r'/+').sub, r'/')

    def __init__(self, application):
        self.application = application

    def __call__(self, env, start_response):
        env['PATH_INFO'] = self._collapse_slashes(env['PATH_INFO'])
        return self.application(env, start_response)
You may want to log that you are applying this transformation, together with diagnostic information like the REMOTE_HOST and HTTP_USER_AGENT entries. Personally, I'd force that specific application to generate non-broken URLs as soon as possible.
Look at your WSGI server documentation to see how to add in extra WSGI middleware components.
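As a rough sketch of the logging suggestion above, here is a variant of the middleware that only rewrites PATH_INFO when it actually contains repeated slashes and logs the change. The class name LoggingPathNormaliser, the logger name, and the echo_app and call helpers are made up for illustration; echo_app is a stand-in for the real Flask app.

```python
import logging
import re
from wsgiref.util import setup_testing_defaults

log = logging.getLogger("path_normaliser")  # hypothetical logger name


class LoggingPathNormaliser:
    """WSGI middleware: collapse repeated slashes in PATH_INFO and log it."""

    _multi_slash = re.compile(r"/{2,}")

    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):
        original = environ.get("PATH_INFO", "")
        normalised = self._multi_slash.sub("/", original)
        if normalised != original:
            # Record who sent the broken URL so the client can be chased up.
            log.warning(
                "normalised path %r -> %r (remote=%s, agent=%s)",
                original,
                normalised,
                environ.get("REMOTE_HOST") or environ.get("REMOTE_ADDR"),
                environ.get("HTTP_USER_AGENT"),
            )
            environ["PATH_INFO"] = normalised
        return self.application(environ, start_response)


def echo_app(environ, start_response):
    """Stand-in for the real app: replies with the path it was routed."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["PATH_INFO"].encode("utf-8")]


def call(app, path):
    """Invoke a WSGI app directly for a given path and return the body."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    body = app(environ, lambda status, headers: None)
    return b"".join(body).decode("utf-8")


wrapped = LoggingPathNormaliser(echo_app)
print(call(wrapped, "/api/1//accounts/42"))
```

With Flask, the usual way to install such a wrapper is app.wsgi_app = LoggingPathNormaliser(app.wsgi_app), which keeps the Flask object itself intact.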
The API should allow arbitrary HTTP get requests containing URLs the user wants scraped, and then Flask should return the results of the scrape.
The following code works for the first http request, but after twisted reactor stops, it won't restart. I may not even be going about this the right way, but I just want to put a RESTful scrapy API up on Heroku, and what I have so far is all I can think of.
Is there a better way to architect this solution? Or how can I allow scrape_it to return without stopping twisted reactor (which can't be started again)?
from flask import Flask
import os
import sys
import json
from n_grams.spiders.n_gram_spider import NGramsSpider
# scrapy api
from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals

app = Flask(__name__)

def scrape_it(url):
    items = []
    def add_item(item):
        items.append(item)
    runner = CrawlerRunner()
    d = runner.crawl(NGramsSpider, [url])
    d.addBoth(lambda _: reactor.stop()) # <<< TROUBLES HERE ???
    dispatcher.connect(add_item, signal=signals.item_passed)
    reactor.run(installSignalHandlers=0) # the script will block here until the crawling is finished
    return items

@app.route('/scrape/<path:url>')
def scrape(url):
    ret = scrape_it(url)
    return json.dumps(ret, ensure_ascii=False, encoding='utf8')

if __name__ == '__main__':
    PORT = os.environ['PORT'] if 'PORT' in os.environ else 8080
    app.run(debug=True, host='0.0.0.0', port=int(PORT))
I think there is no good way to create a Flask-based API for Scrapy. Flask is not the right tool for that because it is not based on an event loop. To make things worse, the Twisted reactor (which Scrapy uses) can't be started and stopped more than once in a single thread.
Let's assume there is no problem with the Twisted reactor and you can start and stop it. It won't make things much better, because your scrape_it function may block for an extended period of time, and so you will need many threads/processes.
I think the way to go is to create the API using an async framework like Twisted or Tornado; it will be more efficient than a Flask-based (or Django-based) solution because the API will be able to serve requests while Scrapy is running a spider.
Scrapy is based on Twisted, so using twisted.web or https://github.com/twisted/klein can be more straightforward. But Tornado is also not hard, because you can make it use the Twisted event loop.
There is a project called ScrapyRT which does something very similar to what you want to implement - it is an HTTP API for Scrapy. ScrapyRT is based on Twisted.
As an example of Scrapy-Tornado integration check Arachnado - here is an example on how to integrate Scrapy's CrawlerProcess with Tornado's Application.
If you really want a Flask-based API then it could make sense to start crawls in separate processes and/or use a queue solution like Celery. This way you're losing most of Scrapy's efficiency; if you go this way you can use requests + BeautifulSoup as well.
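To illustrate the "separate processes" option, here is a minimal sketch that runs each crawl in a fresh Python process, so that each process starts its own reactor exactly once. It uses only the standard library; CHILD_SCRIPT is a placeholder for a real crawl script (one that would run the spider and print its items as JSON), and scrape_in_subprocess is a hypothetical name.

```python
import json
import subprocess
import sys

# Placeholder for the real crawl script. An actual version would run the
# spider in this child process and dump the scraped items to stdout as JSON.
CHILD_SCRIPT = """
import json, sys
url = sys.argv[1]
items = [{"url": url, "title": "placeholder"}]
json.dump(items, sys.stdout)
"""


def scrape_in_subprocess(url, timeout=60):
    """Run one crawl in its own process and return the items it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, url],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the child exits with an error
    )
    return json.loads(result.stdout)


print(scrape_in_subprocess("http://example.com"))
```

A Flask view could call scrape_in_subprocess(url) in place of scrape_it(url). The request still blocks for the whole crawl, which is the cost mentioned above.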
I was working on a similar project last week, an SEO service API. My workflow was like this:
The client sends a request to the Flask-based server with a URL to scrape, and a callback URL to notify the client when scraping is done (the client here is another web app).
Run Scrapy in the background using Celery. The spider will save the data to the database.
The background service will notify the client by calling the callback URL when the spider is done.
I am currently working on an application that requires math expressions to be rendered (from latex) and needs to have some sort of native gui (even if it just uses gtk, then renders html in webkit).
I did some research and decided an easy way to do this would be to use webkit to load a web page and use a JavaScript library like MathJax to render the math.
Some other reasons why I chose this over other solutions are that I have a fair amount of experience developing web apps in Python (although a while ago), little experience with native GUIs, and the portability it would provide.
For a web app framework I have chosen Flask, as it is the one I am most familiar with.
The problem is that this application needs to have its own native GUI, preferably through gtk (even if it just renders html with webkit), and preferably shouldn't have an HTTP server attached to a socket.
So my question is, instead of running Flask's server, is there any way to do something like this:
import gtk
import webkit
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return "<h1>Hello World!</h1>"

if __name__ == '__main__':
    window = gtk.Window()
    webview = webkit.WebView()
    webview.load_string(
        app.load_from_uri('/'),
        "text/html",
        "utf-8",
        '/'
    )
    window.add(webview)
    window.show_all()
Where app.load_from_uri('/') is just used as an example of a way to load the webpage for a given uri of a Flask app. But as this is just an example, how could app.load_from_uri('/') be done in real code?
Also, is there any way to override what happens when the user clicks a link, so it does something like this:
def link_clicked(uri):
    webview.load_string(
        app.load_from_uri(uri),
        "text/html",
        "utf-8",
        uri
    )
Thanks any help is greatly appreciated!
I've ended up finding a solution to this myself (but open to better ones).
The first thing, loading a page, was pretty easy. Flask provides a way to test apps which mainly just sets up all the things for WSGI to be able to process a request. This is just what I needed so I used this like so:
from flask import Flask

class WebViewFlask(Flask):
    """
    Adds the ability to load a uri without the
    need of a HTTP server.
    """
    def load_from_uri(self, uri):
        """
        Loads a uri without a running HTTP server.
        """
        with self.test_client() as c:
            response = c.get(uri)
            return response.data, response.mimetype
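For illustration, here is roughly what "setting up all the things for WSGI" means, sketched with only the standard library rather than Flask's test client. The helper load_from_wsgi and the toy hello_app are hypothetical names; a Flask app is itself a WSGI callable, so it could be passed in place of hello_app.

```python
from wsgiref.util import setup_testing_defaults


def load_from_wsgi(wsgi_app, uri):
    """Call a WSGI app directly (no socket, no server) for the given uri."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid GET environ
    path, _, query = uri.partition("?")
    environ["PATH_INFO"] = path
    environ["QUERY_STRING"] = query

    captured = {}

    def start_response(status, headers, exc_info=None):
        # Remember what the app reports instead of writing it to a socket.
        captured["status"] = status
        captured["headers"] = dict(headers)

    chunks = wsgi_app(environ, start_response)
    try:
        body = b"".join(chunks)
    finally:
        close = getattr(chunks, "close", None)
        if close is not None:
            close()
    return captured["status"], captured["headers"], body


def hello_app(environ, start_response):
    """Toy WSGI app standing in for the Flask application."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Hello World!</h1>"]


status, headers, body = load_from_wsgi(hello_app, "/")
print(status, headers["Content-Type"], body)
```

Flask's test client does considerably more (cookies, redirects, context handling), so it remains the more convenient choice for a real app.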
The second part, overriding "when the user clicks a link", is a bit trickier.
import os
import webkit

class FlaskAppView(webkit.WebView):
    """
    Loads pages for flask apps into a WebView.
    """
    def __init__(self, flask_app, *args, **kwargs):
        # Protocol for flask app, by default file:// is used
        # so a protocol is defined here to prevent that.
        self.PROTOCOL = 'flask://'
        super(FlaskAppView, self).__init__(*args, **kwargs)
        self._flask_app = flask_app
        # Register new navigation handler.
        self.connect(
            "navigation-policy-decision-requested",
            self._nav_request
        )
        # For navigation handler.
        self.prev_uri = None
        # Redefine open like this as when using super
        # an error like this occurs:
        # AttributeError: 'super' object has no attribute 'open'
        self._open = self.open
        self.open = self.open_

    def _nav_request(self, view, frame, net_req, nav_act, pol_dec):
        """
        WebView navigation handler for Flask apps.
        """
        # Get the uri
        uri = net_req.get_uri()
        # In order for flask apps to use relative links
        # the protocol is removed and it is made into an absolute
        # path.
        if uri.startswith(self.PROTOCOL):
            # In this case it is not relative but
            # it needs to have its protocol removed
            uri = uri[len(self.PROTOCOL):]
        elif not self.prev_uri.endswith(uri):
            # It is relative and self.prev_uri needs to
            # be prepended.
            uri = os.path.normpath(os.path.join(self.prev_uri, uri))
        # This is used to prevent an infinite recursive loop due
        # to view.load_string running this function with the same
        # input.
        if uri == self.prev_uri:
            return False
        self.prev_uri = uri
        # Create response from the Flask app passed to __init__.
        response = self._flask_app.load_from_uri(uri) + ('utf-8', uri)
        # Load response.
        view.load_string(*response)
        # Return False to prevent additional
        # handlers from running.
        return False

    def open_(self, uri):
        """
        Prepends protocol to uri for webkit.WebView.open.
        """
        self._open(self.PROTOCOL + uri)
Basically a new navigation event handler is registered with some code to allow for successful recursion and support for relative paths.
Anyway, with that code above by just replacing Flask with WebViewFlask and WebView with FlaskAppView everything pretty much just works.
The result is a Flask app being loaded in a webkit.WebView without any sort of server. The best thing about it is that by just switching app back to an instance of Flask instead of WebViewFlask, it's a plain web app again.
Currently using Tornado as a wrapper for several WSGI apps (mostly Flask apps). Recently been noticing a lot of hacking attempts, and been wondering if it's possible to automatically look at a list of the IPs defined in some file and then redirect all of those IPs to a page saying something like "Someone using this IP tried hacking our site, prove you're not a bot and we'll re-allow your ip".
The tornado code that runs the server is here:
from tornado.wsgi import WSGIContainer
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from Wrapper.app import application
http_server = HTTPServer(WSGIContainer(application))
http_server.listen(80)
IOLoop.instance().start()
And Wrapper.app is below:
from werkzeug.wsgi import DispatcherMiddleware
from Splash import splash_app
from SentimentDemo import sentiment_app
from FERDemo import FER_app
application = DispatcherMiddleware(splash_app, {
'/api/sentiment': sentiment_app,
'/api/fer': FER_app
})
I haven't been able to find any documentation on this sort of thing, so I'm sorry in advance if this question seems uninformed, but even just a place to start looking would be spectacular.
You want to subclass WSGIContainer and override its __call__ method. Something like
class MyWSGIContainer(WSGIContainer):
    def __call__(self, request):
        if request.remote_ip in blacklist:
            self.write_redirect()
        else:
            super(MyWSGIContainer, self).__call__(request)
For some tips on writing self.write_redirect() look at the code for the WSGIContainer here; you can see how it formats the HTTP headers. You should use an HTTP 302 (Found) response, which is a temporary redirect.
Then pass your MyWSGIContainer instance into HTTPServer, instead of the default WSGIContainer.
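A different route to the same effect, sketched here under assumptions, is to do the check one layer down as plain WSGI middleware wrapped around the dispatcher application. This is not the WSGIContainer subclass described above. It assumes the server puts the client address in the REMOTE_ADDR environ key, which is worth verifying for your Tornado version. The names BlocklistRedirect, load_blocked_ips, the /blocked path, and the ok_app and status_for helpers are all hypothetical.

```python
from wsgiref.util import setup_testing_defaults


def load_blocked_ips(path):
    """Read one IP address per line, skipping blank lines and # comments."""
    with open(path, encoding="utf-8") as handle:
        return {
            line.strip()
            for line in handle
            if line.strip() and not line.lstrip().startswith("#")
        }


class BlocklistRedirect:
    """WSGI middleware that redirects requests from blocked IPs."""

    def __init__(self, application, blocked_ips, redirect_to="/blocked"):
        self.application = application
        self.blocked_ips = set(blocked_ips)
        self.redirect_to = redirect_to  # hypothetical challenge page

    def __call__(self, environ, start_response):
        blocked = environ.get("REMOTE_ADDR") in self.blocked_ips
        # Let the challenge page itself through, or the redirect would loop.
        if blocked and environ.get("PATH_INFO") != self.redirect_to:
            start_response("302 Found", [("Location", self.redirect_to),
                                         ("Content-Length", "0")])
            return [b""]
        return self.application(environ, start_response)


def ok_app(environ, start_response):
    """Stand-in for the wrapped application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def status_for(app, ip, path="/"):
    """Call a WSGI app directly and return the status line it produced."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REMOTE_ADDR"] = ip
    environ["PATH_INFO"] = path
    seen = []
    app(environ, lambda status, headers: seen.append(status))
    return seen[0]


guarded = BlocklistRedirect(ok_app, {"203.0.113.7"})
print(status_for(guarded, "203.0.113.7"), status_for(guarded, "198.51.100.1"))
```

In the setup from the question, the wrapper would go around the DispatcherMiddleware instance before it is handed to WSGIContainer, with the set of addresses coming from load_blocked_ips on whatever file holds the list. One of the mounted apps would then need to serve the challenge page.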