I am implementing an endpoint in my Flask application that receives a collection of HTTP requests, and returns a collection of the corresponding HTTP responses. In order to accomplish this, I need my endpoint to call other endpoints in order to construct the result. However, because Flask is blocking while processing the original request, it cannot process the nested requests and the application gets deadlocked.
Is there any way to issue a request within a request in flask in a way that doesn't result in a deadlock?
I included a segment of my code which I believe should be enough to illustrate the problem without overwhelming you. If you would like to see more of it please let me know and I'll share.
from requests import Session, Request
def split(request):
multipart = request.stream.read()
boundary = request.content_type.split(';')[1]
prefix = ' boundary"'
suffix = '"'
delimiter = '--%s' % boundary[len(prefix)+1:-len(suffix)]
subrequests = [s.lstrip() for s in multipart.split(delimiter)]
for sub in subrequests:
status_line, _, more_lines = sub.partition('\n')
method, path, version = status_line.split()
headers, _, body = more_lines.partition('\n\n')
url = 'http://localhost:3000' + path
return Request(method, url, headers=headers, data=body)
#app.route('/batch', methods=["GET", "POST"])
def batch():
subrequests = split(request)
session = Session()
responses = []
for sub in subrequests:
response.append(s.send(sub.prepare())) # Deadlock!
There are two candidate solutions that I considered which I found to be unsatisfactory:
Don't issue a full request. Instead, just call the function that is mapped to the endpoint of interest (url_for). I am unsatisfied by this approach because the nested requests have their own headers and cookies which are neglected by this approach. Furthermore, code in the 'before_request' and 'after_request' handlers won't get called automatically
Run multiple instances of the application. This will solve the problem, but expose my service to a pretty simple DoS attack. If I have X instances running, All an attacker would need to do is to hit my service with X different requests to cause a deadlock.
Thank you.
Knowing that the internal flask server is not production-ready, when using only for development, pass the threaded=true parameter to app.run.
app.run(debug=True, threaded=True)
This happens cause you're using the flask devserver. It's not for production use.
In production environment you would use an application server (uWSGI, GUnicorn, Tornado, ...) with or without a webserver layer (NGINX, Apache,...) to proxy/balance connections to the workers protecting (not completely but in a lot of environments it's acceptable) from DoS attacks.
Related
I'm looking for some advice, or a relevant tutorial regarding the following:
My task is to set up a flask route that POSTs to API endpoint X, receives a new endpoint Y in X's response, then GETs from endpoint Y repeatedly until it receives a certain status message in the body of Y's response, and then returns Y's response.
The code below (irrelevant data redacted) accomplishes that goal in, I think, a very stupid way. It returns the appropriate data occasionally, but not reliably. (It times out 60% of the time.) When I console log very thoroughly, it seems as though I have bogged down my server with multiple while loops running constantly, interfering with each other.
I'll also receive this error occasionally:
SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /book
import sys, requests, time, json
from flask import Flask, request
# create the Flask app
app = Flask(__name__)
# main booking route
#app.route('/book', methods=['POST']) #GET requests will be blocked
def book():
# defining the api-endpoints
PRICING_ENDPOINT = ...
# data to be sent to api
data = {...}
# sending post request and saving response as response object
try:
r_pricing = requests.post(url = PRICING_ENDPOINT, data = data)
except requests.exceptions.RequestException as e:
return e
sys.exit(1)
# extracting response text
POLL_ENDPOINT = r_pricing.headers['location']
# setting data for poll
data_for_poll = {...}
r_poll = requests.get(POLL_ENDPOINT, data = data_for_poll)
# poll loop, looking for 'UpdatesComplete'
j = 1
poll_json = r_poll.json()
update_status = poll_json['Status']
while update_status == 'UpdatesPending':
time.sleep(2)
j = float(j) + float(1)
r_poll = requests.get(POLL_ENDPOINT, data = data_for_poll)
poll_json = r_poll.json()
update_status = poll_json['Status']
return r_poll.text
This is more of an architectural issue more than a Flask issue. Long-running tasks in Flask views are always a poor design choice. In this case, the route's response is dependent on two endpoints of another server. In effect, apart from carrying the responsibility of your app, you are also carrying the responsibility of another server.
Since the application's design seems to be a proxy for another service, I would recommend creating the proxy in the right way. Just like book() offers the proxy for PRICING_ENDPOINT POST request, create another route for POLL_ENDPOINT GET request and move the polling logic to the client code (JS).
Update:
If you cannot for some reason trust the client (browser -> JS) with the POLL_ENDPOINT information in a hidden proxy like situation, then maybe move the polling to a task runner like Celery or Python RQ. Although, it will introduce additional components to your application, it would be the right way to go.
Probably you get that error because of the HTTP connection time out with your API server that is looping. There are some standards for HTTP time connection and loop took more time that is allowed for the connection. The first (straight) solution is to "play" with Apache configs and increase the HTTP connection time for your wsgi. You can also make a socket connection and in it check the update status and close it while the goal was achieved. Or you can move your logic to the client side.
I've recently added a SSL certificate to my webapp. It's deployed on Amazon Web Services uses load balancers. The load balancers work as reverse proxies, handling external HTTPS and sending internal HTTP. So all traffic to my Flask app is HTTP, not HTTPS, despite being a secure connection.
Because the site was already online before the HTTPS migration, I used SSLify to send 301 PERMANENT REDIRECTS to HTTP connections. It works despite all connections being HTTP because the reverse proxy sets the X-Forwarded-Proto request header with the original protocol.
The problem
url_for doesn't care about X-Forwarded-Proto. It will use the my_flask_app.config['PREFERRED_URL_SCHEME'] when a scheme isn't available, but during a request a scheme is available. The HTTP scheme of the connection with the reverse proxy.
So when someone connects to https://example.com, it connects to the load balancer, which then connects to Flask using http://example.com. Flask sees the http and assumes the scheme is HTTP, not HTTPS as it originally was.
That isn't a problem in most url_for used in templates, but any url_for with _external=True will use http instead of https. Personally, I use _external=True for rel=canonical since I heard it was recommended practice. Besides that, using Flask.redirect will prepend non-_external urls with http://example.com, since the redirect header must be a fully qualified URL.
If you redirect on a form post for example, this is what would happen.
Client posts https://example.com/form
Server issues a 303 SEE OTHER to http://example.com/form-posted
SSLify then issues a 301 PERMANENT REDIRECT to https://example.com/form-posted
Every redirect becomes 2 redirects because of SSLify.
Attempted solutions
Adding PREFERRED_URL_SCHEME config
https://stackoverflow.com/a/26636880/1660459
my_flask_app.config['PREFERRED_URL_SCHEME'] = 'https'
Doesn't work because there is a scheme during a request, and that one is used instead. See https://github.com/mitsuhiko/flask/issues/1129#issuecomment-51759359
Wrapping a middleware to mock HTTPS
https://stackoverflow.com/a/28247577/1660459
def _force_https(app):
def wrapper(environ, start_response):
environ['wsgi.url_scheme'] = 'https'
return app(environ, start_response)
return wrapper
app = Flask(...)
app = _force_https(app)
As is, this didn't work because I needed that app later. So I used wsgi_app instead.
def _force_https(wsgi_app):
def wrapper(environ, start_response):
environ['wsgi.url_scheme'] = 'https'
return wsgi_app(environ, start_response)
return wrapper
app = Flask(...)
app.wsgi_app = _force_https(app.wsgi_app)
Because wsgi_app is called before any app.before_request handlers, doing this makes SSLify think the app is already behind a secure request and then it won't do any HTTP-to-HTTPS redirects.
Patching url_for
(I can't even find where I got this one from)
from functools import partial
import Flask
Flask.url_for = partial(Flask.url_for, _scheme='https')
This could work, but Flask will give an error if you set _scheme but not _external. Since most of my app url_for are internal, it doesn't work at all.
I was having these same issues with `redirect(url_for('URL'))' behind an AWS Elastic Load Balancer recently & I solved it this using the werkzeug.contrib.fixers.ProxyFix
call in my code.
example:
from werkzeug.contrib.fixers import ProxyFix
app = Flask(__name__)
app.wsgi_app = ProxyFix(app.wsgi_app)
The ProxyFix(app.wsgi_app) adds HTTP proxy support to an application that was not designed with HTTP proxies in mind. It sets REMOTE_ADDR, HTTP_HOST from X-Forwarded headers.
Example:
from werkzeug.middleware.proxy_fix import ProxyFix
# App is behind one proxy that sets the -For and -Host headers.
app = ProxyFix(app, x_for=1, x_host=1)
Digging around Flask source code, I found out that url_for uses the Flask._request_ctx_stack.top.url_adapter when there is a request context.
The url_adapter.scheme defines the scheme used. To make the _scheme parameter work, url_for will swap the url_adapter.scheme temporarily and then set it back before the function returns.
(this behavior has been discussed on github as to whether it should be the previous value or PREFERRED_URL_SCHEME)
Basically, what I did was set url_adapter.scheme to https with a before_request handler. This way doesn't mess with the request itself, only with the thing generating the urls.
def _force_https():
# my local dev is set on debug, but on AWS it's not (obviously)
# I don't need HTTPS on local, change this to whatever condition you want.
if not app.debug:
from flask import _request_ctx_stack
if _request_ctx_stack is not None:
reqctx = _request_ctx_stack.top
reqctx.url_adapter.url_scheme = 'https'
app.before_request(_force_https)
Github offers to send Post-receive hooks to an URL of your choice when there's activity on your repo.
I want to write a small Python command-line/background (i.e. no GUI or webapp) application running on my computer (later on a NAS), which continually listens for those incoming POST requests, and once a POST is received from Github, it processes the JSON information contained within. Processing the json as soon as I have it is no problem.
The POST can come from a small number of IPs given by github; I plan/hope to specify a port on my computer where it should get sent.
The problem is, I don't know enough about web technologies to deal with the vast number of options you find when searching.. do I use Django, Requests, sockets,Flask, microframeworks...? I don't know what most of the terms involved mean, and most sound like they offer too much/are too big to solve my problem - I'm simply overwhelmed and don't know where to start.
Most tutorials about POST/GET I could find seem to be concerned with either sending or directly requesting data from a website, but not with continually listening for it.
I feel the problem is not really a difficult one, and will boil down to a couple of lines, once I know where to go/how to do it. Can anybody offer pointers/tutorials/examples/sample code?
First thing is, web is request-response based. So something will request your link, and you will respond accordingly. Your server application will be continuously listening on a port; that you don't have to worry about.
Here is the similar version in Flask (my micro framework of choice):
from flask import Flask, request
import json
app = Flask(__name__)
#app.route('/',methods=['POST'])
def foo():
data = json.loads(request.data)
print "New commit by: {}".format(data['commits'][0]['author']['name'])
return "OK"
if __name__ == '__main__':
app.run()
Here is a sample run, using the example from github:
Running the server (the above code is saved in sample.py):
burhan#lenux:~$ python sample.py
* Running on http://127.0.0.1:5000/
Here is a request to the server, basically what github will do:
burhan#lenux:~$ http POST http://127.0.0.1:5000 < sample.json
HTTP/1.0 200 OK
Content-Length: 2
Content-Type: text/html; charset=utf-8
Date: Sun, 27 Jan 2013 19:07:56 GMT
Server: Werkzeug/0.8.3 Python/2.7.3
OK # <-- this is the response the client gets
Here is the output at the server:
New commit by: Chris Wanstrath
127.0.0.1 - - [27/Jan/2013 22:07:56] "POST / HTTP/1.1" 200 -
Here's a basic web.py example for receiving data via POST and doing something with it (in this case, just printing it to stdout):
import web
urls = ('/.*', 'hooks')
app = web.application(urls, globals())
class hooks:
def POST(self):
data = web.data()
print
print 'DATA RECEIVED:'
print data
print
return 'OK'
if __name__ == '__main__':
app.run()
I POSTed some data to it using hurl.it (after forwarding 8080 on my router), and saw the following output:
$ python hooks.py
http://0.0.0.0:8080/
DATA RECEIVED:
test=thisisatest&test2=25
50.19.170.198:33407 - - [27/Jan/2013 10:18:37] "HTTP/1.1 POST /hooks" - 200 OK
You should be able to swap out the print statements for your JSON processing.
To specify the port number, call the script with an extra argument:
$ python hooks.py 1234
I would use:
https://github.com/carlos-jenkins/python-github-webhooks
You can configure a web server to use it, or if you just need a process running there without a web server you can launch the integrated server:
python webhooks.py
This will allow you to do everything you said you need. It, nevertheless, requires a bit of setup in your repository and in your hooks.
Late to the party and shameless autopromotion, sorry.
If you are using Flask, here's a very minimal code to listen for webhooks:
from flask import Flask, request, Response
app = Flask(__name__)
#app.route('/webhook', methods=['POST'])
def respond():
print(request.json) # Handle webhook request here
return Response(status=200)
And the same example using Django:
from django.http import HttpResponse
from django.views.decorators.http import require_POST
#require_POST
def example(request):
print(request.json) # Handle webhook request here
return HttpResponse('Hello, world. This is the webhook response.')
If you need more information, here's a great tutorial on how to listen for webhooks with Python.
If you're looking to watch for changes in any repo...
1. If you own the repo that you want to watch
In your repo page, Go to settings
click webhooks, new webhook (top right)
give it your ip/endpoint and setup everything to your liking
use any server to get notified
2. Not your Repo
take the url you want i.e https://github.com/fire17/gd-xo/
add /commits/master.atom to the end such as:
https://github.com/fire17/gd-xo/commits/master.atom
Use any library you want to get that page's content, like:
filter out the keys you want, for example the element
response = requests.get("https://github.com/fire17/gd-xo/commits/master.atom").text
response.split("<updated>")[1].split("</updated>")[0]
'2021-08-06T19:01:53Z'
make a loop that checks this every so often and if this string has changed, then you can initiate a clone/pull request or do whatever you like
I need to read some values from the wsgi request before my flask app is loaded. If I read the url from the wsgi request I can access the file without any issues once the flask app is loaded (after the middleware runs).
But if I attempt to access the params it seems to remove the post data once the flask app is loaded. I even went to the extreme of wrapping the wsgi request with a special Webob Request to prevent this "read once" problem.
Does anyone know how to access values from the wsgi request in middleware without doing any sort of side effect harm to the request so you can get post data / file data in a flask app?
from webob import Request
class SomeMiddleware(object):
def __init__(self, environ):
self.request = Request(environ)
self.orig_environ = environ
def apply_middleware(self):
print self.request.url #will not do any harm
print self.request.params #will cause me to lose data
Here is my flask view
#app.route('/')
def hello_world():
from flask import request
the_file = request.files['file']
print "and the file is", the_file
From what I can tell, this is a limitation of the way that WSGI works. The stream needs only be consumable once (PEP 333 and 3333 only require that the stream support read* calls, tell does not need to be supported). Once the stream is exhausted it cannot be re-streamed to other WSGI applications further "inward". Take a look at these two sections of Werkzeug's documentation for more information:
http://werkzeug.pocoo.org/docs/request_data/
http://werkzeug.pocoo.org/docs/http/#module-werkzeug.formparser
The way to avoid this issue is to wrap the input stream (wsgi.input) in an object that implements the read and readline methods. Then, only when the final application in the chain actually attempts to exhaust the stream will your methods be run. See Flask's documentation on generating a request checksum for an example of this pattern.
That being said, are you sure a middleware is the best solution to your problems? If you need to perform some action (dispatch, logging, authentication) based on the content of the body of the request you may be better off making it a part of your application, rather than a stand-alone application of its own.
Given, when a user requests /foo on my server, I send the following HTTP response (not closing the connection):
Content-Type: multipart/x-mixed-replace; boundary=-----------------------
-----------------------
Content-Type: text/html
foo
When the user goes to /bar (which will send 204 No Content so the view doesn't change), I want to send the following data in the initial response.
-----------------------
Content-Type: text/html
bar
How would I get the second request to trigger this from the initial response? I'm planning on possibly creating a fancy [engines that support multipart/x-mixed-replace (currently only Gecko)]-only email webapp that does server-push and Ajax effects without any JavaScript, just for fun.
No complete answer, but:
In your question, you're describing a Comet-style architecture. Regarding support of Comet-style techniques in Python/WSGI, there is a StackOverflow question, which talks about various Python servers with support for long-running requests a la Comet.
Also interesting is this mail thread in the Python Web-SIG: "Could WSGI handle Asynchronous response?". In May 2008, there was a broad discussion in the Web-SIG about the topic of asynchronous requests in WSGI.
A recent development is evserver, a lightweight WSGI server, which implements the Asynchronous WSGI extension proposed by Christopher Stawarz in the Web-SIG in May 2008.
Finally, the Tornado web server supports non-blocking asynchronous requests. It has a chat example application using long polling, which has similarities with your requirements.
If the problem is to pass some command from /bar application to /foo application and you are using some servlet-like approach (the Python code is loaded once and not for each request as in CGI), you can just change some class property of the /foo application and be ready to react to the change in the /foo instance (by checking the property state).
Obviously the /foo application should not return right after the first request and yield content line by line.
Thought this is just theory, I have not tried that myself.
I have created some small example (just for fun, you know :))
import threading
num = 0
cond = threading.Condition()
def app(environ, start_response):
global num
cond.acquire()
num += 1
cond.notifyAll()
cond.release()
start_response("200 OK", [("Content-Type", "multipart/x-mixed-replace; boundary=xxx")])
while True:
n = num
s = "--xxx\r\nContent-Type: text/html\r\n\r\n%s\n" % n
yield s
# wait for num change:
cond.acquire()
while num == n:
cond.wait()
cond.release()
from cherrypy.wsgiserver import CherryPyWSGIServer
server = CherryPyWSGIServer(("0.0.0.0", 3000), app)
try:
server.start()
except KeyboardInterrupt:
server.stop()
# Now whenever you visit http://127.0.0.1:3000/, the number increases.
# It also automatically increases in all previously opened windows/tabs.
The idea of a shared variable and thread synchronization (using condition variable object) is based on the fact that WSGI server provided by CherryPyWSGIServer is threaded.
Not sure if this is quite what you're looking for, but there is a fairly old way of doing server push using a mime content of multipart/x-mixed-replace
Basically you compose the response as a mime object with content type multipart/x-mixed-replace, and send the first "version" of a document down. The browser will keep the socket open.
Then as the server decides to push more data, a new "version" of the document gets sent from the server, and the browser will intelligently replace (within whatever frame/iframe contains the content) the content.
This was an early way of doing webcams, where the server would send down (push) image after image, and the browser would just keep replacing the image in the document over and over. This is also a way of doing a "Loading..." message over a single HTTP request.