Webob / Pyramid query string parameters out of order upon receipt - python

I am running Pyramid as my API server. Recently we started getting query string parameters out of order when handed to the RESTful API server. For example, a GET to /v1/finishedGoodRequests?exact=true&id=39&join=OR&exact=false&name=39
is logged by the RESTful api module upon init as request.url:
v1/finishedGoodRequests?join=OR&name=39&exact=true&exact=false&id=39
with request.query_string: join=OR&name=39&exact=true&exact=false&id=39
I process the query params in order to qualify the search, in this case id exactly 39 or 39 anywhere in the name. What kind of possible server setting or bug could have crept in to the server code to cause such a thing? It is still a MultiDict...

As a simple example, this works fine for me, and the MultiDict has always preserved the order and so I suspect something is getting rewritten by something you're using in your stack.
from pyramid.config import Configurator
from pyramid.view import view_config
from waitress import serve
#view_config(renderer='json')
def view(request):
return list(request.GET.items())
config = Configurator()
config.scan(__name__)
app = config.make_wsgi_app()
serve(app, listen='127.0.0.1:8080')
$ curl http://localhost:8080\?join=OR\&name=39\&exact=true\&exact=false\&id=39
[["join", "OR"], ["name", "39"], ["exact", "true"], ["exact", "false"], ["id", "39"]]
Depending on which WSGI server you are using, often you can view environ vars to see the original url which may be handy. Waitress does not, so instead just put something high up in the pipeline (wsgi middleware) that can log out the environ['QUERY_STRING'] and see if it doesn't match somewhere lower down in your stack.

Related

modify flask url before routing

My Flask app has url routing defined as
self.add_url_rule('/api/1/accounts/<id_>', view_func=self.accounts, methods=['GET'])
Problem is one of the application making queries to this app adds additional / in url like /api/1//accounts/id. It's not in my control to correct the application which makes such queries so I cant change it.
To resolve this problem currently I have added multiple rules
self.add_url_rule('/api/1/accounts/<id_>', view_func=self.accounts, methods=['GET'])
self.add_url_rule('/api/1//accounts/<id_>', view_func=self.accounts, methods=['GET'])
There are number of such routes and it's ugly workaround. Is there a way in flask to modify URL before it hits the routing logic?
I'd normalise the path before it gets to Flask, either by having the HTTP server that hosts the WSGI container or a proxy server that sits before your stack do this, or by using WSGI middleware.
The latter is easily written:
import re
from functools import partial
class PathNormaliser(object):
_collapse_slashes = partial(re.compile(r'/+').sub, r'/')
def __init__(self, application):
self.application = application
def __call__(self, env, start_response):
env['PATH_INFO'] = self._collapse_slashes(env['PATH_INFO'])
return self.application(env, start_response)
You may want to log that you are applying this transformation, together with diagnostic information like the REMOTE_HOST and HTTP_USER_AGENT entries. Personally, I'd force that specific application to generate non-broken URLs as soon as possible.
Look at your WSGI server documentation to see how to add in extra WSGI middleware components.

How to rewrite python jsronrpc2 application to be able to function as wsgi app

I am having some problems understanding how a application written in python jsonrpc2 is related to a wgsi application.
I have a json rpc test application in a file called greeting.py
It is a simple test case
def hello(name=None,greeting=None):
# Print to stdout the greeting
result = "From jsonrpc you have: {greeting} , {name}".format(greeting=greeting,name=name)
# print result
# You can basically now return the string result
return result
Using the jsonrpc2 module I am able to POST json to this function which then returns a json response.
Sample post :
self.call_values_dict_webpost = dict(zip(["jsonrpc","method","id","params"],["2.0","greeting.hello","2",["Hari","Hello"]]))
Response returned as json:
u"jsonrpc": u"2.0", u"id": u"2", u"result": u"From jsonrpc you have: Hello , Hari"
I start the server with an entry point defined in the jsonrpc2 module which essentially does the following
from jsonrpc2 import JsonRpcApplication
from wsgiref.simple_server import make_server
app = JsonRpcApplication()
app.rpc.add_module("greeting")
httpd = make_server(host, port, app)
httpd.serve_forever()
I can currently run this jsonrpc2 server as a standalone "web app" and test it approproately.
I wanted to understand how to go from this simple function web app to a wsgi web app that reads and writes json without using a web framework such as flask or django ( which I know some of)
I am looking for whether there is a simple conceptual step that makes my function above compatible with a wsgi "callable" : or am I just better off using flask or django to read/receive json "POST" and write json response.
I don't know that particular module, but it looks like your app object is the WSGI application. All you do in that code is instantiate the app, then create a server for it via wsgiref. So instead of doing that, just point your real WSGI server - Apache/mod_wsgi, or gunicorn, or whatever - to that app object in exactly the same way as you would serve Flask or Django.

Sending a flask request within a flask request

I am implementing an endpoint in my Flask application that receives a collection of HTTP requests, and returns a collection of the corresponding HTTP responses. In order to accomplish this, I need my endpoint to call other endpoints in order to construct the result. However, because Flask is blocking while processing the original request, it cannot process the nested requests and the application gets deadlocked.
Is there any way to issue a request within a request in flask in a way that doesn't result in a deadlock?
I included a segment of my code which I believe should be enough to illustrate the problem without overwhelming you. If you would like to see more of it please let me know and I'll share.
from requests import Session, Request
def split(request):
multipart = request.stream.read()
boundary = request.content_type.split(';')[1]
prefix = ' boundary"'
suffix = '"'
delimiter = '--%s' % boundary[len(prefix)+1:-len(suffix)]
subrequests = [s.lstrip() for s in multipart.split(delimiter)]
for sub in subrequests:
status_line, _, more_lines = sub.partition('\n')
method, path, version = status_line.split()
headers, _, body = more_lines.partition('\n\n')
url = 'http://localhost:3000' + path
return Request(method, url, headers=headers, data=body)
#app.route('/batch', methods=["GET", "POST"])
def batch():
subrequests = split(request)
session = Session()
responses = []
for sub in subrequests:
response.append(s.send(sub.prepare())) # Deadlock!
There are two candidate solutions that I considered which I found to be unsatisfactory:
Don't issue a full request. Instead, just call the function that is mapped to the endpoint of interest (url_for). I am unsatisfied by this approach because the nested requests have their own headers and cookies which are neglected by this approach. Furthermore, code in the 'before_request' and 'after_request' handlers won't get called automatically
Run multiple instances of the application. This will solve the problem, but expose my service to a pretty simple DoS attack. If I have X instances running, All an attacker would need to do is to hit my service with X different requests to cause a deadlock.
Thank you.
Knowing that the internal flask server is not production-ready, when using only for development, pass the threaded=true parameter to app.run.
app.run(debug=True, threaded=True)
This happens cause you're using the flask devserver. It's not for production use.
In production environment you would use an application server (uWSGI, GUnicorn, Tornado, ...) with or without a webserver layer (NGINX, Apache,...) to proxy/balance connections to the workers protecting (not completely but in a lot of environments it's acceptable) from DoS attacks.

How to access wsgi params from a request inside a middleware and a flask request without side effect?

I need to read some values from the wsgi request before my flask app is loaded. If I read the url from the wsgi request I can access the file without any issues once the flask app is loaded (after the middleware runs).
But if I attempt to access the params it seems to remove the post data once the flask app is loaded. I even went to the extreme of wrapping the wsgi request with a special Webob Request to prevent this "read once" problem.
Does anyone know how to access values from the wsgi request in middleware without doing any sort of side effect harm to the request so you can get post data / file data in a flask app?
from webob import Request
class SomeMiddleware(object):
def __init__(self, environ):
self.request = Request(environ)
self.orig_environ = environ
def apply_middleware(self):
print self.request.url #will not do any harm
print self.request.params #will cause me to lose data
Here is my flask view
#app.route('/')
def hello_world():
from flask import request
the_file = request.files['file']
print "and the file is", the_file
From what I can tell, this is a limitation of the way that WSGI works. The stream needs only be consumable once (PEP 333 and 3333 only require that the stream support read* calls, tell does not need to be supported). Once the stream is exhausted it cannot be re-streamed to other WSGI applications further "inward". Take a look at these two sections of Werkzeug's documentation for more information:
http://werkzeug.pocoo.org/docs/request_data/
http://werkzeug.pocoo.org/docs/http/#module-werkzeug.formparser
The way to avoid this issue is to wrap the input stream (wsgi.input) in an object that implements the read and readline methods. Then, only when the final application in the chain actually attempts to exhaust the stream will your methods be run. See Flask's documentation on generating a request checksum for an example of this pattern.
That being said, are you sure a middleware is the best solution to your problems? If you need to perform some action (dispatch, logging, authentication) based on the content of the body of the request you may be better off making it a part of your application, rather than a stand-alone application of its own.

Getting the Server URL in Google App Engine using python

How do I get App Engine to generate the URL of the server it is currently running on?
If the application is running on development server it should return
http://localhost:8080/
and if the application is running on Google's servers it should return
http://application-name.appspot.com
You can get the URL that was used to make the current request from within your webapp handler via self.request.url or you could piece it together using the self.request.environ dict (which you can read about on the WebOb docs - request inherits from webob)
You can't "get the url for the server" itself, as many urls could be used to point to the same instance.
If your aim is really to just discover wether you are in development or production then use:
'Development' in os.environ['SERVER_SOFTWARE']
Here is an alternative answer.
from google.appengine.api import app_identity
server_url = app_identity.get_default_version_hostname()
On the dev appserver this would show:
localhost:8080
and on appengine
your_app_id.appspot.com
If you're using webapp2 as framework chances are that you already using URI routing in you web application.
http://webapp2.readthedocs.io/en/latest/guide/routing.html
app = webapp2.WSGIApplication([
webapp2.Route('/', handler=HomeHandler, name='home'),
])
When building URIs with webapp2.uri_for() just pass _full=True attribute to generate absolute URI including current domain, port and protocol according to current runtime environment.
uri = uri_for('home')
# /
uri = uri_for('home', _full=True)
# http://localhost:8080/
# http://application-name.appspot.com/
# https://application-name.appspot.com/
# http://your-custom-domain.com/
This function can be used in your Python code or directly from templating engine (if you register it) - very handy.
Check webapp2.Router.build() in the API reference for a complete explanation of the parameters used to build URIs.

Categories

Resources