Python WSGI: Reading env['wsgi.input'] more than once

I am building a simple web service that requires all requests to be signed. The signature hash is generated using request data including the request body. My desire is to have a middleware component that validates the request signature, responding with an error if the signature is invalid. The problem is the middleware needs to read the request body using env['wsgi.input'].read(). This advances the pointer for the request body string to the end, which makes the data inaccessible to other components further down in the chain of execution.
Is there any way to make it so env['wsgi.input'] can be read twice?
Ex:
from myapp.lib.helpers import sign_request
from urlparse import parse_qs
import json


class ValidateSignedRequestMiddleware(object):
    def __init__(self, app, secret):
        self._app = app
        self._secret = secret

    def __call__(self, environ, start_response):
        auth_params = environ['HTTP_AUTHORIZATION'].split(',', 1)
        timestamp = auth_params[0].split('=', 1)[1]
        signature = auth_params[1].split('=', 1)[1]
        expected_signature = sign_request(
            environ['REQUEST_METHOD'],
            environ['HTTP_HOST'],
            environ['PATH_INFO'],
            parse_qs(environ['QUERY_STRING']),
            environ['wsgi.input'].read(),
            timestamp,
            self._secret
        )
        if signature != expected_signature:
            start_response('400 Bad Request', [('Content-Type', 'application/json')])
            return [json.dumps({'error': ('Invalid request signature',)})]
        return self._app(environ, start_response)

You can try seeking back to the beginning, but you may find that you'll have to replace it with a StringIO containing what you just read out.
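A minimal sketch of the "replace it with a fresh stream" idea, written for Python 3 (so io.BytesIO stands in for StringIO). The class name RewindableInputMiddleware, the inspect callback and echo_app are hypothetical names used only for illustration. The sketch reads the whole body into memory, so it only suits requests of modest size.

```python
import io


class RewindableInputMiddleware:
    """Buffer the request body so later components can read it again."""

    def __init__(self, app, inspect):
        self._app = app
        self._inspect = inspect  # hypothetical callback that receives the raw body bytes

    def __call__(self, environ, start_response):
        # CONTENT_LENGTH may be missing or empty; treat that as "no body".
        try:
            length = int(environ.get('CONTENT_LENGTH') or 0)
        except ValueError:
            length = 0

        body = environ['wsgi.input'].read(length) if length > 0 else b''
        self._inspect(body)

        # Replace the consumed stream with a fresh in-memory one.
        environ['wsgi.input'] = io.BytesIO(body)
        return self._app(environ, start_response)


def echo_app(environ, start_response):
    """Downstream app that reads the body a second time."""
    data = environ['wsgi.input'].read()
    start_response('200 OK', [('Content-Type', 'application/octet-stream')])
    return [data]
```

In the signature middleware from the question, the inspect step would be the call to sign_request, with the buffered body passed in place of environ['wsgi.input'].read().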

The following specification deals with that exact problem, providing explanation of the problem as well as the solution including source code and special cases to take into account:
http://wsgi.readthedocs.org/en/latest/specifications/handling_post_forms.html

Related

Gateway Time-out with StreamingResponse and custom Middleware fastapi [duplicate]

We are writing a web service using Python FastAPI that is going to be hosted in Kubernetes. For auditing purposes, we need to save the raw JSON body of the request/response for specific routes. The body size of both request and response JSON is about 1MB, and preferably, this should not impact the response time.
How can we do that?

Option 1 - Using Middleware
You could use a Middleware. A middleware handles every request that comes to your application, so it lets you process the request before it reaches any specific endpoint, as well as the response before it is returned to the client. To create a middleware, you use the decorator @app.middleware("http") on top of a function, as shown below. Because you need to consume the request body from the stream inside the middleware, using either request.body() or request.stream() as shown in this answer (behind the scenes, the former method actually calls the latter, see here), the body won't be available when you later pass the request to the corresponding endpoint. Thus, you can follow the approach described in this post to make the request body available down the line (i.e., using the set_body function below). As for the response body, you can use the same approach as described in this answer to consume the body and then return the response to the client. Either option described in the aforementioned linked answer would work; the below, however, uses Option 2, which stores the body in a bytes object and returns a custom Response directly (along with the status_code, headers and media_type of the original response).
To log the data, you could use a BackgroundTask, as described in this answer and this answer. A BackgroundTask will run only once the response has been sent (see Starlette documentation as well); thus, the client won't have to be waiting for the logging to complete before receiving the response (and hence, the response time won't be noticeably impacted).
Note
If you had a streaming request or response with a body that wouldn't fit into your server's RAM (for example, imagine a body of 100GB on a machine with 8GB of RAM), it would become problematic, as you are storing the data in RAM, which wouldn't have enough space to accommodate the accumulated data. Also, in case of a large response (e.g., a large FileResponse or StreamingResponse), you may be faced with timeout errors on the client side (or on the reverse proxy side, if you are using one), as you would not be able to respond to the client until you have read the entire response body (as you are looping over response.body_iterator). You mentioned that "the body size of both request and response JSON is about 1MB"; hence, that should normally be fine (however, it is always good practice to consider beforehand matters such as how many requests your API is expected to serve concurrently, what other applications might be using the RAM, etc., in order to determine whether this is an issue or not). If you needed to, you could limit the number of requests to your API endpoints using, for example, SlowAPI (as shown in this answer).
Limiting the usage of the middleware to specific routes only
You could limit the usage of the middleware to specific endpoints by:
checking the request.url.path inside the middleware against a pre-defined list of routes for which you would like to log the request and response, as described in this answer (see "Update" section),
or using a sub application, as demonstrated in this answer,
or using a custom APIRoute class, as demonstrated in Option 2 below.
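The first approach in the list above (checking the path against a pre-defined list) could be sketched roughly as follows. LOGGED_PATHS and should_log are hypothetical names, and the trailing-slash normalisation is an assumption about how you want paths to match.

```python
LOGGED_PATHS = {'/', '/items'}  # hypothetical routes to audit


def should_log(path: str) -> bool:
    """Return True if the request path is one we want to log."""
    # Normalise a trailing slash so '/items/' matches '/items'.
    normalised = path.rstrip('/') or '/'
    return normalised in LOGGED_PATHS
```

Inside the middleware you would call should_log(request.url.path) first and, when it returns False, simply return the result of call_next(request) without reading either body.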
Working Example
from fastapi import FastAPI, APIRouter, Response, Request
from starlette.background import BackgroundTask
from fastapi.routing import APIRoute
from starlette.types import Message
from typing import Dict, Any
import logging

app = FastAPI()
logging.basicConfig(filename='info.log', level=logging.DEBUG)


def log_info(req_body, res_body):
    logging.info(req_body)
    logging.info(res_body)


async def set_body(request: Request, body: bytes):
    async def receive() -> Message:
        return {'type': 'http.request', 'body': body}
    request._receive = receive


@app.middleware('http')
async def some_middleware(request: Request, call_next):
    req_body = await request.body()
    await set_body(request, req_body)
    response = await call_next(request)

    res_body = b''
    async for chunk in response.body_iterator:
        res_body += chunk

    task = BackgroundTask(log_info, req_body, res_body)
    return Response(content=res_body, status_code=response.status_code,
        headers=dict(response.headers), media_type=response.media_type, background=task)


@app.post('/')
def main(payload: Dict[Any, Any]):
    return payload
In case you would like to perform some validation on the request body (for example, ensuring that the request body size does not exceed a certain value), instead of using request.body(), you can process the body one chunk at a time using the .stream() method, as shown below (similar to this answer).
@app.middleware('http')
async def some_middleware(request: Request, call_next):
    req_body = b''
    async for chunk in request.stream():
        req_body += chunk
    ...
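One way the size check could look while accumulating chunks is sketched below. MAX_BODY_BYTES, BodyTooLarge, read_limited and fake_stream are hypothetical names; fake_stream only stands in for request.stream() so that the sketch runs without a server.

```python
import asyncio

MAX_BODY_BYTES = 2 * 1024 * 1024  # hypothetical 2 MB cap


class BodyTooLarge(Exception):
    """Raised when the accumulated body exceeds the cap."""


async def read_limited(chunks, limit=MAX_BODY_BYTES):
    """Accumulate chunks from an async iterator, aborting once `limit` is exceeded."""
    parts = []
    total = 0
    async for chunk in chunks:
        total += len(chunk)
        if total > limit:
            raise BodyTooLarge(f'body exceeds {limit} bytes')
        parts.append(chunk)
    return b''.join(parts)


async def fake_stream(*chunks):
    """Stand-in for request.stream() so the sketch runs without a server."""
    for chunk in chunks:
        yield chunk


if __name__ == '__main__':
    print(asyncio.run(read_limited(fake_stream(b'ab', b'cd'), limit=10)))
```

In the middleware, req_body = await read_limited(request.stream()) would replace the loop, and a BodyTooLarge exception could be turned into a 413 response before the request reaches the endpoint. Collecting parts in a list and joining once also avoids repeatedly copying a growing bytes object.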
Option 2 - Using custom APIRoute class
You can alternatively use a custom APIRoute class (similar to here and here), which, among other things, allows you to manipulate the request body before it is processed by your application, as well as the response body before it is returned to the client. This option also allows you to limit the usage of this class to the routes you wish, as only the endpoints under the APIRouter (i.e., router in the example below) will use the custom APIRoute class.
It should be noted that the comments in the "Note" section of Option 1 above apply to this option as well. For example, if your API returns a StreamingResponse, such as in the /video route of the example below, which streams a video file from an online source (public videos to test this can be found here, and you can even use a longer video than the one used below to see the effect more clearly), you may come across issues on the server side if your server's RAM can't handle it, as well as delays on the client side (and on the reverse proxy server, if using one), because the whole (streaming) response is read and stored in RAM before it is returned to the client (as explained earlier). In such cases, you could exclude endpoints that return a StreamingResponse from the custom APIRoute class and limit its usage to the desired routes only, especially if it is a large video file, or even live video, which would likely make little sense to store in the logs. To do so, simply don't use the @<name_of_router> decorator (i.e., @router in the example below) for such endpoints, but rather the @<name_of_app> decorator (i.e., @app in the example below), or some other APIRouter or sub application.
Working Example
from fastapi import FastAPI, APIRouter, Response, Request
from starlette.background import BackgroundTask
from starlette.responses import StreamingResponse
from fastapi.routing import APIRoute
from starlette.types import Message
from typing import Callable, Dict, Any
import logging
import httpx


def log_info(req_body, res_body):
    logging.info(req_body)
    logging.info(res_body)


class LoggingRoute(APIRoute):
    def get_route_handler(self) -> Callable:
        original_route_handler = super().get_route_handler()

        async def custom_route_handler(request: Request) -> Response:
            req_body = await request.body()
            response = await original_route_handler(request)

            if isinstance(response, StreamingResponse):
                res_body = b''
                async for item in response.body_iterator:
                    res_body += item

                task = BackgroundTask(log_info, req_body, res_body)
                return Response(content=res_body, status_code=response.status_code,
                    headers=dict(response.headers), media_type=response.media_type, background=task)
            else:
                res_body = response.body
                response.background = BackgroundTask(log_info, req_body, res_body)
                return response

        return custom_route_handler


app = FastAPI()
router = APIRouter(route_class=LoggingRoute)
logging.basicConfig(filename='info.log', level=logging.DEBUG)


@router.post('/')
def main(payload: Dict[Any, Any]):
    return payload


@router.get('/video')
def get_video():
    url = 'https://storage.googleapis.com/gtv-videos-bucket/sample/ForBiggerBlazes.mp4'

    def gen():
        with httpx.stream('GET', url) as r:
            for chunk in r.iter_raw():
                yield chunk

    return StreamingResponse(gen(), media_type='video/mp4')


app.include_router(router)

You can try a custom APIRoute class, passed to an APIRouter, as shown in the official FastAPI documentation:
import time
from typing import Callable

from fastapi import APIRouter, FastAPI, Request, Response
from fastapi.routing import APIRoute


class TimedRoute(APIRoute):
    def get_route_handler(self) -> Callable:
        original_route_handler = super().get_route_handler()

        async def custom_route_handler(request: Request) -> Response:
            before = time.time()
            response: Response = await original_route_handler(request)
            duration = time.time() - before
            response.headers["X-Response-Time"] = str(duration)
            print(f"route duration: {duration}")
            print(f"route response: {response}")
            print(f"route response headers: {response.headers}")
            return response

        return custom_route_handler


app = FastAPI()
router = APIRouter(route_class=TimedRoute)


@app.get("/")
async def not_timed():
    return {"message": "Not timed"}


@router.get("/timed")
async def timed():
    return {"message": "It's the time of my life"}


app.include_router(router)

The other answers did not work for me, and I searched Stack Overflow quite extensively for a fix, so I will show my solution below.
The main issue is that many of the approaches offered online simply do not work when you need the request or response body, because the body is consumed when it is read from the stream.
To solve this issue I adapted an approach that basically reconstructs the request and response after reading them. This is heavily based on the comment by user 'kovalevvlad' on https://github.com/encode/starlette/issues/495.
A custom middleware is created and later added to the app to log all requests and responses. Note that you need some kind of logger to store your logs.
from json import JSONDecodeError
import json
import logging
from typing import Callable, Awaitable, Tuple, Dict, List

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import Response, StreamingResponse
from starlette.types import Scope, Message

# Set up your custom logger here (a standard-library logger is used as a placeholder)
logger = logging.getLogger(__name__)


class RequestWithBody(Request):
    """Creation of new request with body"""

    def __init__(self, scope: Scope, body: bytes) -> None:
        super().__init__(scope, self._receive)
        self._body = body
        self._body_returned = False

    async def _receive(self) -> Message:
        if self._body_returned:
            return {"type": "http.disconnect"}
        else:
            self._body_returned = True
            return {"type": "http.request", "body": self._body, "more_body": False}


class CustomLoggingMiddleware(BaseHTTPMiddleware):
    """
    Use of custom middleware since reading the request body and the response consumes the bytestream.
    Hence this approach to basically generate a new request/response when we read the attributes for logging.
    """

    async def dispatch(  # type: ignore
        self, request: Request, call_next: Callable[[Request], Awaitable[StreamingResponse]]
    ) -> Response:
        # Store request body in a variable and generate new request as it is consumed.
        request_body_bytes = await request.body()
        request_with_body = RequestWithBody(request.scope, request_body_bytes)

        # Store response body in a variable and generate new response as it is consumed.
        response = await call_next(request_with_body)
        response_content_bytes, response_headers, response_status = await self._get_response_params(response)

        # Logging
        # If there is no request body handle exception, otherwise convert bytes to JSON.
        try:
            req_body = json.loads(request_body_bytes)
        except JSONDecodeError:
            req_body = ""

        # Logging of relevant variables.
        logger.info(
            f"{request.method} request to {request.url} metadata\n"
            f"\tStatus_code: {response.status_code}\n"
            f"\tRequest_Body: {req_body}\n"
        )

        # Finally, return the newly instantiated response values
        return Response(response_content_bytes, response_status, response_headers)

    async def _get_response_params(self, response: StreamingResponse) -> Tuple[bytes, Dict[str, str], int]:
        """Getting the response parameters of a response and create a new response."""
        response_byte_chunks: List[bytes] = []
        response_status: List[int] = []
        response_headers: List[Dict[str, str]] = []

        async def send(message: Message) -> None:
            if message["type"] == "http.response.start":
                response_status.append(message["status"])
                response_headers.append({k.decode("utf8"): v.decode("utf8") for k, v in message["headers"]})
            else:
                response_byte_chunks.append(message["body"])

        await response.stream_response(send)
        content = b"".join(response_byte_chunks)
        return content, response_headers[0], response_status[0]
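The middleware above buffers the whole response before returning it. A different technique, not taken from the answers here, is to wrap the raw ASGI receive and send callables so that each chunk is copied as it passes through and forwarded immediately. The sketch below uses only the standard library; TeeBodyMiddleware, on_complete and echo_app are hypothetical names. It assumes you only need the bodies after the response has finished, and it captures only the part of the request body that the application actually reads.

```python
import asyncio


class TeeBodyMiddleware:
    """Pure ASGI middleware that copies body chunks while passing them through."""

    def __init__(self, app, on_complete):
        self.app = app
        self.on_complete = on_complete  # hypothetical callback(req_body, res_body)

    async def __call__(self, scope, receive, send):
        if scope['type'] != 'http':
            await self.app(scope, receive, send)
            return

        req_chunks, res_chunks = [], []

        async def tee_receive():
            message = await receive()
            if message['type'] == 'http.request':
                req_chunks.append(message.get('body', b''))
            return message

        async def tee_send(message):
            if message['type'] == 'http.response.body':
                res_chunks.append(message.get('body', b''))
            await send(message)  # forward immediately, so the client is not kept waiting

        await self.app(scope, tee_receive, tee_send)
        self.on_complete(b''.join(req_chunks), b''.join(res_chunks))


async def echo_app(scope, receive, send):
    """Tiny ASGI app that echoes the request body back (for demonstration only)."""
    body = b''
    more = True
    while more:
        message = await receive()
        body += message.get('body', b'')
        more = message.get('more_body', False)
    await send({'type': 'http.response.start', 'status': 200, 'headers': []})
    await send({'type': 'http.response.body', 'body': body})
```

Because chunks are forwarded as soon as they are produced, a long StreamingResponse still starts reaching the client right away. The copies do accumulate in memory, however, so the RAM caveats discussed earlier still apply to very large bodies.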

Is it possible to run custom code before Swagger validations in a python/flask server stub?

I'm using the swagger editor (OpenApi 2) for creating flask apis in python. When you define a model in swagger and use it as a schema for the body of a request, swagger validates the body before handing the control to you in the X_controller.py files.
I want to add some code before that validation happens (for printing logs for debugging purposes). Swagger just prints to stdout errors like the following and they are not useful when you have a lot of fields (I need the key that isn't valid).
https://host/path validation error: False is not of type 'string'
10.255.0.2 - - [20/May/2020:20:20:20 +0000] "POST /path HTTP/1.1" 400 116 "-" "GuzzleHttp/7"
I know that technically you can remove the validations in swagger and do them manually in your code, but I want to keep using this feature; when it works it's awesome.
Any ideas on how to do this or any alternative to be able to log the requests are welcome.

After some time studying the matter this is what I learnt.
First let's take a look at how a python-flask server made with Swagger Editor works.
Swagger Editor generates the server stub through Swagger Codegen using the definition written in Swagger Editor. This server stub returned by codegen uses the framework Connexion on top of flask to handle all the HTTP requests and responses, including the validation against the swagger definition (swagger.yaml).
Connexion is a framework that makes it easy to develop python-flask servers because it has a lot of functionality you'd have to make yourself already built in, like parameter validation. All we need to do is replace (in this case modify) these connexion validators.
There are three validators:
ParameterValidator
RequestBodyValidator
ResponseValidator
They get mapped to flask by default but we can replace them easily in the __main__.py file as we'll see.
Our goal is to replace the default logs and default error response to some custom ones. I'm using a custom Error model and a function called error_response() for preparing error responses, and Loguru for logging the errors (not mandatory, you can keep the original one).
To make the changes needed, looking at the connexion validators code, we can see that most of it can be reused, we only need to modify:
RequestBodyValidator: __call__() and validate_schema()
ParameterValidator: __call__()
So we only need to create two new classes that extend the original ones, and copy and modify those functions.
Be careful when copying and pasting. This code is based on connexion==1.1.15. If you are on a different version, you should base your classes on the validators from that version.
In a new file custom_validators.py we need:
import json
import functools
from flask import Flask
from loguru import logger
from requests import Response
from jsonschema import ValidationError
from connexion.utils import all_json, is_null
from connexion.exceptions import ExtraParameterProblem
from swagger_server.models import Error
from connexion.decorators.validation import ParameterValidator, RequestBodyValidator
app = Flask(__name__)
def error_response(response: Error) -> Response:
return app.response_class(
response=json.dumps(response.to_dict(), default=str),
status=response.status,
mimetype='application/json')
class CustomParameterValidator(ParameterValidator):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    def __call__(self, function):
        """
        :type function: types.FunctionType
        :rtype: types.FunctionType
        """
        @functools.wraps(function)
        def wrapper(request):
            if self.strict_validation:
                query_errors = self.validate_query_parameter_list(request)
                formdata_errors = self.validate_formdata_parameter_list(request)
                if formdata_errors or query_errors:
                    raise ExtraParameterProblem(formdata_errors, query_errors)
            for param in self.parameters.get('query', []):
                error = self.validate_query_parameter(param, request)
                if error:
                    response = error_response(Error(status=400, description=f'Error: {error}'))
                    return self.api.get_response(response)
            for param in self.parameters.get('path', []):
                error = self.validate_path_parameter(param, request)
                if error:
                    response = error_response(Error(status=400, description=f'Error: {error}'))
                    return self.api.get_response(response)
            for param in self.parameters.get('header', []):
                error = self.validate_header_parameter(param, request)
                if error:
                    response = error_response(Error(status=400, description=f'Error: {error}'))
                    return self.api.get_response(response)
            for param in self.parameters.get('formData', []):
                error = self.validate_formdata_parameter(param, request)
                if error:
                    response = error_response(Error(status=400, description=f'Error: {error}'))
                    return self.api.get_response(response)
            return function(request)
        return wrapper
class CustomRequestBodyValidator(RequestBodyValidator):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    def __call__(self, function):
        """
        :type function: types.FunctionType
        :rtype: types.FunctionType
        """
        @functools.wraps(function)
        def wrapper(request):
            if all_json(self.consumes):
                data = request.json
                if data is None and len(request.body) > 0 and not self.is_null_value_valid:
                    # the body has contents that were not parsed as JSON
                    return error_response(Error(
                        status=415,
                        description="Invalid Content-type ({content_type}), JSON data was expected".format(content_type=request.headers.get("Content-Type", ""))
                    ))
                error = self.validate_schema(data, request.url)
                if error and not self.has_default:
                    return error
            response = function(request)
            return response
        return wrapper
    def validate_schema(self, data, url):
        if self.is_null_value_valid and is_null(data):
            return None
        try:
            self.validator.validate(data)
        except ValidationError as exception:
            description = f'Validation error. Attribute "{exception.validator_value}" return this error: "{exception.message}"'
            logger.error(description)
            return error_response(Error(
                status=400,
                description=description
            ))
        return None
Once we have our validators, we have to map them to the flask app (__main__.py) using validator_map:
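from pathlib import Path
import connexion
from connexion.decorators.response import ResponseValidator
from swagger_server import encoder
# assumption: adjust this import to wherever custom_validators.py lives in your project
from swagger_server.custom_validators import CustomParameterValidator, CustomRequestBodyValidator
validator_map = {
    'parameter': CustomParameterValidator,
    'body': CustomRequestBodyValidator,
    'response': ResponseValidator,
}
app = connexion.App(__name__, specification_dir='./swagger/', validator_map=validator_map)
app.app.json_encoder = encoder.JSONEncoder
app.add_api(Path('swagger.yaml'), arguments={'title': 'MyApp'})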
If you also need to replace the validator I didn't use in this example, just create a custom child class of ResponseValidator and replace it on the validator_map dictionary in __main__.py.
Connexion docs:
https://connexion.readthedocs.io/en/latest/request.html

Forgive me for repeating an answer first posted at https://stackoverflow.com/a/73051652/1630244
Have you tried the Connexion before_request feature? Here's an example that logs the headers and content before Connexion validates the body:
import connexion
import logging
from flask import request
logger = logging.getLogger(__name__)
conn_app = connexion.FlaskApp(__name__)
@conn_app.app.before_request
def before_request():
    for h in request.headers:
        logger.debug('header %s', h)
    logger.debug('data %s', request.get_data())

What is the best way to force a keyword while using **kwargs?

I'm not sure if I have used the correct terminology in the question.
Currently, I am trying to make a wrapper/interface around Google's Blogger API (Blog service).
[I know it has been done already, but I am using this as a project to learn OOP/python.]
I have made a method that gets a set of 25 posts from a blog:
def get_posts(self, **kwargs):
    """ Makes an API request. Returns list of posts. """
    api_url = '/blogs/{id}/posts'.format(id=self.id)
    return self._send_request(api_url, kwargs)
def _send_request(self, url, parameters={}):
    """ Sends an HTTP GET request to the Blogger API.
    Returns JSON decoded response as a dict. """
    url = '{base}{url}?'.format(base=self.base, url=url)
    # Requests formats the parameters into the URL for me
    try:
        r = requests.get(url, params=parameters)
    except:
        print "** Could not reach url:\n", url
        return
    api_response = r.text
    return self._jload(api_response)
The problem is, I have to specify the API key every time I call the get_posts function:
someblog = BloggerClient(url='http://someblog.blogger.com', key='0123')
someblog.get_posts(key=self.key)
Every API call requires that the key be sent as a parameter on the URL.
Then, what is the best way to do that?
I'm thinking a possible way (but probably not the best way?), is to add the key to the kwargs dictionary in the _send_request():
def _send_request(self, url, parameters={}):
    """ Sends an HTTP get request to Blogger API.
    Returns JSON decoded response. """
    # Format the full API URL:
    url = '{base}{url}?'.format(base=self.base, url=url)
    # The api key will be always be added:
    parameters['key'] = self.key
    try:
        r = requests.get(url, params=parameters)
    except:
        print "** Could not reach url:\n", url
        return
    api_response = r.text
    return self._jload(api_response)
I can't really get my head around what is the best way (or most pythonic way) to do it.

You could store it in a named constant.
If this code doesn't need to be secure, simply
API_KEY = '1ih3f2ihf2f'
If it's going to be hosted on a server somewhere or needs to be more secure, you could store the value in an environment variable.
In your terminal:
export API_KEY='1ih3f2ihf2f'
then in your python script:
import os
API_KEY = os.environ.get('API_KEY')
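One thing to keep in mind is that os.environ.get() returns None when the variable is not set, so a missing key would only show up later as a failed API call. A minimal sketch of failing early instead; load_api_key is a hypothetical helper name, not part of any library:

```python
import os


def load_api_key(name="API_KEY"):
    """Return the API key stored in the environment variable `name`.

    `load_api_key` is a hypothetical helper used for illustration only.
    """
    key = os.environ.get(name)  # None when the variable is not set
    if not key:
        raise RuntimeError("environment variable %s is not set" % name)
    return key
```

Calling load_api_key() once at start-up, and passing the result to the client's constructor, keeps the key out of the source code.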

The problem is, I have to specify the API key every time I call the get_posts function:
If it really is just this one method, the obvious idea is to write a wrapper:
def get_posts(blog, *args, **kwargs):
    # the client already stores the key it was constructed with
    return blog.get_posts(*args, key=blog.key, **kwargs)
Or, better, wrap up the class to do it for you:
class KeyRememberingBloggerClient(BloggerClient):
    def __init__(self, *args, **kwargs):
        self.key = kwargs.pop('key')
        super(KeyRememberingBloggerClient, self).__init__(*args, **kwargs)
    def get_posts(self, *args, **kwargs):
        return super(KeyRememberingBloggerClient, self).get_posts(
            *args, key=self.key, **kwargs)
So now:
someblog = KeyRememberingBloggerClient(url='http://someblog.blogger.com', key='0123')
someblog.get_posts()
Yes, you can override or monkeypatch the _send_request method that all of the other methods use, but if there's only 1 or 2 methods that need to be fixed, why delve into the undocumented internals of the class, and fork the body of one of those methods just so you can change it in a way you clearly weren't expected to, instead of doing it cleanly?
Of course if there are 90 different methods scattered across 4 different classes, you might want to consider building these wrappers programmatically (and/or monkeypatching the classes)… or just patching the one private method, as you're doing. That seems reasonable.
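If you do go the route of patching the one private method, here is a rough sketch of how it could look. The class below is a cut-down stand-in for the question's BloggerClient, and the transport argument is a hypothetical replacement for requests.get so that the example runs without a network. It copies the parameters instead of mutating the caller's dict (or a shared default argument) before adding the key:

```python
class BloggerClient(object):
    """Cut-down stand-in for the question's client (illustrative only)."""

    def __init__(self, blog_id, key, transport):
        self.id = blog_id
        self.key = key
        # `transport` is a hypothetical callable with the same
        # (url, params=...) shape as requests.get.
        self._transport = transport

    def get_posts(self, **kwargs):
        api_url = '/blogs/{id}/posts'.format(id=self.id)
        return self._send_request(api_url, kwargs)

    def _send_request(self, url, parameters=None):
        # Work on a copy, then add the key that every API call needs.
        params = dict(parameters or {})
        params.setdefault('key', self.key)
        return self._transport(url, params=params)


def fake_transport(url, params):
    """Echo the request back instead of touching the network."""
    return {'url': url, 'params': params}


client = BloggerClient('42', key='0123', transport=fake_transport)
result = client.get_posts(maxResults=5)
```

Using setdefault means a caller can still pass a different key explicitly for a single call.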

How do I unit test a module that relies on urllib2?

I've got a piece of code that I can't figure out how to unit test! The module pulls content from external XML feeds (twitter, flickr, youtube, etc.) with urllib2. Here's some pseudo-code for it:
params = (url, urlencode(data),) if data else (url,)
req = Request(*params)
response = urlopen(req)
#check headers, content-length, etc...
#parse the response XML with lxml...
My first thought was to pickle the response and load it for testing, but apparently urllib's response object is unserializable (it raises an exception).
Just saving the XML from the response body isn't ideal, because my code uses the header information too. It's designed to act on a response object.
And of course, relying on an external source for data in a unit test is a horrible idea.
So how do I write a unit test for this?

urllib2 has functions called build_opener() and install_opener(), which you should use to mock the behaviour of urlopen():
import urllib2
from StringIO import StringIO
def mock_response(req):
    if req.get_full_url() == "http://example.com":
        resp = urllib2.addinfourl(StringIO("mock file"), "mock message", req.get_full_url())
        resp.code = 200
        resp.msg = "OK"
        return resp
class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        print "mock opener"
        return mock_response(req)
my_opener = urllib2.build_opener(MyHTTPHandler)
urllib2.install_opener(my_opener)
response=urllib2.urlopen("http://example.com")
print response.read()
print response.code
print response.msg

It would be best if you could write a mock urlopen (and possibly Request) which provides the minimum required interface to behave like urllib2's version. You'd then need to have your function/method which uses it able to accept this mock urlopen somehow, and use urllib2.urlopen otherwise.
This is a fair amount of work, but worthwhile. Remember that python is very friendly to ducktyping, so you just need to provide some semblance of the response object's properties to mock it.
For example:
class MockResponse(object):
    def __init__(self, resp_data, code=200, msg='OK'):
        self.resp_data = resp_data
        self.code = code
        self.msg = msg
        self.headers = {'content-type': 'text/xml; charset=utf-8'}
    def read(self):
        return self.resp_data
    def getcode(self):
        return self.code
    # Define other members and properties you want
def mock_urlopen(request):
    return MockResponse(r'<xml document>')
Granted, some of these are difficult to mock, because for example I believe the normal "headers" is an HTTPMessage which implements fun stuff like case-insensitive header names. But, you might be able to simply construct an HTTPMessage with your response data.
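On Python 3, where urllib2 became urllib.request, the same duck-typing idea can be sketched with the standard library's unittest.mock instead of a hand-written class. This is only an illustration: fetch_feed and make_fake_response are hypothetical names, and the fake mimics just the three members the code under test touches.

```python
import urllib.request
from unittest import mock


def fetch_feed(url):
    """Hypothetical code under test: returns (status, content type, body)."""
    response = urllib.request.urlopen(url)
    return (response.getcode(),
            response.headers.get('content-type'),
            response.read())


def make_fake_response(body, code=200,
                       content_type='text/xml; charset=utf-8'):
    """Build a stand-in exposing only what fetch_feed uses."""
    fake = mock.Mock()
    fake.getcode.return_value = code
    fake.read.return_value = body
    fake.headers = {'content-type': content_type}
    return fake


# Replace urlopen only for the duration of the with-block.
with mock.patch('urllib.request.urlopen',
                return_value=make_fake_response(b'<feed/>')) as fake_urlopen:
    status, content_type, body = fetch_feed('http://example.com/feed.xml')
```

The plain dict used for headers is case-sensitive, unlike the real message object, which is the limitation mentioned above.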

Build a separate class or module responsible for communicating with your external feeds.
Make this class able to be a test double. You're using python, so you're pretty golden there; if you were using C#, I'd suggest either an interface or virtual methods.
In your unit test, insert a test double of the external feed class. Test that your code uses the class correctly, assuming that the class does the work of communicating with your external resources correctly. Have your test double return fake data rather than live data; test various combinations of the data and of course the possible exceptions urllib2 could throw.
Aand... that's it.
You can't effectively automate unit tests that rely on external sources, so you're best off not doing it. Run an occasional integration test on your communication module, but don't include those tests as part of your automated tests.
Edit:
Just a note on the difference between my answer and @Crast's answer. Both are essentially correct, but they involve different approaches. In Crast's approach, you use a test double on the library itself. In my approach, you abstract the use of the library away into a separate module and test double that module.
Which approach you use is entirely subjective; there's no "correct" answer there. I prefer my approach because it allows me to build more modular, flexible code, something I value. But it comes at a cost in terms of additional code to write, something that may not be valued in many agile situations.
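A rough sketch of what that abstraction might look like. All names here (FeedClient, FakeFeedClient, count_entries) are made up for illustration; the real FeedClient would contain the urllib2 calls.

```python
class FeedClient(object):
    """Hypothetical wrapper that would own all urllib2 calls."""

    def fetch(self, url):
        raise NotImplementedError("real network access lives here")


class FakeFeedClient(FeedClient):
    """Test double: returns canned data or raises a canned error."""

    def __init__(self, responses):
        self.responses = responses
        self.requested = []  # records every URL the code asked for

    def fetch(self, url):
        self.requested.append(url)
        result = self.responses[url]
        if isinstance(result, Exception):
            raise result
        return result


def count_entries(client, url):
    """Code under test: depends only on the FeedClient interface."""
    try:
        headers, body = client.fetch(url)
    except IOError:
        return 0
    return body.count('<entry>')


fake = FakeFeedClient({
    'http://example.com/ok': (
        {'content-type': 'text/xml'},
        '<feed><entry></entry><entry></entry></feed>',
    ),
    'http://example.com/down': IOError('connection refused'),
})
```

The unit test then calls count_entries(fake, ...) for both the success and the failure case, and never touches the network.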

You can use pymox to mock the behavior of anything and everything in the urllib2 (or any other) package. It's 2010, you shouldn't be writing your own mock classes.

I think the easiest thing to do is to actually create a simple web server in your unit test. When you start the test, create a new thread that listens on some arbitrary port and when a client connects just returns a known set of headers and XML, then terminates.
I can elaborate if you need more info.
Here's some code:
import threading, SocketServer, time
# a request handler
class SimpleRequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        data = self.request.recv(102400) # token receive
        senddata = file(self.server.datafile).read() # read data from unit test file
        self.request.send(senddata)
        time.sleep(0.1) # make sure it finishes receiving request before closing
        self.request.close()
def serve_data(datafile):
    server = SocketServer.TCPServer(('127.0.0.1', 12345), SimpleRequestHandler)
    server.datafile = datafile
    # pass the bound method itself (not its result) so it runs in the new thread
    http_server_thread = threading.Thread(target=server.handle_request)
    http_server_thread.start()
    return http_server_thread
To run your unit test, call serve_data() then call your code that requests a URL that looks like http://localhost:12345/anythingyouwant.
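For Python 3, a comparable sketch can be built on http.server, which writes the status line and headers for you. The names CannedHandler and serve_once are hypothetical. Binding to port 0 lets the operating system pick a free port, which avoids clashes with a hard-coded one, and the empty ProxyHandler is there only so that proxy settings in the environment do not intercept the loopback request.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CannedHandler(BaseHTTPRequestHandler):
    """Answers every GET with the body stored on the server object."""

    def do_GET(self):
        body = self.server.canned_body
        self.send_response(200)
        self.send_header('Content-Type', 'text/xml; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def serve_once(body):
    """Start a server thread that handles exactly one request."""
    server = HTTPServer(('127.0.0.1', 0), CannedHandler)
    server.canned_body = body
    thread = threading.Thread(target=server.handle_request)
    thread.daemon = True
    thread.start()
    return server, thread


server, thread = serve_once(b'<feed/>')
url = 'http://127.0.0.1:%d/anythingyouwant' % server.server_address[1]

# Bypass any configured proxy for this loopback request.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
response = opener.open(url)
status = response.getcode()
content_type = response.headers['Content-Type']
data = response.read()
response.close()

thread.join()
server.server_close()
```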

Why not just mock a website that returns the response you expect? then start the server in a thread in setup and kill it in the teardown. I ended up doing this for testing code that would send email by mocking an smtp server and it works great. Surely something more trivial could be done for http...
from smtpd import SMTPServer
from threading import Thread
from time import sleep
import asyncore
SMTP_PORT = 6544
class MockSMTPServer(SMTPServer):
    def __init__(self, localaddr, remoteaddr, cb = None):
        self.cb = cb
        SMTPServer.__init__(self, localaddr, remoteaddr)
    def process_message(self, peer, mailfrom, rcpttos, data):
        print (peer, mailfrom, rcpttos, data)
        if self.cb:
            self.cb(peer, mailfrom, rcpttos, data)
        self.close()
def start_smtp(cb, port=SMTP_PORT):
    def smtp_thread():
        _smtp = MockSMTPServer(("127.0.0.1", port), (None, 0), cb)
        asyncore.loop()
    return Thread(None, smtp_thread)
def test_stuff():
    #.......snip noise
    # a list, so the callback can record into it (plain assignment inside
    # email_back would only create a new local variable)
    email_result = []
    def email_back(*args):
        email_result.append(args)
    t = start_smtp(email_back)
    t.start()
    sleep(1)
    res.form["email"]= self.admin_email
    res = res.form.submit()
    assert res.status_int == 302,"should've redirected"
    sleep(1)
    assert email_result, "didn't get an email"

Trying to improve a bit on @john-la-rooy's answer, I've made a small class that allows simple mocking in unit tests.
It should work with Python 2 and 3.
try:
    import urllib.request as urllib
except ImportError:
    import urllib2 as urllib
from io import BytesIO
class MockHTTPHandler(urllib.HTTPHandler):
    def mock_response(self, req):
        url = req.get_full_url()
        print("incoming request:", url)
        if url.endswith('.json'):
            resdata = b'[{"hello": "world"}]'
            headers = {'Content-Type': 'application/json'}
            resp = urllib.addinfourl(BytesIO(resdata), headers, url, 200)
            resp.msg = "OK"
            return resp
        raise RuntimeError('Unhandled URL', url)
    http_open = mock_response
    @classmethod
    def install(cls):
        previous = urllib._opener
        urllib.install_opener(urllib.build_opener(cls))
        return previous
    @classmethod
    def remove(cls, previous=None):
        urllib.install_opener(previous)
Used like this:
import unittest
class TestOther(unittest.TestCase):
    def setUp(self):
        previous = MockHTTPHandler.install()
        self.addCleanup(MockHTTPHandler.remove, previous)
