Validating query string parameters and request body in AWS lambda using webargs - python

I am trying to figure out how to validate query string parameters for an API created with AWS API Gateway and backed by a Python Lambda function. API Gateway can validate the presence of required query string parameters. However, I could not find a way to do additional validations, such as checking that the length of a parameter is within some limit (e.g. config_id should be at least 7 characters long). Such validations are possible for the request body using API Gateway request validation (refer to this link). For query string parameters, however, only required/not-required validation is possible, because API Gateway does not use any JSON schema to validate them.
Hence, to overcome this issue, I decided to try the webargs module in Python for validating the query string parameters. It is generally used for request validations for APIs created using Python frameworks such as flask or django. I am using the core parser (Refer webargs doc) as follows:
from typing import Dict
from webargs import fields, validate, core, ValidationError
parser = core.Parser()
params = {"config_id": fields.Str(required=True, validate=lambda p: len(p) >= 7)}
def main(event, context: Dict):
    try:
        # print(event["queryStringParameters"])
        input_params = event.get("queryStringParameters")
        print("queryStringParameters: ", str(input_params))
        if input_params is None:
            input_params = {}
        parsed_params = parser.parse(params, input_params)
        print("parsedParams: ", str(parsed_params))
    except ValidationError as e:
        return {
            "statusCode": 400,
            "headers": {
                "Access-Control-Allow-Origin": "*",
                "Access-Control-Allow-Credentials": True,
                "x-amzn-ErrorType": "ValidationError",
            },
            "body": str(e),
        }
This is how the validation is done in the lambda function. However, only the required validation works correctly. When I pass a config_id of length 5 it does not return any error and proceeds further in the lambda function.
What could be going wrong with this? The parser seems to work, however, the validate function doesn't.
Any help is appreciated as I am new to this. Also, is there a better way of doing validations in lambda functions especially for queryStringParameters? It can be handled by the code, but we can have many parameters and many APIs which makes writing code for all such validations a cumbersome task. The webargs module comes in handy.

The webargs library is mostly used for validating HTTP requests in popular Python frameworks such as Flask, Django and Bottle. The core Parser that you are trying to use should not be used directly, as it does not implement methods such as load_json and load_query (the source code showing the missing implementation is here). Each framework has its own child class of the core parser, but using those with API Gateway does not make sense.
So it's better to use a simpler JSON validation library like jsonschema. I've modified your code to use jsonschema instead of webargs as follows:
from jsonschema import validate, ValidationError
schema = {
    "type" : "object",
    "properties" : {
        "queryStringParameters" : {
            "type" : "object",
            "properties": {
                "config_id": {
                    "type": "string",
                    "minLength": 7,
                }
            }
        },
    },
}
def main(event, context):
    try:
        validate(instance=event, schema=schema)
    except ValidationError as e:
        return {
            "statusCode": 400,
            "headers": {
                "Access-Control-Allow-Origin": "*",
                "Access-Control-Allow-Credentials": True,
                "x-amzn-ErrorType": "ValidationError",
            },
            "body": e.message,
        }
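Note that the schema above has no "required" list, so a request that sends other parameters but omits config_id still passes. If adding a dependency is not an option, the same checks can be written as a small rule table using only the standard library. The following is a minimal sketch, not a replacement for jsonschema: the RULES table, the validate_query helper and the rule keys "required" and "min_length" are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List

# Hypothetical rule table: parameter name -> constraints to enforce.
RULES: Dict[str, Dict[str, Any]] = {
    "config_id": {"required": True, "min_length": 7},
}


def validate_query(event: Dict[str, Any], rules: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the parameters are valid."""
    # The proxy event carries None here when no query string was sent.
    params = event.get("queryStringParameters") or {}
    errors = []
    for name, rule in rules.items():
        value = params.get(name)
        if value is None:
            if rule.get("required"):
                errors.append(f"{name} is required")
            continue
        min_length = rule.get("min_length")
        if min_length is not None and len(value) < min_length:
            errors.append(f"{name} must be at least {min_length} characters long")
    return errors


def main(event, context):
    errors = validate_query(event, RULES)
    if errors:
        return {"statusCode": 400, "body": "; ".join(errors)}
    return {"statusCode": 200, "body": "ok"}
```

Because the rules live in a plain dictionary, several Lambda functions can share validate_query and differ only in the table they pass in.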

Related

How to customize FastAPI request body documentation

I'm using FastAPI to serve ML models. My endpoint receives and sends JSON data of the form:
[
{"id": 1, "data": [{"code": "foo", "value": 0.1}, {"code": "bar", "value": 0.2}, ...]},
{"id": 2, "data": [{"code": "baz", "value": 0.3}, {"code": "foo", "value": 0.4}, ...]},
...
]
My models and app look as follows:
from typing import Dict, List
from fastapi import Body, FastAPI
from fastapi.responses import JSONResponse
from pydantic import BaseModel
import pandas as pd
class Item(BaseModel):
    code: str
    value: float
class Sample(BaseModel):
    id: int
    data: List[Item]
app = FastAPI()
@app.post("/score", response_model=List[Sample]) # correct response documentation
def score(input_data: List[Sample] = Body(...)): # 1. conversion dict -> Pydantic models, slow
    input_df: pd.DataFrame = models_to_df(input_data) # 2. conversion Pydantic models -> df
    output_df: pd.DataFrame = predict(input_df)
    output_data: Dict = df_to_dict(output_df) # direct conversion df -> dict, fast
    return JSONResponse(output_data)
Everything works fine and the automated documentation looks good, but the performance is bad. Since the data can be quite large, Pydantic conversion and validation can take a lot of time.
This can easily be solved by writing direct conversion functions between JSON data and data frames, skipping the intermediary representation of Pydantic models. This is what I did for the response, achieving a 10x speedup, at the same time preserving the automated API documentation with the response_model=List[Sample] argument.
I would like to achieve the same with the request: being able to use custom JSON input parsing, while at the same time preserving API documentation using Pydantic models. Sadly I can't find a way to do it in the FastAPI docs. How can I accomplish this?
You can always accept the raw request, load the request.body() data as bytes and do your own decoding. The schema of the request body should then be documented as a (partial) raw OpenAPI Operation structure using the openapi_extra argument to the @app.post() decorator:
from fastapi import Request
@app.post(
    "/score",
    response_model=List[Sample],
    openapi_extra={
        "requestBody": {
            "content": {
                "application/json": {
                    "schema": {
                        "type": "array",
                        "items": Sample.schema(ref_template="#/components/schemas/{model}"),
                    }
                }
            }
        }
    },
)
async def score(request: Request):
    raw_body = await request.body()
    # parse the `raw_body` request data (bytes) into your DF directly.
The openapi_extra structure is merged into the operation structure generated from other components (such as the response_model). I used your existing Sample model here to provide the schema for the array items, but you can also map out the whole schema manually.
Instead of using the raw bytes of the body, you could also delegate parsing as JSON to the request object:
data = await request.json()
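Either way, the decoded payload still has to be flattened before it can become a data frame. A minimal sketch of that step, assuming the payload has the shape shown in the question and that one row per (id, code, value) triple is wanted; body_to_rows is a hypothetical helper name, not part of FastAPI or pandas:

```python
import json
from typing import Any, Dict, List


def body_to_rows(raw_body: bytes) -> List[Dict[str, Any]]:
    """Flatten [{"id": ..., "data": [{"code": ..., "value": ...}]}] into one row per item."""
    samples = json.loads(raw_body)  # json.loads accepts bytes directly
    return [
        {"id": sample["id"], "code": item["code"], "value": item["value"]}
        for sample in samples
        for item in sample["data"]
    ]


# Example payload in the shape used in the question.
raw = b'[{"id": 1, "data": [{"code": "foo", "value": 0.1}, {"code": "bar", "value": 0.2}]}]'
rows = body_to_rows(raw)
```

A list of flat dictionaries like rows can be passed to pd.DataFrame(rows). Whether this long layout matches what the prediction code expects is an assumption.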
If there is a way to parse the data as a stream (pushing chunks to a parser), you could avoid the memory overhead of loading the whole body at once by treating the request as a stream in an async loop:
parser = ... # something that can be fed chunks of data
async for chunk in request.stream():
parser.feed(chunk)
This is documented in the Custom OpenAPI path operation schema section in the Advanced User Guide. The same section also covers Us[ing] the Request object directly, and the various options for handling the Request body can be found in the Starlette Request class documentation.

Creating rest api with get method using AWS amplify, with a python lambda function

Is there a way to set up a rest api using a python lambda function with a get method that uses query parameters by using the amplify CLI or editing the code?
I know this can be done through the AWS Management Console, but was hoping for a more code-oriented solution. Below is the sample lambda I'm trying to use and a simple example of how I would like to get different api responses (length of dummy text string) based on the get method called by the api using something like "curl https://......../myapi?length=4"
import json
def handler(event, context):
    print('received event:')
    # Query string values arrive as strings, so convert before slicing.
    str_len = int(event['queryStringParameters']['length'])
    body = {
        # Take the first str_len characters of the dummy text.
        "message" : "DUMMY TEST"[:str_len]
    }
    response = {
        "statusCode" : 200,
        "body" : json.dumps(body),
        "headers" : {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*"
        }
    }
    return response
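Independently of how the API is deployed, a handler like this can be exercised locally by calling it with a dictionary shaped like the proxy event. The sketch below is a simplified variant under stated assumptions: fake_event is a hand-written stand-in that contains only the one key the handler reads, and returning 400 for a missing or non-numeric length is an added design choice, not something the question asks for.

```python
import json


def handler(event, context):
    # queryStringParameters can be None when no query string is sent.
    params = event.get("queryStringParameters") or {}
    try:
        length = int(params.get("length", ""))
    except ValueError:
        body = {"message": "length must be an integer"}
        return {"statusCode": 400, "body": json.dumps(body)}
    body = {"message": "DUMMY TEST"[:length]}
    return {"statusCode": 200, "body": json.dumps(body)}


# Hand-written stand-in for the event produced by a request ending in ?length=4
fake_event = {"queryStringParameters": {"length": "4"}}
response = handler(fake_event, None)
```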

How to validate JSON request body before sending PUT request in python

When I send a PUT request with a JSON request body to my API endpoint from Python, the endpoint sometimes receives an empty request body, because the data contains special characters that JSON does not support.
How can I sanitize my JSON before sending the request?
I've tried serializing and re-parsing the JSON before sending the request:
profile = json.loads(json.dumps(profile))
My example invalid json is:
{
"url": "https://www.example.com/edmund-chand/",
"name": "Edmund Chand",
"current_location": "FrankfurtAmMainArea, Germany",
"education": [],
"skills": []
}
and my expected validated JSON should be:
{
"url": "https://www.example.com/edmund-chand/",
"name": "Edmund Chand",
"current_location": "Frankfurt Am Main Area, Germany",
"education": [],
"skills": []
}
If you're looking for something quick to sanitize JSON data for a limited set of fields, e.g. current_location, you can try something like the following:
def sanitize(profile):
    profile['current_location'] = ', '.join([val.strip() for val in profile['current_location'].split(',')])
    return profile
profile = sanitize(profile)
The idea here is that you would write code to sanitize each field in that function and then send the result to your API, or throw an exception if it is invalid.
For more robust validation, you can consider using jsonschema package. More details here.
With that package you can validate strings and json schema more flexibly.
Example taken from the package readme:
from jsonschema import validate
# A sample schema, like what we'd get from json.load()
schema = {
    "type" : "object",
    "properties" : {
        "url" : {"type" : "string", "format":"uri"},
        "current_location" : {"type" : "string", "maxLength":25, "pattern": "your_regex_pattern"},
    },
}
# If no exception is raised by validate(), the instance is valid.
validate(instance=profile, schema=schema)
You can find more information and the types of validation available for strings here.
Thank you @Rithin for your solution, but it seems coupled to a single field of the whole JSON.
I found a solution, shown in the example code below, which works for any field:
profile = json.loads(json.dumps(profile).replace("\t", " "))
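One detail to watch with a replace on the serialized text: json.dumps writes a tab inside a string as the two characters backslash and t, so the serialized form contains no literal tab character. An alternative that avoids the round trip is to walk the data and clean each string value directly. A minimal sketch, assuming tabs are the offending characters (as the replace above suggests); replace_tabs is a hypothetical helper and the sample profile is made up for illustration:

```python
import json
from typing import Any


def replace_tabs(value: Any) -> Any:
    """Return a copy of value with tabs in every string replaced by a space."""
    if isinstance(value, str):
        return value.replace("\t", " ")
    if isinstance(value, dict):
        return {key: replace_tabs(item) for key, item in value.items()}
    if isinstance(value, list):
        return [replace_tabs(item) for item in value]
    return value  # numbers, booleans and None pass through unchanged


# Made-up sample data containing tab characters.
profile = {"current_location": "Frankfurt\tAm\tMain\tArea, Germany", "skills": ["a\tb"]}
clean = replace_tabs(profile)
```

The same function can be extended to other control characters by swapping the replace call for a regular expression substitution.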

Nested Dict As HttpRequest Django

I am trying to write some test cases for code I've developed using Elasticsearch and Django. The concept is straightforward: I just want to test a GET request, which will be an Elasticsearch query. However, I am constructing the query as a nested dict. When I pass the nested dict to the Client object in the test script, it gets passed through Django until it ends up at the urlencode function, which doesn't look like it can handle nested dicts, only MultiValueDicts. Any suggestions or solutions? I don't want to use any additional packages, as I don't want to depend on potentially unsupported packages for this application.
Generic Code:
import elasticsearch
from django.test import Client, TestCase
class MyViewTest(TestCase):
    es_connection = elasticsearch.Elasticsearch("localhost:9200")
    def test_es_query(self):
        client = Client()
        query = {
            "query": {
                "term": {
                    "city": "some city"
                }
            }
        }
        response = client.get("", query)
        print(response)
Link for urlencode function: urlencode Django
The issue is clearly at the conditional statement when the urlencode function checks if the dictionary value is a str or bytes object. If it isn't it creates a generator object which can never access the nested portions of the dictionary.
EDIT: 07/25/2018
So I was able to come up with a temporary workaround to at least run the test. However, it is ugly and I feel like there must be a better way. The first thing I tried was specifying the content_type and converting the dict to a JSON string first. However, Django still kicked back an error in the urlencode function.
class MyViewTest(TestCase):
    es_connection = elasticsearch.Elasticsearch("localhost:9200")
    def test_es_query(self):
        client = Client()
        query = {
            "query": {
                "term": {
                    "city": "some city"
                }
            }
        }
        response = client.get("", data=json.dumps(query), content_type="application/json")
        print(response)
So instead I had to do:
class MyViewTest(TestCase):
    es_connection = elasticsearch.Elasticsearch("localhost:9200")
    def test_es_query(self):
        client = Client()
        query = {
            "query": {
                "term": {
                    "city": "some city"
                }
            }
        }
        query = json.dumps(query)
        response = client.get("", data={"q": query}, content_type="application/json")
        print(response)
This let me send the HttpRequest to my View and parse it back out using:
json.loads(request.GET["q"])
Then I was able to successfully get the requested data from Elasticsearch and return it as an HttpResponse. I feel like in Django though there has to be a way to just pass a json formatted string directly to the Client object's get function. I thought specifying the content_type as application/json would work but it still calls the urlencode function. Any ideas? I really don't want to implement this current system into production.
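The underlying constraint is that a query string is a flat list of key/value pairs, so nested structure has to be serialized into a single value before it is encoded. The sketch below illustrates this with Python's standard urllib.parse rather than Django's own urlencode, so the exact way the nested value is mangled differs from what Django does; the round trip through a single "q" parameter is the same idea as the workaround above.

```python
import json
from urllib.parse import parse_qs, urlencode

query = {"query": {"term": {"city": "some city"}}}

# Encoding the nested dict directly reduces the inner dict to its string
# representation; the structure is not encoded in a recoverable way.
flat = urlencode(query)

# Serializing to JSON first and sending it as one parameter keeps the structure.
encoded = urlencode({"q": json.dumps(query)})
decoded = json.loads(parse_qs(encoded)["q"][0])
```

After the round trip, decoded is equal to the original query dict, which is what the view recovers with json.loads(request.GET["q"]).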

Need Example of passing Jasper Reports Parameters for REST v2 API using JSON

When I look at the documentation for passing parameters to the Jasper Report REST 2 API here: http://community.jaspersoft.com/documentation/jasperreports-server-web-services-guide/v550/running-report-asynchronously I see that I need to have a "parameters" dict. The example in the link shows the XML, which is not all that useful since it's unclear exactly what the equivalent JSON should look like. The closest I could find is in this link: http://community.jaspersoft.com/documentation/jasperreports-server-web-services-guide/v56/modifying-report-parameters. Now, I am sending the equivalent of that to the server (and every other permutation I can think of), and I continue to get a "400 Client Error: Bad Request" back. I could really use an exact example of the Python code to generate the required "parameters" parameter for, say, my_parameter_1="test_value_1".
Here is my current POST data (with a few params missing for brevity). I know this is correct since the report works fine if I omit the "parameters" parameter:
{
'outputFormat': 'pdf',
'parameters': [{'name': 'ReportID', 'value': ['my_value_1']}],
'async': 'true',
'pages': '',
'interactive': 'false'
}
Nice job there, Staggart. I got it now. Because I wasn't reading with maximum scrutiny, I wasted some additional time. So the interested coder is advised to be aware not only of the nested, syntactically interesting reportParameter property, but especially that the value property inside it is an array. I suppose one could pass some form of lists/arrays/collections here?
What confused me was whether I should construct more than one "reportParameter" property, but that would be nonsense according to "Does JSON syntax allow duplicate keys in an object".
So, just for the record, this is how to post multiple parameters:
{
"reportUnitUri": "/reports/Top10/Top10Customers",
"async": true,
"freshData": true,
"saveDataSnapshot": false,
"outputFormat": "pdf",
"interactive": false,
"ignorePagination": true,
"parameters": {
"reportParameter": [
{
"name": "DATE_START_STRING",
"value": ["14.07.2014"]
},
{
"name": "DATE_END_STRING",
"value": ["14.10.2014"]
}
]
}
}
If someone happens to be struggling with communicating with Jasper via REST and PHP: do yourself a favour and use Requests for PHP instead of pure cURL. It even has a fallback that internally uses sockets instead of cURL when the latter isn't available.
Upvote for you, Staggart.
OK, thanks to rafkacz1 @ http://community.jaspersoft.com/questions/825719/json-equivalent-xml-post-reportexecutions-rest-service who posted an answer, I figured it out. As he reports there, the required format is:
"parameters":{
"reportParameter":[
{"name":"my_parameter_1","value":["my_value_1"]}
]
}
Pay particular attention to the plurality of "reportParameter".
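In Python, that structure can be generated from an ordinary dictionary. A minimal sketch, assuming each parameter maps to either a single value or a list of values; build_parameters is a hypothetical helper name, the reportUnitUri is the path from the multi-parameter example earlier in this thread, and the code that sends the request to the server is left out:

```python
import json
from typing import Any, Dict


def build_parameters(values: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap name/value pairs in the nested "reportParameter" structure."""
    report_parameters = []
    for name, value in values.items():
        # Each "value" must be a list, even when there is only one value.
        items = list(value) if isinstance(value, (list, tuple)) else [value]
        report_parameters.append({"name": name, "value": items})
    return {"reportParameter": report_parameters}


payload = {
    "reportUnitUri": "/reports/Top10/Top10Customers",
    "outputFormat": "pdf",
    "async": True,
    "parameters": build_parameters({"my_parameter_1": "test_value_1"}),
}
body = json.dumps(payload)  # JSON text to send as the request body
```

Using Python booleans and letting json.dumps serialize them produces true and false in the output, matching the multi-parameter example, rather than the quoted strings 'true' and 'false'.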
Here is an example that worked for me. I'm using Python 2.7 and the community edition of Jaspersoft. Like the C# example, this one also uses REST v2, which made it very simple for me to download a PDF report quickly.
import requests
sess = requests.Session()
auth = ('username', 'password')
res = sess.get(url='http://your.jasper.domain:8080/jasperserver/', auth=auth)
res.raise_for_status()
url = 'http://your.jasper.domain:8080/jasperserver/rest_v2/reports/report_folder/sub_folder/report_name.pdf'
params = {'Month':'2', 'Year':'2017','Project': 'ProjectName'}
res = sess.get(url=url, params=params, stream=True)
res.raise_for_status()
path = '/path/to/Downloads/report_name.pdf'
with open(path, "wb") as f:
    f.write(res.content)
Here's a full example of generating a report using REST v2; in my case it's running on C#:
try {
    var server = "http://localhost:8080/jasperserver";
    var login = server + "/rest/login";
    var report = "/rest_v2/reports/organization/Reports/report_name.pdf";
    var client = new WebClient();
    //Set the content type of the request
    client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
    //Set the username and password
    NameValueCollection parametros = new NameValueCollection();
    parametros.Add("j_username", "jasperadmin");
    parametros.Add("j_password", "123456");
    //Request to login
    client.UploadValues(login, "POST", parametros);
    //Get session cookie
    string session = client.ResponseHeaders.Get("Set-Cookie");
    //Set session cookie to the next request
    client.Headers.Add("Cookie", session);
    //Generate report with parameters: "start" and "end"
    var reporte = client.DownloadData(server + report + "?start=2015-10-01&end=2015-10-10");
    //Returns the report as response
    return File(reporte, "application/pdf", "test.pdf");
} catch (WebException e) {
    //return Content("There was a problem, status code: " + ((HttpWebResponse)e.Response).StatusCode);
    return null;
}
