I'm trying to figure out how to receive a file sent by a browser through an API call in Python.
The web client is allowed to send any type of file (let's say .txt, .docx, .xlsx, ...). I don't know whether I should handle the payload as binary or not.
The idea is to then save the file to S3. I know it's possible to use JS libraries like AWS Amplify to generate a temporary URL, but I'm not too interested in that solution.
Any help appreciated; I've searched extensively for a Python solution but can't find anything that actually works!
My API is private and I'm using the Serverless Framework to deploy it.
files_post:
  handler: post/post.post
  events:
    - http:
        path: files
        method: post
        cors: true
        authorizer:
          name: authorizer
          arn: ${cf:lCognito.CognitoUserPoolMyUserPool}
EDIT
I have a half solution that works for text files but not for PDF, XLSX, or images; if someone has a fix for those I'd be super happy.
from cgi import parse_header, parse_multipart
from io import BytesIO
import json
import boto3

def post(event, context):
    print(event['queryStringParameters']['filename'])
    c_type, c_data = parse_header(event['headers']['content-type'])
    # parse_multipart expects the boundary as bytes
    c_data['boundary'] = c_data['boundary'].encode("utf-8")
    body_file = BytesIO(event['body'].encode("utf-8"))
    form_data = parse_multipart(body_file, c_data)
    s3 = boto3.resource('s3')
    obj = s3.Object('storage', event['queryStringParameters']['filename'])
    obj.put(Body=form_data['upload'][0])
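A note on the half solution above: binary payloads most likely get corrupted by the utf-8 round trip. A minimal sketch of the decode step, assuming API Gateway is configured with the relevant binary media types (e.g. multipart/form-data) so that it sets isBase64Encoded on the event:

import base64

def get_raw_body(event):
    # When binary media types are enabled, API Gateway delivers the payload
    # base64-encoded and sets isBase64Encoded=True; decode it here so PDFs,
    # XLSX files, and images are not corrupted by a utf-8 round trip.
    if event.get('isBase64Encoded'):
        return base64.b64decode(event['body'])
    return event['body'].encode('utf-8')

The decoded bytes can then be wrapped in BytesIO and fed to parse_multipart in place of the utf-8 encoding used above.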
You are using API Gateway so your lambda event will map to something like this (from Amazon Docs):
{
    "resource": "Resource path",
    "path": "Path parameter",
    "httpMethod": "Incoming request's method name",
    "headers": {String containing incoming request headers},
    "multiValueHeaders": {List of strings containing incoming request headers},
    "queryStringParameters": {query string parameters},
    "multiValueQueryStringParameters": {List of query string parameters},
    "pathParameters": {path parameters},
    "stageVariables": {Applicable stage variables},
    "requestContext": {Request context, including authorizer-returned key-value pairs},
    "body": "A JSON string of the request payload.",
    "isBase64Encoded": "A boolean flag to indicate if the applicable request payload is Base64-encoded"
}
You can pass the file as a base64 value in the body and decode it in your Lambda function. Take the following Python snippet:
import json
import base64

def lambda_handler(event, context):
    data = json.loads(event['body'])
    # Let's say we use a regular <input type='file' name='uploaded_file'/>
    encoded_file = data['uploaded_file']
    decoded_file = base64.b64decode(encoded_file)
    # now save it to S3
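To finish the "now save it to S3" step, a minimal boto3 sketch (the bucket name is an illustrative placeholder, not taken from the question):

import boto3

s3 = boto3.client('s3')

def save_to_s3(decoded_file, filename):
    # 'my-upload-bucket' is a placeholder bucket name
    s3.put_object(Bucket='my-upload-bucket', Key=filename, Body=decoded_file)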
Related
I want to upload an image to S3 with Lambda and API Gateway when I submit a form. How can I do it in Python?
Currently I am getting this error while trying to upload an image through Postman:
Could not parse request body into json: Could not parse payload into json: Unexpected character (\'-\' (code 45))
My code currently is:
import json
import boto3
import base64

s3 = boto3.client('s3')

def lambda_handler(event, context):
    print(event)
    try:
        if event['httpMethod'] == 'POST':
            print(event['body'])
            data = json.loads(event['body'])
            name = data['name']
            image = data['file']
            # strip a possible data URL prefix (e.g. "data:image/png;base64,")
            image = image[image.find(",") + 1:]
            dec = base64.b64decode(image + "===")
            s3.put_object(Bucket='', Key="", Body=dec)
            return {
                'statusCode': 200,
                'body': json.dumps({'message': 'successful lambda function call'}),
                'headers': {'Access-Control-Allow-Origin': '*'}
            }
    except Exception as e:
        # an Exception instance is not JSON serializable; stringify it
        return {
            'statusCode': 500,
            'body': json.dumps(str(e))
        }
Doing an upload through API Gateway and Lambda has its limitations:
You cannot handle large files, and there is an integration timeout of about 30 seconds as I recall.
I would go with creating a presigned URL that the client requests through API Gateway, then use it as the endpoint to PUT the file.
Something like this will go in your Lambda (this is a Node.js example):
const uploadUrl = S3.getSignedUrl('putObject', {
  Bucket: get(aPicture, 'Bucket'),
  Key: get(aPicture, 'Key'),
  Expires: 600,
})
callback(null, { url: uploadUrl })
(NodeJs)
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getSignedUrl-property
(Python)
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.generate_presigned_url
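Since the question is in Python, a rough boto3 equivalent of the Node.js snippet might look like this (bucket name, key, and expiry are placeholders):

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Generate a time-limited URL the browser can PUT the file to directly,
    # bypassing the API Gateway / Lambda payload limits.
    upload_url = s3.generate_presigned_url(
        'put_object',
        Params={'Bucket': 'my-upload-bucket', 'Key': 'uploads/example.bin'},
        ExpiresIn=600,
    )
    return {'statusCode': 200, 'body': upload_url}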
You don't need Lambda for this. You can proxy S3 API with API Gateway
https://docs.aws.amazon.com/apigateway/latest/developerguide/integrating-api-with-aws-services-s3.html
Oops, that's over-engineering.
Anyway, it seems like you are getting an error from API Gateway. First test the Lambda on its own ("Test" in the AWS console); if it works fine and returns a response, then check the API Gateway side.
I suspect you are using mapping templates in the gateway. AWS uses Velocity templates, which look like JSON but are different; a mapping template on the integration request could be causing this issue.
So the issue is that I initially send this as raw data via Postman:
Raw data as JSON sent via Postman:
{
"id":1,
"receiver":"2222222222222",
"message":{
"Name":"testing",
"PersonId":2,
"CarId":2,
"GUID":"1s3q1d-s546dq1-8e22e",
"LineId":2,
"SvcId":2,
"Lat":-64.546547,
"Lon":-64.546547,
"TimeStamp":"2021-03-18T08:29:36.758Z",
"Recorder":"dq65ds4qdezzer",
"Env":"DEV"
},
"operator":20404,
"sender":"MSISDN",
"binary":1,
"sent":"2021-03-18T08:29:36.758Z"
}
Once this is caught by Event Hub Capture, it is converted to an Avro file.
I am trying to retrieve the data by using fastavro and converting it to a JSON format.
The problem is that I am not getting back the same raw data that was initially sent by Postman. I can't find a way to convert it back to its original state. Why does the Avro file also contain additional information from Postman?
I probably need to find a way to convert only the "Body" field, but for some reason it also nests a "bytes" object inside the body.
I am just trying to get my original raw data back that was sent via Postman.
__init__.py (Azure Function):
import logging
import os
import string
import json
import uuid
import avro.schema
import tempfile
import azure.functions as func
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
from fastavro import reader, json_writer

# Because the Apache Python avro package is written in pure Python, it is relatively slow, therefore I make use of fastavro.
def avroToJson(avroFile):
    with open("json_file.json", "w") as json_file:
        with open(avroFile, "rb") as avro_file:
            avro_reader = reader(avro_file)
            json_writer(json_file, avro_reader.writer_schema, avro_reader)

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    print('Processor started using path ' + os.getcwd())
    connect_str = "###########"
    container = ContainerClient.from_connection_string(connect_str, container_name="####")
    blob_list = container.list_blobs()  # List the blobs in the container.
    for blob in blob_list:
        # Content_length == 508 is an empty file, so process only content_length > 508 (skip empty files).
        if blob.size > 508:
            print('Downloaded a non empty blob: ' + blob.name)
            # Create a blob client for the blob.
            blob_client = ContainerClient.get_blob_client(container, blob=blob.name)
            # Construct a file name based on the blob name.
            cleanName = str.replace(blob.name, '/', '_')
            cleanName = os.getcwd() + '\\' + cleanName
            # Download file
            with open(cleanName, "wb+") as my_file:  # Open the file to write. Create it if it doesn't exist.
                my_file.write(blob_client.download_blob().readall())  # Write blob contents into the file.
            avroToJson(cleanName)
            with open('json_file.json', 'r') as file:
                jsonStr = file.read()
                return func.HttpResponse(jsonStr, status_code=200)
Expected result:
{
"id":1,
"receiver":"2222222222222",
"message":{
"Name":"testing",
"PersonId":2,
"CarId":2,
"GUID":"1s3q1d-s546dq1-8e22e",
"LineId":2,
"SvcId":2,
"Lat":-64.546547,
"Lon":-64.546547,
"TimeStamp":"2021-03-18T08:29:36.758Z",
"Recorder":"dq65ds4qdezzer",
"Env":"DEV"
},
"operator":20404,
"sender":"MSISDN",
"binary":1,
"sent":"2021-03-18T08:29:36.758Z"
}
Actual result:
{
"SequenceNumber":19,
"Offset":"10928",
"EnqueuedTimeUtc":"4/1/2021 8:43:19 AM",
"SystemProperties":{
"x-opt-enqueued-time":{
"long":1617266599145
}
},
"Properties":{
"Postman-Token":{
"string":"37ff4cc6-9124-45e5-ba9d-######e"
}
},
"Body":{
"bytes":"{\r\n \"id\": 1,\r\n \"receiver\": \"2222222222222\",\r\n \"message\": {\r\n \"Name\": \"testing\",\r\n \"PersonId\": 2,\r\n \"CarId\": 2,\r\n \"GUID\": \"1s3q1d-s546dq1-8e22e\",\r\n \"LineId\": 2,\r\n \"SvcId\": 2,\r\n \"Lat\": -64.546547,\r\n \"Lon\": -64.546547,\r\n \"TimeStamp\": \"2021-03-18T08:29:36.758Z\",\r\n \"Recorder\": \"dq65ds4qdezzer\",\r\n \"Env\": \"DEV\"\r\n },\r\n \"operator\": 20404,\r\n \"sender\": \"MSISDN\",\r\n \"binary\": 1,\r\n \"sent\": \"2021-03-29T08:29:36.758Z\"\r\n}"
}
}
This question was originally posted under the "Alternative to Azure Event Hub Capture for sending Event Hub messages to Blob Storage?" thread, because another question emerged from the initial issue.
In case this is not the way to proceed on StackOverflow, please feel free to comment on how I should handle this next time. Kind regards.
Try returning just the Body bytes:
return func.HttpResponse(json.loads(jsonStr)['Body']['bytes'], status_code=200)
The JSON document in the HTTP request body goes into the Event Hub message body, and that message is then written to the capture destination with some extra properties as overhead, such as sequence number, offset, enqueue time, system properties, etc.
At deserialization time, the reader needs to use the Body object alone, which should be the same body as in the HTTP request.
Feel free to check the Event Hubs Avro schema on this page: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-capture-overview#use-avro-tools
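A minimal sketch of that idea with fastavro, assuming each captured record's Body holds the original UTF-8 JSON payload (the file name is a placeholder):

import json
from fastavro import reader

def extract_bodies(avro_path):
    # Read the capture file and keep only the original request payloads,
    # dropping the Event Hub envelope (SequenceNumber, Offset, Properties, ...).
    bodies = []
    with open(avro_path, "rb") as avro_file:
        for record in reader(avro_file):
            bodies.append(json.loads(record["Body"]))
    return bodies

# e.g. original = extract_bodies("capture.avro")[0]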
So I have this sort of HTTP Cloud Function (Python 3.7):
from google.cloud import storage

def upload_blob(bucket_name, blob_text, destination_blob_name):
    """Uploads a file to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string(blob_text)
    print('File uploaded to {}.'.format(destination_blob_name))
Now I want to test this function from the Testing tab of the function details page, where it asks for a triggering event in JSON format.
I tried various formats like
{
"name":[param1,param2,param3]
}
but always ended up getting the error:
upload_blob() missing 2 required positional arguments: 'blob_text' and 'destination_blob_name'
I also tried to find documentation but was unable to; can you point me in the right direction?
Your upload_blob function doesn't look like an HTTP Cloud Function. HTTP Cloud Functions take a single parameter, request. For example:
def hello_http(request):
...
See https://cloud.google.com/functions/docs/writing/http#writing_http_helloworld-python for more details.
As suggested by Dustin, you should add an entry point which parses the JSON request.
A code snippet for the following request would be as follows:
{"bucket_name": "xyz", "blob_text": "abcdfg", "destination_blob_name": "sample.txt"}
The entry function will be as follows:
def entry_point_function(request):
    content_type = request.headers['content-type']
    if 'application/json' in content_type:
        request_json = request.get_json(silent=True)
        if request_json and 'bucket_name' in request_json:
            text = request_json['bucket_name']
        else:
            raise ValueError("JSON is invalid, or missing a 'bucket_name' property")
Then rename the entry point in the function configuration (in the Cloud Console UI you need to change the "Entry point" field).
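Putting it together, a hedged sketch of an entry point that pulls all three fields out of the JSON body and calls the original upload_blob (the field names match the sample request above; error handling is minimal):

from google.cloud import storage

def upload_blob(bucket_name, blob_text, destination_blob_name):
    """Uploads a string to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    bucket.blob(destination_blob_name).upload_from_string(blob_text)

def entry_point_function(request):
    # HTTP Cloud Functions receive a single Flask request object.
    request_json = request.get_json(silent=True)
    if not request_json:
        return ('Expected a JSON body', 400)
    try:
        upload_blob(request_json['bucket_name'],
                    request_json['blob_text'],
                    request_json['destination_blob_name'])
    except KeyError as missing:
        return ('Missing field: {}'.format(missing), 400)
    return ('File {} uploaded to {}.'.format(
        request_json['destination_blob_name'], request_json['bucket_name']), 200)

With this entry point, the Testing tab payload {"bucket_name": "xyz", "blob_text": "abcdfg", "destination_blob_name": "sample.txt"} should work as-is.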
Context: I have some log files in an S3 bucket that I need to retrieve. The permissions on the bucket prevent me from downloading them directly from the S3 bucket console. I need a "backdoor" approach to retrieve the files. I have an API Gateway setup to hit a Lambda, which will figure out what files to retrieve and get them from the S3 bucket. However, the files are over 10 MB and the AWS API Gateway has a maximum payload size of 10 MB. Now, I need a way to compress the files and serve them to the client as a downloadable zip.
import json
import boto3
import zipfile
import zlib
import os

S3 = boto3.resource('s3')
BUCKET = S3.Bucket(name="my-bucket")
TEN_MEGA_BYTES = 10 * 1000 * 1000

def lambda_handler(event, context):
    # utilize Lambda's temporary storage (512 MB)
    retrieved = zipfile.ZipFile("/tmp/retrieved.zip", mode="w",
                                compression=zipfile.ZIP_DEFLATED, compresslevel=9)
    for bucket_obj in BUCKET.objects.all():
        # logic to decide which file I want is done here
        log_file_obj = BUCKET.Object(bucket_obj.key).get()
        # get the object's binary encoded content (a bytes object)
        content = log_file_obj["Body"].read()
        # write the content to a file within the zip
        # writestr() requires a bytes or str object
        retrieved.writestr(bucket_obj.key, content)
    # close the zip
    retrieved.close()
    # visually checking zip size for debugging
    zip_size = os.path.getsize("/tmp/retrieved.zip")
    print("{} bytes".format(zip_size), "{} percent of 10 MB".format(zip_size / TEN_MEGA_BYTES * 100))
    return {
        "header": {
            "contentType": "application/zip, application/octet-stream",
            "contentDisposition": "attachment, filename=retrieved.zip",
            "contentEncoding": "deflate"
        },
        "body": retrieved
    }
    # return retrieved
I have tried returning the zipfile object directly and within a JSON structure with headers that are supposed to be mapped from the integration response to the method response (i.e. the headers I'm setting programmatically in the Lambda should be mapped to the response headers that the client actually receives). In either case, I get a marshal error.
Response:
{
"errorMessage": "Unable to marshal response: <zipfile.ZipFile [closed]> is not JSON serializable",
"errorType": "Runtime.MarshalError"
}
I have done a lot of tinkering in the API Gateway in the AWS Console trying to set different combinations of headers and/or content types, but I am stuck. I'm not entirely sure what would be the correct header/content-type combination.
From the above error message, it appears that Lambda can only return JSON-serializable structures, but I am unable to confirm either way.
I got to the point of being able to return a compressed JSON payload, not a downloadable .zip file, but that is good enough for the time being. My team and I are looking into requesting a time-limited pre-signed URL for the S3 bucket that will allow us to bypass the permissions for a limited time to access the files. Even with the compression, we expect to someday reach the point where even the compressed payload is too large for the API Gateway to handle. We also discovered that Lambda has an even smaller payload limit than API Gateway: 6 MB for synchronous invocations (see the Lambda limits documentation).
Solution:
This article was a huge help, Lambda Compression. The main thing that was missing was using "Lambda Proxy Integration." When configuring the API Gateway integration request, choose "Use Lambda Proxy Integration." That limits your ability to use mapping templates on the method request and means you have to return a specific structure from your Lambda function. Also, ensure content encoding is enabled in the API Gateway settings and that 'application/json' is specified as an acceptable binary media type.
Then, when sending the request, use these headers:
Accept-Encoding: application/gzip,
Accept: application/json
import json
import boto3
import gzip
import base64
from io import BytesIO

S3 = boto3.resource('s3')
BUCKET = S3.Bucket(name="my-bucket")

def gzip_b64encode(data):
    compressed = BytesIO()
    with gzip.GzipFile(fileobj=compressed, mode='w') as f:
        json_response = json.dumps(data)
        f.write(json_response.encode('utf-8'))
    return base64.b64encode(compressed.getvalue()).decode('ascii')

def lambda_handler(event, context):
    # event body is JSON string, this is request body
    req_body = json.loads(event["body"])
    retrieved_data = {}
    for bucket_obj in BUCKET.objects.all():
        # logic to decide what file to grab based on request body done here
        log_file_obj = BUCKET.Object(bucket_obj.key).get()
        content = log_file_obj["Body"].read().decode("UTF-8")
        retrieved_data[bucket_obj.key] = json.loads(content)
    # integration response format
    return {
        "isBase64Encoded": True,
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
            "Access-Control-Allow-Origin": "*"
        },
        "body": gzip_b64encode(retrieved_data)
    }
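For completeness, a rough sketch of the client side with requests, assuming API Gateway is configured as described above (the URL and request body are placeholders); requests transparently decompresses the gzip response, so the caller just sees JSON:

import requests

resp = requests.post(
    "https://example.execute-api.us-east-1.amazonaws.com/prod/logs",  # placeholder URL
    json={"filter": "my-log-prefix"},                                 # illustrative request body
    headers={"Accept": "application/json", "Accept-Encoding": "gzip"},
)
resp.raise_for_status()
retrieved = resp.json()  # gzip decoding is handled by requests automatically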
I am kind of a newbie to REST and testing. I need to write automation scripts to test our REST services. We are planning to run these scripts regularly from a Jenkins CI job. I prefer writing them in Python, as we already have UI functional-testing scripts in Python generated by Selenium IDE, but I am open to any good solution. I checked httplib, simplejson and xUnit, but I'm looking for better solutions out there.
I would also prefer to write a template and generate the actual script for each REST API by reading the API info from XML or something similar. Thanks in advance for any advice.
I usually use Cucumber to test my RESTful APIs. The following example is in Ruby, but it could easily be translated to Python using either the RubyPy gem or Lettuce.
Start with a set of RESTful base steps:
When /^I send a GET request for "([^\"]*)"$/ do |path|
  get path
end

When /^I send a POST request to "([^\"]*)" with the following:$/ do |path, body|
  post path, body
end

When /^I send a PUT request to "([^\"]*)" with the following:$/ do |path, body|
  put path, body
end

When /^I send a DELETE request to "([^\"]*)"$/ do |path|
  delete path
end

Then /^the response should be "([^\"]*)"$/ do |status|
  last_response.status.should == status.to_i
end

Then /^the response JSON should be:$/ do |body|
  JSON.parse(last_response.body).should == JSON.parse(body)
end
And now we can write features that test the API by actually issuing the requests.
Feature: The users endpoints

  Scenario: Creating a user
    When I send a POST request to "/users" with the following:
      """
      { "name": "Swift", "status": "awesome" }
      """
    Then the response should be "200"

  Scenario: Listing users
    Given I send a POST request to "/users" with the following:
      """
      { "name": "Swift", "status": "awesome" }
      """
    When I send a GET request for "/users"
    Then the response should be "200"
    And the response JSON should be:
      """
      [{ "name": "Swift", "status": "awesome" }]
      """

... etc ...
These are easy to run on a CI system of your choice. See these links for references:
http://www.anthonyeden.com/2010/11/testing-rest-apis-with-cucumber-and-rack-test/
http://jeffkreeftmeijer.com/2011/the-pain-of-json-api-testing/
http://www.cheezyworld.com/2011/08/09/running-your-cukes-in-jenkins/
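If you'd rather stay in Python, a rough translation of the base steps using behave and requests might look like this (behave is used here instead of Lettuce; BASE_URL is a placeholder, and the step text must match your feature file exactly, including any trailing colon):

# features/steps/rest_steps.py
import json
import requests
from behave import when, then

BASE_URL = "http://localhost:8000"  # placeholder; point this at the service under test

@when('I send a GET request for "{path}"')
def step_get(context, path):
    context.response = requests.get(BASE_URL + path)

@when('I send a POST request to "{path}" with the following:')
def step_post(context, path):
    # context.text holds the docstring body from the feature file
    context.response = requests.post(BASE_URL + path, json=json.loads(context.text))

@then('the response should be "{status}"')
def step_status(context, status):
    assert context.response.status_code == int(status)

@then('the response JSON should be:')
def step_json(context):
    assert context.response.json() == json.loads(context.text)

Running "behave --junit" from the project root produces JUnit-style XML that a Jenkins job can collect.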
import openpyxl
import requests
import json
from requests.auth import HTTPBasicAuth

urlHead = 'https://IP_ADDRESS_HOST:PORT_NUMBER/'
rowStartAt = 2
apiColumn = 2
#payloadColumn = 3
responseBodyColumn = 12
statusCodeColumn = 13
headerTypes = {'Content-Type': 'application/json',
               'Accept': 'application/json',
               'Authorization': '23324'
               }
# user and pwd (basic-auth credentials) are assumed to be defined elsewhere

wb = openpyxl.load_workbook('Excel_WORKBOOK.xlsx')
# PROCESS EACH SHEET
for sheetName in (wb.get_sheet_names()):
    print('Sheet Name = ' + sheetName)
    flagVar = input('Enter N to skip this APIs sheet: ')
    if (flagVar == 'N'):
        print('Sheet got skipped')
        continue
    # get a sheet
    sheetObj = wb.get_sheet_by_name(sheetName)
    # for each sheet iterate the APIs
    for i in range(2, sheetObj.max_row + 1):
        # below is API with method type
        apiFromSheet = (sheetObj.cell(row=i, column=apiColumn).value)
        if apiFromSheet is None:
            continue
        #print (i, apiFromSheet)
        # Let's split the api
        apiType = apiFromSheet.split()[0]
        method = apiFromSheet.split()[1]
        if (apiType != 'GET'):
            continue
        # let's process GET APIs
        absPath = urlHead + method
        print("REQUESTED TYPE AND PATH = ", apiType, absPath)
        print('\n')
        res = requests.get(absPath, auth=HTTPBasicAuth(user, pwd), verify=False, headers=headerTypes)
        # let's write the response body into the relevant cell
        sheetObj.cell(row=i, column=responseBodyColumn).value = (res.text)
        sheetObj.cell(row=i, column=statusCodeColumn).value = (res.status_code)
wb.save('Excel_WORKBOOK.xlsx')
#exit(0)