Save an image from a HTTP request in S3 using python lambda

Save an image from a HTTP request in S3 using python lambda - python

I'm trying to create a simple serverless function in AWS with a Gateway API that allows uploading an image to the server, it should then save the image in a S3 bucket.
This is my code:
def lambda_handler(event, context):
operation = event['httpMethod']
if operation != 'PUT':
return respond(ValueError('Unsupported method "{}"'.format(operation)))
try:
queryParameters = event['queryStringParameters']
username = queryParameters['username']
picturename = username + "_image"
print("picture name: " + picturename)
fileContent = event['body']
print("body size: " + str(len(fileContent)))
object = s3.Object('zoneuserimagesbucket', picturename)
object.put(Body=fileContent)
return respond(None, "OK")
The image being save is twice as big as the original and not the same as the original.. What I'm missing?

Looks like you have a base64 encoded data in event.body.
There are two ways to overcome this situation.
Option 1 (easy)
Check event.isBase64Encoded - if it's set to true, base64.decode event.content to get a bytes object to write to S3.
Option 2 (needs config and time)
API Gateway could decode the data for you, if asked.
This has the benefit your lambda function will need less memory and time to run, which is great in the long term
Steps:
1 - Decide how you want API Gateway to encode binary data for you (base64, raw) etc. If you receive the data as you want, your lambda will need less memory and time to run.
2 - Decide the 'field' in the request where you want your data to come (instead of just ´event.body´).
3 - Verify you have in API Gateway a media encoding defined for the field you want. For images: image/jpeg, image/png, image/gif, etc.
3 - Check that you have the correct mediaHandling in the responseTemplate:
CONVERT_TO_BINARY: Transcode it binary data if it's in base64.
CONVERT_TO_TEXT: Transcode the data as base64.
Related documentation:
See the [announce in the AWS blog post (https://aws.amazon.com/es/blogs/compute/binary-support-for-api-integrations-with-amazon-api-gateway/)
and AWS API Gateway Payload Encodings.

Related

Export spreadsheet as text/csv using Drive v3 gives 500 Internal Error

I was trying to export a Google Spreadsheet in csv format using the Google client library for Python:
# OAuth and setups...
req = g['service'].files().export_media(fileId=fileid, mimeType=MIMEType)
fh = io.BytesIO()
downloader = http.MediaIoBaseDownload(fh, req)
# Other file IO handling...
This works for MIMEType: application/pdf, MS Excel, etc.
According to Google's documentation, text/csv is supported. But when I try to make a request, the server gives a 500 Internal Error.
Even using google's Drive API playground, it gives the same error.
Tried:
Like in v2, I added a field:
gid = 0
to the request to specify the worksheet, but then it's a bad request.

This is a known bug in Google's code. https://code.google.com/a/google.com/p/apps-api-issues/issues/detail?id=4289
However, if you manually build your own request, you can download the whole file in bytes (the media management stuff won't work).
With file as the file ID, http as the http object that you've authorized against you can download a file with:
from apiclient.http import HttpRequest
def postproc(*args):
return args[1]
data = HttpRequest(http=http,
postproc=postproc,
uri='https://docs.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=csv' % file,
headers={ }).execute()
data here is a bytes object that contains your CSV. You can open it something like:
import io
lines = io.TextIOWrapper(io.BytesIO(data), encoding='utf-8', errors='replace')
for line in lines:
#Do whatever

You just need to implement an Exponential Backoff.
Looking at this documentation of ExponentialBackOffPolicy.
The idea is that the servers are only temporarily unavailable, and they should not be overwhelmed when they are trying to get back up.
The default implementation requires back off for 500 and 503 status codes. Subclasses may override if different status codes are required.
Here is an snippet of an implementation of Exponential Backoff from the first link:
ExponentialBackOff backoff = ExponentialBackOff.builder()
.setInitialIntervalMillis(500)
.setMaxElapsedTimeMillis(900000)
.setMaxIntervalMillis(6000)
.setMultiplier(1.5)
.setRandomizationFactor(0.5)
.build();
request.setUnsuccessfulResponseHandler(new HttpBackOffUnsuccessfulResponseHandler(backoff));
You may want to look at this documentation for the summary of the ExponentialBackoff implementation.

Python Boto3 AWS Multipart Upload Syntax

I am successfully authenticating with AWS and using the 'put_object' method on the Bucket object to upload a file. Now I want to use the multipart API to accomplish this for large files. I found the accepted answer in this question:
How to save S3 object to a file using boto3
But when trying to implement I am getting "unknown method" errors. What am I doing wrong? My code is below. Thanks!
## Get an AWS Session
self.awsSession = Session(aws_access_key_id=accessKey,
aws_secret_access_key=secretKey,
aws_session_token=session_token,
region_name=region_type)
...
# Upload the file to S3
s3 = self.awsSession.resource('s3')
s3.Bucket('prodbucket').put_object(Key=fileToUpload, Body=data) # WORKS
#s3.Bucket('prodbucket').upload_file(dataFileName, 'prodbucket', fileToUpload) # DOESNT WORK
#s3.upload_file(dataFileName, 'prodbucket', fileToUpload) # DOESNT WORK

The upload_file method has not been ported over to the bucket resource yet. For now you'll need to use the client object directly to do this:
client = self.awsSession.client('s3')
client.upload_file(...)

Libcloud S3 wrapper transparently handles all the splitting and uploading of the parts for you.
Use upload_object_via_stream method to do so:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver
# Path to a very large file you want to upload
FILE_PATH = '/home/user/myfile.tar.gz'
cls = get_driver(Provider.S3)
driver = cls('api key', 'api secret key')
container = driver.get_container(container_name='my-backups-12345')
# This method blocks until all the parts have been uploaded.
extra = {'content_type': 'application/octet-stream'}
with open(FILE_PATH, 'rb') as iterator:
obj = driver.upload_object_via_stream(iterator=iterator,
container=container,
object_name='backup.tar.gz',
extra=extra)
For official documentation on S3 Multipart feature, refer to AWS Official Blog.

How do you get Google App Engine to gunzip during download?

I am trying to get Google App Engine to gunzip my .gz blob file (single file compressed) automatically by setting the response headers as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
def get(self, resource):
resource = str(urllib.unquote(resource))
blob_info = blobstore.BlobInfo.get(resource)
self.response.headers['Content-Encoding'] = str('gzip')
# self.response.headers['Content-type'] = str('application/x-gzip')
self.response.headers['Content-type'] = str(blob_info.content_type)
self.response.headers['Content-Length'] = str(blob_info.size)
cd = 'attachment; filename=%s' % (blob_info.filename)
self.response.headers['Content-Disposition'] = str(cd)
self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
self.response.headers['Pragma'] = str(' public')
self.send_blob(blob_info)
When this runs, the file is downloaded without the .gz extension. However, the downloaded file is still gzipped. The file size of the downloaded data match the .gz file size on the server. Also, I can confirm this by manually gunzipping the downloaded file. I am trying to avoid the manual gunzip step.
I am trying to get the blob file to automatically gunzip during the download. What am I doing wrong?
By the way, the gzip file contains only a single file. On my self-hosted (non Google) server, I could accomplish the automatic gunzip by setting same response headers; albeit, my code there is written in PHP.
UPDATE:
I rewrote the handler to serve data from the bucket. However, this generates HTML 500 error. The file is partially downloaded before the failure. The rewrite is as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
def get(self, resource):
resource = str(urllib.unquote(resource))
blob_info = blobstore.BlobInfo.get(resource)
file = '/gs/mydatabucket/%s' % blob_info.filename
print file
self.response.headers['Content-Encoding'] = str('gzip')
self.response.headers['Content-Type'] = str('application/x-gzip')
# self.response.headers['Content-Length'] = str(blob_info.size)
cd = 'filename=%s' % (file)
self.response.headers['Content-Disposition'] = str(cd)
self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
self.response.headers['Pragma'] = str(' public')
self.send_blob(file)
This downloads 540,672 bytes of the 6,094,848 bytes file to the client before the server terminate and issued a 500 error. When I issue 'file' on the partially downloaded file from the command line, Mac OS seems to correctly identify the file format as 'SQLite 3.x database' file. Any idea of why the 500 error on the server? How can I fix the problem?

You should first check to see if your requesting client supports gzipped content. If it does support gzip content encoding, then you may pass the gzipped blob as is with the proper content-encoding and content-type headers, otherwise you need to decompress the blob for the client. You should also verify that your blob's content_type isn't gzip (this depends on how you created your blob to begin with!)
You may also want to look at Google Cloud Storage as this automatically handles gzip transportation so long as you properly compress the data before storing it with the proper content-encoding and content-type metadata.
See this SO question: Google cloud storage console Content-Encoding to gzip
Or the GCS Docs: https://cloud.google.com/storage/docs/gsutil/addlhelp/WorkingWithObjectMetadata#content-encoding
You may use GCS as easily (if not more easily) as you use the blobstore in AppEngine and it seems to be the preferred storage layer to use going forward. I say this because the File API has been deprecated which made blobstore interaction easier and great efforts and advancements have been made to the GCS libraries making the API similar to the base python file interaction API
UPDATE:
Since the objects are stored in GCS, you can use 302 redirects to point users to files rather than relying on the Blobstore API. This eliminates any unknown behavior of the Blobstore API and GAE delivering your stored objects with the content-type and content-encoding you intended to use. For objects with a public-read ACL, you may simply direct them to either storage.googleapis.com/<bucket>/<object> or <bucket>.storage.googleapis.com/<object>. Alternatively, if you'd like to have application logic dictate access, you should keep the ACL to the objects private and can use GCS Signed URLs to create short lived URLs to use when doing a 302 redirect.
Its worth noting that if you want users to be able to upload objects via GAE, you'd still use the Blobstore API to handle storing the file in GCS, but you'd have to modify the object after it was uploaded to ensure proper gzip compressing and content-encoding meta data is used.
class legacy_download(blobstore_handlers.BlobstoreDownloadHandler):
def get(self, resource):
filename = str(urllib.unquote(resource))
url = 'https://storage.googleapis.com/mybucket/' + filename
self.redirect(url)

GAE already serves everything using gzip if the client supports it.
So I think what's happening after your update is that the browser expects there to be more of the file, but GAE thinks it's already at the end of the file since it's already gzipped. That's why you get the 500.
(if that makes sense)
Anyway, since GAE already handles compression for you, the easiest way is probably to put non compressed files in GCS and let the Google infrastructure handle the compression automatically for you when you serve them.

GAE - how to use blobstore stub in testbed?

My code goes like this:
self.testbed.init_blobstore_stub()
upload_url = blobstore.create_upload_url('/image')
upload_url = re.sub('^http://testbed\.example\.com', '', upload_url)
response = self.testapp.post(upload_url, params={
'shopid': id,
'description': 'JLo',
}, upload_files=[('file', imgPath)])
self.assertEqual(response.status_int, 200)
how come it shows 404 error? For some reasons the upload path does not seem to exist at all.

You can't do this. I think the problem is that webtest (which I assume is where self.testapp came from) doesn't work well with testbed blobstore functionality. You can find some info at this question.
My solution was to override unittest.TestCase and add the following methods:
def create_blob(self, contents, mime_type):
"Since uploading blobs doesn't work in testing, create them this way."
fn = files.blobstore.create(mime_type = mime_type,
_blobinfo_uploaded_filename = "foo.blt")
with files.open(fn, 'a') as f:
f.write(contents)
files.finalize(fn)
return files.blobstore.get_blob_key(fn)
def get_blob(self, key):
return self.blobstore_stub.storage.OpenBlob(key).read()
You will also need the solution here.
For my tests where I would normally do a get or post to a blobstore handler, I instead call one of the two methods above. It is a bit hacky but it works.
Another solution I am considering is to use Selenium's HtmlUnit driver. This would require the dev server to be running but should allow full testing of blobstore and also javascript (as a side benefit).

I think Kekito is right, you cannot POST to the upload_url directly.
But if you want to test the BlobstoreUploadHandler, you can fake the POST request it would normally received from the blobstore in the following way. Assuming your handler is at /handler :
import email
...
def test_upload(self):
blob_key = 'abcd'
# The blobstore upload handler receives a multipart form request
# containing uploaded files. But instead of containing the actual
# content, the files contain an 'email' message that has some meta
# information about the file. They also contain a blob-key that is
# the key to get the blob from the blobstore
# see blobstore._get_upload_content
m = email.message.Message()
m.add_header('Content-Type', 'image/png')
m.add_header('Content-Length', '100')
m.add_header('X-AppEngine-Upload-Creation', '2014-03-02 23:04:05.123456')
# This needs to be valie base64 encoded
m.add_header('content-md5', 'd74682ee47c3fffd5dcd749f840fcdd4')
payload = m.as_string()
# The blob-key in the Content-type is important
params = [('file', webtest.forms.Upload('test.png', payload,
'image/png; blob-key='+blob_key))]
self.testapp.post('/handler', params, content_type='blob-key')
I figured that out by digging into the blobstore code. The important bit is that the POST request that the blobstore sends to the UploadHandler doesn't contain the file content. Instead, it contains an "email message" (well, informations encoded like in an email) with metadata about the file (content-type, content-length, upload time and md5). It also contains a blob-key that can be used to retrieve the file from the blobstore.

Face.com API upload image from python

I'm trying to upload an image to the Face.com API. It either takes a url to an image, or images can be uploaded directly. Their website says:
A requests that uploads a photo must be formed as a MIME multi-part
message sent using POST data. Each argument, including the raw image
data, should be specified as a separate chunk of form data.
Problem is, I don't know exactly what that means. Right now my code looks like this:
import urllib
import json
apikey = "[redacted]"
secret = "[redacted]"
img = raw_input("Enter the URL of an image: ");
url = "http://api.face.com/faces/detect.json?api_key=" + apikey + "&api_secret=" + secret + "&urls=" + urllib.quote(img) + "&attributes=all"
data = json.loads(urllib.urlopen(url).read())
How can I convert this to work with a locally stored image?

The easiest way to upload photo in Python to face.com API is just using the Python Client Library that can be downloaded form http://developers.face.com/download/.
You got 2 there. Both support uploading by passing filename to the detected method (as a different param than the urls).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Save an image from a HTTP request in S3 using python lambda - python

Related

Export spreadsheet as text/csv using Drive v3 gives 500 Internal Error

Python Boto3 AWS Multipart Upload Syntax

How do you get Google App Engine to gunzip during download?

GAE - how to use blobstore stub in testbed?

Face.com API upload image from python

Categories

Resources