I have this HTTP Cloud Function (Python 3.7):
from google.cloud import storage

def upload_blob(bucket_name, blob_text, destination_blob_name):
    """Uploads a file to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string(blob_text)
    print('Blob text uploaded to {}.'.format(destination_blob_name))
Now I want to test this function from the Testing tab of the Function details page, where it asks for a triggering event in JSON format.
I tried various formats like
{
"name":[param1,param2,param3]
}
but always ended up getting the error:
upload_blob() missing 2 required positional arguments: 'blob_text' and 'destination_blob_name'
I also tried to find documentation on this but couldn't. Can you point me in the right direction?
Your upload_blob function doesn't look like an HTTP Cloud Function. HTTP Cloud Functions take a single parameter, request. For example:
def hello_http(request):
    ...
See https://cloud.google.com/functions/docs/writing/http#writing_http_helloworld-python for more details.
As suggested by Dustin, you should add an entry point which parses the JSON request.
A code snippet handling the following request would be as follows:
{"bucket_name": "xyz", "blob_text": "abcdfg", "destination_blob_name": "sample.txt"}
The entry function will be as follows:
def entry_point_function(request):
    content_type = request.headers['content-type']
    if 'application/json' in content_type:
        request_json = request.get_json(silent=True)
        if request_json and 'bucket_name' in request_json:
            bucket_name = request_json['bucket_name']
            blob_text = request_json['blob_text']
            destination_blob_name = request_json['destination_blob_name']
            upload_blob(bucket_name, blob_text, destination_blob_name)
        else:
            raise ValueError("JSON is invalid, or missing a 'bucket_name' property")
Rename the entry point in the function configuration. In the Cloud Console UI, you need to change the Entry point field of the function from upload_blob to entry_point_function.
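With that in place, the JSON you paste into the Testing tab is simply sent as the HTTP request body. You can also exercise the deployed function directly; here is a minimal sketch using the requests library, assuming the entry point above and a placeholder trigger URL (adjust region, project and function name to your deployment):

import requests

# Hypothetical trigger URL; replace with your function's actual HTTPS endpoint.
url = "https://REGION-PROJECT_ID.cloudfunctions.net/entry_point_function"
payload = {
    "bucket_name": "xyz",
    "blob_text": "abcdfg",
    "destination_blob_name": "sample.txt",
}

# json= sets the application/json content type that the entry point checks for.
resp = requests.post(url, json=payload)
print(resp.status_code, resp.text)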
I am trying to put my CSV file from S3 into DynamoDB through a Lambda function. In the first stage, I was uploading my .csv file to S3 manually. When uploading the file manually, I know the name of the file, so I put this file name as the key in the test event and it works fine.
I want to automate things because my .csv files are generated in S3 automatically and I don't know what the name of the next file will be. Someone suggested that I create a trigger on S3 that will invoke my Lambda on every file creation. The only issue I am dealing with is what to put in the test event in place of "key", where we are supposed to put the name of the file whose data we want to fetch from S3.
I don't have a file name now. Following is the Lambda code:
import json
import boto3

s3_client = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
student_table = dynamodb.Table('AgentMetrics')

def lambda_handler(event, context):
    source_bucket_name = event['Records'][0]['s3']['bucket']['name']
    file_name = event['Records'][0]['s3']['object']['key']
    file_object = s3_client.get_object(Bucket=source_bucket_name, Key=file_name)
    print("file_object :", file_object)
    file_content = file_object['Body'].read().decode("utf-8")
    print("file_content :", file_content)
    students = file_content.split("\n")
    print("students :", students)
    for student in students:
        data = student.split(",")
        try:
            student_table.put_item(
                Item={
                    "Agent": data[0],
                    "StartInterval": data[1],
                    "EndInterval": data[2],
                    "Agent idle time": data[3],
                    "Agent on contact time": data[4],
                    "Nonproductive time": data[5],
                    "Online time": data[6],
                    "Lunch Break time": data[7],
                    "Service level 120 seconds": data[8],
                    "After contact work time": data[9],
                    "Contacts handled": data[10],
                    "Contacts queued": data[11]
                })
        except Exception as e:
            print("File Completed")
The error I am facing is:
"errorMessage": "An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.",
"errorType": "NoSuchKey"
Kindly help me here, I am getting frustrated because of this issue. I would really appreciate any help, thanks.
As suggested in your question, you have to add a trigger to the S3 bucket for whichever action you need to track (POST, PUT or DELETE).
Here are more details:
https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
Select a Lambda blueprint in either Python or Node.js, whichever you prefer.
Then select the S3 bucket and the action, like PUT, POST or DELETE, or all of them.
Put your code above, which makes the entry in the DB, into this Lambda.
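If you would rather wire up the trigger programmatically instead of through the console, a minimal sketch with boto3 could look like this (the bucket name and Lambda ARN are placeholders, and the Lambda must already grant S3 permission to invoke it):

import boto3

s3 = boto3.client("s3")

# Hypothetical names: replace with your real bucket and function ARN.
s3.put_bucket_notification_configuration(
    Bucket="my-source-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:csv-to-dynamodb",
                "Events": ["s3:ObjectCreated:*"],  # fire on every new object
            }
        ]
    },
)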
The "Test Event" you are using is an example of a message that Amazon S3 will send to your Lambda function when a new object is created in the bucket.
When S3 triggers your AWS Lambda function, it will provide details of the object that triggered the event. Your program will then use the event supplied by S3. It will not use your Test Event.
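For reference, here is a trimmed-down sketch of such an event, written as a Python dict and showing only the fields your handler actually reads (the bucket and key values are hypothetical):

sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-source-bucket"},
                "object": {"key": "reports/agent-metrics.csv"},
            }
        }
    ]
}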
One more thing...
It is possible that the Lambda function will be triggered with more than one object being passed via the event. Your function should be able to handle this happening. You can do this by adding a for loop:
import urllib
...

def lambda_handler(event, context):
    for record in event['Records']:
        source_bucket_name = record['s3']['bucket']['name']
        file_name = urllib.parse.unquote_plus(record['s3']['object']['key'])
        ...
I want to upload an image to S3 with Lambda and API Gateway when I submit a form. How can I do it in Python?
Currently I am getting this error while trying to upload an image through Postman:
Could not parse request body into json: Could not parse payload into json: Unexpected character (\'-\' (code 45))
My code currently is:
import json
import boto3
import base64

s3 = boto3.client('s3')

def lambda_handler(event, context):
    print(event)
    try:
        if event['httpMethod'] == 'POST':
            print(event['body'])
            data = json.loads(event['body'])
            name = data['name']
            image = data['file']
            image = image[image.find(",")+1:]
            dec = base64.b64decode(image + "===")
            s3.put_object(Bucket='', Key="", Body=dec)
            return {'statusCode': 200, 'body': json.dumps({'message': 'successful lambda function call'}), 'headers': {'Access-Control-Allow-Origin': '*'}}
    except Exception as e:
        return {
            'body': json.dumps(str(e))
        }
Doing an upload through API Gateway and Lambda has its limitations:
You cannot handle large files, and there is an execution timeout of 30 seconds as I recall.
I would go with creating a presigned URL that is requested by the client through API Gateway, then used as the endpoint to PUT the file.
Something like this will go in your Lambda
(This is a NodeJs example)
const uploadUrl = S3.getSignedUrl('putObject', {
    Bucket: get(aPicture, 'Bucket'),
    Key: get(aPicture, 'Key'),
    Expires: 600,
})
callback(null, { url: uploadUrl })
(NodeJs)
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getSignedUrl-property
(Python)
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.generate_presigned_url
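For the Python side, here is a minimal sketch of the same idea with generate_presigned_url (the bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/key; the client will PUT the file to the returned URL.
upload_url = s3.generate_presigned_url(
    ClientMethod="put_object",
    Params={"Bucket": "my-upload-bucket", "Key": "uploads/picture.jpg"},
    ExpiresIn=600,  # seconds
)
print(upload_url)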
You don't need Lambda for this. You can proxy the S3 API with API Gateway:
https://docs.aws.amazon.com/apigateway/latest/developerguide/integrating-api-with-aws-services-s3.html
Oops, it's over-engineering.
Anyway, it seems like you are getting the error from API Gateway. First test the Lambda on its own ("Test" in the AWS console); if it works fine and you get the response back from the Lambda, then check the API Gateway side.
I suspect you are using mapping templates in the gateway. AWS uses Velocity templates, which look like JSON but are different; a mapping template on the integration request can cause this issue.
I'm trying to figure out how to receive a file sent by a browser through an API call in Python.
The web client is allowed to send any type of file (let's say .txt, .docx, .xlsx, ...). I don't know if I should use binary or not.
The idea is to save the file to S3 afterwards. I know it's possible to use JS libraries like AWS Amplify and generate a temporary URL, but I'm not too interested in that solution.
Any help appreciated; I've searched extensively for a solution in Python but I can't find anything that actually works!
My API is private and I'm using Serverless to deploy.
files_post:
  handler: post/post.post
  events:
    - http:
        path: files
        method: post
        cors: true
        authorizer:
          name: authorizer
          arn: ${cf:lCognito.CognitoUserPoolMyUserPool}
EDIT
I have a half solution that works for text files but not for PDF, XLSX, or images; if someone has one that does, I'd be super happy.
from cgi import parse_header, parse_multipart
from io import BytesIO
import json
import boto3

def post(event, context):
    print event['queryStringParameters']['filename']
    c_type, c_data = parse_header(event['headers']['content-type'])
    c_data['boundary'] = bytes(c_data['boundary']).encode("utf-8")
    body_file = BytesIO(bytes(event['body']).encode("utf-8"))
    form_data = parse_multipart(body_file, c_data)
    s3 = boto3.resource('s3')
    object = s3.Object('storage', event['queryStringParameters']['filename'])
    object.put(Body=form_data['upload'][0])
You are using API Gateway, so your Lambda event will map to something like this (from the Amazon docs):
{
    "resource": "Resource path",
    "path": "Path parameter",
    "httpMethod": "Incoming request's method name",
    "headers": {String containing incoming request headers},
    "multiValueHeaders": {List of strings containing incoming request headers},
    "queryStringParameters": {query string parameters},
    "multiValueQueryStringParameters": {List of query string parameters},
    "pathParameters": {path parameters},
    "stageVariables": {Applicable stage variables},
    "requestContext": {Request context, including authorizer-returned key-value pairs},
    "body": "A JSON string of the request payload.",
    "isBase64Encoded": "A boolean flag to indicate if the applicable request payload is Base64-encoded"
}
You can pass the file as a base64 value in the body and decode it in your Lambda function. Take the following Python snippet:

import base64
import json

def lambda_handler(event, context):
    data = json.loads(event['body'])
    # Let's say we use a regular <input type='file' name='uploaded_file'/>
    encoded_file = data['uploaded_file']
    decoded_file = base64.b64decode(encoded_file)
    # now save it to S3
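To finish the "save it to S3" step, here is a minimal sketch with boto3 (the bucket name and key are placeholders; decoded_file is the bytes value produced above):

import boto3

s3 = boto3.client("s3")

def save_to_s3(decoded_file, key):
    # Hypothetical bucket name; decoded_file is the decoded upload body.
    s3.put_object(Bucket="my-upload-bucket", Key=key, Body=decoded_file)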
Tried this:
import boto3
from boto3.s3.transfer import TransferConfig, S3Transfer
path = "/temp/"
fileName = "bigFile.gz" # this happens to be a 5.9 Gig file
client = boto3.client('s3', region)
config = TransferConfig(
    multipart_threshold=4*1024,  # number of bytes
    max_concurrency=10,
    num_download_attempts=10,
)
transfer = S3Transfer(client, config)
transfer.upload_file(path+fileName, 'bucket', 'key')
Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.
I found this example, but part is not defined.
import boto3
bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'
s3 = boto3.client('s3')
# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path+fileName, 'rb') as data:
    part1 = s3.upload_part(Bucket=bucket
                           , Key=key
                           , PartNumber=1
                           , UploadId=mpu['UploadId']
                           , Body=data)

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
    'Parts': [
        {
            'PartNumber': 1,
            'ETag': part['ETag']
        }
    ]
}

# Now the upload works!
s3.complete_multipart_upload(Bucket=bucket
                             , Key=key
                             , UploadId=mpu['UploadId']
                             , MultipartUpload=part_info)
Question: Does anyone know how to use the multipart upload with boto3?
Your code was already correct. Indeed, a minimal example of a multipart upload just looks like this:
import boto3
s3 = boto3.client('s3')
s3.upload_file('my_big_local_file.txt', 'some_bucket', 'some_key')
You don't need to explicitly ask for a multipart upload, or use any of the lower-level functions in boto3 that relate to multipart uploads. Just call upload_file, and boto3 will automatically use a multipart upload if your file size is above a certain threshold (which defaults to 8MB).
You seem to have been confused by the fact that the end result in S3 wasn't visibly made up of multiple parts:
Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.
... but this is the expected outcome. The whole point of the multipart upload API is to let you upload a single file over multiple HTTP requests and end up with a single object in S3.
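If you want to convince yourself that the transfer really was split into parts, one hedged way to check (this relies on an S3 implementation detail rather than a documented contract) is the object's ETag, which for multipart uploads ends in "-<number of parts>":

import boto3

s3 = boto3.client("s3")

# Same placeholder bucket/key as in the upload_file example above.
head = s3.head_object(Bucket="some_bucket", Key="some_key")
print(head["ETag"])  # e.g. "...-12" suggests a 12-part multipart upload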
As described in the official boto3 documentation:
The AWS SDK for Python automatically manages retries and multipart and
non-multipart transfers.
The management operations are performed by using reasonable default
settings that are well-suited for most scenarios.
So all you need to do is set the desired multipart threshold value, which indicates the minimum file size for which a multipart upload is automatically handled by the Python SDK:
import boto3
from boto3.s3.transfer import TransferConfig
# Set the desired multipart threshold value (5GB)
GB = 1024 ** 3
config = TransferConfig(multipart_threshold=5*GB)
# Perform the transfer
s3 = boto3.client('s3')
s3.upload_file('FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME', Config=config)
Moreover, you can also use the multithreading mechanism for multipart transfers by setting max_concurrency:
# To consume less downstream bandwidth, decrease the maximum concurrency
config = TransferConfig(max_concurrency=5)
# Download an S3 object
s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)
And finally, in case you want to perform a multipart transfer in a single thread, just set use_threads=False:
# Disable thread use/transfer concurrency
config = TransferConfig(use_threads=False)
s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)
Complete source code with explanation: Python S3 Multipart File Upload with Metadata and Progress Indicator
I would advise you to use boto3.s3.transfer for this purpose. Here is an example:
import boto3

def upload_file(filename):
    session = boto3.Session()
    s3_client = session.client("s3")
    try:
        print("Uploading file: {}".format(filename))
        tc = boto3.s3.transfer.TransferConfig()
        t = boto3.s3.transfer.S3Transfer(client=s3_client, config=tc)
        t.upload_file(filename, "my-bucket-name", "name-in-s3.dat")
    except Exception as e:
        print("Error uploading: {}".format(e))
In your code snippet, part should clearly be part1 in the dictionary. Typically you would have several parts (otherwise why use multipart upload), and the 'Parts' list would contain one element for each part.
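For illustration, here is a minimal sketch of the several-parts case (the chunk size and file path are placeholders; every part except the last must be at least 5 MB):

import boto3

s3 = boto3.client('s3')
bucket, key = 'bucket', 'key'      # same placeholders as in the question
chunk_size = 100 * 1024 * 1024     # 100 MB per part

mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
with open('/temp/bigFile.gz', 'rb') as f:
    part_number = 1
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        part = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                              UploadId=mpu['UploadId'], Body=chunk)
        parts.append({'PartNumber': part_number, 'ETag': part['ETag']})
        part_number += 1

s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu['UploadId'],
                             MultipartUpload={'Parts': parts})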
You may also be interested in the new pythonic interface for dealing with S3: http://s3fs.readthedocs.org/en/latest/
Why not just use the copy option in boto3?
s3.copy(
    CopySource={'Bucket': sourceBucket, 'Key': sourceKey},
    Bucket=targetBucket,
    Key=targetKey,
    ExtraArgs={'ACL': 'bucket-owner-full-control'}
)
Details on how to initialise the s3 object, and further options for the call, are available in the boto3 docs.
copy from boto3 is a managed transfer which will perform a multipart copy in multiple threads if necessary.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.copy
This works with objects greater than 5Gb and I have already tested this.
Change part to part1:
import boto3
bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'
s3 = boto3.client('s3')
# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path+fileName, 'rb') as data:
    part1 = s3.upload_part(Bucket=bucket
                           , Key=key
                           , PartNumber=1
                           , UploadId=mpu['UploadId']
                           , Body=data)

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
    'Parts': [
        {
            'PartNumber': 1,
            'ETag': part1['ETag']
        }
    ]
}

# Now the upload works!
s3.complete_multipart_upload(Bucket=bucket
                             , Key=key
                             , UploadId=mpu['UploadId']
                             , MultipartUpload=part_info)
I'm going to write a Python program to check if a file is in a certain folder of my Google Cloud Storage. The basic idea is to get the list of all objects in a folder, i.e. a file name list, then check if the file abc.txt is in that list.
Now the problem is that Google seems to provide only one way to get the object list, which is uri.get_bucket(); see the code below, which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects
uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for obj in uri.get_bucket():
    print '%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name)
    print '  "%s"' % obj.get_contents_as_string()
The problem with uri.get_bucket() is that it appears to fetch all of the objects first, which is not what I want; I just need the object name list for a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much quicker.
Could someone help answer? Every answer is appreciated!
Update: the below is true for the older "Google API Client Libraries" for Python, but if you're not using that client, prefer the newer "Google Cloud Client Library" for Python ( https://googleapis.dev/python/storage/latest/index.html ). For the newer library, the equivalent to the below code is:
from google.cloud import storage
client = storage.Client()
for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
    print(str(blob))
Answer for older client follows.
You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:
import json
from apiclient import discovery

# Auth goes here if necessary. Create authorized http object...
client = discovery.build('storage', 'v1')  # add http=whatever param if auth
request = client.objects().list(
    bucket="mybucket",
    prefix="abc/myfolder")
while request is not None:
    response = request.execute()
    print json.dumps(response, indent=2)
    request = client.objects().list_next(request, response)
Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list
And the Google Python API client is documented here:
https://code.google.com/p/google-api-python-client/
This worked for me:
from google.cloud import storage

client = storage.Client()
BUCKET_NAME = 'DEMO_BUCKET'
bucket = client.get_bucket(BUCKET_NAME)
blobs = bucket.list_blobs()

for blob in blobs:
    print(blob.name)
The list_blobs() method will return an iterator used to find blobs in the bucket.
Now you can iterate over blobs and access every object in the bucket. In this example I just print out the name of the object.
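Since the original question was about checking whether a specific file such as abc.txt exists under a folder, here is a minimal sketch for that case (assuming the same google-cloud-storage client; the bucket and path are the ones from the question):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('mybucket')

# Objects in GCS are flat; the "folder" is just a key prefix.
blob = bucket.blob('abc/myfolder/abc.txt')
if blob.exists():
    print('Found it!')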
This documentation helped me a lot:
https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html
https://googleapis.github.io/google-cloud-python/latest/_modules/google/cloud/storage/client.html#Client.bucket
I hope I could help!
You might also want to look at gcloud-python and documentation.
from gcloud import storage
connection = storage.get_connection(project_name, email, private_key_path)
bucket = connection.get_bucket('my-bucket')

for key in bucket:
    if key.name == 'abc.txt':
        print 'Found it!'
        break
However, you might be better off just checking if the file exists:
if 'abc.txt' in bucket:
    print 'Found it!'
Install the Python package google-cloud-storage with pip (or from PyCharm) and use the code below:

from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME, prefix=FOLDER_NAME):
    print(str(blob))
I know this is an old question, but I stumbled over this because I was looking for the exact same answer. Answers from Brandon Yarbrough and Abhijit worked for me, but I wanted to get into more detail.
When you run this:
from google.cloud import storage
storage_client = storage.Client()
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name)"))
You will get Blob objects, with just the name field of all files in the given bucket, like this:
[<Blob: BUCKET_NAME, PREFIX, None>,
<Blob: xml-BUCKET_NAME, [PREFIX]claim_757325.json, None>,
<Blob: xml-BUCKET_NAME, [PREFIX]claim_757390.json, None>,
...]
If you are like me and you want to 1) filter out the first item in the list because it does NOT represent a file (it's just the prefix), 2) get just the name string value, and 3) remove the PREFIX from the file name, you can do something like this:
blob_names = [blob_name.name[len(PREFIX):] for blob_name in blobs if blob_name.name != PREFIX]
Complete code to get just the string file names from a storage bucket:

from google.cloud import storage

storage_client = storage.Client()
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name)"))
blob_names = [blob_name.name[len(PREFIX):] for blob_name in blobs if blob_name.name != PREFIX]
print(f"blob_names = {blob_names}")