Is there any specific way to unzip single file contents from s3 to multiple cloudfront urls by triggering lambda once.
Lets say in there is a zip file contains multiple jpg/ png files already uploaded to s3. Intention is to run lambda function only once to unzip all its file content and make them available in multiple cloudfront urls.
in s3 bucket
archive.zip
a.jpg
b.jpg
c.jpg
through cloudfront
https://1232.cloudfront.net/a.jpg
https://1232.cloudfront.net/b.jpg
https://1232.cloudfront.net/c.jpg
I am looking for a solution such that lambda function trigger function calls whenever a s3 upload happens and make all files available in the zip through cloudfront multiple urls.
Hello Prathap Parameswar,
I think you can resolve your problem like this:
First you need to exact your zip file
Seconds you upload them again to S3.
This is lambda python function:
import json
import boto3
from io import BytesIO
import zipfile
def lambda_handler(event, context):
# TODO implement
s3_resource = boto3.resource('s3')
source_bucket = 'upload-zip-folder'
target_bucket = 'upload-extracted-folder'
my_bucket = s3_resource.Bucket(source_bucket)
for file in my_bucket.objects.all():
if(str(file.key).endswith('.zip')):
zip_obj = s3_resource.Object(bucket_name=source_bucket, key=file.key)
buffer = BytesIO(zip_obj.get()["Body"].read())
z = zipfile.ZipFile(buffer)
for filename in z.namelist():
file_info = z.getinfo(filename)
try:
response = s3_resource.meta.client.upload_fileobj(
z.open(filename),
Bucket=target_bucket,
Key=f'{filename}'
)
except Exception as e:
print(e)
else:
print(file.key+ ' is not a zip file.')
Hope this can help you
Trying to get count of objects in S3 folder
Current code
bucket='some-bucket'
File='someLocation/File/'
objs = boto3.client('s3').list_objects_v2(Bucket=bucket,Prefix=File)
fileCount = objs['KeyCount']
This gives me the count as 1+actual number of objects in S3.
Maybe it is counting "File" as a key too?
Assuming you want to count the keys in a bucket and don't want to hit the limit of 1000 using list_objects_v2. The below code worked for me but I'm wondering if there is a better faster way to do it! Tried looking if there's a packaged function in boto3 s3 connector but there isn't!
# connect to s3 - assuming your creds are all set up and you have boto3 installed
s3 = boto3.resource('s3')
# identify the bucket - you can use prefix if you know what your bucket name starts with
for bucket in s3.buckets.all():
print(bucket.name)
# get the bucket
bucket = s3.Bucket('my-s3-bucket')
# use loop and count increment
count_obj = 0
for i in bucket.objects.all():
count_obj = count_obj + 1
print(count_obj)
If there are more than 1000 entries, you need to use paginators, like this:
count = 0
client = boto3.client('s3')
paginator = client.get_paginator('list_objects')
for result in paginator.paginate(Bucket='your-bucket', Prefix='your-folder/', Delimiter='/'):
count += len(result.get('CommonPrefixes'))
"Folders" do not actually exist in Amazon S3. Instead, all objects have their full path as their filename ('Key'). I think you already know this.
However, it is possible to 'create' a folder by creating a zero-length object that has the same name as the folder. This causes the folder to appear in listings and is what happens if folders are created via the management console.
Thus, you could exclude zero-length objects from your count.
For an example, see: Determine if folder or file key - Boto
The following code worked perfectly
def getNumberOfObjectsInBucket(bucketName,prefix):
count = 0
response = boto3.client('s3').list_objects_v2(Bucket=bucketName,Prefix=prefix)
for object in response['Contents']:
if object['Size'] != 0:
#print(object['Key'])
count+=1
return count
object['Size'] == 0 will take you to folder names, if want to check them, object['Size'] != 0 will lead you to all non-folder keys.
Sample function call below:
getNumberOfObjectsInBucket('foo-test-bucket','foo-output/')
If you have credentials to access that bucket, then you can use this simple code. Below code will give you a list. List comprehension is used for more readability.
Filter is used to filter objects because in bucket to identify the files ,folder names are used. As explained by John Rotenstein concisely.
import boto3
bucket = "Sample_Bucket"
folder = "Sample_Folder"
s3 = boto3.resource("s3")
s3_bucket = s3.Bucket(bucket)
files_in_s3 = [f.key.split(folder + "/")[1] for f in s3_bucket.objects.filter(Prefix=folder).all()]
I am uploading images to a folder currently on local . like in site/uploads.
And After searching I got that for uploading images to Amazon S3, I have to do like this
import boto3
s3 = boto3.resource('s3')
# Get list of objects for indexing
images=[('image01.jpeg','Albert Einstein'),
('image02.jpeg','Candy'),
('image03.jpeg','Armstrong'),
('image04.jpeg','Ram'),
('image05.jpeg','Peter'),
('image06.jpeg','Shashank')
]
# Iterate through list to upload objects to S3
for image in images:
file = open(image[0],'rb')
object = s3.Object('rekognition-pictures','index/'+ image[0])
ret = object.put(Body=file,
Metadata={'FullName':image[1]}
)
Clarification
Its my code to send images and name to S3 . But I dont know how to get image in this line of code images=[('image01.jpeg','Albert Einstein'), like how can I get this image in this code from /upload/image01.jpeg . and 2ndly how can I get images from s3 and show in my website image page ?
I know your question is specific to boto3 so you might not like my answer, but it will achieve the same outcome as what you would like to achieve and the aws-cli also makes use of boto3.
See here: http://bigdatums.net/2016/09/17/copy-local-files-to-s3-aws-cli/
This example is from the site and could easily be used in a script:
#!/bin/bash
#copy all files in my-data-dir into the "data" directory located in my-s3-bucket
aws s3 cp my-data-dir/ s3://my-s3-bucket/data/ --recursive
The very first thing, the code snippet you are showing as a reference is not for your use case as I had written that code snippet for batch uploads from boto3 where you have to provide image paths in your script along with metadata for image, so the names in your code snippet are metadata.So upto what i get to understand from your question, you want files in a local folder to be uploaded and want to provide custom names before uploading , so this is how you will do that.
import os
import boto3
s3 = boto3.resource('s3')
directory_in_str="E:\\streethack\\hold"
directory = os.fsencode(directory_in_str)
for file in os.listdir(directory):
filename = os.fsdecode(file)
if filename.endswith(".jpeg") or filename.endswith(".jpg") or filename.endswith(".png"):
strg=directory_in_str+'\\'+filename
print(strg)
print("Enter name for your image : ")
inp_val = input()
strg2=inp_val+'.jpeg'
file = open(strg,'rb')
object = s3.Object('mausamrest','test/'+ strg2) #mausamrest is bucket
object.put(Body=file,ContentType='image/jpeg',ACL='public-read')
else:
continue
programmatically , you have to provide path of folder which is hard-coded in this example in directory_in_str variable. then , this code will iterate over each file searching for image , then it will ask for input for custom name and then it will upload your file.
Moreover, you want to show these images on your website , so public_read for images have been turned on using ACL , so you can directly use s3 links to embedd images in your webpages like this one.
https://s3.amazonaws.com/mausamrest/test/jkl.jpeg
This above file is the one i used to test this code snippet. your images will be availbale like this. Make sure you change bucket name. :)
Using the Resource method:
# Iterate through list to upload objects to S3
bucket = s3.Bucket('rekognition-pictures')
for image in images:
bucket.upload_file(Filename='/upload/' + image[0],
Key='index/' + image[0],
ExtraArgs={'FullName': image[1]}
)
Using the client method:
import boto3
client = boto3.client('s3')
...
# Iterate through list to upload objects to S3
for image in images:
client.upload_file(Filename='/upload/' + image[0],
Bucket='rekognition-pictures',
Key='index/' + image[0],
ExtraArgs={'FullName': image[1]}
)
I'm having trouble figuring out how to access a certain folder within a bucket in s3 using Python
Let's say I'm trying to access this folder in the bucket which contains a bunch of images that I want to run rekognition on:
"myBucket/subfolder/images/"
In /images/ folder there are:
one.jpg
two.jpg
three.jpg
four.jpg
I want to run rekognition's detect_labels on this folder. However, I can't seem to access this folder but if I change the bucket_name to just the root folder ("myBucket"/), then I can access just that folder.
bucket_name = "myBucket/subfolder/images/"
rekognition = boto3.client('rekognition')
s3 = boto3.resource('s3')
bucket = s3.Bucket(name=bucket_name)
That is operating as expected. The bucket name should be just the name of the bucket.
You can then run operation on the bucket, eg:
import boto3
s3 = boto3.resource('s3', region_name='ap-southeast-2')
bucket = s3.Bucket('my-bucket')
for object in bucket.objects.all():
if object.key.startswith('images'):
print object.key
Or, using the client instead of the resource:
import boto3
client = boto3.client('s3', region_name='ap-southeast-2')
response = client.list_objects_v2(Bucket='my-bucket', Prefix='images/')
for object in response['Contents']:
print object['Key']
You could index elsewhere what you have in S3, in such a way, you directly access what you need. Have in mind that to loop against files stored in buckets, may offer very low performance and it gets really slow if the number of keys you have is big.
Following your example, another way to do it:
bucket_name = "myBucket"
folder_name = "subfolder/images/"
rekognition = boto3.client('rekognition')
keys= ['one.jpg','two.jpg','three.jpg','four.jpg']
s3 = boto3.resource('s3')
for k in keys:
obj = s3.Object(bucket_name, folder_name+k )
print(obj.key)
Get the list of items(keys) from any db table in your system.
In the case of AWS Rekognition (as asked), an image file stored in a folder in a S3 bucket will have the key in the form of folder_name/subfolder_name/image_name.jpg. So, since the boto3 Rekognition detect_labels() method has this syntax (per https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/rekognition.html#Rekognition.Client.detect_labels):
response = client.detect_labels(
Image={
'Bytes': b'bytes',
'S3Object': {
'Bucket': 'string',
'Name': 'string',
'Version': 'string'
}
},
MaxLabels=123,
MinConfidence=...
)
where the value of Name should be a S3 object key name, you can just feed the entire folder-image path to that dictionary as a string. To loop over multiple images, generate a list of image file names, as recommended in Evhz's answer, and loop over that list while calling the detect_labels() method above (or use a generator).
How can I see what's inside a bucket in S3 with boto3? (i.e. do an "ls")?
Doing the following:
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('some/path/')
returns:
s3.Bucket(name='some/path/')
How do I see its contents?
One way to see the contents would be:
for my_bucket_object in my_bucket.objects.all():
print(my_bucket_object)
This is similar to an 'ls' but it does not take into account the prefix folder convention and will list the objects in the bucket. It's left up to the reader to filter out prefixes which are part of the Key name.
In Python 2:
from boto.s3.connection import S3Connection
conn = S3Connection() # assumes boto.cfg setup
bucket = conn.get_bucket('bucket_name')
for obj in bucket.get_all_keys():
print(obj.key)
In Python 3:
from boto3 import client
conn = client('s3') # again assumes boto.cfg setup, assume AWS S3
for key in conn.list_objects(Bucket='bucket_name')['Contents']:
print(key['Key'])
I'm assuming you have configured authentication separately.
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('bucket_name')
for file in my_bucket.objects.all():
print(file.key)
My s3 keys utility function is essentially an optimized version of #Hephaestus's answer:
import boto3
s3_paginator = boto3.client('s3').get_paginator('list_objects_v2')
def keys(bucket_name, prefix='/', delimiter='/', start_after=''):
prefix = prefix.lstrip(delimiter)
start_after = (start_after or prefix) if prefix.endswith(delimiter) else start_after
for page in s3_paginator.paginate(Bucket=bucket_name, Prefix=prefix, StartAfter=start_after):
for content in page.get('Contents', ()):
yield content['Key']
In my tests (boto3 1.9.84), it's significantly faster than the equivalent (but simpler) code:
import boto3
def keys(bucket_name, prefix='/', delimiter='/'):
prefix = prefix.lstrip(delimiter)
bucket = boto3.resource('s3').Bucket(bucket_name)
return (_.key for _ in bucket.objects.filter(Prefix=prefix))
As S3 guarantees UTF-8 binary sorted results, a start_after optimization has been added to the first function.
In order to handle large key listings (i.e. when the directory list is greater than 1000 items), I used the following code to accumulate key values (i.e. filenames) with multiple listings (thanks to Amelio above for the first lines). Code is for python3:
from boto3 import client
bucket_name = "my_bucket"
prefix = "my_key/sub_key/lots_o_files"
s3_conn = client('s3') # type: BaseClient ## again assumes boto.cfg setup, assume AWS S3
s3_result = s3_conn.list_objects_v2(Bucket=bucket_name, Prefix=prefix, Delimiter = "/")
if 'Contents' not in s3_result:
#print(s3_result)
return []
file_list = []
for key in s3_result['Contents']:
file_list.append(key['Key'])
print(f"List count = {len(file_list)}")
while s3_result['IsTruncated']:
continuation_key = s3_result['NextContinuationToken']
s3_result = s3_conn.list_objects_v2(Bucket=bucket_name, Prefix=prefix, Delimiter="/", ContinuationToken=continuation_key)
for key in s3_result['Contents']:
file_list.append(key['Key'])
print(f"List count = {len(file_list)}")
return file_list
If you want to pass the ACCESS and SECRET keys (which you should not do, because it is not secure):
from boto3.session import Session
ACCESS_KEY='your_access_key'
SECRET_KEY='your_secret_key'
session = Session(aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
your_bucket = s3.Bucket('your_bucket')
for s3_file in your_bucket.objects.all():
print(s3_file.key)
#To print all filenames in a bucket
import boto3
s3 = boto3.client('s3')
def get_s3_keys(bucket):
"""Get a list of keys in an S3 bucket."""
resp = s3.list_objects_v2(Bucket=bucket)
for obj in resp['Contents']:
files = obj['Key']
return files
filename = get_s3_keys('your_bucket_name')
print(filename)
#To print all filenames in a certain directory in a bucket
import boto3
s3 = boto3.client('s3')
def get_s3_keys(bucket, prefix):
"""Get a list of keys in an S3 bucket."""
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in resp['Contents']:
files = obj['Key']
print(files)
return files
filename = get_s3_keys('your_bucket_name', 'folder_name/sub_folder_name/')
print(filename)
Update:
The most easiest way is to use awswrangler
import awswrangler as wr
wr.s3.list_objects('s3://bucket_name')
import boto3
s3 = boto3.resource('s3')
## Bucket to use
my_bucket = s3.Bucket('city-bucket')
## List objects within a given prefix
for obj in my_bucket.objects.filter(Delimiter='/', Prefix='city/'):
print obj.key
Output:
city/pune.csv
city/goa.csv
A more parsimonious way, rather than iterating through via a for loop you could also just print the original object containing all files inside your S3 bucket:
session = Session(aws_access_key_id=aws_access_key_id,aws_secret_access_key=aws_secret_access_key)
s3 = session.resource('s3')
bucket = s3.Bucket('bucket_name')
files_in_s3 = bucket.objects.all()
#you can print this iterable with print(list(files_in_s3))
So you're asking for the equivalent of aws s3 ls in boto3. This would be listing all the top level folders and files. This is the closest I could get; it only lists all the top level folders. Surprising how difficult such a simple operation is.
import boto3
def s3_ls():
s3 = boto3.resource('s3')
bucket = s3.Bucket('example-bucket')
result = bucket.meta.client.list_objects(Bucket=bucket.name,
Delimiter='/')
for o in result.get('CommonPrefixes'):
print(o.get('Prefix'))
ObjectSummary:
There are two identifiers that are attached to the ObjectSummary:
bucket_name
key
boto3 S3: ObjectSummary
More on Object Keys from AWS S3 Documentation:
Object Keys:
When you create an object, you specify the key name, which uniquely identifies the object in the bucket. For example, in the Amazon S3 console (see AWS Management Console), when you highlight a bucket, a list of objects in your bucket appears. These names are the object keys. The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long.
The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does. The Amazon S3 console supports a concept of folders. Suppose that your bucket (admin-created) has four objects with the following object keys:
Development/Projects1.xls
Finance/statement1.pdf
Private/taxdocument.pdf
s3-dg.pdf
Reference:
AWS S3: Object Keys
Here is some example code that demonstrates how to get the bucket name and the object key.
Example:
import boto3
from pprint import pprint
def main():
def enumerate_s3():
s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
print("Name: {}".format(bucket.name))
print("Creation Date: {}".format(bucket.creation_date))
for object in bucket.objects.all():
print("Object: {}".format(object))
print("Object bucket_name: {}".format(object.bucket_name))
print("Object key: {}".format(object.key))
enumerate_s3()
if __name__ == '__main__':
main()
Here is a simple function that returns you the filenames of all files or files with certain types such as 'json', 'jpg'.
def get_file_list_s3(bucket, prefix="", file_extension=None):
"""Return the list of all file paths (prefix + file name) with certain type or all
Parameters
----------
bucket: str
The name of the bucket. For example, if your bucket is "s3://my_bucket" then it should be "my_bucket"
prefix: str
The full path to the the 'folder' of the files (objects). For example, if your files are in
s3://my_bucket/recipes/deserts then it should be "recipes/deserts". Default : ""
file_extension: str
The type of the files. If you want all, just leave it None. If you only want "json" files then it
should be "json". Default: None
Return
------
file_names: list
The list of file names including the prefix
"""
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket(bucket)
file_objs = my_bucket.objects.filter(Prefix=prefix).all()
file_names = [file_obj.key for file_obj in file_objs if file_extension is not None and file_obj.key.split(".")[-1] == file_extension]
return file_names
One way that I used to do this:
import boto3
s3 = boto3.resource('s3')
bucket=s3.Bucket("bucket_name")
contents = [_.key for _ in bucket.objects.all() if "subfolders/ifany/" in _.key]
Here is the solution
import boto3
s3=boto3.resource('s3')
BUCKET_NAME = 'Your S3 Bucket Name'
allFiles = s3.Bucket(BUCKET_NAME).objects.all()
for file in allFiles:
print(file.key)
I just did it like this, including the authentication method:
s3_client = boto3.client(
's3',
aws_access_key_id='access_key',
aws_secret_access_key='access_key_secret',
config=boto3.session.Config(signature_version='s3v4'),
region_name='region'
)
response = s3_client.list_objects(Bucket='bucket_name', Prefix=key)
if ('Contents' in response):
# Object / key exists!
return True
else:
# Object / key DOES NOT exist!
return False
With little modification to #Hephaeastus 's code in one of the above comments, wrote the below method to list down folders and objects (files) in a given path. Works similar to s3 ls command.
from boto3 import session
def s3_ls(profile=None, bucket_name=None, folder_path=None):
folders=[]
files=[]
result=dict()
bucket_name = bucket_name
prefix= folder_path
session = boto3.Session(profile_name=profile)
s3_conn = session.client('s3')
s3_result = s3_conn.list_objects_v2(Bucket=bucket_name, Delimiter = "/", Prefix=prefix)
if 'Contents' not in s3_result and 'CommonPrefixes' not in s3_result:
return []
if s3_result.get('CommonPrefixes'):
for folder in s3_result['CommonPrefixes']:
folders.append(folder.get('Prefix'))
if s3_result.get('Contents'):
for key in s3_result['Contents']:
files.append(key['Key'])
while s3_result['IsTruncated']:
continuation_key = s3_result['NextContinuationToken']
s3_result = s3_conn.list_objects_v2(Bucket=bucket_name, Delimiter="/", ContinuationToken=continuation_key, Prefix=prefix)
if s3_result.get('CommonPrefixes'):
for folder in s3_result['CommonPrefixes']:
folders.append(folder.get('Prefix'))
if s3_result.get('Contents'):
for key in s3_result['Contents']:
files.append(key['Key'])
if folders:
result['folders']=sorted(folders)
if files:
result['files']=sorted(files)
return result
This lists down all objects / folders in a given path. Folder_path can be left as None by default and method will list the immediate contents of the root of the bucket.
It can also be done as follows:
csv_files = s3.list_objects_v2(s3_bucket_path)
for obj in csv_files['Contents']:
key = obj['Key']
Using cloudpathlib
cloudpathlib provides a convenience wrapper so that you can use the simple pathlib API to interact with AWS S3 (and Azure blob storage, GCS, etc.). You can install with pip install "cloudpathlib[s3]".
Like with pathlib you can use glob or iterdir to list the contents of a directory.
Here's an example with a public AWS S3 bucket that you can copy and past to run.
from cloudpathlib import CloudPath
s3_path = CloudPath("s3://ladi/Images/FEMA_CAP/2020/70349")
# list items with glob
list(
s3_path.glob("*")
)[:3]
#> [ S3Path('s3://ladi/Images/FEMA_CAP/2020/70349/DSC_0001_5a63d42e-27c6-448a-84f1-bfc632125b8e.jpg'),
#> S3Path('s3://ladi/Images/FEMA_CAP/2020/70349/DSC_0002_a89f1b79-786f-4dac-9dcc-609fb1a977b1.jpg'),
#> S3Path('s3://ladi/Images/FEMA_CAP/2020/70349/DSC_0003_02c30af6-911e-4e01-8c24-7644da2b8672.jpg')]
# list items with iterdir
list(
s3_path.iterdir()
)[:3]
#> [ S3Path('s3://ladi/Images/FEMA_CAP/2020/70349/DSC_0001_5a63d42e-27c6-448a-84f1-bfc632125b8e.jpg'),
#> S3Path('s3://ladi/Images/FEMA_CAP/2020/70349/DSC_0002_a89f1b79-786f-4dac-9dcc-609fb1a977b1.jpg'),
#> S3Path('s3://ladi/Images/FEMA_CAP/2020/70349/DSC_0003_02c30af6-911e-4e01-8c24-7644da2b8672.jpg')]
Created at 2021-05-21 20:38:47 PDT by reprexlite v0.4.2
A good option may also be to run aws cli command from lambda functions
import subprocess
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def run_command(command):
command_list = command.split(' ')
try:
logger.info("Running shell command: \"{}\"".format(command))
result = subprocess.run(command_list, stdout=subprocess.PIPE);
logger.info("Command output:\n---\n{}\n---".format(result.stdout.decode('UTF-8')))
except Exception as e:
logger.error("Exception: {}".format(e))
return False
return True
def lambda_handler(event, context):
run_command('/opt/aws s3 ls s3://bucket-name')
I was stuck on this for an entire night because I just wanted to get the number of files under a subfolder but it was also returning one extra file in the content that was the subfolder itself,
After researching about it I found that this is how s3 works but I had
a scenario where I unloaded the data from redshift in the following directory
s3://bucket_name/subfolder/<10 number of files>
and when I used
paginator.paginate(Bucket=price_signal_bucket_name,Prefix=new_files_folder_path+"/")
it would only return the 10 files, but when I created the folder on the s3 bucket itself then it would also return the subfolder
Conclusion
If the whole folder is uploaded to s3 then listing the only returns the files under prefix
But if the fodler was created on the s3 bucket itself then listing it using boto3 client will also return the subfolder and the files
First, create an s3 client object:
s3_client = boto3.client('s3')
Next, create a variable to hold the bucket name and folder. Pay attention to the slash "/" ending the folder name:
bucket_name = 'my-bucket'
folder = 'some-folder/'
Next, call s3_client.list_objects_v2 to get the folder's content object's metadata:
response = s3_client.list_objects_v2(
Bucket=bucket_name,
Prefix=folder
)
Finally, with the object's metadata, you can obtain the S3 object by calling the s3_client.get_object function:
for object_metadata in response['Contents']:
object_key = object_metadata['Key']
response = s3_client.get_object(
Bucket=bucket_name,
Key=object_key
)
object_body = response['Body'].read()
print(object_body)
As you can see, the object content in the string format is available by calling response['Body'].read()