I have an S3 bucket with a few top level folders, and hundreds of files in each of these folders. How do I get the names of these top level folders?
I have tried the following:
s3 = boto3.resource('s3', region_name='us-west-2', endpoint_url='https://s3.us-west-2.amazonaws.com')
bucket = s3.Bucket('XXX')
for obj in bucket.objects.filter(Prefix='', Delimiter='/'):
print obj.key
But this doesn't seem to work. I have thought about using regex to filter all the folder names, but this doesn't seem time efficient.
Thanks in advance!
Try this.
import boto3
client = boto3.client('s3')
paginator = client.get_paginator('list_objects')
result = paginator.paginate(Bucket='my-bucket', Delimiter='/')
for prefix in result.search('CommonPrefixes'):
print(prefix.get('Prefix'))
The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does (source)
In other words, there's no way around iterating all of the keys in the bucket and extracting whatever structure that you want to see (depending on your needs, a dict-of-dicts may be a good approach for you).
You could also use Amazon Athena in order to analyse/query S3 buckets.
https://aws.amazon.com/athena/
Related
I'm trying to read a file content's (not to download it) from an S3 bucket. The problem is that the file is located under a multi-level folder. For instance, the full path could be s3://s3-bucket/folder-1/folder-2/my_file.json. How can I get that specific file instead of using my iterative approach that lists all objects?
Here is the code that I want to change:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('s3-bucket')
for obj in my_bucket.objects.all():
key = obj.key
if key == 'folder-1/folder-2/my_file.json':
return obj.get()['Body'].read()
Can it be done in a simpler, more direct way?
Yes - there is no need to enumerate the bucket.
Read the file directly using s3.Object, providing the bucket name as the 1st parameter & the object key as the 2nd parameter.
"Folders" don't really exist in S3 - Amazon S3 doesn't use hierarchy to organize its objects and files. For the sake of organizational simplicity, the Amazon S3 console shows "folders" as a means of grouping objects but they are ultimately baked into your object key.
This should work:
import boto3
s3 = boto3.resource('s3')
obj = s3.Object("s3-bucket", "folder-1/folder-2/my_file.json")
body = obj.get()['Body'].read()
I have some files in a specific folder of an s3 bucket. All file names are in the same pattern like below:
s3://example_bucket/products/product1/rawmat-2343274absdfj7827d.csv
s3://example_bucket/products/product1/rawmat-7997werewr666ee.csv
s3://example_bucket/products/product1/rawmat-8qwer897hhw776w3.csv
s3://example_bucket/products/product1/rawmat-2364875349873uy68848732.csv
....
....
Here, I think we can say:
bucket_name = 'example_bucket'
prefix = 'products/product1/'
key = 'rawmat-*.csv'
I need to read each of them. I highly prefer not to list of the objects in the bucket.
What could be the most efficient way to do this?
Iterate over the objects at the folder using a prefix
bucket_name = 'example_bucket'
prefix = 'products/product1/rawmat'
for my_object in bucket_name.objects.filter(Prefix= prefix):
print(my_object)
I have boto code that collects S3 sub-folders in levelOne folder:
import boto
s3 = boto.connect_s3()
bucket = s3.get_bucket("MyBucket")
for level2 in bucket.list(prefix="levelOne/", delimiter="/"):
print(level2.name)
Please help to discover similar functionality in boto3. The code should not iterate through all S3 objects because the bucket has a very big number of objects.
If you are simply seeking a list of folders, then use CommonPrefixes returned when listing objects. Note that a Delimiter must be specified to obtain the CommonPrefixes:
import boto3
s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket='BUCKET-NAME', Delimiter = '/')
for prefix in response['CommonPrefixes']:
print(prefix['Prefix'][:-1])
If your bucket has a HUGE number of folders and objects, you might consider using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects.
I think the following should be equivalent:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('MyBucket')
for object in bucket.objects.filter(Prefix="levelOne/", Delimiter="/"):
print(object.key)
Have found many questions related to this with solutions using boto3, however I am in a position where I have to use boto, running Python 2.38.
Now I can successfully transfer my files in their folders (Not real folders I know as S3 doesn't have this concept) but I want them to be saved into a particular folder in my destination bucket
from boto.s3.connection import S3Connection
def transfer_files():
conn = S3Connection()
srcBucket = conn.get_bucket("source_bucket")
dstBucket = conn.get_bucket(bucket_name="destination_bucket")
objectlist = srcbucket.list()
for obj in objectlist:
dstBucket.copy_key(obj.key, srcBucket.name, obj.key)
My srcBucket will look like folder/subFolder/anotherSubFolder/file.txt which when transferred will land in the dstBucket like so destination_bucket/folder/subFolder/anotherSubFolder/file.txt
I would like it to end up in destination_bucket/targetFolder so the final directory structure would look like
destination_bucket/targetFolder/folder/subFolder/anotherSubFolder/file.txt
Hopefully I have explained this well enough and it makes sense
The first parameter is the name of the destination key.
Therefore, just use:
dstBucket.copy_key('targetFolder/' + obj.key, srcBucket.name, obj.key)
I am trying to traverse all objects inside a specific folder in my S3 bucket. The code I already have is like follows:
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')
for obj in bucket.objects.filter(Prefix='folder/'):
do_stuff(obj)
I need to use boto3.resource and not client. This code is not getting any objects at all although I have a bunch of text files in the folder. Can someone advise?
Try adding the Delimiter attribute: Delimiter = '\' as you are filtering objects. The rest of the code looks fine.
I had to make sure to skip the first file. For some reason it thinks the folder name is the first file and that may not be what you want.
for video_item in source_bucket.objects.filter(Prefix="my-folder-name/", Delimiter='/'):
if video_item.key == 'my-folder-name/':
continue
do_something(video_item.key)