How to get objects from a folder in an S3 bucket - python

I am trying to traverse all objects inside a specific folder in my S3 bucket. The code I already have is as follows:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')
for obj in bucket.objects.filter(Prefix='folder/'):
    do_stuff(obj)
I need to use boto3.resource and not client. This code is not getting any objects at all, although I have a bunch of text files in the folder. Can someone advise?

Try adding the Delimiter attribute, Delimiter='/', since you are filtering objects. The rest of the code looks fine.
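For reference, a minimal sketch of the suggested call (using the placeholder bucket and folder names from the question) might look like:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')
# With Delimiter='/', deeper 'subfolders' are rolled up, so only objects directly under folder/ are returned
for obj in bucket.objects.filter(Prefix='folder/', Delimiter='/'):
    do_stuff(obj)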

I had to make sure to skip the first entry: for some reason the listing treats the folder name itself as the first object, which may not be what you want.
for video_item in source_bucket.objects.filter(Prefix="my-folder-name/", Delimiter='/'):
    # Skip the zero-byte 'folder marker' object
    if video_item.key == 'my-folder-name/':
        continue
    do_something(video_item.key)

AWS Lambda - Combine multiple CSV files from S3 into one file

I am trying to understand and learn how to get all my files from a specific bucket into one CSV file. The files are like logs: they are always in the same format and are kept in the same bucket. I have this code to access and read them:
bucket = s3_resource.Bucket(bucket_name)
for obj in bucket.objects.all():
    x = obj.get()['Body'].read().decode('utf-8')
    print(x)
It does print them, with separation between individual files and with the column headers.
My question is: how can I modify my loop to get them into just one CSV file?
You should create a file in /tmp/ and write the contents of each object into that file.
Then, when all files have been read, upload the file (or do whatever you want to do with it).
output = open('/tmp/outfile.txt', 'w')
bucket = s3_resource.Bucket(bucket_name)
for obj in bucket.objects.all():
    output.write(obj.get()['Body'].read().decode('utf-8'))
output.close()
Please note that there is a 512 MB limit on the /tmp/ directory in AWS Lambda.
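If the combined file should then go back to S3, one possible final step (the target key 'combined/outfile.csv' is just a placeholder) is:
# Upload the combined file back to the bucket once everything has been written
s3_resource.Bucket(bucket_name).upload_file('/tmp/outfile.txt', 'combined/outfile.csv')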

How to get top-level folders in an S3 bucket using boto3?

I have an S3 bucket with a few top level folders, and hundreds of files in each of these folders. How do I get the names of these top level folders?
I have tried the following:
import boto3
s3 = boto3.resource('s3', region_name='us-west-2', endpoint_url='https://s3.us-west-2.amazonaws.com')
bucket = s3.Bucket('XXX')
for obj in bucket.objects.filter(Prefix='', Delimiter='/'):
    print(obj.key)
But this doesn't seem to work. I have thought about using a regex to filter out the folder names, but that doesn't seem time-efficient.
Thanks in advance!
Try this.
import boto3
client = boto3.client('s3')
paginator = client.get_paginator('list_objects')
result = paginator.paginate(Bucket='my-bucket', Delimiter='/')
for prefix in result.search('CommonPrefixes'):
    print(prefix.get('Prefix'))
The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does (source)
In other words, there's no way around iterating all of the keys in the bucket and extracting whatever structure that you want to see (depending on your needs, a dict-of-dicts may be a good approach for you).
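For example, a rough sketch of that dict-of-dicts approach (the bucket name is a placeholder, and keys are assumed to be '/'-delimited) could look like:
import boto3

def build_tree(bucket_name):
    # Build a nested dict mirroring the '/'-delimited key structure
    tree = {}
    for obj in boto3.resource('s3').Bucket(bucket_name).objects.all():
        parts = obj.key.split('/')
        node = tree
        for part in parts[:-1]:          # descend through the 'folder' components
            node = node.setdefault(part, {})
        node[parts[-1]] = obj.size       # leaf: object name -> size in bytes
    return tree

print(list(build_tree('my-bucket').keys()))   # top-level names in the bucket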
You could also use Amazon Athena in order to analyse/query S3 buckets.
https://aws.amazon.com/athena/

Transfer files from S3 Bucket to another keeping folder structure - python boto

I have found many questions related to this with solutions using boto3; however, I am in a position where I have to use boto (version 2.38) with Python 2.
I can successfully transfer my files with their folders (not real folders, I know, as S3 doesn't have this concept), but I want them to be saved into a particular folder in my destination bucket.
from boto.s3.connection import S3Connection
def transfer_files():
    conn = S3Connection()
    srcBucket = conn.get_bucket("source_bucket")
    dstBucket = conn.get_bucket(bucket_name="destination_bucket")
    objectlist = srcBucket.list()
    for obj in objectlist:
        dstBucket.copy_key(obj.key, srcBucket.name, obj.key)
My srcBucket keys look like folder/subFolder/anotherSubFolder/file.txt, which when transferred land in the dstBucket as destination_bucket/folder/subFolder/anotherSubFolder/file.txt.
I would like it to end up in destination_bucket/targetFolder so the final directory structure would look like
destination_bucket/targetFolder/folder/subFolder/anotherSubFolder/file.txt
Hopefully I have explained this well enough and it makes sense
The first parameter of copy_key() is the name of the destination key.
Therefore, just use:
dstBucket.copy_key('targetFolder/' + obj.key, srcBucket.name, obj.key)
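Put together with the code from the question (still boto 2, names unchanged), the loop might look like:
from boto.s3.connection import S3Connection

def transfer_files():
    conn = S3Connection()
    srcBucket = conn.get_bucket("source_bucket")
    dstBucket = conn.get_bucket(bucket_name="destination_bucket")
    for obj in srcBucket.list():
        # Prepend the target folder so the original key structure is preserved underneath it
        dstBucket.copy_key('targetFolder/' + obj.key, srcBucket.name, obj.key)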

No file after S3 boto put

I am trying to write a file to S3 from a JSON structure in a Python 2.7 script. The code is as follows:
S3_bucket = s3.Bucket(__S3_BUCKET__)
result = S3_bucket.put_object(Key=__S3_BUCKET_PATH__ + 'file_prefix_' + str(int(time.time())) + '.json',
                              Body=str(json.dumps(dict_list)).encode("utf-8"))
I end up with the S3 bucket handle, which is
s3.Bucket(name='bucket_name')
The S3 file path is /file_prefix_1545039898.json, and the result is
{'statusCode': s3.Object(bucket_name='bucket_name', key='/file_prefix_1545039898.json')}
But I see nothing in S3 - no files were created. I have a suspicion that I may need a commit of some kind, but all the manuals I came across say otherwise. Has anyone had a problem like this?
Apparently, the leading slash does not act as a standard path separator - it creates a directory with an empty name, which is not visible in the console. Removing it puts things where they belong.
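In other words, something along these lines (variable names are the ones from the question) should produce a visible object:
# Strip any leading '/' from the key so S3 doesn't create an empty-named 'folder'
key = (__S3_BUCKET_PATH__ + 'file_prefix_' + str(int(time.time())) + '.json').lstrip('/')
S3_bucket.put_object(Key=key, Body=str(json.dumps(dict_list)).encode('utf-8'))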

Access a amazon s3 bucket subfolder using python

I am trying to access a bucket subfolder using python's boto3.
The problem is that I cannot find anywhere how to input the subfolder information inside the boto code.
All I find is how to specify the bucket name, but I do not have access to the whole bucket, just to a specific subfolder. Can anyone shed some light on this?
What I did so far:
BUCKET = "folder/subfolder"
conn = S3Connection(AWS_KEY, AWS_SECRET)
bucket = conn.get_bucket(BUCKET)
for key in bucket.list():
print key.name.encode('utf-8')
The error messages:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied
I do not need to use boto for the operation, I just need to list/get the files inside this subfolder.
P.S.: I can access the files using Cyberduck by putting in the path folder/subfolder, which means I have access to the data.
Sincerely,
Israel
I fixed the problem using something similar to what vtl suggested:
I had to put the prefix in my bucket and a delimiter. The final code was something like this:
objects = s3.list_objects(Bucket=bucketName, Prefix=bucketPath+'/', Delimiter='/')
As he said, there's no folder structure, so you have to specify a delimiter and also append it to the Prefix like I did.
Thanks for the reply.
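For completeness, consuming that response might look like this (same variable names as above; Contents holds the object entries under the prefix):
objects = s3.list_objects(Bucket=bucketName, Prefix=bucketPath + '/', Delimiter='/')
for entry in objects.get('Contents', []):
    print(entry['Key'])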
Try:
for obj in bucket.objects.filter(Prefix="your_subfolder"):
do_something()
AWS doesn't actually have a directory structure - it just fakes one by putting "/"s in names. The Prefix option restricts the search to all objects whose name starts with the given prefix, which should be your "subfolder".
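A slightly fuller boto3 sketch (bucket and subfolder names are placeholders) that lists only the keys under the subfolder, without ever calling ListBuckets:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')            # the bucket itself, not 'folder/subfolder'
# Only keys starting with the prefix are returned; no account-wide ListBuckets call is made
for obj in bucket.objects.filter(Prefix='folder/subfolder/'):
    print(obj.key)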
