Access an Amazon S3 bucket subfolder using Python

I am trying to access a bucket subfolder using python's boto3.
The problem is that I cannot find anywhere how to input the subfolder information inside the boto code.
All I find is how to supply the bucket name, but I do not have access to the whole bucket, just to a specific subfolder. Can anyone shed some light on this?
What I did so far:
BUCKET = "folder/subfolder"
conn = S3Connection(AWS_KEY, AWS_SECRET)
bucket = conn.get_bucket(BUCKET)
for key in bucket.list():
    print key.name.encode('utf-8')
The error messages:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied
I do not need to use boto for the operation, I just need to list/get the files inside this subfolder.
P.S.: I can access the files using Cyberduck by putting in the path folder/subfolder, which means I have access to the data.
Sincerely,
Israel

I fixed the problem using something similar to what vtl suggested:
I had to put the prefix in my bucket and a delimiter. The final code was something like this:
objects = s3.list_objects(Bucket=bucketName, Prefix=bucketPath+'/', Delimiter='/')
As he said, there's no folder structure, so you have to state a delimiter and also append it to the Prefix like I did.
Thanks for the reply.
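For reference, here is a minimal sketch of the same approach using the boto3 client's list_objects_v2 (the newer variant of the call above); the bucket name and prefix are placeholders:

import boto3

s3 = boto3.client('s3')

# Bucket takes only the bucket name; the "subfolder" goes into Prefix
response = s3.list_objects_v2(
    Bucket='my-bucket',
    Prefix='folder/subfolder/',
    Delimiter='/'
)

# Objects directly under the prefix
for obj in response.get('Contents', []):
    print(obj['Key'])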

Try:
for obj in bucket.objects.filter(Prefix="your_subfolder"):
    do_something()
AWS doesn't actually have a directory structure - it just fakes one by putting "/"s in names. The Prefix option restricts the search to all objects whose name starts with the given prefix, which should be your "subfolder".
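A self-contained sketch of that resource-based approach, with placeholder names for the bucket and subfolder:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')          # bucket name only, no slashes

# Prefix restricts the listing to keys that start with "your_subfolder/"
for obj in bucket.objects.filter(Prefix='your_subfolder/'):
    print(obj.key)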

Related

Trying to determine if an S3 path exists in code

So I am creating a Glue job and part of it is to check whether a path exists in S3. Imagine I had a path like so:
s3://my-bucket/level0/level1/level2 (etc)
Using Variables:
varBucket = "my-bucket"
varKey = "level0/"
Then this code works like so:
import boto3
from botocore.errorfactory import ClientError

s3 = boto3.client('s3')
try:
    s3.head_object(Bucket=varBucket, Key=varKey)
    print("Path Exists")
except ClientError:
    print("Path Does Not Exist")
    pass
I get the print output "Path Exists".
BUT if I change the Key to this:
varKey="level0/level1/"
Then I get the print that "Path Does Not Exist", even though I know the path exists; I can go there in S3.
It's almost as if I can only go one level deep with the key, but as soon as I try going to the next level and beyond, an exception happens. Any ideas where I am going wrong?
There are no "directories" in S3, only prefixes.
If you have a file s3://my-bucket/level0/level1/level2/file.dat, there is no "directory" object by the name of level0/level1/.
You can use the list objects call with a Prefix= to filter objects whose key (e.g. level0/level1/level2/file.dat) starts with such a prefix, but trying to HeadObject or GetObject on a prefix will not work.
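A hedged sketch of that difference, with placeholder bucket and prefix names: head_object only succeeds if an object with that exact key exists, while a filtered listing answers whether anything lives under the prefix.

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
bucket = 'my-bucket'        # placeholder
prefix = 'level0/level1/'   # placeholder

# Fails unless a zero-byte placeholder object named exactly 'level0/level1/' exists
try:
    s3.head_object(Bucket=bucket, Key=prefix)
    print("An object with that exact key exists")
except ClientError:
    print("No object with that exact key")

# Listing with the prefix works regardless of placeholder objects
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
print("Path exists" if resp.get('KeyCount', 0) > 0 else "Path does not exist")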
OK so I got something working - maybe quite clunky but tested to work OK:
for my_bucket_object in s3.Bucket(varBucket).objects.filter(Prefix=varKey):
    if varKey in my_bucket_object.key:
        ## Do Stuff
        break
Basically, if the key exists in the bucket, it will do stuff. In my case, that means adding the full S3 URI to an array for later use.
Directories magically appear in S3 if there are files in that path. They then magically disappear if there are no files there.
If you want to know if they 'exist', then call:
list_objects(Bucket='your-bucket', Delimiter='/')
and look at the list of CommonPrefixes that are returned. They are the equivalent of directories.
To see directories at a deeper level, also specify a Prefix, eg:
list_objects(Bucket='your-bucket', Delimiter='/', Prefix='level0/level1/')
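A minimal sketch of that call and how to read CommonPrefixes; the bucket and prefix names are placeholders:

import boto3

s3 = boto3.client('s3')

resp = s3.list_objects(Bucket='your-bucket', Delimiter='/', Prefix='level0/level1/')

# Each CommonPrefix is the equivalent of a subdirectory one level down
for cp in resp.get('CommonPrefixes', []):
    print(cp['Prefix'])    # e.g. level0/level1/level2/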

Create a new bucket inside a bucket in Google Cloud Storage using Python

I need to create a folder inside a folder in Google Cloud Storage using Python.
I know how to create one folder:
bucket_name = 'data_bucket_10_07'
# create a new bucket
bucket = storage_client.bucket(bucket_name)
bucket.storage_class = 'COLDLINE' # Archive | Nearline | Standard
bucket.location = 'US' # Taiwan
bucket = storage_client.create_bucket(bucket) # returns Bucket object
my_bucket = storage_client.get_bucket(bucket_name)
When I try to change bucket_name = 'data_bucket_10_07' to bucket_name = 'data_bucket_10_07/data_bucket_10_07_1', I get an error:
google.api_core.exceptions.BadRequest: 400 POST https://storage.googleapis.com/storage/v1/b?project=effective-forge-317205&prettyPrint=false: Invalid bucket name: 'data_bucket_10_07/data_bucket_10_07_1'
How should I solve my problem?
As John mentioned in the comment, it may not be ontologically possible to have a bucket inside a bucket.
See Bucket naming guidelines for documentation details.
In a nutshell:
There is only one level of buckets in a global namespace (thus the bucket name has to be globally unique). Everything beyond the bucket name belongs to the object name.
For example, you can create a bucket (let's assume the name is not already in use) like data_bucket_10_07. In that case, it may look like gs://data_bucket_10_07
Then you probably would like to store some objects (files) in such a way that it looks like a directory hierarchy, so let's say there are a /01/data.csv object and a /02/data.csv object, where 01 and 02 presumably reflect some date.
Those /01/ and /02/ elements are essentially the beginning parts of the object names (in other words, prefixes for the objects).
So far the bucket name is gs://data_bucket_10_07
The object names are /01/data.csv and /02/data.csv
I would suggest checking the Object naming guidelines documentation, where those ideas are described much better than I can do in one sentence.
Other answers do a great job of detailing that nested buckets are not possible, but they only hint at the following: GCS does not rely on folders, and only presents contents with a hierarchical structure in the web UI for ease of use.
From the documentation:
Cloud Storage operates with a flat namespace, which means that folders don't actually exist within Cloud Storage. If you create an object named folder1/file.txt in the bucket your-bucket, the path to the object is your-bucket/folder1/file.txt. There is no folder1 folder, just a single object with folder1 as part of its name.
So if you'd like to create a "folder" for organization and immediately place an object in it, name your object with the "folders" ahead of the name, and GCS will take care of 'creating' them if they don't already exist.
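A minimal sketch of that idea with the google-cloud-storage client, assuming the bucket data_bucket_10_07 already exists; the object name below is a placeholder:

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket('data_bucket_10_07')

# There is no separate "create folder" call: uploading an object whose name
# contains slashes is enough, and the UI renders the prefix as a folder.
blob = bucket.blob('data_bucket_10_07_1/data.csv')
blob.upload_from_string('col1,col2\n1,2\n')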

No file after S3 boto put

I am trying to write a file to S3 from a JSON structure in a Python 2.7 script. The code is as follows:
S3_bucket = s3.Bucket(__S3_BUCKET__)
result = S3_bucket.put_object(Key=__S3_BUCKET_PATH__ + 'file_prefix_' + str(int(time.time()))+'.json', Body = str(json.dumps(dict_list)).encode("utf-8"))
The S3 bucket handle I end up with is:
s3.Bucket(name='bucket_name')
S3 file path is /file_prefix_1545039898.json
{'statusCode': s3.Object(bucket_name='bucket_name', key='/file_prefix_1545039898.json')}
But I see nothing on S3; no files were created. I have a suspicion that I may need a commit of some kind, but all the manuals I came across say otherwise. Has anyone had a problem like this?
Apparently, the leading slash does not act as a standard path designator; it creates a directory with an empty name, which is not visible. Removing it puts things where they belong.
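A minimal sketch of the fixed call, with placeholder bucket and prefix names and no leading slash in the key:

import json
import time
import boto3

s3 = boto3.resource('s3')
S3_bucket = s3.Bucket('bucket_name')

dict_list = [{'a': 1}, {'b': 2}]
key = 'some/prefix/file_prefix_' + str(int(time.time())) + '.json'   # note: no leading '/'

S3_bucket.put_object(Key=key, Body=json.dumps(dict_list).encode('utf-8'))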

Adding an S3 bucket folder's address to the output bucket in Python

How do I link my S3 bucket folder to the output bucket in Python?
I tried several permutations and combinations, but it still didn't work out. All I need is to link my folder address to the output bucket in Python.
I got an error when I tried the combinations below:
output Bucket = "s3-bucket.folder-name"
output Bucket = "s3-bucket/folder-name/"
output Bucket = "s3-bucket\folder-name\"
None of the above worked; each throws an error like this:
Parameter validation failed:
Invalid bucket name "s3-bucket/folder-name/": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"
Is there any alternative way to put the folder address into the Python script?
Please help!
In AWS, the "folder", or the object file path, is all part of the object key.
So when accessing a bucket you specify the bucket name, which is strictly the bucket name with nothing else, and then the object key would be the file path.
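A minimal sketch of that split, with placeholder names; the Bucket parameter carries only the bucket name, and the "folder" lives in the key:

import boto3

s3 = boto3.client('s3')

s3.put_object(
    Bucket='s3-bucket',               # bucket name only
    Key='folder-name/output.json',    # the "folder" is just part of the key
    Body=b'{"result": "ok"}'
)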

How to get objects from a folder in an S3 bucket

I am trying to traverse all objects inside a specific folder in my S3 bucket. The code I already have is as follows:
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')
for obj in bucket.objects.filter(Prefix='folder/'):
    do_stuff(obj)
I need to use boto3.resource and not client. This code is not getting any objects at all although I have a bunch of text files in the folder. Can someone advise?
Try adding the Delimiter attribute, Delimiter='/', since you are filtering objects. The rest of the code looks fine.
I had to make sure to skip the first "file". For some reason the folder name itself comes back as the first object, and that may not be what you want.
for video_item in source_bucket.objects.filter(Prefix="my-folder-name/", Delimiter='/'):
    if video_item.key == 'my-folder-name/':
        continue
    do_something(video_item.key)
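A slightly more general variant of the same idea, as a hedged sketch with placeholder names: skip any zero-byte key ending in '/' (the placeholder objects the console creates for folders) instead of comparing against one hard-coded folder name.

import boto3

s3 = boto3.resource('s3')
source_bucket = s3.Bucket('bucket-name')    # placeholder

for obj in source_bucket.objects.filter(Prefix='my-folder-name/'):
    if obj.key.endswith('/') and obj.size == 0:
        continue    # skip folder placeholder objects
    print(obj.key)  # or pass obj.key to your own handler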
