Adding an S3 bucket folder's address to an output bucket in Python

How do I link my S3 bucket folder to the output bucket in Python?
I tried several permutations and combinations, but none of them worked. All I need is to link my folder address to the output bucket in Python.
I got an error when I tried the combinations below:
output Bucket = "s3-bucket.folder-name"
output Bucket = "s3-bucket/folder-name/"
output Bucket = "s3-bucket\folder-name\"
None of the above worked; they all throw an error like this:
Parameter validation failed:
Invalid bucket name "s3-bucket/folder-name/": Bucket name must match the
regex "^[a-zA-Z0-9.\-_]{1,255}$"
Is there an alternate way to put the folder address into the Python script?
Please help!

In AWS, the "folder", or object path, is part of the object key.
So when accessing a bucket you specify the bucket name, which is strictly the bucket name and nothing else, and the object key carries the rest of the path (the "folders" plus the file name).
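For example, here is a minimal upload sketch using the names from the question (the key output.json is a hypothetical file name for illustration); note the Bucket parameter gets only the bucket name and the folder prefix moves into the Key:
import boto3

s3 = boto3.client('s3')

# The bucket name contains no slashes; the "folder" is part of the key.
s3.put_object(
    Bucket='s3-bucket',                # bucket name only
    Key='folder-name/output.json',     # hypothetical key with the folder prefix
    Body=b'{"status": "ok"}',
)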

Related

Decoding s3 file names having special characters

While writing Lambda code to copy a file from a source bucket to a destination bucket, I'm facing an issue. The CloudWatch logs show the file name as
insidesource/newsource/test/Active%3D1/devopsnotes.txt
whereas my actual folder name contains Active=1.
Please find the CloudWatch logs and the Lambda code below.
I need assistance on how to decode this particular file name from the CloudWatch logs.
The S3 object key names are URL-encoded in CloudWatch logs.
Check out Characters that might require special handling
for S3 object names.
You can use unquote_plus() from urllib.parse to get the decoded file name:
from urllib.parse import unquote_plus
file_name = "insidesource/newsource/test/Active%3D1/devopsnotes.txt"
print(unquote_plus(file_name))
Output:
insidesource/newsource/test/Active=1/devopsnotes.txt
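If the key is coming from an S3 event notification rather than a hard-coded string, the same decoding applies inside the handler. A minimal sketch, assuming the standard S3 event payload shape:
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    for record in event['Records']:
        raw_key = record['s3']['object']['key']  # URL-encoded, e.g. Active%3D1
        key = unquote_plus(raw_key)              # decoded, e.g. Active=1
        print(key)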

Read content of a file located under subfolders of S3 in Python

I'm trying to read a file's content (not download it) from an S3 bucket. The problem is that the file is located under a multi-level folder. For instance, the full path could be s3://s3-bucket/folder-1/folder-2/my_file.json. How can I get that specific file instead of using my iterative approach that lists all objects?
Here is the code that I want to change:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('s3-bucket')
for obj in bucket.objects.all():
    key = obj.key
    if key == 'folder-1/folder-2/my_file.json':
        return obj.get()['Body'].read()
Can it be done in a simpler, more direct way?
Yes - there is no need to enumerate the bucket.
Read the file directly using s3.Object, passing the bucket name as the first parameter and the object key as the second parameter.
"Folders" don't really exist in S3 - Amazon S3 doesn't use a hierarchy to organize its objects. For organizational simplicity, the Amazon S3 console shows "folders" as a way of grouping objects, but the folder path is ultimately just baked into your object key.
This should work:
import boto3
s3 = boto3.resource('s3')
obj = s3.Object("s3-bucket", "folder-1/folder-2/my_file.json")
body = obj.get()['Body'].read()
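If you prefer the low-level client over the resource interface, an equivalent sketch (same assumed bucket name and key from the question) would be:
import boto3

s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket='s3-bucket', Key='folder-1/folder-2/my_file.json')
body = response['Body'].read()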

Create a new bucket inside a bucket in Google Cloud Storage using Python

I need to create a folder inside a folder in Google Cloud Storage using Python.
I know how to create one folder:
from google.cloud import storage

storage_client = storage.Client()

bucket_name = 'data_bucket_10_07'
# create a new bucket
bucket = storage_client.bucket(bucket_name)
bucket.storage_class = 'COLDLINE'  # Archive | Nearline | Standard
bucket.location = 'US'
bucket = storage_client.create_bucket(bucket)  # returns Bucket object
my_bucket = storage_client.get_bucket(bucket_name)
When I try to change bucket_name = 'data_bucket_10_07' to bucket_name = 'data_bucket_10_07/data_bucket_10_07_1', I get an error:
google.api_core.exceptions.BadRequest: 400 POST https://storage.googleapis.com/storage/v1/b?project=effective-forge-317205&prettyPrint=false: Invalid bucket name: 'data_bucket_10_07/data_bucket_10_07_1'
How should I solve my problem?
As John mentioned in the comments, it is not possible to have a bucket inside a bucket.
See the Bucket naming guidelines for the documentation details.
In a nutshell:
There is only one level of buckets in a global namespace (thus a bucket name has to be globally unique). Everything beyond the bucket name belongs to the object name.
For example, you can create a bucket (let's assume the name is not already in use) called data_bucket_10_07. In that case it looks like gs://data_bucket_10_07.
Then you probably would like to store some objects (files) in such a way that it looks like a directory hierarchy, so let's say there are a /01/data.csv object and a /02/data.csv object, where 01 and 02 presumably reflect some dates.
Those /01/ and /02/ elements are simply the beginning parts of the object names (in other words, prefixes for the objects).
So far the bucket name is gs://data_bucket_10_07
The object names are /01/data.csv and /02/data.csv
I would suggest checking the Object naming guidelines documentation, where these ideas are described much better than I can in one sentence.
The other answers do a great job of explaining that nested buckets are not possible, but they only briefly hint at the following: GCS does not rely on folders, and only presents contents with a hierarchical structure in the web UI for ease of use.
From the documentation:
Cloud Storage operates with a flat namespace, which means that folders don't actually exist within Cloud Storage. If you create an object named folder1/file.txt in the bucket your-bucket, the path to the object is your-bucket/folder1/file.txt. There is no folder1 folder, just a single object with folder1 as part of its name.
So if you'd like to create a "folder" for organization and immediately place an object in it, name your object with the "folders" ahead of the name, and GCS will take care of 'creating' them if they don't already exist.
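As a concrete sketch (assuming the bucket data_bucket_10_07 from the question already exists; the nested name and placeholder file are illustrative only), creating such a prefixed object looks like this:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('data_bucket_10_07')

# There is no real folder: the prefix is simply part of the object name.
blob = bucket.blob('data_bucket_10_07_1/placeholder.txt')  # hypothetical object name
blob.upload_from_string('')  # the console will now display a "folder"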

How to check whether a given directory or folder exists in a given S3 bucket, and if it exists, how to delete the folder from S3?

I want to check whether a folder or directory exists in a given S3 bucket, and if it exists, I want to delete the folder from the S3 bucket using Python code.
For example: s3://bucket124/test
Here "bucket124" is the bucket and "test" is a folder containing some files like test.txt and test1.txt.
I want to delete the folder "test" from my S3 bucket.
Here is how you can do that:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mausamrest')
obj = s3.Object('mausamrest', 'test/hello')

counter = 0
for key in bucket.objects.filter(Prefix='test/hello/'):
    counter = counter + 1

if counter != 0:
    obj.delete()

print(counter)
mausamrest is the bucket and test/hello/ is the "directory" you want to check for items. Take care of one thing, though: after checking, you have to delete test/hello rather than test/hello/ to delete that particular subfolder placeholder, which is why the key name passed to s3.Object is test/hello.
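If the goal is to remove the whole "folder" from the question (every object under the test/ prefix in bucket124), a sketch using the collection's batch delete would be:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket124')  # bucket name from the question

# Deleting every object whose key starts with the prefix removes the "folder".
# Double-check the prefix before running a bulk delete like this.
bucket.objects.filter(Prefix='test/').delete()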

Access a amazon s3 bucket subfolder using python

I am trying to access a bucket subfolder using python's boto3.
The problem is that I cannot find anywhere how to input the subfolder information inside the boto code.
All I find is how to specify the bucket name, but I do not have access to the whole bucket, just to a specific subfolder. Can anyone shed some light on this?
What I did so far:
BUCKET = "folder/subfolder"
conn = S3Connection(AWS_KEY, AWS_SECRET)
bucket = conn.get_bucket(BUCKET)
for key in bucket.list():
print key.name.encode('utf-8')
The error message:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied
I do not need to use boto for the operation; I just need to list/get the files inside this subfolder.
P.S.: I can access the files using Cyberduck by putting in the path folder/subfolder, which means I have access to the data.
Sincerely,
Israel
I fixed the problem using something similar to what vtl suggested:
I had to put the prefix in my call and a delimiter. The final code was something like this:
objects = s3.list_objects(Bucket=bucketName, Prefix=bucketPath + '/', Delimiter='/')
As he said, there's no folder structure, so you have to state a delimiter and also append it to the Prefix, as I did.
Thanks for the reply.
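A fuller sketch of that approach (the bucket and path names are placeholders, and the client setup is assumed):
import boto3

s3 = boto3.client('s3')
response = s3.list_objects(
    Bucket='my-bucket',            # placeholder bucket name
    Prefix='folder/subfolder/',    # note the trailing delimiter
    Delimiter='/',
)
for item in response.get('Contents', []):
    print(item['Key'])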
Try:
for obj in bucket.objects.filter(Prefix="your_subfolder"):
    do_something()
AWS doesn't actually have a directory structure - it just fakes one by putting "/"s in names. The Prefix option restricts the search to all objects whose name starts with the given prefix, which should be your "subfolder".
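Putting that together with the resource setup (the bucket and subfolder names here are just the placeholders from the question):
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')    # the actual bucket name only, no slashes
for obj in bucket.objects.filter(Prefix='subfolder/'):
    print(obj.key)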
