So I am creating a glue job and part of it is to check if a path exists in s3. Imagine I had a path like so:
s3://my-bucket/level0/level1/level2 (etc)
Using Variables:
varBucket = "my-bucket"
varKey = "level0/"
Then this code works like so:
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
try:
    s3.head_object(Bucket=varBucket, Key=varKey)
    print("Path Exists")
except ClientError:
    print("Path Does Not Exist")
I get the Print Output of "Path Exists"
BUT if I change the Key to this:
varKey="level0/level1/"
Then I get the print output "Path Does Not Exist" - even though I know it does; I can go there in S3.
It's almost as if I can only go one level deep with the key, but as soon as I try going to the next level and beyond, an exception is raised. Any ideas where I am going wrong?
There are no "directories" in S3, only prefixes.
If you have a file s3://my-bucket/level0/level1/level2/file.dat, there is no "directory" object by the name of level0/level1/.
You can use the list objects call with a Prefix= to filter objects whose key (e.g. level0/level1/level2/file.dat) starts with such a prefix, but trying to HeadObject or GetObject on a prefix will not work.
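To make the existence check work at any depth, one option (a sketch, assuming a boto3 client; the bucket and prefix names are placeholders) is to list with the prefix and see whether anything comes back:

```python
def prefix_exists(s3_client, bucket, prefix):
    """Return True if at least one object key in the bucket starts with prefix."""
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return resp.get("KeyCount", 0) > 0

# Usage (client creation omitted; names are placeholders):
# import boto3
# s3 = boto3.client('s3')
# print(prefix_exists(s3, 'my-bucket', 'level0/level1/'))
```

This asks for at most one matching key, so it stays cheap even on large buckets.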
OK so I got something working - maybe a bit clunky, but tested to work OK:
# note: s3 here is a boto3 resource (boto3.resource('s3')), not the client used above
for my_bucket_object in s3.Bucket(varBucket).objects.filter(Prefix=varKey):
    if varKey in my_bucket_object.key:
        ## Do Stuff
        break
Basically, if the key exists in the bucket, it will Do Stuff - in my case, add the full S3 URI to an array for later use.
Directories magically appear in S3 if there are files in that path. They then magically disappear if there are no files there.
If you want to know if they 'exist', then call:
list_objects(Bucket='your-bucket', Delimiter='/')
and look at the list of CommonPrefixes that are returned. They are the equivalent of directories.
To see directories at a deeper level, also specify a Prefix, eg:
list_objects(Bucket='your-bucket', Delimiter='/', Prefix='level0/level1/')
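As a sketch of pulling out those CommonPrefixes (using the newer list_objects_v2 call; the bucket and prefix names are placeholders):

```python
def list_subdirs(s3_client, bucket, prefix=""):
    """Return the 'directory' names (CommonPrefixes) directly under prefix."""
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    return [cp["Prefix"] for cp in resp.get("CommonPrefixes", [])]

# Usage:
# import boto3
# s3 = boto3.client('s3')
# print(list_subdirs(s3, 'your-bucket', 'level0/level1/'))
```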
Have found many questions related to this with solutions using boto3; however, I am in a position where I have to use boto, running Python 2.38.
Now I can successfully transfer my files in their folders (not real folders, I know, as S3 doesn't have this concept), but I want them to be saved into a particular folder in my destination bucket.
from boto.s3.connection import S3Connection

def transfer_files():
    conn = S3Connection()
    srcBucket = conn.get_bucket("source_bucket")
    dstBucket = conn.get_bucket(bucket_name="destination_bucket")
    objectlist = srcBucket.list()
    for obj in objectlist:
        dstBucket.copy_key(obj.key, srcBucket.name, obj.key)
An object in srcBucket might look like folder/subFolder/anotherSubFolder/file.txt, which when transferred lands in dstBucket as destination_bucket/folder/subFolder/anotherSubFolder/file.txt.
I would like it to end up under destination_bucket/targetFolder, so the final directory structure would look like
destination_bucket/targetFolder/folder/subFolder/anotherSubFolder/file.txt
Hopefully I have explained this well enough and it makes sense
The first parameter to copy_key() is the name of the destination key.
Therefore, just use:
dstBucket.copy_key('targetFolder/' + obj.key, srcBucket.name, obj.key)
I am trying to write the file to S3 from the JSON structure in the Python 2.7 script. The code is as follows:
S3_bucket = s3.Bucket(__S3_BUCKET__)
result = S3_bucket.put_object(
    Key=__S3_BUCKET_PATH__ + 'file_prefix_' + str(int(time.time())) + '.json',
    Body=str(json.dumps(dict_list)).encode("utf-8")
)
The S3 bucket handle I end up with is
s3.Bucket(name='bucket_name')
the S3 file path is /file_prefix_1545039898.json, and the result is
{'statusCode': s3.Object(bucket_name='bucket_name', key='/file_prefix_1545039898.json')}
But I see nothing in S3 - no files were created. I have a suspicion that I may require a commit of some kind, but all the manuals I came across say otherwise. Has anyone had a problem like this?
Apparently, the leading slash does not work as a standard path separator - it creates a "directory" with an empty name, which is not normally visible. Removing the slash puts things where they belong.
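A small helper (the name is hypothetical) can guard against that accidental leading slash when building keys:

```python
def join_key(*parts):
    """Join key parts with '/', stripping stray slashes so no empty-name 'directory' is created."""
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(cleaned)
```

For example, join_key('', 'file_prefix_123.json') yields 'file_prefix_123.json' rather than '/file_prefix_123.json'.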
I am trying to access a bucket subfolder using python's boto3.
The problem is that I cannot find anywhere how to input the subfolder information inside the boto code.
All I find is how to put the bucket name, but I do not have access to the whole bucket, just to a specific subfolder. Can anyone give me a light?
What I did so far:
BUCKET = "folder/subfolder"
conn = S3Connection(AWS_KEY, AWS_SECRET)
bucket = conn.get_bucket(BUCKET)
for key in bucket.list():
    print key.name.encode('utf-8')
The error messages:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied
I do not need to use boto for the operation, I just need to list/get the files inside this subfolder.
P.S.: I can access the files using Cyberduck by putting in the path folder/subfolder, which means I have access to the data.
I fixed the problem using something similar to what vtl suggested:
I had to pass a Prefix and a Delimiter when listing the bucket. The final code was something like this:
objects = s3.list_objects(Bucket=bucketName, Prefix=bucketPath+'/', Delimiter='/')
As he said, there's no folder structure, so you have to state a delimiter and also put it after the Prefix like I did.
Thanks for the reply.
Try:
for obj in bucket.objects.filter(Prefix="your_subfolder"):
    do_something()
S3 doesn't actually have a directory structure - it just fakes one by putting "/"s in object names. The Prefix option restricts the search to all objects whose name starts with the given prefix, which should be your "subfolder".
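The server-side Prefix filter behaves like this client-side check (a toy illustration of the semantics, not the API itself):

```python
def keys_with_prefix(keys, prefix):
    """Mimic what Prefix= does server-side: keep keys that start with prefix."""
    return [k for k in keys if k.startswith(prefix)]

# e.g. keys_with_prefix(["folder/subfolder/a.txt", "other/b.txt"], "folder/subfolder")
```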
I am trying to traverse all objects inside a specific folder in my S3 bucket. The code I already have is like follows:
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')
for obj in bucket.objects.filter(Prefix='folder/'):
    do_stuff(obj)
I need to use boto3.resource and not client. This code is not getting any objects at all although I have a bunch of text files in the folder. Can someone advise?
Try adding the Delimiter attribute: Delimiter='/' as you are filtering objects. The rest of the code looks fine.
I had to make sure to skip the first entry. For some reason it treats the folder name itself as the first object, and that may not be what you want.
for video_item in source_bucket.objects.filter(Prefix="my-folder-name/", Delimiter='/'):
    if video_item.key == 'my-folder-name/':
        continue
    do_something(video_item.key)
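That zero-byte "folder marker" (an object whose key equals the prefix itself) can also be filtered out with a helper like this (the name is hypothetical):

```python
def real_keys(keys, folder_prefix):
    """Drop the zero-byte folder-marker key, i.e. the one equal to the prefix itself."""
    return [k for k in keys if k != folder_prefix]
```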
I'm developing a Python script that uploads files to a specific folder in my Drive. As I've come to notice, the Drive API provides an excellent implementation for that, but I encountered one problem: how do I delete multiple files at once?
I tried grabbing the files I want from the Drive and organizing their IDs, but no luck there... (a snippet below)
dir_id = "my folder Id"
file_id = "avoid deleting this file"
dFiles = []
query = ""
# will return a list of all the files in the folder
children = service.files().list(q="'" + dir_id + "' in parents").execute()
for i in children["items"]:
    print "appending " + i["title"]
    if i["id"] != file_id:
        # two format options I tried...
        dFiles.append(i["id"])  # will show as an array of ids ["id1", "id2", ...]
        query += i["id"] + ", "  # will show in this format "id1, id2, ..."
query = query[:-2]  # to remove the trailing ', ' in the string
# tried both the query and str(dFiles) as the argument, but no luck...
service.files().delete(fileId=query).execute()
Is it possible to delete selected files (I don't see why it wouldn't be possible, after all, it's a basic operation)?
Thanks in advance!
You can batch multiple Drive API requests together. Something like this should work using the Python API Client Library:
def delete_file(request_id, response, exception):
    if exception is not None:
        # Do something with the exception
        pass
    else:
        # Do something with the response
        pass

batch = service.new_batch_http_request(callback=delete_file)
for file in children["items"]:
    batch.add(service.files().delete(fileId=file["id"]))
batch.execute(http=http)
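For completeness: files().delete() takes a single fileId, so the comma-joined query string in the question can never work. Without batching, collecting the ids and looping would look something like this (a sketch; the names mirror the question):

```python
def ids_to_delete(items, keep_id):
    """Collect file ids from a files().list() response, excluding the one to keep."""
    return [item["id"] for item in items if item["id"] != keep_id]

# Usage:
# for fid in ids_to_delete(children["items"], file_id):
#     service.files().delete(fileId=fid).execute()
```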
If you delete or trash a folder, it will recursively delete/trash all of the files contained in that folder. Therefore, your code can be vastly simplified:
dir_id = "my folder Id"
file_id = "avoid deleting this file"
service.files().update(fileId=file_id, addParents="root", removeParents=dir_id).execute()
service.files().delete(fileId=dir_id).execute()
This will first move the file you want to keep out of the folder (and into "My Drive") and then delete the folder.
Beware: if you call delete() instead of trash(), the folder and all the files within it will be permanently deleted and there is no way to recover them! So be very careful when using this method with a folder...