boto3 put_object with new key python - python

I want to save a csv file ("test.csv") in S3 using boto3.
my bucket is "outputS3Bucket" and the key is "folder/newFolder".
I want to check if "newFolder" exists and if not to create it.
import boto3
client = boto3.client('s3')
s3 = boto3.resource('s3')
bucket = s3.Bucket("outputS3Bucket")
result = client.list_objects(Bucket='outputS3Bucket',Prefix="folder/newFolder")
if len(result)==0:
key = bucket.new_key("folder/newFolder")
newKey = key + "/" + "test.csv"
client.put_object(Bucket="outputS3Bucket", Key=newKey, Body=content)
# put_object path: 's3://outputS3Bucket/folder/newFolder/test.csv'
I have few problems:
if I don't write the full key name (such as "folder/ne") and there is a "neaFo" folder instead it still says it exists.
key = bucket.new_key("folder/newFolder")
AttributeError: 's3.Bucket' object has no attribute 'new_key'

Firstly, according to boto3 documentation, it's preferred to use the new API method - list_objects_v2() instead to list a bucket's objects.
I suggest using a simple boolean function to check whether a folder exist (makes your code cleaner and more readable).
for question 1, you can check if the prefix ends with '/' character and append it if not, - this will make sure your are looking for EXACT match and not Starts With.
Sample Function:
def bucket_folder_exists(client, bucket, path_prefix):
# make path_prefix exact match and not path/to/folder*
if list(path_prefix)[-1] is not '/':
path_prefix += '/'
# check if 'Contents' key exist in response dict - if it exist it indicate the folder exists, otherwise response will be None
response = client.list_objects_v2(Bucket=bucket, Prefix=path_prefix).get('Contents')
if response:
return True
return False
Sample Implementation:
if bucket_folder_exists(client, 'outputS3Bucket', 'folder/newFolder'):
pass # Do something if folder already exist
else:
pass # Do something if folder does not exist
Regarding your second question, I added a comment - it seems your code mentions bucket variable\object used as key = bucket.new_key("folder/newFolder"), however bucket is not set anywhere in your code, -> according to the error you are getting, it looks like a s3.Bucket object, which doesn't have the the new_key attribute defined.

Related

List all directories and secrets (recursively) in Vault

I'm writing a method in Python that takes in an engine name, and lists all of the sub directories and secrets in the directory. I've been playing around with hvac and I've been able to list all of the secrets within a specific directory using the following:
client = hvac.Client()
client = hvac.Client(
url=os.environ['VAULT_URL'],
token=os.environ['VAULT_TOKEN']
)
path='myDirectory'
mount_point='myEngine'
response=client.secrets.kv.read_secret_version(
path=path,
mount_point=mount_point
)
print(response['data']['data'])
As said, this does successfully work, and it does output whats inside the path specified, but if I want to list everything inside of the mount_point, I've found answers that say I should use list_secrets, but I can't seem to get list_secrets to work without specifying a path as well.
I've tried the following with very little success.
mount_point = 'myEngine'
list_response = client.secrets.kv.v2.list_secrets(
path=mount_point
)
list_folders = list_response['data']['keys']
print(list_folders)
This obviously doesn't work, but it does work if I give it both the path and the mount_point, but that gives me the contents of the path, and I need everything in the engine.
I know giving it the mount_point for the path seems odd, but I couldn't really think of what else to do to list out the entire mount_point and I'd seen old examples that looked similar to that.
Is there a way to get the output in JSON like I'm wanting, or even a way to just list everything inside of the engine itself (recursively) and then build the json myself would be great.
You can use the client.adapter.request() function to achieve what your looking for. Pass the mount name into the following call:
client.adapter.request("GET", "v1/<mount name>/metadata/?list=1")
That will return JSON that you can parse to get the keys your looking for. Sample JSON below:
{'request_id': '16fedd1b-50b3-f46a-44c7-22d0a0c8edef', 'lease_id': '', 'renewable': False, 'lease_duration': 0, 'data': {'keys': ['test', 'test2']}, 'wrap_info': None, 'warnings': None, 'auth': None}
The output above corresponds to the following view in the Vault UI:
Vault UI view of keys
You can use the following to recursively list all the secrets using hvac.
def list_secrets(client, path=None, mount_path=None):
list_response = client.secrets.kv.v2.list_secrets(
path=path,
mount_point=mount_path
)
return list_response
def list_all_secrets(client, global_path_list, path=None, mount_path=None):
if mount_path is None:
pass
if path is None:
try:
response = list_secrets(client, mount_path=mount_path)
except VaultError:
return
else:
response = list_secrets(client, mount_path=mount_path, path=path)
keys = response['data']['keys']
for key in keys:
if key[-1] == "/":
if path is not None:
_path = path + key
else:
_path = key
list_all_secrets(client, global_path_list, mount_path=mount_path, path=_path)
else:
_temp = path + key
if _temp is None:
pass
else:
global_path_list.append(_temp)
return global_path_list
global_path_list = []
secrets_list = list_all_secrets(client, global_path_list, 'myDirectory/', 'myEngine/')
print(secrets_list)

InvalidS3ObjectException when calling the AnalyzeDocument operation:

InvalidS3ObjectException when calling the AnalyzeDocument operation: Unable to get object metadata from S3. Check object key, region and/or access permissions."
I keep getting this error. Over. And. Over. This program worked with my test cases of what I'm bringing in, the json with a {"body":"imagename.jpg"}. But the very moment I try to utilize the actual code my JS brings in, I get this error. The thing that confuses me is that I've checked the regions and they are fine. I went into my account and created users with full access to all AWS and S3 features, and utilized those logins, I've used my root account, everything. All I'm trying to do is access an image from my s3 bucket. Why won't it work? Below is my code. It works if I utilize the test case I provided above, but the moment I try and use the website it's connected to, it doesn't work.
def main(event, context):
key_map, value_map, block_map = get_kv_map(event) #Take map variables in to get the key and value map we need.
It goes to this function...
def get_kv_map(event):
filePath = event
fileExt = filePath.get('body')
s3 = boto3.resource('s3', region_name='us-east-1')
bucket = s3.Bucket('react-images-ex')
obj = bucket.Object(bucket)
client = boto3.client('textract') #We utilize boto3's textract lib
response = client.analyze_document(Document={'S3Object': {'Bucket': 'react-images-ex', 'Name': fileExt}}, FeatureTypes=['FORMS'])
# Get the text blocks
blocks=response['Blocks'] #We make a blocks variable that will be the blocks we find in the document
# get key and value maps
key_map = {}
value_map = {}
block_map = {}
for block in blocks: #Traverse the blocks found in the document
block_id = block['Id'] #Set variable for blockId to the Id's found on that block location
block_map[block_id] = block #Make the block map at that ID be the block variable
if block['BlockType'] == "KEY_VALUE_SET": #if we see that the type of block we're on is a key and value set pair, we check if it's a key or not. If it's not a key, we know it's a value. We send it to the respective map.
if 'KEY' in block['EntityTypes']:
key_map[block_id] = block
else:
value_map[block_id] = block
return key_map, value_map, block_map #Return the maps we need after they're filled.
I have been told before this code is fine, and it should work. So why exactly is it that I get this error?
Based on the comments.
The issue with body was that it was json string, not actual json object.
The solution was to parse the string into json:
fileExt = json.loads(filePath.get('body'))
Try awscli to see if you can access the image in s3:
aws s3 ls s3://react-images-ex/<some-fileExt>
Either you are parsing the fileExt wrongly, or you don't have S3 permission to access the file. The awscli command will help to verify this.

How do I get the size of a boto3 Collection?

The way I have been using is to transform the Collection into a List and query the length:
s3 = boto3.resource('s3')
bucket = s3.Bucket('my_bucket')
size = len(list(bucket.objects.all()))
However, this forces resolution of the whole collection and obviates the benefits of using a Collection in the first place. Is there a better way to do this?
There is no way to get the count of keys in a bucket without listing all the objects this is a limitation of AWS S3 (see https://forums.aws.amazon.com/thread.jspa?messageID=164220).
Getting the Object Summaries (HEAD) doesn't get the actual data so should be a relatively inexpensive operation and if you are just discarding the list then you could do:
size = sum(1 for _ in bucket.objects.all())
Which will give you the number of objects without constructing a list.
Borrowing from a similar question, one option to retrieve the complete list of object keys from a bucket + prefix is to use recursion with the list_objects_v2 method.
This method will recursively retrieve the list of object keys, 1000 keys at a time.
Each request to list_objects_v2 uses the StartAfter argument to continue listing keys after the last key from the previous request.
import boto3
if __name__ == '__main__':
client = boto3.client('s3',
aws_access_key_id = 'access_key',
aws_secret_access_key = 'secret_key'
)
def get_all_object_keys(bucket, prefix, start_after = '', keys = []):
response = client.list_objects_v2(
Bucket = bucket,
Prefix = prefix,
StartAfter = start_after
)
if 'Contents' not in response:
return keys
key_list = response['Contents']
last_key = key_list[-1]['Key']
keys.extend(key_list)
return get_all_object_keys(bucket, prefix, last_key, keys)
object_keys = get_all_object_keys('your_bucket', 'prefix/to/files')
print(len(object_keys))
For my use case, I just needed to know whether the folder is empty or not.
s3 = boto3.client('s3')
response = s3.list_objects(
Bucket='your-bucket',
Prefix='path/to/your/folder/',
)
print(len(response['Contents']))
This was enough to know whether the folder is empty. Note that a folder, if manually created in the S3 console, can count as a resource itself. In this case, if the length shown above is greater than 1, then the S3 "folder" is not empty.

Unable to set file content type in S3

How do you set content type on a file in a webhosting-enabled S3 account via the Python boto module?
I'm doing:
from boto.s3.connection import S3Connection
from boto.s3.key import Key
from boto.cloudfront import CloudFrontConnection
conn = S3Connection(access_key_id, secret_access_key)
bucket = conn.create_bucket('mybucket')
b = conn.get_bucket(bucket)
b.set_acl('public-read')
fn = 'index.html'
template = '<html>blah</html>'
k = Key(b)
k.key = fn
k.set_contents_from_string(template)
k.set_acl('public-read')
k.set_metadata('Content-Type', 'text/html')
However, when I access it from http://mybucket.s3-website-us-east-1.amazonaws.com/index.html my browser prompts me to download the file instead of simply serving it as a webpage.
Looking at the metadata in the S3 Management console shows the Content-Type has actually been set to "application/octet-stream". If I manually change it in the console, I can access the page normally, but if I run my script again, it resets it back to the wrong content type.
What am I doing wrong?
The set_metadata method is really for setting user metadata on S3 objects. Many of the standard HTTP metadata fields have first class attributes to represent them, e.g. content_type. Also, you want to set the metadata before you actually send the object to S3. Something like this should work:
import boto
conn = boto.connect_s3()
bucket = conn.get_bucket('mybucket') # Assumes bucket already exists
key = bucket.new_key('mykey')
key.content_type = 'text/html'
key.set_contents_from_string(mystring, policy='public-read')
Note that you can set canned ACL policies at the time you write the object to S3 which saves having to make another API call.
For people who need one-liner for this,
import boto3
s3 = boto3.resource('s3')
s3.Bucket('bucketName').put_object(Key='keyName', Body='content or fileData', ContentType='contentType', ACL='check below')
Supported ACL values:
'private'|'public-read'|'public-read-write'|'authenticated-read'|'aws-exec-read'|'bucket-owner-read'|'bucket-owner-full-control'
Arguments supported by put_object can be found here, https://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.put_object
I wasn't able to get the above solution to actually persist my metadata changes.
Perhaps because I was using a file and it was resetting the content type using mimetype? Also I am uploading m3u8 and ts files for HLS encoding so that could interfere as well.
Anyway, here's what worked for me.
import boto
conn = boto.connect_s3()
bucket = conn.get_bucket('mybucket')
key_m3u8 = Key(bucket_handle)
key_m3u8.key = s3folder+"/"+s3keyname
key_m3u8.metadata = {"Content-Type":"application/x-mpegURL","Cache-Control":"public,max-age=8"}
key_m3u8.set_contents_from_filename("path_to_my_file", policy="public-read")
If you use AWS S3 Bitbucket Pipelines Python add the parameter content_type:
s3_upload.py
def upload_to_s3(bucket, artefact, bucket_key, content_type):
...
def main():
...
parser.add_argument("content_type", help="Content Type File")
...
if not upload_to_s3(args.bucket, args.artefact, args.bucket_key, args.content_type):
then modify bitbucket-pipelines.yml as follow:
...
- python s3_upload.py bucket_name file key content_type
...
Where content_type param can be one of the following: MIME types (IANA media types)

Amazon S3 boto - how to delete folder?

I created a folder in s3 named "test" and I pushed "test_1.jpg", "test_2.jpg" into "test".
How can I use boto to delete folder "test"?
Here is 2018 (almost 2019) version:
s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
bucket.objects.filter(Prefix="myprefix/").delete()
There are no folders in S3. Instead, the keys form a flat namespace. However a key with slashes in its name shows specially in some programs, including the AWS console (see for example Amazon S3 boto - how to create a folder?).
Instead of deleting "a directory", you can (and have to) list files by prefix and delete. In essence:
for key in bucket.list(prefix='your/directory/'):
key.delete()
However the other accomplished answers on this page feature more efficient approaches.
Notice that the prefix is just searched using dummy string search. If the prefix were your/directory, that is, without the trailing slash appended, the program would also happily delete your/directory-that-you-wanted-to-remove-is-definitely-not-t‌​his-one.
For more information, see S3 boto list keys sometimes returns directory key.
I feel that it's been a while and boto3 has a few different ways of accomplishing this goal. This assumes you want to delete the test "folder" and all of its objects Here is one way:
s3 = boto3.resource('s3')
objects_to_delete = s3.meta.client.list_objects(Bucket="MyBucket", Prefix="myfolder/test/")
delete_keys = {'Objects' : []}
delete_keys['Objects'] = [{'Key' : k} for k in [obj['Key'] for obj in objects_to_delete.get('Contents', [])]]
s3.meta.client.delete_objects(Bucket="MyBucket", Delete=delete_keys)
This should make two requests, one to fetch the objects in the folder, the second to delete all objects in said folder.
https://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.delete_objects
A slight improvement on Patrick's solution. As you might know, both list_objects() and delete_objects() have an object limit of 1000. This is why you have to paginate listing and delete in chunks. This is pretty universal and you can give Prefix to paginator.paginate() to delete subdirectories/paths
client = boto3.client('s3', **credentials)
paginator = client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket=self.bucket_name)
delete_us = dict(Objects=[])
for item in pages.search('Contents'):
delete_us['Objects'].append(dict(Key=item['Key']))
# flush once aws limit reached
if len(delete_us['Objects']) >= 1000:
client.delete_objects(Bucket=bucket, Delete=delete_us)
delete_us = dict(Objects=[])
# flush rest
if len(delete_us['Objects']):
client.delete_objects(Bucket=bucket, Delete=delete_us)
You can use bucket.delete_keys() with a list of keys (with a large number of keys I found this to be an order of magnitude faster than using key.delete).
Something like this:
delete_key_list = []
for key in bucket.list(prefix='/your/directory/'):
delete_key_list.append(key)
if len(delete_key_list) > 100:
bucket.delete_keys(delete_key_list)
delete_key_list = []
if len(delete_key_list) > 0:
bucket.delete_keys(delete_key_list)
If versioning is enabled on the S3 bucket:
s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
bucket.object_versions.filter(Prefix="myprefix/").delete()
If one needs to filter by object contents like I did, the following is a blueprint for your logic:
def get_s3_objects_batches(s3: S3Client, **base_kwargs):
kwargs = dict(MaxKeys=1000, **base_kwargs)
while True:
response = s3.list_objects_v2(**kwargs)
# to yield each and every file: yield from response.get('Contents', [])
yield response.get('Contents', [])
if not response.get('IsTruncated'): # At the end of the list?
break
continuation_token = response.get('NextContinuationToken')
kwargs['ContinuationToken'] = continuation_token
def your_filter(b):
raise NotImplementedError()
session = boto3.session.Session(profile_name=profile_name)
s3client = session.client('s3')
for batch in get_s3_objects_batches(s3client, Bucket=bucket_name, Prefix=prefix):
to_delete = [{'Key': obj['Key']} for obj in batch if your_filter(obj)]
if to_delete:
s3client.delete_objects(Bucket=bucket_name, Delete={'Objects': to_delete})
#Deleting a Files Inside Folder S3 using boto3#
def delete_from_minio():
"""
This function is used to delete files or folder inside the another Folder
"""
try:
logger.info("Deleting from minio")
aws_access_key_id='Your_aws_acess_key'
aws_secret_access_key = 'Your_aws_Secret_key'
host = 'your_aws_endpoint'
s3 = boto3.resource('s3', aws_access_key_id=aws_access_key_id,
aws_secret_access_key=aws_secret_access_key ,
config=boto3.session.Config(signature_version='your_version'),
region_name="your_region",
endpoint_url=host ,
verify=False)
bucket = s3.Bucket('Your_bucket_name')
for obj in bucket.objects.filter(Prefix='Directory/Sub_Directory'):
s3.Object(bucket.name, obj.key).delete()
except Exception as e:
print(f"Error Occurred while deleting from the S3,{str(e)}")
Hope this Helps :)

Categories

Resources