InvalidS3ObjectException when calling the AnalyzeDocument operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
I keep getting this error. Over. And. Over. This program worked with my test cases of what I'm bringing in, the json with a {"body":"imagename.jpg"}. But the very moment I try to utilize the actual code my JS brings in, I get this error. The thing that confuses me is that I've checked the regions and they are fine. I went into my account and created users with full access to all AWS and S3 features, and utilized those logins, I've used my root account, everything. All I'm trying to do is access an image from my s3 bucket. Why won't it work? Below is my code. It works if I utilize the test case I provided above, but the moment I try and use the website it's connected to, it doesn't work.
def main(event, context):
    key_map, value_map, block_map = get_kv_map(event)  # Take map variables in to get the key and value map we need.
It goes to this function...
def get_kv_map(event):
    filePath = event
    fileExt = filePath.get('body')
    s3 = boto3.resource('s3', region_name='us-east-1')
    bucket = s3.Bucket('react-images-ex')
    obj = bucket.Object(bucket)
    client = boto3.client('textract')  # We utilize boto3's textract lib
    response = client.analyze_document(Document={'S3Object': {'Bucket': 'react-images-ex', 'Name': fileExt}}, FeatureTypes=['FORMS'])
    # Get the text blocks
    blocks = response['Blocks']  # We make a blocks variable that will be the blocks we find in the document
    # get key and value maps
    key_map = {}
    value_map = {}
    block_map = {}
    for block in blocks:  # Traverse the blocks found in the document
        block_id = block['Id']  # Set variable for blockId to the Id's found on that block location
        block_map[block_id] = block  # Make the block map at that ID be the block variable
        # If the block is a key/value set, check whether it's a key. If it's not a key, we know it's a value. We send it to the respective map.
        if block['BlockType'] == "KEY_VALUE_SET":
            if 'KEY' in block['EntityTypes']:
                key_map[block_id] = block
            else:
                value_map[block_id] = block
    return key_map, value_map, block_map  # Return the maps we need after they're filled.
I have been told before this code is fine, and it should work. So why exactly is it that I get this error?
Based on the comments.
The issue was that body was a JSON string, not an already parsed JSON object.
The solution was to parse the string:
fileExt = json.loads(filePath.get('body'))
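A minimal sketch of that fix, assuming the website delivers `body` as a JSON-encoded string while the console test event passes the bare file name. The helper name `extract_file_name` and the `fileName` field are hypothetical; adjust them to whatever the JavaScript actually sends.

```python
import json


def extract_file_name(event, field="fileName"):
    """Return the S3 key from a Lambda event whose 'body' may be a JSON string."""
    body = event.get("body")
    if isinstance(body, str):
        try:
            body = json.loads(body)
        except json.JSONDecodeError:
            # Not JSON at all: treat the body as the bare file name
            # (the shape used by the test case {"body": "imagename.jpg"}).
            return body
    if isinstance(body, dict):
        # Hypothetical payload shape: {"fileName": "imagename.jpg"}
        return body[field]
    return body
```

Printing or logging the returned value before calling `analyze_document` makes it easy to confirm that the key matches an object that exists in the bucket.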
Try awscli to see if you can access the image in s3:
aws s3 ls s3://react-images-ex/<some-fileExt>
Either you are parsing the fileExt wrongly, or you don't have S3 permission to access the file. The awscli command will help to verify this.
Related
I have a Lambda function that takes a file uploaded by a user through a React web app and submits it to an S3 bucket, and then a Python Lambda function that grabs that image from the bucket, translates it, and submits the translated version back into the bucket.
The issue is that this program only works on the test cases, i.e. in theory. To elaborate, I bring in the file name as the "event" for the Lambda function and carry it to various other functions like so:
def get_kv_map(event):
    filePath = event
    fileExt = filePath.get('body')
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('myBucket')
    obj = bucket.Object(bucket)
    client = boto3.client('textract')  # We utilize boto3's textract
    response = client.analyze_document(Document={'S3Object': {'Bucket': 'myBucket', 'Name': fileExt}}, FeatureTypes=['FORMS'])
    # Get the text blocks
    blocks = response['Blocks']  # We make a blocks variable that will be the blocks we find in the document
    # get key and value maps
    key_map = {}
    value_map = {}
    block_map = {}
    for block in blocks:  # Traverse the blocks found in the document
        block_id = block['Id']  # Set variable for blockId to the Id's found on that block location
        block_map[block_id] = block  # Make the block map at that ID be the block variable
        # If the block is a key/value set, check whether it's a key. If it's not a key, we know it's a value. We send it to the respective map.
        if block['BlockType'] == "KEY_VALUE_SET":
            if 'KEY' in block['EntityTypes']:
                key_map[block_id] = block
            else:
                value_map[block_id] = block
    return key_map, value_map, block_map  #######LINE WITH ERROR ######
Why is this line the one causing an error, and more importantly, why is it only happening on the website? When I use the program in cloud9 with my test cases, everything is perfectly fine. But the moment I try and have the website submit the data for the function to "do its work" it seems like it halts right after it uploads the file and gets to this line of all lines. I've tried checking tabs/spaces and everything. I'm very perplexed.
Thank you for any help, I've been pulling my hair out today.
So, I tried to analyze what I'm bringing (event) and what it contains. It seems that it's a permission error, but I cannot understand why. I gave myself full access to various features in AWS. Below is the error code I get when I utilize the test case as my event I see being brought in.
{
"errorType": "InvalidS3ObjectException",
"errorMessage": "An error occurred (InvalidS3ObjectException) when calling the AnalyzeDocument operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.",
"stackTrace": [
" File \"/var/task/scrapeShow/lambda_function.py\", line 133, in main\n key_map, value_map, block_map = get_kv_map(event) #Take map variables in to get the key and value map we need.\n",
" File \"/var/task/scrapeShow/lambda_function.py\", line 39, in get_kv_map\n response = client.analyze_document(Document={'S3Object': {'Bucket': 'myBucket', 'Name': fileExt}}, FeatureTypes=['FORMS'])\n",
" File \"/var/runtime/botocore/client.py\", line 316, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File \"/var/runtime/botocore/client.py\", line 626, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
]}
I want to save a csv file ("test.csv") in S3 using boto3.
my bucket is "outputS3Bucket" and the key is "folder/newFolder".
I want to check if "newFolder" exists and if not to create it.
import boto3
client = boto3.client('s3')
s3 = boto3.resource('s3')
bucket = s3.Bucket("outputS3Bucket")
result = client.list_objects(Bucket='outputS3Bucket',Prefix="folder/newFolder")
if len(result)==0:
    key = bucket.new_key("folder/newFolder")
newKey = key + "/" + "test.csv"
client.put_object(Bucket="outputS3Bucket", Key=newKey, Body=content)
# put_object path: 's3://outputS3Bucket/folder/newFolder/test.csv'
I have a few problems:
1. If I don't write the full key name (such as "folder/ne") and there is a "neaFo" folder instead, it still says it exists.
2. The line key = bucket.new_key("folder/newFolder") fails with:
AttributeError: 's3.Bucket' object has no attribute 'new_key'
Firstly, according to the boto3 documentation, it's preferred to use the newer API method list_objects_v2() to list a bucket's objects.
I suggest using a simple boolean function to check whether a folder exists (it makes your code cleaner and more readable).
For question 1, you can check whether the prefix ends with the '/' character and append it if not. This makes sure you are looking for an exact match and not a starts-with match.
Sample Function:
def bucket_folder_exists(client, bucket, path_prefix):
    # make path_prefix an exact match and not path/to/folder*
    if not path_prefix.endswith('/'):
        path_prefix += '/'
    # 'Contents' is present in the response dict only when at least one key
    # matches the prefix; otherwise .get() returns None
    response = client.list_objects_v2(Bucket=bucket, Prefix=path_prefix).get('Contents')
    if response:
        return True
    return False
Sample Implementation:
if bucket_folder_exists(client, 'outputS3Bucket', 'folder/newFolder'):
    pass  # Do something if the folder already exists
else:
    pass  # Do something if the folder does not exist
Regarding your second question, I added a comment: your code uses a bucket variable in key = bucket.new_key("folder/newFolder"), but bucket is not set anywhere in your code. According to the error you are getting, it is an s3.Bucket object, which doesn't define a new_key attribute (new_key was a method of the older boto 2 Bucket class).
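Because an S3 "folder" is only part of the object key, the file can be written straight to its full key without creating "newFolder" first. A minimal sketch, assuming `client` is a `boto3.client('s3')` and `content` holds the CSV text; the helper name `save_csv` is hypothetical:

```python
def save_csv(client, bucket, folder, filename, content):
    """Upload content to <folder>/<filename>; the folder need not exist beforehand."""
    # Strip a trailing slash so the key never contains a double '/'
    key = f"{folder.rstrip('/')}/{filename}"
    client.put_object(Bucket=bucket, Key=key, Body=content)
    return key


# Usage (requires AWS credentials):
#   import boto3
#   save_csv(boto3.client("s3"), "outputS3Bucket", "folder/newFolder", "test.csv", content)
```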
I am trying to get a policy with the boto3 client, but there is no method to do so using the policy name. By wrapping the create_policy method in a try-except block I can check whether a policy exists or not. Is there any way to get a policy ARN by name using boto3, other than listing all policies and iterating over them?
The ARN should be deterministic given the prefix (if any) and the name.
import boto3

session = boto3.session.Session()
policy_name = 'my-policy'  # hypothetical name, replace with your own
iam = session.client('iam')
sts = session.client('sts')
# Slow and costly if you have many pages
paginator = iam.get_paginator('list_policies')
all_policies = [policy for page in paginator.paginate() for policy in page['Policies']]
[policy_1] = [p for p in all_policies if p['PolicyName'] == policy_name]
# Fast and direct
account_id = sts.get_caller_identity()['Account']
policy_arn = f'arn:aws:iam::{account_id}:policy/{policy_name}'
policy_2 = iam.get_policy(PolicyArn=policy_arn)['Policy']
# They're equal, except that with the direct method you'll also get the description field
all(policy_1[k] == policy_2[k] for k in policy_1.keys() & policy_2.keys())
You will need to iterate over the policies to get policy names. I am not aware of a get-policy type API that takes policy names; they only take policy ARNs.
Is there a reason that you do not want to get a list of policies? Other than to not download the list.
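A small sketch of composing the ARN directly, assuming a customer managed policy in the standard `aws` partition. The helper `build_policy_arn` and its `path` parameter are hypothetical names; a policy created with a path carries that path in its ARN, so the name alone is only enough when the default path `/` was used.

```python
def build_policy_arn(account_id, policy_name, path="/", partition="aws"):
    """Compose the ARN of a customer managed IAM policy from its parts."""
    # Normalise the path so it always starts and ends with '/'
    if not path.startswith("/"):
        path = "/" + path
    if not path.endswith("/"):
        path += "/"
    # The resource part is "policy" + path + name, e.g. policy/team/MyPolicy
    return f"arn:{partition}:iam::{account_id}:policy{path}{policy_name}"
```

The result can be passed to `iam.get_policy(PolicyArn=...)`; catching `iam.exceptions.NoSuchEntityException` around that call is one way to tell whether the policy exists.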
The way I have been using is to transform the Collection into a List and query the length:
s3 = boto3.resource('s3')
bucket = s3.Bucket('my_bucket')
size = len(list(bucket.objects.all()))
However, this forces resolution of the whole collection and obviates the benefits of using a Collection in the first place. Is there a better way to do this?
There is no way to get the count of keys in a bucket without listing all the objects. This is a limitation of AWS S3 (see https://forums.aws.amazon.com/thread.jspa?messageID=164220).
Listing the object summaries doesn't fetch the actual object data, so it should be a relatively inexpensive operation. If you are just discarding the list, you could do:
size = sum(1 for _ in bucket.objects.all())
Which will give you the number of objects without constructing a list.
Borrowing from a similar question, one option to retrieve the complete list of object keys from a bucket + prefix is to use recursion with the list_objects_v2 method.
This method will recursively retrieve the list of object keys, 1000 keys at a time.
Each request to list_objects_v2 uses the StartAfter argument to continue listing keys after the last key from the previous request.
import boto3

def get_all_object_keys(bucket, prefix, start_after='', keys=None):
    # Use None instead of a mutable default: a shared list would keep
    # accumulating keys across separate calls
    if keys is None:
        keys = []
    response = client.list_objects_v2(
        Bucket=bucket,
        Prefix=prefix,
        StartAfter=start_after
    )
    if 'Contents' not in response:
        return keys
    key_list = response['Contents']
    last_key = key_list[-1]['Key']
    keys.extend(key_list)
    return get_all_object_keys(bucket, prefix, last_key, keys)

if __name__ == '__main__':
    client = boto3.client('s3',
        aws_access_key_id='access_key',
        aws_secret_access_key='secret_key'
    )
    object_keys = get_all_object_keys('your_bucket', 'prefix/to/files')
    print(len(object_keys))
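An iterative alternative, sketched with boto3's built-in paginator for `list_objects_v2`, avoids deep recursion on large buckets. It assumes `client` is a `boto3.client('s3')`; the function name `count_keys` is hypothetical.

```python
def count_keys(client, bucket, prefix=""):
    """Count the objects under a prefix, one page (up to 1000 keys) at a time."""
    paginator = client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing from a page when no keys match
        total += len(page.get("Contents", []))
    return total


# Usage (requires AWS credentials):
#   import boto3
#   print(count_keys(boto3.client("s3"), "your_bucket", "prefix/to/files"))
```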
For my use case, I just needed to know whether the folder is empty or not.
s3 = boto3.client('s3')
response = s3.list_objects(
    Bucket='your-bucket',
    Prefix='path/to/your/folder/',
)
# 'Contents' is absent when nothing matches the prefix, so fall back to an empty list
print(len(response.get('Contents', [])))
This was enough to know whether the folder is empty. Note that a folder, if manually created in the S3 console, can count as a resource itself. In this case, if the length shown above is greater than 1, then the S3 "folder" is not empty.
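The same check can be sketched as a function that ignores the placeholder key a console-created "folder" leaves behind. It takes the already fetched listing response as an argument; the name `folder_is_empty` is hypothetical.

```python
def folder_is_empty(response, prefix):
    """True when a listing response holds no keys other than the folder marker."""
    keys = [obj["Key"] for obj in response.get("Contents", [])]
    # A console-created "folder" shows up as a key equal to the prefix itself
    return all(key == prefix for key in keys)


# Usage (requires AWS credentials):
#   prefix = "path/to/your/folder/"
#   response = s3.list_objects(Bucket="your-bucket", Prefix=prefix, MaxKeys=2)
#   print(folder_is_empty(response, prefix))
```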
I created a folder in s3 named "test" and I pushed "test_1.jpg", "test_2.jpg" into "test".
How can I use boto to delete folder "test"?
Here is 2018 (almost 2019) version:
s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
bucket.objects.filter(Prefix="myprefix/").delete()
There are no folders in S3. Instead, the keys form a flat namespace. However a key with slashes in its name shows specially in some programs, including the AWS console (see for example Amazon S3 boto - how to create a folder?).
Instead of deleting "a directory", you can (and have to) list files by prefix and delete. In essence:
for key in bucket.list(prefix='your/directory/'):
    key.delete()
However, the other answers on this page feature more efficient approaches.
Notice that the prefix is matched by plain string comparison. If the prefix were your/directory, that is, without the trailing slash appended, the program would also happily delete your/directory-that-you-wanted-to-remove-is-definitely-not-this-one.
For more information, see S3 boto list keys sometimes returns directory key.
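To see why the trailing slash matters, here is a self-contained sketch that mimics prefix matching with plain strings. The key names are made up for illustration.

```python
# Hypothetical keys in a bucket
keys = [
    "your/directory/a.txt",
    "your/directory/sub/b.txt",
    "your/directory-keep/c.txt",
]


def matching(prefix):
    """Return the keys a prefix listing would select (plain starts-with match)."""
    return [key for key in keys if key.startswith(prefix)]


without_slash = matching("your/directory")   # also catches your/directory-keep/c.txt
with_slash = matching("your/directory/")     # only the intended "folder"
```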
I feel that it's been a while, and boto3 has a few different ways of accomplishing this goal. This assumes you want to delete the test "folder" and all of its objects. Here is one way:
s3 = boto3.resource('s3')
objects_to_delete = s3.meta.client.list_objects(Bucket="MyBucket", Prefix="myfolder/test/")
delete_keys = {'Objects': [{'Key': obj['Key']} for obj in objects_to_delete.get('Contents', [])]}
# Skip the delete request when nothing matched the prefix
if delete_keys['Objects']:
    s3.meta.client.delete_objects(Bucket="MyBucket", Delete=delete_keys)
This should make two requests, one to fetch the objects in the folder, the second to delete all objects in said folder.
https://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.delete_objects
A slight improvement on Patrick's solution. As you might know, both list_objects() and delete_objects() have an object limit of 1000. This is why you have to paginate listing and delete in chunks. This is pretty universal and you can give Prefix to paginator.paginate() to delete subdirectories/paths
# credentials (a dict of client keyword arguments) and bucket (the bucket name)
# are assumed to be defined by the caller
client = boto3.client('s3', **credentials)
paginator = client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket=bucket)
delete_us = dict(Objects=[])
for item in pages.search('Contents'):
    if item is None:  # a page without 'Contents' (e.g. an empty bucket) yields None
        continue
    delete_us['Objects'].append(dict(Key=item['Key']))
    # flush once aws limit reached
    if len(delete_us['Objects']) >= 1000:
        client.delete_objects(Bucket=bucket, Delete=delete_us)
        delete_us = dict(Objects=[])
# flush rest
if len(delete_us['Objects']):
    client.delete_objects(Bucket=bucket, Delete=delete_us)
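The flush-at-1000 logic can also be factored into a small batching helper. A sketch, relying only on the 1000-key limit mentioned above; the name `batched_keys` is hypothetical.

```python
def batched_keys(keys, size=1000):
    """Yield Delete payloads holding at most `size` keys each."""
    batch = []
    for key in keys:
        batch.append({"Key": key})
        if len(batch) == size:
            yield {"Objects": batch}
            batch = []
    # Emit whatever is left over; nothing is yielded for an empty input
    if batch:
        yield {"Objects": batch}


# Usage (requires AWS credentials; all_keys is any iterable of key strings):
#   for payload in batched_keys(all_keys):
#       client.delete_objects(Bucket=bucket, Delete=payload)
```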
You can use bucket.delete_keys() with a list of keys (with a large number of keys I found this to be an order of magnitude faster than using key.delete).
Something like this:
delete_key_list = []
for key in bucket.list(prefix='/your/directory/'):
    delete_key_list.append(key)
    if len(delete_key_list) > 100:
        bucket.delete_keys(delete_key_list)
        delete_key_list = []
if len(delete_key_list) > 0:
    bucket.delete_keys(delete_key_list)
If versioning is enabled on the S3 bucket:
s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
bucket.object_versions.filter(Prefix="myprefix/").delete()
If one needs to filter by object contents like I did, the following is a blueprint for your logic:
import boto3

def get_s3_objects_batches(s3, **base_kwargs):
    # s3 is a boto3 S3 client
    kwargs = dict(MaxKeys=1000, **base_kwargs)
    while True:
        response = s3.list_objects_v2(**kwargs)
        # to yield each and every file: yield from response.get('Contents', [])
        yield response.get('Contents', [])
        if not response.get('IsTruncated'):  # At the end of the list?
            break
        continuation_token = response.get('NextContinuationToken')
        kwargs['ContinuationToken'] = continuation_token

def your_filter(b):
    raise NotImplementedError()

# profile_name, bucket_name and prefix are assumed to be defined by the caller
session = boto3.session.Session(profile_name=profile_name)
s3client = session.client('s3')
for batch in get_s3_objects_batches(s3client, Bucket=bucket_name, Prefix=prefix):
    to_delete = [{'Key': obj['Key']} for obj in batch if your_filter(obj)]
    if to_delete:
        s3client.delete_objects(Bucket=bucket_name, Delete={'Objects': to_delete})
# Deleting files inside a folder in S3 using boto3
import logging
import boto3

logger = logging.getLogger(__name__)

def delete_from_minio():
    """
    Delete the files or folders inside another folder.
    """
    try:
        logger.info("Deleting from minio")
        aws_access_key_id = 'Your_aws_access_key'
        aws_secret_access_key = 'Your_aws_Secret_key'
        host = 'your_aws_endpoint'
        s3 = boto3.resource('s3', aws_access_key_id=aws_access_key_id,
                            aws_secret_access_key=aws_secret_access_key,
                            config=boto3.session.Config(signature_version='your_version'),
                            region_name="your_region",
                            endpoint_url=host,
                            verify=False)
        bucket = s3.Bucket('Your_bucket_name')
        for obj in bucket.objects.filter(Prefix='Directory/Sub_Directory'):
            s3.Object(bucket.name, obj.key).delete()
    except Exception as e:
        print(f"Error occurred while deleting from S3: {e}")
Hope this Helps :)