InvalidS3ObjectException: Unable to get object metadata from S3? - python

I am trying to use Amazon Textract to read multiple multi-page PDF files using the StartDocumentTextDetection method, as follows:

import random

import boto3

s3 = boto3.resource('s3')
client = boto3.client('textract')
textract_bucket = s3.Bucket('my_textract_console-us-east-2')

for s3_file in textract_bucket.objects.all():
    print(s3_file)
    response = client.start_document_text_detection(
        DocumentLocation={
            "S3Object": {
                "Bucket": "my_textract_console_us-east-2",
                "Name": s3_file.key,
            }
        },
        ClientRequestToken=str(random.randint(1, 1e10)))
    print(response)
    break
When I simply list the objects from S3, I can see the object printed out as:
s3.ObjectSummary(bucket_name='my_textract_console-us-east-2', key='C:\\Users\\My_User\\Documents\\Folder\\Sub_Folder\\Sub_sub_folder\\filename.PDF')
Correspondingly, I'm using that s3_file.key to access the object later. But I'm getting the following error that I can't figure out:
InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the StartDocumentTextDetection operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
So far I have:
Checked the region from the boto3 session; both the bucket and the AWS configuration settings are set to us-east-2 (one way to double-check the bucket's region from boto3 is sketched after this list).
The key cannot be wrong; I'm passing it directly from the object listing.
Permissions-wise, I checked the IAM console, and the user has AmazonS3FullAccess and AmazonTextractFullAccess attached.
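A minimal sketch of that region check (get_bucket_location is a standard S3 call; LocationConstraint comes back as None for buckets in us-east-1):

import boto3

s3 = boto3.client('s3')
# LocationConstraint is None for us-east-1, otherwise the region name, e.g. 'us-east-2'.
location = s3.get_bucket_location(Bucket='my_textract_console-us-east-2')
print(location['LocationConstraint'])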
What could be going wrong here?
[EDIT] I renamed the files so the keys no longer contain \\, but it seems like it's still not working, which is odd.

I ran into the same issue and solved it by specifying a region in the Textract client. In my case I used us-east-2:
client = boto3.client('textract', region_name='us-east-2')
The clue to do so came from this issue: https://github.com/aws/aws-sdk-js/issues/2714
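Putting the region fix together with the loop from the question, a minimal sketch (assuming the bucket is named my_textract_console-us-east-2, really lives in us-east-2, and its keys are plain S3 object keys rather than Windows-style paths):

import random

import boto3

REGION = 'us-east-2'

s3 = boto3.resource('s3', region_name=REGION)
textract = boto3.client('textract', region_name=REGION)
bucket = s3.Bucket('my_textract_console-us-east-2')

for s3_file in bucket.objects.all():
    # Name must match the object key exactly as it is stored in S3.
    response = textract.start_document_text_detection(
        DocumentLocation={
            'S3Object': {
                'Bucket': bucket.name,
                'Name': s3_file.key,
            }
        },
        ClientRequestToken=str(random.randint(1, 10**10)))
    print(response['JobId'])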

Boto3 (SignatureDoesNotMatch) when calling the GetObject operation: Unknown

I'm uploading and delivering my files to object storage (using Contabo as the provider). Uploading works.
The problem is that I can't figure out how to (A) access files without setting them to "Public" inside my object storage, and (B) protect the uploaded files from being accessed by unauthorized users when they are displayed on my webpage.
The idea is that we save the link to the file in object storage in our database. When someone wants to see the file, they get the link back from our database and can view it if they're logged in and authorized. When I try to access the file via the link itself, I only get an "Unauthorized" JSON response. When trying to access the file via the boto3 get_object operation, I get the following error:
An error occurred (SignatureDoesNotMatch) when calling the GetObject operation: Unknown
The code trying to get the desired object is as follows:
client = boto3.client(
    "s3",
    region_name=settings.OBJECT_STORAGE_REGION_NAME,
    endpoint_url=settings.OBJECT_STORAGE_ENDPOINT_URL,
    aws_access_key_id=settings.AWS_ACCESS_KEY,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY
)

# https://eu2.contabostorage.com/bucket/videos/file.mp4
link = "/videos/file.mp4"

response = client.get_object(Bucket="bucket", Key=link)
data = response["Body"].read()
print(data)
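For the "serve private files only to authorized users" part, one common pattern (not specific to Contabo) is to keep the objects private and, after your own authorization check, hand out a short-lived presigned URL instead of the raw storage link. A minimal sketch, reusing the client configuration from the question and assuming the object key is "videos/file.mp4" (without a leading slash, matching the commented URL above):

url = client.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "bucket", "Key": "videos/file.mp4"},
    ExpiresIn=300,  # the link expires after 5 minutes
)
print(url)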

The AWS Access Key Id you provided does not exist in our records. AWS

I want to upload files to Wasabi cloud storage, but I can't. I get this error:
An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.
I checked the key several times; everything is correct. The strange thing is that just before this I created a new bucket and that worked, but I can't upload the files.
import boto3

s3 = boto3.client('s3',
                  endpoint_url='https://s3.wasabisys.com',
                  aws_access_key_id="********R2PN",
                  aws_secret_access_key="*************zDKnnWS")

file_path = r"C:\Users\Asus\Desktop\Programming\rofls_with_node\tracks.txt"
bucket_name = "last-fm9"
key_name = "tracks.txt"

s3.put_object(Body=file_path, Bucket=bucket_name, Key=key_name)
That's it, I solved the problem: I just had to change endpoint_url to "https://s3.us-east-2.wasabisys.com" (instead of us-east-2, insert the region of your bucket). Thanks!
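For reference, a minimal sketch of the upload against the region-specific endpoint (bucket, key and path taken from the question; opening the file and passing the file object as Body is an extra tweak, since put_object uploads whatever Body is literally, so passing the path string would upload the path text rather than the file contents):

import boto3

s3 = boto3.client('s3',
                  endpoint_url='https://s3.us-east-2.wasabisys.com',  # use your bucket's region
                  aws_access_key_id="YOUR_ACCESS_KEY",
                  aws_secret_access_key="YOUR_SECRET_KEY")

file_path = r"C:\Users\Asus\Desktop\Programming\rofls_with_node\tracks.txt"

# Upload the file contents, not the path string.
with open(file_path, 'rb') as f:
    s3.put_object(Body=f, Bucket="last-fm9", Key="tracks.txt")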

Move all files in an S3 bucket from one S3 account to another using boto3

I'm trying to move the contents of a bucket from account-a to a bucket in account-b which I already have the credentials for both of them.
Here's the code I'm currently using:
import boto3

SRC_AWS_KEY = 'src-key'
SRC_AWS_SECRET = 'src-secret'
DST_AWS_KEY = 'dst-key'
DST_AWS_SECRET = 'dst-secret'

srcSession = boto3.session.Session(
    aws_access_key_id=SRC_AWS_KEY,
    aws_secret_access_key=SRC_AWS_SECRET
)
dstSession = boto3.session.Session(
    aws_access_key_id=DST_AWS_KEY,
    aws_secret_access_key=DST_AWS_SECRET
)

copySource = {
    'Bucket': 'src-bucket',
    'Key': 'test-bulk-src'
}

srcS3 = srcSession.resource('s3')
dstS3 = dstSession.resource('s3')

dstS3.meta.client.copy(CopySource=copySource, Bucket='dst-bucket', Key='test-bulk-dst', SourceClient=srcS3.meta.client)
print('success')
The problem is that when I specify a file's name in the Key field (the folder key followed by /file.csv) it works fine, but when I set it to copy the whole folder, as shown in the code, it fails and throws this exception:
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
What I need is to move the contents in one call, not by iterating through the contents of the source folder, because that is time- and money-consuming, as I may have thousands of files to move.
There is no API call in Amazon S3 to copy folders. (Folders do not actually exist — the Key of each object includes its full path.)
You will need to iterate through each object and copy it.
The AWS CLI (written in Python) provides some higher-level commands that will do this iteration for you:
aws s3 cp --recursive s3://source-bucket/folder/ s3://destination-bucket/folder/
If the buckets are in different accounts, I would recommend:
Use a set of credentials for the destination account (avoids problems with object ownership)
Modify the bucket policy on the source bucket to permit access by the credentials from the destination account (avoids the need to use two sets of credentials)
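If you do want to stay in boto3 rather than shell out to the CLI, a minimal sketch of the per-object iteration (bucket and prefix names follow the question; it assumes the single set of destination-account credentials has been granted read access to the source bucket via its bucket policy, as recommended above):

import boto3

# One set of credentials: the destination account, which the source bucket's
# policy allows to read the source objects.
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

for page in paginator.paginate(Bucket='src-bucket', Prefix='test-bulk-src/'):
    for obj in page.get('Contents', []):
        src_key = obj['Key']
        dst_key = src_key.replace('test-bulk-src/', 'test-bulk-dst/', 1)
        # Server-side copy: the object data never passes through this machine.
        s3.copy(
            CopySource={'Bucket': 'src-bucket', 'Key': src_key},
            Bucket='dst-bucket',
            Key=dst_key,
        )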

Unable to validate access credentials when creating/using presigned URLs in boto3

I'm using boto3 to copy encrypted EBS snapshots from one region to another, but I've been getting Invalid presigned URL messages when I try to copy. I'm generating the presigned URL using the boto3 client method generate_presigned_url in the source region and calling the copy function in the destination region like so:
uw2_client = non_prod.client(
    'ec2',
    region_name="us-west-2",
    config=Config(signature_version='s3v4')
)

presigned_url = uw2_client.generate_presigned_url(
    ClientMethod='copy_snapshot',
    Params={
        'SourceSnapshotId': og_snapshot_id,  # Original snapshot ID
        'SourceRegion': 'us-west-2',
        'DestinationRegion': 'us-east-1'
        # I also tried including all parameters from copy_snapshot.
        # It didn't make a difference.
        # 'Description': desc,
        # 'KmsKeyId': 'alias/xva-nonprod-all-amicopykey',
        # 'Encrypted': True,
    }
)
Here's my code to create the copy.
ue1_client = non_prod.client(
    'ec2',
    region_name="us-east-1",
    config=Config(signature_version='s3v4')
)

response = ue1_client.copy_snapshot(
    Description=desc,
    KmsKeyId='alias/xva-nonprod-all-amicopykey',  # Exists in us-east-1
    Encrypted=True,
    SourceSnapshotId=og_snapshot_id,
    SourceRegion='us-west-2',
    DestinationRegion='us-east-1',
    PresignedUrl=presigned_url
)
It successfully returns the presigned URL. But if I attempt to use that presigned URL to copy a snapshot, I get the invalid URL error. If I try to validate the url:
r = requests.post(presigned_url)
print(r.status_code)
print(r.text)
I get:
<Response>
    <Errors>
        <Error>
            <Code>AuthFailure</Code>
            <Message>AWS was not able to validate the provided access credentials</Message>
        </Error>
    </Errors>
    <RequestID>3189bb5b-54c9-4d11-ab4c-762cbea32d9a</RequestID>
</Response>
You'd think that it would be an issue with my credentials, but I'm not sure how... they are the same credentials I'm using to create the presigned URL, and my IAM user has unfettered access to EC2.
I'm obviously doing something wrong here, but I cannot figure out what it is. Any insight would be appreciated.
EDIT
Just to confirm that it's not a permissions issue, I tried this with my personal account which has access to everything. Still getting the same error message.
As it turns out, the documentation is wrong... A signed URL is NOT required when copying encrypted snapshots within the same account (according to AWS Support).
From AWS Support:
... it's not actually necessary to create the pre-signed URL in order to copy encrypted snapshot from one region to another (within the same AWS account).
However, according to their documentation, it's not possible to copy encrypted snapshots to another account either... ¯\_(ツ)_/¯
The current boto3.EC2.Client.copy_snapshot function documentation says:
PresignedUrl (string) --
When you copy an encrypted source snapshot using the Amazon EC2 Query API, you must supply a pre-signed URL. This parameter is optional for unencrypted snapshots.
Instead, it can simply be accomplished by creating the client object in the destination region and calling the copy_snapshot() method like so:
import boto3

try:
    ec2 = boto3.client(
        service_name='ec2',
        region_name='us-east-1'
    )
    ec2.copy_snapshot(
        SourceSnapshotId='snap-xxxxxxxxxxxx',
        SourceRegion='us-west-2',
        Encrypted=True,
        KmsKeyId='DestinationRegionKeyId'
    )
except Exception as e:
    print(e)

Twython - How to update status with media url

In my app, I let users post to Twitter. Now I would like to let them update their status with media.
In twython.py I see a method update_status_with_media that reads the image from the filesystem and uploads it to Twitter. My images are not on the filesystem but in an S3 bucket.
How can I make this work with S3 bucket URLs?
Passing the URL in the file_ variable fails with an IOError: no such file or directory.
Passing a StringIO fails with a UnicodeDecodeError.
Passing urllib.urlopen(url).read() gives file() argument 1 must be encoded string without NULL bytes, not str.
I also tried using the post method and got 403 Forbidden from the Twitter API: Error creating status.
Just Solved it
Bah, just got it to work, finally! Maybe this will save someone else the few hours it cost me.
from io import BytesIO

import requests
from twython import Twython

twitter = Twython(
    app_key=settings.TWITTER_CONSUMER_KEY, app_secret=settings.TWITTER_CONSUMER_SECRET,
    oauth_token=token.token, oauth_token_secret=token.secret
)

img = requests.get(url=image_obj.url).content

tweet = twitter.post('statuses/update_with_media',
                     params={'status': msg},
                     files={'media': (image_obj.url, BytesIO(img))})
Glad to see you found an answer! There's a similar problem that we handled recently in a repo issue - basically, you can do the following with StringIO and passing it directly to twitter.post like you did:
from StringIO import StringIO
from twython import Twython

t = Twython(...)

img = open('img_url').read()
t.post('/statuses/update_with_media', params={'status': 'Testing New Status'}, files={
    'media': StringIO(img)
    # 'media': ('OrThisIfYouWantToNameTheFile.lol', StringIO(img))
})
This isn't a direct answer to your question, so I'm not expecting any votes, but it seemed useful and somewhat related, so I figured I'd drop a note.
