Boto3 (SignatureDoesNotMatch) when calling the GetObject operation: Unknown - python

I'm uploading and delivering my files to an Object Storage (using Contabo as provider). Uploading works.
The problem is that I can't figure out how to (A) access files without setting them to "Public" inside my Object Storage, and (B) protect the uploaded files from being accessed by unauthorized users when displaying them on my web page.
The idea is that we save the link to the file in the Object Storage in our database. When someone wants to see the file in question, they get the link back from our database and can view it if they're logged in and authorized. When I try to access the file via the link itself, I only get an "Unauthorized" JSON response back. When trying to access the file via the boto3 get_object operation, I get the following error:
An error occurred (SignatureDoesNotMatch) when calling the GetObject operation: Unknown
The code trying to get the desired object is as follows:
client = boto3.client(
    "s3",
    region_name=settings.OBJECT_STORAGE_REGION_NAME,
    endpoint_url=settings.OBJECT_STORAGE_ENDPOINT_URL,
    aws_access_key_id=settings.AWS_ACCESS_KEY,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY
)
# https://eu2.contabostorage.com/bucket/videos/file.mp4
link = "/videos/file.mp4"
response = client.get_object(Bucket="bucket", Key=link)
data = response["Body"].read()
print(data)
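
For what it's worth, a common pattern for serving private objects to authorized users is to return a time-limited pre-signed URL instead of the raw object link. Below is a minimal sketch that reuses the client configured above; it assumes Contabo's S3-compatible endpoint honours standard pre-signed requests, and that the object key is stored without a leading slash (to S3, "videos/file.mp4" and "/videos/file.mp4" are different keys).

# Sketch only: reuses the client defined above.
# Assumes the key has no leading slash and the endpoint supports pre-signed URLs.
url = client.generate_presigned_url(
    "get_object",
    Params={"Bucket": "bucket", "Key": "videos/file.mp4"},
    ExpiresIn=3600,  # the link stays valid for one hour
)
print(url)

The application can then check the user's authorization and hand out a freshly generated URL only to logged-in users.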

Related

Download Blob From Blob Storage Using Python

I am trying to download an Excel file stored as a blob. However, it keeps generating the error "The specified blob does not exist". The error happens at blob_client.download_blob(), although I can get the blob_client. Any idea why, or other ways I can connect using managed identity?
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
import pandas as pd

default_credential = DefaultAzureCredential()
blob_service_client = BlobServiceClient('url', credential=default_credential)
container_client = blob_service_client.get_container_client('xx-xx-data')
blob_client = container_client.get_blob_client('TEST.xlsx')
downloaded_blob = blob_client.download_blob()
df = pd.read_excel(downloaded_blob.content_as_bytes(), sheet_name='Test', skiprows=2)
Turns out that I have to also provide 'Reader' access on top of 'Storage Blob Data Contributor' to be able to identify the blob. There was no need for SAS URL.
The reason you're getting this error is that each request to Azure Blob Storage must be authenticated. The only exception is when you're reading (downloading) a blob from a public blob container. In all likelihood, the blob container holding this blob has a private ACL, and since you're sending an unauthenticated request, you're getting this error.
I would recommend using a Shared Access Signature (SAS) URL for the blob with Read permission instead of a simple blob URL. Since a SAS URL has the authorization information embedded in the URL itself (the sig portion), you should be able to download the blob provided the SAS is valid and has not expired.
Please see this for more information on Shared Access Signature: https://learn.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature.
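For illustration, here is a minimal sketch of generating a read-only SAS URL with the azure-storage-blob v12 SDK. The account name, account key, container, and blob names below are placeholders, and it assumes access to the storage account key (a user delegation key obtained via Azure AD works similarly).

from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

# Placeholders: substitute your own storage account, key, container and blob.
sas_token = generate_blob_sas(
    account_name="mystorageaccount",
    container_name="xx-xx-data",
    blob_name="TEST.xlsx",
    account_key="<storage-account-key>",
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),  # valid for one hour
)
sas_url = "https://mystorageaccount.blob.core.windows.net/xx-xx-data/TEST.xlsx?" + sas_token
print(sas_url)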

InvalidS3ObjectException: Unable to get object metadata from S3?

So I am trying to use Amazon Textract to read in multiple multi-page PDF files using the StartDocumentTextDetection method, as follows:
import random
import boto3

client = boto3.client('textract')
s3 = boto3.resource('s3')
textract_bucket = s3.Bucket('my_textract_console-us-east-2')

for s3_file in textract_bucket.objects.all():
    print(s3_file)
    response = client.start_document_text_detection(
        DocumentLocation={
            "S3Object": {
                "Bucket": "my_textract_console_us-east-2",
                "Name": s3_file.key,
            }
        },
        ClientRequestToken=str(random.randint(1, 1e10)))
    print(response)
    break
When just trying to retrieve the response object from s3, I'm able to see it printed out as:
s3.ObjectSummary(bucket_name='my_textract_console-us-east-2', key='C:\\Users\\My_User\\Documents\\Folder\\Sub_Folder\\Sub_sub_folder\\filename.PDF')
Correspondingly, I'm using that s3_file.key to access the object later. But I'm getting the following error that I can't figure out:
InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the StartDocumentTextDetection operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
So far I have:
Checked the region from the boto3 session; both the bucket and the AWS configuration settings are set to us-east-2.
The key cannot be wrong, since I'm passing it directly from the object response.
Permissions-wise, I checked the IAM console and have it set to AmazonS3FullAccess and AmazonTextractFullAccess.
What could be going wrong here?
[EDIT] I did rename the files so that they didn't have \\, but it seems like it's still not working, which is odd.
I ran into the same issue and solved it by specifying a region in the Textract client. In my case I used us-east-2:
client = boto3.client('textract', region_name='us-east-2')
The clue to do so came from this issue: https://github.com/aws/aws-sdk-js/issues/2714
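As a sanity check, one way to keep the Textract client and the bucket in the same region is to look the bucket's region up first and create the client from that. This is only a sketch, using the bucket name from the question.

import boto3

# Sketch: pin the Textract client to the bucket's region.
s3_client = boto3.client('s3')
location = s3_client.get_bucket_location(Bucket='my_textract_console-us-east-2')
bucket_region = location['LocationConstraint'] or 'us-east-1'  # None means us-east-1

textract_client = boto3.client('textract', region_name=bucket_region)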

Trouble downloading S3 bucket objects through boto3. Error 403 HeadObject: Forbidden

I'm aware there are other threads on here about this issue but am still struggling to find the right solution. I am attempting to download a set of specific objects within an S3 bucket (that I do have access to) using the following python script. When running the script, the first object successfully downloads but then this error (403) is thrown:
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
See below my code:
import csv
import boto3
import re
import logging
from botocore.exceptions import ClientError

prod_number_array_bq = []
prod_number_array_s3 = []

with open('bq-results-20191218-151637-rshujisvqrri.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        sliced = re.sub("[^0-9]", "", str(row))
        prod_number_array_bq.append(sliced)

s3 = boto3.resource('s3')
bucket = s3.Bucket('********')
for key in bucket.objects.all():
    sliced = re.sub("[^0-9]", "", str(key.key))
    if (set(sliced) & set(prod_number_array_bq)) != "":
        bucket.download_file(key.key, sliced + '.txt')
Help would be appreciated :)
Thanks
Typically, when you see a 403 on HeadObject despite having the s3:GetObject permission, it's because the s3:ListBucket permission wasn't granted on the bucket AND your key doesn't exist. It's a security measure to prevent exposing information about which objects are or aren't in your bucket. When you have both the s3:GetObject permission for the objects in a bucket and the s3:ListBucket permission for the bucket itself, the response for a non-existent key is a 404 "no such key" response. If you only have the s3:GetObject permission and request a non-existent object, the response is a 403 "access denied".
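To illustrate the difference, here is a short sketch of how the two cases surface in boto3 when probing a key that may not exist; the bucket and key names are placeholders.

import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')

try:
    s3_client.head_object(Bucket='my-bucket', Key='maybe-missing.txt')
except ClientError as e:
    status = e.response['ResponseMetadata']['HTTPStatusCode']
    if status == 404:
        # With ListBucket permission, S3 can tell you the key is missing.
        print('No such key')
    elif status == 403:
        # Without ListBucket, a missing key (or denied access) surfaces as Forbidden.
        print('Forbidden: check permissions, or the key may not exist')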
In my case, I could list the file but couldn't download it, so the following would have printed the file information:
resp = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=origin)
print(resp)
but this would have raised the botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden error:
s3_client.download_file(bucket_name, origin, destination)
The problem was that the model had been uploaded from a different AWS account and we were missing an ACL on the upload, so we re-uploaded the file with the following command:
s3_client.upload_file(origin,
                      bucket_name,
                      destination,
                      ExtraArgs={'ACL': 'bucket-owner-full-control'})
and this let us read and download the file as we expected.

Uploading large files to Google Storage GCE from a Kubernetes pod

We get this error when uploading a large file (more than 10 MB but less than 100 MB):
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=resumable: ('Response headers must contain header', 'location')
Or this error when the file is more than 5 MB:
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=multipart: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
It seems that this API looks at the file size and tries to upload it via the multipart or resumable method. I can't imagine that is something I should be concerned with as a caller of this API. Is the problem somehow related to permissions? Does the bucket need special permission so it can accept multipart or resumable uploads?
from google.cloud import storage

try:
    client = storage.Client()
    bucket = client.get_bucket('my-bucket')
    blob = bucket.blob('blob-name')
    blob.upload_from_filename(zip_path, content_type='application/gzip')
except Exception as e:
    print(f'Error in uploading {zip_path}')
    print(e)
We run this inside a Kubernetes pod, so the permissions get picked up by the storage.Client() call automatically.
We already tried these:
Can't upload with gsutil because the container is Python 3 and gsutil does not run in Python 3.
Tried this example, but it runs into the same error: ('Response headers must contain header', 'location')
There is also this library. But it is basically alpha quality with little activity and no commits for a year.
Upgraded to google-cloud-storage==1.13.0
Thanks in advance
The problem was indeed the credentials. Somehow the error message was very misleading. When we loaded the credentials explicitly, the problem went away.
# Explicitly use service account credentials by specifying the private key file.
storage_client = storage.Client.from_service_account_json(
    'service_account.json')
I found my node pools had been spec'd with
oauthScopes:
- https://www.googleapis.com/auth/devstorage.read_only
and changing it to
oauthScopes:
- https://www.googleapis.com/auth/devstorage.full_control
fixed the error. As described in this issue, the problem is an uninformative error message.

Google Drive API: Can't upload certain filetypes

I created a form and a simple Google App Engine server with which to upload arbitrary file types to my Google Drive. The form fails for certain file types and just gives this error instead:
HttpError: <HttpError 400 when requesting https://www.googleapis.com/upload/drive/v1/files?alt=json returned "Unsupported content with type: application/pdf">
Aren't pdf files supported?
The appengine code that does the upload goes somewhat like this:
def upload_to_drive(self, filestruct):
    resource = {
        'title': filestruct.filename,
        'mimeType': filestruct.type,
    }
    resource = self.service.files().insert(
        body=resource,
        media_body=MediaInMemoryUpload(filestruct.value,
                                       filestruct.type),
    ).execute()

def post(self):
    creds = StorageByKeyName(Credentials, my_user_id, 'credentials').get()
    self.service = CreateService('drive', 'v1', creds)
    post_dict = self.request.POST
    for key in post_dict.keys():
        if isinstance(post_dict[key], FieldStorage):  # might need to import from cgi
            # upload to drive and return link
            self.upload_to_drive(post_dict[key])  # TODO: there should be error handling here
I've successfully used it for MS Office documents and images. It doesn't work for text files either, and gives this error:
HttpError: <HttpError 400 when requesting https://www.googleapis.com/upload/drive/v1/files?alt=json returned "Multipart content has too many non-media parts">
I've tried unsetting the 'mimeType' value in the resource dict to let Google Drive set it automatically. I also tried unsetting the MIME type in the MediaInMemoryUpload constructor. Sadly, neither worked.
It seems to me that you are using an old version of the Python client library and referring to Drive API v1, while Drive API v2 has been available since the end of June.
Please try updating your library and check the complete Python sample at https://developers.google.com/drive/examples/python.
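For example, here is a sketch of the same helper pointed at Drive API v2, assuming an updated google-api-python-client and the same CreateService helper and filestruct object from the question (in older library versions the import path is apiclient.http rather than googleapiclient.http):

from googleapiclient.http import MediaInMemoryUpload

def upload_to_drive_v2(service, filestruct):
    # Same structure as the question's helper, but against the v2 files collection.
    resource = {
        'title': filestruct.filename,
        'mimeType': filestruct.type,
    }
    media = MediaInMemoryUpload(filestruct.value, mimetype=filestruct.type)
    return service.files().insert(body=resource, media_body=media).execute()

# In post(): service = CreateService('drive', 'v2', creds)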
