Signed CloudFront URL for an S3 bucket - Python

I'm trying to create a signed CloudFront URL for an object in a Frankfurt S3 bucket (using the Python library boto). This used to work very well with eu-west-1 buckets, but now I'm getting the following error message:
<Error>
<Code>InvalidRequest</Code>
<Message>
The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.
</Message>
...
I understand that newer S3 regions require API requests to be signed using AWS4-HMAC-SHA256, but I can't find anything in the AWS documentation about how this changes the creation of signed CloudFront URLs.
Edit:
To clarify, the following code produces a signed URL without raising an error; the error only occurs when the created URL is opened in a browser afterwards:
cf = cloudfront.CloudFrontConnection(
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
distribution_summary = cf.get_all_distributions()[0]
distribution = distribution_summary.get_distribution()
return distribution.create_signed_url(
    url,
    settings.CLOUDFRONT_KEY_ID,
    int(time()) + expiration,
    private_key_file=settings.PRIVATE_KEY_FILE)

I found the issue: it was actually the CloudFront distribution itself. It seems that moving the origin of the already existing (long-standing) distribution from a US bucket to an EU bucket didn't work out.
I created a new one with the same settings (except a new Origin Access Identity) and it worked without any issues.
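For anyone hitting this on a newer stack: the same kind of signed URL can also be produced with botocore's CloudFrontSigner instead of the legacy boto CloudFrontConnection. The sketch below is illustrative only; the key pair ID, private key path, and distribution URL are placeholders, and it assumes the rsa package is installed.
from datetime import datetime, timedelta

import rsa
from botocore.signers import CloudFrontSigner

def rsa_signer(message):
    # Sign the CloudFront policy with the private key of the trusted key pair
    # (placeholder path).
    with open('/path/to/private_key.pem', 'rb') as key_file:
        private_key = rsa.PrivateKey.load_pkcs1(key_file.read())
    return rsa.sign(message, private_key, 'SHA-1')

signer = CloudFrontSigner('CLOUDFRONT_KEY_PAIR_ID', rsa_signer)  # placeholder key pair ID
signed_url = signer.generate_presigned_url(
    'https://dxxxxxxxxxxxxx.cloudfront.net/path/to/object',  # placeholder URL
    date_less_than=datetime.utcnow() + timedelta(hours=1))
print(signed_url)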

Related

How can I use AWS Textract with Python

I have tested almost every example I can find on the Internet for Amazon Textract and I can't get it to work. I can upload and download a file to S3 from my Python client, so the credentials should be OK. Many of the errors point to some region failure, but I have tried every possible combination.
Here is one of my last test calls:
import boto3

def test_parse_3():
    # Document to analyze
    s3BucketName = "xx-xxxx-xx"
    documentName = "xxxx.jpg"

    # Amazon Textract client
    textract = boto3.client('textract')

    # Call Amazon Textract
    response = textract.detect_document_text(
        Document={
            'S3Object': {
                'Bucket': s3BucketName,
                'Name': documentName
            }
        })

    print(response)
This seems to be pretty easy, but it generates the following error:
botocore.errorfactory.InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the DetectDocumentText operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
Any ideas what's wrong, and does someone have a working example?
I have also tested a lot of permission settings in AWS. The credentials are in a hidden file created by the AWS SDK.
I am sure you already know, but the bucket name is case sensitive. If you have verified that both the bucket and the object name are correct, just make sure to add the appropriate region to your credentials.
I tested just reading from S3 without including the region in the credentials and I was able to list the objects in the bucket with no issues. I am thinking this worked because S3 is supposed to be region-agnostic. However, since Textract is region-specific, you must define the region in your credentials when using Textract to get the data from the S3 bucket.
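As an alternative to putting the region in the credentials/config file, the region can also be passed directly when creating the client. Here is a minimal sketch reusing the bucket and document names from the question, with eu-west-1 as a placeholder for whatever region the bucket actually lives in:
import boto3

# Placeholder region: must match the region of the S3 bucket holding the document.
textract = boto3.client('textract', region_name='eu-west-1')

response = textract.detect_document_text(
    Document={
        'S3Object': {
            'Bucket': 'xx-xxxx-xx',
            'Name': 'xxxx.jpg'
        }
    })
print(response)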
I realize this was asked a few months ago, but I am hoping this sheds some light to others that face this issue in the future.

Tensorflow - S3 object does not exist

How do I set up direct private bucket access for Tensorflow?
After running
from tensorflow.python.lib.io import file_io
and then running print file_io.stat('s3://my/private/bucket/file.json'), I end up with this error:
NotFoundError: Object s3://my/private/bucket/file.json does not exist
However, the same line on a public object works without an error:
print file_io.stat('s3://ryft-public-sample-data/wikipedia-20150518.bin')
There appears to be an article on support here: https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/s3.md
However, I end up with the same error after exporting the variables shown.
I have awscli set up with all credentials, and boto3 can view and download the file in question. I am wondering how I can get Tensorflow to have S3 access directly when the bucket is private.
I had the same problem when trying to access files in a private S3 bucket from a SageMaker notebook. The mistake I made was to try using credentials I obtained from boto3, which do not seem to be valid outside of it.
The solution was not to specify credentials at all (in that case it uses the role attached to the machine), and instead to just specify the region name (for some reason it didn't read it from the ~/.aws/config file), as follows:
import boto3
import os

# Let boto3 resolve the region from the attached role / local AWS config,
# then expose it to TensorFlow via the AWS_REGION environment variable.
session = boto3.Session()
os.environ['AWS_REGION'] = session.region_name
NOTE: when debugging this error it was useful to look at the CloudWatch logs, as the S3 client's logs were printed only there and not in the Jupyter notebook.
There I first saw that:
When I did specify the credentials obtained from boto3, the error was: "The AWS Access Key Id you provided does not exist in our records."
When accessing without the AWS_REGION environment variable set, I got "The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.", which apparently is common when the bucket's region isn't specified (see "301 Moved Permanently after S3 uploading").
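Putting the pieces together, a minimal sketch of the workaround as a whole might look like this (the S3 path is the placeholder from the question, and exporting AWS_REGION before calling into file_io is an assumption about when TensorFlow reads the variable):
import os

import boto3
from tensorflow.python.lib.io import file_io

# Resolve the region from the attached role / local AWS config and expose it
# to TensorFlow's S3 filesystem via the AWS_REGION environment variable.
session = boto3.Session()
os.environ['AWS_REGION'] = session.region_name

# Placeholder object from the question; should now resolve in the private bucket.
print(file_io.stat('s3://my/private/bucket/file.json'))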

Azure Blob Store SAS token missing Service Resource field

I created a Shared Access Signature (SAS) token on my Azure storage account using the web interface. The token looks like this:
?sv=xxxx-xx-xx&ss=b&srt=sco&sp=rl&se=xxxx-xx-xxTxx:xx:xxZ&st=xxxx-xx-xxTxx:xx:xxZ&spr=https&sig=xxxxxxxxxxxxxxxxxxxxxx
The SAS token here is missing the sr field for Service Resource. I have to manually prepend the sr=b to the query string to get things to work. I must be doing something wrong, because this seems extremely finicky.
from azure.storage.blob import BlockBlobService
sas_token = "?sv=xxxx-xx-xx&ss=b&srt=sco&sp=rl&se=xxxx-xx-xxTxx:xx:xxZ&st=xxxx-xx-xxTxx:xx:xxZ&spr=https&sig=xxxxxxxxxxxxxxxxxxxxxx"
sas_token = "?sr=b&" + sas_token[1:]
serv = BlockBlobService(account_name='myaccount', sas_token=sas_token)
for cont in serv.list_containers():
    print cont.name
Without the sas_token = "?sr=b&" + sas_token[1:] I get the error:
sr is mandatory. Cannot be empty
And if the sr=b field is not first in the query, I get an authentication error like
Access without signed identifier cannot have time window more than 1 hour
Based on this error message, you may need to set the expiry time to less than 1 hour from now. See "Windows Azure Shared Access Signature always gives: Forbidden 403".
I took your code and ran it with Python v2.7.12 and azure-storage-python v0.34.3 (the latest version), and it worked well on my side. So I'd recommend you upgrade to the latest version and try again.
UPDATE:
I traced the code of the Azure Storage SDK for Python and here's what I found. The SDK is a REST API wrapper which assumes that the SAS token looks like this:
sv=2015-04-05&ss=bfqt&srt=sco&sp=rl&se=2015-09-20T08:49Z&sip=168.1.5.60-168.1.5.70&sig=a39%2BYozJhGp6miujGymjRpN8tsrQfLo9Z3i8IRyIpnQ%3d
As you can see, the token doesn't include a leading ?, and the SDK appends ? in front of the SAS token when it makes a GET request to the Azure Storage REST service.
This caused the key of the signed version to be parsed as ?sv, which raised the issue. So, to avoid this, we should remove the leading ? from the SAS token.
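So, concretely, a minimal sketch of the fix (reusing the placeholder account name and token from the question) is simply to strip the leading ? before handing the token to BlockBlobService:
from azure.storage.blob import BlockBlobService

# SAS token as copied from the portal, including the leading '?'.
sas_token = "?sv=xxxx-xx-xx&ss=b&srt=sco&sp=rl&se=xxxx-xx-xxTxx:xx:xxZ&st=xxxx-xx-xxTxx:xx:xxZ&spr=https&sig=xxxxxxxxxxxxxxxxxxxxxx"

# The SDK prepends '?' itself, so pass the token without it.
serv = BlockBlobService(account_name='myaccount',
                        sas_token=sas_token.lstrip('?'))
for cont in serv.list_containers():
    print(cont.name)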

S3: ExpiredToken error for S3 pre-signed url within expiry period

This is how I am generating a pre-signed URL for an S3 object from my Python script.
s3client = boto3.client("s3")
url = s3client.generate_presigned_url(
    "get_object",
    Params={"Bucket": args.bucket, "Key": dated_filename},
    ExpiresIn=86400)
where I am giving an expiry of 24 hours.
When I try to download the file immediately using the url from a browser, it works. But it doesn't work if I try to download it, say after 10-12 hours (I don't know the exact time after which it starts failing).
This is the error I am getting.
<Code>ExpiredToken</Code>
<Message>The provided token has expired.</Message>
Not sure if it is a bug or I am not doing it the right way. Any help would be appreciated.
Are you running under an IAM role? A presigned URL is only valid as long as the session key that was used when generating it. If you are authenticating as an IAM user with long-lived access keys, this is not a problem. But IAM roles use temporary access keys that cycle every 36 hours.
You know your session key has expired because you are getting the "The provided token has expired." error, which (as noted above) is a different error message from "Request has expired", which is what you get when the presigned URL itself reaches its expiration date.
Also, presigned URLs have a hard limit of 7 days - but that doesn't seem to be your problem.
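As an illustration, a minimal sketch that ties the presigned URL to an IAM user's long-lived keys rather than a role's temporary session could look like this; the profile, bucket, and key names are placeholders:
import boto3

# Placeholder profile backed by an IAM user's permanent access keys, so the
# signing credentials don't expire before the URL does.
session = boto3.Session(profile_name="long-lived-user")
s3client = session.client("s3")

url = s3client.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "path/to/file"},
    ExpiresIn=86400)  # 24 hours
print(url)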
It is all about the AWS credentials that you are using; see https://aws.amazon.com/premiumsupport/knowledge-center/presigned-url-s3-bucket-expiration/

Using versioning with signed urls in google cloud storage

I'm having difficulty signing GET requests for Google Cloud Storage (GCS) when specifying a 'generation' (version number) on the object. Signing the URL without the generation works like a charm and GET requests work fine. However, when I append the #generation to the path, the GCS server always returns "access denied" when attempting to GET the signed URL.
For example, signing this URL path works fine:
https://storage.googleapis.com/BUCKET/OBJECT
Signing this URL path gives me "access denied":
https://storage.googleapis.com/BUCKET/OBJECT#1360887697105000
Note that for brevity and privacy, I've omitted the actual signed URL with its Signature, Expires, and GoogleAccessId parameters. Also note that I have verified the bucket, object, and generation are correct using gsutil.
Does GCS allow for Signed URL access to specific object versions by 'generation' number? Is the URL signing procedure different when accessing a specific version?
The URL you're using is gsutil-compatible, but the XML API requires that you denote generation with a query parameter (which would look like storage.googleapis.com/BUCKET/OBJECT?generation=1360887697105000).
Documentation is here for reference: developers.google.com/storage/docs/reference-headers#generation
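If you are signing with the google-cloud-storage client library rather than by hand, recent versions let you pin the generation when generating the signed URL. This is a minimal sketch under that assumption; the bucket, object, and generation values are the placeholders from the question:
from datetime import timedelta

from google.cloud import storage

client = storage.Client()
blob = client.bucket('BUCKET').blob('OBJECT')

# The generation ends up as a signed query parameter, matching the XML API form
# .../BUCKET/OBJECT?generation=1360887697105000 described above.
signed_url = blob.generate_signed_url(
    expiration=timedelta(hours=1),
    method='GET',
    generation=1360887697105000,
    version='v4')
print(signed_url)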
