I am trying to set metadata during pushing a file to S3.
This is how it looks:
def pushFileToBucket(fileName, bucket, key_name, metadata):
    full_key_name = os.path.join(fileName, key_name)
    k = bucket.new_key(full_key_name)
    k.set_metadata('my_key', 'value')
    k.set_contents_from_filename(fileName)
For some reason this throws an error at set_metadata saying:
boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
<?xml version="1.0" encoding="UTF-8"?><Error><Code>SignatureDoesNotMatch</Code></Error>
And when I remove the set_metadata part, the file is stored correctly.
Not sure what I am doing wrong. If the access key were invalid, it wouldn't have saved the file at all!
Another approach for someone using upload_file:
s3 = boto3.client('s3')
path = 'foo/bar.json'
file_name = 'bar.json'
bucket_name = 'foobar_bucket'
extra_args = {'CacheControl': 'max-age=86400'}
s3.upload_file(path, bucket_name, file_name, ExtraArgs=extra_args)
This would set the Cache-Control header on the file.
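If you want to verify that the header actually landed on the object, a quick check with head_object (reusing the hypothetical bucket and key names from the snippet above) could look like this:
import boto3

s3 = boto3.client('s3')

# 'CacheControl' only appears in the response when it was set on the object
head = s3.head_object(Bucket='foobar_bucket', Key='bar.json')
print(head.get('CacheControl'))  # expected: 'max-age=86400'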
Got this fixed. Apparently we cannot have an underscore in the metadata key name.
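For example, with the snippet from the question, the fix amounts to renaming the metadata key, e.g. a dash instead of an underscore (key name and value here are just placeholders):
k = bucket.new_key(full_key_name)
k.set_metadata('my-key', 'value')  # no underscore in the metadata key name
k.set_contents_from_filename(fileName)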
Related
The goal of my code is to change the directory of a file every 24 hours (because every day a new one is created by another lambda function). I want to get the current file from my S3 bucket and write it to another directory in the same S3 bucket. Currently, this line of the code does not work:
s3.put_object(Body=response, Bucket=bucket, Key=fileout)
and I get this error:
"errorMessage": "Parameter validation failed:\nInvalid type for parameter Body", "errorType": "ParamValidationError"
What does the error mean, and what is needed in order to be able to store the response in the history directory?
import boto3
import json

s3 = boto3.client('s3')
bucket = "some-bucket"

def lambda_handler(event, context):
    file = 'latest/some_file.xlsx'
    response = s3.get_object(Bucket=bucket, Key=file)
    fileout = 'history/some_file.xlsx'
    s3.put_object(Body=response, Bucket=bucket, Key=fileout)
    return {
        'statusCode': 200,
        'body': json.dumps(data),
    }
The response variable in your code stores more than just the actual xlsx file. You should get the body from the response and pass that to put_object.
response = s3.get_object(Bucket=bucket, Key=file)['Body']
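Put together inside the handler, a minimal sketch of the fix (names taken from the question; the stream is read into bytes so put_object receives a type it accepts):
response = s3.get_object(Bucket=bucket, Key=file)
body = response['Body'].read()  # read the StreamingBody into bytes
s3.put_object(Body=body, Bucket=bucket, Key=fileout)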
I want to update the Content-Type of an existing object in an S3 bucket, using boto3, but how do I do that without having to re-upload the file?
file_object = s3.Object(bucket_name, key)
print file_object.content_type
# binary/octet-stream
file_object.content_type = 'application/pdf'
# AttributeError: can't set attribute
Is there a method for this I have missed in boto3?
Related questions:
How to set Content-Type on upload
How to set the content type of an S3 object via the SDK?
There doesn't seem to be any method for this in boto3, but you can copy the file to overwrite itself.
To do this using the low-level AWS API through boto3, do it like this:
s3 = boto3.resource('s3')
api_client = s3.meta.client
response = api_client.copy_object(Bucket=bucket_name,
                                  Key=key,
                                  ContentType="application/pdf",
                                  MetadataDirective="REPLACE",
                                  CopySource=bucket_name + "/" + key)
The MetadataDirective="REPLACE" turns out to be required for S3 to overwrite the file, otherwise you will get an error message saying This copy request is illegal because it is trying to copy an object to itself without changing the object's metadata, storage class, website redirect location or encryption attributes.
Or you can use copy_from, as pointed out by Jordon Phillips in the comments:
s3 = boto3.resource("s3")
object = s3.Object(bucket_name, key)
object.copy_from(CopySource={'Bucket': bucket_name,
                             'Key': key},
                 MetadataDirective="REPLACE",
                 ContentType="application/pdf")
In addition to @leo's answer, be careful if you have custom metadata on your object.
To avoid side effects, I propose adding Metadata=object.metadata to leo's code, otherwise you could lose the previous custom metadata:
s3 = boto3.resource("s3")
object = s3.Object(bucket_name, key)
object.copy_from(
    CopySource={'Bucket': bucket_name, 'Key': key},
    Metadata=object.metadata,
    MetadataDirective="REPLACE",
    ContentType="application/pdf"
)
You can use the upload_file function from boto3 with the ExtraArgs parameter to specify the content type; this will overwrite the existing file with the given content type (see the boto3 upload_file reference).
Check the example below:
import boto3

client = boto3.client("s3")
temp_file_path = "<path_of_your_file>"
client.upload_file(temp_file_path, <BUCKET_NAME>, temp_file_path,
                   ExtraArgs={'ContentType': 'application/pdf'})
Context
I am trying to get an encryption status for all of my buckets for a security report. However, since encryption is on a per-key basis, I want to iterate through all of the keys and get a general encryption status: "yes" if all keys are encrypted, "no" if none are encrypted, and "partially" if some are encrypted.
I must use boto3 because there is a known issue with boto where the encryption status for each key always returns None. See here.
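For reference, a rough sketch of the per-bucket status check I have in mind (the helper name is mine; it relies on head_object only reporting ServerSideEncryption when a key is encrypted):
import boto3

s3 = boto3.client('s3')

def bucket_encryption_status(bucket_name):
    # Count encrypted vs. unencrypted keys and reduce to yes/no/partially
    encrypted = unencrypted = 0
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get('Contents', []):
            head = s3.head_object(Bucket=bucket_name, Key=obj['Key'])
            if head.get('ServerSideEncryption'):
                encrypted += 1
            else:
                unencrypted += 1
    if encrypted == 0:
        return 'no'
    if unencrypted == 0:
        return 'yes'
    return 'partially'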
Problem
I am trying to iterate over all the keys in each of my buckets using boto3. The following code works fine until it runs into buckets with names that contain periods, such as "my.test.bucket".
from boto3.session import Session

session = Session(aws_access_key_id=<ACCESS_KEY>,
                  aws_secret_access_key=<SECRET_KEY>,
                  aws_session_token=<TOKEN>)
s3_resource = session.resource('s3')

for bucket in s3_resource.buckets.all():
    for obj in bucket.objects.all():
        key = s3_resource.Object(bucket.name, obj.key)
        # Do some stuff with the key...
When it hits a bucket with a period in its name, bucket.objects.all() throws the exception below, telling me to send all requests to a specific endpoint. The endpoint can be found in the exception object that is thrown.
  for obj in bucket.objects.all():
  File "/usr/local/lib/python2.7/site-packages/boto3/resources/collection.py", line 82, in __iter__
    for page in self.pages():
  File "/usr/local/lib/python2.7/site-packages/boto3/resources/collection.py", line 165, in pages
    for page in pages:
  File "/usr/lib/python2.7/dist-packages/botocore/paginate.py", line 85, in __iter__
    response = self._make_request(current_kwargs)
  File "/usr/lib/python2.7/dist-packages/botocore/paginate.py", line 157, in _make_request
    return self._method(**current_kwargs)
  File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 310, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 395, in _make_api_call
    raise ClientError(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (PermanentRedirect) when calling the ListObjects operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
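For what it's worth, the endpoint mentioned in the message can be read out of the exception itself; a rough sketch (assuming botocore exposes the parsed <Endpoint> element of the error XML, which I haven't verified across versions):
from botocore.exceptions import ClientError

try:
    objects = list(bucket.objects.all())
except ClientError as e:
    # PermanentRedirect responses carry the expected endpoint in the error body
    print(e.response['Error'].get('Endpoint'))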
Things I have tried
Setting the endpoint_url parameter to the bucket endpoint specified in the exception response, like s3_resource = session.resource('s3', endpoint_url='my.test.bucket.s3.amazonaws.com')
Specifying the region the bucket is located in like s3_resource = session.resource('s3', region_name='eu-west-1')
I believe the problem is similar to this Stack Overflow question about boto, which is fixed by setting the calling_format parameter in the S3Connection constructor. Unfortunately, I can't use boto (see above).
Update
Here is what ended up working for me. It is not the most elegant approach, but it works =).
from boto3.session import Session

session = Session(aws_access_key_id=<ACCESS_KEY>,
                  aws_secret_access_key=<SECRET_KEY>,
                  aws_session_token=<TOKEN>)
s3_resource = session.resource('s3')

# First get all the bucket names
bucket_names = [bucket.name for bucket in s3_resource.buckets.all()]

for bucket_name in bucket_names:
    # Check each name for a "." and use a different resource if needed
    if "." in bucket_name:
        region = session.client('s3').get_bucket_location(Bucket=bucket_name)['LocationConstraint']
        resource = session.resource('s3', region_name=region)
    else:
        resource = s3_resource
    bucket = resource.Bucket(bucket_name)
    # Continue as usual using this resource
    for obj in bucket.objects.all():
        key = resource.Object(bucket.name, obj.key)
        # Do some stuff with the key...
Just generalizing the great answer provided by Ben.
import boto3

knownBucket = 'some.topLevel.BucketPath.withPeriods'

s3 = boto3.resource('s3')

# get the bucket's region
region = s3.meta.client.get_bucket_location(Bucket=knownBucket)['LocationConstraint']

# set the region on the resource
s3 = boto3.resource('s3', region_name=region)
There are a couple of github issues on this. It's related to the region of the bucket. Make sure that your S3 resource is in the same region as the bucket you've created.
FWIW you can determine the region programmatically like this:
s3.meta.client.get_bucket_location(Bucket='<bucket_name>')['LocationConstraint']
I'm using Python and tinys3 to write files to S3, but it's not working. Here's my code:
import tinys3
conn = tinys3.Connection('xxxxxxx','xxxxxxxx',tls=True)
f = open('testing_s3.txt','rb')
print conn.upload('testing_data/testing_s3.txt',f,'testing-bucket')
print conn.get('testing_data/testing_s3.txt','testing-bucket')
That gives the output:
<Response [301]>
<Response [301]>
When I try specifying the endpoint, I get:
requests.exceptions.HTTPError: 403 Client Error: Forbidden
Any idea what I'm doing wrong?
Edit: When I try using boto, it works, so the problem isn't in the access key or secret key.
I finally figured this out. Here is the correct code:
import tinys3
conn = tinys3.Connection('xxxxxxx','xxxxxxxx',tls=True,endpoint='s3-us-west-1.amazonaws.com')
f = open('testing_s3.txt','rb')
print conn.upload('testing_data/testing_s3.txt',f,'testing-bucket')
print conn.get('testing_data/testing_s3.txt','testing-bucket')
You have to use the region endpoint, not s3.amazonaws.com. You can look up the region endpoint from here: http://docs.aws.amazon.com/general/latest/gr/rande.html. Look under the heading "Amazon Simple Storage Service (S3)."
I got the idea from this thread: https://github.com/smore-inc/tinys3/issues/5
If using an IAM user it is necessary to allow the "s3:PutObjectAcl" action.
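If that policy is managed in code rather than in the console, a minimal inline policy granting both actions might look like this (user, policy and bucket names are made up):
import json
import boto3

iam = boto3.client('iam')

# Hypothetical names; the point is that s3:PutObjectAcl must be allowed
# alongside s3:PutObject for the upload to succeed.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:PutObject", "s3:PutObjectAcl"],
        "Resource": "arn:aws:s3:::testing-bucket/*",
    }],
}
iam.put_user_policy(UserName='uploader',
                    PolicyName='allow-s3-upload',
                    PolicyDocument=json.dumps(policy))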
Don't know why, but this code never worked for me.
I switched to boto, and it uploaded the file on the first try.
import boto
from boto.s3.key import Key

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'XXXXXXXXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXXXXX'
bucket_name = 'my-bucket'
filename = 'testing_s3.txt'  # local file to upload

def percent_cb(complete, total):
    # simple upload progress callback
    print '%d of %d bytes transferred' % (complete, total)

conn = boto.connect_s3(AWS_ACCESS_KEY_ID,
                       AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucket_name)

print 'Uploading %s to Amazon S3 bucket %s' % \
    (filename, bucket_name)

k = Key(bucket)
k.key = filename
k.set_contents_from_filename(filename,
                             cb=percent_cb, num_cb=10)
I have a script that copies files from one S3 account to another S3 account. It was working before, that's for sure. Then I tried it today and it doesn't work any more; it gives me the error S3ResponseError: 403 Forbidden. I'm 100% sure the credentials are correct, and I can download keys from both accounts manually using the AWS console.
Code
def run(self):
    while True:
        # Remove and return an item from the queue
        key_name = self.q.get()
        k = Key(self.s_bucket, key_name)
        d_key = Key(self.d_bucket, k.key)
        if not d_key.exists() or k.etag != d_key.etag:
            print 'Moving {file_name} from {s_bucket} to {d_bucket}'.format(
                file_name=k.key,
                s_bucket=source_bucket,
                d_bucket=dest_bucket
            )
            # Create a new key in the bucket by copying another existing key
            acl = self.s_bucket.get_acl(k)
            self.d_bucket.copy_key(d_key.key, self.s_bucket.name, k.key, storage_class=k.storage_class)
            d_key.set_acl(acl)
        else:
            print 'File exist'
        self.q.task_done()
Error:
File "s3_to_s3.py", line 88, in run
self.d_bucket.copy_key( d_key.key, self.s_bucket.name, k.key, storage_class=k.storage_class)
File "/usr/lib/python2.7/dist-packages/boto/s3/bucket.py", line 689, in copy_key
response.reason, body)
S3ResponseError: S3ResponseError: 403 Forbidden
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>0729E8ADBD7A9E60</RequestId><HostId>PSbbWCLBtLAC9cjW+52X1fUSVErnZeN79/w7rliDgNbLIdCpc9V0bPi8xO9fp1od</HostId></Error>
Try this: copy the key from the source bucket to the destination bucket using boto's Key class
source_key_name = 'image.jpg'  # for example

# returns a Key object
source_key = source_bucket.get_key(source_key_name)

# use Key.copy
source_key.copy(destination_bucket, source_key_name)
Regarding the copy function: you can set preserve_acl to True and the ACL will be copied from the source key.
Boto's Key.copy signature:
def copy(self, dst_bucket, dst_key, metadata=None,
         reduced_redundancy=False, preserve_acl=False,
         encrypt_key=False, validate_dst_bucket=True):
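Applied to the worker in the question, the two-step copy_key plus set_acl could then be collapsed into a single call, roughly like this (untested; note that dst_bucket is passed as a bucket name string):
# inside run(), replacing copy_key + set_acl
k = Key(self.s_bucket, key_name)
k.copy(self.d_bucket.name, k.key, preserve_acl=True)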