I have a script that copies files from one S3 account to another S3 account. It was working before, that's for sure. Then I tried it today and it doesn't work any more; it gives me the error S3ResponseError: 403 Forbidden. I'm 100% sure the credentials are correct, and I can go and download keys from both accounts manually using the AWS console.
Code
def run(self):
    while True:
        # Remove and return an item from the queue
        key_name = self.q.get()
        k = Key(self.s_bucket, key_name)
        d_key = Key(self.d_bucket, k.key)
        if not d_key.exists() or k.etag != d_key.etag:
            print 'Moving {file_name} from {s_bucket} to {d_bucket}'.format(
                file_name=k.key,
                s_bucket=source_bucket,
                d_bucket=dest_bucket
            )
            # Create a new key in the bucket by copying another existing key
            acl = self.s_bucket.get_acl(k)
            self.d_bucket.copy_key(d_key.key, self.s_bucket.name, k.key, storage_class=k.storage_class)
            d_key.set_acl(acl)
        else:
            print 'File exist'
        self.q.task_done()
Error:
File "s3_to_s3.py", line 88, in run
self.d_bucket.copy_key( d_key.key, self.s_bucket.name, k.key, storage_class=k.storage_class)
File "/usr/lib/python2.7/dist-packages/boto/s3/bucket.py", line 689, in copy_key
response.reason, body)
S3ResponseError: S3ResponseError: 403 Forbidden
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>0729E8ADBD7A9E60</RequestId><HostId>PSbbWCLBtLAC9cjW+52X1fUSVErnZeN79/w7rliDgNbLIdCpc9V0bPi8xO9fp1od</HostId></Error>
Try this: copy the key from the source bucket to the destination bucket using boto's Key class:
source_key_name = 'image.jpg'  # for example
# return a Key object
source_key = source_bucket.get_key(source_key_name)
# use Key.copy
source_key.copy(destination_bucket, source_key_name)
Regarding the copy function: you can set preserve_acl to True and the ACL will be copied from the source key.
Boto's Key.copy signature:
def copy(self, dst_bucket, dst_key, metadata=None,
         reduced_redundancy=False, preserve_acl=False,
         encrypt_key=False, validate_dst_bucket=True):
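For example, a minimal sketch of copying a single object between the two buckets with the ACL preserved (the credentials, bucket names, and key name here are placeholders, not from the original post):
from boto.s3.connection import S3Connection

# connect with credentials that can read the source and write the destination
conn = S3Connection('ACCESS_KEY', 'SECRET_KEY')
source_bucket = conn.get_bucket('source-bucket-name')
destination_bucket = conn.get_bucket('destination-bucket-name')

source_key = source_bucket.get_key('image.jpg')
# copy the object and carry the source key's ACL over to the destination
source_key.copy(destination_bucket, source_key.name, preserve_acl=True)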
Related
I have a Flask Python REST API which is called by another Flask REST API.
The input for my API is one parquet file (a FileStorage object) plus ECS connection and bucket details.
I want to save the parquet file to ECS in a specific folder using boto or boto3.
The code I have tried:
def uploadFileToGivenBucket(self, inputData, file):
    BucketName = inputData.ecsbucketname
    calling_format = OrdinaryCallingFormat()
    client = S3Connection(inputData.access_key_id, inputData.secret_key, port=inputData.ecsport,
                          host=inputData.ecsEndpoint, debug=2,
                          calling_format=calling_format)
    #client.upload_file(BucketName, inputData.filename, inputData.folderpath)
    bucket = client.get_bucket(BucketName, validate=False)
    key = boto.s3.key.Key(bucket, inputData.filename)
    fileName = NamedTemporaryFile(delete=False, suffix=".parquet")
    file.save(fileName)
    with open(fileName.name) as f:
        key.send_file(f)
But it is not working and gives me an error like:
signature_host = '%s:%d' % (self.host, port)
TypeError: %d format: a number is required, not str
I tried Google but no luck. Can anyone help me with this, or share any sample code for the same?
After a lot of hit and trial and time, I finally got the solution. I'm posting it for everyone else who is facing the same issue.
You need to use Boto3, and here is the code:
# imports needed for this snippet
import logging
import boto3
from botocore.exceptions import ClientError
from tempfile import NamedTemporaryFile

def uploadFileToGivenBucket(self, inputData, file):
    BucketName = inputData.ecsbucketname
    #bucket = client.get_bucket(BucketName, validate=False)
    f = NamedTemporaryFile(delete=False, suffix=".parquet")
    file.save(f)
    endpointurl = "<your endpoints>"
    s3_client = boto3.client('s3', endpoint_url=endpointurl,
                             aws_access_key_id=inputData.access_key_id,
                             aws_secret_access_key=inputData.secret_key)
    try:
        newkey = 'yourfolderpath/anotherfolder' + inputData.filename
        response = s3_client.upload_file(f.name, BucketName, newkey)
    except ClientError as e:
        logging.error(e)
        return False
    return True
I'm trying to update the content of a file from a Python script using the Google client API. The problem is that I keep receiving error 403:
An error occurred: <HttpError 403 when requesting https://www.googleapis.com/upload/drive/v3/files/...?alt=json&uploadType=resumable returned "The resource body includes fields which are not directly writable.
I have tried to remove metadata fields, but that didn't help.
The function to update the file is the following:
# File: utilities.py
from googleapiclient import errors
from googleapiclient.http import MediaFileUpload
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
def update_file(service, file_id, new_name, new_description, new_mime_type,
                new_filename):
    """Update an existing file's metadata and content.

    Args:
        service: Drive API service instance.
        file_id: ID of the file to update.
        new_name: New name for the file.
        new_description: New description for the file.
        new_mime_type: New MIME type for the file.
        new_filename: Filename of the new content to upload.
    Returns:
        Updated file metadata if successful, None otherwise.
    """
    try:
        # First retrieve the file from the API.
        file = service.files().get(fileId=file_id).execute()
        # File's new metadata.
        file['name'] = new_name
        file['description'] = new_description
        file['mimeType'] = new_mime_type
        file['trashed'] = True
        # File's new content.
        media_body = MediaFileUpload(
            new_filename, mimetype=new_mime_type, resumable=True)
        # Send the request to the API.
        updated_file = service.files().update(
            fileId=file_id,
            body=file,
            media_body=media_body).execute()
        return updated_file
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
        return None
And here is the whole script to reproduce the problem.
The goal is to substitute a file, retrieving its id by name.
If the file does not exist yet, the script will create it by calling insert_file (this function works as expected).
The problem is update_file, posted above.
from __future__ import print_function
from utilities import *
from googleapiclient import errors
from googleapiclient.http import MediaFileUpload
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
def get_authenticated(SCOPES, credential_file='credentials.json',
                      token_file='token.json', service_name='drive',
                      api_version='v3'):
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the
    # first time.
    store = file.Storage(token_file)
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets(credential_file, SCOPES)
        creds = tools.run_flow(flow, store)
    service = build(service_name, api_version, http=creds.authorize(Http()))
    return service
def retrieve_all_files(service):
    """Retrieve a list of File resources.

    Args:
        service: Drive API service instance.
    Returns:
        List of File resources.
    """
    result = []
    page_token = None
    while True:
        try:
            param = {}
            if page_token:
                param['pageToken'] = page_token
            files = service.files().list(**param).execute()
            result.extend(files['files'])
            page_token = files.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as error:
            print('An error occurred: %s' % error)
            break
    return result
def insert_file(service, name, description, parent_id, mime_type, filename):
    """Insert new file.

    Args:
        service: Drive API service instance.
        name: Name of the file to insert, including the extension.
        description: Description of the file to insert.
        parent_id: Parent folder's ID.
        mime_type: MIME type of the file to insert.
        filename: Filename of the file to insert.
    Returns:
        Inserted file metadata if successful, None otherwise.
    """
    media_body = MediaFileUpload(filename, mimetype=mime_type, resumable=True)
    body = {
        'name': name,
        'description': description,
        'mimeType': mime_type
    }
    # Set the parent folder.
    if parent_id:
        body['parents'] = [{'id': parent_id}]
    try:
        file = service.files().create(
            body=body,
            media_body=media_body).execute()
        # Uncomment the following line to print the File ID
        # print('File ID: %s' % file['id'])
        return file
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
        return None
# If modifying these scopes, delete the file token.json.
SCOPES = 'https://www.googleapis.com/auth/drive'
def main():
    service = get_authenticated(SCOPES)
    # Call the Drive v3 API
    results = retrieve_all_files(service)
    target_file_descr = 'Description of deploy.py'
    target_file = 'deploy.py'
    target_file_name = target_file
    target_file_id = [file['id'] for file in results if file['name'] == target_file_name]
    if len(target_file_id) == 0:
        print('No file called %s found in root. Create it:' % target_file_name)
        file_uploaded = insert_file(service, target_file_name, target_file_descr, None,
                                    'text/x-script.phyton', target_file_name)
    else:
        print('File called %s found. Update it:' % target_file_name)
        file_uploaded = update_file(service, target_file_id[0], target_file_name, target_file_descr,
                                    'text/x-script.phyton', target_file_name)
    print(str(file_uploaded))


if __name__ == '__main__':
    main()
In order to try the example, it is necessary to create a Google Drive API project at https://console.developers.google.com/apis/dashboard,
then save the file credentials.json and pass its path to get_authenticated(). The file token.json will be created after the first
authentication and API authorization.
The problem is that the metadata 'id' cannot be changed when updating a file, so it should not be in the body. Just delete it from the dict:
# File's new metadata.
del file['id'] # 'id' has to be deleted
file['name'] = new_name
file['description'] = new_description
file['mimeType'] = new_mime_type
file['trashed'] = True
I tried your code with this modification and it works.
I also struggled a little bit with this function and found that if you don't need to update the metadata, you can just remove it from the update call:
updated_file = service.files().update(fileId=file_id, media_body=media_body).execute()
At least that worked for me.
The problem is The resource body includes fields which are not directly writable. So try removing all of the metadata properties and then add them back one by one. The one I would be suspicious about is trashed. Even though the API docs say this is writable, it shouldn't be. Trashing a file has side effects beyond setting a boolean. Updating a file and setting it to trashed at the same time is somewhat unusual. Are you sure that's what you intend?
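To illustrate that suggestion, here is a minimal sketch of an update that only sets fields documented as writable (name, description, mimeType) and leaves trashed out; it reuses service, file_id, and MediaFileUpload from the question's script, and the file name and MIME type are placeholders:
# minimal sketch: update content plus a few writable metadata fields only
body = {
    'name': 'deploy.py',
    'description': 'Description of deploy.py',
    'mimeType': 'text/plain',
}
media_body = MediaFileUpload('deploy.py', mimetype='text/plain', resumable=True)
updated_file = service.files().update(
    fileId=file_id,          # ID of the existing file (placeholder)
    body=body,               # only writable metadata, no 'id' or read-only fields
    media_body=media_body).execute()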
I am trying to set metadata while pushing a file to S3.
This is how it looks:
def pushFileToBucket(fileName, bucket, key_name, metadata):
    full_key_name = os.path.join(fileName, key_name)
    k = bucket.new_key(full_key_name)
    k.set_metadata('my_key', 'value')
    k.set_contents_from_filename(fileName)
For some reason this throws an error at set_metadata saying:
boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
<?xml version="1.0" encoding="UTF-8"?><Error><Code>SignatureDoesNotMatch</Code></Error>
And when I remove the set_metadata part, the file is stored correctly.
Not sure what I am doing wrong. If the access key were invalid, it wouldn't have saved the file at all!
Another approach for someone using upload_file:
import boto3

s3 = boto3.client('s3')
path = 'foo/bar.json'
file_name = 'bar.json'
bucket_name = 'foobar_bucket'
extra_args = {'CacheControl': 'max-age=86400'}

s3.upload_file(path, bucket_name, file_name, ExtraArgs=extra_args)
This would set the Cache-Control header on the file.
Got this fixed. Apparently we cannot have an underscore in the metadata key name.
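Based on that finding, a minimal sketch of the same upload with the underscore dropped from the metadata key (the key name here is just an example, not from the original post):
import os

def pushFileToBucket(fileName, bucket, key_name, metadata):
    full_key_name = os.path.join(fileName, key_name)
    k = bucket.new_key(full_key_name)
    # hyphenated key instead of 'my_key' with an underscore
    k.set_metadata('my-key', 'value')
    k.set_contents_from_filename(fileName)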
I'm trying to emulate the flow of my server application creating a temporary access/secret key pair for a mobile device using my own authentication. The mobile device talks to my server, and the end result is that it gets AWS credentials.
I'm using Cognito with a custom developer backend, see documentation here.
To this end, I've made the script below, but my secret/access key credentials don't work:
import time
import traceback
from boto.cognito.identity.layer1 import CognitoIdentityConnection
from boto.sts import STSConnection
from boto.s3.connection import S3Connection
from boto.s3.key import Key
AWS_ACCESS_KEY_ID = "XXXXX"
AWS_SECRET_ACCESS_KEY = "XXXXXX"
# get token
iden_pool_id = "us-east-1:xxx-xxx-xxx-xxxx-xxxx"
role_arn = "arn:aws:iam::xxxx:role/xxxxxxx"
user_id = "xxxx"
role_session_name = "my_session_name_here"
bucket_name = 'xxxxxxxxxx'
connection = CognitoIdentityConnection(aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
web_identity_token = connection.get_open_id_token_for_developer_identity(
    identity_pool_id=iden_pool_id,
    logins={"xxxxxxxxx": user_id},
    identity_id=None,
    token_duration=3600)
# use token to get credentials
sts_conn = STSConnection(aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
result = sts_conn.assume_role_with_web_identity(
    role_arn,
    role_session_name,
    web_identity_token['Token'],
    provider_id=None,
    policy=None,
    duration_seconds=3600)

print "The user now has an access ID (%s) and a secret access key (%s) and a session/security token (%s)!" % (
    result.credentials.access_key, result.credentials.secret_key, result.credentials.session_token)
# just use any call that tests if these credentials work
from boto.ec2.connection import EC2Connection
ec2 = EC2Connection(result.credentials.access_key, result.credentials.secret_key, security_token=result.credentials.session_token)
wait = 1
cumulative_wait_time = 0
while True:
    try:
        print ec2.get_all_regions()
        break
    except Exception as e:
        print e, traceback.format_exc()
        time.sleep(2**wait)
        cumulative_wait_time += 2**wait
        print "Waited for:", cumulative_wait_time
        wait += 1
My thought with the exponential backoff was that perhaps Cognito takes a while to propagate the new access/secret key pair, and thus I might have to wait (pretty unacceptable if so!).
However, this script runs for 10 minutes and doesn't succeed, which leads me to believe the problem is something else.
Console print out:
The user now has an access ID (xxxxxxxx) and a secret access key (xxxxxxxxxx) and a session/security token (XX...XX)!
EC2ResponseError: 401 Unauthorized
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>AuthFailure</Code><Message>AWS was not able to validate the provided access credentials</Message></Error></Errors><RequestID>xxxxxxxxxx</RequestID></Response>
Traceback (most recent call last):
File "/home/me/script.py", line 50, in <module>
print ec2.get_all_regions()
File "/home/me/.virtualenvs/venv/local/lib/python2.7/site-packages/boto/ec2/connection.py", line 3477, in get_all_regions
[('item', RegionInfo)], verb='POST')
File "/home/me/.virtualenvs/venv/local/lib/python2.7/site-packages/boto/connection.py", line 1186, in get_list
raise self.ResponseError(response.status, response.reason, body)
EC2ResponseError: EC2ResponseError: 401 Unauthorized
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>AuthFailure</Code><Message>AWS was not able to validate the provided access credentials</Message></Error></Errors><RequestID>xxxxxxxxxxxxx</RequestID></Response>
Waited for: 2
...
...
Any thoughts?
You are correctly extracting the access key and secret key from the result of the assume_role_with_web_identity call. However, when using the temporary credentials, you also need to use the security token from the result.
Here is pseudocode describing what you need to do:
http://docs.aws.amazon.com/STS/latest/UsingSTS/using-temp-creds.html#using-temp-creds-sdk
Also note the security_token parameter for EC2Connection
http://boto.readthedocs.org/en/latest/ref/ec2.html#boto.ec2.connection.EC2Connection
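For instance, a minimal sketch of building the EC2 connection with all three pieces of the temporary credentials; this just restates the pattern from the linked docs, reusing the result variable from the question's script:
from boto.ec2.connection import EC2Connection

# all three values come from the assume_role_with_web_identity result
ec2 = EC2Connection(
    aws_access_key_id=result.credentials.access_key,
    aws_secret_access_key=result.credentials.secret_key,
    security_token=result.credentials.session_token)
print ec2.get_all_regions()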
Hopefully this solves the problem
-Mark
I'm using Python and tinys3 to write files to S3, but it's not working. Here's my code:
import tinys3
conn = tinys3.Connection('xxxxxxx','xxxxxxxx',tls=True)
f = open('testing_s3.txt','rb')
print conn.upload('testing_data/testing_s3.txt',f,'testing-bucket')
print conn.get('testing_data/testing_s3.txt','testing-bucket')
That gives the output:
<Response [301]>
<Response [301]>
When I try specifying the endpoint, I get:
requests.exceptions.HTTPError: 403 Client Error: Forbidden
Any idea what I'm doing wrong?
Edit: When I try using boto, it works, so the problem isn't in the access key or secret key.
I finally figured this out. Here is the correct code:
import tinys3
conn = tinys3.Connection('xxxxxxx','xxxxxxxx',tls=True,endpoint='s3-us-west-1.amazonaws.com')
f = open('testing_s3.txt','rb')
print conn.upload('testing_data/testing_s3.txt',f,'testing-bucket')
print conn.get('testing_data/testing_s3.txt','testing-bucket')
You have to use the region endpoint, not s3.amazonaws.com. You can look up the region endpoint from here: http://docs.aws.amazon.com/general/latest/gr/rande.html. Look under the heading "Amazon Simple Storage Service (S3)."
I got the idea from this thread: https://github.com/smore-inc/tinys3/issues/5
If using an IAM user it is necessary to allow the "s3:PutObjectAcl" action.
I don't know why, but this code never worked for me.
I switched to boto, and it uploaded the file on the first try.
import boto
from boto.s3.key import Key

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'XXXXXXXXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXXXXX'
bucket_name = 'my-bucket'

conn = boto.connect_s3(AWS_ACCESS_KEY_ID,
                       AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket('my-bucket')

print 'Uploading %s to Amazon S3 bucket %s' % \
    (filename, bucket_name)

k = Key(bucket)
k.key = filename
k.set_contents_from_filename(filename,
                             cb=percent_cb, num_cb=10)