Downloading a file from a requester pays bucket in amazon s3 - python

I need to download some replay files from an API that has the files stored on an amazon s3 bucket, with requester pays enabled.
The problem is, I set up my amazon AWS account, created an AWSAccessKeyId and AWSSecretKey, but I still can't get to download a single file, since I'm getting an Access denied response.
I want to automate all this inside a Python script, so I've been trying to do this with the boto3 package. Also, I installed the Amazon AWS CLI, and set up my access ID and secret key.
The file I've been trying to download (I want to download multiple ones, but for now I'm trying with just one) is this: http://hotsapi.s3-website-eu-west-1.amazonaws.com/18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay
From what I've found so far on SO, I've tried something like this:
import boto3
import botocore
BUCKET_NAME = 'hotsapi' # replace with your bucket name
KEY = '18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay' # replace with your object key
s3 = boto3.resource('s3')
try:
s3.Bucket(BUCKET_NAME).download_file(KEY, 'test.StormReplay')
except botocore.exceptions.ClientError as e:
if e.response['Error']['Code'] == "404":
print("The object does not exist.")
else:
raise
And this:
import boto3
s3_client = boto3.Session().client('s3')
response = s3_client.get_object(Bucket='hotsapi',
Key='18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay',
RequestPayer='requester')
response_content = response['Body'].read()
with open('./B01.StormReplay', 'wb') as file:
file.write(response_content)
But I still can't manage to download the file.
Any help is welcome! Thanks!

Related

upload a csv to aws s3

def store_to_10_rss(tempDF):
s3 = boto3.resource('s3')
try:
s3.Object('abcData', 'cr/working-files/unstructured_data/news-links/10_rss.csv').load()
except botocore.exceptions.ClientError as e:
if e.response['Error']['Code'] == "404":
print('\nThe object does not exist.')
tempDF.to_csv('10_rss.csv')
s3.Object('abcData/cr/working-files/unstructured_data/news-links/', '10_rss.csv').upload_file(Filename='10_rss.csv')
else:
print('\nSomething else has gone wrong.')
raise
else:
print('\nThe object does exist.')
In my S3, there are multiple buckets. I want to go to abcData bucket. Inside this bucket I want to go to cr folder, then to -->working-files folder, then to-->unstructured_data folder, then to-->news-links folder. In the news-links folder I want to check if a csv '10_rss' exists or not. If it doesn't exist, i want to store the passed dataframe ' tempDF' in a csv and upload this file to the path 'abcData/cr/working-files/unstructured_data/news-links/'.
But this gives me an error :
ParamValidationError: Parameter validation failed:
Invalid bucket name "sgfr01data/credit-memorandum/working-files/unstructured_data/news-links/": Bucket name must match the regex "^[a-zA-Z0-9.-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).:s3:[a-z-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9-]{1,63}$|^arn:(aws).:s3-outposts:[a-z-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9-]{1,63}$"
Any help would be appreciated!
Please correct the syntax of AWS S3 upload_file. Refer these file upload examples from AWS. Also, you can refer AWS boto3 API guide for more details about same.
In-short, in your current error and sample code, you used combination of AWS S3 bucket keys or AWS S3 folders along with with AWS S3 bucket name, which should not be case for upload_file boto3 api. AWS S3 bucket name i.e. sgfr01data must be separate parameter.

Move all files in s3 bucket from s3 account to another using boto3

I'm trying to move the contents of a bucket from account-a to a bucket in account-b which I already have the credentials for both of them.
Here's the code I'm currently using:
import boto3
SRC_AWS_KEY = 'src-key'
SRC_AWS_SECRET = 'src-secret'
DST_AWS_KEY = 'dst-key'
DST_AWS_SECRET = 'dst-secret'
srcSession = boto3.session.Session(
aws_access_key_id=SRC_AWS_KEY,
aws_secret_access_key=SRC_AWS_SECRET
)
dstSession = boto3.session.Session(
aws_access_key_id=DST_AWS_KEY,
aws_secret_access_key=DST_AWS_SECRET
)
copySource = {
'Bucket': 'src-bucket',
'Key': 'test-bulk-src'
}
srcS3 = srcSession.resource('s3')
dstS3 = dstSession.resource('s3')
dstS3.meta.client.copy(CopySource=copySource, Bucket='dst-bucket', Key='test-bulk-dst', SourceClient=srcS3.meta.client)
print('success')
The problem is that when I specify a file's name in the field Key followed by /file.csv it works really fine, but when I set it to copy the whole folder, as showed in the code, it fails and throws this exception:
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
What I need to do is to move the contents in one call, not by iterating through the contents of the src-folder, because this is time/money consuming, as I may have thousands of files to be moved.
There is no API call in Amazon S3 to copy folders. (Folders do not actually exist — the Key of each object includes its full path.)
You will need to iterate through each object and copy it.
The AWS CLI (written in Python) provides some higher-level commands that will do this iteration for you:
aws s3 cp --recursive s3://source-bucket/folder/ s3://destination-bucket/folder/
If the buckets are in different accounts, I would recommend:
Use a set of credentials for the destination account (avoids problems with object ownership)
Modify the bucket policy on the source bucket to permit access by the credentials from the destination account (avoids the need to use two sets of credentials)

Download Files From AWS S3 Without Access and Secret Keys in Python

I have a working code to download files from one of my buckets in S3 and does some conversion work through in Python. I do not embed the Access and Secret Keys in the code but the keys are in my AWS CLI configuration.
import boto3
import botocore
BUCKET_NAME = 'converted-parquet-bucket' # replace with your own bucket name
KEY = 'json-to-parquet/names.snappy.parquet' # replace with path and follow with key object
s3 = boto3.resource('s3')
try:
s3.Bucket(BUCKET_NAME).download_file(KEY, 'names.snappy.parquet') # replace the key object name
except botocore.exceptions.ClientError as e: # exception handling
if e.response['Error']['Code'] == "404":
print("The object does not exist.") # if object that you are looking for does not exist it will print this
else:
raise
# Un comment lines 21 and 22 to convert csv to parquet
# dataframe = pandas.read_csv('names.csv')
# dataframe.to_parquet('names.snappy.parquet' ,engine='auto', compression='snappy')
data = pq.read_pandas('names.snappy.parquet', columns=['Year of Birth', 'Gender', 'Ethnicity', "Child's First Name", 'Count', 'Rank']).to_pandas()
#print(data) # this code will print the ALL the data in the parquet file
print(data.loc[data['Gender'] == 'MALE']) # this code will print the data in the parquet file ONLY what is in the query (SQL query)
Could someone help me how to get this code working without having access and secret keys embedded in the code or in AWS configure
If you are running your function locally, you need to have your credentials on your local credentials/config file to interact with AWS resources.
One alternative would be to run on AWS Lambda (if your function runs periodically, you can set that up with CloudWatch Events) and use Environment Variables or AWS Security Token Service (STS) to generate temporary credentials.
If you do not want to use secret/access key, you should use roles and policies, then. Here's the deal:
Define a role (ex. RoleWithAccess) and be sure that your user (defined in your credentials) can assume this role
Set a policy for RoleWithAccess, giving read/write access to your buckets
If you are executing it in your local machine, run the necessary commands (AWS CLI) to create a profile that makes you assume RoleWithAccess (ex. ProfileWithAccess)
Execute your script using a session passing this profile as the argument, what means you need to replace:
s3 = boto3.resource('s3')
with
session = boto3.session.Session(profile_name='ProfileWithAccess')
s3 = session.resource('s3')
The upside of this approach is that if you are running it inside an EC2 instance, you can tie your instance to a specific role when you build it (ex. RoleWithAccess). In that case, you can completely ignore session, profile, all the AWS CLI hocus pocus, and just run s3 = boto3.resource('s3').
You can also use AWS Lambda, setting a role and a policy with read/write permission to your bucket.

Error while Downloading file to my local device from S3

I am trying to download a file from Amazon S3 bucket to my local device using the below code but I got an error saying "Unable to locate credentials"
Given below is the code I have written:
import boto3
import botocore
BUCKET_NAME = 'my-bucket'
KEY = 'my_image_in_s3.jpg'
s3 = boto3.resource('s3')
try:
s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
except botocore.exceptions.ClientError as e:
if e.response['Error']['Code'] == "404":
print("The object does not exist.")
else:
raise
Could anyone help me on this. Thanks in advance.
AWS use a shared credentials system for AWS CLI and all other AWS SDKs this way there is no risk of leaking your AWS credentials to some code repository, AWS security practices recommend to use a shared credentials file which is located usually on linux
~/.aws/credentials
this file contains an access key and secret key which is used by all sdk and aws cli the file the file can be created manually or automatically using this command
aws configure
it will ask few questions and create the credentials file for you, note that you need to create a user with appropiate permissions before accessing aws resources.
For more information click on the link below -:
AWS cli configuration
You are not using the session you created to download the file, you're using s3 client you created. If you want to use the client you need to specify credentials.
your_bucket.download_file('k.png', '/Users/username/Desktop/k.png')
or
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('your_bucket','k.png','/Users/username/Desktop/k.png')

Python Boto3 AWS Multipart Upload Syntax

I am successfully authenticating with AWS and using the 'put_object' method on the Bucket object to upload a file. Now I want to use the multipart API to accomplish this for large files. I found the accepted answer in this question:
How to save S3 object to a file using boto3
But when trying to implement I am getting "unknown method" errors. What am I doing wrong? My code is below. Thanks!
## Get an AWS Session
self.awsSession = Session(aws_access_key_id=accessKey,
aws_secret_access_key=secretKey,
aws_session_token=session_token,
region_name=region_type)
...
# Upload the file to S3
s3 = self.awsSession.resource('s3')
s3.Bucket('prodbucket').put_object(Key=fileToUpload, Body=data) # WORKS
#s3.Bucket('prodbucket').upload_file(dataFileName, 'prodbucket', fileToUpload) # DOESNT WORK
#s3.upload_file(dataFileName, 'prodbucket', fileToUpload) # DOESNT WORK
The upload_file method has not been ported over to the bucket resource yet. For now you'll need to use the client object directly to do this:
client = self.awsSession.client('s3')
client.upload_file(...)
Libcloud S3 wrapper transparently handles all the splitting and uploading of the parts for you.
Use upload_object_via_stream method to do so:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver
# Path to a very large file you want to upload
FILE_PATH = '/home/user/myfile.tar.gz'
cls = get_driver(Provider.S3)
driver = cls('api key', 'api secret key')
container = driver.get_container(container_name='my-backups-12345')
# This method blocks until all the parts have been uploaded.
extra = {'content_type': 'application/octet-stream'}
with open(FILE_PATH, 'rb') as iterator:
obj = driver.upload_object_via_stream(iterator=iterator,
container=container,
object_name='backup.tar.gz',
extra=extra)
For official documentation on S3 Multipart feature, refer to AWS Official Blog.

Categories

Resources