I have working Python code that downloads files from one of my S3 buckets and does some conversion work on them. I do not embed the access and secret keys in the code; the keys are in my AWS CLI configuration.
import boto3
import botocore
import pandas  # only needed for the optional CSV-to-Parquet conversion below
import pyarrow.parquet as pq
BUCKET_NAME = 'converted-parquet-bucket' # replace with your own bucket name
KEY = 'json-to-parquet/names.snappy.parquet' # replace with the path (key) of your object
s3 = boto3.resource('s3')
try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'names.snappy.parquet') # local file name to save to
except botocore.exceptions.ClientError as e: # exception handling
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.") # printed if the object you are looking for does not exist
    else:
        raise
# Uncomment the two lines below to convert CSV to Parquet
# dataframe = pandas.read_csv('names.csv')
# dataframe.to_parquet('names.snappy.parquet', engine='auto', compression='snappy')
data = pq.read_pandas('names.snappy.parquet', columns=['Year of Birth', 'Gender', 'Ethnicity', "Child's First Name", 'Count', 'Rank']).to_pandas()
#print(data) # this prints ALL the data in the parquet file
print(data.loc[data['Gender'] == 'MALE']) # this prints only the rows matching the filter (like a SQL WHERE clause)
Could someone help me get this code working without the access and secret keys being embedded in the code or stored via aws configure?
If you are running your function locally, you need to have your credentials on your local credentials/config file to interact with AWS resources.
One alternative would be to run it on AWS Lambda (if your function runs periodically, you can trigger it with CloudWatch Events) and use environment variables or AWS Security Token Service (STS) to generate temporary credentials.
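For the STS route, a role can also be assumed directly from Python, so the script works with short-lived credentials instead of long-term keys. A minimal sketch: the role ARN and session name are placeholders, the helper names are hypothetical, and `sts.assume_role` itself still needs some base identity to call it (for example a Lambda execution role or an EC2 instance profile).

```python
def session_kwargs_from_sts(response: dict) -> dict:
    """Map an STS AssumeRole response to boto3.session.Session keyword arguments."""
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def assume_role_session(role_arn: str, session_name: str = "s3-download"):
    """Assume role_arn via STS and return a boto3 Session using the temporary credentials."""
    # Imported lazily so the mapping helper above can be used without boto3 installed.
    import boto3

    sts = boto3.client("sts")
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    return boto3.session.Session(**session_kwargs_from_sts(response))
```

Usage would look like `s3 = assume_role_session("arn:aws:iam::123456789012:role/RoleWithAccess").resource("s3")`, where the account ID is a placeholder. The temporary credentials expire, so a long-running job would need to assume the role again.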
If you do not want to use a secret/access key, you should use roles and policies instead. Here's how:
1. Define a role (e.g. RoleWithAccess) and make sure that your user (the one defined in your credentials) can assume this role.
2. Attach a policy to RoleWithAccess that grants read/write access to your buckets.
3. If you are executing the script on your local machine, run the necessary AWS CLI commands to create a profile that makes you assume RoleWithAccess (e.g. ProfileWithAccess).
4. Execute your script using a session that takes this profile as an argument, which means you need to replace:
s3 = boto3.resource('s3')
with
session = boto3.session.Session(profile_name='ProfileWithAccess')
s3 = session.resource('s3')
The upside of this approach is that if you are running it inside an EC2 instance, you can tie your instance to a specific role when you build it (ex. RoleWithAccess). In that case, you can completely ignore session, profile, all the AWS CLI hocus pocus, and just run s3 = boto3.resource('s3').
You can also use AWS Lambda, setting a role and a policy with read/write permission to your bucket.
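Step 3 above ends with a named profile in ~/.aws/config (for example via `aws configure set role_arn <role-arn> --profile ProfileWithAccess` and `aws configure set source_profile default --profile ProfileWithAccess`). To illustrate what that profile looks like, the sketch below renders the section with `configparser` instead of editing your real config file. The account ID in the ARN is a placeholder, `render_role_profile` is a hypothetical helper, and `source_profile = default` assumes your base user's keys live in the `default` profile.

```python
import configparser
import io


def render_role_profile(profile: str, role_arn: str, source_profile: str = "default") -> str:
    """Return ~/.aws/config text for a profile that assumes role_arn."""
    config = configparser.ConfigParser()
    # In ~/.aws/config, named profiles use a "profile <name>" section header.
    config[f"profile {profile}"] = {
        "role_arn": role_arn,              # role the SDK/CLI will assume
        "source_profile": source_profile,  # profile whose keys are used to call STS
    }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


print(render_role_profile(
    "ProfileWithAccess",
    "arn:aws:iam::123456789012:role/RoleWithAccess",  # placeholder account ID
))
```

With such a profile in place, `boto3.session.Session(profile_name='ProfileWithAccess')` makes boto3 call STS and assume the role for you.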
I want to upload files to Wasabi cloud storage, but I can't. I get this error:
An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.
I checked the key several times and everything is correct. The strange thing is that before this I tried creating a new bucket and that worked, but I can't upload the files.
import boto3
s3 = boto3.client('s3',
endpoint_url='https://s3.wasabisys.com',
aws_access_key_id="********R2PN",
aws_secret_access_key="*************zDKnnWS")
file_path = r"C:\Users\Asus\Desktop\Programming\rofls_with_node\tracks.txt"
bucket_name = "last-fm9"
key_name = "tracks.txt"
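# Body must be the file's contents (bytes or a file object), not the path string
with open(file_path, "rb") as f:
    s3.put_object(Body=f, Bucket=bucket_name, Key=key_name)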
That's it, I solved the problem: I just had to change endpoint_url to "https://s3.us-east-2.wasabisys.com" (instead of us-east-2, insert the region of your bucket). Thanks.
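Since the fix is just the regional endpoint, one way to avoid hardcoding it (and the keys) is to build the endpoint from the bucket's region and let boto3 pick up credentials from its usual chain (environment variables, ~/.aws/credentials, and so on). A sketch assuming the `https://s3.<region>.wasabisys.com` pattern from the answer holds for your region; `wasabi_endpoint` and `wasabi_client` are hypothetical helpers.

```python
def wasabi_endpoint(region: str) -> str:
    """Build the regional Wasabi S3 endpoint, e.g. 'us-east-2' -> https://s3.us-east-2.wasabisys.com."""
    return f"https://s3.{region}.wasabisys.com"


def wasabi_client(region: str):
    """Create an S3 client pointed at the Wasabi endpoint for `region`."""
    # Lazy import so wasabi_endpoint() can be used without boto3 installed.
    import boto3

    # No keys passed here on purpose: boto3 resolves them from its credential
    # chain (environment variables, shared credentials file, ...).
    return boto3.client("s3", endpoint_url=wasabi_endpoint(region), region_name=region)
```

For example, `wasabi_client("us-east-2").upload_file(file_path, bucket_name, key_name)` uploads the file from the question, reading it from disk rather than sending the path string as the body.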
I'm trying to move the contents of a bucket in account-a to a bucket in account-b; I already have credentials for both accounts.
Here's the code I'm currently using:
import boto3
SRC_AWS_KEY = 'src-key'
SRC_AWS_SECRET = 'src-secret'
DST_AWS_KEY = 'dst-key'
DST_AWS_SECRET = 'dst-secret'
srcSession = boto3.session.Session(
aws_access_key_id=SRC_AWS_KEY,
aws_secret_access_key=SRC_AWS_SECRET
)
dstSession = boto3.session.Session(
aws_access_key_id=DST_AWS_KEY,
aws_secret_access_key=DST_AWS_SECRET
)
copySource = {
'Bucket': 'src-bucket',
'Key': 'test-bulk-src'
}
srcS3 = srcSession.resource('s3')
dstS3 = dstSession.resource('s3')
dstS3.meta.client.copy(CopySource=copySource, Bucket='dst-bucket', Key='test-bulk-dst', SourceClient=srcS3.meta.client)
print('success')
The problem is that when I specify a file name in the Key field (followed by /file.csv), it works fine, but when I set it to copy the whole folder, as shown in the code, it fails and throws this exception:
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
What I need is to move the contents in one call rather than iterating through the contents of the source folder, because that is time- and money-consuming when I may have thousands of files to move.
There is no API call in Amazon S3 to copy folders. (Folders do not actually exist — the Key of each object includes its full path.)
You will need to iterate through each object and copy it.
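A sketch of that iteration with boto3: a paginator lists every key under the prefix (ListObjectsV2 returns at most 1,000 keys per page), and the managed `copy` call from the question is repeated per object. The helper names are hypothetical. Note that with `SourceClient`, the source client is only used for operations on the source object (such as the size lookup); the copy request itself is issued by the destination client, so the destination credentials need read access to the source bucket.

```python
def destination_key(src_key: str, src_prefix: str, dst_prefix: str) -> str:
    """Re-root a key from src_prefix to dst_prefix, keeping the rest of its path."""
    if not src_key.startswith(src_prefix):
        raise ValueError(f"{src_key!r} is not under {src_prefix!r}")
    return dst_prefix + src_key[len(src_prefix):]


def copy_prefix(src_client, dst_client, src_bucket, src_prefix, dst_bucket, dst_prefix):
    """Copy every object under src_prefix to dst_prefix, one managed copy per key."""
    paginator = src_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=src_prefix):
        # "Contents" is absent from a page when no keys match.
        for obj in page.get("Contents", []):
            dst_client.copy(
                CopySource={"Bucket": src_bucket, "Key": obj["Key"]},
                Bucket=dst_bucket,
                Key=destination_key(obj["Key"], src_prefix, dst_prefix),
                SourceClient=src_client,
            )
```

With the question's objects this would be `copy_prefix(srcS3.meta.client, dstS3.meta.client, 'src-bucket', 'test-bulk-src/', 'dst-bucket', 'test-bulk-dst/')`. To turn the copy into a move, each source object would also have to be deleted after it is copied.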
The AWS CLI (written in Python) provides some higher-level commands that will do this iteration for you:
aws s3 cp --recursive s3://source-bucket/folder/ s3://destination-bucket/folder/
If the buckets are in different accounts, I would recommend:
Use a set of credentials for the destination account (avoids problems with object ownership)
Modify the bucket policy on the source bucket to permit access by the credentials from the destination account (avoids the need to use two sets of credentials)
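For the second recommendation, the source bucket's policy names the destination account as a principal. A minimal sketch that builds such a policy as a Python dict; `111122223333` is a placeholder account ID, `cross_account_read_policy` is a hypothetical helper, and granting only `s3:ListBucket` and `s3:GetObject` assumes read-only access is all the copy needs. The IAM user in the destination account also needs an IAM policy allowing these actions, since cross-account access has to be granted on both sides.

```python
import json


def cross_account_read_policy(src_bucket: str, dst_account_id: str) -> dict:
    """Bucket policy for the SOURCE bucket letting another account list and read it."""
    principal = {"AWS": f"arn:aws:iam::{dst_account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket ARN itself
                "Sid": "AllowListFromDestinationAccount",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{src_bucket}",
            },
            {
                # GetObject applies to the objects inside the bucket
                "Sid": "AllowReadFromDestinationAccount",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{src_bucket}/*",
            },
        ],
    }


print(json.dumps(cross_account_read_policy("src-bucket", "111122223333"), indent=2))
```

The policy would be applied once with source-account credentials, e.g. `src_client.put_bucket_policy(Bucket='src-bucket', Policy=json.dumps(policy))`.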
I am trying to download a file from an Amazon S3 bucket to my local device using the code below, but I get the error "Unable to locate credentials".
Given below is the code I have written:
import boto3
import botocore
BUCKET_NAME = 'my-bucket'
KEY = 'my_image_in_s3.jpg'
s3 = boto3.resource('s3')
try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
Could anyone help me on this. Thanks in advance.
AWS uses a shared credentials system for the AWS CLI and all other AWS SDKs, so there is no risk of leaking your AWS credentials into a code repository. AWS security practices recommend using a shared credentials file, which on Linux is usually located at
~/.aws/credentials
This file contains an access key and a secret key that are used by all SDKs and the AWS CLI. The file can be created manually, or automatically with this command:
aws configure
It will ask a few questions and create the credentials file for you. Note that you need to create a user with appropriate permissions before accessing AWS resources.
For more information, see the AWS CLI configuration documentation.
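Since the shared credentials file is a plain INI file, a quick way to see why boto3 reports "Unable to locate credentials" is to check which profiles actually define both keys. A minimal sketch using only the standard library; the sample text and the helper name `profiles_with_keys` are illustrative, and it only inspects the file, not the environment variables or instance roles that boto3 also checks.

```python
import configparser
from typing import List


def profiles_with_keys(credentials_text: str) -> List[str]:
    """Return profile names in a shared-credentials file that define both keys."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    return [
        name
        for name in parser.sections()
        if parser.has_option(name, "aws_access_key_id")
        and parser.has_option(name, "aws_secret_access_key")
    ]


# Illustrative file contents: the second profile is missing its secret key.
sample = """
[default]
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = exampleSecret

[incomplete]
aws_access_key_id = AKIAEXAMPLE2
"""
print(profiles_with_keys(sample))
```

To check your real file, pass it `pathlib.Path("~/.aws/credentials").expanduser().read_text()`. If `default` is not in the result, `boto3.resource('s3')` without a profile name will not find keys in that file.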
You are not using the session you created to download the file; you're using the s3 client you created. If you want to use the client, you need to specify credentials.
your_bucket.download_file('k.png', '/Users/username/Desktop/k.png')
or
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('your_bucket','k.png','/Users/username/Desktop/k.png')
I need to download some replay files from an API that stores them in an Amazon S3 bucket with Requester Pays enabled.
The problem is that I set up my Amazon AWS account and created an AWSAccessKeyId and AWSSecretKey, but I still can't download a single file because I get an Access Denied response.
I want to automate all this inside a Python script, so I've been trying to do this with the boto3 package. Also, I installed the Amazon AWS CLI, and set up my access ID and secret key.
The file I've been trying to download (I want to download multiple ones, but for now I'm trying with just one) is this: http://hotsapi.s3-website-eu-west-1.amazonaws.com/18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay
From what I've found so far on SO, I've tried something like this:
import boto3
import botocore
BUCKET_NAME = 'hotsapi' # replace with your bucket name
KEY = '18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay' # replace with your object key
s3 = boto3.resource('s3')
try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'test.StormReplay')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
And this:
import boto3
s3_client = boto3.Session().client('s3')
response = s3_client.get_object(Bucket='hotsapi',
Key='18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay',
RequestPayer='requester')
response_content = response['Body'].read()
with open('./B01.StormReplay', 'wb') as file:
    file.write(response_content)
But I still can't manage to download the file.
Any help is welcome! Thanks!
I am trying to upload a folder from my local machine to a Google Cloud Storage bucket, but I get an error about credentials. Where should I provide the credentials, and what information do they need to contain?
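import os
from google.cloud import storage
from_dest = '/Users/xyzDocuments/tmp'
gsutil_link = 'gs://bucket-1991'
bucket_name = gsutil_link[len('gs://'):]  # -> 'bucket-1991'
try:
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    # Upload each file in the local folder, using its file name as the blob name
    for file_name in os.listdir(from_dest):
        source_file_name = os.path.join(from_dest, file_name)
        if os.path.isfile(source_file_name):
            destination_blob_name = file_name
            blob = bucket.blob(destination_blob_name)
            blob.upload_from_filename(source_file_name)
            print('File {} uploaded to {}.'.format(source_file_name, destination_blob_name))
except Exception as e:
    print(e)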
The error is
could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://developers.google.com/accounts/docs/application-default-credentials.
You need to acquire the application default credentials for your project and point an environment variable at them:
Go to the Create service account key page in the GCP Console.
From the Service account drop-down list, select New service account.
Enter a name into the Service account name field.
From the Role drop-down list, select Project > Owner.
Click Create. A JSON file that contains your key downloads to your computer.
Then set an environment variable that provides the application credentials to your application when it runs locally:
$ export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
This error message is usually thrown when the application is not authenticated correctly, for reasons such as missing files, invalid credential paths, or incorrectly assigned environment variables. Keep in mind that an environment variable set in a shell session is lost when that session ends.
Based on this, I recommend validating that the credential file and file path are assigned correctly, and following the "Obtaining and providing service account credentials manually" guide to specify your service account file explicitly in your code. That way the setting is permanent, and you can verify that you are passing the service credentials correctly.
Example of passing the path to the service account key in code:
def explicit():
    from google.cloud import storage
    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json('service_account.json')
    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
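To validate that the credential file and path are assigned correctly, a small standalone check can run in the same process as the application. A minimal sketch, assuming a service account JSON key like the one downloaded in the steps above (such keys contain `"type": "service_account"`, `client_email` and `private_key` fields); `check_credentials_file` is a hypothetical helper, and passing it does not prove the key is still active. Other credential types that Application Default Credentials accepts (for example user credentials) would be reported as problems here.

```python
import json
import os


def check_credentials_file(path=None):
    """Return a list of problems with the key file; an empty list means it looks usable."""
    # Fall back to the variable that Application Default Credentials reads.
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set in this process"]
    if not os.path.isfile(path):
        return [f"credential file not found: {path}"]
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except json.JSONDecodeError as exc:
        return [f"credential file is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["credential file does not contain a JSON object"]
    problems = []
    if data.get("type") != "service_account":
        problems.append(f"expected type 'service_account', got {data.get('type')!r}")
    for field in ("client_email", "private_key"):
        if field not in data:
            problems.append(f"missing field: {field}")
    return problems


if __name__ == "__main__":
    for problem in check_credentials_file() or ["credential file looks OK"]:
        print(problem)
```

Running it from the same terminal session as the upload script shows whether that session actually sees the variable, which is the usual culprit when it was exported in a different session.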