I am new to AWS S3. I did a lot of googling around how to connect to S3 using Python and found that everyone is using Boto, so that's what I am using as the client. I used PowerShell to log in and create the .aws/credentials file. In that file I was able to get the aws_access_key_id, aws_secret_access_key AND aws_session_token needed to establish the session. I understand the session only lasts about 8 hours, so the next day when my Python script runs to connect to S3, the session has obviously expired. How can I overcome this and establish a new session daily? Below is my code.
import boto3
from io import StringIO

s3_client = boto3.client(
    "s3",
    aws_access_key_id=id_,
    aws_secret_access_key=secret,
    aws_session_token=token,
    region_name='r'
)

# Test it on a service (yours may be different)
s3 = boto3.resource(
    's3',
    aws_access_key_id=id_,
    aws_secret_access_key=secret,
    aws_session_token=token,
)
# Print out bucket names
for bucket in s3.buckets.all():
    print(bucket.name)

bucket = 'automated-reports'  # already created on S3
csv_buffer = StringIO()
all_active_scraper_counts_df.to_csv(csv_buffer, index=False)
put_response = s3_client.put_object(Bucket=bucket, Key="all_active_scrapers.csv", Body=csv_buffer.getvalue())
status = put_response.get("ResponseMetadata", {}).get("HTTPStatusCode")
if status == 200:
    print(f"Successful S3 put_object response. Status - {status}")
else:
    print(f"Unsuccessful S3 put_object response. Status - {status}")
You have three options:
Run a script that updates the credentials before you run your Python script. You can use AWS CLI sts assume-role to get a new set of credentials.
Add a try/catch statement inside your code to handle credentials expired error. Then, generate new credentials and re-initialize the S3 client.
Instead of using a Role, use a User (IAM Identities). User credentials can be valid forever, so you won't need to update the credentials in this case.
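Option 2 can be sketched as a small retry wrapper. This is a minimal sketch, not a drop-in: `make_client` is a hypothetical factory you would write yourself to fetch fresh credentials (e.g. via STS) and build a new client, and the error-code check reads the `response` attribute that botocore attaches to `ClientError`.

```python
def put_with_refresh(make_client, bucket, key, body, retries=1):
    """Call put_object, rebuilding the client once if the token has expired."""
    client = make_client()
    for attempt in range(retries + 1):
        try:
            return client.put_object(Bucket=bucket, Key=key, Body=body)
        except Exception as e:
            # botocore's ClientError carries the AWS error code here
            code = getattr(e, 'response', {}).get('Error', {}).get('Code', '')
            if code in ('ExpiredToken', 'ExpiredTokenException') and attempt < retries:
                client = make_client()  # fetch fresh credentials, rebuild the client
            else:
                raise
```

Your daily script would then call `put_with_refresh(make_client, bucket, "all_active_scrapers.csv", csv_buffer.getvalue())` instead of hitting `put_object` directly.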
Related
I am facing something similar to How to load file from custom hosted Minio s3 bucket into pandas using s3 URL format?
however, I already have an initialized s3 session (from boto3).
How can I get the credentials returned from it to feed these directly to pandas?
I.e. how can I extract the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the initialized boto3 s3 client?
You can use `session.get_credentials`:
import boto3
session = boto3.Session()
credentials = session.get_credentials()
AWS_ACCESS_KEY_ID = credentials.access_key
AWS_SECRET_ACCESS_KEY = credentials.secret_key
AWS_SESSION_TOKEN = credentials.token
If you only have access to boto client (like the S3 client), you can find the credentials hidden here:
client = boto3.client("s3")
client._request_signer._credentials.access_key
client._request_signer._credentials.secret_key
client._request_signer._credentials.token
If you don't want to handle credentials at all (I assume you're using SSO here), you can load the S3 object directly with pandas: `pd.read_csv(s3_client.get_object(Bucket='Bucket', Key='FileName').get('Body'))`
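If you do want to feed the extracted credentials straight to pandas, they map onto the `storage_options` dict that `pd.read_csv("s3://…")` hands to s3fs (this assumes s3fs is installed; the `Creds` namedtuple below is just a stand-in for the object `session.get_credentials()` returns):

```python
from collections import namedtuple

# Stand-in for the credentials object returned by boto3's session.get_credentials()
Creds = namedtuple('Creds', 'access_key secret_key token')

def to_storage_options(creds):
    """Map boto3 credential fields onto the keys s3fs expects from pandas."""
    return {'key': creds.access_key, 'secret': creds.secret_key, 'token': creds.token}

# Usage (hypothetical):
# pd.read_csv('s3://bucket/file.csv',
#             storage_options=to_storage_options(session.get_credentials()))
```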
I have working code that downloads files from one of my buckets in S3 and does some conversion work in Python. I do not embed the access and secret keys in the code; the keys are in my AWS CLI configuration.
import boto3
import botocore
import pandas
import pyarrow.parquet as pq

BUCKET_NAME = 'converted-parquet-bucket' # replace with your own bucket name
KEY = 'json-to-parquet/names.snappy.parquet' # replace with your own object key

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'names.snappy.parquet') # replace the object name
except botocore.exceptions.ClientError as e: # exception handling
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.") # printed if the object you are looking for does not exist
    else:
        raise

# Uncomment the two lines below to convert csv to parquet
# dataframe = pandas.read_csv('names.csv')
# dataframe.to_parquet('names.snappy.parquet', engine='auto', compression='snappy')

data = pq.read_pandas('names.snappy.parquet', columns=['Year of Birth', 'Gender', 'Ethnicity', "Child's First Name", 'Count', 'Rank']).to_pandas()

# print(data) # prints ALL the data in the parquet file
print(data.loc[data['Gender'] == 'MALE']) # prints only the rows matching the query
Could someone help me get this code working without having the access and secret keys embedded in the code or in the AWS CLI configuration?
If you are running your function locally, you need to have your credentials in your local credentials/config file to interact with AWS resources.
One alternative would be to run on AWS Lambda (if your function runs periodically, you can set that up with CloudWatch Events) and use Environment Variables or AWS Security Token Service (STS) to generate temporary credentials.
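As a sketch of the STS route: `sts.assume_role` returns a response whose `Credentials` block can be unpacked straight into `boto3.client` keyword arguments. The role ARN and session name in the usage comment are hypothetical placeholders.

```python
def credentials_from_assume_role(resp):
    """Unpack an STS assume_role response into boto3 client kwargs."""
    c = resp['Credentials']
    return {
        'aws_access_key_id': c['AccessKeyId'],
        'aws_secret_access_key': c['SecretAccessKey'],
        'aws_session_token': c['SessionToken'],
    }

# Usage (hypothetical role ARN):
# import boto3
# sts = boto3.client('sts')
# resp = sts.assume_role(RoleArn='arn:aws:iam::123456789012:role/RoleWithAccess',
#                        RoleSessionName='my-session')
# s3 = boto3.client('s3', **credentials_from_assume_role(resp))
```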
If you do not want to use secret/access key, you should use roles and policies, then. Here's the deal:
Define a role (ex. RoleWithAccess) and be sure that your user (defined in your credentials) can assume this role
Set a policy for RoleWithAccess, giving read/write access to your buckets
If you are executing it in your local machine, run the necessary commands (AWS CLI) to create a profile that makes you assume RoleWithAccess (ex. ProfileWithAccess)
Execute your script using a session that takes this profile as an argument, which means you need to replace:
s3 = boto3.resource('s3')
with
session = boto3.session.Session(profile_name='ProfileWithAccess')
s3 = session.resource('s3')
The upside of this approach is that if you are running it inside an EC2 instance, you can tie your instance to a specific role when you build it (ex. RoleWithAccess). In that case, you can completely ignore session, profile, all the AWS CLI hocus pocus, and just run s3 = boto3.resource('s3').
You can also use AWS Lambda, setting a role and a policy with read/write permission to your bucket.
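The profile from step 3 above can be captured in `~/.aws/config`. A minimal sketch, assuming the role is named RoleWithAccess and your base credentials live under the default profile (the account ID in the ARN is a placeholder):

```ini
[profile ProfileWithAccess]
role_arn = arn:aws:iam::123456789012:role/RoleWithAccess
source_profile = default
```

With this in place, boto3 assumes the role for you whenever you pass `profile_name='ProfileWithAccess'`.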
I am attempting to pull information about an S3 bucket using boto3. Here is the setup (bucketname is set to a valid S3 bucket name):
import boto3
s3 = boto3.client('s3')
result = s3.get_bucket_acl(Bucket=bucketname)
When I try, I get this error:
ClientError: An error occurred (InvalidRequest) when calling the
GetBucketAcl operation: S3 Transfer Acceleration is not configured on
this bucket
So, I attempt to enable transfer acceleration:
s3.put_bucket_accelerate_configuration(Bucket=bucketname, AccelerateConfiguration={'Status': 'Enabled'})
But, I get this error, which seems silly, since the line above is attempting to configure the bucket. I do have IAM rights (Allow: *) to modify the bucket too:
ClientError: An error occurred (InvalidRequest) when calling the
PutBucketAccelerateConfiguration operation: S3 Transfer Acceleration
is not configured on this bucket
Does anyone have any ideas on what I'm missing here?
Although I borrowed the code in the original question from the boto3 documentation, this construct is not complete and did not provide the connectivity that I expected:
s3 = boto3.client('s3')
What is really needed are fully-initialized session and client handlers, like this (assuming that the profile variable is set correctly in the ~/.aws/config file and bucketname is a valid S3 bucket):
from boto3 import Session
session = Session(profile_name=profile)
client = session.client('s3')
result = client.get_bucket_acl(Bucket=bucketname)
After doing this (duh), I was able to connect with or without transfer acceleration.
Thanks to the commenters, since those comments led me to the solution.
I'm using a container that simulates an S3 server running on http://127.0.0.1:4569 (with no authorization or credentials needed), and I'm trying to simply connect and print a list of all the bucket names using Python and boto3.
here's my docker-compose:
s3:
  image: andrewgaul/s3proxy
  environment:
    S3PROXY_AUTHORIZATION: none
  hostname: s3
  ports:
    - 4569:80
  volumes:
    - ./data/s3:/data
here's my code:
s3 = boto3.resource('s3', endpoint_url='http://127.0.0.1:4569')
for bucket in s3.buckets.all():
    print(bucket.name)
here's the error message that I received:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
I tried this solution => How do you use an HTTP/HTTPS proxy with boto3?
but still not working, I don't understand what I'm doing wrong
First, boto3 always tries to handshake with the S3 server using an AWS API key. Even though your simulation server doesn't need a password, you still need to specify credentials, either in your .aws/credentials file or inside your program, e.g.
[default]
aws_access_key_id = x
aws_secret_access_key = x
hardcoded dummy access key example
import boto3
session = boto3.session.Session(
    aws_access_key_id='x',
    aws_secret_access_key='x')
s3 = session.resource('s3', endpoint_url='http://127.0.0.1:4569')
Second, I don't know how reliable your "S3 simulation container" is or what kind of protocol it implements. To make life easier, I always suggest that anyone who wants to simulate S3 (for load tests or anything else) use fake-s3.
I am attempting to connect to an S3 bucket (a 3rd party is the owner, so I cannot access it through the AWS console). Using CyberDuck, I can connect and upload files no problem. However, I have tried several libraries to connect to the bucket, all of which return a 403 Forbidden. I am posting here in hopes that someone can spot what I am doing incorrectly.
import tinys3
import pysftp
import boto
import boto.s3.connection

def send_to_s3(file_name):
    csv = open("/tmp/" + file_name, 'rb')
    conn = tinys3.Connection("SECRET",
                             "SECRET",
                             tls=True,
                             endpoint="s3.amazonaws.com")
    conn.upload("MDA-Data-Ingest/input/" + file_name, csv, bucket="gsext-69qlakrroehhgr0f47bhffnwct")

def send_via_ftp(file_name):
    cnopts = pysftp.CnOpts()
    cnopts.hostkeys = None
    srv = pysftp.Connection(host="gsext-69qlakrroehhgr0f47bhffnwct.s3.amazonaws.com",
                            username="SECRET",
                            password="SECRET",
                            port=443,
                            cnopts=cnopts)
    with srv.cd('\MDA-Data-Ingest\input'):
        srv.put('\\tmp\\' + file_name)

    # Closes the connection
    srv.close()

def send_via_boto(file_name):
    access_key = 'SECRET'
    secret_key = 'SECRET'
    conn = boto.connect_s3(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        host='s3.amazonaws.com',
        # is_secure=False,  # uncomment if you are not using ssl
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
All of these functions return a 403 Forbidden, as shown below:
HTTPError: 403 Client Error: Forbidden for url: https://gsext-69qlakrroehhgr0f47bhffnwct.s3.amazonaws.com/MDA-Data-Ingest/input/accounts.csv
However, when I use CyberDuck I can connect just fine.
The easiest method would be to use the AWS Command-Line Interface (CLI), which uses boto3 to access AWS services.
For example:
aws s3 ls s3://bucket-name --region us-west-2
aws s3 cp s3://gsext-69qlakrroehhgr0f47bhffnwct/MDA-Data-Ingest/input/accounts.csv accounts.csv
You would first run aws configure to provide your credentials and a default region, but the syntax above allows you to specify the particular region where the bucket is located. (It is possible that your Python code failed because it called the wrong region.)