I have a process that is supposed to run forever and needs to update data in an S3 bucket on AWS. I am initializing the session using boto3:
import boto3
session = boto3.session.Session()
my_s3 = session.resource(
    "s3",
    region_name=my_region_name,
    aws_access_key_id=my_aws_access_key_id,
    aws_secret_access_key=my_aws_secret_access_key,
    aws_session_token=my_aws_session_token,
)
Since the process is supposed to run for days, I am wondering how I can make sure that the session is kept alive and working. Do I need to re-initialize the session sometimes?
Note: not sure if it is useful, but I actually have multiple threads, each using its own session.
Thanks!
There is no concept of a 'session'. The Session() is simply an in-memory object that contains information about how to connect to AWS.
It does not actually involve any calls to AWS until an action is performed (e.g. ListBuckets). Actions are RESTful and return results immediately. They do not keep a connection open.
A Session is not normally required. If you have stored your AWS credentials in a config file using the AWS CLI aws configure command, you can simply use:
import boto3
s3_resource = boto3.resource('s3')
The Session, however, is useful if you are using temporary credentials returned by an AssumeRole() command, rather than permanent credentials. In such a case, please note that credentials returned by AWS Security Token Service (STS) such as AssumeRole() have time limitations. This, however, is unrelated to the concept of a boto3 Session.
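As a rough sketch of what handling that time limit could look like in a long-running process: the helper below checks whether temporary credentials are close to expiring, and a second helper requests new ones through AssumeRole. The function names, the five-minute margin, and the session name are illustrative assumptions; the STS client is passed in by the caller (for example `boto3.client("sts")`).

```python
from datetime import datetime, timedelta, timezone


def expires_soon(expiration, margin=timedelta(minutes=5), now=None):
    """Return True when `expiration` is within `margin` of `now` (UTC by default)."""
    now = now or datetime.now(timezone.utc)
    return expiration - now <= margin


def assume_role_kwargs(sts_client, role_arn, session_name="long-running-worker"):
    """Call AssumeRole and return (credential kwargs, expiration datetime)."""
    creds = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )["Credentials"]
    kwargs = {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
    # STS reports the expiry as a timezone-aware datetime.
    return kwargs, creds["Expiration"]
```

A worker thread could call `expires_soon(expiration)` before each batch of uploads and, when it returns True, rebuild its resource with `session.resource("s3", **kwargs)` using freshly fetched values.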
I'm using the boto3 library to put objects in Amazon S3. I want to make a python service on my server, which is connected to the bucket in AWS, and whenever I send it a file path, it puts that in the bucket:
s3_resource = boto3.resource(
    's3',
    endpoint_url='...',
    aws_access_key_id='...',
    aws_secret_access_key='...'
)
bucket = s3_resource.Bucket('name')
For uploading, I send my requests to this method:
def upload(path):
    bucket.put_object(...)
The connection to the bucket should be persistent so that whenever I call upload method, it quickly puts the object in the bucket and does not need to connect to the bucket every time.
How can I enable long-lived connections on my s3_resource?
The SDK tries to be an abstraction from the underlying API calls.
Whenever you want to put an object into an S3 bucket, that results in an API call. API calls are sent over the network to AWS and that requires establishing a connection to an AWS server. This connection can be kept open for a longer time, so it doesn't need to be re-established every time you want to make an API call. This helps reduce the network overhead, since establishing connections is relatively costly.
From your perspective, these should be implementation details that you shouldn't have to worry about, since the SDK (boto3) takes care of them for you. There are some options to tweak how it handles things, but these are considered advanced options and you should know what you're doing ;-)
The lifecycle of the resources in boto3 is more or less independent of the underlying network connection. The way you will see this affect you is through higher latencies when there is no pre-existing connection that can be reused.
What you're looking for are the keep-alive options in boto3.
There are two levels at which these can be enabled:
TCP
You can set the tcp_keepalive option in the SDK config, which is set to false by default.
More detail on that can be found in the documentation.
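A minimal sketch of turning the option on, assuming a botocore version recent enough to accept `tcp_keepalive` in `Config`. The names `CONFIG_OPTIONS` and `make_s3_resource` are illustrative, and any endpoint or credential arguments are left to the caller.

```python
# Options forwarded to botocore's Config object.
CONFIG_OPTIONS = {
    "tcp_keepalive": True,  # off by default
}


def make_s3_resource(**resource_kwargs):
    """Build an S3 resource whose connections use TCP keep-alive."""
    # Imported here so the options above can be inspected without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.resource("s3", config=Config(**CONFIG_OPTIONS), **resource_kwargs)
```

Creating the resource once at startup and reusing it for every upload lets the SDK reuse pooled connections instead of reconnecting on each call.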
HTTP
For HTTP keep-alive, there is nothing you need to do explicitly; the underlying library handles it implicitly. A common optimization suggestion for aws-sdk-js is to tweak this setting, but the SDKs behave differently and that is not necessary in Python. There is a long discussion about this in a GitHub issue.
If you want to configure the setting explicitly, you can use the event system to do that, as this reply suggests:
import boto3

def set_connection_header(request, operation_name, **kwargs):
    # Ask for the HTTP connection to be kept open after the request
    request.headers['Connection'] = 'keep-alive'

ddb = boto3.client('dynamodb')
ddb.meta.events.register('request-created.dynamodb', set_connection_header)
I am looking for a way to perform the equivalent of the AWS CLI's command aws configure get varname [--profile profile-name] using boto3 in python. Does anyone know if this is possible without either:
Parsing the AWS config file myself
Somehow interacting with the AWS CLI itself from my python script
For more context, I am writing a python cli tool that will interact with AWS APIs using boto3. The python tool uses an AWS session token stored in a profile in the ~/.aws/credentials file. I am using the saml2aws cli to fetch AWS credentials from my company's identity provider, which writes the aws_access_key_id, aws_secret_access_key, aws_session_token, aws_security_token, x_principal_arn, and x_security_token_expires parameters to the ~/.aws/credentials file like so:
[saml]
aws_access_key_id = #REMOVED#
aws_secret_access_key = #REMOVED#
aws_session_token = #REMOVED#
aws_security_token = #REMOVED#
x_principal_arn = arn:aws:sts::000000000123:assumed-role/MyAssumedRole
x_security_token_expires = 2019-08-19T15:00:56-06:00
By the nature of my python cli tool, sometimes the tool will execute past the expiration time of the AWS session token, whose lifetime is enforced to be quite short by my company. I want the python cli tool to check the expiration time before it starts its critical task to verify that it has enough time to complete the task and, if not, alert the user to refresh their session token.
Using the AWS CLI, I can fetch the expiration time of the AWS session token from the ~/.aws/credentials file like this:
$ aws configure get x_security_token_expires --profile saml
2019-08-19T15:00:56-06:00
and I am curious if boto3 has a mechanism I was unable to find to do something similar.
As an alternate solution, given an already generated AWS session token, is it possible to fetch the expiration time of it? However, given the lack of answers on questions such as Ways to find out how soon the AWS session expires?, I would guess not.
Since the official AWS CLI is powered by botocore (the library that boto3 is also built on), I was able to dig into the source to find out how aws configure get is implemented. It's possible to read the profile configuration through the botocore Session object. Here is some code to get the config profile and value used in your example:
import botocore.session
# Create an empty botocore session directly
session = botocore.session.Session()
# Get config of desired profile. full_config is a standard python dictionary.
profiles_config = session.full_config.get("profiles", {})
saml_config = profiles_config.get("saml", {})
# Get config value. This will be None if the setting doesn't exist.
saml_security_token_expires = saml_config.get("x_security_token_expires")
I'm using code similar to the above as part of a transparent session cache. It checks for a profile's role_arn so I can identify a cached session to load if one exists and hasn't expired.
As far as the alternate question of knowing how long a given session has before expiring, you are correct in that there is currently no API call that can tell you this. Session expiration is only given when the session is created, either through STS get_session_token or assume_role API calls. You have to hold onto the expiration info yourself after that.
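Once the expiry string is in hand, the pre-flight check described in the question could look roughly like this. It is a sketch that assumes the value is an ISO 8601 timestamp with a UTC offset, as in the example profile; `time_remaining` and `has_enough_time` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def time_remaining(expires_at, now=None):
    """Return the time left before an expiry such as '2019-08-19T15:00:56-06:00'."""
    expiry = datetime.fromisoformat(expires_at)
    now = now or datetime.now(timezone.utc)
    return expiry - now


def has_enough_time(expires_at, needed, now=None):
    """Return True when at least `needed` (a timedelta) is left on the token."""
    return time_remaining(expires_at, now) >= needed
```

The tool could call `has_enough_time(saml_security_token_expires, timedelta(minutes=30))` before starting its critical task and ask the user to refresh the token when the result is False.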
I have a Python script that I want to run and text me a notification if a certain condition is met. I'm using Twilio, so I have a Twilio API token and I want to keep it secret. I have it successfully running locally, and now I'm working on getting it running on an EC2 instance.
Regarding AWS steps, I've created an IAM user with permissions, launched the EC2 instance (and saved the ssh keys), and created some parameters in the AWS SSM Parameter store. Then I ssh'd into the instance and installed boto3. When I try to use boto3 to grab a parameter, I'm unable to locate the credentials:
# test.py
import boto3
ssm = boto3.client('ssm', region_name='us-west-1')
secret = ssm.get_parameter(Name='/test/cli-parameter')
print(secret)
# running the file in the console
>> python test.py
...
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
I'm pretty sure this means it can't find the credentials that were created when I ran aws configure and it created the .aws/credentials file. I believe the reason for this is because I ran aws configure on my local machine, rather than running it while ssh'd into the instance. I did this to keep my AWS ID and secret key off of my EC2 instance, because I thought I'm supposed to keep that private and not put tokens/keys on my EC2 instance. I think I can solve the issue by running aws configure while ssh'd into my instance, but I want to understand what happens if there's a .aws/credentials file on my actual EC2 instance, and whether or not this is dangerous. I'm just not sure how this is all supposed to be structured, or what is a safe/correct way of running my script and accessing secret variables.
Any insight at all is helpful!
I suspect the answer you're looking for looks something like:
Create an IAM policy which allows access to the SSM parameter (why not use the SecretStore?)
Attach that IAM policy to a role.
Attach the role to your EC2 instance (instance profile)
boto3 will now automatically obtain temporary credentials (access key, secret key, and session token) from the instance metadata service when it needs to talk to the parameter store.
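For the first step, a policy document along these lines would grant read access to a single parameter. This is a sketch: the account ID is a placeholder, `ssm_read_policy` is a hypothetical helper, and the region and parameter name are taken from the question.

```python
import json


def ssm_read_policy(region, account_id, parameter_name):
    """Build an IAM policy document allowing GetParameter on one parameter."""
    # Parameter names start with "/", so the ARN is "...:parameter" + name.
    arn = f"arn:aws:ssm:{region}:{account_id}:parameter{parameter_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "ssm:GetParameter", "Resource": arn}
        ],
    }


# 123456789012 is a placeholder account ID.
policy = ssm_read_policy("us-west-1", "123456789012", "/test/cli-parameter")
print(json.dumps(policy, indent=2))
```

With the role attached to the instance, the test.py script from the question should work unchanged, with no .aws/credentials file on the instance.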
I have a webapp served with apache2 running python-flask in the backend. The app is hosted on Linode and heavily relies on their S3 Object Storage. I'm using boto3 to interact with the S3 storage. My issue is regarding the generate_presigned_post method when used in production. It returns the following structure:
{
    'url': 'https://eu-central-1.linodeobjects.com/my-s3-bucket',
    'fields': {
        'ACL': 'private',
        'key': 'foo.bar',
        'AWSAccessKeyId': 'FOOBAR',
        'policy': 'base64longhash...',
        'signature': 'foobar'
    }
}
Every time I use this method in the same python session, the policy key returns a longer value (about a 1.5x increase in length for every subsequent request). After a few requests the size of the policy gets really large (tens of MB) and the app breaks. If I restart the python service, the policy size gets reset.
After digging through the boto3 documentation and some threads on GitHub and here, I couldn't find anything that helped me reset the S3 connection without having to restart the whole python session. Restarting the apache2 service periodically is not a good approach, so my solution was to call generate_presigned_post from a standalone script using subprocess and parse the string output back to json before using it. This is not ideal, as I would rather not keep spawning external scripts from inside apache. The main functions I use follow below:
AWS_BUCKET_PARAMS = {'ACL': 'private'}
# connect to my linode's s3 bucket
def awsSign():
    return boto3.client('s3', aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY, endpoint_url=AWS_ENDPOINT_URL)

# generate presigned post object for uploading files
def awsPostForm(file_path):
    s3 = awsSign()
    return s3.generate_presigned_post(AWS_BUCKET, file_path, AWS_BUCKET_PARAMS, [AWS_BUCKET_PARAMS], 1800)

# generate post object from external script
def awsPostFormTerminal(file_path):
    from subprocess import Popen, PIPE
    cmd = [ 'python3', '-c', f'from utils import awsPostForm; print(awsPostForm("{file_path}"))' ]
    output = Popen( cmd, stdout=PIPE ).communicate()[0]
    return json.loads(output.decode('utf-8').replace('\n', '').replace("'", '"'))
The problem happens regardless of calling awsSign() one or many times for a list of files.
In short, I wish for a better way of retrieving subsequent post forms from generate_presigned_post in the same python session, without increasing the policy size on every new request. Is there a proper way to restart the boto3 connection, are there parameters that I missed when setting up the API calls, or is it something particular to Linode's S3 object storage service?
If anyone can point me in the right direction, I'd appreciate it!
Well, it turns out it was a rookie mistake; I got the hint from Linode's Q&A. So, answering my own question:
The AWS_BUCKET_PARAMS variable was being updated by reference after passing through generate_presigned_post. Copying the global variable inside the function's scope before sending the request solved the issue.
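A sketch of that fix, assuming the same AWS_BUCKET_PARAMS global as in the question. The function name `aws_post_form` and the choice to pass the client and bucket in as arguments are illustrative; the important part is that each call hands boto3 fresh copies instead of the shared dictionary.

```python
import copy

AWS_BUCKET_PARAMS = {"ACL": "private"}


def aws_post_form(s3_client, bucket, file_path, expires_in=1800):
    """Request a presigned POST without letting the call mutate the global."""
    # Separate copies, so additions made during signing stay local to this call.
    fields = copy.deepcopy(AWS_BUCKET_PARAMS)
    conditions = [copy.deepcopy(AWS_BUCKET_PARAMS)]
    return s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=file_path,
        Fields=fields,
        Conditions=conditions,
        ExpiresIn=expires_in,
    )
```

Because the global is never handed to the client directly, repeated calls in the same Python process start from the same small dictionary each time.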
I have my .aws/credentials set as
[default]
aws_access_key_id = [key]
aws_secret_access_key = [secret! Shh!]
and .aws/config
[profile elevated]
role_arn = [elevated role arn]
source_profile = default
mfa_serial = [my device arn]
With the credentials and config files set up like that, boto3 will automatically make the corresponding AssumeRole calls to AWS STS on your behalf. It will handle in-memory caching as well as refreshing credentials as needed
so that when I use something like
session = boto3.Session(profile_name = "elevated")
in a longer function, all I have to do is input my MFA code immediately after hitting "enter" and everything runs and credentials are managed independent of my input. This is great. I like that when I need to assume a role in another AWS account, boto3 handles all of the calls to sts and all I have to do is babysit.
What about when I don't want to assume another role, and instead want to act directly as my own user, with the permissions of the group to which my user is assigned? Is there a way to let boto3 automatically handle the credentials aspect of that?
I see that I can hard-code my aws_access_key_id and ..._secret_... into a function, but is there a way to force boto3 into handling the session tokens by just using the config and credentials files?
Method 2 in this answer looked promising but it also seems to rely on using the AWS CLI to input and store the keys/session token prior to running a Python script and still requires hard-coding variables into a CLI.
Is there a way to make this automatic by using the config and credentials files that doesn't require having to manually input AWS access keys and handle session tokens?
If you are running the application on EC2, you can attach roles via EC2 Roles.
In your code, you can obtain the credentials dynamically, based on whichever role is attached:
import boto3
session = boto3.Session()
# Resolve credentials through the default chain (on EC2, the attached role)
credentials = session.get_credentials().get_frozen_credentials()
access_key = credentials.access_key
secret_key = credentials.secret_key
token = credentials.token
You may also want to use botocore.credentials.RefreshableCredentials to refresh your token periodically.
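As a sketch of the refresh side only: RefreshableCredentials works with a callable that returns fresh credentials as a dictionary with the keys access_key, secret_key, token and expiry_time. The factory below builds such a callable on top of STS GetSessionToken. The name `make_refresher` and the one-hour duration are assumptions, MFA arguments (SerialNumber, TokenCode) are left out, and wiring the result into a session is not shown.

```python
def make_refresher(sts_client, duration_seconds=3600):
    """Return a zero-argument callable that fetches fresh session credentials."""

    def refresh():
        creds = sts_client.get_session_token(
            DurationSeconds=duration_seconds
        )["Credentials"]
        # Key names follow the metadata layout used by RefreshableCredentials.
        return {
            "access_key": creds["AccessKeyId"],
            "secret_key": creds["SecretAccessKey"],
            "token": creds["SessionToken"],
            "expiry_time": creds["Expiration"].isoformat(),
        }

    return refresh
```

The returned callable could then be supplied as `refresh_using` to `RefreshableCredentials.create_from_metadata`, with its first result as the initial `metadata`.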