I'm using Docker with AWS ECR repo. One of the steps they instruct you to do is to run "docker tag" to tag a built image with a tag that includes a "fully-ish qualified" location of where the image is going to be stored in ECR.
I was working on migrating a script I had to the Python API (instead of making shell calls to the docker client). I'm unable to find the equivalent of "docker tag" in the API docs at https://docker-py.readthedocs.io/en/stable/images.html.
Can somebody point me in the right direction?
For those of you using ECR/ECS in AWS, here is an example of how you go about this.
Amazon provides instructions like this in ECR to push your images:
aws ecr get-login --no-include-email --region us-west-2
docker build -t myproj .
docker tag myproj:latest XXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/myproj:latest
docker push XXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/myproj:latest
Here is the rough equivalent using the Docker Python API and Boto3 (AWS's Python library). It tags the image twice in ECR, so that I can track each image's version number while also tracking which one is the latest (so my ECS Task can, by default, always grab the most current image).
import json
import sys
from base64 import b64decode

import boto3
import docker

def ecrDemo(version_number):
    # The ECR Repository URI
    repo = "XXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/myproj"
    # The name of the [profile] stored in .aws/credentials
    profile = "sandbox"
    # The region your ECR repo is in
    region = "us-west-2"
    # How you want to tag your project locally
    local_tag = "myproj"

    # Set up a session
    session = boto3.Session(profile_name=profile, region_name=region)
    ecr = session.client('ecr')
    docker_api = docker.APIClient()

    print("Building image " + local_tag)
    for line in docker_api.build(path='.', tag=local_tag, stream=True,
                                 dockerfile='./Dockerfile.myproj'):
        process_docker_api_line(line)

    # Make the auth call and parse out the results
    auth = ecr.get_authorization_token()
    token = auth["authorizationData"][0]["authorizationToken"]
    username, password = b64decode(token).decode().split(':')
    endpoint = auth["authorizationData"][0]["proxyEndpoint"]

    # print("Make authentication call")
    # docker_api.login(username=username, password=password,
    #                  registry=endpoint, reauth=True)
    auth_config_payload = {'username': username, 'password': password}

    version_tag = repo + ':' + version_number
    latest_tag = repo + ':latest'

    print("Tagging version " + version_tag)
    if docker_api.tag(local_tag, version_tag) is False:
        raise RuntimeError("Tag appeared to fail: " + version_tag)

    print("Tagging latest " + latest_tag)
    if docker_api.tag(local_tag, latest_tag) is False:
        raise RuntimeError("Tag appeared to fail: " + latest_tag)

    print("Pushing to repo " + version_tag)
    for line in docker_api.push(version_tag, stream=True, auth_config=auth_config_payload):
        process_docker_api_line(line)

    print("Pushing to repo " + latest_tag)
    for line in docker_api.push(latest_tag, stream=True, auth_config=auth_config_payload):
        process_docker_api_line(line)

    print("Removing tagged deployment images")
    # You will still have the local_tag image if you need to troubleshoot
    docker_api.remove_image(version_tag, force=True)
    docker_api.remove_image(latest_tag, force=True)

def process_docker_api_line(payload):
    """ Process the output from the API stream, raise an exception if there is an error """
    # The stream yields bytes, and sometimes Docker sends two "{}\n" blocks together...
    if isinstance(payload, bytes):
        payload = payload.decode()
    for segment in payload.split('\n'):
        line = segment.strip()
        if not line:
            continue
        try:
            line_payload = json.loads(line)
        except ValueError as ex:
            print("Could not decipher payload from API: " + str(ex))
            continue
        if "errorDetail" in line_payload:
            error = line_payload["errorDetail"]
            sys.stderr.write(error["message"])
            raise RuntimeError("Error on build - code " + str(error.get("code")))
        elif "stream" in line_payload:
            sys.stdout.write(line_payload["stream"])
These are the steps you can use to build and tag an image.
import docker

tag = 'latest'  # or whatever you want

client = docker.from_env()

# identifier of the image on your system
dockername = "%s:%s" % (<name of the image on your system>, tag)
# the target identifier
target = "%s:%d/%s" % (<registry address>, <registry_port>, <id or name of the image>)

# the url is usually 'unix://var/run/docker.sock' but depends on your environment
cli = docker.APIClient(base_url="<the daemon's url or socket>")

# build the image
cli.build(path=..., tag=dockername, pull=..., buildargs=...)

# tag it
image = client.images.get(dockername)
image.tag(target, tag=tag)
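If the end goal is pushing to a registry such as ECR, a minimal follow-up sketch (continuing the snippet above; the credentials shown are placeholders, not real values) could look like this:
# push the tagged image; auth_config is only needed if you haven't already logged in
for line in client.images.push(target, tag=tag, stream=True, decode=True,
                               auth_config={'username': '<user>', 'password': '<token>'}):
    print(line)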
This answer helped me a great deal, thank you!
When I was developing my flow, I was confused by the terminology on the docker-py page, which I think would benefit by more examples.
I was not sure during development if I was correctly building, or if I was having issues with authentication or authorization.
I needed to carefully monitor my build results using the Docker CLI, and capture and analyze the output from the different build, tag and push functions.
Another caveat with getting output from these functions is a warning that is clearly stated for the docker-py pull() function but not for the others: if you ask a function to provide a generator for the operation's output, you must consume that generator. I was able to get my flow working with a debug level of verbosity.
Unfortunately, when I toggled off the verbosity in my code and did not consume the generators for build() and push() (tag() only returns a boolean), my flow only appeared to work: it was not throwing errors, but it also was not building or pushing the code! It is better to either not turn on the streaming output if you're not in debug mode, or to leave it on and use deque() to consume the output without processing it.
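For example, here is a minimal sketch of draining a push() generator with deque() when you don't care about the output (the repository name is just a placeholder):
from collections import deque

import docker

api = docker.APIClient()
# maxlen=0 makes deque discard every item, so the generator is fully consumed
# (and the push actually happens) without keeping any of its output in memory
deque(api.push("myregistry.mydomain.com/myname/myproj", tag="latest", stream=True), maxlen=0)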
To summarize the differences in how tags are used:
build() takes a 'local tag', which is just the name of the build, e.g. 'myproj'
tag() applies a 'version tag' to a 'local tag' you just built with build(); the version tag will include the registry and a version label, e.g. myregistry.mydomain.com/myname/myproj:latest
push() will push the image in a 'version tag' to the registry designated in the version tag. So in this case, the image you tagged as myregistry.mydomain.com/myname/myproj:latest will be pushed to the registry myregistry.mydomain.com.
I'm a code newbie so please be kind :) I've got my Google credentials set and I've checked the documentation for the Python library for Artifact Registry (v1), but I'm clearly doing something daft, as I'm receiving this response back - google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.
Here's my code:
from google.cloud import artifactregistry_v1

# Using Application Default Credentials is easiest
# export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
client = artifactregistry_v1.ArtifactRegistryClient()

# Best obtained from the environment
project = "**REDACTED**"
location = "europe-west2"
repository = "test"

parent = f"projects/{project}/locations/{location}/repositories/{repository}"

request = artifactregistry_v1.ListTagsRequest(parent=parent)
page_result = client.list_tags(request=request)

for response in page_result:
    print(response)
Here are the official docs; I can't work out what I've done wrong: https://cloud.google.com/python/docs/reference/artifactregistry/latest/google.cloud.artifactregistry_v1.services.artifact_registry.ArtifactRegistryClient#google_cloud_artifactregistry_v1_services_artifact_registry_ArtifactRegistryClient_list_tags
EDIT
I've just seen in the Google docs for the class ListTagsRequest (here) that parent expects a string (which is what I've done), but I noticed PyCharm highlights it and tells me it expected type 'Dict' and got 'str' instead...
I was finally able to find the correct code:
from google.cloud import artifactregistry_v1

def get_package_tags_from_artifact_registry():
    # Using Application Default Credentials is easiest
    # export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
    client = artifactregistry_v1.ArtifactRegistryClient()

    # Best obtained from the environment
    project = "{project_id}"
    location = "{location}"
    repository = "{repository}"
    package_name = "{package_name}"

    parent = f"projects/{project}/locations/{location}/repositories/{repository}/packages/{package_name}"

    request = artifactregistry_v1.ListTagsRequest(
        parent=parent
    )
    page_result = client.list_tags(
        request=request
    )

    for response in page_result:
        print(response)

    return page_result

if __name__ == '__main__':
    result = get_package_tags_from_artifact_registry()
I checked this doc and tested with a parent path that includes the package name.
I'm trying to create a report where, if an IP has made more than 1000 requests to the server in one minute, it is treated as a DoS attack. AWS WAF is writing the logs to S3, and using Lambda we will check if a certain IP crosses the threshold.
import urllib
import json
import gzip

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Main configuration variables
    requests_limit = 100

    # Parsing the required information out of the event structure
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    file_name = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    response = s3.get_object(Bucket=bucket_name, Key=file_name)
    target_bucket = 'blocketrequest'
    FILE = 'Filtered' + file_name
    text = response["Body"].read().decode()
    e = text.split("\n")

    # Parsing IPs out of the access log file
    suspicious_ips = {}
    for each in e:
        try:
            loaded_data = json.loads(each)
            ip = loaded_data['httpRequest']['clientIp']
            if ip in suspicious_ips.keys():
                suspicious_ips[ip] += 1
            else:
                suspicious_ips[ip] = 1
        except Exception as err:
            print(f"Problem with line:{str(err)}")
            break

    # Filtering IPs that exceeded the limit and preparing inserts to WAF
    updates_list = []
    for ip in suspicious_ips.keys():
        if suspicious_ips[ip] < requests_limit:
            continue
        updates_list.append({
            'Action': 'INSERT',
            'IPSetDescriptor': {
                'Type': 'IPV4',
                'Value': "%s/32" % ip
            }
        })

    # Exit if there are no malicious IPs
    if updates_list == []:
        return

    # put_object needs bytes or a string, so serialize the list as JSON
    s3.put_object(Body=json.dumps(updates_list), Bucket=target_bucket, Key=FILE)
    print('transferred')
In this code I'm getting an indentation error on line 44, can someone help?
You can do syntax checks a lot of ways. I love using Visual Studio Code with the Python extension.
You can also ask Python to compile your code without running it to check the file.
Python 3 shows no error with your file:
$ python3 -m py_compile 61327893.py
$
I assume you're not using 2.7, but here is the same command:
$ python2.7 -m py_compile 61327893.py
File "61327893.py", line 35
print(f"TotalRecords:{len(e)}")
^
SyntaxError: invalid syntax
Another great non-Microsoft option is this online PEP 8 checker:
http://pep8online.com/
Can you post the stacktrace you are seeing? The error might be in imported code.
This is probably counter-productive, but have you looked into Amazon Athena? It allows you to query the logs easily in SQL. I think there's an Athena SDK for Python as well.
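For reference, a rough sketch of kicking off such a query from Python with boto3's Athena client; the database, table, column names and output bucket here are assumptions you would adapt to your own WAF log setup:
import boto3

athena = boto3.client('athena')

# Count requests per client IP in the (assumed) WAF logs table
response = athena.start_query_execution(
    QueryString="""
        SELECT httprequest.clientip AS ip, COUNT(*) AS requests
        FROM waf_logs
        GROUP BY httprequest.clientip
        HAVING COUNT(*) > 1000
    """,
    QueryExecutionContext={'Database': 'waf_logs_db'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results-bucket/'},
)
print(response['QueryExecutionId'])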
You can use the Rate-based rule provided by AWS WAF.
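If you go that route, here is a rough boto3 sketch of a web ACL with a rate-based rule; the names, the REGIONAL scope and the 1000-request limit are assumptions to adapt:
import boto3

wafv2 = boto3.client('wafv2')

# Block any single IP that exceeds the request limit within the WAF rate window
wafv2.create_web_acl(
    Name='dos-protection',           # placeholder name
    Scope='REGIONAL',                # or 'CLOUDFRONT' for CloudFront distributions
    DefaultAction={'Allow': {}},
    Rules=[{
        'Name': 'rate-limit-per-ip',
        'Priority': 0,
        'Statement': {
            'RateBasedStatement': {
                'Limit': 1000,
                'AggregateKeyType': 'IP',
            }
        },
        'Action': {'Block': {}},
        'VisibilityConfig': {
            'SampledRequestsEnabled': True,
            'CloudWatchMetricsEnabled': True,
            'MetricName': 'RateLimitPerIp',
        },
    }],
    VisibilityConfig={
        'SampledRequestsEnabled': True,
        'CloudWatchMetricsEnabled': True,
        'MetricName': 'DosProtectionAcl',
    },
)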
[Very new to AWS]
Hi,
I am trying to move my EBS volume snapshot copies across regions. I have been trying to use Boto3 to move the snapshots. My objective is to move the latest snapshot from us-east-2 region to us-east-1 region automatically on a daily basis.
I have used the aws configure command in the terminal to set up my security credentials and set the region to us-east-2.
I am using pandas to acquire the most recent snapshot-id using this code:
import boto3
import pandas as pd
from pandas.io.json.normalize import nested_to_record
import boto.ec2
client = boto3.client('ec2')
aws_api_response = client.describe_snapshots(OwnerIds=['self'])
flat = nested_to_record(aws_api_response)
df = pd.DataFrame.from_dict(flat)
df= df['Snapshots'].apply(pd.Series)
insert_snap = df.loc[df['StartTime'] == max(df['StartTime']),'SnapshotId']
insert_snap = insert_snap.reset_index(drop=True)
insert_snap returns a snapshot id something like snap-1234ABCD
I am trying to use this code to move the snapshot from us-east-2 to us-east-1:
client.copy_snapshot(SourceSnapshotId='%s' %insert_snap[0],
SourceRegion='us-east-2',
DestinationRegion='us-east-1',
Description='This is my copied snapshot.')
The snapshot is copying in the same region using the above line.
I have also tried switching regions through the aws configure command in the terminal, with the same issue occurring where the snapshot is copied within the same region.
There is a bug in Boto3 that is skipping the destination parameter in the copy_snapshot() code. Information found here: https://github.com/boto/boto3/issues/886
I have tried inserting this code into the Lambda console but keep getting the error "errorMessage": "Unable to import module 'lambda_function'":
region = 'us-east-2'
ec = boto3.client('ec2',region_name=region)
def lambda_handler(event, context):
    response = ec.copy_snapshot(SourceSnapshotId='snap-xxx',
                                SourceRegion=region,
                                DestinationRegion='us-east-1',
                                Description='copied from Ohio')
    print(response)
I am out of options; what can I do to automate the transfer of snapshots in AWS?
As per CopySnapshot - Amazon Elastic Compute Cloud:
CopySnapshot sends the snapshot copy to the regional endpoint that you send the HTTP request to, such as ec2.us-east-1.amazonaws.com (in the AWS CLI, this is specified with the --region parameter or the default region in your AWS configuration file).
Therefore, you should send the copy_snapshot() command to us-east-1, with the Source Region set to us-east-2.
If you wish to move the most recent snapshot, you could run:
import boto3
SOURCE_REGION = 'us-east-2'
DESTINATION_REGION = 'us-east-1'
# Connect to EC2 in Source region
source_client = boto3.client('ec2', region_name=SOURCE_REGION)
# Get a list of all snapshots, then sort them
snapshots = source_client.describe_snapshots(OwnerIds=['self'])
snapshots_sorted = sorted([(s['SnapshotId'], s['StartTime']) for s in snapshots['Snapshots']], key=lambda k: k[1])
latest_snapshot = snapshots_sorted[-1][0]
print ('Latest Snapshot ID is ' + latest_snapshot)
# Connect to EC2 in Destination region
destination_client = boto3.client('ec2', region_name=DESTINATION_REGION)
# Copy the snapshot
response = destination_client.copy_snapshot(
SourceSnapshotId=latest_snapshot,
SourceRegion=SOURCE_REGION,
Description='This is my copied snapshot'
)
print ('Copied Snapshot ID is ' + response['SnapshotId'])
I want to use AWS Spot instances to train Neural Networks. To prevent loss of the model when the spot instance is terminated, I plan to create a snapshot of the EBS volume, make a new volume and attach it to a reserved instance. How can I mount, or make the EBS volume available using python & boto3.
These are the steps used to make the volume available on Linux, but I want to automate the process so that I don't need to SSH into the instance every time. Here is the code I use to attach the volume -
import boto3
ec2 = boto3.resource('ec2')
spot = ec2.Instance('i-9a8f5082')
res = ec2.Instance('i-86e65a13')
snapshot = ec2.create_snapshot(VolumeId="vol-5315f7db", Description="testing spot instances")
volume = ec2.create_volume(SnapshotId=snapshot.id, AvailabilityZone='us-west-2a')
res.attach_volume(VolumeId="vol-5315f7db", Device='/dev/sdy')
snapshot.delete()
You need to run the mount command on the instance. There are two ways to do it. One is sending the command over an SSH connection, as @mootmoot wrote. The other is sending the command via the AWS SSM service, as @Mark B wrote. Here is a detailed SSM solution sample; you can ignore the parts you don't need:
Send bash command to instances using AWS SSM:
# Amazon EC2 Systems Manager requires:
# 1. An IAM role for EC2 instances that will process commands. There should be a Systems Manager
#    role, and the instance should use this role (set while creating the instance).
# 2. A separate role for users executing commands. The AWS IAM user whose access and secret keys
#    you use should have SSM permission (e.g. AmazonSSMFullAccess).
# http://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-configuring-access-policies.html
import time

import boto3

def execute_commands_on_linux_instances(commands, instance_ids):
    client = boto3.client('ssm', **conn_args)  # Needs your credentials here
    all_ssm_enabled_instances, ssm_enabled_instances, not_worked_instances = [], [], []
    not_worked_instances = instance_ids.copy()
    all_ssm_enabled_instances = list()
    outputs = list()
    not_executed = list()

    # Select only the instances that have an active SSM agent.
    if len(client.describe_instance_information()['InstanceInformationList']) > 0:
        resp = client.describe_instance_information(MaxResults=20)['InstanceInformationList']
        for ins in resp:
            all_ssm_enabled_instances.append(ins['InstanceId'])
        ssm_enabled_instances = list(set(all_ssm_enabled_instances).intersection(instance_ids))
        not_worked_instances = list(set(instance_ids).difference(all_ssm_enabled_instances))

        # Now, send the command!
        resp = client.send_command(
            DocumentName="AWS-RunShellScript",
            Parameters={'commands': [commands]},
            InstanceIds=ssm_enabled_instances,
        )

        # Get the command id generated by send_command
        com_id = resp['Command']['CommandId']

        # Wait until the command status is out of Pending and InProgress
        while True:
            list_comm = client.list_commands(CommandId=com_id)
            if list_comm['Commands'][0]['Status'] in ('Pending', 'InProgress'):
                time.sleep(2)  # poll politely instead of busy-waiting
                continue
            else:
                # Commands on all instances were executed
                break

        # Get the responses the instances gave to this command (stdout and stderr).
        # Although the command may reach the instance, if it couldn't be executed by the
        # instance (response code -1) it will be ignored.
        for i in ssm_enabled_instances:
            resp2 = client.get_command_invocation(CommandId=com_id, InstanceId=i)
            if resp2['ResponseCode'] == -1:
                not_executed.append(i)
            else:
                outputs.append({'ins_id': i, 'stdout': resp2['StandardOutputContent'],
                                'stderr': resp2['StandardErrorContent']})

        # Remove the instances that couldn't execute the command and add them to not_worked_instances
        ssm_enabled_instances = list(set(ssm_enabled_instances).difference(not_executed))
        not_worked_instances.extend(not_executed)

        return ssm_enabled_instances, not_worked_instances, outputs
    else:
        print("There is no available instance with a working SSM agent!")
        return ssm_enabled_instances, not_worked_instances, outputs
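A hypothetical call for the mount use case might look like this (the device, mount point, and instance ID are placeholders):
# Mount the freshly attached volume on the instance via SSM (all names are placeholders)
worked, failed, outputs = execute_commands_on_linux_instances(
    'sudo mkdir -p /data && sudo mount /dev/xvdy /data',
    ['i-0123456789abcdef0'],
)
for out in outputs:
    print(out['ins_id'], out['stdout'], out['stderr'])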
Create the instances with the required IAM instance profile, which has the required role with the required policy attached. As a result of this instance creation, the instances have running SSM agents:
def create_ec2_instance(node_type):
    # define userdata to be run at instance launch
    userdata = """#cloud-config
runcmd:
 - cd /tmp
 - sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
"""

    ec2_r = boto3.resource('ec2', **conn_args)
    rolename = "amazonec2ssmrole"
    i_pro_name = "ins_pro_for_ssm"

    # Create an IAM instance profile and add the required role to it.
    # Create the role and attach a policy to it if they don't exist.
    # Instances will have this role to establish the SSM (EC2 Systems Manager) connection.
    iam = boto3.resource('iam', **conn_args)

    try:
        response = iam.meta.client.get_instance_profile(InstanceProfileName=i_pro_name)
    except:
        iam.create_instance_profile(InstanceProfileName=i_pro_name)
    try:
        response = iam.meta.client.get_role(RoleName=rolename)
    except:
        iam.create_role(
            AssumeRolePolicyDocument='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":["ec2.amazonaws.com"]},"Action":["sts:AssumeRole"]}]}',
            RoleName=rolename)

    role = iam.Role(rolename)
    role.attach_policy(PolicyArn='arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM')
    iam.meta.client.add_role_to_instance_profile(InstanceProfileName=i_pro_name, RoleName=rolename)

    iam_ins_profile = {'Name': i_pro_name}

    if node_type == "Medium":
        instance = ec2_r.create_instances(
            ImageId='ami-aa5ebdd2',
            MinCount=1,
            MaxCount=1,
            UserData=userdata,
            InstanceType='t2.medium',
            KeyName=key_pair_name,
            IamInstanceProfile=iam_ins_profile,
            BlockDeviceMappings=[{"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 20}}])
    elif node_type == "Micro":
        instance = ec2_r.create_instances(
            ImageId='ami-aa5ebdd2',
            MinCount=1,
            MaxCount=1,
            UserData=userdata,
            InstanceType='t2.micro',
            KeyName=key_pair_name,
            IamInstanceProfile=iam_ins_profile,
            BlockDeviceMappings=[{"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 10}}])
    else:
        print("Node Type Error")
        return -1

    # Wait for the instance state, default --> one wait is 15 seconds, 40 attempts
    print('Waiting for instance {0} to switch to running state'.format(instance[0].id))
    waiter = ec2_r.meta.client.get_waiter('instance_running')
    waiter.wait(InstanceIds=[instance[0].id])
    instance[0].reload()
    print('Instance is running, public IP: {0}'.format(instance[0].public_ip_address))

    return instance[0].id
Don't forget to give SSM permission (e.g. AmazonSSMFullAccess) to the AWS IAM user whose access and secret keys you are using.
By the way, conn_args can be defined as follows:
conn_args = {
'aws_access_key_id': Your_Access_Key,
'aws_secret_access_key': Your_Secret_Key,
'region_name': 'us-west-2'
}
You have to perform those steps in the operating system. You can't perform those steps via the AWS API (Boto3). Your best bet is to script those steps and then kick off the script somehow via Boto3, possibly using the AWS SSM service.
What's wrong with sending and executing an SSH script remotely? Assuming you are using Ubuntu, i.e.
ssh -i your.pem ubuntu@ec2_name_or_ip 'sudo bash -s' < mount_script.sh
If you attach tags to those resources, you can later use boto3 to look the resources up by tag name instead of being tied to specific static IDs, as in the sketch below.
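For example, a minimal sketch of looking volumes up by tag rather than by a hard-coded volume ID (the tag key and value are assumptions):
import boto3

ec2 = boto3.resource('ec2')

# Find volumes tagged Purpose=training instead of hard-coding an ID like "vol-5315f7db"
for volume in ec2.volumes.filter(Filters=[{'Name': 'tag:Purpose', 'Values': ['training']}]):
    print(volume.id, volume.state)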
I have created a S3 bucket, uploaded a video, created a streaming distribution in CloudFront. Tested it with a static HTML player and it works. I have created a keypair through the account settings. I have the private key file sitting on my desktop at the moment. That's where I am.
My aim is to get to a point where my Django/Python site creates secure URLs and people can't access the videos unless they've come from one of my pages. The problem is I'm allergic to the way Amazon have laid things out and I'm just getting more and more confused.
I realise this isn't going to be the best question on StackOverflow, but I'm certain I can't be the only fool out here who can't make heads or tails out of how to set up a secure CloudFront/S3 situation. I would really appreciate your help and am willing (once two days have passed) to give a 500pt bounty to the best answer.
I have several questions that, once answered, should fit into one explanation of how to accomplish what I'm after:
In the documentation (there's an example in the next point) there's lots of XML lying around telling me I need to POST things to various places. Is there an online console for doing this? Or do I literally have to force this up via cURL (et al)?
How do I create a Origin Access Identity for CloudFront and bind it to my distribution? I've read this document but, per the first point, don't know what to do with it. How does my keypair fit into this?
Once that's done, how do I limit the S3 bucket to only allow people to download things through that identity? If this is another XML jobby rather than clicking around the web UI, please tell me where and how I'm supposed to get this into my account.
In Python, what's the easiest way of generating an expiring URL for a file. I have boto installed but I don't see how to get a file from a streaming distribution.
Are there any applications or scripts that can take the difficulty out of setting this garb up? I use Ubuntu (Linux) but I have XP in a VM if it's Windows-only. I've already looked at CloudBerry S3 Explorer Pro - but it makes about as much sense as the online UI.
You're right, it takes a lot of API work to get this set up. I hope they get it in the AWS Console soon!
UPDATE: I have submitted this code to boto - as of boto v2.1 (released 2011-10-27) this gets much easier. For boto < 2.1, use the instructions here. For boto 2.1 or greater, get the updated instructions on my blog: http://www.secretmike.com/2011/10/aws-cloudfront-secure-streaming.html Once boto v2.1 gets packaged by more distros I'll update the answer here.
To accomplish what you want you need to perform the following steps which I will detail below:
Create your s3 bucket and upload some objects (you've already done this)
Create a Cloudfront "Origin Access Identity" (basically an AWS account to allow cloudfront to access your s3 bucket)
Modify the ACLs on your objects so that only your Cloudfront Origin Access Identity is allowed to read them (this prevents people from bypassing Cloudfront and going direct to s3)
Create a cloudfront distribution with basic URLs and one which requires signed URLs
Test that you can download objects from basic cloudfront distribution but not from s3 or the signed cloudfront distribution
Create a key pair for signing URLs
Generate some URLs using Python
Test that the signed URLs work
1 - Create Bucket and upload object
The easiest way to do this is through the AWS Console but for completeness I'll show how using boto. Boto code is shown here:
import boto
#credentials stored in environment AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
s3 = boto.connect_s3()
#bucket name MUST follow dns guidelines
new_bucket_name = "stream.example.com"
bucket = s3.create_bucket(new_bucket_name)
object_name = "video.mp4"
key = bucket.new_key(object_name)
key.set_contents_from_filename(object_name)
2 - Create a Cloudfront "Origin Access Identity"
For now, this step can only be performed using the API. Boto code is here:
import boto
#credentials stored in environment AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
cf = boto.connect_cloudfront()
oai = cf.create_origin_access_identity(comment='New identity for secure videos')
#We need the following two values for later steps:
print("Origin Access Identity ID: %s" % oai.id)
print("Origin Access Identity S3CanonicalUserId: %s" % oai.s3_user_id)
3 - Modify the ACLs on your objects
Now that we've got our special S3 user account (the S3CanonicalUserId we created above) we need to give it access to our s3 objects. We can do this easily using the AWS Console by opening the object's (not the bucket's!) Permissions tab, clicking the "Add more permissions" button, and pasting the very long S3CanonicalUserId we got above into the "Grantee" field of a new permission. Make sure you give the new permission "Open/Download" rights.
You can also do this in code using the following boto script:
import boto
#credentials stored in environment AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
s3 = boto.connect_s3()
bucket_name = "stream.example.com"
bucket = s3.get_bucket(bucket_name)
object_name = "video.mp4"
key = bucket.get_key(object_name)
#Now add read permission to our new s3 account
s3_canonical_user_id = "<your S3CanonicalUserID from above>"
key.add_user_grant("READ", s3_canonical_user_id)
4 - Create a cloudfront distribution
Note that custom origins and private distributions are not fully supported in boto until version 2.0, which has not been formally released at the time of writing. The code below pulls out some code from the boto 2.0 branch and hacks it together to get it going, but it's not pretty. The 2.0 branch handles this much more elegantly - definitely use that if possible!
import boto
from boto.cloudfront.distribution import DistributionConfig
from boto.cloudfront.exception import CloudFrontServerError
import re
def get_domain_from_xml(xml):
    results = re.findall("<DomainName>([^<]+)</DomainName>", xml)
    return results[0]

#custom class to hack this until boto v2.0 is released
class HackedStreamingDistributionConfig(DistributionConfig):

    def __init__(self, connection=None, origin='', enabled=False,
                 caller_reference='', cnames=None, comment='',
                 trusted_signers=None):
        DistributionConfig.__init__(self, connection=connection,
                                    origin=origin, enabled=enabled,
                                    caller_reference=caller_reference,
                                    cnames=cnames, comment=comment,
                                    trusted_signers=trusted_signers)

    #override the to_xml() function
    def to_xml(self):
        s = '<?xml version="1.0" encoding="UTF-8"?>\n'
        s += '<StreamingDistributionConfig xmlns="http://cloudfront.amazonaws.com/doc/2010-07-15/">\n'
        s += ' <S3Origin>\n'
        s += ' <DNSName>%s</DNSName>\n' % self.origin
        if self.origin_access_identity:
            val = self.origin_access_identity
            s += ' <OriginAccessIdentity>origin-access-identity/cloudfront/%s</OriginAccessIdentity>\n' % val
        s += ' </S3Origin>\n'
        s += ' <CallerReference>%s</CallerReference>\n' % self.caller_reference
        for cname in self.cnames:
            s += ' <CNAME>%s</CNAME>\n' % cname
        if self.comment:
            s += ' <Comment>%s</Comment>\n' % self.comment
        s += ' <Enabled>'
        if self.enabled:
            s += 'true'
        else:
            s += 'false'
        s += '</Enabled>\n'
        if self.trusted_signers:
            s += '<TrustedSigners>\n'
            for signer in self.trusted_signers:
                if signer == 'Self':
                    s += ' <Self/>\n'
                else:
                    s += ' <AwsAccountNumber>%s</AwsAccountNumber>\n' % signer
            s += '</TrustedSigners>\n'
        if self.logging:
            s += '<Logging>\n'
            s += ' <Bucket>%s</Bucket>\n' % self.logging.bucket
            s += ' <Prefix>%s</Prefix>\n' % self.logging.prefix
            s += '</Logging>\n'
        s += '</StreamingDistributionConfig>\n'
        return s

    def create(self):
        response = self.connection.make_request('POST',
                                                '/%s/%s' % ("2010-11-01", "streaming-distribution"),
                                                {'Content-Type': 'text/xml'},
                                                data=self.to_xml())
        body = response.read()
        if response.status == 201:
            return body
        else:
            raise CloudFrontServerError(response.status, response.reason, body)
cf = boto.connect_cloudfront()
s3_dns_name = "stream.example.com.s3.amazonaws.com"
comment = "example streaming distribution"
oai = "<OAI ID from step 2 above like E23KRHS6GDUF5L>"
#Create a distribution that does NOT need signed URLS
hsd = HackedStreamingDistributionConfig(connection=cf, origin=s3_dns_name, comment=comment, enabled=True)
hsd.origin_access_identity = oai
basic_dist = hsd.create()
print("Distribution with basic URLs: %s" % get_domain_from_xml(basic_dist))
#Create a distribution that DOES need signed URLS
hsd = HackedStreamingDistributionConfig(connection=cf, origin=s3_dns_name, comment=comment, enabled=True)
hsd.origin_access_identity = oai
#Add some required signers (Self means your own account)
hsd.trusted_signers = ['Self']
signed_dist = hsd.create()
print("Distribution with signed URLs: %s" % get_domain_from_xml(signed_dist))
5 - Test that you can download objects from cloudfront but not from s3
You should now be able to verify:
stream.example.com.s3.amazonaws.com/video.mp4 - should give AccessDenied
signed_distribution.cloudfront.net/video.mp4 - should give MissingKey (because the URL is not signed)
basic_distribution.cloudfront.net/video.mp4 - should work fine
The tests will have to be adjusted to work with your stream player, but the basic idea is that only the basic cloudfront url should work.
6 - Create a keypair for CloudFront
I think the only way to do this is through Amazon's web site. Go into your AWS "Account" page and click on the "Security Credentials" link. Click on the "Key Pairs" tab then click "Create a New Key Pair". This will generate a new key pair for you and automatically download a private key file (pk-xxxxxxxxx.pem). Keep the key file safe and private. Also note down the "Key Pair ID" from amazon as we will need it in the next step.
7 - Generate some URLs in Python
As of boto version 2.0 there does not seem to be any support for generating signed CloudFront URLs. Python does not include RSA encryption routines in the standard library so we will have to use an additional library. I've used M2Crypto in this example.
For a non-streaming distribution, you must use the full cloudfront URL as the resource; however, for streaming we only use the object name of the video file. See the code below for a full example of generating a URL which only lasts for 5 minutes.
This code is based loosely on the PHP example code provided by Amazon in the CloudFront documentation.
from M2Crypto import EVP
import base64
import time
def aws_url_base64_encode(msg):
    msg_base64 = base64.b64encode(msg)
    msg_base64 = msg_base64.replace('+', '-')
    msg_base64 = msg_base64.replace('=', '_')
    msg_base64 = msg_base64.replace('/', '~')
    return msg_base64

def sign_string(message, priv_key_string):
    key = EVP.load_key_string(priv_key_string)
    key.reset_context(md='sha1')
    key.sign_init()
    key.sign_update(str(message))
    signature = key.sign_final()
    return signature

def create_url(url, encoded_signature, key_pair_id, expires):
    signed_url = "%(url)s?Expires=%(expires)s&Signature=%(encoded_signature)s&Key-Pair-Id=%(key_pair_id)s" % {
        'url': url,
        'expires': expires,
        'encoded_signature': encoded_signature,
        'key_pair_id': key_pair_id,
    }
    return signed_url

def get_canned_policy_url(url, priv_key_string, key_pair_id, expires):
    #we manually construct this policy string to ensure formatting matches signature
    canned_policy = '{"Statement":[{"Resource":"%(url)s","Condition":{"DateLessThan":{"AWS:EpochTime":%(expires)s}}}]}' % {'url': url, 'expires': expires}
    #now base64 encode it (must be URL safe)
    encoded_policy = aws_url_base64_encode(canned_policy)
    #sign the non-encoded policy
    signature = sign_string(canned_policy, priv_key_string)
    #now base64 encode the signature (URL safe as well)
    encoded_signature = aws_url_base64_encode(signature)
    #combine these into a full url
    signed_url = create_url(url, encoded_signature, key_pair_id, expires)
    return signed_url

def encode_query_param(resource):
    enc = resource
    enc = enc.replace('?', '%3F')
    enc = enc.replace('=', '%3D')
    enc = enc.replace('&', '%26')
    return enc
#Set parameters for URL
key_pair_id = "APKAIAZCZRKVIO4BQ" #from the AWS accounts page
priv_key_file = "cloudfront-pk.pem" #your private keypair file
resource = 'video.mp4' #your resource (just object name for streaming videos)
expires = int(time.time()) + 300 #5 min
#Create the signed URL
priv_key_string = open(priv_key_file).read()
signed_url = get_canned_policy_url(resource, priv_key_string, key_pair_id, expires)
#Flash player doesn't like query params so encode them
enc_url = encode_query_param(signed_url)
print(enc_url)
8 - Try out the URLs
Hopefully you should now have a working URL which looks something like this:
video.mp4%3FExpires%3D1309979985%26Signature%3DMUNF7pw1689FhMeSN6JzQmWNVxcaIE9mk1x~KOudJky7anTuX0oAgL~1GW-ON6Zh5NFLBoocX3fUhmC9FusAHtJUzWyJVZLzYT9iLyoyfWMsm2ylCDBqpy5IynFbi8CUajd~CjYdxZBWpxTsPO3yIFNJI~R2AFpWx8qp3fs38Yw_%26Key-Pair-Id%3DAPKAIAZRKVIO4BQ
Put this into your js and you should have something which looks like this (from the PHP example in Amazon's CloudFront documentation):
var so_canned = new SWFObject('http://location.domname.com/~jvngkhow/player.swf','mpl','640','360','9');
so_canned.addParam('allowfullscreen','true');
so_canned.addParam('allowscriptaccess','always');
so_canned.addParam('wmode','opaque');
so_canned.addVariable('file','video.mp4%3FExpires%3D1309979985%26Signature%3DMUNF7pw1689FhMeSN6JzQmWNVxcaIE9mk1x~KOudJky7anTuX0oAgL~1GW-ON6Zh5NFLBoocX3fUhmC9FusAHtJUzWyJVZLzYT9iLyoyfWMsm2ylCDBqpy5IynFbi8CUajd~CjYdxZBWpxTsPO3yIFNJI~R2AFpWx8qp3fs38Yw_%26Key-Pair-Id%3DAPKAIAZRKVIO4BQ');
so_canned.addVariable('streamer','rtmp://s3nzpoyjpct.cloudfront.net/cfx/st');
so_canned.write('canned');
Summary
As you can see, not very easy! boto v2 will help a lot setting up the distribution. I will find out if it's possible to get some URL generation code in there as well to improve this great library!
In Python, what's the easiest way of generating an expiring URL for a file. I have boto installed but I don't see how to get a file from a streaming distribution.
You can generate an expiring signed URL for the resource. The Boto3 documentation has a nice example solution for that:
import datetime
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import padding
from botocore.signers import CloudFrontSigner
def rsa_signer(message):
    with open('path/to/key.pem', 'rb') as key_file:
        private_key = serialization.load_pem_private_key(
            key_file.read(),
            password=None,
            backend=default_backend()
        )
    # Newer versions of the cryptography library removed the signer() API;
    # sign() produces the same one-shot RSA PKCS#1 v1.5 / SHA-1 signature.
    return private_key.sign(message, padding.PKCS1v15(), hashes.SHA1())
key_id = 'AKIAIOSFODNN7EXAMPLE'
url = 'http://d2949o5mkkp72v.cloudfront.net/hello.txt'
expire_date = datetime.datetime(2017, 1, 1)
cloudfront_signer = CloudFrontSigner(key_id, rsa_signer)
# Create a signed url that will be valid until the specific expiry date
# provided using a canned policy.
signed_url = cloudfront_signer.generate_presigned_url(
url, date_less_than=expire_date)
print(signed_url)