I have created an S3 bucket, uploaded a video, and created a streaming distribution in CloudFront. I tested it with a static HTML player and it works. I have created a key pair through the account settings, and I have the private key file sitting on my desktop at the moment. That's where I am.
My aim is to get to a point where my Django/Python site creates secure URLs and people can't access the videos unless they've come from one of my pages. The problem is I'm allergic to the way Amazon have laid things out and I'm just getting more and more confused.
I realise this isn't going to be the best question on StackOverflow, but I'm certain I can't be the only fool out here who can't make heads or tails of how to set up a secure CloudFront/S3 situation. I would really appreciate your help and am willing (once two days have passed) to give a 500pt bounty to the best answer.
I have several questions that, once answered, should fit into one explanation of how to accomplish what I'm after:
In the documentation (there's an example in the next point) there's lots of XML lying around telling me I need to POST things to various places. Is there an online console for doing this? Or do I literally have to force this up via cURL (et al)?
How do I create an Origin Access Identity for CloudFront and bind it to my distribution? I've read this document but, per the first point, don't know what to do with it. How does my key pair fit into this?
Once that's done, how do I limit the S3 bucket to only allow people to download things through that identity? If this is another XML jobby rather than clicking around the web UI, please tell me where and how I'm supposed to get this into my account.
In Python, what's the easiest way of generating an expiring URL for a file? I have boto installed but I don't see how to get a file from a streaming distribution.
Are there any applications or scripts that can take the difficulty out of setting this garb up? I use Ubuntu (Linux) but I have XP in a VM if it's Windows-only. I've already looked at CloudBerry S3 Explorer Pro, but it makes about as much sense as the online UI.
You're right, it takes a lot of API work to get this set up. I hope they get it in the AWS Console soon!
UPDATE: I have submitted this code to boto - as of boto v2.1 (released 2011-10-27) this gets much easier. For boto < 2.1, use the instructions here. For boto 2.1 or greater, get the updated instructions on my blog: http://www.secretmike.com/2011/10/aws-cloudfront-secure-streaming.html Once boto v2.1 gets packaged by more distros I'll update the answer here.
To accomplish what you want you need to perform the following steps which I will detail below:
Create your s3 bucket and upload some objects (you've already done this)
Create a Cloudfront "Origin Access Identity" (basically an AWS account to allow cloudfront to access your s3 bucket)
Modify the ACLs on your objects so that only your Cloudfront Origin Access Identity is allowed to read them (this prevents people from bypassing Cloudfront and going direct to s3)
Create a cloudfront distribution with basic URLs and one which requires signed URLs
Test that you can download objects from the basic CloudFront distribution but not from S3 or the signed CloudFront distribution
Create a key pair for signing URLs
Generate some URLs using Python
Test that the signed URLs work
1 - Create Bucket and upload object
The easiest way to do this is through the AWS Console but for completeness I'll show how using boto. Boto code is shown here:
import boto
#credentials stored in environment AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
s3 = boto.connect_s3()
#bucket name MUST follow dns guidelines
new_bucket_name = "stream.example.com"
bucket = s3.create_bucket(new_bucket_name)
object_name = "video.mp4"
key = bucket.new_key(object_name)
key.set_contents_from_filename(object_name)
2 - Create a Cloudfront "Origin Access Identity"
For now, this step can only be performed using the API. Boto code is here:
import boto
#credentials stored in environment AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
cf = boto.connect_cloudfront()
oai = cf.create_origin_access_identity(comment='New identity for secure videos')
#We need the following two values for later steps:
print("Origin Access Identity ID: %s" % oai.id)
print("Origin Access Identity S3CanonicalUserId: %s" % oai.s3_user_id)
3 - Modify the ACLs on your objects
Now that we've got our special S3 user account (the S3CanonicalUserId we created above) we need to give it access to our S3 objects. We can do this easily in the AWS Console by opening the object's (not the bucket's!) Permissions tab, clicking the "Add more permissions" button, and pasting the very long S3CanonicalUserId we got above into the "Grantee" field of the new entry. Make sure you give the new permission "Open/Download" rights.
You can also do this in code using the following boto script:
import boto
#credentials stored in environment AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
s3 = boto.connect_s3()
bucket_name = "stream.example.com"
bucket = s3.get_bucket(bucket_name)
object_name = "video.mp4"
key = bucket.get_key(object_name)
#Now add read permission to our new s3 account
s3_canonical_user_id = "<your S3CanonicalUserID from above>"
key.add_user_grant("READ", s3_canonical_user_id)
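If you have many objects, granting the identity read access one object at a time gets tedious. A bucket policy can express the same grant for every key in the bucket. The sketch below only builds the policy document as JSON; the bucket name and canonical user ID are placeholders, and you would still have to attach it yourself (for example with boto's Bucket.set_policy, or in the console). Treat it as an untested sketch under those assumptions, not a replacement for the steps above.

```python
import json


def oai_read_policy(bucket_name: str, s3_canonical_user_id: str) -> str:
    """Build an S3 bucket policy letting one CloudFront origin access
    identity (identified by its S3 canonical user ID) read every object."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOAIRead",  # arbitrary label
                "Effect": "Allow",
                "Principal": {"CanonicalUser": s3_canonical_user_id},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder values: substitute your bucket and the ID printed in step 2.
    print(oai_read_policy("stream.example.com", "<your S3CanonicalUserID>"))
```

Keep in mind that the objects themselves must not also be publicly readable, or people can still bypass CloudFront.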
4 - Create a cloudfront distribution
Note that custom origins and private distributions are not fully supported in boto until version 2.0 which has not been formally released at time of writing. The code below pulls out some code from the boto 2.0 branch and hacks it together to get it going but it's not pretty. The 2.0 branch handles this much more elegantly - definitely use that if possible!
import boto
from boto.cloudfront.distribution import DistributionConfig
from boto.cloudfront.exception import CloudFrontServerError
import re
def get_domain_from_xml(xml):
    results = re.findall("<DomainName>([^<]+)</DomainName>", xml)
    return results[0]
#custom class to hack this until boto v2.0 is released
class HackedStreamingDistributionConfig(DistributionConfig):
    def __init__(self, connection=None, origin='', enabled=False,
                 caller_reference='', cnames=None, comment='',
                 trusted_signers=None):
        DistributionConfig.__init__(self, connection=connection,
                                    origin=origin, enabled=enabled,
                                    caller_reference=caller_reference,
                                    cnames=cnames, comment=comment,
                                    trusted_signers=trusted_signers)
    #override the to_xml() function
    def to_xml(self):
        s = '<?xml version="1.0" encoding="UTF-8"?>\n'
        s += '<StreamingDistributionConfig xmlns="http://cloudfront.amazonaws.com/doc/2010-07-15/">\n'
        s += ' <S3Origin>\n'
        s += ' <DNSName>%s</DNSName>\n' % self.origin
        if self.origin_access_identity:
            val = self.origin_access_identity
            s += ' <OriginAccessIdentity>origin-access-identity/cloudfront/%s</OriginAccessIdentity>\n' % val
        s += ' </S3Origin>\n'
        s += ' <CallerReference>%s</CallerReference>\n' % self.caller_reference
        for cname in self.cnames:
            s += ' <CNAME>%s</CNAME>\n' % cname
        if self.comment:
            s += ' <Comment>%s</Comment>\n' % self.comment
        s += ' <Enabled>'
        if self.enabled:
            s += 'true'
        else:
            s += 'false'
        s += '</Enabled>\n'
        if self.trusted_signers:
            s += '<TrustedSigners>\n'
            for signer in self.trusted_signers:
                if signer == 'Self':
                    s += ' <Self/>\n'
                else:
                    s += ' <AwsAccountNumber>%s</AwsAccountNumber>\n' % signer
            s += '</TrustedSigners>\n'
        if self.logging:
            s += '<Logging>\n'
            s += ' <Bucket>%s</Bucket>\n' % self.logging.bucket
            s += ' <Prefix>%s</Prefix>\n' % self.logging.prefix
            s += '</Logging>\n'
        s += '</StreamingDistributionConfig>\n'
        return s
    def create(self):
        response = self.connection.make_request('POST',
            '/%s/%s' % ("2010-11-01", "streaming-distribution"),
            {'Content-Type' : 'text/xml'},
            data=self.to_xml())
        body = response.read()
        if response.status == 201:
            return body
        else:
            raise CloudFrontServerError(response.status, response.reason, body)
cf = boto.connect_cloudfront()
s3_dns_name = "stream.example.com.s3.amazonaws.com"
comment = "example streaming distribution"
oai = "<OAI ID from step 2 above like E23KRHS6GDUF5L>"
#Create a distribution that does NOT need signed URLS
hsd = HackedStreamingDistributionConfig(connection=cf, origin=s3_dns_name, comment=comment, enabled=True)
hsd.origin_access_identity = oai
basic_dist = hsd.create()
print("Distribution with basic URLs: %s" % get_domain_from_xml(basic_dist))
#Create a distribution that DOES need signed URLS
hsd = HackedStreamingDistributionConfig(connection=cf, origin=s3_dns_name, comment=comment, enabled=True)
hsd.origin_access_identity = oai
#Add some required signers (Self means your own account)
hsd.trusted_signers = ['Self']
signed_dist = hsd.create()
print("Distribution with signed URLs: %s" % get_domain_from_xml(signed_dist))
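One weakness of the hand-built to_xml() above is that it pastes values such as the comment straight into the XML, so a comment containing & or < would produce an invalid request. Below is a sketch of the same StreamingDistributionConfig document built with the standard library's xml.etree.ElementTree, which escapes text automatically. The element order and namespace are copied from the hacked class; logging is omitted, streaming_distribution_xml is an illustrative name, and I have not sent this output to the API.

```python
import xml.etree.ElementTree as ET

NS = "http://cloudfront.amazonaws.com/doc/2010-07-15/"  # copied from to_xml() above


def streaming_distribution_xml(origin, caller_reference, oai_id=None,
                               cnames=(), comment="", enabled=True,
                               trusted_signers=()):
    """Return a StreamingDistributionConfig request body as a string."""
    root = ET.Element("StreamingDistributionConfig", xmlns=NS)
    s3_origin = ET.SubElement(root, "S3Origin")
    ET.SubElement(s3_origin, "DNSName").text = origin
    if oai_id:
        ET.SubElement(s3_origin, "OriginAccessIdentity").text = (
            "origin-access-identity/cloudfront/%s" % oai_id)
    ET.SubElement(root, "CallerReference").text = caller_reference
    for cname in cnames:
        ET.SubElement(root, "CNAME").text = cname
    if comment:
        # ElementTree escapes &, < and > in text content for us
        ET.SubElement(root, "Comment").text = comment
    ET.SubElement(root, "Enabled").text = "true" if enabled else "false"
    if trusted_signers:
        signers = ET.SubElement(root, "TrustedSigners")
        for signer in trusted_signers:
            if signer == "Self":
                ET.SubElement(signers, "Self")  # your own account
            else:
                ET.SubElement(signers, "AwsAccountNumber").text = signer
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The returned string could be passed as data= to make_request in place of self.to_xml().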
5 - Test that you can download objects from cloudfront but not from s3
You should now be able to verify:
stream.example.com.s3.amazonaws.com/video.mp4 - should give AccessDenied
signed_distribution.cloudfront.net/video.mp4 - should give MissingKey (because the URL is not signed)
basic_distribution.cloudfront.net/video.mp4 - should work fine
The tests will have to be adjusted to work with your stream player, but the basic idea is that only the basic cloudfront url should work.
6 - Create a keypair for CloudFront
I think the only way to do this is through Amazon's web site. Go into your AWS "Account" page and click on the "Security Credentials" link. Click on the "Key Pairs" tab then click "Create a New Key Pair". This will generate a new key pair for you and automatically download a private key file (pk-xxxxxxxxx.pem). Keep the key file safe and private. Also note down the "Key Pair ID" from amazon as we will need it in the next step.
7 - Generate some URLs in Python
As of boto version 2.0 there does not seem to be any support for generating signed CloudFront URLs. Python does not include RSA signing routines in its standard library, so we will have to use an additional library. I've used M2Crypto in this example.
For a non-streaming distribution you must use the full CloudFront URL as the resource; for streaming, however, we only use the object name of the video file. See the code below for a full example of generating a URL which only lasts for 5 minutes.
This code is based loosely on the PHP example code provided by Amazon in the CloudFront documentation.
from M2Crypto import EVP
import base64
import time
def aws_url_base64_encode(msg):
    msg_base64 = base64.b64encode(msg)
    msg_base64 = msg_base64.replace('+', '-')
    msg_base64 = msg_base64.replace('=', '_')
    msg_base64 = msg_base64.replace('/', '~')
    return msg_base64
def sign_string(message, priv_key_string):
    key = EVP.load_key_string(priv_key_string)
    key.reset_context(md='sha1')
    key.sign_init()
    key.sign_update(str(message))
    signature = key.sign_final()
    return signature
def create_url(url, encoded_signature, key_pair_id, expires):
    signed_url = "%(url)s?Expires=%(expires)s&Signature=%(encoded_signature)s&Key-Pair-Id=%(key_pair_id)s" % {
        'url':url,
        'expires':expires,
        'encoded_signature':encoded_signature,
        'key_pair_id':key_pair_id,
        }
    return signed_url
def get_canned_policy_url(url, priv_key_string, key_pair_id, expires):
    #we manually construct this policy string to ensure formatting matches signature
    canned_policy = '{"Statement":[{"Resource":"%(url)s","Condition":{"DateLessThan":{"AWS:EpochTime":%(expires)s}}}]}' % {'url':url, 'expires':expires}
    #now base64 encode it (must be URL safe)
    encoded_policy = aws_url_base64_encode(canned_policy)
    #sign the non-encoded policy
    signature = sign_string(canned_policy, priv_key_string)
    #now base64 encode the signature (URL safe as well)
    encoded_signature = aws_url_base64_encode(signature)
    #combine these into a full url
    signed_url = create_url(url, encoded_signature, key_pair_id, expires)
    return signed_url
def encode_query_param(resource):
    enc = resource
    enc = enc.replace('?', '%3F')
    enc = enc.replace('=', '%3D')
    enc = enc.replace('&', '%26')
    return enc
#Set parameters for URL
key_pair_id = "APKAIAZCZRKVIO4BQ" #from the AWS accounts page
priv_key_file = "cloudfront-pk.pem" #your private keypair file
resource = 'video.mp4' #your resource (just object name for streaming videos)
expires = int(time.time()) + 300 #5 min
#Create the signed URL
priv_key_string = open(priv_key_file).read()
signed_url = get_canned_policy_url(resource, priv_key_string, key_pair_id, expires)
#Flash player doesn't like query params so encode them
enc_url = encode_query_param(signed_url)
print(enc_url)
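The M2Crypto example above is Python 2 code. On Python 3, the string-handling parts (the canned policy, CloudFront's URL-safe base64 variant, URL assembly and the escaping for Flash players) can be done with the standard library alone, as in this sketch. Signing still needs an RSA library, so the signature bytes are taken as an input here; produce them with M2Crypto as above or with the cryptography package. The helper names are illustrative, not from any library.

```python
import base64
import json
from urllib.parse import quote


def aws_url_base64_encode(data: bytes) -> str:
    """Base64 with CloudFront's substitutions: '+'->'-', '='->'_', '/'->'~'."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.translate(str.maketrans({"+": "-", "=": "_", "/": "~"}))


def canned_policy(resource: str, expires: int) -> str:
    """Compact JSON identical to the hand-written template above.

    The signature covers these exact bytes, so no spaces are allowed."""
    policy = {"Statement": [{"Resource": resource,
                             "Condition": {"DateLessThan":
                                           {"AWS:EpochTime": expires}}}]}
    return json.dumps(policy, separators=(",", ":"))


def build_signed_url(resource: str, expires: int, signature: bytes,
                     key_pair_id: str) -> str:
    """Combine the pieces; `signature` is the raw RSA-SHA1 signature."""
    return "%s?Expires=%d&Signature=%s&Key-Pair-Id=%s" % (
        resource, expires, aws_url_base64_encode(signature), key_pair_id)


def encode_for_flash(signed_url: str) -> str:
    """Percent-encode ?, = and & (and anything else unsafe) for Flash."""
    return quote(signed_url, safe="/:")
```

Usage would be: sign canned_policy(resource, expires).encode() with your private key, then pass the signature bytes to build_signed_url and the result to encode_for_flash.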
8 - Try out the URLs
Hopefully you should now have a working URL which looks something like this:
video.mp4%3FExpires%3D1309979985%26Signature%3DMUNF7pw1689FhMeSN6JzQmWNVxcaIE9mk1x~KOudJky7anTuX0oAgL~1GW-ON6Zh5NFLBoocX3fUhmC9FusAHtJUzWyJVZLzYT9iLyoyfWMsm2ylCDBqpy5IynFbi8CUajd~CjYdxZBWpxTsPO3yIFNJI~R2AFpWx8qp3fs38Yw_%26Key-Pair-Id%3DAPKAIAZRKVIO4BQ
Put this into your js and you should have something which looks like this (from the PHP example in Amazon's CloudFront documentation):
var so_canned = new SWFObject('http://location.domname.com/~jvngkhow/player.swf','mpl','640','360','9');
so_canned.addParam('allowfullscreen','true');
so_canned.addParam('allowscriptaccess','always');
so_canned.addParam('wmode','opaque');
so_canned.addVariable('file','video.mp4%3FExpires%3D1309979985%26Signature%3DMUNF7pw1689FhMeSN6JzQmWNVxcaIE9mk1x~KOudJky7anTuX0oAgL~1GW-ON6Zh5NFLBoocX3fUhmC9FusAHtJUzWyJVZLzYT9iLyoyfWMsm2ylCDBqpy5IynFbi8CUajd~CjYdxZBWpxTsPO3yIFNJI~R2AFpWx8qp3fs38Yw_%26Key-Pair-Id%3DAPKAIAZRKVIO4BQ');
so_canned.addVariable('streamer','rtmp://s3nzpoyjpct.cloudfront.net/cfx/st');
so_canned.write('canned');
Summary
As you can see, not very easy! boto v2 will help a lot setting up the distribution. I will find out if it's possible to get some URL generation code in there as well to improve this great library!
In Python, what's the easiest way of generating an expiring URL for a file. I have boto installed but I don't see how to get a file from a streaming distribution.
You can generate an expiring signed URL for the resource. The boto3 documentation has a nice example solution for that:
import datetime
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import padding
from botocore.signers import CloudFrontSigner
def rsa_signer(message):
    with open('path/to/key.pem', 'rb') as key_file:
        private_key = serialization.load_pem_private_key(
            key_file.read(),
            password=None,
            backend=default_backend()
        )
    # Newer releases of cryptography removed private_key.signer(); sign()
    # produces the same PKCS#1 v1.5 / SHA-1 signature in a single call.
    return private_key.sign(message, padding.PKCS1v15(), hashes.SHA1())
key_id = 'AKIAIOSFODNN7EXAMPLE'
url = 'http://d2949o5mkkp72v.cloudfront.net/hello.txt'
expire_date = datetime.datetime(2017, 1, 1)
cloudfront_signer = CloudFrontSigner(key_id, rsa_signer)
# Create a signed url that will be valid until the specific expiry date
# provided using a canned policy.
signed_url = cloudfront_signer.generate_presigned_url(
url, date_less_than=expire_date)
print(signed_url)
Related
I am trying to use python DNS module (dnspython) to create (add) new DNS record.
The documentation (http://www.dnspython.org/examples.html) shows how to create an update:
import dns.tsigkeyring
import dns.update
import sys
keyring = dns.tsigkeyring.from_text({
'host-example.' : 'XXXXXXXXXXXXXXXXXXXXXX=='
})
update = dns.update.Update('dyn.test.example', keyring=keyring)
update.replace('host', 300, 'a', sys.argv[1])
But it doesn't explain how to actually generate the keyring string that gets passed to the dns.tsigkeyring.from_text() method in the first place.
What is the correct way to generate the key? I am using krb5 at my organization.
Server is running on Microsoft AD DNS with GSS-TSIG.
TSIG and GSS-TSIG are different beasts – the former uses a static preshared key that can be simply copied from the server, but the latter uses Kerberos (GSSAPI) to negotiate a session key for every transaction.
At the time when this thread was originally posted, dnspython 1.x did not have any support for GSS-TSIG whatsoever.
(The handshake does not result in a static key that could be converted to a regular TSIG keyring; instead the GSSAPI library itself must be called to build an authenticator – dnspython 1.x could not do that, although dnspython 2.1 finally can.)
If you are trying to update an Active Directory DNS server, BIND's nsupdate command-line tool supports GSS-TSIG (and sometimes it even works). You should be able to run it through subprocess and simply feed the necessary updates via stdin.
import subprocess
# dyn_zone, fqdn and challenge are assumed to be defined by the caller
cmds = [f'zone {dyn_zone}\n',
        f'del {fqdn}\n',
        f'add {fqdn} 60 TXT "{challenge}"\n',
        'send\n']
subprocess.run(["nsupdate", "-g"],
               input="".join(cmds).encode(),
               check=True)
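If you need this for more than one record, it may help to separate building the nsupdate script from running it, so the script can be inspected or logged first. The sketch below assumes nsupdate is on PATH and a Kerberos ticket is already available (see the next paragraph). build_nsupdate_script and run_nsupdate are names made up for illustration, and unlike the snippet above this variant deletes only the records of the given type before adding the new one.

```python
import subprocess


def build_nsupdate_script(zone, fqdn, ttl, rtype, rdata):
    """Return nsupdate commands that replace all `rtype` records at `fqdn`
    with one new record. TXT data must already include its quotes."""
    lines = [
        f"zone {zone}",
        f"update delete {fqdn} {rtype}",            # drop old records of this type
        f"update add {fqdn} {ttl} {rtype} {rdata}",
        "send",
    ]
    return "\n".join(lines) + "\n"


def run_nsupdate(script):
    # -g makes nsupdate sign the update with GSS-TSIG using the
    # Kerberos credentials already present in the environment.
    subprocess.run(["nsupdate", "-g"], input=script.encode(), check=True)
```

For example, run_nsupdate(build_nsupdate_script("dyn.test.example", "host.dyn.test.example", 300, "A", "192.0.2.10")).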
As with most Kerberos client applications, nsupdate expects the credentials to be already present in the environment (that is, you need to have already obtained a TGT using kinit beforehand; or alternatively, if a recent version of MIT Krb5 is used, you can point $KRB5_CLIENT_KTNAME to the keytab containing the client credentials).
Update: dnspython 2.1 finally has the necessary pieces for GSS-TSIG, but creating the keyring is currently a very manual process – you have to call the GSSAPI library and process the TKEY negotiation yourself. The code for doing so is included at the bottom.
(The Python code below can be passed a custom gssapi.Credentials object, but otherwise it looks for credentials in the environment just like nsupdate does.)
import socket
import time
import uuid
import dns.exception
import dns.message
import dns.name
import dns.query
import dns.rcode
import dns.rdataclass
import dns.rdatatype
import dns.rdtypes.ANY.TKEY
import dns.resolver
import dns.tsig
import dns.update
import gssapi
def _build_tkey_query(token, key_ring, key_name):
    inception_time = int(time.time())
    tkey = dns.rdtypes.ANY.TKEY.TKEY(dns.rdataclass.ANY,
                                     dns.rdatatype.TKEY,
                                     dns.tsig.GSS_TSIG,
                                     inception_time,
                                     inception_time,
                                     3,  # mode 3 = GSS-API negotiation (RFC 2930)
                                     dns.rcode.NOERROR,
                                     token,
                                     b"")
    query = dns.message.make_query(key_name,
                                   dns.rdatatype.TKEY,
                                   dns.rdataclass.ANY)
    query.keyring = key_ring
    query.find_rrset(dns.message.ADDITIONAL,
                     key_name,
                     dns.rdataclass.ANY,
                     dns.rdatatype.TKEY,
                     create=True).add(tkey)
    return query
def _probe_server(server_name, zone):
    # Return the first address of server_name that answers an SOA query
    gai = socket.getaddrinfo(str(server_name),
                             "domain",
                             socket.AF_UNSPEC,
                             socket.SOCK_DGRAM)
    for af, sf, pt, cname, sa in gai:
        query = dns.message.make_query(zone, "SOA")
        try:
            dns.query.udp(query, sa[0], timeout=2)
        except (dns.exception.Timeout, OSError):
            continue  # unreachable; try the next address
        return sa[0]
    raise RuntimeError(f"no address of {server_name} answered")
def gss_tsig_negotiate(server_name, server_addr, creds=None):
    # Acquire GSSAPI credentials
    gss_name = gssapi.Name(f"DNS@{server_name}",
                           gssapi.NameType.hostbased_service)
    gss_ctx = gssapi.SecurityContext(name=gss_name,
                                     creds=creds,
                                     usage="initiate")
    # Name generation tips: https://tools.ietf.org/html/rfc2930#section-2.1
    key_name = dns.name.from_text(f"{uuid.uuid4()}.{server_name}")
    tsig_key = dns.tsig.Key(key_name, gss_ctx, dns.tsig.GSS_TSIG)
    key_ring = {key_name: tsig_key}
    key_ring = dns.tsig.GSSTSigAdapter(key_ring)
    token = gss_ctx.step()
    while not gss_ctx.complete:
        tkey_query = _build_tkey_query(token, key_ring, key_name)
        response = dns.query.tcp(tkey_query, server_addr, timeout=5)
        if not gss_ctx.complete:
            # Original comment:
            # https://github.com/rthalley/dnspython/pull/530#issuecomment-658959755
            # "this if statement is a bit redundant, but if the final token comes
            # back with TSIG attached the patch to message.py will automatically step
            # the security context. We dont want to excessively step the context."
            token = gss_ctx.step(response.answer[0][0].key)
    return key_name, key_ring
def gss_tsig_update(zone, update_msg, creds=None):
    # Find the SOA of our zone
    answer = dns.resolver.resolve(zone, "SOA")
    soa_server = answer.rrset[0].mname
    server_addr = _probe_server(soa_server, zone)
    # Get the GSS-TSIG key
    key_name, key_ring = gss_tsig_negotiate(soa_server, server_addr, creds)
    # Dispatch the update
    update_msg.use_tsig(keyring=key_ring,
                        keyname=key_name,
                        algorithm=dns.tsig.GSS_TSIG)
    response = dns.query.tcp(update_msg, server_addr)
    return response
Amazon provides iOS, Android, and Javascript Cognito SDKs that offer a high-level authenticate-user operation.
For example, see Use Case 4 here:
https://github.com/aws/amazon-cognito-identity-js
However, if you are using python/boto3, all you get are a pair of primitives: cognito.initiate_auth and cognito.respond_to_auth_challenge.
I am trying to use these primitives along with the pysrp lib to authenticate with the USER_SRP_AUTH flow, but what I have is not working.
It always fails with "An error occurred (NotAuthorizedException) when calling the RespondToAuthChallenge operation: Incorrect username or password." (The username/password pair works fine with the JS SDK.)
My suspicion is I'm constructing the challenge response wrong (step 3), and/or passing Cognito hex strings when it wants base64 or vice versa.
Has anyone gotten this working? Anyone see what I'm doing wrong?
I am trying to copy the behavior of the authenticateUser call found in the Javascript SDK:
https://github.com/aws/amazon-cognito-identity-js/blob/master/src/CognitoUser.js#L138
but I'm doing something wrong and can't figure out what.
#!/usr/bin/env python
import base64
import binascii
import boto3
import datetime as dt
import hashlib
import hmac
# http://pythonhosted.org/srp/
# https://github.com/cocagne/pysrp
import srp
bytes_to_hex = lambda x: "".join("{:02x}".format(ord(c)) for c in x)
cognito = boto3.client('cognito-idp', region_name="us-east-1")
username = "foobar@foobar.com"
password = "123456"
user_pool_id = u"us-east-1_XXXXXXXXX"
client_id = u"XXXXXXXXXXXXXXXXXXXXXXXXXX"
# Step 1:
# Use SRP lib to construct a SRP_A value.
srp_user = srp.User(username, password)
_, srp_a_bytes = srp_user.start_authentication()
srp_a_hex = bytes_to_hex(srp_a_bytes)
# Step 2:
# Submit USERNAME & SRP_A to Cognito, get challenge.
response = cognito.initiate_auth(
AuthFlow='USER_SRP_AUTH',
AuthParameters={ 'USERNAME': username, 'SRP_A': srp_a_hex },
ClientId=client_id,
ClientMetadata={ 'UserPoolId': user_pool_id })
# Step 3:
# Use challenge parameters from Cognito to construct
# challenge response.
salt_hex = response['ChallengeParameters']['SALT']
srp_b_hex = response['ChallengeParameters']['SRP_B']
secret_block_b64 = response['ChallengeParameters']['SECRET_BLOCK']
secret_block_bytes = base64.standard_b64decode(secret_block_b64)
secret_block_hex = bytes_to_hex(secret_block_bytes)
salt_bytes = binascii.unhexlify(salt_hex)
srp_b_bytes = binascii.unhexlify(srp_b_hex)
process_challenge_bytes = srp_user.process_challenge(salt_bytes,
srp_b_bytes)
timestamp = unicode(dt.datetime.utcnow().strftime("%a %b %d %H:%m:%S +0000 %Y"))
hmac_obj = hmac.new(process_challenge_bytes, digestmod=hashlib.sha256)
hmac_obj.update(user_pool_id.split('_')[1].encode('utf-8'))
hmac_obj.update(username.encode('utf-8'))
hmac_obj.update(secret_block_bytes)
hmac_obj.update(timestamp.encode('utf-8'))
challenge_responses = {
"TIMESTAMP": timestamp.encode('utf-8'),
"USERNAME": username.encode('utf-8'),
"PASSWORD_CLAIM_SECRET_BLOCK": secret_block_hex,
"PASSWORD_CLAIM_SIGNATURE": hmac_obj.hexdigest()
}
# Step 4:
# Submit challenge response to Cognito.
response = cognito.respond_to_auth_challenge(
ClientId=client_id,
ChallengeName='PASSWORD_VERIFIER',
ChallengeResponses=challenge_responses)
There are many errors in your implementation. For example:
pysrp uses SHA1 algorithm by default. It should be set to SHA256.
_ng_const length should be 3072 bits and it should be copied from amazon-cognito-identity-js
There is no hkdf function in pysrp.
The response should contain secret_block_b64, not secret_block_hex.
Wrong timestamp format. %H:%m:%S means "hour:month:second" and +0000 should be replaced by UTC.
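Two of the points above can be sketched with the standard library. The timestamp helper follows what I understand the JavaScript SDK to send: English day and month abbreviations, a literal UTC, and the day of month without a leading zero (the warrant module strips the zero the same way). hkdf_sha256 is a plain RFC 5869 HKDF; as far as I can tell Cognito uses it with the info string "Caldera Derived Key" and keeps 16 bytes, but verify that against the SDK source before relying on it. Both function names are illustrative.

```python
import datetime
import hashlib
import hmac

_DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def cognito_timestamp(now=None):
    """Format a UTC time like 'Thu Jan 5 09:07:03 UTC 2017'."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Built by hand rather than with strftime('%a %b') so the output
    # does not depend on the process locale.
    return "%s %s %d %02d:%02d:%02d UTC %d" % (
        _DAYS[now.weekday()], _MONTHS[now.month - 1], now.day,
        now.hour, now.minute, now.second, now.year)


def hkdf_sha256(ikm, salt, info, length):
    """RFC 5869 HKDF with SHA-256: extract, then expand to `length` bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()          # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                      # expand
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]
```

In the Cognito flow this would be called roughly as hkdf_sha256(s_bytes, u_bytes, b"Caldera Derived Key", 16), where s_bytes and u_bytes are the padded SRP S and u values that pysrp does not expose directly.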
Has anyone gotten this working?
Yes. It's implemented in the warrant.aws_srp module.
https://github.com/capless/warrant/blob/master/warrant/aws_srp.py
from warrant.aws_srp import AWSSRP
USERNAME='xxx'
PASSWORD='yyy'
POOL_ID='us-east-1_zzzzz'
CLIENT_ID = '12xxxxxxxxxxxxxxxxxxxxxxx'
aws = AWSSRP(username=USERNAME, password=PASSWORD, pool_id=POOL_ID,
client_id=CLIENT_ID)
tokens = aws.authenticate_user()
id_token = tokens['AuthenticationResult']['IdToken']
refresh_token = tokens['AuthenticationResult']['RefreshToken']
access_token = tokens['AuthenticationResult']['AccessToken']
token_type = tokens['AuthenticationResult']['TokenType']
The authenticate_user method supports only the PASSWORD_VERIFIER challenge. If you want to respond to other challenges, look at the authenticate_user source and the boto3 documentation.
Unfortunately it's a hard problem since you don't get any hints from the service with regards to the computations (it mainly says not authorized as you mentioned).
We are working on improving the developer experience when users are trying to implement SRP on their own in languages where we don't have an SDK. Also, we are trying to add more SDKs.
As daunting as it sounds, what I would suggest is to take the Javascript or the Android SDK, fix the inputs (SRP_A, SRP_B, TIMESTAMP) and add console.log statements at various points in the implementation to make sure your computations are similar. Then you would run these computations in your implementation and make sure you are getting the same. As you have suggested, the password claim signature needs to be passed as a base64 encoded string to the service so that might be one of the issues.
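For the base64 point, here is a sketch of the signature step with the standard library: HMAC-SHA256 over the pool ID suffix, the username, the raw secret block bytes and the timestamp (the same order as in the question's code), keyed with the 16-byte HKDF output, and then base64-encoded rather than hex-encoded. The hkdf_key argument is assumed to come from a correct SRP computation, which this sketch does not attempt; the function names are illustrative.

```python
import base64
import hashlib
import hmac


def password_claim_signature(hkdf_key, user_pool_id, username,
                             secret_block_b64, timestamp):
    """Return PASSWORD_CLAIM_SIGNATURE as a base64 string."""
    pool_suffix = user_pool_id.split("_")[1]   # "us-east-1_abc" -> "abc"
    mac = hmac.new(hkdf_key, digestmod=hashlib.sha256)
    mac.update(pool_suffix.encode("utf-8"))
    mac.update(username.encode("utf-8"))
    mac.update(base64.standard_b64decode(secret_block_b64))  # raw bytes
    mac.update(timestamp.encode("utf-8"))
    return base64.standard_b64encode(mac.digest()).decode("ascii")


def challenge_responses(hkdf_key, user_pool_id, username,
                        secret_block_b64, timestamp):
    """Assemble ChallengeResponses; the secret block is passed back as
    the original base64 string from ChallengeParameters."""
    return {
        "TIMESTAMP": timestamp,
        "USERNAME": username,
        "PASSWORD_CLAIM_SECRET_BLOCK": secret_block_b64,
        "PASSWORD_CLAIM_SIGNATURE": password_claim_signature(
            hkdf_key, user_pool_id, username, secret_block_b64, timestamp),
    }
```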
Some of the issues I encountered while implementing this was related to BigInteger library differences (the way they do byte padding and transform negative numbers to byte arrays and inversely).
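On the BigInteger point: the JavaScript SDK hex-encodes big integers as signed values, so a hex string whose first digit has the high bit set gets an extra 00 byte in front, and an odd-length string gets a leading 0. A Python sketch of that rule for non-negative integers, modeled on the SDK's padHex helper as I understand it (warrant has an equivalent); pad_hex is an illustrative name:

```python
def pad_hex(value):
    """Hex-encode a non-negative int (or hex string) like the JS padHex."""
    hex_str = "%x" % value if isinstance(value, int) else value
    if len(hex_str) % 2 == 1:
        hex_str = "0" + hex_str      # always a whole number of bytes
    elif hex_str[0] in "89abcdefABCDEF":
        hex_str = "00" + hex_str     # stay positive when read as signed
    return hex_str
```

Hashing pad_hex(A) + pad_hex(B) instead of the unpadded strings is the kind of detail that silently changes u and therefore the whole signature.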
I use Python Social Auth - Django to log in my users.
My backend is Microsoft, so I can use Microsoft Graph but I don't think that it is relevant.
Python Social Auth deals with authentication but now I want to call the API and for that, I need a valid access token.
Following the use cases I can get to this:
social = request.user.social_auth.get(provider='azuread-oauth2')
response = self.get_json('https://graph.microsoft.com/v1.0/me',
headers={'Authorization': social.extra_data['token_type'] + ' '
+ social.extra_data['access_token']})
But the access token is only valid for 3600 seconds, so I need to refresh it. I guess I could do it manually, but there must be a better solution.
How can I get an access_token refreshed?
.get_access_token(strategy) refreshes the token automatically if it has expired. You can use it like this:
from social_django.utils import load_strategy
#...
social = request.user.social_auth.get(provider='google-oauth2')
access_token = social.get_access_token(load_strategy())
Using load_strategy() from social.apps.django_app.utils:
from social.apps.django_app.utils import load_strategy
social = request.user.social_auth.get(provider='azuread-oauth2')
strategy = load_strategy()
social.refresh_token(strategy)
Now the updated access_token can be retrieved from social.extra_data['access_token'].
The best approach is probably to check if it needs to be updated (customized for AzureAD Oauth2):
import time
from social.apps.django_app.utils import load_strategy  # social_django.utils in newer releases
def get_azuread_oauth2_token(user):
    social = user.social_auth.get(provider='azuread-oauth2')
    if social.extra_data['expires_on'] <= int(time.time()):
        strategy = load_strategy()
        social.refresh_token(strategy)
    return social.extra_data['access_token']
This is based on the get_auth_token method from AzureADOAuth2. I don't think this method is accessible outside the pipeline; please answer this question if there is any way to do it.
Updates
Update 1 - 20/01/2017
Following an Issue to request an extra data parameter with the time of the access token refresh, it is now possible to check if the access_token needs to be updated in every backend.
In future versions (>0.2.1 for the social-auth-core) there will be a new field in extra data:
'auth_time': int(time.time())
And so this works:
def get_token(user, provider):
    social = user.social_auth.get(provider=provider)
    if (social.extra_data['auth_time'] + social.extra_data['expires']) <= int(time.time()):
        strategy = load_strategy()
        social.refresh_token(strategy)
    return social.extra_data['access_token']
Note: According to OAuth 2 RFC all responses should (it's a RECOMMENDED param) provide an expires_in but for most backends (including the azuread-oauth2) this value is being saved as expires. Be careful to understand how your backend behaves!
An issue on this exists, and I will update the answer with the relevant info when it is available.
Update 2 - 17/02/17
Additionally, there is a method in UserMixin called access_token_expired (code) that can be used to check whether the token is still valid (note: this method doesn't protect against race conditions, as pointed out in this answer by @SCasey).
Update 3 - 31/05/17
In Python Social Auth - Core v1.3.0 get_access_token(self, strategy) was introduced in storage.py.
So now:
from social_django.utils import load_strategy
social = request.user.social_auth.get(provider='azuread-oauth2')
response = self.get_json('https://graph.microsoft.com/v1.0/me',
                         headers={'Authorization': '%s %s' % (social.extra_data['token_type'],
                                                              social.get_access_token(load_strategy()))})
Thanks @damio for pointing it out.
@NBajanca's update is almost correct for version 1.0.1.
extra_data['expires_in']
is now
extra_data['expires']
So the code is:
def get_token(user, provider):
    social = user.social_auth.get(provider=provider)
    if (social.extra_data['auth_time'] + social.extra_data['expires']) <= int(time.time()):
        strategy = load_strategy()
        social.refresh_token(strategy)
    return social.extra_data['access_token']
I'd also recommend subtracting an arbitrary amount of time from that calc, so that we don't run into a race situation where we've checked the token 0.01s before expiry and then get an error because we sent the request after expiry. I like to add 10 seconds just to be safe, but it's probably overkill:
def get_token(user, provider):
    social = user.social_auth.get(provider=provider)
    if (social.extra_data['auth_time'] + social.extra_data['expires'] - 10) <= int(time.time()):
        strategy = load_strategy()
        social.refresh_token(strategy)
    return social.extra_data['access_token']
EDIT
@NBajanca points out that expires_in is technically correct per the OAuth 2 docs. It seems that for some backends, this may work. The code above using expires is what works with provider="google-oauth2" as of v1.0.1.
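Pulling the above together, a provider-agnostic check could accept either key, since backends differ on whether they store expires or expires_in, and apply the safety margin suggested above. This is a sketch: token_needs_refresh is a hypothetical helper, it assumes auth_time is present (social-auth-core only started saving it in later versions), and it treats a missing lifetime as "don't refresh", which is a design choice you may want to reverse.

```python
import time


def token_needs_refresh(extra_data, now=None, leeway=10):
    """Return True if the stored access token is expired or about to be.

    extra_data is the dict python-social-auth keeps on UserSocialAuth.
    The lifetime may be stored as 'expires' or 'expires_in' depending on
    the backend; `leeway` seconds are subtracted to avoid racing expiry."""
    if now is None:
        now = int(time.time())
    lifetime = extra_data.get("expires", extra_data.get("expires_in"))
    if lifetime is None:
        return False  # no expiry information to compare against
    return extra_data["auth_time"] + int(lifetime) - leeway <= now
```

Inside get_token this would replace the inline comparison: if token_needs_refresh(social.extra_data): social.refresh_token(load_strategy()).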
I'm going to write a Python program to check if a file is in certain folder of my Google Cloud Storage, the basic idea is to get the list of all objects in a folder, a file name list, then check if the file abc.txt is in the file name list.
Now the problem is, it looks like Google only provides one way to get the object list, which is uri.get_bucket(); see the code below, which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects
uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for obj in uri.get_bucket():
    print('%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name))
    print('  "%s"' % obj.get_contents_as_string())
The problem with uri.get_bucket() is that it seems to fetch all of the objects first, which I don't want. I just need the object name list for a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much quicker.
Could someone help answer? Appreciate every answer!
Update: the below is true for the older "Google API Client Libraries" for Python, but if you're not using that client, prefer the newer "Google Cloud Client Library" for Python ( https://googleapis.dev/python/storage/latest/index.html ). For the newer library, the equivalent to the below code is:
from google.cloud import storage
client = storage.Client()
for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
    print(str(blob))
Answer for older client follows.
You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:
import json
from apiclient import discovery
# Auth goes here if necessary. Create authorized http object...
client = discovery.build('storage', 'v1') # add http=whatever param if auth
request = client.objects().list(
    bucket="mybucket",
    prefix="abc/myfolder")
while request is not None:
    response = request.execute()
    print(json.dumps(response, indent=2))
    # list_next lives on the collection, not on the request object
    request = client.objects().list_next(request, response)
Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list
And the Google Python API client is documented here:
https://code.google.com/p/google-api-python-client/
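The list_next loop follows the JSON API's paging protocol: each objects.list response may carry a nextPageToken, which you send back as pageToken to get the next page; when the token is absent, you're done. A minimal sketch of that loop, where fetch_page is a hypothetical stand-in for the actual HTTP call:

```python
from typing import Callable, Dict, Iterator, Optional


def iter_object_names(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[str]:
    """Yield object names across all pages of an objects.list-style API.

    `fetch_page(page_token)` is assumed to return one response dict shaped
    like the JSON API's: {"items": [{"name": ...}, ...], "nextPageToken": ...}.
    """
    page_token = None
    while True:
        response = fetch_page(page_token)
        for item in response.get("items", []):
            yield item["name"]
        page_token = response.get("nextPageToken")
        if not page_token:  # no token means this was the last page
            break
```

Writing it as a generator means callers can stop early (e.g. as soon as abc.txt turns up) without fetching the remaining pages.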
This worked for me:
from google.cloud import storage
client = storage.Client()
BUCKET_NAME = 'DEMO_BUCKET'
bucket = client.get_bucket(BUCKET_NAME)
blobs = bucket.list_blobs()
for blob in blobs:
    print(blob.name)
The list_blobs() method returns an iterator over the blobs in the bucket.
Now you can iterate over blobs and access every object in the bucket. In this example I just print out the name of the object.
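Because the namespace is flat, iterating with a prefix returns everything under it at any depth. If you want a directory-style listing instead, list_blobs() also accepts delimiter='/', and the server then groups deeper names into sub-folder prefixes, exposed on the returned iterator's prefixes attribute once you've iterated it. A minimal sketch that emulates that grouping on a plain list of names, just to show the semantics (list_folder and the sample names are hypothetical):

```python
def list_folder(object_names, prefix, delimiter="/"):
    """Emulate a prefix + delimiter listing over a flat list of names.

    Returns (files, subfolders): names directly under `prefix`, and the
    distinct sub-prefixes that contain deeper objects.
    """
    files, subfolders = [], []
    seen = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            files.append(name)  # no delimiter left: a direct child
        else:
            sub = prefix + rest[: cut + len(delimiter)]
            if sub not in seen:  # report each sub-prefix once
                seen.add(sub)
                subfolders.append(sub)
    return files, subfolders
```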
This documentation helped me a lot:
https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html
https://googleapis.github.io/google-cloud-python/latest/_modules/google/cloud/storage/client.html#Client.bucket
I hope I could help!
You might also want to look at gcloud-python and its documentation.
from gcloud import storage
connection = storage.get_connection(project_name, email, private_key_path)
bucket = connection.get_bucket('my-bucket')
for key in bucket:
    if key.name == 'abc.txt':
        print 'Found it!'
        break
However, you might be better off just checking if the file exists:
if 'abc.txt' in bucket:
    print 'Found it!'
Install the Python package google-cloud-storage (with pip or through PyCharm) and use the code below:
from google.cloud import storage
client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME, prefix=FOLDER_NAME):
    print(str(blob))
I know this is an old question, but I stumbled over this because I was looking for the exact same answer. Answers from Brandon Yarbrough and Abhijit worked for me, but I wanted to get into more detail.
When you run this:
from google.cloud import storage
storage_client = storage.Client()
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name),nextPageToken"))  # keep nextPageToken so paging still works
You will get Blob objects with only the name field populated, one for every object under the given prefix, like this:
[<Blob: BUCKET_NAME, PREFIX, None>,
<Blob: xml-BUCKET_NAME, [PREFIX]claim_757325.json, None>,
<Blob: xml-BUCKET_NAME, [PREFIX]claim_757390.json, None>,
...]
If you are like me and you want to 1) filter out the first item in the list because it does NOT represent a file (it's just the prefix), 2) get just the name string, and 3) remove the PREFIX from the file name, you can do something like this:
blob_names = [blob.name[len(PREFIX):] for blob in blobs if blob.name != PREFIX]
Complete code to get just the file names as strings from a storage bucket:
from google.cloud import storage
bucket_name = "my-bucket"  # placeholder: your bucket name
PREFIX = "path/to/folder/"  # placeholder: the "folder", with a trailing slash
storage_client = storage.Client()
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name),nextPageToken"))
blob_names = [blob.name[len(PREFIX):] for blob in blobs if blob.name != PREFIX]
print(f"blob_names = {blob_names}")
Can you produce a Python example of how to download a Google Sheets spreadsheet given its key and worksheet ID (gid)? I can't.
I've scoured versions 1, 2 and 3 of the API. I'm having no luck: I can't figure out their complicated ATOM-like feeds API, the gdata.docs.service.DocsService._DownloadFile private method says that I'm unauthorized, and I don't want to write an entire Google Login authentication system myself. I'm about to stab myself in the face due to frustration.
I have a few spreadsheets and I want to access them like so:
import getpass
username = 'mygooglelogin@gmail.com'
password = getpass.getpass()
def get_spreadsheet(key, gid=0):
    ... (help!) ...
for row in get_spreadsheet('5a3c7f7dcee4b4f'):
    cell1, cell2, cell3 = row
    ...
Please save my face.
Update 1: I've tried the following, but no combination of Download() or Export() seems to work. (Docs for DocsService here)
import gdata.docs.service
import getpass
import os
import tempfile
import csv
def get_csv(file_path):
    return csv.reader(open(file_path).readlines())
def get_spreadsheet(key, gid=0):
    gd_client = gdata.docs.service.DocsService()
    gd_client.email = 'xxxxxxxxx@gmail.com'
    gd_client.password = getpass.getpass()
    gd_client.ssl = False
    gd_client.source = "My Fancy Spreadsheet Downloader"
    gd_client.ProgrammaticLogin()
    file_path = tempfile.mktemp(suffix='.csv')
    uri = 'http://docs.google.com/feeds/documents/private/full/%s' % key
    try:
        entry = gd_client.GetDocumentListEntry(uri)
        # XXXX - The following dies with RequestError "Unauthorized"
        gd_client.Download(entry, file_path)
        return get_csv(file_path)
    finally:
        try:
            os.remove(file_path)
        except OSError:
            pass
The gspread library (https://github.com/burnash/gspread) is a newer, simpler way to interact with Google Spreadsheets than the gdata library suggested by the older answers here, which is not only too low-level but also overly complicated.
You will also need to create and download (in JSON format) a Service Account key: https://console.developers.google.com/apis/credentials/serviceaccountkey
Here's an example of how to use it:
import csv
import gspread
from oauth2client.service_account import ServiceAccountCredentials
scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)
docid = "0zjVQXjJixf-SdGpLKnJtcmQhNjVUTk1hNTRpc0x5b9c"
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(docid)
for i, worksheet in enumerate(spreadsheet.worksheets()):
    filename = docid + '-worksheet' + str(i) + '.csv'
    # Python 3: the csv module needs a text-mode file opened with newline=''
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(worksheet.get_all_values())
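If you'd rather keep the file writing separate from the gspread calls (so it can be tested without credentials), here is a minimal Python 3 sketch. write_worksheets_csv is a hypothetical helper, and each entry in worksheets_rows is assumed to be what worksheet.get_all_values() returns, i.e. a list of lists of strings:

```python
import csv
import os


def write_worksheets_csv(docid, worksheets_rows, out_dir="."):
    """Write one CSV file per worksheet and return the paths written."""
    paths = []
    for i, rows in enumerate(worksheets_rows):
        path = os.path.join(out_dir, "%s-worksheet%d.csv" % (docid, i))
        # Python 3's csv module wants a text-mode file opened with newline=""
        # (binary "wb" mode was the Python 2 idiom).
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths
```

Usage with gspread would look like write_worksheets_csv(docid, [ws.get_all_values() for ws in spreadsheet.worksheets()]).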
In case anyone comes across this looking for a quick fix, here's another (currently) working solution that doesn't rely on the gdata client library:
#!/usr/bin/python
import re, urllib, urllib2
class Spreadsheet(object):
    def __init__(self, key):
        super(Spreadsheet, self).__init__()
        self.key = key
class Client(object):
    def __init__(self, email, password):
        super(Client, self).__init__()
        self.email = email
        self.password = password
    def _get_auth_token(self, email, password, source, service):
        url = "https://www.google.com/accounts/ClientLogin"
        params = {
            "Email": email, "Passwd": password,
            "service": service,
            "accountType": "HOSTED_OR_GOOGLE",
            "source": source
        }
        req = urllib2.Request(url, urllib.urlencode(params))
        return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]
    def get_auth_token(self):
        source = type(self).__name__
        return self._get_auth_token(self.email, self.password, source, service="wise")
    def download(self, spreadsheet, gid=0, format="csv"):
        url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
        headers = {
            "Authorization": "GoogleLogin auth=" + self.get_auth_token(),
            "GData-Version": "3.0"
        }
        req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
        return urllib2.urlopen(req)
if __name__ == "__main__":
    import getpass
    import csv
    email = "" # (your email here)
    password = getpass.getpass()
    spreadsheet_id = "" # (spreadsheet id here)
    # Create client and spreadsheet objects
    gs = Client(email, password)
    ss = Spreadsheet(spreadsheet_id)
    # Request a file-like object containing the spreadsheet's contents
    csv_file = gs.download(ss)
    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)
You might try using the AuthSub method described in the Exporting Spreadsheets section of the documentation.
Get a separate login token for the spreadsheets service and substitute it for the export. Adding this to the get_spreadsheet code worked for me:
import gdata.spreadsheet.service
def get_spreadsheet(key, gid=0):
    # ...
    spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
    spreadsheets_client.email = gd_client.email
    spreadsheets_client.password = gd_client.password
    spreadsheets_client.source = "My Fancy Spreadsheet Downloader"
    spreadsheets_client.ProgrammaticLogin()
    # ...
    entry = gd_client.GetDocumentListEntry(uri)
    docs_auth_token = gd_client.GetClientLoginToken()
    gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
    gd_client.Export(entry, file_path)
    gd_client.SetClientLoginToken(docs_auth_token) # reset the DocList auth token
Notice I also used Export, as Download seems to give only PDF files.
(Jul 2016) All other answers are pretty much outdated or will be, either because they use GData ("Google Data") Protocol, ClientLogin, or AuthSub, all of which have been deprecated. The same is true for all code or libraries that use the Google Sheets API v3 or older.
Modern Google API access occurs using API keys (for accessing public data), OAuth2 client IDs (for accessing data owned by users), or service accounts (for accessing data owned by applications/in the cloud) primarily with the Google Cloud client libraries for GCP APIs and Google APIs Client Libraries for non-GCP APIs. For this task, it would be the latter for Python.
To make it happen your code needs authorized access to the Google Drive API, perhaps to query for specific Sheets to download, and then to perform the actual export(s). Since this is likely a common operation, I wrote a blogpost sharing a code snippet that does this for you. If you wish to pursue this even more, I've got another pair of posts along with a video that outlines how to upload files to and download files from Google Drive.
Note that there is also a Google Sheets API v4, but it's primarily for spreadsheet-oriented operations, i.e., inserting data, reading spreadsheet rows, cell formatting, creating charts, adding pivot tables, etc., not for file-based requests like exporting, where the Drive API is the correct one to use.
I wrote a blog post that demos exporting a Google Sheet as CSV from Drive. The core part of the script:
# setup (creds is the authorized credentials object from the auth boilerplate, not shown)
import os
from googleapiclient import discovery
from httplib2 import Http
FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))
# query for file to export
files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE), orderBy='modifiedTime desc,name').execute().get('files', [])
# export 1st match (if found)
if files:
    fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
    if data:
        with open(fn, 'wb') as f:
            f.write(data)
        print('DONE')
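The query-building and filename logic in that script can be pulled into small pure functions. A sketch with hypothetical helper names; it uses the single-quoted string form with \' escaping that Drive's query syntax documents, so a sheet name containing an apostrophe doesn't break the query:

```python
import os


def drive_query(name, mime_type):
    """Build a Drive v3 files.list `q` string for an exact name and MIME type.

    Drive query values are single-quoted; a single quote inside a value is
    escaped with a backslash.
    """
    def quote(value):
        return "'" + value.replace("'", "\\'") + "'"

    return "name = %s and mimeType = %s" % (quote(name), quote(mime_type))


def export_filename(drive_name, ext="csv"):
    """Local filename derived like the script above: spaces become underscores."""
    base = os.path.splitext(drive_name.replace(" ", "_"))[0]
    return "%s.%s" % (base, ext)
```

You would then pass q=drive_query(FILENAME, SRC_MIMETYPE) to DRIVE.files().list().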
To learn more about using Google Sheets with Python, see my answer for a similar question. You can also download a Sheet in XLSX and other formats supported by Drive.
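Choosing another format is just a matter of passing a different mimeType to files().export(). A small lookup table as a sketch, assuming the standard Drive export MIME types for Sheets (the list is not exhaustive, and per Drive's export docs, CSV and TSV exports contain only the first worksheet):

```python
# Export MIME types Drive supports for Google Sheets files (partial list).
SHEETS_EXPORT_MIMETYPES = {
    "csv": "text/csv",  # first worksheet only
    "tsv": "text/tab-separated-values",  # first worksheet only
    "xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "pdf": "application/pdf",
}


def sheets_export_mimetype(fmt):
    """Map a short format name to the mimeType argument for files().export()."""
    try:
        return SHEETS_EXPORT_MIMETYPES[fmt.lower()]
    except KeyError:
        raise ValueError("unsupported export format: %r" % fmt) from None
```

If you need every worksheet in one file, XLSX is the format that keeps them all together.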
If you're completely new to Google APIs, then you need to take a further step back and review these videos first:
How to use Google APIs & create API projects -- the UI has changed but the concepts are still the same
Walkthrough of authorization boilerplate code (Python) -- you can use any supported language to access Google APIs; if you don't do Python, use it as pseudocode to help get you started
Listing your files in Google Drive and code deep dive post
If you already have experience with Google Workspace (formerly G Suite, Google Apps, Google "Docs") APIs and want to see more videos on using both APIs:
Sheets API video library
Drive API video library
Google Workspace (G Suite) Dev Show video series I produced
This no longer works as of gdata 2.0.14:
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
Instead, you have to do:
gd_client.SetClientLoginToken(gdata.gauth.ClientLoginToken(spreadsheets_client.GetClientLoginToken()))
I wrote pygsheets as an alternative to gspread, but using the Google Sheets API v4. It has an export method for exporting a spreadsheet.
import pygsheets
gc = pygsheets.authorize()
# Open spreadsheet and then worksheet
sh = gc.open('my new ssheet')
wks = sh.sheet1
# export as CSV
wks.export(pygsheets.ExportType.CSV)
The following code works in my case (Ubuntu 10.04, Python 2.6.5, gdata 2.0.14):
import gdata.docs.service
import gdata.spreadsheet.service
gd_client = gdata.docs.service.DocsService()
gd_client.ClientLogin(email,password)
spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
spreadsheets_client.ClientLogin(email,password)
#...
file_path = file_path.strip()+".xls"
docs_token = gd_client.auth_token
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
gd_client.Export(entry, file_path)
gd_client.auth_token = docs_token
I've simplified @Cameron's answer even further by removing the unnecessary object orientation. This makes the code smaller and easier to understand. I also changed the URL, which might work better.
#!/usr/bin/python
import re, urllib, urllib2
def get_auth_token(email, password):
    url = "https://www.google.com/accounts/ClientLogin"
    params = {
        "Email": email, "Passwd": password,
        "service": 'wise',
        "accountType": "HOSTED_OR_GOOGLE",
        "source": 'Client'
    }
    req = urllib2.Request(url, urllib.urlencode(params))
    return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]
def download(spreadsheet, worksheet, email, password, format="csv"):
    # gid must be a query parameter: a "#gid=" fragment is never sent to the server
    url_format = 'https://docs.google.com/spreadsheets/d/%s/export?exportFormat=%s&gid=%s'
    headers = {
        "Authorization": "GoogleLogin auth=" + get_auth_token(email, password),
        "GData-Version": "3.0"
    }
    req = urllib2.Request(url_format % (spreadsheet, format, worksheet), headers=headers)
    return urllib2.urlopen(req)
if __name__ == "__main__":
    import getpass
    import csv
    spreadsheet_id = "" # (spreadsheet id here)
    worksheet_id = '' # (gid here)
    email = "" # (your email here)
    password = getpass.getpass()
    # Request a file-like object containing the spreadsheet's contents
    csv_file = download(spreadsheet_id, worksheet_id, email, password)
    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)
I'm using this on a sheet that is set to be publicly readable:
curl 'https://docs.google.com/spreadsheets/d/1-lqLuYJyHAKix-T8NR8wV8ZUUbVOJrZTysccid2-ycs/gviz/tq?tqx=out:csv'
So if you can work with public sheets, all you need is the Python equivalent of that curl request.
If you have a sheet with some tabs you don't want to reveal, create a new sheet, and import the ranges you want to publish into tabs on it.
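A standard-library version of the curl call, as a sketch: the URL pattern is the one from the curl command above, the function names are hypothetical, it only works for sheets readable without signing in, and it assumes the response is UTF-8:

```python
import csv
import io
import urllib.request


def gviz_csv_url(spreadsheet_key):
    """CSV endpoint used in the curl command (public sheets only)."""
    return "https://docs.google.com/spreadsheets/d/%s/gviz/tq?tqx=out:csv" % spreadsheet_key


def parse_csv(text):
    """Turn CSV text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text, newline="")))


def fetch_public_sheet(spreadsheet_key, timeout=30):
    """Download the sheet as CSV and parse it into rows."""
    with urllib.request.urlopen(gviz_csv_url(spreadsheet_key), timeout=timeout) as resp:
        return parse_csv(resp.read().decode("utf-8"))
```

Keeping the parsing separate from the download makes it easy to check the CSV handling (quoted values, embedded commas) without hitting the network.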
Downloading a spreadsheet from Google Sheets is pretty simple using the gsheets library.
You can follow the detailed documentation at
https://pypi.org/project/gsheets/
or follow the steps below. I recommend reading through the documentation for better coverage.
pip install gsheets
Log in to the Google Developers Console with the Google account whose spreadsheets you want to access. Create (or select) a project and enable the Drive API and Sheets API (under Google Apps APIs).
Go to the Credentials for your project and create New credentials > OAuth client ID > of type Other. In the list of your OAuth 2.0 client IDs click Download JSON for the Client ID you just created. Save the file as client_secrets.json in your home directory (user directory).
Use the following code snippet.
from gsheets import Sheets
sheets = Sheets.from_files('~/client_secrets.json')
print(sheets) # check that the connection is authenticated
s = sheets.get("{SPREADSHEET_URL}")
print(s) # check that your file is accessible
s.sheets[1].to_csv('Spam.csv', encoding='utf-8', dialect='excel') # downloads the second worksheet as CSV
This isn't a complete answer, but Andreas Kahler wrote up an interesting CMS solution using Google Docs + Google App Engine + Python. Not having any experience in the area, I can't tell exactly which portion of the code may be of use to you, but check it out. I know it interfaces with a Google Docs account and works with files, so I have a feeling you'll recognize what's going on. It should at least point you in the right direction.
Google AppEngine + Google Docs + Some Python = Simple CMS
Gspread is indeed a big improvement over GoogleCL and Gdata (both of which I've used and thankfully phased out in favor of Gspread). I think that this code is even quicker than the earlier answer to get the contents of the sheet:
import gspread
username = 'sdfsdfsds@gmail.com'
password = 'sdfsdfsadfsdw'
sheetname = "Sheety Sheet"
client = gspread.login(username, password)  # password login was removed in later gspread releases
spreadsheet = client.open(sheetname)
worksheet = spreadsheet.sheet1
contents = []
for rows in worksheet.get_all_values():
    contents.append(rows)
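Since get_all_values() already returns a list of row lists, the loop above only makes a copy. If you want rows keyed by column header instead, gspread's Worksheet.get_all_records() does that; a minimal hand-rolled sketch of the same idea for any list of rows (rows_to_records is a hypothetical helper, not gspread's implementation):

```python
def rows_to_records(rows):
    """Turn get_all_values()-style rows into dicts keyed by the header row."""
    if not rows:
        return []
    header, *body = rows
    # Pad short rows so every record has every column.
    return [dict(zip(header, row + [""] * (len(header) - len(row)))) for row in body]
```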
(Mar 2019, Python 3) My data is usually not sensitive, and I usually use a table format similar to CSV.
In that case, one can simply publish the sheet to the web and then use it as a CSV file on a server.
(One publishes it using File -> Publish to the web ... -> Sheet 1 -> Comma separated values (.csv) -> Publish).
import csv
import io
import requests
url = "https://docs.google.com/spreadsheets/d/e/<GOOGLE_ID>/pub?gid=0&single=true&output=csv" # you can get the whole link in the 'Publish to the web' dialog
r = requests.get(url)
r.encoding = 'utf-8'
csvio = io.StringIO(r.text, newline="")
data = []
for row in csv.DictReader(csvio):
    data.append(row)