Convert shell script to Python - python

aws ec2 describe-snapshots --owner-ids $AWS_ACCOUNT_ID --query
"Snapshots[?(StartTime<='$dtt')].[SnapshotId]" --output text | tr '\t'
'\n' | sort
I have this shell script which I want to convert to python.
I tried looking at the boto3 documentation and came up with this
client = boto3.client('ec2')
client.describe_snapshots(OwnerIds = [os.environ['AWS_ACCOUNT_ID']], )
But I can't figure out how to change that --query tag in python.
I couldn't find it in the documentation.
What am I missing here?

You should ignore the --query portion and everything after it, and process that within Python instead.
First, store the result of the call in a variable:
ec2_client = boto3.client('ec2')
response = ec2_client.describe_snapshots(OwnerIds = ['self'])
It will return something like:
{
'NextToken': '',
'Snapshots': [
{
'Description': 'This is my snapshot.',
'OwnerId': '012345678910',
'Progress': '100%',
'SnapshotId': 'snap-1234567890abcdef0',
'StartTime': datetime(2014, 2, 28, 21, 28, 32, 4, 59, 0),
'State': 'completed',
'VolumeId': 'vol-049df61146c4d7901',
'VolumeSize': 8,
},
],
'ResponseMetadata': {
'...': '...',
},
}
Therefore, you can use response['Snapshots'] to extract your desired results, for example:
for snapshot in response['Snapshots']:
if snapshot['StartTime'] < datetime(2022, 6, 1):
print(snapshot['SnapshotId'])
It's really all Python at that point.

Related

AWS Config Python Unicode Mess

I am running into issue with trying to pull out usable items from this output. I am just trying to pull a single value from this string of Unicode and it has been super fun.
my print(response) returns this: FYI this is way longer than this little snippet.
{u'configurationItems': [{u'configurationItemCaptureTime': datetime.datetime(2020, 6, 4, 21, 56, 31, 134000, tzinfo=tzlocal()), u'resourceCreationTime': datetime.datetime(2020, 5, 22, 16, 32, 55, 162000, tzinfo=tzlocal()), u'availabilityZone': u'Not Applicable', u'awsRegion': u'us-east-1', u'tags': {u'brassmonkeynew': u'tomtagnew'}, u'resourceType': u'AWS::DynamoDB::Table', u'resourceId': u'tj-test2', u'configurationStateId': u'1591307791134', u'relatedEvents': [], u'relationships': [], u'arn': u'arn:aws:dynamodb:us-east-1:896911201517:table/tj-test2', u'version': u'1.3', u'configurationItemMD5Hash': u'', u'supplementaryConfiguration': {u'ContinuousBackupsDescription': u'{"continuousBackupsStatus":"ENABLED","pointInTimeRecoveryDescription":{"pointInTimeRecoveryStatus":"DISABLED"}}', u'Tags': u'[{"key":"brassmonkeynew","value":"tomtagnew"}]'}, u'resourceName': u'tj-test2', u'configuration': u'{"attributeDefinitions":[{"attributeName":"tj-test2","attributeType":"S"}],"tableName":"tj-test2","keySchema":[{"attributeName":"tj-test2","keyType":"HASH"}],"tableStatus":"ACTIVE","creationDateTime":1590165175162,"provisionedThroughput":{"numberOfDecreasesToday":0,"readCapacityUnits":5,"writeCapacityUnits":5},"tableArn":"arn:aws:dynamodb:us-east-1:896911201517:table/tj-test2","tableId":"816956d7-95d1-4d31-8d18-f11b18de4643"}', u'configurationItemStatus': u'OK', u'accountId': u'896911201517'}, {u'configurationItemCaptureTime': datetime.datetime(2020, 6, 1, 16, 27, 21, 316000, tzinfo=tzlocal()), u'resourceCreationTime': datetime.datetime(2020, 5, 22, 16, 32, 55, 162000, tzinfo=tzlocal()), u'availabilityZone': u'Not Applicable', u'awsRegion': u'us-east-1', u'tags': {u'brassmonkeynew': u'tomtagnew', u'backup-schedule': u'daily'}, u'resourceType': u'AWS::DynamoDB::Table', u'resourceId': u'tj-test2', u'configurationStateId': u'1591028841316', u'relatedEvents': [], u'relationships': [], u'arn': u'arn:aws:dynamodb:us-east-1:896911201517:table/tj-test2', u'version': u'1.3', u'configurationItemMD5Hash': u'', u'supplementaryConfiguration': {u'ContinuousBackupsDescription': u'{"continuousBackupsStatus":"ENABLED","pointInTimeRecoveryDescription":{"pointInTimeRecoveryStatus":"DISABLED"}}', u'Tags': u'[{"key":"brassmonkeynew","value":"tomtagnew"},{"key":"backup-schedule","value":"daily"}]'}, u'resourceName': u'tj-test2', u'configuration': u'{"attributeDefinitions":[{"attributeName":"tj-test2","attributeType":"S"}],"tableName":"tj-test2","keySchema":[{"attributeName":"tj-
and so on. I have tried a few different ways of getting this info but every time I get a key error:
I also tried converting this into JSON and but since i have Date/time at the top it gives me this error:
“TypeError: [] is not JSON serializable
Failed attempts:
# print(response[0]["tableArn"])
print(response2)
print(response2['tableArn'])
print(response2.arn)
print(response2['configurationItems'][0]['tableArn'])
print(response2['configurationItems']['tableArn'])
print(response.configurationItems[0])
arn = response.configurationItems[0].arn
def lambda_handler(event, context):
# print("Received event: " + json.dumps(event, indent=2))
message = event['Records'][0]['Sns']['Message']
print("From SNS: " + message)
response = client.get_resource_config_history(
resourceType='AWS::DynamoDB::Table',
resourceId = message
)
response2 = dict(response)
print(response)
return message
Here's some Python3 code that shows how to access the elements:
import boto3
import json
import pprint
config_client = boto3.client('config')
response = config_client.get_resource_config_history(
resourceType='AWS::DynamoDB::Table',
resourceId = 'stack-table'
)
for item in response['configurationItems']:
configuration = item['configuration'] # Returns a JSON string
config = json.loads(configuration) # Convert to Python object
pprint.pprint(config) # Show what's in it
print(config['tableArn']) # Access elements in object
The trick is that the configuration field contains a JSON string that needs to be converted into a Python object for easy access.

How do you add KeyManager to a kms key mocked using moto

I want to create a key that's managed by AWS. So far this is what I have
#mock_kms
def test_mocking_getting_keys(self):
session = boto3.Session(profile_name=profile)
client = session.client('kms', 'us-east-2')
key = client.create_key(
Policy='string',
Description='string',
KeyUsage='SIGN_VERIFY',
CustomerMasterKeySpec='RSA_2048',
Origin='AWS_KMS',
CustomKeyStoreId='string',
BypassPolicyLockoutSafetyCheck=True,
Tags=[
{
'TagKey': 'string',
'TagValue': 'string'
},
]
)
print(key)
But the key doesn't seem to have KeyManager field:
{'KeyMetadata': {'AWSAccountId': '012345678912', 'KeyId': '7fc3e676-0d1c-4526-9161-41b27a776033', 'Arn': 'arn:aws:kms:us-east-2:012345678912:key/7fc3e676-0d1c-4526-9161-41b27a776033', 'CreationDate': datetime.datetime(2020, 1, 3, 13, 31, 17, tzinfo=tzutc()), 'Enabled': True, 'Description': 'string', 'KeyUsage': 'SIGN_VERIFY', 'KeyState': 'Enabled'}, 'ResponseMetadata': {'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'amazon.com'}, 'RetryAttempts': 0}}
I tried adding KeyManager as a param during create_key call but that didn't work either.
Seems like moto doens't return the KeyManager field. Is there a way to mock that return value specifically but not change the behavior of the dictionary.get method for the rest of the params?
i.e.
key['KeyMetadata']['AWSAccountId'] would return the mocked value and then
key['KeyMetadata']['KeyManager'] would return a another mocked value that I could specify.
The KeyManager attribute is currently not returned by Moto, you can either open an Issue on the Moto GitHub, or add it yourself (either locally, or PR'ed to upstream)

boto3 attached device name

I am trying to get the device name where an EBS volume is attached. I can pull the attach data, but would just like to isolate the device name.
import boto3
from botocore.client import ClientError
ec2 = boto3.resource('ec2')
filters = [{
'Name': 'tag:app',
'Values': [ 'splunklogidx' ],
'Name': 'tag:splunksnap',
'Values': [ 'yes' ]
}]
#volumes = ec2.volumes.all() # list all volumes
volumes = ec2.volumes.filter(Filters=filters) # list all volumes
for v in volumes:
print "v.id = %s" % v.id
#print "v.tags = %s" % v.tags
print "v.attachments = %s" % v.attachments
#for atDev in v.attachments:
#if atDev['Key'] == 'Device':
#print "device = %s" % atDev['Value']
And example of the output is below, and I would like to just be able to pull the /dev/sda1 from that.
Thank You, K.
v.attachments = [
{u'AttachTime': datetime.datetime(2017, 8, 11, 10, 35, 8, tzinfo=tzutc()),
u'InstanceId': 'i-004821f294d20085d',
u'VolumeId': 'vol-069d956bf5076c68d',
u'State': 'attached',
u'DeleteOnTermination': True,
u'Device': '/dev/sda1'}
]

How to find non-retweets in a MongoDB collection of tweets?

I have a collection of about 1.4 million tweets in a MongoDB collection. I want to find all that are NOT retweets, and am using Python. The structure of a document is as follows:
{
'_id': ObjectId('59388c046b0c1901172555b9'),
'coordinates': None,
'created_at': datetime.datetime(2016, 8, 18, 17, 17, 12),
'geo': None,
'is_quote': False,
'lang': 'en',
'text': b'Adam Cole Praises Kevin Owens + A Preview For Next Week\xe2\x80\x99s',
'tw_id': 766323071976247296,
'user_id': 2231233110,
'user_lang': 'en',
'user_loc': 'main; #Kan1shk3',
'user_name': 'sheezy0',
'user_timezone': 'Chennai'
}
I can write a query that works to find the particular tweet from above:
twitter_mongo_collection.find_one({
'text': b'Adam Cole Praises Kevin Owens + A Preview For Next Week\xe2\x80\x99s'
})
But when I try to find retweets, my code doesn't work, for example I try to find any tweets that start like this:
'text': b'RT some tweet'
Using this query:
find_one( {'text': {'$regex': "/^RT/" } } )
It doesn't return an error, but it doesn't find anything. I suspect it has something to do with that 'b' at the beginning before the text starts. I know I also need to put '$not:' in there somewhere but am not sure where.
Thanks!
It looks like your regex search is trying to match the string
b'RT'
but you want to match strings like
b'RT some text afterwards'
try using this regex instead
find_one( {'text': {'$regex': "/^RT.*/" } } )
I had to decode the 'text' field that was encoded as binary. Then I was able to use
twitter_mongo_collection.find_one( { {'text': { '$not': re.compile("^RT.*") } } )
to find all the documents that did not start with "RT".

How to improve performance of pymongo queries

I inherited an old Mongo database. Let's focus on the following two collections (removed most of their content for better readability):
Collection user
db.user.find_one({"email": "user#host.com"})
{'lastUpdate': datetime.datetime(2016, 9, 2, 11, 40, 13, 160000),
'creationTime': datetime.datetime(2016, 6, 23, 7, 19, 10, 6000),
'_id': ObjectId('576b8d6ee4b0a37270b742c7'),
'email': 'user#host.com' }
Collections entry (one user to many entries):
db.entry.find_one({"userId": _id})
{'date_entered': datetime.datetime(2015, 2, 7, 0, 0),
'creationTime': datetime.datetime(2015, 2, 8, 14, 41, 50, 701000),
'lastUpdate': datetime.datetime(2015, 2, 9, 3, 28, 2, 115000),
'_id': ObjectId('54d775aee4b035e584287a42'),
'userId': '576b8d6ee4b0a37270b742c7',
'data': 'test'}
As you can see, there is no DBRef between the two.
What I would like to do is to count the total number of entries, and the number of entries updated after a given date.
To do this I used Python's pymongo library. The code below gets me what I need, but it is painfully slow.
from pymongo import MongoClient
client = MongoClient('mongodb://foobar/')
db = client.userdata
# First I need to fetch all user ids. Otherwise db cursor will time out after some time.
user_ids = [] # build a list of tuples (email, id)
for user in db.user.find():
user_ids.append( (user['email'], str(user['_id'])) )
date = datetime(2016, 1, 1)
for user_id in user_ids:
email, _id = user_id
t0 = time.time()
query = {"userId": _id}
no_of_all_entries = db.entry.find(query).count()
query = {"userId": _id, "lastUpdate": {"$gte": date}}
no_of_entries_this_year = db.entry.find(query).count()
t1 = time.time()
print("delay ", round(t1 - t0, 2))
print(email, no_of_all_entries, no_of_entries_this_year)
It takes around 0.83 second to run both db.entry.find queries on my laptop, and 0.54 on an AWS server (not the MongoDB server).
Having ~20000 users it takes painful 3 hours to get all the data.
Is that the kind of latency you'd expect to see in Mongo ? What can I do to improve this ? Bear in mind that MongoDB is fairly new to me.
Instead of running two aggregates for all users separately you can just get both aggregates for all users with db.collection.aggregate().
And instead of a (email, userId) tuples we make it a dictionary as it is easier to use to get the corresponding email.
user_emails = {str(user['_id']): user['email'] for user in db.user.find()}
date = datetime(2016, 1, 1)
entry_counts = db.entry.aggregate([
{"$group": {
"_id": "$userId",
"count": {"$sum": 1},
"count_this_year": {
"$sum": {
"$cond": [{"$gte": ["$lastUpdate", date]}, 1, 0]
}
}
}}
])
for entry in entry_counts:
print(user_emails.get(entry['_id']),
entry['count'],
entry['count_this_year'])
I'm pretty sure getting the user's email address into the result could be done but I'm not a mongo expert either.

Categories

Resources