EC2 instances are not starting using ec2.start_instances()
import boto3
ec2 = boto3.client('ec2')
# Define the tag key and values
tag_key = "Environment"
tag_values = ["MT1", "UAT2"]
exclude_tag_key = "Exclude"
exclude_tag_value = "Yes"
def lambda_handler(event, context):
    # Get all the stopped instances with either of the specified tag values
    response = ec2.describe_instances(
        Filters=[
            {
                'Name': 'tag:' + tag_key,
                'Values': tag_values
            },
            {
                'Name': 'instance-state-name',
                'Values': ['stopped']
            },
            # {
            #     'Name': 'tag:' + exclude_tag_key,
            #     'Values': [exclude_tag_value],
            #     'Operator': 'NOT'
            # }
        ]
    )
    # Extract the instance IDs from the response
    instance_ids = [instance['InstanceId'] for reservation in response['Reservations'] for instance in reservation['Instances']]
    # Start the instances
We are able to get the filtered instances in "instance_ids", but none of the stopped instances actually starts.
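A minimal sketch of the start step, assuming the collected IDs are handed straight to the EC2 client. The helper name `start_stopped_instances` and the injected `ec2_client` parameter are illustrative and not part of the original post. Note that the boto3 client method is `start_instances` (plural); a call spelled `start_instance` would fail with an AttributeError instead of starting anything.

```python
def start_stopped_instances(ec2_client, instance_ids):
    """Start the given instances and return each ID with its reported state.

    ec2_client is expected to be a boto3 EC2 client (boto3.client('ec2')).
    The helper name and signature are illustrative, not from the original post.
    """
    if not instance_ids:
        # Nothing matched the filters, so avoid sending an empty InstanceIds list
        return {}
    response = ec2_client.start_instances(InstanceIds=instance_ids)
    # Each entry reports the state of the instance right after the request
    return {
        change["InstanceId"]: change["CurrentState"]["Name"]
        for change in response["StartingInstances"]
    }
```

Inside the handler this could be called as `print(start_stopped_instances(ec2, instance_ids))`, so the returned states show up in the function's logs. The function's execution role would also need permission for the ec2:StartInstances action.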
I'm looking to find instances whose platform is not "Windows" and tag them with specific tags.
For now I have this script, which tags the instances whose platform is "Windows":
import boto3
ec2 = boto3.client('ec2')
response = ec2.describe_instances(Filters=[{'Name' : 'platform', 'Values' : ['windows']}])
for each_res in response['Reservations']:
    for each_inst in each_res['Instances']:
        # Tag each matching (Windows) instance
        ec2.create_tags(
            Resources=[each_inst['InstanceId']],
            Tags = [
                {
                    'Key' : 'test',
                    'Value': 'test01'
                }
            ]
        )
I need help adding a block to this script that adds another tag only to EC2 instances whose platform is NOT "Windows".
Try this; it is working for me. Also, by running create_tags inside the for loop you are executing one API call for each resource, whereas create_tags accepts multiple resources as input.
import boto3
# Initialize an empty list to store non-Windows instance IDs.
list_nonwindows = []
ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.describe_instances()
for each_res in response["Reservations"]:
    for each_inst in each_res["Instances"]:
        # 'Platform' is only present for Windows instances
        if each_inst.get('Platform') is None:
            instance_s = each_inst.get('InstanceId')
            list_nonwindows.append(instance_s)
# A single create_tags call tags every collected instance
if list_nonwindows:
    response = ec2.create_tags(
        Resources=list_nonwindows,
        Tags = [
            {
                'Key' : 'test',
                'Value': 'test01'
            }
        ]
    )
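One caveat with a bare describe_instances() call is that a large account may return its results in several pages. The sketch below, under the assumption that a boto3 EC2 client is passed in, walks every page with the client's `describe_instances` paginator and then tags all collected instances in one request. The helper names `collect_non_windows_ids` and `tag_instances` are illustrative.

```python
def collect_non_windows_ids(ec2_client):
    """Return the IDs of all instances that report no 'Platform' value.

    Uses the describe_instances paginator so that results beyond the
    first page are included. The helper name is illustrative.
    """
    paginator = ec2_client.get_paginator("describe_instances")
    instance_ids = []
    for page in paginator.paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                # 'Platform' is absent for non-Windows instances
                if instance.get("Platform") is None:
                    instance_ids.append(instance["InstanceId"])
    return instance_ids


def tag_instances(ec2_client, instance_ids, key, value):
    """Tag all given instances with a single create_tags call."""
    if instance_ids:
        ec2_client.create_tags(
            Resources=instance_ids,
            Tags=[{"Key": key, "Value": value}],
        )
```

A call such as `tag_instances(ec2, collect_non_windows_ids(ec2), "test", "test01")` would replace the loop above. The EC2 API documents a cap on the number of resource IDs per create_tags request, so very large fleets may need the ID list split into chunks.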
Just remove the filter, iterate over all the instances, and add an if condition on the Platform key inside the loop.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import boto3
ec2 = boto3.client("ec2", region_name="eu-central-1")
response = ec2.describe_instances()
for each_res in response["Reservations"]:
    for each_inst in each_res["Instances"]:
        # 'Platform' is only present for Windows instances
        platform = each_inst.get('Platform')
        instance_id = each_inst.get('InstanceId')
        # Compare case-insensitively: the filter value is written 'windows', the docs excerpt 'Windows'
        if platform is not None and platform.lower() == 'windows':
            ec2.create_tags(
                Resources=[instance_id],
                Tags = [
                    {
                        'Key' : 'test',
                        'Value': 'test01'
                    }
                ]
            )
        else:
            print(f'found non windows intance: {instance_id}')
            ec2.create_tags(
                Resources=[instance_id],
                Tags = [
                    {
                        'Key' : 'nonwindow',
                        'Value': 'nonwindowvalue'
                    }
                ]
            )
As per the API docs:
The value is Windows for Windows instances; otherwise blank.
The code is working correctly; I tested it:
$ python3
found non windows intance: i-0ba1a62801c895
Response structure returned by the describe_instances call:
{
    'Reservations': [
        {
            'Groups': [
                {
                    'GroupName': 'string',
                    'GroupId': 'string'
                },
            ],
            'Instances': [
                {
                    'AmiLaunchIndex': 123,
                    'ImageId': 'string',
                    'InstanceId': 'string',
                    'Platform': 'Windows',
                    'PrivateDnsName': 'string',
                    'PrivateIpAddress': 'string',
                    'ProductCodes': [
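Because the Platform key is only present for Windows instances, and its casing differs between the filter value and the documentation excerpt in this thread, a small helper that tolerates both can make the check less fragile. This is a sketch that operates on the `Reservations` list of a describe_instances response; the names `is_windows` and `split_by_platform` are illustrative.

```python
def is_windows(instance):
    """Return True if a describe_instances entry reports a Windows platform.

    The 'Platform' key is absent for non-Windows instances, and the
    comparison ignores case because the casing differs between sources.
    """
    return (instance.get("Platform") or "").lower() == "windows"


def split_by_platform(reservations):
    """Split instance IDs into (windows_ids, other_ids)."""
    windows_ids, other_ids = [], []
    for reservation in reservations:
        for instance in reservation["Instances"]:
            target = windows_ids if is_windows(instance) else other_ids
            target.append(instance["InstanceId"])
    return windows_ids, other_ids
```

With the two lists in hand, each group can be tagged with one create_tags call instead of one call per instance.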
def get_search(companyId, name):
    resp = client.get_item(
        TableName=TABLE_NAME,  # placeholder: the table name is not shown in the original
        Key={
            'companyId': { 'S': companyId },
            'name': { 'S': name }
        }
    )
    item = resp.get('Item')
    if not item:
        return jsonify({'error': 'Company does not exist'}), 404
    return jsonify({
        'companyId': item.get('companyId').get('S'),
        'name': item.get('name').get('S'),
        'region': item.get('region').get('S')
    })
The response from a DynamoDB resource object doesn't require me to parse the low-level data structure from DynamoDB, but when I use the boto3 client I have to do that. Why is that?
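response = table.scan()  # scan arguments, if any, are not shown in the original
items = response['Items']
import pdb;pdb.set_trace()
if not items:
    return jsonify({'error': 'Company does not exist'}), 404
item = items[0]  # 'Items' is a list of already-deserialized dicts
return jsonify({
    'companyId': item.get('companyId'),
    'name': item.get('name'),
    'region': item.get('region')
})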
In general, the resource API in boto3 is a higher-level abstraction over the underlying client API. It hides some implementation details of the client calls, such as DynamoDB's typed attribute values, but this convenience comes at some performance cost.
You can also use the deserializer that ships with boto3 to turn the values from client.get_item() into plain Python objects.
from boto3.dynamodb.types import TypeDeserializer
def main():
    dynamodb_item = {
        "PK": {
            "S": "key"
        },
        "SK": {
            "S": "value"
        }
    }
    deserializer = TypeDeserializer()
    deserialized = {
        key: deserializer.deserialize(dynamodb_item[key])
        for key in dynamodb_item.keys()
    }
    print(deserialized) # {'PK': 'key', 'SK': 'value'}
if __name__ == "__main__":
    main()
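To make the difference between the two APIs more concrete, here is a hand-written sketch of what such a conversion does with the typed values that the client returns. It is an illustration only, covers just a subset of the type tags, and is not boto3's TypeDeserializer; the function name `from_dynamodb` and the sample item are made up for the example. Numbers are converted to Decimal, which is also what boto3's own deserializer produces.

```python
from decimal import Decimal


def from_dynamodb(value):
    """Convert one typed DynamoDB attribute value into a plain Python value.

    Hand-written illustration of the idea only; it covers a subset of
    type tags and is not boto3's TypeDeserializer.
    """
    # Every attribute value is a one-entry dict: {type_tag: payload}
    (type_tag, payload), = value.items()
    if type_tag == "S":
        return payload
    if type_tag == "N":
        return Decimal(payload)  # numbers travel as strings
    if type_tag == "BOOL":
        return payload
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb(v) for v in payload]
    if type_tag == "M":
        return {k: from_dynamodb(v) for k, v in payload.items()}
    if type_tag == "SS":
        return set(payload)
    raise ValueError(f"unsupported type tag: {type_tag}")


# Sample item in the shape client.get_item() returns under 'Item'
item = {
    "companyId": {"S": "c-1"},
    "name": {"S": "Acme"},
    "employees": {"N": "42"},
    "tags": {"L": [{"S": "a"}, {"BOOL": True}]},
}
plain = {key: from_dynamodb(val) for key, val in item.items()}
print(plain)
```

The resource API performs this kind of conversion on every response, which is why `table.scan()` hands back plain dicts while `client.get_item()` exposes the `{'S': ...}` wrappers.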
I want to run a spark-submit job on an AWS EMR cluster in response to a file upload event on S3. I am using an AWS Lambda function to capture the event, but I have no idea how to submit a spark-submit job to the EMR cluster from the Lambda function.
Most of the answers I found talk about adding a step to the EMR cluster, but I do not know whether an added step can fire "spark-submit --with args".
You can; I had to do the same thing last week!
Using boto3 for Python (other languages would definitely have a similar solution), you can either start a cluster with the step defined, or attach a step to a cluster that is already up.
Defining the cluster with the step
import boto3
def lambda_handler(event, context):
    conn = boto3.client("emr")
    response = conn.run_job_flow(
        # Required settings such as Name, plus the release label and IAM roles,
        # are not shown in the original
        Instances={
            'InstanceGroups': [
                {
                    'Name': 'Master nodes',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'MASTER',
                    'InstanceType': 'm3.xlarge',
                    'InstanceCount': 1,
                },
                {
                    'Name': 'Slave nodes',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'CORE',
                    'InstanceType': 'm3.xlarge',
                    'InstanceCount': 2,
                }
            ],
            'Ec2KeyName': 'key-name',
            'KeepJobFlowAliveWhenNoSteps': False,
            'TerminationProtected': False
        },
        Applications=[{
            'Name': 'Spark'
        }],
        BootstrapActions=[{
            'Name': 'Install',
            'ScriptBootstrapAction': {
                'Path': 's3://path/to/bootstrap.script'
            }
        }],
        Steps=[{
            'Name': 'StepName',
            'ActionOnFailure': 'TERMINATE_CLUSTER',
            'HadoopJarStep': {
                'Jar': 's3n://elasticmapreduce/libs/script-runner/script-runner.jar',
                'Args': [
                    "/usr/bin/spark-submit", "--deploy-mode", "cluster",
                    's3://path/to/code.file', '-i', 'input_arg',
                    '-o', 'output_arg'
                ]
            }
        }],
    )
    # run_job_flow returns a dict; the new cluster's ID is under 'JobFlowId'
    return "Started cluster {}".format(response["JobFlowId"])
Attaching a step to an already running cluster
As per here
import sys
import time
import boto3
def lambda_handler(event, context):
    conn = boto3.client("emr")
    # chooses the first cluster which is Running or Waiting
    # possibly can also choose by name or already have the cluster id
    clusters = conn.list_clusters()
    # choose the correct cluster
    clusters = [c["Id"] for c in clusters["Clusters"]
                if c["Status"]["State"] in ["RUNNING", "WAITING"]]
    if not clusters:
        sys.stderr.write("No valid clusters\n")
        # stop here: there is no cluster to attach the step to
        return "No valid clusters"
    # take the first relevant cluster
    cluster_id = clusters[0]
    # code location on your emr master node
    CODE_DIR = "/home/hadoop/code/"
    # spark configuration example (the script file name is omitted in the original)
    step_args = ["/usr/bin/spark-submit", "--spark-conf", "your-configuration",
                 CODE_DIR + "", '--your-parameters', 'parameters']
    step = {"Name": "what_you_do-" + time.strftime("%Y%m%d-%H:%M"),
            'ActionOnFailure': 'CONTINUE',
            'HadoopJarStep': {
                'Jar': 's3n://elasticmapreduce/libs/script-runner/script-runner.jar',
                'Args': step_args
            }
            }
    action = conn.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return "Added step: %s"%(action)
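Since the question is about reacting to an S3 upload, it may help to see how the uploaded object could be threaded into the step. The sketch below assumes the standard S3 event notification layout (`Records[0].s3.bucket.name` and `Records[0].s3.object.key`) and builds a step definition that uses command-runner.jar, which is available on EMR release 4.x and later, in place of the script-runner JAR shown above. The helper names, the `-i` flag, and the default step name are placeholders for illustration.

```python
from urllib.parse import unquote_plus


def s3_object_uri(event):
    """Build the s3:// URI of the first object named in an S3 event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications
    key = unquote_plus(record["object"]["key"])
    return f"s3://{bucket}/{key}"


def build_spark_step(script_uri, input_uri, name="spark-on-upload"):
    """Return an EMR step definition that runs spark-submit on input_uri.

    script_uri, the '-i' flag and the step name are placeholders for
    illustration; adapt them to the actual job.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar runs a command on the cluster (EMR 4.x and later)
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_uri, "-i", input_uri],
        },
    }
```

Inside the handler, the result could be passed on with `conn.add_job_flow_steps(JobFlowId=cluster_id, Steps=[build_spark_step('s3://path/to/code.file', s3_object_uri(event))])`.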
AWS Lambda function Python code for running a Spark JAR by posting a batch job to the REST endpoint on the cluster (port 8998, the /batches path):
# Note: botocore.vendored.requests is deprecated; packaging the requests
# library with the function is an alternative
from botocore.vendored import requests
import json
def lambda_handler(event, context):
    headers = { "content-type": "application/json" }
    url = 'http://ip-address.ec2.internal:8998/batches'
    payload = {
        'file' : 's3://Bucket/Orchestration/RedshiftJDBC41.jar',
        'className' : 'Main Class Name',
        'args' : [event.get('rootPath')]
    }
    res = requests.post(url, data = json.dumps(payload), headers = headers, verify = False)
    json_data = json.loads(res.text)
    return json_data.get('id')
To follow up on this question:
Filter CloudWatch Logs to extract Instance ID
I think it leaves the question incomplete because it does not say how to access the event object with python.
My goal is to:
read the instance that was triggered by a change in running state
get a tag value associated with the instance
start all other instances that have the same tag
The CloudWatch trigger event is:
{
  "source": [
    "aws.ec2"
  ],
  "detail-type": [
    "EC2 Instance State-change Notification"
  ],
  "detail": {
    "state": [
      "running"
    ]
  }
}
I can see examples like this:
import boto3
ec2 = boto3.resource('ec2')
def lambda_handler(event, context):
    # here I want to get the instance tag value
    # and set the tag filter based on the instance that
    # triggered the event
    filters = [{
            'Name': 'tag:StartGroup',
            'Values': ['startgroup1']
        },
        {
            'Name': 'instance-state-name',
            'Values': ['running']
        }
    ]
    instances = ec2.instances.filter(Filters=filters)
I can see the event object, but I don't see how to drill down into the tag of the instance that had its state changed to running.
What is the object attribute through which I can get a tag from the instance that triggered the event?
I suspect it is something like:
myTag = event.details.instance-id.tags["startgroup1"]
The event data passed to Lambda contains the Instance ID.
You then need to call describe_tags() to retrieve a dictionary of the tags.
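import boto3
client = boto3.client('ec2')
# instance_id: the ID read from the event, e.g. event['detail']['instance-id']
response = client.describe_tags(
    Filters=[{'Name': 'resource-id', 'Values': [instance_id]}]
)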
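Putting the two steps together, a sketch might look like the following. It assumes the event has the EC2 state-change layout with the ID under `detail` and `instance-id`, and that a boto3 EC2 client is passed in; the helper names are illustrative.

```python
def instance_id_from_event(event):
    """Read the instance ID from an EC2 state-change notification event."""
    return event["detail"]["instance-id"]


def get_instance_tags(ec2_client, instance_id):
    """Return the tags of one instance as a plain {key: value} dict."""
    response = ec2_client.describe_tags(
        Filters=[{"Name": "resource-id", "Values": [instance_id]}]
    )
    # describe_tags returns a list of entries with 'Key' and 'Value' fields
    return {tag["Key"]: tag["Value"] for tag in response["Tags"]}
```

In the handler, `get_instance_tags(client, instance_id_from_event(event)).get('StartGroup')` would then give the tag value to filter on, or None when the instance carries no such tag.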
In the detail section of the event, you will find the instance ID. Using the instance ID and the AWS SDK, you can query the tags. The following is a sample event:
{
  "version": "0",
  "id": "ee376907-2647-4179-9203-343cfb3017a4",
  "detail-type": "EC2 Instance State-change Notification",
  "source": "aws.ec2",
  "account": "123456789012",
  "time": "2015-11-11T21:30:34Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:ec2:us-east-1:123456789012:instance/i-abcd1111"
  ],
  "detail": {
    "instance-id": "i-abcd1111",
    "state": "running"
  }
}
This is what I came up with...
Please let me know how it can be done better. Thanks for the help.
# StartMeUp_Instances_byOne
# This lambda script is triggered by a CloudWatch Event, startGroupByInstance.
# Every evening a separate lambda script is launched on a schedule to stop
# all non-essential instances.
# This script will turn on all instances with a LaunchGroup tag that matches
# a single instance which has been changed to the running state.
# To start all instances in a LaunchGroup,
# start one of the instances in the LaunchGroup and wait about 5 minutes.
# Costs to run: approx. $0.02/month
# 150 executions per month * 128 MB Memory * 60000 ms Execution Time
# Problems: talk to chrisj
# ======================================
# test system
# this is what the event object looks like (see below)
# it is configured in the test event object with a specific instance-id
# change that to test a different instance-id with a different LaunchGroup
# { "version": "0",
# "id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
# "detail-type": "EC2 Instance State-change Notification",
# "source": "aws.ec2",
# "account": "999999999999999",
# "time": "2015-11-11T21:30:34Z",
# "region": "us-east-1",
# "resources": [
# "arn:aws:ec2:us-east-1:123456789012:instance/i-abcd1111"
# ],
# "detail": {
# "instance-id": "i-0aad9474", # <---------- chg this
# "state": "running"
# }
# }
# ======================================
import boto3
import logging
import json
ec2 = boto3.resource('ec2')
def get_instance_LaunchGroup(iid):
    # When given an instance ID as str e.g. 'i-1234567',
    # return the instance LaunchGroup.
    ec2 = boto3.resource('ec2')
    ec2instance = ec2.Instance(iid)
    thisTag = ''
    # tags is None when the instance has no tags at all
    for tags in ec2instance.tags or []:
        if tags["Key"] == 'LaunchGroup':
            thisTag = tags["Value"]
    return thisTag
# this is the entry point for the cloudwatch trigger
def lambda_handler(event, context):
    # get the instance id that triggered the event
    thisInstanceID = event['detail']['instance-id']
    print("instance-id: " + thisInstanceID)
    # get the LaunchGroup tag value of the thisInstanceID
    thisLaunchGroup = get_instance_LaunchGroup(thisInstanceID)
    print("LaunchGroup: " + thisLaunchGroup)
    if thisLaunchGroup == '':
        print("No LaunchGroup associated with this InstanceID - ending lambda function")
        return
    # set the filters
    filters = [{
            'Name': 'tag:LaunchGroup',
            'Values': [thisLaunchGroup]
        },
        {
            'Name': 'instance-state-name',
            'Values': ['stopped']
        }
    ]
    # get the instances based on the filter, thisLaunchGroup and stopped
    instances = ec2.instances.filter(Filters=filters)
    # get the stopped instance IDs
    stoppedInstances = [instance.id for instance in instances]
    # make sure there are some instances not already started
    if len(stoppedInstances) > 0:
        startingUp = ec2.instances.filter(InstanceIds=stoppedInstances).start()
    print ("Finished launching all instances for tag: " + thisLaunchGroup)
So, here's how I got the tags in my Python code for my Lambda function.
ec2 = boto3.resource('ec2')
instance = ec2.Instance(instanceId)
# get image_id from instance-id
imageId = instance.image_id
for tags in instance.tags:
    if tags["Key"] == 'Name':
        newName = tags["Value"] + ""
So, I iterate over instance.tags, check for the "Key" that matches my Name tag, and pull out its "Value" to create the FQDN (Fully Qualified Domain Name).
I am trying to add a tag to existing EC2 instances using create_tags.
ec2 = boto3.resource('ec2', region_name=region)
instances = ec2.instances.filter(Filters=[{'Name': 'instance-state-name',
                                           'Values': ['running']}])
for instance in instances:
    ec2.create_tags([], {"TagName": "TagValue"})
This is giving me this error:
TypeError: create_tags() takes exactly 1 argument (3 given)
First, you cannot use boto3.resource("ec2") like that. boto3.resource is a high-level layer that is associated with particular resources, so the filter call in your code already returns the particular instance resources. The create_tags call in the resource documentation looks like this:
# the resource object already identifies the instance or service resource it belongs to
tag = resource.create_tags(
    Tags=[{'Key': 'string', 'Value': 'string'}]
)
So in your code, you just call it directly on each instance in the resource collection:
for instance in instances:
    instance.create_tags(Tags=[{'Key': 'TagName', 'Value': 'TagValue'}])
Next is the tag format; follow the documentation. You got the filter format correct, but not the tag dict for create_tags:
response = client.create_tags(
    Resources=['string'],
    Tags=[{'Key': 'string', 'Value': 'string'}]
)
On the other hand, boto3.client() is a low-level client that requires explicit resource IDs.
import boto3
ec2 = boto3.client("ec2")
reservations = ec2.describe_instances(
    Filters=[{'Name': 'instance-state-name',
              'Values': ['running']}])["Reservations"]
mytags = [{
        "Key" : "TagName",
        "Value" : "TagValue"
    },
    {
        "Key" : "APP",
        "Value" : "webapp"
    },
    {
        "Key" : "Team",
        "Value" : "xteam"
    }]
for reservation in reservations :
    for each_instance in reservation["Instances"]:
        ec2.create_tags(
            Resources = [each_instance["InstanceId"] ],
            Tags= mytags
        )
A reason to use resources is code reuse across different object types; for example, the following wrapper lets you create tags for any resource.
def make_resource_tag(resource, tags_dictionary):
    response = resource.create_tags(
        Tags = tags_dictionary)
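The error in the question comes down to the shape of the Tags argument, so a small converter from a plain dict to the Key/Value list can remove that stumbling block. This is a sketch; `to_tag_list` and `tag_resource` are illustrative names, and `resource` is assumed to be any boto3 resource object that exposes create_tags, such as an ec2.Instance.

```python
def to_tag_list(tags):
    """Convert {'TagName': 'TagValue'} into the Key/Value list create_tags expects."""
    return [{"Key": str(key), "Value": str(value)} for key, value in tags.items()]


def tag_resource(resource, tags):
    """Tag any boto3 resource object that exposes create_tags (e.g. ec2.Instance)."""
    return resource.create_tags(Tags=to_tag_list(tags))
```

For the loop in the question, this would read `tag_resource(instance, {"TagName": "TagValue"})` for each instance in the filtered collection.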