How to access the EMR master private IP address using pure Python / boto

I've searched on this site and google but have not been able to get an answer for this.
I have code running on an EC2 instance that creates and manages EMR clusters using boto.
I can use this framework to get the flow_id (or cluster_id, I'm not sure which is the right name for it); it starts with "j-" and has a fixed number of characters identifying the cluster.
Using the framework I can establish an emr or ec2 connection, but for the life of me I cannot do the following using boto:
aws emr --list-clusters --cluster-id=j-ASDFGHJKL | json '["instances"].[0].["privateipaddress"]
(The above is a little fudged: I cannot remember the exact JSON-filtering command or the arguments it wants, but it was something along those lines on the CLI.)
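For reference, the real CLI equivalent is probably something like the following (the list-instances subcommand with a JMESPath --query; exact flags may vary with CLI version):
aws emr list-instances --cluster-id j-ASDFGHJKL --query 'Instances[0].PrivateIpAddress'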
I've inspected the connections with pprint.pprint() and inspect.getmembers(), including the connection for the specific cluster_id, but I have yet to see this field/attribute, with or without method calls.
I've been up and down the Amazon and boto docs; how do they do it, like
here?
In the following test:
def test_list_instances(self):  # line 317
    ...
    self.assertEqual(response.instances[0].privateipaddress, '10.0.0.60')
    ...
P.S. I've tried this, but Python complains that the "instances" property is not iterable or subscriptable (the var[0] style of access, whose name escapes me), among other things I tried, including inspecting it.
By the way, I can access the public DNS address from here, and many other things, just not the private IP...
Please tell me if I messed up somewhere and where I can find the answer; right now I'm using an ugly workaround based on subprocess!

If you are asking how to get the master IP of an EMR cluster, the commands below will work:
import boto3

list_instance_resp = boto3.client('emr', region_name='us-east-1').list_instances(ClusterId='j-XXXXXXX')
print(list_instance_resp['Instances'][-1]['PrivateIpAddress'])

Check your version of boto using
pip show boto
My guess is you're using version 2.24 or earlier, as that version did not parse instance information; see https://github.com/boto/boto/blob/2.24.0/tests/unit/emr/test_connection.py#L117
compared to
https://github.com/boto/boto/blob/2.25.0/tests/unit/emr/test_connection.py#L313
If you upgrade your version of boto to 2.25 or newer you'll be able to do the following
from boto.emr.connection import EmrConnection

conn = EmrConnection(<your aws key>, <your aws secret key>)
jobid = 'j-XXXXXXXXXXXXXX'  # your job id
response = conn.list_instances(jobid)
for instance in response.instances:
    print(instance.privateipaddress)
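Note that boto 2 is long deprecated; a roughly equivalent sketch with boto3 (the region and job ID here are placeholders) would be:
import boto3

emr = boto3.client('emr', region_name='us-east-1')  # placeholder region
response = emr.list_instances(ClusterId='j-XXXXXXXXXXXXXX')  # placeholder job id
for instance in response['Instances']:
    print(instance['PrivateIpAddress'])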

You just need to query the master instances from the master instance group, using the EMR cluster ID. If you have more than one master, you can parse the boto3 output and take the IP of the first listed master.
Your boto3 execution environment needs permission to describe the EMR cluster and its instance groups. Here is the relevant snippet:
import boto3

def get_emr_master_private_ip(cluster_id, region_name='us-east-1'):
    # fetch only the running MASTER instances of the cluster
    boto_client_emr = boto3.client('emr', region_name=region_name)
    emr_list_instance_rep = boto_client_emr.list_instances(
        ClusterId=cluster_id,
        InstanceGroupTypes=[
            'MASTER',
        ],
        InstanceStates=[
            'RUNNING',
        ]
    )
    return emr_list_instance_rep["Instances"][0]["PrivateIpAddress"]
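Calling the helper above might look like this (cluster ID and region are placeholders):
master_ip = get_emr_master_private_ip('j-XXXXXXXXXXXXXX', region_name='us-east-1')
print(master_ip)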
You can find the full boto3 script and its explanation here: https://scriptcrunch.com/script-retrieve-aws-emr-master-ip/

Related

How can I dynamically observe/change the limits of an Auto Scaling group?

I want to modify the minimum/maximum/desired number of instances of an Auto Scaling group, and check whether any instance from this group is running, all dynamically using the AWS SDK for Python. How can I do it?
I've been unable to find this in the documentation.
I can help by pointing you to where you can find information about using Auto Scaling with the AWS SDK for Python. Refer to the AWS SDK Code Examples Code Library.
This doc should be the reference point when you want to learn how to do tasks using a given AWS SDK.
See:
https://docs.aws.amazon.com/code-library/latest/ug/auto-scaling_example_auto-scaling_Scenario_GroupsAndInstances_section.html
First, verify that your system time is in sync with AWS:
sudo ntpdate pool.ntp.org
Read configuration:
import boto3

client = boto3.client('autoscaling')
response = client.describe_auto_scaling_groups(
    AutoScalingGroupNames=[
        'autoscaling_group_name',
    ]
)
group = response['AutoScalingGroups'][0]
print(group['MinSize'], group['MaxSize'], group['DesiredCapacity'], group['Instances'])
Set desired/min/max:
response = client.update_auto_scaling_group(
    AutoScalingGroupName='autoscaling_group_name',
    MinSize=123,
    MaxSize=123,
    DesiredCapacity=123,
)
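To check whether any instance from the group is actually up, a minimal sketch (the LifecycleState field is part of the describe_auto_scaling_groups response):
response = client.describe_auto_scaling_groups(
    AutoScalingGroupNames=['autoscaling_group_name'],
)
instances = response['AutoScalingGroups'][0]['Instances']
running = [i['InstanceId'] for i in instances if i['LifecycleState'] == 'InService']
print('In-service instances:', running)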

Azure virtual machine operations

I am new to Azure. I am learning the Azure Python SDK and have some doubts.
I am not using any credentials to log in to my Azure account, and yet the code below can still access
the VMs in the subscription. How?
I am trying to get a list of all VMs using list_all(), which is documented at https://learn.microsoft.com/en-us/python/api/azure-mgmt-compute/azure.mgmt.compute.v2018_10_01.operations.virtualmachinesoperations?view=azure-python#list-all-custom-headers-none--raw-false----operation-config-
How can I get the list of VMs, i.e. how do I iterate over the VirtualMachinePaged object returned by list_all()?
When I tried to print the name of a VM using print(client.virtual_machines.get(resource_group_name='GSLab', vm_name='GSLabVM2')), I got the error "Resource group 'GSLab' could not be found".
I checked and am sure the resource group name is 'GSLab', so why am I getting this error?
Here is my code. Thank you, and please suggest other sources for a better understanding of these concepts if possible.
from azure.common.client_factory import get_client_from_auth_file
from azure.mgmt.compute import ComputeManagementClient

client = get_client_from_auth_file(ComputeManagementClient)
# print(client)
vmlist = client.virtual_machines.list_all()
print(vmlist)
for vm in vmlist:
    print(vm.name)
print(client.virtual_machines.get(resource_group_name='GSLab', vm_name='GSLabVM2'))
Q1: You get the credentials from the authentication file you set up (get_client_from_auth_file reads it, by default from the path in the AZURE_AUTH_LOCATION environment variable); the service principal is in it.
Q2: Just delete the print(vmlist) line; the for loop over the paged object already yields each VM, and everything else is OK.
Q3:
The code
client.virtual_machines.get(resource_group_name='GSLab', vm_name='GSLabVM2')
fails with "Resource group 'GSLab' could not be found" when that group does not exist in the subscription your credentials point at.
So you need to check whether the resource group 'GSLab' really exists in the subscription set in the authentication file.
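A quick way to check is to list the resource groups the credentials can actually see. A sketch, assuming the same auth file; ResourceManagementClient comes from the azure-mgmt-resource package:
from azure.common.client_factory import get_client_from_auth_file
from azure.mgmt.resource import ResourceManagementClient

res_client = get_client_from_auth_file(ResourceManagementClient)
for rg in res_client.resource_groups.list():
    print(rg.name)  # 'GSLab' should appear here if it is in this subscription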
vmlist = client.virtual_machines.list_all()
for vm in vmlist:
    print(vm.name)
This code is correct, and so is this one:
client.virtual_machines.get(resource_group_name='GSLab', vm_name='GSLabVM2')
If they both return nothing, you are authenticated against the wrong subscription and need to authenticate to the proper one.
A simple way to check that you get some output:
next(vmlist).name

How to launch an AWS Linux instance using the Python API?

1) Launch an AWS Linux micro instance using the AWS Python API (include authentication to AWS)
2) Tag the instance with: customer=ACME, environment=PROD
3) Assign a security group to the instance
To program in Python on AWS, you should use the boto3 library.
You will need to do the following:
supply credentials to the library (link)
create an EC2 client (link)
use the EC2 client to launch EC2 instances using run_instances (link)
You can specify both tags and security groups in the run_instances call; a minimal sketch follows. Additionally, the boto3 documentation provides some Amazon EC2 examples that will help.
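A minimal sketch, assuming credentials are already configured (for example via environment variables or ~/.aws/credentials); the region, AMI ID, key pair, and security group ID are placeholders:
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')  # placeholder region
response = ec2.run_instances(
    ImageId='ami-xxxxxxxx',            # placeholder: an Amazon Linux AMI in your region
    InstanceType='t2.micro',
    KeyName='my-key-pair',             # placeholder key pair name
    SecurityGroupIds=['sg-xxxxxxxx'],  # placeholder security group
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        'ResourceType': 'instance',
        'Tags': [
            {'Key': 'customer', 'Value': 'ACME'},
            {'Key': 'environment', 'Value': 'PROD'},
        ],
    }],
)
print(response['Instances'][0]['InstanceId'])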
You may also want to look at this project:
https://github.com/nchammas/flintrock
It is a Hadoop and Apache Spark clustering project, but it can inspire you.
It actually has many of the features you want, like security groups and filtering by tag name. Just look around the code.

Discovering peer instances in Azure Virtual Machine Scale Set

Problem: Given N instances launched as part of VMSS, I would like my application code on each azure instance to discover the IP address of the other peer instances. How do I do this?
The overall intent is to cluster the instances so, as to provide active passive HA or keep the configuration in sync.
There seems to be some support for REST-API-based querying: https://learn.microsoft.com/en-us/rest/api/virtualmachinescalesets/
I would like to know any other ways to do it, e.g. via the Python SDK or the instance metadata URL.
The REST API you mentioned has a Python SDK, the azure-mgmt-compute client:
https://learn.microsoft.com/python/api/azure.mgmt.compute.compute.computemanagementclient
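As a sketch of how the peer private IPs could be listed through the SDK (hedged: this uses the azure-mgmt-network package alongside it, and the credentials, resource group, and scale set name are placeholders):
from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.network import NetworkManagementClient

# placeholders: supply your own service principal and subscription
credentials = ServicePrincipalCredentials(client_id='...', secret='...', tenant='...')
network_client = NetworkManagementClient(credentials, 'subscription-id')

# list the private IP of every NIC attached to the scale set
nics = network_client.network_interfaces.list_virtual_machine_scale_set_network_interfaces(
    'my-resource-group', 'my-vmss')
for nic in nics:
    for ip_config in nic.ip_configurations:
        print(ip_config.private_ip_address)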
One way to do this would be to use instance metadata. Right now instance metadata only shows information about the VM it's running on, e.g.
curl -H Metadata:true "http://169.254.169.254/metadata/instance/compute?api-version=2017-03-01"
{"compute":
{"location":"westcentralus","name":"imdsvmss_0","offer":"UbuntuServer","osType":"Linux","platformFaultDomain":"0","platformUpdateDomain":"0",
"publisher":"Canonical","sku":"16.04-LTS","version":"16.04.201703300","vmId":"e850e4fa-0fcf-423b-9aed-6095228c0bfc","vmSize":"Standard_D1_V2"},
"network":{"interface":[{"ipv4":{"ipaddress":[{"ipaddress":"10.0.0.4","publicip":"52.161.25.104"}],"subnet":[{"address":"10.0.0.0","dnsservers":[],"prefix":"24"}]},
"ipv6":{"ipaddress":[]},"mac":"000D3AF8BECE"}]}}
You could do something like have each VM send the info to a listener on VM#0 or to an external service, or you could combine this with Azure Files and have each VM output to a common share. There's an Azure template proof of concept here which outputs information from each VM to an Azure File share: https://github.com/Azure/azure-quickstart-templates/tree/master/201-vmss-azure-files-linux - every VM has a mountpoint which contains info written by every VM.

Launch Openstack Instances using python-boto

I am trying to launch instances on an OpenStack setup that has multiple networks configured, using python-boto.
But I get the following error:
EC2ResponseError: EC2ResponseError: 400 Bad Request
<?xml version="1.0"?>
<Response><Errors><Error><Code>NetworkAmbiguous</Code><Message>Multiple possible networks found, use a Network ID to be more specific.</Message></Error></Errors><RequestID>req-28b5a4e8-3838-4111-95db-337c5048716d</RequestID></Response>
My code looks like this:
from boto import ec2

ostack = ec2.connection.EC2Connection(
    ec2_access_key, ec2_secret_key, is_secure=False, port=8773, region='nova',
    path='/services/Cloud'
)
ostack.run_instances('ami-xxxxx', key_name='BotoTest')
The above works fine when only a single network is configured in OpenStack.
Note: run_instances doesn't have a keyword argument for a network ID.
Where did I make a mistake, how can I fix it, or is this a bug in python-boto?
Thanks in advance.
I believe this isn't a bug in boto, which was built to communicate with the AWS API. While most of the EC2 AWS functionality works well with the EC2 OpenStack API, some features are not implemented and are answered with an HTTP 500 or 400 error.
AWS uses a VPC (Virtual Private Cloud) as the network and an Availability Zone as the subnet. Both have a default setting, which is used if there is no further specification when creating a new instance. But in OpenStack I can't see a way to mark a network and a subnet as default.
In my attempts, neither private_ip_address nor subnet_id worked to specify a network/subnet in run_instances() when OpenStack has more than one.
Edit: if you only have one network/subnet, the following code works fine with boto at trystack.org:
import boto

conn = boto.connect_ec2_endpoint(
    "http://8.21.28.222:8773/services/Cloud",
    aws_access_key_id='...',
    aws_secret_access_key='...',
)
new_instance = conn.run_instances(
    "ami-00000020", key_name="trystack",
    security_groups=["default"], instance_type="m1.small",
)
Have you tried this?
from boto import ec2

ostack = ec2.connection.EC2Connection(
    ec2_access_key, ec2_secret_key, is_secure=False, port=8773, region='nova',
    path='/services/Cloud', debug=1
)
then
ostack.run_instances('ami-xxxxx', subnet_id='your network id', key_name='BotoTest')
Amazon uses subnet_id for VPC networks; are you running a VPC?
