So, I think I'm running up against an issue with out-of-date documentation. According to the documentation, I should be able to use list_schemas() to get a list of schemas defined in the Hive Data Catalog: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.list_schemas
However, this method doesn't seem to exist:
import boto3
glue = boto3.client('glue')
glue.list_schemas()
AttributeError: 'Glue' object has no attribute 'list_schemas'
Other methods (e.g. list_crawlers()) still appear to be present and work just fine. Has this method been moved? Do I need to install some additional boto3 libraries for this to work?
Based on the comments: the issue was caused by an old version of boto3. Upgrading to a newer version solved it.
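One way to check before calling the API: compare the installed boto3 version against the first release you believe includes list_schemas. A minimal sketch (the "1.17.0" minimum below is an assumption for illustration, not taken from the boto3 changelog):

```python
# Minimal sketch: decide whether an installed boto3 is new enough.
# "1.17.0" as the minimum for Glue list_schemas is an assumption here;
# check the boto3 changelog for the real cutoff.
def version_tuple(version: str) -> tuple:
    """Parse a dotted version string like '1.26.3' into (1, 26, 3)."""
    return tuple(int(part) for part in version.split('.'))

def is_at_least(installed: str, required: str) -> bool:
    """True if installed >= required, comparing numerically, not lexically."""
    return version_tuple(installed) >= version_tuple(required)

# At runtime you would compare the real version:
# import boto3
# if not is_at_least(boto3.__version__, "1.17.0"):
#     print("Upgrade boto3: pip install --upgrade boto3")
```

Note that the numeric comparison matters: lexically, "1.9" would sort after "1.17", which is exactly the kind of bug a plain string comparison would introduce.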
You should create a session first and use the session's client() method; then it should work:
import boto3
session = boto3.session.Session()
glue_client = session.client('glue')
schemas = glue_client.list_schemas()
Related
In Java, for instance, we have a class that represents the SageMaker client: AmazonSageMakerClient. But I couldn't find the equivalent for Python.
I was hoping to be able to do something like:
from sagemaker import SageMakerClient
client: SageMakerClient = boto3.client("sagemaker")
I looked into the library code and docs but I couldn't find any references to such class containing the defined methods for that client. In fact, I couldn't find any classes for AWS clients like s3, sqs, etc. Are those hidden somewhere or am I missing something obvious?
In boto3, there are basically two levels of objects available:
A low-level client
Rich resource objects like the ones you are asking about
Take a look at S3, and you will see that in addition to the Client object there are also other rich object types like Bucket.
It would seem that SageMaker doesn't (yet) have this second level of abstraction available.
To be more productive and work with Python classes rather than JSON, use the SageMaker Python SDK rather than the Boto3 clients whenever possible.
With Boto3 you have several SageMaker clients (as @anon correctly said):
SageMaker - Most of SageMaker features
SageMakerRuntime - Invoking endpoints
SageMaker* - Other misc SageMaker features like feature store, edge manager, ...
The boto3-stubs library can help with this.
Install using the instructions for your IDE on the package page, and then install the specific type annotations for SageMaker.
pip install 'boto3-stubs[sagemaker]'
You should be able to see type hints for the client object (type: SageMakerClient).
import boto3
client = boto3.client('sagemaker')
If you need to add hints yourself:
from mypy_boto3_sagemaker import SageMakerClient

def my_func(client: SageMakerClient):
    client.create_algorithm(...)
I am trying to use the AWS Python library boto3 to create a session. I found out we can do that either with
session = boto3.Session(profile_name='profile1')
or
session2 = boto3.session.Session(profile_name='profile2')
I have checked the docs; they say to use boto3.session.Session().
Why do both ways work? What is the conceptual difference between them?
It is just for convenience; they both refer to the same class. What is happening here is that the __init__.py for the python boto3 package includes the following:
from boto3.session import Session
This just allows you to refer to the Session class in your python code as boto3.Session rather than boto3.session.Session.
This article provides more information about this python idiom:
One common thing to do in your __init__.py is to import selected Classes, functions, etc into the package level so they can be conveniently imported from the package.
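The same idiom can be sketched without boto3 at all by building a tiny package by hand. The module and class names below (mypkg, Session) are made up for illustration:

```python
import types

# A stand-in for boto3/session.py: a submodule defining a Session class.
session_mod = types.ModuleType("mypkg.session")

class Session:
    """Placeholder for the real Session class."""

session_mod.Session = Session

# A stand-in for boto3/__init__.py doing `from mypkg.session import Session`.
pkg = types.ModuleType("mypkg")
pkg.session = session_mod
pkg.Session = session_mod.Session  # the package-level re-export

# Both spellings name the exact same class object.
assert pkg.Session is pkg.session.Session
```

This is why boto3.Session and boto3.session.Session behave identically: they are two names bound to one class, not two implementations.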
I am trying to simply connect to google-cloud-storage using these instructions:
https://googleapis.github.io/google-cloud-python/latest/storage/index.html
However, I keep hitting a problem with the storage module: it has no Client attribute.
from google.cloud import storage
# Instantiates a client
storage_client = storage.Client(credentials=creds, project='name')
# The name for the new bucket
bucket_name = 'my-new-bucket'
# Creates the new bucket
bucket = storage_client.create_bucket(bucket_name)
print('Bucket {} created.'.format(bucket.name))
This is a problem I've seen several times, and it happens in other google.cloud modules as well. Most of the time it is related to a broken installation.
Try uninstalling and then reinstalling the google.cloud packages (pip uninstall google-cloud-storage, then pip install google-cloud-storage). If that doesn't help, try using it in a newly created virtual environment (this usually works).
Related GitHub issue with the same solution
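Before reinstalling, it can help to confirm whether the package is importable at all from the interpreter you are actually running. A small sketch (has_module is a hypothetical helper name):

```python
import importlib.util

def has_module(name: str) -> bool:
    """Return True if `name` is importable, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages; a missing parent raises instead
        # of returning None, so treat that as "not installed" too.
        return False

print(has_module("google.cloud.storage"))
```

If this prints False in the environment where your script runs, the AttributeError is almost certainly an installation problem rather than an API change.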
I am trying to tag various objects in AWS using Python. AFAIK it's not possible for some services using boto, so I decided to take a look at boto3. I've got stuck on RDS. Based on the documentation, the add_tags_to_resource method needs a resource ARN, and I don't see a way to get one.
To address the above problem I thought about constructing the ARN on my own. After all, it's not so hard (see the RDS tagging documentation). But there is another problem: in my script I cannot guarantee knowing the account number, so I wonder how I can get the account number to create the ARN myself.
I wonder how can I get account number
Unfortunately, it's not that easy to find, but there are a couple of workarounds:
If you have access to certain API calls, you can fetch a security group or AMI and check its OwnerId.
>>> import boto3
>>> client = boto3.client('ec2')
>>> client.describe_security_groups()['SecurityGroups'][0]['OwnerId']
'1234567890'
This trick will only work if you can guarantee that SG or AMI was created by the account that you are looking for.
OR
Or make an API call to IAM and parse the ARN of your own role or user:
>>> client = boto3.client('iam')
>>> client.get_user()['User']['Arn'].split(':')[4]
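The parsing step can be factored into a small helper; the sample ARN below is made up for illustration:

```python
def account_id_from_arn(arn: str) -> str:
    """Extract the account ID from an ARN.

    ARN format: arn:partition:service:region:account-id:resource
    so the account ID is the fifth colon-separated field.
    """
    return arn.split(':')[4]

# With boto3 you would feed it your own identity's ARN, e.g.:
# import boto3
# arn = boto3.client('iam').get_user()['User']['Arn']
# print(account_id_from_arn(arn))

print(account_id_from_arn("arn:aws:iam::123456789012:user/alice"))  # -> 123456789012
```

On newer versions of boto3 there is also a more direct route: boto3.client('sts').get_caller_identity()['Account'] returns the account ID without any parsing.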
Upgrade your boto3 to the latest version; describe_db_instances now returns a new attribute, DBInstanceArn:
http://boto3.readthedocs.io/en/latest/reference/services/rds.html#RDS.Client.describe_db_instances
Using boto3 you can get the DB ARN with this code:
from boto3 import client

REGION = "us-east-1"
INSTANCES = ["db-name-01"]

rds = client("rds", region_name=REGION)
for instance in INSTANCES:
    response = rds.describe_db_instances(DBInstanceIdentifier=instance)
    dbins = response['DBInstances']
    dbarn = dbins[0]['DBInstanceArn']
You can get the RDS ARN from
rds_dict = self.rds_client.describe_db_instances()
If you iterate through rds_dict['DBInstances'], each instance dict has a key named DBInstanceArn.
I am using boto3 for this.
I've searched on this site and google but have not been able to get an answer for this.
I have code running on an EC2 instance which creates and manages EMR clusters using boto.
I can use this framework to get the flow_id (or cluster_id, I'm not sure which is the right name for it); it starts with "j-" and has a fixed number of characters identifying the cluster.
Using the framework I can establish an emr or ec2 connection, but for the life of me I cannot do the following using boto:
aws emr --list-clusters --cluster-id=j-ASDFGHJKL | json '["instances"].[0].["privateipaddress"]
**The above is a little fudged; I cannot remember the exact json format, or what the json command is or what arguments it wants, but it's CLI nonetheless.
I've pprint.pprint()'ed the connections and inspected them with inspect.getmembers(), getting the connection for the specific cluster_id, but I have yet to see this field/attribute, with or without method calls.
I've been up and down amazon and boto, how do they do it like
here?
In the
def test_list_instances(self): #line 317
...
self.assertEqual(response.instances[0].privateipaddress , '10.0.0.60')
...
P.S. I've tried this, but Python complains that the "instances" property is not iterable or index-accessible ("var[0]" style), and a few other things I tried, including inspecting.
BTW, I can access the public DNS address from here, and many other things, just not the private IP...
Please tell me if I messed up somewhere and where I can find the answer; I'm currently using an ugly fix via subprocess!
If you are asking for the master IP of an EMR cluster, the commands below will work:
list_instance_resp = boto3.client('emr', region_name='us-east-1').list_instances(ClusterId='j-XXXXXXX')
print(list_instance_resp['Instances'][-1]['PrivateIpAddress'])
Check your version of boto using
pip show boto
My guess is you're using version 2.24 or earlier, as that version did not parse instance information; see https://github.com/boto/boto/blob/2.24.0/tests/unit/emr/test_connection.py#L117
compared to
https://github.com/boto/boto/blob/2.25.0/tests/unit/emr/test_connection.py#L313
If you upgrade your version of boto to 2.25 or newer, you'll be able to do the following:
from boto.emr.connection import EmrConnection

conn = EmrConnection(<your aws key>, <your aws secret key>)
jobid = 'j-XXXXXXXXXXXXXX'  # your job id
response = conn.list_instances(jobid)
for instance in response.instances:
    print(instance.privateipaddress)
You just need to query the master instances from the master instance group using the EMR cluster ID. If you have more than one master, you can parse the boto3 output and take the IP of the first listed master.
Your Boto3 execution environment should have permission to describe an EMR cluster and its instance groups. Here is the relevant snippet:
emr_list_instance_rep = boto_client_emr.list_instances(
    ClusterId=cluster_id,
    InstanceGroupTypes=[
        'MASTER',
    ],
    InstanceStates=[
        'RUNNING',
    ],
)
return emr_list_instance_rep["Instances"][0]["PrivateIpAddress"]
You can find the full boto3 script and its explanation here: https://scriptcrunch.com/script-retrieve-aws-emr-master-ip/
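The extraction step can be isolated and exercised against a hand-built response of the same shape. The sample dict below is fabricated; only the Instances / PrivateIpAddress keys mirror the real list_instances response:

```python
def master_private_ips(list_instances_response: dict) -> list:
    """Pull every private IP out of an EMR list_instances response dict."""
    return [
        inst['PrivateIpAddress']
        for inst in list_instances_response.get('Instances', [])
        if 'PrivateIpAddress' in inst
    ]

# Fabricated response with the same shape as the real API output:
sample = {
    'Instances': [
        {'Id': 'ci-1', 'PrivateIpAddress': '10.0.0.60'},
    ]
}
print(master_private_ips(sample))  # -> ['10.0.0.60']
```

Keeping the parsing in its own function also makes it easy to handle the multi-master case: take the first element of the returned list, or all of them, as your use case requires.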