I am trying to use the AWS Python library boto3 to create a session. I found out we can do that either as
session = boto3.Session(profile_name='profile1')
or
session2 = boto3.session.Session(profile_name='profile2')
I have checked the docs; they seem to use boto3.session.Session().
Why do both ways work? What is the conceptual difference between them?
It is just for convenience; they both refer to the same class. What is happening here is that the __init__.py for the boto3 package includes the following:
from boto3.session import Session
This just allows you to refer to the Session class in your Python code as boto3.Session rather than boto3.session.Session.
This article provides more information about this Python idiom:
One common thing to do in your __init__.py is to import selected Classes, functions, etc into the package level so they can be conveniently imported from the package.
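As a quick illustration of that re-export, a minimal sketch (assuming boto3 is installed):
import boto3
import boto3.session
# The package-level name and the module-level name refer to the same class object.
assert boto3.Session is boto3.session.Session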
Let's say:
I have my Python code in main.py and I am using pandas
I am storing my API key (for some Azure service) in a Windows environment variable (variable name "AZURE_KEY", value "abc123abc")
I will import this API key in main.py using azure_key = os.environ.get("AZURE_KEY") (full snippet below)
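For reference, the lookup in full (a minimal sketch using the variable name above):
import os
# Returns the string value, or None if the variable is not set.
azure_key = os.environ.get("AZURE_KEY")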
Question:
How can I be sure that the pandas library hasn't sent azure_key's value somewhere outside my local system?
Possible Approach:
I know one way is to go through all of the pandas source files and understand the code to see if anything fishy is happening, but such an approach is not feasible.
Note:
pandas is just an example for this question; I actually want to use an API key within a Streamlit app.
Hence, please treat this question as library-agnostic.
For a production system (on a server), you could use a firewall to filter outgoing connections.
For a development system (your machine), you could add restrictions to the "API Key" account (e.g. only access test data, only access the systems you really need, etc.).
In Java, for instance, we have a class that represents the SageMaker client: AmazonSageMakerClient, but I couldn't find the equivalent for Python.
I was hoping to be able to do something like:
from sagemaker import SageMakerClient
client: SageMakerClient = boto3.client("sagemaker")
I looked into the library code and docs but I couldn't find any references to such class containing the defined methods for that client. In fact, I couldn't find any classes for AWS clients like s3, sqs, etc. Are those hidden somewhere or am I missing something obvious?
In boto3, there are basically two levels of objects available:
A client
Actual objects like the ones you are asking about
Take a look at S3, and you will see that in addition to the Client object there are also other rich object types like Bucket.
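For example, a quick sketch of the two levels for S3 (the bucket name is hypothetical):
import boto3
# Low-level client: methods map one-to-one to API operations and return dicts.
s3_client = boto3.client('s3')
# Higher-level resource: rich object types such as Bucket.
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-example-bucket')  # hypothetical bucket name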
It would seem that SageMaker doesn't (yet) have this second level of abstraction available.
To be more productive, and to work with Python classes rather than JSON, try to use the SageMaker Python SDK whenever possible rather than Boto3 clients; a short sketch contrasting the two follows the list below.
With Boto3 you have several SageMaker clients (as @anon correctly said):
SageMaker - Most of SageMaker features
SageMakerRuntime - Invoking endpoints
SageMaker* - Other misc SageMaker features like feature store, edge manager, ...
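A minimal sketch contrasting the two approaches (assuming the sagemaker package is installed alongside boto3):
import boto3
import sagemaker
# Boto3 clients: low-level, dicts in and out.
sm_client = boto3.client('sagemaker')            # most SageMaker features
sm_runtime = boto3.client('sagemaker-runtime')   # invoking endpoints
# SageMaker Python SDK: higher-level Python classes built on top of Boto3.
sm_session = sagemaker.Session()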
The boto3-stubs library can help with this.
Install using the instructions for your IDE on the package page, and then install the specific type annotations for SageMaker.
pip install 'boto3-stubs[sagemaker]'
You should be able to see type hints for the client object (type: SageMakerClient).
import boto3
client = boto3.client('sagemaker')
If you need to add hints yourself:
from mypy_boto3_sagemaker import SageMakerClient
def my_func(client: SageMakerClient):
    client.create_algorithm(...)
So, I think I'm running up against an issue with out-of-date documentation. According to the documentation here, I should be able to use list_schemas() to get a list of schemas defined in the Hive Data Catalog: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.list_schemas
However, this method doesn't seem to exist:
import boto3
glue = boto3.client('glue')
glue.list_schemas()
AttributeError: 'Glue' object has no attribute 'list_schemas'
Other methods (e.g. list_crawlers()) still appear to be present and work just fine. Has this method been moved? Do I need to install some additional boto3 libraries for this to work?
Based on the comments, the issue was caused by an old version of boto3; upgrading to a newer version solved it.
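For example (the exact command depends on your environment):
pip install --upgrade boto3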
You should make a session first and use the client method of the session; then it should work:
import boto3
session = boto3.session.Session()
glue_client = session.client('glue')
schemas = glue_client.list_schemas()
When developing a dockerized AWS application locally, it's common practice to simulate Amazon's services using LocalStack. One way to get the Python application talking to LocalStack at test time is to monkeypatch the Boto3 Client and ServiceResource, and use the "link" feature in the docker-compose file.
Unfortunately, the Docker Compose reference manual advises against this, and it seems that Docker might remove the link feature. Instead, they recommend using the internal networks feature of a docker-compose file. This means that instead of accessing the LocalStack services via localhost (e.g. http://localhost:4566), they are reached via something like http://localstack:4566, provided the service name was "localstack".
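For context, the hostname is the only thing that changes; with a plain Boto3 client it would appear in the endpoint URL, as in this hypothetical sketch (not the monkey-patch approach below):
import boto3
# Point the client at LocalStack via the Compose network hostname.
s3 = boto3.client('s3', endpoint_url='http://localstack:4566')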
Is there a way to change the monkey-patch configuration so that this works? This is the standard monkey-patch code:
import boto3
import localstack_client.session
import pytest

@pytest.fixture(autouse=True)
def boto3_localstack_patch(monkeypatch):
    session_ls = localstack_client.session.Session()
    monkeypatch.setattr(boto3, "client", session_ls.client)
    monkeypatch.setattr(boto3, "resource", session_ls.resource)
There's no obvious way to indicate that the tests ought to use a different hostname, so how do I do this?
Found an answer:
import boto3
import localstack_client.session
import pytest

@pytest.fixture(autouse=True)
def boto3_localstack_patch(monkeypatch):
    session_ls = localstack_client.session.Session(localstack_host="localstack")
    monkeypatch.setattr(boto3, "client", session_ls.client)
    monkeypatch.setattr(boto3, "resource", session_ls.resource)
The localstack_client.session.Session() object takes a localstack_host argument, which can be used to specify how to connect to LocalStack when it is not on localhost.
I have legacy Boto3 code that makes a lot of use of the default Boto3 session, e.g.
import boto3
client = boto3.client('ec2')
client.describe_images(DryRun=False)
...
I wish to write unit tests for this legacy code using placebo.
However, the docs there seem to imply that the code under test would always need to manage the Boto3 session explicitly, i.e.
import boto3
import placebo
session = boto3.Session()
pill = placebo.attach(session, data_path='/path/to/response/directory')
pill.record()
client = session.client('ec2')
client.describe_images(DryRun=False)
...
My reading of the code is that this is quite a limitation of the placebo mock framework, although I am no expert Python programmer.
Am I misunderstanding something basic? Is there any way to work around this, or would I have to refactor all my legacy code to explicitly pass around a session?
placebo needs a Session object, and the examples all show creating an explicit Session object, but I think you could just reference the "built-in" session.
import boto3
import placebo
# boto3.session is a module, not a Session instance; boto3's private helper
# _get_default_session() creates the built-in session if needed and returns it.
pill = placebo.attach(boto3._get_default_session(), data_path='/path/to/response/directory')
I figured it out by reading through the Boto3 unit tests (ref).
To attach placebo to the default session, it is necessary to explicitly set up the default session before calling placebo:
import boto3
import placebo
boto3.setup_default_session()
session = boto3.DEFAULT_SESSION
pill = placebo.attach(session, data_path='/path/to/response/directory')
pill.record()
client = boto3.client('ec2')
client.describe_images(DryRun=False)
Now, just by adding those four lines, I can record Boto3 calls in my legacy code, without further refactoring.
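For completeness, placebo can then replay the recorded responses during the tests themselves (continuing from the snippet above; pill.playback() switches the pill from recording to playback):
pill.playback()
client = boto3.client('ec2')
client.describe_images(DryRun=False)  # now served from the recorded responses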
I will raise a pull request to add these notes to the Placebo README.