Connect Function App to CosmosDB with Managed Identity - python

I'm trying to write a function in a Function App that manipulates data in a CosmosDB. I get it working if I drop the read-write key in the environment variables. To make it more robust I wanted it to work as a managed identity app. The app has the role 'DocumentDB Account Contributor' on the Cosmos DB.
However, the CosmosClient constructor doesn't accept a Credential and needs the read-write key. I've been chasing down the rabbit hole of azure.mgmt.cosmosdb.operations where there is a DatabaseAccountsOperations class with a list_keys() method. I can't find a neat way to access that function though. If I try to create that object (which requires poaching the config, serializer and deserializer from my dbmgmt object) it still requires the resourceGroupName and accountName.
I can't help but think that I've taken a wrong turn somewhere because this has to be possible in a more straightforward manner. Especially given that the JavaScript SDK references a more logical class CosmosDBManagementClient in line with the SubscriptionClient. However, I can't find that class anywhere on the python side.
Any pointers?
import os

import azure.functions as func
from azure.identity import DefaultAzureCredential
from azure.cosmos import CosmosClient
from azure.mgmt.resource import SubscriptionClient
from azure.mgmt.cosmosdb import CosmosDB
from .cred_wrapper import CredentialWrapper

def main(req: func.HttpRequest) -> func.HttpResponse:
    request_body = req.get_body()
    # credential = DefaultAzureCredential()
    # https://gist.github.com/lmazuel/cc683d82ea1d7b40208de7c9fc8de59d
    credential = CredentialWrapper()
    uri = os.environ.get('cosmos-db-uri')
    # db = CosmosClient(url=uri, credential=credential)  # Doesn't work: wants a credential that is a RW/R key.
    # It does work if I replace it with my primary/secondary key, but the goal is to remove dependence on that.
    subscription_client = SubscriptionClient(credential)
    subscription = next(subscription_client.subscriptions.list())
    dbmgmt = CosmosDB(credential, subscription.subscription_id)  # This doesn't accept the DB URI??
    operations = list(dbmgmt.operations.list())  # I see the list_keys() operation there...
EDIT
A helpful soul provided a response here but removed it before I could even react or accept it as the answer. They pointed out that there is an equivalent python SDK and that from azure.mgmt.cosmosdb import CosmosDBManagementClient would do the trick.
From there, I was on my own as that resulted in
ImportError: cannot import name 'CosmosDBManagementClient' from 'azure.mgmt.cosmosdb'
I believe the root of the problem lies in an incompatibility in the azure-mgmt package. After removing azure-mgmt from my requirements.txt and only loading the Cosmos- and identity-related packages, the import error was resolved.
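For reference, a minimal sketch of what the trimmed requirements.txt might look like (package names inferred from the imports used here; pin versions as appropriate for your project):
azure-functions
azure-identity
azure-mgmt-resource
azure-mgmt-cosmosdb
azure-cosmos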
This solved 90% of the problem.
dbmgmt = CosmosDBManagementClient(credential, subscription.subscription_id, c_uri)
print(dbmgmt.database_accounts.list_keys())
TypeError: list_keys() missing 2 required positional arguments: 'resource_group_name' and 'account_name'
Does one really need to collect each of these parameters? Compared to the example that reads a secret from a Vault it seems so convoluted.
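For completeness, a minimal sketch of the full list_keys() call; the placeholder values are hypothetical:
keys = dbmgmt.database_accounts.list_keys(
    resource_group_name='<myResourceGroup>',
    account_name='<myCosmosAccount>'
)
primary_key = keys.primary_master_key  # usable wherever the RW key was used before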

For other unfortunate ones looking to access CosmosDB with Managed Identity, it seems that this is, as of May 2021, not yet possible.
Source: Discussion on Github

Update 12/05/2021 - I came here looking for a solution to this with JavaScript/TypeScript, so I'm leaving the answer here for others. I think a similar approach could work for Python.
You can use RBAC for data plane operations with Managed Identities. Finding the documentation was difficult.
RBAC for Cosmos DB data plane operations with Managed Identities
Important - If you get the error Request blocked by Auth mydb : Request is blocked because principal [xxxxxx-6fad-44e4-98bc-2d423a88b65f] does not have required RBAC permissions to perform action Microsoft.DocumentDB/databaseAccounts/readMetadata on resource [/]: don't use the Portal to assign roles; use the Azure CLI for Cosmos DB.
How to - creating a role assignment for a user/system MSI/user MSI is done using the Azure CosmosDB CLI
# Find the role ID:
resourceGroupName='<myResourceGroup>'
accountName='<myCosmosAccount>'
az cosmosdb sql role definition list --account-name $accountName --resource-group $resourceGroupName

# Assign to the system MSI or user-assigned MSI:
readOnlyRoleDefinitionId='<roleDefinitionId>' # as fetched above
principalId='<aadPrincipalId>'
az cosmosdb sql role assignment create --account-name $accountName --resource-group $resourceGroupName --scope "/" --principal-id $principalId --role-definition-id $readOnlyRoleDefinitionId
Once this step is done, the code for connecting is very easy: use the @azure/identity package's DefaultAzureCredential. This works in an Azure Function App with managed identity and on your laptop with VS Code or with az login.
Docs for the @azure/identity SDK
Examples of authentication with @azure/identity to get the credential object
import { CosmosClient } from "@azure/cosmos";
import { DefaultAzureCredential, ManagedIdentityCredential, ChainedTokenCredential } from "@azure/identity";

const defaultCredentials = new DefaultAzureCredential();
const managedCredentials = new ManagedIdentityCredential();
const aadCredentials = new ChainedTokenCredential(managedCredentials, defaultCredentials);

const client = new CosmosClient({
    endpoint: "https://mydb.documents.azure.com:443/",
    aadCredentials
});
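Since the original question was about Python, the same pattern should carry over: recent versions of the azure-cosmos package (the 4.x line) accept azure.identity credentials directly. A minimal sketch, where the database and container names are hypothetical:
from azure.identity import DefaultAzureCredential
from azure.cosmos import CosmosClient

# Works in a Function App with managed identity, and locally via az login.
credential = DefaultAzureCredential()
client = CosmosClient(url="https://mydb.documents.azure.com:443/", credential=credential)

database = client.get_database_client("mydatabase")        # hypothetical name
container = database.get_container_client("mycontainer")   # hypothetical name
item = container.read_item(item="item-id", partition_key="pk-value")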

Related

How to list all azure virtual machine images using Python code

I can't list all Azure VM images using Python code.
I can only list images for a specific location, offer, and so on. I need to list all VM images in a Python script.
There is no list method in Python that returns all virtual machine images without filters such as offer or publisher. When using virtual_machine_images.list(), you must pass parameters such as location, publisher_name, offer, and skus. Otherwise, it throws an error because the required parameters are missing.
Supported list() methods: list(), list_offers(), list_publishers(), list_skus().
After working around this, I was able to get results using the script below:
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

credentials = DefaultAzureCredential()
subscription_ID = '<subscriptionID>'
client = ComputeManagementClient(credentials, subscription_ID)
vmlist = client.virtual_machine_images.list(location="eastus", publisher_name="<publisher>", offer="<offer>", skus="<sku>")
for image in vmlist:
    print(image.name)
In my environment there are no virtual machine images matching these filters, so the list comes back empty, but the call itself succeeds.
(Screenshots of the output with DefaultAzureCredential and service principal authentication omitted.)
Note: Register a new application under Azure Active Directory to get the client_ID and client_secret details.
Refer to the SO answer by @Peter Pan.
As Jahnavi pointed out, there is no way to list all images without specifying the corresponding filters. Not all images are available in all regions and for all customers. However, if you want to list all images, you can iterate through the corresponding lists by first listing publishers, then offers, then skus, and finally images. There are A LOT of images, though, so this will take A LOT of time - and I strongly recommend filtering on at least one of the aforementioned criteria.
The below code should list all images in a given region and a given subscription. Note that it is using the AzureCliCredential class from the Azure Identity library. This requires you to be logged in to Azure through the Azure CLI and should only be used for testing. You can pick another appropriate authentication class from the library if desired.
from azure.identity import AzureCliCredential
from azure.mgmt.compute import ComputeManagementClient

credential = AzureCliCredential()
subscription_id = "{your-subscription-id}"
my_location = "{your-region}"

compute_client = ComputeManagementClient(credential=credential, subscription_id=subscription_id)

results = []

# Get all publishers in a given location
publishers = compute_client.virtual_machine_images.list_publishers(location=my_location)
for publisher in publishers:
    offers = compute_client.virtual_machine_images.list_offers(location=my_location, publisher_name=publisher.name)
    for offer in offers:
        skus = compute_client.virtual_machine_images.list_skus(location=my_location, publisher_name=publisher.name, offer=offer.name)
        for sku in skus:
            images = compute_client.virtual_machine_images.list(location=my_location, publisher_name=publisher.name, offer=offer.name, skus=sku.name)
            for image in images:
                results.append({
                    'publisherName': publisher.name,
                    'offerName': offer.name,
                    'skuName': sku.name,
                    'imageName': image.name,
                })
This will leave you with a list of dictionaries that can be used for further processing. For example, you could load it into a Pandas DataFrame:
import pandas as pd
df = pd.DataFrame(results)
Potential output:
publisherName offerName skuName imageName
...
17101 f5-networks f5-big-ip-good f5-bigip-virtual-edition-10g-good-hourly-po-f5 16.1.202000
17102 f5-networks f5-big-ip-good f5-bigip-virtual-edition-10g-good-hourly-po-f5 16.1.300000
17103 f5-networks f5-big-ip-good f5-bigip-virtual-edition-10g-good-hourly-po-f5 16.1.301000
...
But really, this is only food for thought, as it will be pretty bad performance-wise and will take a lot of time - and is probably not intended to be used this way. The better option would be to filter for certain publishers first - just sayin'. :-)
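For example, a sketch that restricts the walk to a single publisher, reusing compute_client and my_location from the block above (the publisher name is just an example):
publisher = "Canonical"  # example publisher; pick the one you care about
offers = compute_client.virtual_machine_images.list_offers(location=my_location, publisher_name=publisher)
for offer in offers:
    skus = compute_client.virtual_machine_images.list_skus(location=my_location, publisher_name=publisher, offer=offer.name)
    for sku in skus:
        images = compute_client.virtual_machine_images.list(location=my_location, publisher_name=publisher, offer=offer.name, skus=sku.name)
        for image in images:
            print(publisher, offer.name, sku.name, image.name)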

Create a Java UDF that uses geoip2 library with the database in a S3 bucket

Correct me if I'm wrong, but my understanding of UDFs in Snowpark is that you can send the UDF from your IDE and it will be executed inside Snowflake. I have a staged database file called GeoLite2-City.mmdb inside an S3 bucket on my Snowflake account, and I would like to use it to retrieve information about an IP address. So my strategy was to:
1. Register a UDF which would return a response string, from my IDE (PyCharm).
2. Create a main function which would simply query the database about the IP address and give me a response.
The problem is: how can the UDF and my code see the staged file at
s3://path/GeoLite2-City.mmdb
in my bucket? In my case I simply used the bare file name (with geoip2.database.Reader('GeoLite2-City.mmdb') as reader:), assuming it will eventually be found, since stage_location='@AWS_CSV_STAGE' is the same stage where the UDF will be saved. But I'm not sure I understand correctly what the option stage_location refers to exactly.
At the moment i get the following error:
"Cannot add package geoip2 because Anaconda terms must be accepted by ORGADMIN to use Anaconda 3rd party packages. Please follow the instructions at https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#using-third-party-packages-from-anaconda."
Am I importing geoip2.database correctly in order to use it with Snowpark and a UDF?
Do I import it by writing session.add_packages('geoip2')?
Thank you for clearing my doubts.
The geoip2 instructions I'm following are here:
https://geoip2.readthedocs.io/en/latest/
my code:
from snowflake.snowpark import Session
import geoip2.database
from snowflake.snowpark.functions import col
import logging
from snowflake.snowpark.types import IntegerType, StringType

logger = logging.getLogger()
logger.setLevel(logging.INFO)

session = None
user = '*********'
password = '*********'
account = '*********'
warehouse = '*********'
database = '*********'
schema = '*********'
role = '*********'

print("Connecting")
cnn_params = {
    "account": account,
    "user": user,
    "password": password,
    "warehouse": warehouse,
    "database": database,
    "schema": schema,
    "role": role,
}

def first_udf():
    with geoip2.database.Reader('GeoLite2-City.mmdb') as reader:
        response = reader.city('203.0.113.0')
        print(response.country.iso_code)
        return response

try:
    print('session..')
    session = Session.builder.configs(cnn_params).create()
    session.add_packages('geoip2')
    session.udf.register(
        func=first_udf,
        return_type=StringType(),
        input_types=[StringType()],
        is_permanent=True,
        name='SNOWPARK_FIRST_UDF',
        replace=True,
        stage_location='@AWS_CSV_STAGE',
    )
    session.sql('SELECT SNOWPARK_FIRST_UDF()').show()
except Exception as e:
    print(e)
finally:
    if session:
        session.close()
        print('connection closed..')
    print('done.')
UPDATE
I'm trying to solve it using a Java UDF, as I already have the 'geoip2-2.8.0.jar' library in my staging area. If I could import its methods to get the country of an IP it would be perfect; the problem is that I don't know how to do it exactly. I'm trying to follow these instructions: https://maxmind.github.io/GeoIP2-java/.
I want to query the database and get the ISO code of the country as output, and I want to do it in a Snowflake worksheet.
CREATE OR REPLACE FUNCTION GEO()
returns varchar not null
language java
imports = ('@AWS_CSV_STAGE/lib/geoip2-2.8.0.jar', '@AWS_CSV_STAGE/geodata/GeoLite2-City.mmdb')
handler = 'Geo.test'
as
$$
import java.io.File;
import java.net.InetAddress;
import com.maxmind.geoip2.DatabaseReader;
import com.maxmind.geoip2.model.CityResponse;
import com.maxmind.geoip2.record.Country;

class Geo {
    public static String test() throws Exception {
        File database = new File("geodata/GeoLite2-City.mmdb");
        DatabaseReader reader = new DatabaseReader.Builder(database).build();
        InetAddress ipAddress = InetAddress.getByName("128.101.101.101");
        CityResponse response = reader.city(ipAddress);
        Country country = response.getCountry();
        return country.getIsoCode();
    }
}
$$;
SELECT GEO();
This will be more complicated than it looks:
To use session.add_packages('geoip2') in Snowflake you need to accept the Anaconda terms. This is easy if you can ask your account admin.
But then you can only get the packages that Anaconda has added to Snowflake in this way. The list is https://repo.anaconda.com/pkgs/snowflake/, and I don't see geoip2 there yet.
So you will need to package your own Python code (until Anaconda sees enough requests for geoip2 in their wishlist). I described the process here: https://medium.com/snowflake/generating-all-the-holidays-in-sql-with-a-python-udtf-4397f190252b.
But wait! GeoIP2 is not pure Python, so you will need to wait until Anaconda packages the C extension libmaxminddb. That will be harder: as you can see, their docs don't offer a straightforward path like other pip-installable C libraries.
So this will be complicated.
There are other alternative paths, like a commercial provider of this functionality (like I describe here https://medium.com/snowflake/new-in-snowflake-marketplace-monetization-315aa90b86c).
There are other approaches to get this done without using a paid dataset, but I haven't written about them yet - someone else might get to it before I do.
Btw, years ago I wrote something like this for BigQuery (https://cloud.google.com/blog/products/data-analytics/geolocation-with-bigquery-de-identify-76-million-ip-addresses-in-20-seconds), but today I was notified that Google recently deleted the tables that I had shared with the world (https://twitter.com/matthew_hensley/status/1598386009129058315).
So it's time to rebuild in Snowflake. But who (me?) and when is still a question.

Python KustoManagementClient using my own credentials

I'm trying to use the azure-mgmt-kusto package for some Kusto cluster operations, using KustoManagementClient. This client requires a TokenCredential in its constructor. For my scenario, I would like to use my own AAD credentials, preferably via interactive login or IWA (Integrated Windows Authentication). The closest I was able to get is with the following code:
creds = DefaultAzureCredential(exclude_interactive_browser_credential=False).get_token('')
kusto_client = azure.mgmt.kusto.KustoManagementClient(credential=creds, subscription_id='<>')
but this raises an error in the second line:
Expected type 'TokenCredential', got 'AccessToken' instead
which I couldn't find any way around!
Any suggestions on how to resolve this? or other methods to use?
Actually, after simply trying it despite the PyCharm warning, this worked:
from azure.identity import DefaultAzureCredential
from azure.mgmt.kusto import KustoManagementClient

credential = DefaultAzureCredential()
subId = '<subscription-id>'
kusto_management_client = KustoManagementClient(credential, subId)
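As a quick sanity check that the credential works, you can then use the client, e.g. to enumerate the clusters in the subscription (a sketch; clusters.list() is the subscription-wide list operation in azure-mgmt-kusto):
for cluster in kusto_management_client.clusters.list():
    print(cluster.name)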

How to use `not` condition in the gitlab api issue query

I am trying to read the titles of open issues that don't have the label resolved. I am referring to the API documentation (https://docs.gitlab.com/ee/api/issues.html), which mentions NOT, but I couldn't get NOT to work.
I have tried the following Python script to read the list of issues, but I can't find out how to use NOT to filter out issues that have the resolved label.
import gitlab

# private token or personal token authentication
gl = gitlab.Gitlab('https://example.com', private_token='XXXYYYZZZ')

# make an API request to create the gl.user object. This is mandatory if you
# use the username/password authentication.
gl.auth()

# list all the issues
issues = gl.issues.list(all=True, scope='all', state='opened', assignee_username='username')
for issue in issues:
    print(issue.title)
From the GitLab issues API documentation, not is of type Hash, a special type documented here.
For example, to exclude the labels Category:DAST and devops::secure, and to exclude the milestone 13.11, you would use the following parameters:
not[labels]=Category:DAST,devops::secure
not[milestone]=13.11
api example: https://gitlab.com/api/v4/issues?scope=all&state=opened&assignee_username=derekferguson&not[labels]=Category:DAST,devops::secure&not[milestone]=13.11
Using the gitlab Python module, you need to pass the extra parameters as additional keyword arguments:
import gitlab

gl = gitlab.Gitlab('https://gitlab.com')

extra_params = {
    'not[labels]': "Category:DAST,devops::secure",
    "not[milestone]": "13.11",
}
issues = gl.issues.list(all=True, scope='all', state='opened',
                        assignee_username='derekferguson', **extra_params)
for issue in issues:
    print(issue.title)
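Applied to the original question (open issues that don't have the resolved label), the same pattern looks like this, with the placeholder username from the question:
issues = gl.issues.list(all=True, scope='all', state='opened',
                        assignee_username='username',
                        **{'not[labels]': 'resolved'})
for issue in issues:
    print(issue.title)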

Is it possible to get the ASC location from the Azure Python SDK?

I am fetching a subscription's Secure Score using the Microsoft Azure Security Center (ASC) Management Client Library. All operations in the library state that
You should not instantiate directly this class, but create a Client instance that will create it for you and attach it as attribute.
Therefore, I am creating a SecurityCenter client with the following specification:
SecurityCenter(credentials, subscription_id, asc_location, base_url=None)
However, it seems to me like the only way to get the asc_location information properly is to use the SecurityCenter client to fetch it... The spec says the same as the quote above, You should not instantiate.... So I am stuck not being able to create the client because I need the ASC location to do so, and I need to create the client to get the ASC locations.
The documentation mentions
The location where ASC stores the data of the subscription. can be retrieved from Get locations
Googling and searching through the Python SDK docs for this "Get locations" gives me nothing (other than the REST API). Have I missed something? Are we supposed to hard-code the location like in this SO post or this GitHub issue from the SDK repository?
As the official API reference for list locations indicates:
The location of the responsible ASC of the specific subscription (home
region). For each subscription there is only one responsible location.
It will not change, so you can hardcode this value if you already know the asc_location of your subscription.
But each subscription may have a different asc_location value (my two Azure subscriptions have different asc_location values).
So if you have a lot of Azure subscriptions, you can query the asc_location via the API (as far as I know, this is the only way to do it) and then use the SDK to get the Secure Score. Try the code below:
from azure.mgmt.security import SecurityCenter
from azure.identity import ClientSecretCredential
import requests

TENANT_ID = ''
CLIENT = ''
KEY = ''
subscription_id = ''

getLocationsURL = "https://management.azure.com/subscriptions/" + subscription_id + "/providers/Microsoft.Security/locations?api-version=2015-06-01-preview"

credentials = ClientSecretCredential(
    client_id=CLIENT,
    client_secret=KEY,
    tenant_id=TENANT_ID
)

# request the asc_location for a subscription
azure_access_token = credentials.get_token('https://management.azure.com/.default')
r = requests.get(getLocationsURL, headers={"Authorization": "Bearer " + azure_access_token.token}).json()
location = r['value'][0]['name']
print("location:" + location)

client = SecurityCenter(credentials, subscription_id, asc_location=location)
for score in client.secure_scores.list():
    print(score)
I recently came across this problem.
Based on my observation, I can use whatever location under my subscription to initialize the SecurityCenter client. Then client.locations.list() gives me exactly one ASC location.
from pprint import pprint

# Any location from SubscriptionClient.subscriptions.list_locations will do
location = 'eastasia'
client = SecurityCenter(
    credential, my_subscription_id,
    asc_location=location
)
data = client.locations.list().next().as_dict()
pprint(f"Asc location: {data}")
In my case, it's always westcentralus, regardless of my input being eastasia.
Note that you'll get an exception if you use get instead of list:
data = client.locations.get().as_dict()
pprint(f"Asc location: {data}")
# azure.core.exceptions.ResourceNotFoundError: (ResourceNotFound) Could not find location 'eastasia'
So what I did was a bit awkward (see the sketch below):
1. Create a SecurityCenter client using any location under my subscription.
2. Call client.locations.list() to get the ASC location.
3. Use the retrieved ASC location to create the SecurityCenter client again.
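In code, the dance looks roughly like this (a sketch, assuming credential and my_subscription_id are set up as in the snippet above):
bootstrap = SecurityCenter(credential, my_subscription_id, asc_location='eastasia')  # any region works here
asc_location = bootstrap.locations.list().next().name  # the real home region
client = SecurityCenter(credential, my_subscription_id, asc_location=asc_location)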
I ran into this recently too, and initially did something based on @stanley-gong's answer. But it felt a bit awkward, and I checked how the Azure CLI does it. I noticed that they hardcode a value for asc_location:
def _cf_security(cli_ctx, **_):
    from azure.cli.core.commands.client_factory import get_mgmt_service_client
    from azure.mgmt.security import SecurityCenter
    return get_mgmt_service_client(cli_ctx, SecurityCenter, asc_location="centralus")
And the PR implementing that provides some more context:
we have a task to remove the asc_location from the initialization of the clients. currently we hide the asc_location usage from the user.
centralus is a just arbitrary value and is our most common region.
So... maybe the dance of double-initializing a client or pulling a subscription's home region isn't buying us anything?
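If you follow the CLI's lead, the whole thing collapses to a hardcoded one-liner (a sketch using the same arbitrary-but-common region the CLI picked):
client = SecurityCenter(credential, my_subscription_id, asc_location="centralus")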
