I'm trying to use the azure-mgmt-kusto package for some Kusto cluster operations, using KustoManagementClient. This client requires a TokenCredential in its constructor. For my scenario, I would like to use my own AAD credentials, preferably via interactive login or IWA (Integrated Windows Authentication). The closest I was able to get is the following code:
creds = DefaultAzureCredential(exclude_interactive_browser_credential=False).get_token('')
kusto_client = azure.mgmt.kusto.KustoManagementClient(credential=creds, subscription_id='<>')
but this raises an error in the second line:
Expected type 'TokenCredential', got 'AccessToken' instead
which I couldn't find any way around!
Any suggestions on how to resolve this? or other methods to use?
Actually, after simply trying it despite the PyCharm warning, this worked:
from azure.identity import DefaultAzureCredential
from azure.mgmt.kusto import KustoManagementClient
credential = DefaultAzureCredential()
kusto_management_client = KustoManagementClient(credential, subId)
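For the interactive-login part of the question, azure-identity also provides specific credential classes that can be passed in directly instead of the default chain. A minimal sketch (the subscription id is a placeholder, and the clusters.list() call at the end is just an illustrative management-plane call):
from azure.identity import InteractiveBrowserCredential
from azure.mgmt.kusto import KustoManagementClient

# Opens a browser window for an interactive AAD sign-in.
credential = InteractiveBrowserCredential()

# Pass the credential object itself; the SDK calls get_token() internally when needed.
kusto_client = KustoManagementClient(credential=credential, subscription_id='<subscription-id>')

# Illustrative call: list the clusters visible to the signed-in identity.
for cluster in kusto_client.clusters.list():
    print(cluster.name)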
Correct me if I'm wrong, but my understanding of UDFs in Snowpark is that you can send the UDF from your IDE and it will be executed inside Snowflake. I have a staged database file called GeoLite2-City.mmdb inside an S3 bucket on my Snowflake account, and I would like to use it to retrieve information about an IP address. So my strategy was to:
1. Register a UDF which would return a response string, in my IDE (PyCharm).
2. Create a main function which would simply query the database about the IP address and give me a response.
The problem is: how can the UDF and my code see the staged file at
s3://path/GeoLite2-City.mmdb
in my bucket? In my case I simply used the file name (with geoip2.database.Reader('GeoLite2-City.mmdb') as reader:), assuming it will eventually be found, since stage_location='@AWS_CSV_STAGE' is the same place where the UDF will be saved. But I'm not sure I understand correctly what exactly the stage_location option refers to.
At the moment I get the following error:
"Cannot add package geoip2 because Anaconda terms must be accepted by ORGADMIN to use Anaconda 3rd party packages. Please follow the instructions at https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#using-third-party-packages-from-anaconda."
Am I importing geoip2.database correctly in order to use it with Snowpark and UDFs?
Do I import it by writing session.add_packages('geoip2')?
Thank you for clearing my doubts.
The instructions I'm following for geoip2 are here:
https://geoip2.readthedocs.io/en/latest/
My code:
from snowflake.snowpark import Session
import geoip2.database
from snowflake.snowpark.functions import col
import logging
from snowflake.snowpark.types import IntegerType, StringType

logger = logging.getLogger()
logger.setLevel(logging.INFO)

session = None
user = '*********'
password = '*********'
account = '*********'
warehouse = '*********'
database = '*********'
schema = '*********'
role = '*********'

print("Connecting")

cnn_params = {
    "account": account,
    "user": user,
    "password": password,
    "warehouse": warehouse,
    "database": database,
    "schema": schema,
    "role": role,
}


def first_udf():
    with geoip2.database.Reader('GeoLite2-City.mmdb') as reader:
        response = reader.city('203.0.113.0')
        print('response.country.iso_code')
        return response


try:
    print('session..')
    session = Session.builder.configs(cnn_params).create()
    session.add_packages('geoip2')
    session.udf.register(
        func=first_udf
        , return_type=StringType()
        , input_types=[StringType()]
        , is_permanent=True
        , name='SNOWPARK_FIRST_UDF'
        , replace=True
        , stage_location='@AWS_CSV_STAGE'
    )
    session.sql('SELECT SNOWPARK_FIRST_UDF').show()
except Exception as e:
    print(e)
finally:
    if session:
        session.close()
        print('connection closed..')
    print('done.')
UPDATE
I'm trying to solve it using a Java UDF, since I already have the geoip2-2.8.0.jar library staged in my staging area. If I could import its methods to get the country of an IP it would be perfect; the problem is that I don't know how to do it exactly. I'm trying to follow these instructions: https://maxmind.github.io/GeoIP2-java/.
I want to query the database and get the ISO code of the country as output, and I want to do it in a Snowflake worksheet.
CREATE OR REPLACE FUNCTION GEO()
returns varchar not null
language java
imports = ('@AWS_CSV_STAGE/lib/geoip2-2.8.0.jar', '@AWS_CSV_STAGE/geodata/GeoLite2-City.mmdb')
handler = 'test'
as
$$
def test():
    File database = new File("geodata/GeoLite2-City.mmdb")
    DatabaseReader reader = new DatabaseReader.Builder(database).build();
    InetAddress ipAddress = InetAddress.getByName("128.101.101.101");
    CityResponse response = reader.city(ipAddress);
    Country country = response.getCountry();
    System.out.println(country.getIsoCode());
$$;
SELECT GEO();
This will be more complicated than it looks:
To use session.add_packages('geoip2') in Snowflake you need to accept the Anaconda terms. This is easy if you can ask your account admin.
But then you can only get the packages that Anaconda has added to Snowflake in this way. The list is https://repo.anaconda.com/pkgs/snowflake/, and I don't see geoip2 there yet.
So you will need to package your own Python code (until Anaconda sees enough requests for geoip2 in the wishlist). I described the process here: https://medium.com/snowflake/generating-all-the-holidays-in-sql-with-a-python-udtf-4397f190252b.
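Roughly, that route would look something like the sketch below with Snowpark. This is only an assumption-laden outline, not a tested setup: the zip name (geoip2.zip), the stage paths, and the reliance on sys._xoptions['snowflake_import_directory'] to locate imported files are all assumptions on my part:
from snowflake.snowpark import Session
from snowflake.snowpark.types import StringType

session = Session.builder.configs(cnn_params).create()  # cnn_params: the connection dict from the question

# Attach the staged database file and a zipped copy of geoip2 (plus its
# dependencies) to the session, so registered UDFs can use them.
session.add_import('@AWS_CSV_STAGE/geodata/GeoLite2-City.mmdb')
session.add_import('@AWS_CSV_STAGE/lib/geoip2.zip')  # hypothetical zip of the library


def ip_to_country(ip: str) -> str:
    import os
    import sys
    import geoip2.database
    # Files added via add_import are made available in the UDF's import directory.
    import_dir = sys._xoptions['snowflake_import_directory']
    with geoip2.database.Reader(os.path.join(import_dir, 'GeoLite2-City.mmdb')) as reader:
        return reader.city(ip).country.iso_code


session.udf.register(
    func=ip_to_country,
    return_type=StringType(),
    input_types=[StringType()],
    name='IP_TO_COUNTRY',
    is_permanent=True,
    stage_location='@AWS_CSV_STAGE',
    replace=True,
)
Whether this actually runs, though, depends on the next point: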
But wait! GeoIP2 is not pure Python, so you will need to wait until Anaconda packages the C extension libmaxminddb. And this will be harder, as you can see their docs don't offer a straightforward way, unlike other pip-installable C libraries.
So this will be complicated.
There are alternative paths, like a commercial provider of this functionality (as I describe here: https://medium.com/snowflake/new-in-snowflake-marketplace-monetization-315aa90b86c).
There are other approaches to get this done without using a paid dataset, but I haven't written about that yet - someone else might before I get to it.
Btw, years ago I wrote something like this for BigQuery (https://cloud.google.com/blog/products/data-analytics/geolocation-with-bigquery-de-identify-76-million-ip-addresses-in-20-seconds), but today I was notified that Google recently deleted the tables that I had shared with the world (https://twitter.com/matthew_hensley/status/1598386009129058315).
So it's time to rebuild in Snowflake. But who (me?) and when is still a question.
The following code works for me when using Apache Jena Fuseki 4.3.2 (docker image secoresearch/fuseki:4.3.2) with rdflib 6.1.1:
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore
FUSEKI_QUERY = 'http://localhost:3030/ds/sparql'
FUSEKI_UPDATE = 'http://localhost:3030/ds/update'
store = SPARQLUpdateStore(query_endpoint=FUSEKI_QUERY,
                          update_endpoint=FUSEKI_UPDATE,
                          method='POST',
                          autocommit=False)
graph = Graph(store=store, identifier=GRAPH_NAME)
graph.parse('./dump.ttl') # file containing 1000 example triples
store.commit()
But when I change to OpenLink Virtuoso 07.20.3233 (docker image tenforce/virtuoso:latest), I get the following error:
urllib.error.HTTPError: HTTP Error 500: SPARQL Request Failed
With some trial and error, I got the following to work for Virtuoso:
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore
VIRTUOSO_QUERY = 'http://localhost:8890/sparql'
VIRTUOSO_UPDATE = 'http://localhost:8890/sparql'
store = SPARQLUpdateStore(query_endpoint=VIRTUOSO_QUERY,
                          update_endpoint=VIRTUOSO_UPDATE,
                          method='POST',
                          autocommit=False)
intermediate_graph = Graph()
intermediate_graph.parse('./dump.ttl')
graph = Graph(store=store, identifier=GRAPH_NAME)
for triple in intermediate_graph:
    graph.add(triple)
    store.commit()  # Have to commit after every add here
If I don't commit after every add, but only once after the loop, I get the same error as above. At the moment, I don't see any helpful HTTP or server log entries that might point me to the problem.
So my question is: does anyone have an idea why this error occurs and what the solution might be? I guess it has something to do with the way my Virtuoso instance is configured?
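For reference, here is a batched variant of the per-triple commit workaround above. This is only a sketch (the graph IRI and the batch size are placeholders), and whether Virtuoso accepts a given batch size instead of single-triple commits is exactly the open question:
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore

VIRTUOSO_QUERY = 'http://localhost:8890/sparql'
VIRTUOSO_UPDATE = 'http://localhost:8890/sparql'
GRAPH_NAME = 'http://example.org/graph'  # placeholder graph IRI
BATCH_SIZE = 100                         # arbitrary; tune to what the endpoint tolerates

store = SPARQLUpdateStore(query_endpoint=VIRTUOSO_QUERY,
                          update_endpoint=VIRTUOSO_UPDATE,
                          method='POST',
                          autocommit=False)

intermediate_graph = Graph()
intermediate_graph.parse('./dump.ttl')

graph = Graph(store=store, identifier=GRAPH_NAME)

# Commit every BATCH_SIZE triples instead of after every single add.
for i, triple in enumerate(intermediate_graph, start=1):
    graph.add(triple)
    if i % BATCH_SIZE == 0:
        store.commit()
store.commit()  # flush whatever is left after the loop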
Update 03.06.2022:
I noticed that I was using an old version of Virtuoso with the docker image tenforce/virtuoso:latest (07.20.3233), so I switched to openlink/virtuoso-opensource-7:latest (07.20.3234). With that, my code for Virtuoso does not work anymore (same error as stated above).
Also, as TallTed correctly identified in his comment, I use /sparql for both query and update. I can do that because I gave the user SPARQL the SPARQL_UPDATE role in the Virtuoso Conductor interface. It is kind of a workaround for now, since I didn't get basic auth to work over rdflib. Could that have something to do with the problem, since I don't use /sparql-auth?
I'm trying to write a function in a Function App that manipulates data in a CosmosDB. I get it working if I drop the read-write key in the environment variables. To make it more robust I wanted it to work as a managed identity app. The app has the role 'DocumentDB Account Contributor' on the Cosmos DB.
However, the CosmosClient constructor doesn't accept a credential and needs the read-write key. I've been chasing down the rabbit hole of azure.mgmt.cosmosdb.operations, where there is a DatabaseAccountsOperations class with a list_keys() method. I can't find a neat way to access that function, though. If I try to create that object (which requires poaching the config, serializer and deserializer from my dbmgmt object) it still requires the resourceGroupName and accountName.
I can't help but think that I've taken a wrong turn somewhere, because this has to be possible in a more straightforward manner. Especially given that the JavaScript SDK references a more logical class, CosmosDBManagementClient, in line with the SubscriptionClient. However, I can't find that class anywhere on the Python side.
Any pointers?
import os

import azure.functions as func
from azure.identity import DefaultAzureCredential
from azure.cosmos import CosmosClient
from azure.mgmt.resource import SubscriptionClient
from azure.mgmt.cosmosdb import CosmosDB

from .cred_wrapper import CredentialWrapper


def main(req: func.HttpRequest) -> func.HttpResponse:
    request_body = req.get_body()

    # credential = DefaultAzureCredential()
    # https://gist.github.com/lmazuel/cc683d82ea1d7b40208de7c9fc8de59d
    credential = CredentialWrapper()
    uri = os.environ.get('cosmos-db-uri')
    # db = CosmosClient(url=uri, credential=credential)  # Doesn't work, wants a credential that is a RW/R key.
    # Does work if I replace it with my primary/secondary key, but the goal is to remove dependence on that.

    subscription_client = SubscriptionClient(credential)
    subscription = next(subscription_client.subscriptions.list())
    dbmgmt = CosmosDB(credential, subscription.subscription_id)  # This doesn't accept the DB URI??
    operations = list(dbmgmt.operations.list())  # I see the list_keys() operation there...
EDIT
A helpful soul provided a response here but removed it before I could even react or accept it as the answer. They pointed out that there is an equivalent Python SDK and that from azure.mgmt.cosmosdb import CosmosDBManagementClient would do the trick.
From there, I was on my own, as that resulted in
ImportError: cannot import name 'CosmosDBManagementClient' from 'azure.mgmt.cosmosdb'
I believe the root of the problem lies in an incompatibility of the azure-mgmt package. After removing azure-mgmt from my requirements.txt and only loading the cosmos- and identity-related packages, the import error was resolved.
This solved 90% of the problem.
dbmgmt = CosmosDBManagementClient(credential, subscription.subscription_id, c_uri)
print(dbmgmt.database_accounts.list_keys())
TypeError: list_keys() missing 2 required positional arguments: 'resource_group_name' and 'account_name'
Does one really need to collect each of these parameters? Compared to the example that reads a secret from a vault, it seems so convoluted.
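(For completeness, this is what the management-plane call looks like once those two values are supplied; the resource group, account name and subscription id below are placeholders:)
from azure.identity import DefaultAzureCredential
from azure.mgmt.cosmosdb import CosmosDBManagementClient

credential = DefaultAzureCredential()
dbmgmt = CosmosDBManagementClient(credential, '<subscription-id>')

# Both arguments really are required by the management-plane API.
keys = dbmgmt.database_accounts.list_keys(
    resource_group_name='<myResourceGroup>',
    account_name='<myCosmosAccount>',
)
print(keys.primary_master_key)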
For other unfortunate ones looking to access CosmosDB with Managed Identity, it seems that this is, as of May 2021, not yet possible.
Source: Discussion on Github
Update 12/05/2021 - I came here looking for a solution to this with JavaScript/TypeScript, so I'm leaving the answer here for others. I think that a similar approach could work for Python.
You can use RBAC for data plane operations with Managed Identities. Finding the documentation was difficult.
RBAC for Cosmos DB data plane operations with Managed Identities
Important - If you get the error Request blocked by Auth mydb : Request is blocked because principal [xxxxxx-6fad-44e4-98bc-2d423a88b65f] does not have required RBAC permissions to perform action Microsoft.DocumentDB/databaseAccounts/readMetadata on resource [/], don't use the Portal to assign roles; use the Azure CLI for Cosmos DB.
How-to - creating a role assignment for a user / system MSI / user-assigned MSI is done using the Azure Cosmos DB CLI:
# Find the role ID:
resourceGroupName='<myResourceGroup>'
accountName='<myCosmosAccount>'
az cosmosdb sql role definition list --account-name $accountName --resource-group $resourceGroupName
# Assign to the system MSI or user-assigned MSI:
readOnlyRoleDefinitionId='<roleDefinitionId>' # as fetched above
principalId='<aadPrincipalId>'
az cosmosdb sql role assignment create --account-name $accountName --resource-group $resourceGroupName --scope "/" --principal-id $principalId --role-definition-id $readOnlyRoleDefinitionId
Once this step is done, the code for connecting is very easy. Use the @azure/identity package's DefaultAzureCredential. This works in an Azure Function App with managed identity and on your laptop with VS Code or with az login.
Docs for the @azure/identity SDK
Examples of authentication with @azure/identity to get the credential object
import { CosmosClient } from "@azure/cosmos";
import { DefaultAzureCredential, ManagedIdentityCredential, ChainedTokenCredential } from "@azure/identity";

const defaultCredentials = new DefaultAzureCredential();
const managedCredentials = new ManagedIdentityCredential();
const aadCredentials = new ChainedTokenCredential(managedCredentials, defaultCredentials);

const client = new CosmosClient({
    endpoint: "https://mydb.documents.azure.com:443/",
    aadCredentials
});
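For Python, the equivalent appears to be passing an azure-identity credential straight to the azure-cosmos client. A sketch, assuming the role assignment above is in place and the installed azure-cosmos version supports AAD/RBAC authentication (the endpoint, database and container names are placeholders):
from azure.identity import DefaultAzureCredential
from azure.cosmos import CosmosClient

credential = DefaultAzureCredential()
client = CosmosClient(url='https://mydb.documents.azure.com:443/', credential=credential)

database = client.get_database_client('my-database')
container = database.get_container_client('my-container')

# Simple smoke test: read one item back through the AAD-authenticated client.
for item in container.query_items(query='SELECT TOP 1 * FROM c',
                                  enable_cross_partition_query=True):
    print(item)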
I am trying to read the titles of the open issues which don't have the label resolved. For that I am referring to the API documentation (https://docs.gitlab.com/ee/api/issues.html), which mentions NOT, but I wasn't able to get NOT to work.
The following Python script is what I have tried so far to read the list of issues; now I cannot find out how to use NOT to keep only the issues which don't have the resolved label.
import gitlab
# private token or personal token authentication
gl = gitlab.Gitlab('https://example.com', private_token='XXXYYYZZZ')
# make an API request to create the gl.user object. This is mandatory if you
# use the username/password authentication.
gl.auth()
# list all the issues
issues = gl.issues.list(all=True,scope='all',state='opened',assignee_username='username')
for issue in issues:
    print(issue.title)
From the GitLab issues API documentation, not is of type Hash. It's a special type documented here.
For example to exclude the labels Category:DAST and devops::secure, and to exclude the milestone 13.11, you would use the following parameters:
not[labels]=Category:DAST,devops::secure
not[milestone]=13.11
API example: https://gitlab.com/api/v4/issues?scope=all&state=opened&assignee_username=derekferguson&not[labels]=Category:DAST,devops::secure&not[milestone]=13.11
Using the gitlab Python module, you would need to pass some extra parameters by adding more keyword arguments:
import gitlab
gl = gitlab.Gitlab('https://gitlab.com')
extra_params = {
    'not[labels]': "Category:DAST,devops::secure",
    "not[milestone]": "13.11"
}

issues = gl.issues.list(all=True, scope='all', state='opened',
                        assignee_username='derekferguson', **extra_params)

for issue in issues:
    print(issue.title)
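Applied to the original question (open issues that do not carry the resolved label), the same pattern would look something like this; the host, token and username are the placeholders from the question:
import gitlab

gl = gitlab.Gitlab('https://example.com', private_token='XXXYYYZZZ')

# Exclude any issue that has the "resolved" label.
issues = gl.issues.list(all=True, scope='all', state='opened',
                        assignee_username='username',
                        **{'not[labels]': 'resolved'})

for issue in issues:
    print(issue.title)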
I love using the Boto API for Amazon Web Services, but now I can't find where the error is.
I'm using AWS to check domain availability and I have created a script in Python that includes the class at this link:
https://www.codatlas.com/github.com/boto/boto/develop/boto/route53/domains/layer1.py?line=67
I call the method check_domain_availability(), passing the domain name:
Route53DomainsConnection.check_domain_availability('example.com',None)
but the method returns this error:
AttributeError: 'str' object has no attribute 'make_request'
I have tried passing the parameters in many ways, but with no result.
Where am I wrong? Thanks in advance.
P.S.: I use Debian wheezy and Python 3.2.
More on the status of subdomains
I have found a method to get the status of a record just created with Route 53.
This is the code:
changes = ResourceRecordSets(conn, "ZONEID")
change = changes.add_change("STRING FOR ADD NEW SUBDOMAIN")
change.add_value(MY_IP)
status = changes.commit()
If I print the status variable, it contains the response of the commit, including the status:
{u'ChangeResourceRecordSetsResponse': {u'ChangeInfo': {u'Status': u'PENDING', ...
Now I would like to be able to switch to another operation only when the status of the subdomain is "INSYNC", but I am not able to dynamically access the string to check the status.
Can I use a while loop? Can I use a sleep command? Can anyone help me resolve my problem? Thanks.
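(A rough sketch of the kind of polling loop being asked about, using boto 2's Route 53 connection. The record values are hypothetical, and the dictionary paths into the parsed responses and the change-id handling are assumptions that may need adjusting:)
import time

from boto.route53.connection import Route53Connection
from boto.route53.record import ResourceRecordSets

conn = Route53Connection()  # picks up AWS credentials from the environment/boto config
changes = ResourceRecordSets(conn, 'ZONEID')
change = changes.add_change('CREATE', 'sub.example.com.', 'A')  # hypothetical record
change.add_value('203.0.113.10')
result = changes.commit()

# The commit response looks like the dict printed above.
change_info = result['ChangeResourceRecordSetsResponse']['ChangeInfo']
change_id = change_info['Id'].split('/')[-1]  # strip the leading '/change/' prefix
status = change_info['Status']

while status == 'PENDING':
    time.sleep(10)  # wait a bit before polling again
    status = conn.get_change(change_id)['GetChangeResponse']['ChangeInfo']['Status']

print('Change is now', status)  # Route 53 reports INSYNC once the change has propagated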
You don't show your code, which makes it harder to debug, but this line:
Route53DomainsConnection.check_domain_availability('example.com',None)
looks suspicious. It looks like you are trying to access the check_domain_availability method from the class rather than an instance of the class. I just did the following and it worked for me:
In [1]: import boto.route53.domains
In [2]: c = boto.route53.domains.connect_to_region('us-east-1')
In [3]: c.check_domain_availability('foobar.com')
Out[3]: {u'Availability': u'UNAVAILABLE'}