When running the Azure CLI command:
az storage account blob-service-properties show --account-name sa36730 --resource-group rg-exercise1
The output JSON contains the field isVersioningEnabled.
I am trying to get this field using the Python SDK.
I wrote this code, but the output doesn't contain the versioning information.
import pprint
from azure.storage.blob import BlobServiceClient

def blob_service_properties():
    connection_string = "<connection string>"
    # Instantiate a BlobServiceClient using a connection string
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    properties = blob_service_client.get_service_properties()
    pprint.pprint(properties)
My output looks like:
{'analytics_logging': <azure.storage.blob._models.BlobAnalyticsLogging object at 0x7ff0f8b7c340>,
'cors': [<azure.storage.blob._models.CorsRule object at 0x7ff1088b61c0>],
'delete_retention_policy': <azure.storage.blob._models.RetentionPolicy object at 0x7ff0f8b9b1c0>,
'hour_metrics': <azure.storage.blob._models.Metrics object at 0x7ff0f8b9b700>,
'minute_metrics': <azure.storage.blob._models.Metrics object at 0x7ff0f8b9b3d0>,
'static_website': <azure.storage.blob._models.StaticWebsite object at 0x7ff0f8ba5c10>,
'target_version': None}
Is there a way to get the versioning information using the Python SDK for storage blobs?
I tried the steps below and they worked in my environment.
To get the blob service properties, you can use the azure-mgmt-storage package. The code below lists the blob service properties, including the versioning flag.
Code:
from azure.mgmt.storage import StorageManagementClient
from azure.identity import DefaultAzureCredential
storage_client = StorageManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your subscription Id>",
)
blob_service_list = storage_client.blob_services.list(
    "<your resource group name>", "<your account name>"
)
for items in blob_service_list:
    print(items)
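If you only need the versioning flag, the management client can also return it directly. A minimal sketch continuing from the code above, assuming the blob_services.get_service_properties operation and the is_versioning_enabled attribute on the returned model (it maps to the isVersioningEnabled field shown by the CLI):

props = storage_client.blob_services.get_service_properties(
    "<your resource group name>", "<your account name>"
)
# The management-plane model exposes the versioning flag directly.
print(props.is_versioning_enabled)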
I am using an Azure storage account connection string to load a data file into an Azure Blob Storage container using a Python program. Here is a snippet of my program:
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
... ...
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_name = "test"
# Create the container
container_client = blob_service_client.create_container(container_name)
upload_file_path = "dummy_data.xlsx"
blob_client = blob_service_client.get_blob_client(container=container_name, blob=upload_file_path)
# Upload file
with open(file=upload_file_path, mode="rb") as data:
    blob_client.upload_blob(data)
My program successfully created a container in the blob storage, but failed to load data into the container, with an error message like this:
ClientAuthenticationError: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:adfasa-asdfa0adfa
Time:2022-10-25T20:32:19.0165690Z
ErrorCode:AuthenticationFailed
authenticationerrordetail:The MAC signature found in the HTTP request 'bacadreRER=' is not the same as any computed signature. Server used following string to sign: 'PUT
I got stuck on this error. I tried using a SAS key and it worked. Why is it not working with a connection string? I am following Microsoft's code example to write my program:
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?tabs=managed-identity%2Croles-azure-portal%2Csign-in-azure-cli
I tried manually uploading the data file with the Azure Portal, and it worked. Using a SAS key string in my Python code also worked, but it didn't work with the access key connection string. It's odd that with the connection string I could still create a container successfully.
I tried in my environment and got the results below.
I executed the same code and successfully uploaded the file to blob storage.
Code:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
connect_str="DefaultEndpointsProtocol=https;AccountName=storage326123;AccountKey=3Lf7o2+vi3HgGKmUWaIG4xVdyzrzhxW5NxDNaUGVwykBPT5blZNKIyjbQlo0OAfuz0nllLUOGLRs+ASt9gqF+Q==;EndpointSuffix=core.windows.net"
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_name = "test"
# Create the container
container_client = blob_service_client.create_container(container_name)
upload_file_path = "C:\\Users\\v-vsettu\\Downloads\\dog.jpg"
blob_client = blob_service_client.get_blob_client(container=container_name, blob=upload_file_path)
# Upload file
with open(file=upload_file_path, mode="rb") as data:
    blob_client.upload_blob(data)
ClientAuthenticationError: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:adfasa-asdfa0adfa
Time:2022-10-25T20:32:19.0165690Z ErrorCode:AuthenticationFailed
authenticationerrordetail:The MAC signature found in the HTTP request 'bacadreRER=' is not the same as any computed signature. Server used following string to sign: 'PUT
The above error usually means something is missing or malformed in the connection string (the account key used to compute the signature, in particular), so double-check that you copied the complete connection string.
You can get the connection string from the Azure portal, under your storage account's Access keys blade.
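Alternatively, here is a hedged sketch that pulls the account key with the azure-mgmt-storage management client and assembles the connection string itself (subscription, resource group and account names are placeholders):

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

storage_client = StorageManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your subscription id>",
)
# Fetch the account keys and build the connection string from the first one.
keys = storage_client.storage_accounts.list_keys(
    "<your resource group>", "<your account name>"
)
connect_str = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=<your account name>;"
    f"AccountKey={keys.keys[0].value};"
    "EndpointSuffix=core.windows.net"
)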
To create a default BigQuery client I use:
from google.cloud import bigquery
client = bigquery.Client()
This uses the (default) credentials available in the environment.
But how can I then see which (default) service account is used?
While you can interrogate the credentials directly (be it JSON keys, the metadata server, etc.), I have occasionally found it valuable to simply query BigQuery using the SESSION_USER() function.
Something quick like this should suffice:
client = bigquery.Client()
query_job = client.query("SELECT SESSION_USER() as whoami")
results = query_job.result()
for row in results:
    print("i am {}".format(row.whoami))
This led me in the right direction:
Google BigQuery Python Client using the wrong credentials
To see the service account used you can do:
client._credentials.service_account_email
However:
The statement above works when you run it in a Jupyter notebook (in Vertex AI), but when you run it in a Cloud Function with print(client._credentials.service_account_email), it just logs 'default' to Cloud Logging. However, the default service account for a Cloud Function should be <project_id>@appspot.gserviceaccount.com.
This will also give you the wrong answer:
client.get_service_account_email()
The call to client.get_service_account_email() does not return the credential's service account email address. Instead, it returns the BigQuery service account email address used for KMS encryption/decryption.
Following John Hanley's comment, when running on Compute Engine you can query the metadata server to get the service account email:
https://cloud.google.com/compute/docs/metadata/default-metadata-values
So you can either use curl from a shell:
curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email" -H "Metadata-Flavor: Google"
Or Python:
import requests
headers = {'Metadata-Flavor': 'Google'}
response = requests.get(
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email",
    headers=headers,
)
print(response.text)
The default in the URL is an alias for the actual service account in use.
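Another option, as a hedged sketch, is to inspect the application default credentials with the google-auth library. On Compute Engine or Cloud Functions the email usually resolves only after a refresh against the metadata server, and user (gcloud) credentials have no service_account_email attribute at all:

import google.auth
import google.auth.transport.requests

# Application default credentials; the cloud-platform scope keeps refresh
# working for service-account key files as well.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())
# User credentials have no service_account_email, hence the fallback.
print(getattr(credentials, "service_account_email", "<no service account email>"))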
I'm developing an API using Azure Function Apps. The API works fine locally (using localhost). However, after publishing to the Function App, I'm getting this error:
[Errno 30] Read-only file system
This error started after I moved the connection logic into a function so that a new connection is established every time the API is requested. The data is read from an Azure Blob Storage container.
The code:
DBConnection.py:
import os, uuid
from azure.storage.blob import BlockBlobService, AppendBlobService
from datetime import datetime
import pandas as pd
import dask.dataframe as dd
import logging
def BlobConnection():
    try:
        print("Connecting...")
        # Establish connection
        container_name = 'somecontainer'
        blob_name = 'some_name.csv'
        file_path = 'somepath'
        account_name = 'XXXXXX'
        account_key = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
        blobService = BlockBlobService(account_name=account_name, account_key=account_key)
        blobService.get_blob_to_path(container_name, blob_name, file_path)
        df = dd.read_csv(file_path, dtype={'Bearing': 'int64', 'Speed': 'int64'})
        df = df.compute()
        return df
    except Exception as ex:
        print('Unable to connect!')
        print('Exception:')
        print(ex)
You are probably running from a package or ZIP (run-from-package).
If so, when your code runs, the following line tries to save the blob to the local (read-only) file system and can't. If you update it to use get_blob_to_bytes or get_blob_to_stream instead, you should be fine; see the sketch after the quoted line.
blobService.get_blob_to_path(container_name, blob_name, file_path)
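A minimal sketch of the in-memory variant, assuming the legacy azure-storage-blob 2.x BlockBlobService used in the question (names and the key are placeholders):

import io

import pandas as pd
from azure.storage.blob import BlockBlobService

container_name = 'somecontainer'
blob_name = 'some_name.csv'
account_name = 'XXXXXX'
account_key = '<account key>'

blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
# get_blob_to_bytes keeps the download in memory, so nothing is written
# to the read-only file system.
blob = blob_service.get_blob_to_bytes(container_name, blob_name)
df = pd.read_csv(io.BytesIO(blob.content), dtype={'Bearing': 'int64', 'Speed': 'int64'})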
From https://stackoverflow.com/questions/53630773/how-to-disable-read-only-mode-in-azure-function-app:
Part 1 - Disabling read-only mode
You'll likely find if you're using the latest tools that your function app is in run-from-package mode, which means it's reading the files directly from the uploaded ZIP and so there's no way to edit it. You can turn that off by deleting the WEBSITE_RUN_FROM_ZIP or WEBSITE_RUN_FROM_PACKAGE application setting in the portal. Note this will clear your function app until the next time you publish.
If your tools are a little older, or if you've deployed using the latest tools but with func azure functionapp publish my-app-name --nozip then you can use the App Service Editor in Platform Features in the portal to edit the function.json files and remove the "generatedBy" setting, which will stop them being read-only.
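If you would rather script the setting removal than use the portal, here is a hedged sketch with the azure-mgmt-web management client (resource group, app name and subscription id are placeholders; double-check the operation names against your SDK version):

from azure.identity import DefaultAzureCredential
from azure.mgmt.web import WebSiteManagementClient

web_client = WebSiteManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your subscription id>",
)

# Read the function app's current application settings.
settings = web_client.web_apps.list_application_settings(
    "<your resource group>", "<your function app name>"
)

# Drop the run-from-package flags and write the settings back.
settings.properties.pop("WEBSITE_RUN_FROM_PACKAGE", None)
settings.properties.pop("WEBSITE_RUN_FROM_ZIP", None)
web_client.web_apps.update_application_settings(
    "<your resource group>", "<your function app name>", settings
)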
I am attempting to create a Python script to connect to and interact with my AWS account. I was reading up on it here https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html
and I see that it reads your credentials from ~/.aws/credentials (on a Linux machine). However, I am not connecting with an IAM user but with an SSO user. Thus, the profile connection data I use is located in the ~/.aws/sso/cache directory.
Inside that directory, I see two JSON files. One has the following keys:
startUrl
region
accessToken
expiresAt
the second has the following keys:
clientId
clientSecret
expiresAt
I don't see anything in the docs about how to tell it to use my SSO user.
Thus, when I try to run my script, I get an error such as
botocore.exceptions.ClientError: An error occurred (AuthFailure) when calling the DescribeSecurityGroups operation: AWS was not able to validate the provided access credentials
even though I can run the same command fine from the command prompt.
This was fixed in boto3 1.14.
So given you have a profile like this in your ~/.aws/config:
[profile sso_profile]
sso_start_url = <sso-url>
sso_region = <sso-region>
sso_account_id = <account-id>
sso_role_name = <role>
region = <default region>
output = <default output (json or text)>
And then log in with
$ aws sso login --profile sso_profile
You will be able to create a session:
import boto3
boto3.setup_default_session(profile_name='sso_profile')
client = boto3.client('<whatever service you want>')
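To confirm which identity that profile actually resolved to, a quick sanity check with STS (GetCallerIdentity works with any valid credentials and needs no extra permissions):

import boto3

boto3.setup_default_session(profile_name='sso_profile')
sts = boto3.client('sts')
# Prints the ARN of the SSO role the session is using.
print(sts.get_caller_identity()['Arn'])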
So here's the long and hairy answer tested on boto3==1.21.39:
It's an eight-step process where you:
register the client using sso-oidc.register_client
start the device authorization flow using sso-oidc.start_device_authorization
redirect the user to the sso login page using webbrowser.open
poll sso-oidc.create_token until the user completes the sign-in
list and present the account roles to the user using sso.list_account_roles
get role credentials using sso.get_role_credentials
create a new boto3 session with the session credentials from (6)
eat a cookie
Step 8 is really key and should not be overlooked as part of any successful authorization flow.
In the sample below, account_id should be the ID of the account you are trying to get credentials for, and start_url should be the URL that AWS generates for you to start the SSO flow (in the AWS SSO management console, under Settings).
from time import time, sleep
import webbrowser
from boto3.session import Session
session = Session()
account_id = '1234567890'
start_url = 'https://d-0987654321.awsapps.com/start'
region = 'us-east-1'
sso_oidc = session.client('sso-oidc')
client_creds = sso_oidc.register_client(
    clientName='myapp',
    clientType='public',
)
device_authorization = sso_oidc.start_device_authorization(
    clientId=client_creds['clientId'],
    clientSecret=client_creds['clientSecret'],
    startUrl=start_url,
)
url = device_authorization['verificationUriComplete']
device_code = device_authorization['deviceCode']
expires_in = device_authorization['expiresIn']
interval = device_authorization['interval']
webbrowser.open(url, autoraise=True)
for n in range(1, expires_in // interval + 1):
    sleep(interval)
    try:
        token = sso_oidc.create_token(
            grantType='urn:ietf:params:oauth:grant-type:device_code',
            deviceCode=device_code,
            clientId=client_creds['clientId'],
            clientSecret=client_creds['clientSecret'],
        )
        break
    except sso_oidc.exceptions.AuthorizationPendingException:
        pass
access_token = token['accessToken']
sso = session.client('sso')
account_roles = sso.list_account_roles(
    accessToken=access_token,
    accountId=account_id,
)
roles = account_roles['roleList']
# simplifying here for illustrative purposes
role = roles[0]
role_creds = sso.get_role_credentials(
    roleName=role['roleName'],
    accountId=account_id,
    accessToken=access_token,
)
session = Session(
    region_name=region,
    aws_access_key_id=role_creds['roleCredentials']['accessKeyId'],
    aws_secret_access_key=role_creds['roleCredentials']['secretAccessKey'],
    aws_session_token=role_creds['roleCredentials']['sessionToken'],
)
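As a usage example continuing from the session built above, the DescribeSecurityGroups call from the original question would then look like this:

# Uses the `session` and `region` variables from the snippet above.
ec2 = session.client('ec2', region_name=region)
for sg in ec2.describe_security_groups()['SecurityGroups']:
    print(sg['GroupId'], sg['GroupName'])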
Your current .aws/sso/cache folder structure looks like this:
$ ls
botocore-client-XXXXXXXX.json cXXXXXXXXXXXXXXXXXXX.json
The two JSON files contain three useful parameters.
botocore-client-XXXXXXXX.json -> clientId and clientSecret
cXXXXXXXXXXXXXXXXXXX.json -> accessToken
Using the access token in cXXXXXXXXXXXXXXXXXXX.json you can call get-role-credentials. The output from this command can be used to create a new session.
Your Python file should look something like this:
import json
import os
import boto3
dir = os.path.expanduser('~/.aws/sso/cache')
json_files = [pos_json for pos_json in os.listdir(dir) if pos_json.endswith('.json')]

for json_file in json_files:
    path = dir + '/' + json_file
    with open(path) as file:
        data = json.load(file)
        if 'accessToken' in data:
            accessToken = data['accessToken']

client = boto3.client('sso', region_name='us-east-1')
response = client.get_role_credentials(
    roleName='string',
    accountId='string',
    accessToken=accessToken
)
session = boto3.Session(
    aws_access_key_id=response['roleCredentials']['accessKeyId'],
    aws_secret_access_key=response['roleCredentials']['secretAccessKey'],
    aws_session_token=response['roleCredentials']['sessionToken'],
    region_name='us-east-1'
)
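To fill in the roleName and accountId placeholders, you can list what the token gives you access to. A short sketch continuing from the script above (same client and accessToken):

# Enumerate the accounts and roles reachable with this access token.
accounts = client.list_accounts(accessToken=accessToken)['accountList']
for account in accounts:
    roles = client.list_account_roles(
        accessToken=accessToken, accountId=account['accountId']
    )['roleList']
    for role in roles:
        print(account['accountId'], role['roleName'])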
A well-formed boto3-based script should transparently authenticate based on the profile name. It is not recommended to handle the cached files, keys, or tokens yourself, since the official code paths might change in the future. To see the state of your profile(s), run aws configure list with the corresponding profile; for example:
$ aws configure list --profile=sso
Name Value Type Location
---- ----- ---- --------
profile sso manual --profile
The SSO session associated with this profile has expired or is otherwise invalid.
To refresh this SSO session run aws sso login with the corresponding profile.
$ aws configure list --profile=old
Name Value Type Location
---- ----- ---- --------
profile old manual --profile
access_key ****************3DSx shared-credentials-file
secret_key ****************sX64 shared-credentials-file
region us-west-1 env ['AWS_REGION', 'AWS_DEFAULT_REGION']
What works for me is the following:
import boto3
session = boto3.Session(profile_name="sso_profile_name")
session.resource("whatever")
using boto3==1.20.18.
This would work if you had previously configured SSO for aws ie. aws configure sso.
Interestingly enough, I don't have to go through this if I use IPython; I just run aws sso login beforehand and then call boto3.Session().
I am trying to figure out whether there is something wrong with my approach. I fully agree with what was said above with respect to transparency, and although this is a working solution, I am not in love with it.
EDIT: there was something wrong and here is how I fixed it:
run aws configure sso (as above);
install aws-vault - it basically replaces aws sso login --profile <profile-name>;
run aws-vault exec <profile-name> to create a sub-shell with AWS credentials exported to environment variables.
Doing so, it is possible to run any boto3 command both interactively (e.g. IPython) and from a script, as in my case. Therefore, the snippet above simply becomes:
import boto3
session = boto3.Session()
session.resource("whatever")
See here for further details on aws-vault.
I have a BigQuery dataset defined in Google Cloud with my userA account, and I want my colleague userB, who is a member of the same group, to be able to see the dataset that I have defined. Using the bq command-line interface, userB can see the project, but not the dataset. How can I share the dataset created by userA with userB using a Python script?
Another thing you may run into is that you must give access at the dataset level in BigQuery. Depending on how you have set up user roles in Cloud Platform and BigQuery, you may need to give the service account direct access to the BigQuery dataset.
To do this, go into BigQuery, hover over your dataset, click the down arrow, and select 'Share dataset'. A modal will open where you can specify which email addresses and service accounts to share the dataset with and control their access rights.
Let me know if my instructions are too confusing and I'll upload some images showing exactly how to do this.
Good Luck!!
An example using the Python Client Library. Adapted from here but adding a get_dataset call to get the current ACL policy for already existing datasets:
from google.cloud import bigquery
project_id = "PROJECT_ID"
dataset_id = "DATASET_NAME"
group_name = "google-group-name@google.com"
role = "READER"
client = bigquery.Client(project=project_id)
dataset_info = client.get_dataset(client.dataset(dataset_id))
access_entries = dataset_info.access_entries
access_entries.append(
    bigquery.AccessEntry(role, "groupByEmail", group_name)
)
dataset_info.access_entries = access_entries
dataset_info = client.update_dataset(
    dataset_info, ['access_entries']
)
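As a quick check after the update, you can re-fetch the dataset and print its access entries (continuing from the snippet above):

# Re-read the dataset and confirm the new group entry is present.
updated = client.get_dataset(client.dataset(dataset_id))
for entry in updated.access_entries:
    # Each AccessEntry exposes role, entity_type and entity_id.
    print(entry.role, entry.entity_type, entry.entity_id)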
Another way to do it is using the Google Python API Client and the get and patch methods. First, we retrieve the existing dataset ACL, add the group as READER to the response and patch the dataset metadata:
from oauth2client.client import GoogleCredentials
from googleapiclient import discovery
project_id="PROJECT_ID"
dataset_id="DATASET_NAME"
group_name="google-group-name@google.com"
role="READER"
credentials = GoogleCredentials.get_application_default()
bq = discovery.build("bigquery", "v2", credentials=credentials)
response = bq.datasets().get(projectId=project_id, datasetId=dataset_id).execute()
response['access'].append({u'role': u'{}'.format(role), u'groupByEmail': u'{}'.format(group_name)})
bq.datasets().patch(projectId=project_id, datasetId=dataset_id, body=response).execute()
Replace the project_id, dataset_id, group_name and role variables accordingly.
Versions used:
$ pip freeze | grep -E 'bigquery|api-python'
google-api-python-client==1.7.7
google-cloud-bigquery==1.8.1