I am trying to download data from an Azure Storage container using Python. Using account keys is not an option, so I am trying to use Azure AD, but I have not been able to make it work so far. I am primarily using this doc for reference: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-directory-file-acl-python.
Code to connect using Azure AD:
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

def initialize_storage_account_ad(storage_account_name, client_id, client_secret, tenant_id):
    try:
        global service_client

        credential = ClientSecretCredential(tenant_id, client_id, client_secret)

        service_client = DataLakeServiceClient(
            account_url="{}://{}.dfs.core.windows.net".format("https", storage_account_name),
            credential=credential)
    except Exception as e:
        print(e)
Code to download data:
def download_file_from_directory():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")
        directory_client = file_system_client.get_directory_client("my-directory")

        local_file = open("C:\\file-to-download.txt", 'wb')

        file_client = directory_client.get_file_client("uploaded-file.txt")

        download = file_client.download_file()
        downloaded_bytes = download.readall()

        local_file.write(downloaded_bytes)
        local_file.close()
    except Exception as e:
        print(e)
Now I know that I have the download set up correctly, because I am able to get the data when I use the account key. But somehow, no success when using Azure AD to connect. I started by registering an app and finding the tenant ID/client ID/client secret. I have also granted the registered app permissions to Azure Storage and enabled the implicit grant flow (ID tokens). Am I still missing anything? Any help is appreciated.
If you're using Azure Active Directory (Azure AD) to authorize access, make sure that you have assigned the Storage Blob Data Owner role and granted access to Azure Blob data with RBAC in the Azure portal.
You'll have to assign one of the following Azure role-based access control (Azure RBAC) roles to your security principal.
Storage Blob Data Owner: All directories and files in the account.
Storage Blob Data Contributor: Only directories and files owned by the security principal.
For more details, refer to this document.
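Once the role assignment has propagated, a minimal end-to-end call of the two functions from the question could look like the sketch below (the storage account name and credential values are placeholders, not real values):

# Usage sketch, assuming the two functions above are defined and the
# service principal has a Storage Blob Data role on the storage account.
initialize_storage_account_ad(
    storage_account_name="mystorageaccount",   # placeholder
    client_id="<client-id>",                    # placeholder
    client_secret="<client-secret>",            # placeholder
    tenant_id="<tenant-id>")                    # placeholder

download_file_from_directory()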
My SQL Server DB password is saved in Azure Key Vault, which has a DATAREF ID as an identifier. I need that password to create a Spark dataframe from a table which is present in SQL Server. I am running this .py file on a Google Dataproc cluster. How can I get that password using Python?
Since you are accessing an Azure service from a non-Azure service, you will need a service principal. You can use a certificate or a secret. See this link for the different methods. You will need to give the service principal proper access, and this will depend on whether you are using RBAC or an access policy for your key vault.
So the steps you need to follow are:
Create a key vault and create a secret.
Create a service principal or application registration. Store the client ID, client secret and tenant ID.
Give the service principal proper access to the key vault (if you are using access policies) or to the specific secret (if you are using the RBAC model).
The Python documentation link for the code is here.
The code that will work for you is below:
from azure.identity import ClientSecretCredential
from azure.keyvault.secrets import SecretClient

tenantid = "<your_tenant_id>"
clientsecret = "<your_client_secret>"
clientid = "<your_client_id>"

my_credentials = ClientSecretCredential(tenant_id=tenantid, client_id=clientid, client_secret=clientsecret)

secret_client = SecretClient(vault_url="https://<your_keyvault_name>.vault.azure.net/", credential=my_credentials)
secret = secret_client.get_secret("<your_secret_name>")

print(secret.name)
print(secret.value)
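Since the goal is to build a Spark dataframe from SQL Server, a rough sketch of plugging the retrieved secret into a JDBC read might look like the following. The server, database, table and user names are placeholders, and this assumes the SQL Server JDBC driver is available on the Dataproc cluster:

# Sketch only: use the Key Vault secret as the JDBC password in a Spark read.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sqlserver-read").getOrCreate()

jdbc_url = "jdbc:sqlserver://<your_server>:1433;databaseName=<your_db>"  # placeholder

df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "<your_table>")          # placeholder
      .option("user", "<your_sql_user>")          # placeholder
      .option("password", secret.value)           # the secret retrieved above
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load())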
I am using the below code to copy a blob across different storage accounts, but it fails with the below error:
src_blob = '{0}/{1}?{2}'.format('source_url', b_name, 'sp=rw&st=2022-11-17T20:44:03Z&se=2022-12-31T04:44:03Z&spr=https&sv=2021-06-08&sr=c&sig=ZXRe2FptVF5ArRM%2BKDAkLboCN%2FfaD9Mx38yZGWhnps0%3D')

# The connection string has a SAS token which has sr=c
destination_client = BlobServiceClient.from_connection_string("destination_connection_string")

copied_blob = destination_client.get_blob_client('standardfeed', b_name)
copied_blob.start_copy_from_url(src_blob)
ErrorCode: AuthorizationPermissionMismatch
This request is not authorized to perform this operation using this permission.
Is anything missing, or did I copy the wrong SAS token?
I tried in my environment and successfully copied a blob from one storage account to another storage account.
Code:
from azure.storage.blob import BlobServiceClient
b_name="sample1.pdf"
src_blob = '{0}/{1}?{2}'.format('https://venkat123.blob.core.windows.net/test', b_name, 'sp=r&st=2022-11-18T07:46:10Z&se=2022-11-18T15:46:10Z&spr=https&sv=<SAS token>')
destination_client = BlobServiceClient.from_connection_string("<connection string>")
copied_blob = destination_client.get_blob_client('test1', b_name)
copied_blob.start_copy_from_url(src_blob)
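If you prefer to generate the source SAS in code instead of copying it from the portal, a sketch using generate_blob_sas is shown below. It builds on the variables above and assumes you have the source account's name and key (both placeholders here):

from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

# Sketch: build a read-only SAS for the source blob (account name/key are assumptions).
sas_token = generate_blob_sas(
    account_name="<source_account>",
    container_name="test",
    blob_name=b_name,
    account_key="<source_account_key>",
    permission=BlobSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=1))

src_blob = "https://<source_account>.blob.core.windows.net/test/{}?{}".format(b_name, sas_token)
copied_blob.start_copy_from_url(src_blob)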
Make sure you have the necessary permissions; for authentication purposes you need to assign the following roles in your storage account:
Storage Blob Data Contributor
Storage Blob Data Reader
Update:
You can get the connection string through the portal.
Reference:
Azure Blob Storage "Authorization Permission Mismatch" error for get request with AD token - Stack Overflow
I want to create a Python script that can do some really basic things (add/remove files in a shared Google Drive).
The script will be running on remote PCs, so anything involving web authentication is off the table.
I read online that using a service account was the way to go, so I did the following:
Created a project
Enabled the Google Drive API for the project
Created a service account for the project (left "Grant this service account access to project" blank; I couldn't figure out what, if anything, I should put there)
Created a key for the service account, downloaded it, and tried to use it the following way:
import json

from google.oauth2 import service_account
from googleapiclient.discovery import build

service_account_info = json.load(open('service_account.json'))
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly',
          'https://www.googleapis.com/auth/drive.file',
          'https://www.googleapis.com/auth/drive']
creds = service_account.Credentials.from_service_account_info(service_account_info, scopes=SCOPES)
service = build('drive', 'v3', credentials=creds)
results = service.files().list(pageSize=10, fields="nextPageToken, files(id, name)").execute()
items = results.get('files', [])
Which results in an exception after my "results" line executes:
google.auth.exceptions.RefreshError: ('No access token in response.',
{'id_token': '[longtoken'})
I saw something that hinted I needed to use the "OAuth 2.0 Playground", where I tick "Use your own OAuth credentials", specify the client ID and client secret, and authorize the Google Drive APIs, but when I try to do that I just get "Error 400: redirect_uri_mismatch" (and I'm not sure if resolving this will actually get me anywhere).
Is there a viable way of doing what I want to do?
You're missing these steps:
Setting up the OAuth consent screen
Creating OAuth client ID credentials for a desktop app using that OAuth consent
Tunneling a remote port from your server to your local machine
A service account is possible, but it is also possible to authenticate on a remote machine; see here.
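If you do go the service-account route, the usual missing piece is sharing the Drive folder (or shared drive) with the service account's email address and passing the shared-drive flags on the list call. A rough sketch building on the question's code; the folder ID is a placeholder:

# Sketch: list files visible to the service account, including shared drives.
# The service account's client_email (from the JSON key) must first be given
# access to the shared drive or folder.
results = service.files().list(
    pageSize=10,
    fields="nextPageToken, files(id, name)",
    supportsAllDrives=True,
    includeItemsFromAllDrives=True,
    q="'<shared-folder-id>' in parents").execute()   # placeholder folder ID
items = results.get('files', [])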
I am trying to download a file from an Amazon S3 bucket to my local device using the below code, but I got an error saying "Unable to locate credentials".
Given below is the code I have written:
import boto3
import botocore
BUCKET_NAME = 'my-bucket'
KEY = 'my_image_in_s3.jpg'
s3 = boto3.resource('s3')
try:
s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
except botocore.exceptions.ClientError as e:
if e.response['Error']['Code'] == "404":
print("The object does not exist.")
else:
raise
Could anyone help me with this? Thanks in advance.
AWS uses a shared credentials system for the AWS CLI and all other AWS SDKs; this way there is no risk of leaking your AWS credentials into some code repository. AWS security best practices recommend using a shared credentials file, which on Linux is usually located at
~/.aws/credentials
This file contains an access key and secret key which are used by all SDKs and the AWS CLI. The file can be created manually or automatically using this command:
aws configure
It will ask a few questions and create the credentials file for you. Note that you need to create a user with appropriate permissions before accessing AWS resources.
For more information, see the link below:
AWS cli configuration
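Once aws configure has been run (or ~/.aws/credentials exists), the code in the question should pick up the credentials automatically. If you keep several profiles in that file, a small sketch of selecting one explicitly (the profile name is a placeholder):

import boto3

# Sketch: use a named profile from ~/.aws/credentials (profile name is a placeholder).
session = boto3.Session(profile_name='my-profile')
s3 = session.resource('s3')
s3.Bucket('my-bucket').download_file('my_image_in_s3.jpg', 'my_local_image.jpg')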
You are not using the session you created to download the file; you're using the s3 client you created. If you want to use the client, you need to specify credentials.
your_bucket.download_file('k.png', '/Users/username/Desktop/k.png')
or
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('your_bucket','k.png','/Users/username/Desktop/k.png')
I am trying to upload a folder from my local machine to a Google Cloud bucket. I get an error with the credentials. Where should I provide the credentials, and what information is needed in them?
from_dest = '/Users/xyzDocuments/tmp'
gsutil_link = 'gs://bucket-1991'

from google.cloud import storage

try:
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print('File {} uploaded to {}.'.format(source_file_name, destination_blob_name))
except Exception as e:
    print(e)
The error is
could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://developers.google.com/accounts/docs/application-default-credentials.
You need to acquire the application default credentials for your project and set them as an environment variable:
Go to the Create service account key page in the GCP Console.
From the Service account drop-down list, select New service account.
Enter a name into the Service account name field.
From the Role drop-down list, select Project > Owner.
Click Create. A JSON file that contains your key downloads to your computer.
Then, set an environment variable which will provide the application credentials to your application when it runs locally:
$ export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
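If you would rather set the variable from Python itself (for example in a local test script), a small sketch, with the key file path kept as a placeholder:

import os

# Sketch: point the client library at the downloaded key file before creating the client.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/user/Downloads/[FILE_NAME].json"  # placeholder path

from google.cloud import storage
storage_client = storage.Client()  # now picks up the credentials from the variable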
This error message is usually thrown when the application is not being authenticated correctly due to several reasons, such as missing files, invalid credential paths, or incorrect environment variable assignments, among other causes. Keep in mind that when you set an environment variable value in a session, it is reset every time the session is dropped.
Based on this, I recommend that you validate that the credential file and file path are being correctly assigned, and also follow the Obtaining and providing service account credentials manually guide in order to explicitly specify your service account file directly in your code; this way, you will be able to set it permanently and verify that you are passing the service credentials correctly.
Example of passing the path to the service account key in code:
def explicit():
    from google.cloud import storage

    # Explicitly use service account credentials by specifying the private key file.
    storage_client = storage.Client.from_service_account_json('service_account.json')

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
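Since the question is about uploading, a sketch of the same explicit-credentials approach applied to an upload might look like this (the bucket name and paths reuse the question's values, and the file name is a placeholder):

from google.cloud import storage

# Sketch: upload one file using an explicit service account key (values are placeholders).
storage_client = storage.Client.from_service_account_json('service_account.json')
bucket = storage_client.get_bucket('bucket-1991')
blob = bucket.blob('tmp/my_file.txt')
blob.upload_from_filename('/Users/xyzDocuments/tmp/my_file.txt')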