I am using the code below to copy a blob across different storage accounts, but it fails with the error below:
from azure.storage.blob import BlobServiceClient
src_blob = '{0}/{1}?{2}'.format('source_url', b_name, 'sp=rw&st=2022-11-17T20:44:03Z&se=2022-12-31T04:44:03Z&spr=https&sv=2021-06-08&sr=c&sig=ZXRe2FptVF5ArRM%2BKDAkLboCN%2FfaD9Mx38yZGWhnps0%3D')
destination_client = BlobServiceClient.from_connection_string("destination_connection_string")  # The connection string has a SAS token with sr=c
copied_blob = destination_client.get_blob_client('standardfeed', b_name)
copied_blob.start_copy_from_url(src_blob)
ErrorCode: AuthorizationPermissionMismatch
This request is not authorized to perform this operation using this permission.
Is anything missing, or did I copy the wrong SAS token?
I tried this in my environment and successfully copied a blob from one storage account to another.
Code:
from azure.storage.blob import BlobServiceClient
b_name="sample1.pdf"
src_blob = '{0}/{1}?{2}'.format('https://venkat123.blob.core.windows.net/test', b_name, 'sp=r&st=2022-11-18T07:46:10Z&se=2022-11-18T15:46:10Z&spr=https&sv=<SAS token>')
destination_client = BlobServiceClient.from_connection_string("<connection string>")
copied_blob = destination_client.get_blob_client('test1', b_name)
copied_blob.start_copy_from_url(src_blob)
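When start_copy_from_url fails with AuthorizationPermissionMismatch, one thing worth checking is the sp (signed permissions) field of each SAS involved. As far as I understand the Azure SAS permission docs, the source SAS needs read (r), and the destination needs write (w) or, for a new blob, create (c). The sketch below is a stdlib-only helper; sas_permissions and copy_permission_problems are hypothetical names, not part of the Azure SDK. For a connection string, pass the part after SharedAccessSignature=. It only covers SAS permissions; if you authenticate with Azure AD, RBAC roles apply instead.

```python
from urllib.parse import parse_qs, urlsplit


def sas_permissions(sas):
    """Return the set of permission letters in a SAS token's 'sp' field.

    `sas` may be a full SAS URL or a bare query string such as 'sp=r&se=...'.
    """
    query = urlsplit(sas).query if "://" in sas else sas.lstrip("?")
    params = parse_qs(query)
    return set(params.get("sp", [""])[0])


def copy_permission_problems(source_sas, dest_sas):
    """Hypothetical pre-flight check for a cross-account start_copy_from_url."""
    problems = []
    # The copy source is read through its SAS, so it needs read (r).
    if "r" not in sas_permissions(source_sas):
        problems.append("source SAS lacks read (r)")
    # The destination needs write (w), or create (c) when the blob is new.
    if not sas_permissions(dest_sas) & {"w", "c"}:
        problems.append("destination SAS lacks write (w) or create (c)")
    return problems
```

For example, a destination SAS with sp=rl would be reported as lacking write/create, which would be consistent with the error above.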
Make sure you have the necessary permissions: for authentication, you need to assign one of the following roles in your storage account:
Storage Blob Data Contributor
Storage Blob Data Reader
Update:
You can get the connection string from the Azure portal.
Reference:
Azure Blob Storage "Authorization Permission Mismatch" error for get request with AD token - Stack Overflow
Related
I am trying to download data from an Azure Storage container using Python. Using account keys is not an option, so I am trying to use Azure AD, but I have not been able to make it work so far. I am primarily using the doc here for reference: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-directory-file-acl-python.
Code to connect using Azure AD:
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient
def initialize_storage_account_ad(storage_account_name, client_id, client_secret, tenant_id):
    try:
        global service_client
        credential = ClientSecretCredential(tenant_id, client_id, client_secret)
        service_client = DataLakeServiceClient(account_url="{}://{}.dfs.core.windows.net".format(
            "https", storage_account_name), credential=credential)
    except Exception as e:
        print(e)
Code to download data:
def download_file_from_directory():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")
        directory_client = file_system_client.get_directory_client("my-directory")
        file_client = directory_client.get_file_client("uploaded-file.txt")
        download = file_client.download_file()
        downloaded_bytes = download.readall()
        # A context manager closes the local file even if writing fails
        with open("C:\\file-to-download.txt", 'wb') as local_file:
            local_file.write(downloaded_bytes)
    except Exception as e:
        print(e)
I know the download itself is set up correctly, because I can get the data when I use the account key. But I have had no success connecting with Azure AD. I started by registering an app and noting its tenant ID, client ID, and client secret. I have also granted the registered app permissions to Azure Storage and enabled the implicit grant flow (ID tokens). Am I still missing anything? Any help is appreciated.
If you're using Azure Active Directory (Azure AD) to authorize access, make sure that you have assigned the Storage Blob Data Owner role; see "Grant access to Azure Blob data with RBAC in the Azure portal".
You'll have to assign one of the following Azure role-based access control (Azure RBAC) roles to your security principal.
Storage Blob Data Owner: All directories and files in the account.
Storage Blob Data Contributor: Only directories and files owned by the security principal.
For more details, refer to this document.
I am trying to download an Excel file stored in a blob. However, it keeps generating the error "The specified blob does not exist". The error happens at blob_client.download_blob(), although I can get the blob_client. Any idea why, or are there other ways I can connect using a managed identity?
import io
import pandas as pd
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
default_credential = DefaultAzureCredential()
blob_service_client = BlobServiceClient('url', credential=default_credential)
container_client = blob_service_client.get_container_client('xx-xx-data')
blob_client = container_client.get_blob_client('TEST.xlsx')
downloaded_blob = blob_client.download_blob()
df = pd.read_excel(io.BytesIO(downloaded_blob.readall()), sheet_name='Test', skiprows=2)
It turns out that I also had to grant 'Reader' access on top of 'Storage Blob Data Contributor' for the blob to be found. There was no need for a SAS URL.
You're getting this error because every request to Azure Blob Storage must be authenticated. The only exception is reading (downloading) a blob from a public blob container. Most likely, the container holding this blob has a private access level, and since you're sending an unauthenticated request, you get this error.
I would recommend using a Shared Access Signature (SAS) URL for the blob with Read permission instead of a plain blob URL. Because a SAS URL embeds the authorization information in the URL itself (the sig portion), you should be able to download the blob as long as the SAS is valid and has not expired.
Please see this for more information on Shared Access Signature: https://learn.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature.
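Since a SAS URL only works inside its validity window, a quick local check of the st (start) and se (expiry) query parameters can rule out an expired token before digging further. This is a stdlib-only sketch; sas_is_current is a hypothetical helper, and it assumes the seconds-precision UTC format used by the SAS tokens in this thread (e.g. 2022-11-18T15:46:10Z), so other valid SAS time formats would make it raise ValueError.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

SAS_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # e.g. 2022-11-18T15:46:10Z


def _parse_sas_time(value):
    # SAS times are UTC; this sketch only handles the seconds-precision form.
    return datetime.strptime(value, SAS_TIME_FORMAT).replace(tzinfo=timezone.utc)


def sas_is_current(sas_url, now=None):
    """Return True if `now` falls inside the SAS start (st) / expiry (se) window."""
    params = parse_qs(urlsplit(sas_url).query)
    now = now or datetime.now(timezone.utc)
    if "se" not in params:
        # No inline expiry (e.g. a stored access policy); this sketch can't decide.
        return False
    if "st" in params and now < _parse_sas_time(params["st"][0]):
        return False
    return now < _parse_sas_time(params["se"][0])
```

A False result only says the token is outside its window (or has no inline expiry); it does not check the signature or permissions.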
I'm trying to build a Python API where the user uploads a PDF file and the API sends it directly to Azure Storage. What I found is that I must have a local file path, i.e.:
from azure.storage.blob import ContainerClient
container_client = ContainerClient.from_connection_string(conn_str=conn_str, container_name='mycontainer')
with open('mylocalpath/myfile.pdf', "rb") as data:
    container_client.upload_blob(name='myblockblob.pdf', data=data)
Another option is to store the file on the VM and point the local path to it, but I don't want to fill up my VM's disk.
If you want the client to upload directly to Azure Blob Storage instead of sending the file to your API, you can use a shared access signature (SAS) from your storage account. In your API, write a function that generates a pre-signed URL with that SAS and returns the URL to your client; the client can then upload the file to your blob through that URL.
To generate the URL, you can use the code below:
from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, BlobSasPermissions
accountname = "<accountname>"  # the storage account name
accountkey = "<accountkey>"  # get this from the Access keys section of the storage account
containername = "<containername>"
def getpushurl(filename):
    token = generate_blob_sas(
        account_name=accountname,
        container_name=containername,
        account_key=accountkey,
        permission=BlobSasPermissions(write=True),
        expiry=datetime.utcnow() + timedelta(seconds=100),
        blob_name=filename,
    )
    url = f"https://{accountname}.blob.core.windows.net/{containername}/{filename}?{token}"
    return url
pdfpushurl = getpushurl("demo.text")
print(pdfpushurl)
After generating this URL, give it to the client; the client can then send the file directly to that URL with a PUT request, and it will be uploaded straight to Azure Storage.
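On the client side, the upload is a Put Blob request against the SAS URL. Put Blob requires an x-ms-blob-type header, and BlockBlob is the usual choice for whole files. The sketch below builds (but does not send) that request with the standard library; build_sas_upload_request is a hypothetical name, and the application/pdf default is an assumption for this PDF use case. A browser client would send the same method and headers with fetch or XMLHttpRequest.

```python
import urllib.request


def build_sas_upload_request(sas_url, data, content_type="application/pdf"):
    """Build (but do not send) a Put Blob request against a pre-signed SAS URL."""
    return urllib.request.Request(
        sas_url,
        data=data,
        method="PUT",
        headers={
            # Put Blob needs the blob type; BlockBlob suits ordinary file uploads.
            "x-ms-blob-type": "BlockBlob",
            "Content-Type": content_type,
        },
    )

# Sending it (needs a real, unexpired SAS URL with write permission):
# with urllib.request.urlopen(build_sas_upload_request(url, pdf_bytes)) as resp:
#     print(resp.status)  # Put Blob answers 201 Created on success
```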
You can generate a SAS token with write permission so that your users can upload .pdf files directly from their side without storing them on the server. For details, please see my previous post here.
Try the code below to generate a SAS token with container write permission:
from azure.storage.blob import BlobServiceClient,ContainerSasPermissions,generate_container_sas
from datetime import datetime, timedelta
storage_connection_string=''
container_name = ''
block_blob_service = BlobServiceClient.from_connection_string(storage_connection_string)
container_client = block_blob_service.get_container_client(container_name)
sasToken = generate_container_sas(account_name=container_client.account_name,
                                  container_name=container_client.container_name,
                                  account_key=container_client.credential.account_key,
                                  # grant write permission only
                                  permission=ContainerSasPermissions(write=True),
                                  start=datetime.utcnow() - timedelta(minutes=1),
                                  # valid for 1 hour
                                  expiry=datetime.utcnow() + timedelta(hours=1)
                                  )
print(sasToken)
After you have returned this SAS token to your user, see this official guide on uploading files from an HTML page; it should be helpful if you are developing a web app.
Under Google Cloud Run, you can select which service account your container runs as. Using the default compute service account fails to generate a signed URL.
The workaround listed here works on Google Compute Engine if you allow all scopes for the service account. There does not seem to be a way to do that in Cloud Run (at least not that I can find).
https://github.com/googleapis/google-auth-library-python/issues/50
Things I have tried:
Assigned the service account the role: roles/iam.serviceAccountTokenCreator
Verified the workaround in the same GCP project in a Virtual Machine (vs Cloud Run)
Verified the code works locally in the container with the service account loaded from private key (via json file).
from datetime import datetime, timedelta
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('EXAMPLE_BUCKET')
blob = bucket.get_blob('libraries/image_1.png')
expires = datetime.now() + timedelta(seconds=86400)
blob.generate_signed_url(expiration=expires)
Fails with:
you need a private key to sign credentials.the credentials you are currently using <class 'google.auth.compute_engine.credentials.Credentials'> just contains a token. see https://googleapis.dev/python/google-api-core/latest/auth.html#setting-up-a-service-account for more details.
/usr/local/lib/python3.8/site-packages/google/cloud/storage/_signing.py, line 51, in ensure_signed_credentials
Trying to add the workaround fails with:
Error calling the IAM signBytes API:
{ "error": { "code": 400,
"message": "Request contains an invalid argument.",
"status": "INVALID_ARGUMENT" }
}
Exception Location: /usr/local/lib/python3.8/site-packages/google/auth/iam.py, line 81, in _make_signing_request
Workaround code as mentioned in the GitHub issue:
from google.cloud import storage
from google.auth.transport import requests
from google.auth import compute_engine
from datetime import datetime, timedelta
def get_signing_creds(credentials):
    auth_request = requests.Request()
    print(credentials.service_account_email)
    signing_credentials = compute_engine.IDTokenCredentials(auth_request, "", service_account_email=credentials.service_account_email)
    return signing_credentials
client = storage.Client()
bucket = client.get_bucket('EXAMPLE_BUCKET')
blob = bucket.get_blob('libraries/image_1.png')
expires = datetime.now() + timedelta(seconds=86400)
signing_creds = get_signing_creds(client._credentials)
url = blob.generate_signed_url(expiration=expires, credentials=signing_creds)
print(url)
How do I generate a signed url under Google Cloud Run?
At this point, it seems like I may have to mount the service account key, which I wanted to avoid.
EDIT:
To try and clarify, the service account has the correct permissions - it works in GCE and locally with the JSON private key.
Yes, you can, but I had to dig deep to find out how (jump to the end if you don't care about the details).
If you look in the _signing.py file, at line 623, you can see this:
if access_token and service_account_email:
    signature = _sign_message(string_to_sign, access_token, service_account_email)
...
If you provide the access_token and the service_account_email, the _sign_message method is used. This method calls the IAM service's signBlob API at this line.
This matters because you can now sign a blob without having the private key locally. That solves the problem, and the following code works on Cloud Run (and, I'm sure, on Cloud Functions):
def sign_url():
    from google.cloud import storage
    from datetime import datetime, timedelta
    import google.auth
    credentials, project_id = google.auth.default()
    # Perform a refresh request to get the access token of the current credentials (otherwise it's None)
    from google.auth.transport import requests
    r = requests.Request()
    credentials.refresh(r)
    client = storage.Client()
    bucket = client.get_bucket('EXAMPLE_BUCKET')
    blob = bucket.get_blob('libraries/image_1.png')
    expires = datetime.now() + timedelta(seconds=86400)
    # When using user credentials, set the service account manually (for development only)
    service_account_email = "YOUR DEV SERVICE ACCOUNT"
    # If you use a service account credential, you can use the embedded email
    if hasattr(credentials, "service_account_email"):
        service_account_email = credentials.service_account_email
    url = blob.generate_signed_url(expiration=expires, service_account_email=service_account_email, access_token=credentials.token)
    return url, 200
Let me know if anything is unclear.
The answer @guillaume-blaquiere posted here does work, but it requires an additional step that isn't mentioned: adding the Service Account Token Creator role in IAM to your default service account, which allows that default service account to "Impersonate service accounts (create OAuth2 access tokens, sign blobs or JWTs, etc)."
This allows the default service account to sign blobs, as per the signBlob documentation.
I tried it on AppEngine and it worked perfectly once that permission was given.
import datetime as dt
from google import auth
from google.auth.transport import requests as google_requests
from google.cloud import storage
# SCOPES = [
#     "https://www.googleapis.com/auth/devstorage.read_only",
#     "https://www.googleapis.com/auth/iam"
# ]
credentials, project = auth.default(
    # scopes=SCOPES
)
# Refresh so that credentials.token holds an access token
credentials.refresh(google_requests.Request())
expiration_timedelta = dt.timedelta(days=1)
storage_client = storage.Client(credentials=credentials)
bucket = storage_client.get_bucket("bucket_name")
blob = bucket.get_blob("blob_name")
signed_url = blob.generate_signed_url(
    expiration=expiration_timedelta,
    service_account_email=credentials.service_account_email,
    access_token=credentials.token,
)
I downloaded a key for the AppEngine default service account to test locally, and to make it work outside the AppEngine environment, I had to add the proper scopes to the credentials, as in the commented-out lines setting SCOPES. You can ignore them if you run only on AppEngine itself.
You can't sign URLs with the default service account.
Try your service code again with a dedicated service account that has the required permissions, and see if that resolves your error.
References and further reading:
https://stackoverflow.com/a/54272263
https://cloud.google.com/storage/docs/access-control/signed-urls
https://github.com/googleapis/google-auth-library-python/issues/238
An updated approach has been added to GCP's documentation for serverless instances such as Cloud Run and App Engine.
The following snippet shows how to create a signed URL from the storage library.
import datetime
from google.cloud import storage
def generate_upload_signed_url_v4(bucket_name, blob_name):
    """Generates a v4 signed URL for uploading a blob using HTTP PUT.
    Note that this method requires a service account key file. You can not use
    this if you are using Application Default Credentials from Google Compute
    Engine or from the Google Cloud SDK.
    """
    # bucket_name = 'your-bucket-name'
    # blob_name = 'your-object-name'
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    url = blob.generate_signed_url(
        version="v4",
        # This URL is valid for 15 minutes
        expiration=datetime.timedelta(minutes=15),
        # Allow PUT requests using this URL.
        method="PUT",
        content_type="application/octet-stream",
    )
    return url
Once your backend returns the signed URL, you can execute a curl PUT request from your frontend as follows:
curl -X PUT -H 'Content-Type: application/octet-stream' --upload-file my-file 'my-signed-url'
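When a signed PUT is rejected, it can help to confirm the URL hasn't simply expired. A V4 signed URL carries its signing time in X-Goog-Date (format YYYYMMDD'T'HHMMSS'Z', UTC) and its lifetime in seconds in X-Goog-Expires, so the expiry can be computed locally. This stdlib sketch uses a hypothetical helper name, v4_signed_url_expiry, and does not verify the signature itself.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def v4_signed_url_expiry(signed_url):
    """Return the UTC datetime at which a V4 signed URL stops being valid."""
    params = parse_qs(urlsplit(signed_url).query)
    # X-Goog-Date is the signing time, e.g. 20221118T074610Z (UTC).
    signed_at = datetime.strptime(params["X-Goog-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    # X-Goog-Expires is the lifetime in seconds, counted from X-Goog-Date.
    return signed_at + timedelta(seconds=int(params["X-Goog-Expires"][0]))
```

For the 15-minute URL generated above, X-Goog-Expires would be 900. V4 signed URLs cannot be valid for more than 7 days (604800 seconds).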
I had to add both Service Account Token Creator and Storage Object Creator to the default Compute Engine service account (which is what my Cloud Run services use) before it worked. You could also create a custom role that has just iam.serviceAccounts.signBlob instead of Service Account Token Creator, which is what I did:
I store the credentials.json contents in Secret Manager and then load them in my Django app like this:
import json
import os
from google.cloud import secretmanager
from google.oauth2 import service_account
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT")
client = secretmanager.SecretManagerServiceClient()
secret_name = "service_account_credentials"
secret_path = f"projects/{project_id}/secrets/{secret_name}/versions/latest"
credentials_json = client.access_secret_version(name=secret_path).payload.data.decode("UTF-8")
service_account_info = json.loads(credentials_json)
google_service_credentials = service_account.Credentials.from_service_account_info(
    service_account_info)
I tried the answer from @guillaume-blaquiere and added the permission recommended by @guilherme-coppini, but when using Google Cloud Run I always saw the same "You need a private key to sign credentials.the credentials you are currently using..." error.
I am trying to upload a folder on my local machine to a Google Cloud Storage bucket. I get an error with the credentials. Where should I provide the credentials, and what information do they need to contain?
from_dest = '/Users/xyzDocuments/tmp'
gsutil_link = 'gs://bucket-1991'
from google.cloud import storage
try:
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print('File {} uploaded to {}.'.format(source_file_name, destination_blob_name))
except Exception as e:
    print(e)
The error is:
could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://developers.google.com/accounts/docs/application-default-credentials.
You need to acquire the application default credentials for your project and set them as an environment variable:
Go to the Create service account key page in the GCP Console.
From the Service account drop-down list, select New service account.
Enter a name into the Service account name field.
From the Role drop-down list, select Project > Owner.
Click Create. A JSON file that contains your key downloads to your computer.
Then, set an environment variable that will provide the application credentials to your application when it runs locally:
$ export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
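Before running the upload script, it can help to confirm that the variable is actually visible to the Python process and points at a readable service account key. The helper below (check_adc_file is a hypothetical name) is a stdlib-only sketch; it assumes a service account key like the one downloaded in the steps above, so it would also flag other valid ADC files, such as an authorized_user file, as problems.

```python
import json
import os


def check_adc_file(env=None):
    """Return a list of problems with GOOGLE_APPLICATION_CREDENTIALS (empty if none found)."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set in this process"]
    if not os.path.isfile(path):
        return [f"no file at {path!r}"]
    try:
        with open(path, encoding="utf-8") as fh:
            info = json.load(fh)
    except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
        return [f"cannot read JSON from {path!r}: {exc}"]
    problems = []
    # Assumption: a service account key file, as created in the steps above.
    if info.get("type") != "service_account":
        problems.append(f"type is {info.get('type')!r}, expected 'service_account'")
    for key in ("client_email", "private_key"):
        if key not in info:
            problems.append(f"missing {key!r}")
    return problems
```

Calling print(check_adc_file()) from the same shell session where you ran export shows whether the variable survived; a new terminal will not have it.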
This error is usually raised when the application is not authenticated correctly, for reasons such as missing files, invalid credential paths, or incorrectly set environment variables. Keep in mind that an environment variable set in a shell session is lost when that session ends.
Based on this, I recommend validating that the credential file and its path are assigned correctly, and following the Obtaining and providing service account credentials manually guide to specify your service account file explicitly in your code. That way the setting persists, and you can verify that you are passing the service account credentials correctly.
Example of passing the path to the service account key in code:
def explicit():
    from google.cloud import storage
    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json('service_account.json')
    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
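Once the credentials work, the original goal of uploading a whole folder (rather than one file) comes down to walking the folder and giving each file an object name. The sketch below is a stdlib-only helper with a hypothetical name, iter_folder_uploads; it only computes (local path, blob name) pairs, and the commented loop shows how they might be passed to the google-cloud-storage calls already used in this thread. The "tmp" prefix is an illustrative assumption.

```python
import os


def iter_folder_uploads(local_dir, prefix=""):
    """Yield (local_path, blob_name) pairs for every file under local_dir."""
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            # Blob name = optional prefix + path relative to local_dir, using '/'
            # as the separator regardless of the local OS.
            rel = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            blob_name = "/".join(part for part in (prefix.strip("/"), rel) if part)
            yield local_path, blob_name

# With google-cloud-storage installed and credentials configured (not run here):
# from google.cloud import storage
# bucket = storage.Client().bucket("bucket-1991")
# for path, name in iter_folder_uploads("/Users/xyzDocuments/tmp", prefix="tmp"):
#     bucket.blob(name).upload_from_filename(path)
```

Cloud Storage has no real directories; the "/" in object names is what makes the console display a folder hierarchy.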