I have created an Azure Function which is triggered when a new file is added to my Blob Storage. This part works well!
But now I would like to start the Azure "Speech-To-Text" service using the API. So I try to build the URI pointing to my new blob and add it to the API call. To do so, I created a SAS token (from the Azure portal) and appended it to my new blob path:
https://myblobstorage...../my/new/blob.wav?[SAS Token generated]
By doing so, I get an error which says:
Authentication failed Invalid URI
What am I missing here?
N.B.: When I generate the SAS token manually from Azure Storage Explorer, everything works well. Also, my token was not expired in my test.
Thank you for your help!
You might have generated the SAS token with the wrong allowed resource types.
Make sure the Object option is checked.
Here is the explanation from the docs:
Service (s): Access to service-level APIs (e.g., Get/Set Service Properties, Get Service Stats, List Containers/Queues/Tables/Shares)
Container (c): Access to container-level APIs (e.g., Create/Delete Container, Create/Delete Queue, Create/Delete Table, Create/Delete Share, List Blobs/Files and Directories)
Object (o): Access to object-level APIs for blobs, queue messages, table entities, and files (e.g., Put Blob, Query Entity, Get Messages, Create File, etc.)
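If you prefer to generate the SAS token in code rather than in the portal, a minimal sketch with the azure-storage-blob v12 package could look like the following (the account name, account key, container, and blob names are placeholders for your own values):

from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

# Placeholders - replace with your own account/container/blob
account_name = "myblobstorage"
account_key = "<storage-account-key>"
container_name = "my-container"
blob_name = "my/new/blob.wav"

# Object-level SAS with read permission, valid for one hour
sas_token = generate_blob_sas(
    account_name=account_name,
    container_name=container_name,
    blob_name=blob_name,
    account_key=account_key,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=1),
)

blob_url_with_sas = "https://{0}.blob.core.windows.net/{1}/{2}?{3}".format(
    account_name, container_name, blob_name, sas_token)

The resulting blob_url_with_sas is what you would pass to the Speech-To-Text API call.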
The data source is a SaaS server's API endpoints; the aim is to use Python (the boto3 library) to move the data into an AWS S3 bucket.
Access to the API is granted via an authorized username/password combination and a unique api-key; every API call first needs to get a token before fetching further info.
I have 2 questions:
1. How should I manage those secrets: save them to a config file (*.ini, *.json, *.yaml) or store them via AWS Secrets Manager?
2. The token handling is a bit challenging. The basic way is, for each endpoint, to fetch a new token and then do the API call. That ends up as far too many pipelines (e.g., if 100 endpoints are needed for downstream business needs), so I would need to craft 100 pipelines, repeating a universal template 100 times.
I am new to the Python programming world, so feel free to comment and share any use cases.
Much appreciated!!
I searched and read these show-cases: "Saving from API to S3 bucket" and "How to write a file or data to an S3 object using boto3".
I found this helpful:
# python-decouple summary: store parameters in .ini or .env files
# A few options for managing (hiding) sensitive info:
a. IAM role
b. Store secrets using **Parameter Store**
c. Store secrets using **Secrets Manager** - currently the method recommended by AWS
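To make the second point more concrete, here is a rough sketch (not a definitive implementation) that pulls the credentials from Secrets Manager once, fetches a single token, and loops over endpoints instead of duplicating a pipeline per endpoint. The secret name, API base URL, auth endpoint, and field names below are assumptions you would replace with your own:

import json
import boto3
import requests

SECRET_NAME = "saas/api-credentials"     # hypothetical secret name
API_BASE = "https://api.example.com"     # hypothetical SaaS API base URL

def get_credentials():
    # Read the username/password/api-key stored as a JSON secret
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=SECRET_NAME)
    return json.loads(response["SecretString"])

def get_token(creds):
    # Fetch one token and reuse it for every endpoint call
    resp = requests.post(
        API_BASE + "/auth/token",         # assumed auth endpoint
        auth=(creds["username"], creds["password"]),
        headers={"x-api-key": creds["api_key"]},
    )
    resp.raise_for_status()
    return resp.json()["token"]

def fetch_endpoint(path, token):
    resp = requests.get(API_BASE + path, headers={"Authorization": "Bearer " + token})
    resp.raise_for_status()
    return resp.json()

creds = get_credentials()
token = get_token(creds)
s3 = boto3.client("s3")
for endpoint in ["/orders", "/customers"]:   # one loop instead of 100 copies of a pipeline
    data = fetch_endpoint(endpoint, token)
    s3.put_object(Bucket="my-bucket", Key="raw" + endpoint + ".json", Body=json.dumps(data))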
I am using the Python SDK to copy blobs from one container to another. Here is the code:
from azure.storage.blob import BlobServiceClient

# Source blob URL (no SAS token appended here)
src_blob = '{0}/{1}'.format(src_url, blob_name)
# Destination account client and target blob
destination_client = BlobServiceClient.from_connection_string(connectionstring)
copied_blob = destination_client.get_blob_client(dst_container, b_name)
# Server-side copy from the source URL
copied_blob.start_copy_from_url(src_blob)
It throws the error below:
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>CannotVerifyCopySource</Code><Message>Public access is not permitted on this storage account.
I have already gone through this post and in my case public access is disabled.
I do not have sufficient privileges to enable public access on the storage account and test. Is there a workaround to accomplish the copy without changing that setting?
Azcopy 409 Public access is not permitted on this storage account
Do I need to change the way I connect to the account?
When copying a blob across storage accounts, the source blob must be publicly accessible so that Azure Storage Service can access the source blob. You were getting the error because you were using just the blob's URL. If the blob is in a private blob container, Azure Storage Service won't be able to access the blob using just its URL.
To fix this issue, you would need to generate a SAS token on the source blob with at least Read permission and use that SAS URL as copy source.
So your code would be something like:
src_blob_sas_token = generate_sas_token_somehow()
src_blob = '{0}/{1}?{2}'.format(src_url,blob_name, src_blob_sas_token)
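The generate_sas_token_somehow() part is up to you; one possible (hedged) implementation with azure-storage-blob v12, assuming you have the source account's name and key available, is:

from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

# src_account_name, src_account_key, src_container and blob_name are assumed to exist
src_blob_sas_token = generate_blob_sas(
    account_name=src_account_name,
    container_name=src_container,
    blob_name=blob_name,
    account_key=src_account_key,
    permission=BlobSasPermissions(read=True),   # Read is enough for a copy source
    expiry=datetime.utcnow() + timedelta(hours=1),
)

A SAS token generated in Azure Storage Explorer works just as well, as long as it grants at least Read on the source blob.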
Check the permissions of your SAS token.
In your example, it doesn't look like you are passing the SAS token.
For my current Python project I'm using the Microsoft Azure SDK for Python.
I want to copy a specific blob from one container path to another and have already tested some options, described here.
Overall they basically "work", but unfortunately the new_blob.start_copy_from_url(source_blob_url) command always leads to an error: ErrorCode:CannotVerifyCopySource.
Is someone getting the same error message here, or has an idea how to solve it?
I also tried passing the source_blob_url with a SAS token, but it still doesn't work. I have the feeling that there is some connection to the access levels of the storage account, but so far I haven't been able to figure it out. Hopefully someone here can help me.
Is someone getting the same error message here, or has an idea how to solve it?
As you have mentioned, you might be receiving this error due to missing permissions on the SAS token.
The difference in my code was that I used the blob storage sas_token from the Azure portal, instead of generating it directly for the blob client with the Azure Function.
In order to allow access to certain areas of your storage account, a SAS is generated with a set of parameters such as read/write permissions, services, resource types, start and expiry date/time, allowed IP addresses, etc.
You don't always need to generate it directly for the blob client with the Azure Function; you can also generate one from the portal by granting the required permissions.
REFERENCES: Grant limited access to Azure Storage resources using SAS - MSFT Document
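A hedged sketch of that approach, with placeholder names for the destination connection string, container, blob, and the portal-generated SAS token (which needs at least Read permission on the source blob):

import time
from azure.storage.blob import BlobServiceClient

dest_client = BlobServiceClient.from_connection_string("<destination-connection-string>")
new_blob = dest_client.get_blob_client("destination-container", "blob.wav")

# Source blob URL with the SAS token appended as the query string
source_blob_url = "https://<srcaccount>.blob.core.windows.net/<container>/blob.wav?<sas-token>"
new_blob.start_copy_from_url(source_blob_url)

# Optionally poll until the server-side copy finishes
props = new_blob.get_blob_properties()
while props.copy.status == "pending":
    time.sleep(1)
    props = new_blob.get_blob_properties()
print(props.copy.status)   # "success" once the copy has completed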
I am trying to configure Secret Manager for my Composer environment (ver 1.16, Airflow 1.10), but I have a weird situation, described below. In my Composer environment, I've used a variable.json file to manage Variables in Airflow:
# variable.json
{
  "sleep": "False",
  "ssh_host": "my_host"
}
Then I used this article to configure Secret Manager. Following the instructions, I overrode the config with a secrets section:
backend: airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
backend_kwargs: {"variables_prefix":"airflow-variables", "sep":"-", "project_id": "my-project"}
And in Secret Manager, I also created my secret: airflow-variables-sercret_name.
In fact, everything is fine; I can get my secret via the following (and of course, I don't have any problem with the service account):
from airflow.models.variable import Variable
my_secret = Variable.get('sercret_name')
But when I check the logs, I see that Airflow also tries to look up the other variables from my variable.json file in Secret Manager:
2022-04-13 15:49:46.590 CEST airflow-worker Google Cloud API Call Error (NotFound): Secret ID airflow-secrets-sleep not found.
2022-04-13 15:49:46.590 CEST airflow-worker Google Cloud API Call Error (NotFound): Secret ID airflow-secrets-ssh_host not found.
So how can I avoid this situation, please? Or did I misunderstand something?
Thanks!!!
These errors are a known behavior when you use Secret Manager, but here are some workarounds:
Add a way to skip the Secret backend.
File a Feature Request to lower the log priority; you can use this as a template for your issue.
Create a logs exclusion in Cloud Logging (see the sketch after the quoted explanation below).
About Logs Exclusion:
Sinks control how Cloud Logging routes logs. Sinks belong to a given Google Cloud resource: Cloud projects, billing accounts, folders, and organizations. When the resource receives a log entry, it routes the log entry according to the sinks contained by that resource. The routing behavior for each sink is controlled by configuring the inclusion filter and exclusion filters for that sink.
When you create a sink, you can set multiple exclusion filters, letting you exclude matching log entries from being routed to the sink's destination or from being ingested by Cloud Logging.
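For the logs-exclusion workaround, a minimal sketch using the google-cloud-logging client library; the project ID, exclusion name, and filter string below are assumptions, so adjust the filter to match the exact entries you see in Cloud Logging:

from google.cloud.logging_v2.services.config_service_v2 import ConfigServiceV2Client
from google.cloud.logging_v2.types import LogExclusion

# Hypothetical project and exclusion name
parent = "projects/my-project"
exclusion = LogExclusion(
    name="exclude-secret-manager-notfound",
    description="Drop NotFound lookups for non-secret Airflow variables",
    filter='resource.type="cloud_composer_environment" '
           'AND textPayload:"Google Cloud API Call Error (NotFound): Secret ID"',
)

client = ConfigServiceV2Client()
client.create_exclusion(parent=parent, exclusion=exclusion)

Keep in mind that excluded entries are not ingested at all, so only exclude logs you are sure you will never need.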
I am using the Python Google Cloud Storage client, but with a bucket that has public read/write access. (I know this is usually a terrible idea, but I have a rare use case where it is fine.)
When I try to retrieve some files, I get a DefaultCredentialsError.
from google.cloud import storage

BUCKET_NAME = 'my-public-bucket-name'

storage_client = storage.Client()
bucket = storage_client.get_bucket(BUCKET_NAME)

def list_blobs(prefix, delimiter=None):
    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)
    print('Blobs:')
    for blob in blobs:
        print(blob.name)
The specific error reads:
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
That page suggests using OAuth or other tokens, but I shouldn't need these since my bucket is public? I can make an HTTP request to the bucket in Chrome and receive data.
How should I get around this issue? Can I provide default or null credentials?
The default for a storage client with no parameters is to use environment credentials (e.g. authenticate with the gcloud tools first). If you want a client with no credentials, you have to use the create_anonymous_client method, which lets you access resources available to allUsers.
Be careful though which APIs you use, not all of them support anonymous credentials. E.g. instead of client.get_bucket('my-bucket') you have to use client.bucket(bucket_name='my-bucket').
Also note that it seems any permissions error returns a generic ValueError: Anonymous credentials cannot be refreshed. E.g. if you try to overwrite an existing file while only having read/write permissions.
So a full example of uploading a file to a publicly accessible bucket is:
from google.cloud import storage
client = storage.Client.create_anonymous_client()
bucket = client.bucket(bucket_name='my-public-bucket')
blob = bucket.blob('my-file')
blob.upload_from_filename('my-local-file')
From "Cloud Storage Authentication":
Most of the operations you perform in Cloud Storage must be authenticated. The only exceptions are operations on objects that allow anonymous access. Objects are anonymously accessible if the allUsers group has READ permission. The allUsers group includes anyone on the Internet.
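Adapted to the list_blobs example from the question, a hedged sketch (note that listing objects requires allUsers to have bucket-level read/list access, while downloading a single public object only needs object-level read access; the object name below is a placeholder):

from google.cloud import storage

BUCKET_NAME = 'my-public-bucket-name'

# Anonymous client: no GOOGLE_APPLICATION_CREDENTIALS needed
client = storage.Client.create_anonymous_client()
bucket = client.bucket(bucket_name=BUCKET_NAME)   # bucket(), not get_bucket()

def list_blobs(prefix, delimiter=None):
    # Works only if allUsers can list the bucket's objects
    for blob in bucket.list_blobs(prefix=prefix, delimiter=delimiter):
        print(blob.name)

# Downloading a single public object
data = bucket.blob('some-object').download_as_bytes()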