azure copy blobs across storage accounts fails with ErrorCode:CannotVerifyCopySource - python

I am using the Python SDK to copy blobs from one container to another. Here is the code:
from azure.storage.blob import BlobServiceClient
src_blob = '{0}/{1}'.format(src_url,blob_name)
destination_client = BlobServiceClient.from_connection_string(connectionstring)
copied_blob = destination_client.get_blob_client(dst_container,b_name)
copied_blob.start_copy_from_url(src_blob)
It throws the below error,
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>CannotVerifyCopySource</Code><Message>Public access is not permitted on this storage account.
I have already gone through the post "Azcopy 409 Public access is not permitted on this storage account", and in my case public access is disabled.
I do not have sufficient privileges to enable public access on the storage account and test. Is there a workaround to accomplish the copy without changing that setting?
Do I need to change the way I connect to the account?

When copying a blob across storage accounts, the Azure Storage service itself has to read the source blob, so the source must either be publicly accessible or carry its own authorization in the URL. You were getting the error because you were using just the blob's URL. If the blob is in a private blob container, Azure Storage won't be able to access it using just its URL.
To fix this, generate a SAS token on the source blob with at least Read permission and use that SAS URL as the copy source.
So your code would be something like:
src_blob_sas_token = generate_sas_token_somehow()
src_blob = '{0}/{1}?{2}'.format(src_url,blob_name, src_blob_sas_token)
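A minimal sketch of how the placeholder above could be filled in, assuming you have the source account's name and key (the question only shows a connection string for the destination). The helper names build_copy_source_url and make_read_sas are hypothetical; generate_blob_sas and BlobSasPermissions come from azure.storage.blob.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote


def build_copy_source_url(src_url, blob_name, sas_token):
    """Join container URL, blob name and SAS token into a copy-source URL."""
    # quote() keeps '/' so virtual directories in the blob name survive.
    return "{0}/{1}?{2}".format(
        src_url.rstrip("/"), quote(blob_name), sas_token.lstrip("?")
    )


def make_read_sas(account_name, account_key, container_name, blob_name, hours=1):
    """Create a short-lived, read-only SAS token for one source blob.

    Assumption: the caller has the SOURCE account's name and key.
    """
    # Imported lazily so build_copy_source_url works without the SDK installed.
    from azure.storage.blob import BlobSasPermissions, generate_blob_sas

    return generate_blob_sas(
        account_name=account_name,
        container_name=container_name,
        blob_name=blob_name,
        account_key=account_key,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(hours=hours),
    )
```

The resulting URL can then be passed to copied_blob.start_copy_from_url(...) in place of the bare blob URL.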

Check the permissions of your SAS token.
In your example, it doesn't look like you are passing a SAS token at all.

Related

Azure IoTHub - FileUpload - Remove Device Name from BlobName

I am working on a solution to upload images from a local file system to Azure Storage.
For the moment we use a SAS token and a BlobClient, but we would like to avoid storing an expiring SAS token locally.
To do this, we thought about using Azure IoT Hub, which allows us to replace this process.
def __upload_file_Azure_IoTHub(self, src_path: str, blob_path: str) -> BlobClient:
    if os.path.exists(src_path):
        # We start by creating a blobClient from AzureIoTHub
        storage_info = self.IoTHub_client.get_storage_info_for_blob(blob_path)
        # We create the SAS Url from Client + Token
        sas_url = "https://{}/{}/{}{}".format(
            storage_info["hostName"],
            storage_info["containerName"],
            storage_info['blobName'],
            storage_info["sasToken"])
        try:
            with BlobClient.from_blob_url(sas_url) as blob_client:
                with open(src_path, "rb") as fp:
                    blob = blob_client.upload_blob(fp, overwrite=True, timeout=self.config.azure_timeout)
                    self.IoTHub_client.notify_blob_upload_status(storage_info["correlationId"], True, 200, "OK: {}".format(blob_path))
                    return blob
        except Exception as ex:
            self.IoTHub_client.notify_blob_upload_status(storage_info["correlationId"], False, 403, "Upload Failed")
            raise Exception("AzureUpload_IoTHub")
https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/iot-hub/iot-hub-devguide-file-upload.md#device-initialize-a-file-upload
The problem is that when we upload this way, the device name is added as a prefix to the blobName.
This breaks backward compatibility for us:
we defined a naming convention in our storage, and this behavior will break everything.
Let's imagine :
DeviceName = FileWatcherDaemon
BlobPath = YYYY/MM/DD/MyBlob.whatever
# Then :
BlobName = FileWatcherDaemon/YYYY/MM/DD/MyBlob.whatever
# Instead of :
BlobName = YYYY/MM/DD/MyBlob.whatever
I tried replacing the blobName with my blob_path, but it does not work because the generated sasToken is blob-level and not container-level.
Do you have any idea how to remove this device name?
It is a problem for us because we will have many different devices uploading to the same storage. We would like the naming convention to reflect our business needs rather than technical information.
Are you using any of the other capabilities of Azure IoT Hub besides file upload? Do you need your device to receive messages from the cloud? For your scenario, using IoT Hub just for uploading files may be overkill.
You wrote: "For the moment we use a TokenSaS and a BlobClient but we would like to avoid to store locally an expiring SaSToken."
Have a look at Identity and access management Security Recommendations and consider using Azure Key Vault in your scenario.
Microsoft recommends using Azure AD to authorize requests to Azure Storage. However, if you must use Shared Key authorization, then secure your account keys with Azure Key Vault. You can retrieve the keys from the key vault at runtime, instead of saving them with your application. For more information about Azure Key Vault, see Azure Key Vault overview.
For more security "Azure Key Vaults may be either software-protected or, with the Azure Key Vault Premium tier, hardware-protected by hardware security modules (HSMs)."
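As a rough sketch of the "retrieve the keys from the key vault at runtime" recommendation, assuming the storage account key has been stored as a Key Vault secret and the device can authenticate through DefaultAzureCredential. The function names, vault URL and secret name are hypothetical; SecretClient, DefaultAzureCredential and BlobServiceClient are the SDK classes.

```python
def fetch_storage_key(secret_client, secret_name):
    """Return the secret's value; secret_client needs a get_secret(name) method."""
    return secret_client.get_secret(secret_name).value


def make_blob_service_client(vault_url, secret_name, account_url):
    """Build a BlobServiceClient from an account key held in Key Vault.

    Assumption: the key was stored under secret_name beforehand.
    """
    # Imported lazily so fetch_storage_key can be used without the Azure SDKs.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    from azure.storage.blob import BlobServiceClient

    secret_client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())
    account_key = fetch_storage_key(secret_client, secret_name)
    # The key lives only in memory; nothing is written to local storage.
    return BlobServiceClient(account_url=account_url, credential=account_key)
```

With a client built this way, blob names are chosen by your own code, so the existing YYYY/MM/DD/MyBlob.whatever convention is unaffected.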

Azure function and Azure Blob Storage

I have created an Azure Function which is triggered when a new file is added to my Blob Storage. This part works well!
But now I would like to start the "Speech-To-Text" Azure service using the API. So I try to build the URI pointing to my new blob and then add it to the API call. To do so, I created a SAS token (from the Azure Portal) and appended it to my new blob path.
https://myblobstorage...../my/new/blob.wav?[SAS Token generated]
By doing so I get an error which says :
Authentification failed Invalid URI
What am I missing here ?
N.B.: When I generate the SAS token manually from "Azure Storage Explorer", everything works well. Also, my token is not expired in my test.
Thank you for your help !
You may have generated the SAS token with the wrong allowed resource types.
Make sure the Object option is checked.
Here is the reason in docs:
Service (s): Access to service-level APIs (e.g., Get/Set Service Properties, Get Service Stats, List Containers/Queues/Tables/Shares)
Container (c): Access to container-level APIs (e.g., Create/Delete Container, Create/Delete Queue, Create/Delete Table, Create/Delete Share, List Blobs/Files and Directories)
Object (o): Access to object-level APIs for blobs, queue messages, table entities, and files (e.g. Put Blob, Query Entity, Get Messages, Create File, etc.)
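One quick way to check an existing token is to look at its srt (signed resource types) field, which an account SAS carries in its query string. The sketch below assumes an account-level SAS; the function name is hypothetical, and a blob-level service SAS has no srt field, so the function returns False for it.

```python
from urllib.parse import parse_qs


def sas_allows_object_access(sas_token):
    """Return True if an account SAS token's srt field includes 'o' (Object)."""
    params = parse_qs(sas_token.lstrip("?"))
    # srt is a string of letters such as "sco"; absent means not an account SAS.
    resource_types = params.get("srt", [""])[0]
    return "o" in resource_types
```

If this returns False for a portal-generated account SAS, regenerating it with Object checked is the fix described above.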

Use Python Google Storage Client without credentials

I am using the Python Google Storage Client, however I am using a bucket with public read/write access. (I know this is usually a terrible idea but I have a rare use case where it is fine).
When I try to retrieve some files, I get a DefaultCredentialsError.
BUCKET_NAME = 'my-public-bucket-name'
storage_client = storage.Client()
bucket = storage_client.get_bucket(BUCKET_NAME)

def list_blobs(prefix, delimiter=None):
    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)
    print('Blobs:')
    for blob in blobs:
        print(blob.name)
The specific error reads:
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
That page suggests using OAuth or other tokens, but I shouldn't need these since my bucket is public. I can make an HTTP request to the bucket in Chrome and receive data.
How should I get around this issue? Can I provide default or null credentials?
By default, a storage client created with no parameters uses environment credentials (e.g. you authenticate with the gcloud tools first). If you want a client with no credentials, you have to use the create_anonymous_client method, which lets you access resources available to allUsers.
Be careful about which APIs you use, though; not all of them support anonymous credentials. E.g. instead of client.get_bucket('my-bucket') you have to use client.bucket(bucket_name='my-bucket').
Also note that any permissions error seems to surface as a generic "ValueError: Anonymous credentials cannot be refreshed.", e.g. if you try to overwrite an existing file while only having read/write permissions.
So a full example of uploading a file to a publicly accessible bucket is
from google.cloud import storage
client = storage.Client.create_anonymous_client()
bucket = client.bucket(bucket_name='my-public-bucket')
blob = bucket.blob('my-file')
blob.upload_from_filename('my-local-file')
From "Cloud Storage Authentication":
Most of the operations you perform in Cloud Storage must be authenticated. The only exceptions are operations on objects that allow anonymous access. Objects are anonymously accessible if the allUsers group has READ permission. The allUsers group includes anyone on the Internet.

How to properly use create_anonymous_client() function in google cloud storage python library for access on public buckets?

I made a publicly listable bucket on Google Cloud Storage. I can see all the keys if I list the bucket objects in the browser. I was trying to use the create_anonymous_client() function so that I can list the bucket keys in a Python script, but it raises an exception. I have looked everywhere and still can't find the proper way to use the function.
from google.cloud import storage
client = storage.Client.create_anonymous_client()
a = client.lookup_bucket('publically_listable_bucket')
a.list_blobs()
Exception I am getting:
ValueError: Anonymous credentials cannot be refreshed.
Additional query: Can I list and download the contents of public Google Cloud Storage buckets using boto3? If yes, how do I do it anonymously?
I was also struggling with this and couldn't find an answer anywhere online. It turns out you can access the bucket with just the bucket() method.
I'm not sure why, but this method can sometimes take several seconds.
client = storage.Client.create_anonymous_client()
bucket = client.bucket('publically_listable_bucket')
blobs = list(bucket.list_blobs())
This error means the bucket you are attempting to list does not grant the right permission. You must give the "Storage Object Viewer" or "Storage Legacy Bucket Reader" role to "allUsers".
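To check whether allUsers can really list the bucket, independently of the client library, one option is to call the Cloud Storage JSON API directly with no credentials. This is a sketch of that alternative technique, not the library's anonymous client; the function names are hypothetical, and it reads only the first page of results.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def listing_url(bucket_name, prefix=None):
    """Build the JSON API objects.list URL for a bucket."""
    base = "https://storage.googleapis.com/storage/v1/b/{}/o".format(
        quote(bucket_name, safe="")
    )
    return base + ("?" + urlencode({"prefix": prefix}) if prefix else "")


def object_names(listing_json):
    """Extract object names from an objects.list JSON response body."""
    return [item["name"] for item in json.loads(listing_json).get("items", [])]


def list_public_bucket(bucket_name, prefix=None):
    """Fetch the first page of object names without any credentials."""
    # Raises urllib.error.HTTPError if allUsers is not allowed to list.
    with urlopen(listing_url(bucket_name, prefix)) as resp:
        return object_names(resp.read().decode("utf-8"))
```

If list_public_bucket fails with an HTTP error while the anonymous client raises the ValueError above, the missing allUsers role is the likely cause.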

how can i download my data from google-cloud-platform using python?

I have my data on Google Cloud Platform and I want to be able to download it locally. This is my first time trying that, and eventually I'll use the downloaded data with my Python code.
I have checked the docs, like https://cloud.google.com/genomics/downloading-credentials-for-api-access and https://cloud.google.com/storage/docs/cloud-console. I have successfully obtained the JSON file from the first link; the second one is where I'm struggling. I'm using Python 3.5, and assuming my JSON file's name is data.json, I have added the following code:
os.environ["file"] = "data.json"
urllib.request.urlopen('https://storage.googleapis.com/[bucket_name]/[filename]')
First of all, I don't even know what I should call the key passed to environ, so I just called it file, and I'm not sure how I'm supposed to fill it in. I also got access denied on the second line. Obviously this is not how to download my file, as there is no local destination or anything in that command. Any guidance will be appreciated.
Edit:
from google.cloud.storage import Blob
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials/client_secret.json"
storage_client = storage.Client.from_service_account_json('service_account.json')
client = storage.Client(project='my-project')
bucket = client.get_bucket('my-bucket')
blob = Blob('path/to/my-object', bucket)
download_to_filename('local/path/to/my-file')
I'm getting an unresolved reference for storage and download_to_filename. Also, should I replace service_account.json with credentials/client_secret.json? In addition, I tried to print the content of os.environ["GOOGLE_APPLICATION_CREDENTIALS"]['installed'] like I would with any JSON, but it said I should give numbers, meaning it read the input path as regular text only.
You should use the idiomatic Google Cloud library to run operations in GCS.
With the example there, and knowing that the client library will pick up the application default credentials, first we have to set the application default credentials with
gcloud auth application-default login
===EDIT===
That was the old way. Now you should use the instructions in this link.
This means downloading a service account key file from the console, and setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the downloaded JSON.
Also, make sure that this service account has the proper permissions on the project of the bucket.
Or you can create the client with explicit credentials. You'll need to download the key file all the same, but when creating the client, use:
storage_client = storage.Client.from_service_account_json('service_account.json')
==========
And then, following the example code:
from google.cloud import storage
client = storage.Client(project='project-id')
bucket = client.get_bucket('bucket-id')
blob = storage.Blob('bucket/file/path', bucket)
blob.download_to_filename('/path/to/local/save')
Or, if this is a one-off download, just install the SDK and use gsutil to download:
gsutil cp gs://bucket/file .
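On the attempt to index os.environ["GOOGLE_APPLICATION_CREDENTIALS"]['installed']: the variable holds a file path (a plain string), not parsed JSON, so the file has to be opened and loaded first. The sketch below does that and reports what kind of file it is. The function name is hypothetical; it relies on service account keys having "type": "service_account", while OAuth client secret files have a top-level "installed" or "web" key and are not what from_service_account_json expects.

```python
import json
import os


def describe_key_file(path=None):
    """Load a credentials JSON file and say which kind it looks like."""
    # Fall back to the path stored in the environment variable.
    path = path or os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
    with open(path, encoding="utf-8") as fh:
        info = json.load(fh)
    if info.get("type") == "service_account":
        return "service account key"
    if "installed" in info or "web" in info:
        return "OAuth client secret"
    return "unknown"
```

If this reports "OAuth client secret" for credentials/client_secret.json, download a service account key from the console instead, as described above.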
