Download Blob From Blob Storage Using Python

I am trying to download an Excel file from a blob container, but I keep getting the error "The specified blob does not exist". The error is raised at blob_client.download_blob(), even though I can create the blob_client without problems. Any idea why, or is there another way to connect using a managed identity?
import io
import pandas as pd
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
default_credential = DefaultAzureCredential()
blob_service_client = BlobServiceClient('url', credential=default_credential)
container_client = blob_service_client.get_container_client('xx-xx-data')
blob_client = container_client.get_blob_client('TEST.xlsx')
downloaded_blob = blob_client.download_blob()
df = pd.read_excel(io.BytesIO(downloaded_blob.readall()), sheet_name='Test', skiprows=2)

It turns out that I also had to assign the 'Reader' role, in addition to 'Storage Blob Data Contributor', before the blob could be found. There was no need for a SAS URL.

The reason you're getting this error is that each request to Azure Blob Storage must be authenticated. The only exception is reading (downloading) a blob from a public blob container. In all likelihood, the container holding this blob has a private ACL, and since you're sending an unauthenticated request, you're getting this error.
I would recommend using a Shared Access Signature (SAS) URL for the blob with Read permission instead of the plain blob URL. Since a SAS URL has the authorization information embedded in the URL itself (the sig portion), you should be able to download the blob, provided the SAS is valid and has not expired.
For more information on Shared Access Signatures, see: https://learn.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature.
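
Whether a SAS URL "has not expired" can be checked on the client before a download is attempted. The following is a minimal sketch, not part of the original answer: sas_is_current is a hypothetical helper, and it assumes the st (start) and se (expiry) fields use the full UTC format seen in the tokens on this page (for example 2022-12-31T04:44:03Z). It only inspects the time window; the signature itself can only be validated by the storage service.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs


def _parse_sas_time(value):
    # Assumes the full "YYYY-MM-DDTHH:MM:SSZ" form; other forms raise ValueError.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def sas_is_current(sas_url, now=None):
    """Return True if `now` falls inside the SAS URL's st/se validity window."""
    now = now or datetime.now(timezone.utc)
    params = parse_qs(urlsplit(sas_url).query)
    expiry = params.get("se", [None])[0]
    start = params.get("st", [None])[0]
    if expiry is None:
        # No expiry in the URL (it may live in a stored access policy),
        # so the window cannot be determined from the URL alone.
        return False
    if start is not None and now < _parse_sas_time(start):
        return False
    return now < _parse_sas_time(expiry)
```

A caller could run this check first and request a fresh SAS URL when it returns False, instead of waiting for the download to fail.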

Related

Boto3 (SignatureDoesNotMatch) when calling the GetObject operation: Unknown

I'm uploading and delivering my files to an Object Storage (using Contabo as provider). Uploading works.
The problem is that I can't figure out how to A: access files without setting them to "Public" from inside my Object Storage and B: protect the uploaded files from being accessed by unauthorized users when displaying them on my webpage.
The idea is that we save the link to the file in the Object Storage inside our database. When someone wants to see the file in question, they get back the link from our database and can view it if they're logged in and have authorization. When I want to access the file via the link itself I only get an "Unauthorized" JSON response back. When trying to access the file via the boto3 get_object operation I get back the following error:
An error occurred (SignatureDoesNotMatch) when calling the GetObject operation: Unknown
The code trying to get the desired object is as follows:
import boto3
client = boto3.client(
    "s3",
    region_name=settings.OBJECT_STORAGE_REGION_NAME,
    endpoint_url=settings.OBJECT_STORAGE_ENDPOINT_URL,
    aws_access_key_id=settings.AWS_ACCESS_KEY,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY
)
# https://eu2.contabostorage.com/bucket/videos/file.mp4
link = "/videos/file.mp4"
response = client.get_object(Bucket="bucket", Key=link)
data = response["Body"].read()
print(data)
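
This question has no answer on this page. One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: S3-style object keys are literal strings, so "/videos/file.mp4" and "videos/file.mp4" name different objects. A minimal sketch of deriving the bucket and key from a path-style URL like the one in the comment above; split_path_style_url is a hypothetical helper, not a boto3 function.

```python
from urllib.parse import urlsplit, unquote


def split_path_style_url(url):
    """Split https://host/bucket/some/key into ("bucket", "some/key")."""
    path = unquote(urlsplit(url).path).lstrip("/")
    bucket, _, key = path.partition("/")
    if not bucket or not key:
        raise ValueError(f"no bucket/key found in URL: {url!r}")
    return bucket, key


bucket, key = split_path_style_url(
    "https://eu2.contabostorage.com/bucket/videos/file.mp4"
)
# The key comes back without a leading slash.
```

The resulting pair could then be passed on as client.get_object(Bucket=bucket, Key=key).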

AuthorizationPermissionMismatch when copy blobs across different storage accounts

I am using the code below to copy a blob between two storage accounts, but it fails with the error shown after it.
src_blob = '{0}/{1}?{2}'.format('source_url',b_name,'sp=rw&st=2022-11-17T20:44:03Z&se=2022-12-31T04:44:03Z&spr=https&sv=2021-06-08&sr=c&sig=ZXRe2FptVF5ArRM%2BKDAkLboCN%2FfaD9Mx38yZGWhnps0%3D')
destination_client = BlobServiceClient.from_connection_string("destination_connection_string")  # the connection string contains a SAS token with sr=c
copied_blob = destination_client.get_blob_client('standardfeed', b_name)
copied_blob.start_copy_from_url(src_blob)
ErrorCode: AuthorizationPermissionMismatch
This request is not authorized to perform this operation using this permission.
Is anything missing, or did I copy the wrong SAS token?

I tried this in my environment and successfully copied a blob from one storage account to another.
Code:
from azure.storage.blob import BlobServiceClient
b_name="sample1.pdf"
src_blob = '{0}/{1}?{2}'.format('https://venkat123.blob.core.windows.net/test',b_name,'sp=r&st=2022-11-18T07:46:10Z&se=2022-11-18T15:46:10Z&spr=https&sv=<SAS token>')
destination_client = BlobServiceClient.from_connection_string("<connection string>")
copied_blob = destination_client.get_blob_client('test1', b_name)
copied_blob.start_copy_from_url(src_blob)
Make sure you have the necessary permissions. For authentication, you need to assign roles on your storage account:
Storage Blob Data Contributor
Storage Blob Data Reader
Update:
You can get the connection string through the portal.
Reference:
Azure Blob Storage "Authorization Permission Mismatch" error for get request with AD token - Stack Overflow
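
When the copy is authorized with SAS tokens rather than roles, it can help to look at which permissions each token actually carries. The sketch below reads the sp field of a SAS token; sas_permissions is a hypothetical helper. The source token needs at least read ("r"), as in the working example above. That the destination token needs write ("w") or create ("c") is my assumption and should be checked against the Copy Blob documentation.

```python
from urllib.parse import parse_qs


def sas_permissions(sas_token):
    """Return the set of permission letters in a SAS token's sp field."""
    params = parse_qs(sas_token.lstrip("?"))
    return set(params.get("sp", [""])[0])


# Placeholder tokens; the sig values are not real signatures.
source_perms = sas_permissions("sp=rw&spr=https&sv=2021-06-08&sr=c&sig=placeholder")
dest_perms = sas_permissions("?sp=r&spr=https&sv=2021-06-08&sr=c&sig=placeholder")

print("source can be read:", "r" in source_perms)
print("destination can be written:", bool(dest_perms & {"w", "c"}))
```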

Is there a temporary directory or direct way to upload a file in azure storage?

I am trying to build a Python API that lets a user upload a PDF file and then sends it directly to Azure Storage. From what I have found, the file has to exist in a local directory first, i.e.
from azure.storage.blob import ContainerClient
container_client = ContainerClient.from_connection_string(conn_str=conn_str, container_name='mycontainer')
with open('mylocalpath/myfile.pdf', "rb") as data:
    container_client.upload_blob(name='myblockblob.pdf', data=data)
Another solution is to store the file on the VM and point the local path at it, but I don't want to fill up my VM's disk.

If you want to upload directly from the client side to Azure Blob Storage, instead of receiving the file in your API first, you can use a shared access signature (SAS) on your storage account. Your API exposes a function that generates a pre-signed URL from that SAS and returns it to the client, and the client can then upload the file to your blob through that URL.
To generate the URL, you can use the code below:
from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

accountname = "<accountname>"  # name of the storage account
accountkey = "<accountkey>"  # get this from the access keys section of the storage account
containername = "<containername>"

def getpushurl(filename):
    # Create a short-lived SAS token that only allows writing this one blob.
    token = generate_blob_sas(
        account_name=accountname,
        container_name=containername,
        account_key=accountkey,
        permission=BlobSasPermissions(write=True),
        expiry=datetime.utcnow() + timedelta(seconds=100),
        blob_name=filename,
    )
    url = f"https://{accountname}.blob.core.windows.net/{containername}/{filename}?{token}"
    return url

pdfpushurl = getpushurl("demo.text")
print(pdfpushurl)
After generating this URL, give it to the client so that the client can send the file directly to the URL with a PUT request; the file is then uploaded straight to Azure Storage.
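
The client-side PUT might look roughly like the sketch below, which uses only the standard library. build_upload_request is a hypothetical helper, and the content type is an assumption for PDF uploads. The x-ms-blob-type header is required by the Put Blob REST operation when a block blob is created this way.

```python
import urllib.request


def build_upload_request(sas_url, data, content_type="application/pdf"):
    """Build (but do not send) a PUT request that uploads data as a block blob."""
    return urllib.request.Request(
        sas_url,
        data=data,
        method="PUT",
        headers={
            # Tells the service to create a block blob from the request body.
            "x-ms-blob-type": "BlockBlob",
            "Content-Type": content_type,
        },
    )


# Usage with a URL such as pdfpushurl (this performs a real network call):
# with open("myfile.pdf", "rb") as f:
#     request = build_upload_request(pdfpushurl, f.read())
# with urllib.request.urlopen(request) as response:
#     print(response.status)  # the service answers 201 Created on success
```

Keeping the request construction separate from the network call makes it easy to inspect the method and headers before anything is sent.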

You can generate a SAS token with write permission for your users so that they can upload .pdf files directly from their side, without storing them on your server. For details, please see my previous post here.
Try the code below to generate a SAS token with container write permission:
from azure.storage.blob import BlobServiceClient, ContainerSasPermissions, generate_container_sas
from datetime import datetime, timedelta

storage_connection_string = ''
container_name = ''
block_blob_service = BlobServiceClient.from_connection_string(storage_connection_string)
container_client = block_blob_service.get_container_client(container_name)
sasToken = generate_container_sas(
    account_name=container_client.account_name,
    container_name=container_client.container_name,
    account_key=container_client.credential.account_key,
    # grant write permission only
    permission=ContainerSasPermissions(write=True),
    start=datetime.utcnow() - timedelta(minutes=1),
    # valid for 1 hour
    expiry=datetime.utcnow() + timedelta(hours=1),
)
print(sasToken)
After you have sent this SAS token to your user, see this official guide on uploading files from an HTML page; I think it would be helpful if you are developing a web app.

Unable to validate access credentials when creating/using presigned URLs in boto3

I'm using boto3 to copy encrypted EBS snapshots from one region to another, but I've been getting Invalid presigned URL messages when I try to copy. I'm generating the presigned URL using the boto3 client method generate_presigned_url in the source region and calling the copy function in the destination region like so:
uw2_client = non_prod.client(
    'ec2',
    region_name="us-west-2",
    config=Config(signature_version='s3v4')
)
presigned_url = uw2_client.generate_presigned_url(
    ClientMethod='copy_snapshot',
    Params={
        'SourceSnapshotId': og_snapshot_id,  # Original snapshot ID
        'SourceRegion': 'us-west-2',
        'DestinationRegion': 'us-east-1'
        # I also tried including all parameters from copy_snapshot.
        # It didn't make a difference.
        # 'Description': desc,
        # 'KmsKeyId': 'alias/xva-nonprod-all-amicopykey',
        # 'Encrypted': True,
    }
)
Here's my code to create the copy.
ue1_client = non_prod.client(
    'ec2',
    region_name="us-east-1",
    config=Config(signature_version='s3v4')
)
response = ue1_client.copy_snapshot(
    Description=desc,
    KmsKeyId='alias/xva-nonprod-all-amicopykey',  # Exists in us-east-1
    Encrypted=True,
    SourceSnapshotId=og_snapshot_id,
    SourceRegion='us-west-2',
    DestinationRegion='us-east-1',
    PresignedUrl=presigned_url
)
It successfully returns the presigned URL. But if I attempt to use that presigned URL to copy a snapshot, I get the invalid URL error. If I try to validate the url:
r = requests.post(presigned_url)
print(r.status_code)
print(r.text)
I get:
<Response>
<Errors>
<Error>
<Code>AuthFailure</Code>
<Message>AWS was not able to validate the provided access credentials</Message>
</Error>
</Errors>
<RequestID>3189bb5b-54c9-4d11-ab4c-762cbea32d9a</RequestID>
</Response>
You'd think that it would be an issue with my credentials, but I'm not sure how... They're the same credentials I'm using to create the pre-signed URL, and my IAM user has unfettered access to EC2.
I'm obviously doing something wrong here, but I cannot figure out what it is. Any insight would be appreciated.
EDIT
Just to confirm that it's not a permissions issue, I tried this with my personal account which has access to everything. Still getting the same error message.

As it turns out, the documentation is wrong... A signed URL is NOT required when copying encrypted snapshots within the same account (according to AWS Support).
From AWS Support:
... it's not actually necessary to create the pre-signed URL in order to copy encrypted snapshot from one region to another (within the same AWS account).
However, according to their documentation, it's not possible to copy encrypted snapshots to another account either... ¯\_(ツ)_/¯
The current boto3.EC2.Client.copy_snapshot function documentation says:
PresignedUrl (string) --
When you copy an encrypted source snapshot using the Amazon EC2 Query API, you must supply a pre-signed URL. This parameter is optional for unencrypted snapshots.
Instead, it can simply be accomplished by creating the client object in the destination region and calling the copy_snapshot() method like so:
import boto3

try:
    # Create the client in the destination region.
    ec2 = boto3.client(
        service_name='ec2',
        region_name='us-east-1'
    )
    ec2.copy_snapshot(
        SourceSnapshotId='snap-xxxxxxxxxxxx',
        SourceRegion='us-west-2',
        Encrypted=True,
        KmsKeyId='DestinationRegionKeyId'
    )
except Exception as e:
    print(e)

Uploading large files to Google Storage GCE from a Kubernetes pod

We get this error when uploading a large file (more than 10 MB but less than 100 MB):
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=resumable: ('Response headers must contain header', 'location')
Or this error when the file is larger than 5 MB:
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=multipart: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
It seems that this API looks at the file size and decides whether to upload via the multipart or the resumable method. I can't imagine that this is something I should have to care about as a caller of the API. Is the problem somehow related to permissions? Does the bucket need a special permission so that it can accept multipart or resumable uploads?
from google.cloud import storage
try:
    client = storage.Client()
    bucket = client.get_bucket('my-bucket')
    blob = bucket.blob('blob-name')
    blob.upload_from_filename(zip_path, content_type='application/gzip')
except Exception as e:
    print(f'Error in uploading {zip_path}')
    print(e)
We run this inside a Kubernetes pod so the permissions get picked up by storage.Client() call automatically.
We already tried these:
We can't upload with gsutil, because the container runs Python 3 and gsutil does not run on Python 3.
We tried this example, but it runs into the same error: ('Response headers must contain header', 'location')
There is also this library, but it is basically alpha quality, with little activity and no commits for a year.
We upgraded to google-cloud-storage==1.13.0.
Thanks in advance

The problem was indeed the credentials. The error message was just very misleading. When we loaded the credentials explicitly, the problem went away.
# Explicitly use service account credentials by specifying the private key file.
storage_client = storage.Client.from_service_account_json(
    'service_account.json')

I found my node pools had been spec'd with
oauthScopes:
- https://www.googleapis.com/auth/devstorage.read_only
and changing it to
oauthScopes:
- https://www.googleapis.com/auth/devstorage.full_control
fixed the error. As described in this issue, the problem is an uninformative error message.
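
Because the 403 gives no hint about scopes, it may help to check from inside the pod which scopes the node's default service account actually has. The sketch below is an illustration under assumptions: it queries the Compute Engine metadata server, which is only reachable on GCE/GKE nodes and may behave differently when Workload Identity is enabled. can_write_to_storage and fetch_scopes are hypothetical helper names, and treating the three scopes listed as the ones that permit writes is my reading of the scope names.

```python
import urllib.request

# Metadata endpoint that lists the OAuth scopes of the default service account.
SCOPES_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/scopes"
)

# Scopes assumed here to include write access to Cloud Storage.
WRITE_SCOPES = {
    "https://www.googleapis.com/auth/devstorage.read_write",
    "https://www.googleapis.com/auth/devstorage.full_control",
    "https://www.googleapis.com/auth/cloud-platform",
}


def can_write_to_storage(scopes_text):
    """Return True if any scope in the newline-separated text allows writes."""
    return any(line.strip() in WRITE_SCOPES for line in scopes_text.splitlines())


def fetch_scopes():
    """Ask the metadata server for the scopes (works only on GCE/GKE nodes)."""
    request = urllib.request.Request(SCOPES_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read().decode()


# Usage inside the pod:
# print(can_write_to_storage(fetch_scopes()))
```

A node pool created with only devstorage.read_only would make this check return False, which matches the situation described above.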
