Uploading csv file using python to azure blob storage - python

I'm trying to upload a csv file to a container. It is constantly giving me an error that says - Retry policy did not allow for a retry: , HTTP status code=Unknown, Exception=HTTPSConnectionPool
Here is my code -
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='myAccoutName', account_key='myAccountKey')
block_blob_service.get_blob_to_path(container_name='test1', blob_name='pho.csv', file_path = 'C:\\Users\\A9Q5NZZ\\pho.csv')
I am new to Python so if you can answer with a simple language, that would be really helpful.
Forget uploading a CSV file, it doesn't even let me view existing blobs in an existing container! It gives the same 'Retry Policy' error for the below code -
container_name = 'test1'
generator = block_blob_service.list_blobs(container_name)
for blob in generator:
print("\t Blob name: " + blob.name)
I understand I've asked two questions, but I think the error is the same. Any help is appreciated. Again, since I am new to Python, an explanation/code with simpler terms would be great!

The method get_blob_to_path you're using is for downloading blob to local. If you want to upload a local file to azure blob storage, you should use this method block_blob_service.create_blob_from_path(container_name="",blob_name="",file_path="")
The sample code works at my side:
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='xxx', account_key='xxxx')
block_blob_service.create_blob_from_path(container_name="mycontainier",blob_name="test2.csv",file_path="D:\\temp\\test2.csv")

Related

Generate SAS token for a directory in Azure blob storage in Python

I'm attempting to use Python in order to limit which parts of my Azure storage different users can access.
I have been looking for code that can generate a SAS token for a specific directory in my Storage container. I am hoping that generating a SAS token on my directory, will give me access to the files/blobs it contains. (Just like how it works on azure.portal, where I can right-click my directory and press 'Generate SAS'.
however I have not been able to find any Python code that could archive this.
All I can find are the following 3 function:
generate_account_sas()
generate_container_sas()
generate_blob_sas()
Found here: https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob?view=azure-python
I have attemted to use the 'generate_blob_sas()' function but using the name of my directory instead of a file/blob.
from datetime import datetime, timedelta
from azure.storage.blob import BlobClient, generate_blob_sas, BlobSasPermissions
account_name = 'STORAGE_ACCOUNT_NAME'
account_key = 'STORAGE_ACCOUNT_ACCESS_KEY'
container_name = 'CONTAINER_NAME'
blob_name = 'NAME OF MY DIRECTORY'
def get_blob_sas(account_name,account_key, container_name, blob_name):
sas_blob = generate_blob_sas(account_name=account_name,
container_name=container_name,
blob_name=blob_name,
account_key=account_key,
permission=BlobSasPermissions(read=True),
expiry=datetime.utcnow() + timedelta(hours=1))
return sas_blob
blob = get_blob_sas(account_name,account_key, container_name, blob_name)
url = 'https://'+account_name+'.blob.core.windows.net/'+container_name+'/'+blob_name+'?'+blob
However when I attempt to use this url, I get the following response:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>AuthenticationFailed</Code>
<Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:31qv254a-201e-0509-3f26-8587fb000000 Time:2021-07-30T09:37:21.1142028Z</Message>
<AuthenticationErrorDetail>Signature did not match. String to sign used was rt 2021-07-30T10:08:37Z /blob/my_account/my_container/my_directory/my_file.png 2020-06-12 b </AuthenticationErrorDetail>
</Error>
Is there some other way for me, to generate a SAS token on a directory?
From your description, it looks like your storage account is Data Lake Gen2. If that's the case, then you will need to use a different SDK.
The SDK you're using is for Azure Blob Storage (non Data Lake Gen2) accounts where folders are virtual folders and not the real ones.
The SDK you would want to use is azure-storage-file-datalake and the method you would want to use for generating a SAS token on a directory will be generate_file_system_sas.

Save .xlsx file to Azure blob storage

I have a Django application and form which accepts from a user an Excel(.xlsx) and CSV (.csv) file. I need to save both files to Azure Blob Storage. I found it to be trivial to handle the .csv file but the same code fails when attempting up upload an xlsx file:
from azure.storage.blob import BlobServiceClient
# This code executes successfully when saving a CSV to blob storage
blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))
blob_client = blob_service_client.get_blob_client(container="my-container-name", blob=form.cleaned_data.get('name_of_form_field_for_csv_file'))
blob_client.upload_blob(form.cleaned_data.get('name_of_form_field_for_csv_file''))
# This code fails when saving xlsx to blob storage
blob_client = blob_service_client.get_blob_client(container="my-container-name", blob=form.cleaned_data.get('name_of_form_field_for_xlsx_file'))
blob_client.upload_blob(form.cleaned_data.get('name_of_form_field_for_xlsx_file''))
ClientAuthenticationError at /mypage/create/
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
However, I've been unable to figure out how to save the .xlsx file. I--perhaps somewhat naively--assumed I could pass the .xlsx file as-is (like the .csv example above) but I get the error:
ClientAuthenticationError at /mypage/create/
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
I found this SO Answer about the above error, but there's no concensus at all on what the error means and I've been unable to progress much further from that link. However, there was some discussion about sending the data to Azure blob storage as a byte stream. Is this a possible way forward? I should note here that, ideally, I need to process the files in memory as my app is deployed within App Service (my understanding is that I don't have access to a file system in which to create and manipulate files.)
I have also learned that .xlsx files are compressed so do I need to first decompress the file and then send it as a byte stream? If so, has anyone got any experience with this who could point me in the right direction?
Storage account connection string:
STORAGE_CONN_STRING=DefaultEndpointsProtocol=https;AccountName=REDACTED;AccountKey=REDACTED;EndpointSuffix=core.windows.net
Did you try like below:
# Create a local directory to hold blob data
local_path = "./data"
os.mkdir(local_path)
# Create a file in the local data directory to upload and download
local_file_name = str(uuid.uuid4()) + ".xlsx"
upload_file_path = os.path.join(local_path, local_file_name)
# Write text to the file
file = open(upload_file_path, 'w')
file.write("Hello, World!")
file.close()
# Create a blob client using the local file name as the name for the blob
blob_client =
blob_service_client.get_blob_client(container=container_name,
blob=local_file_name)
# Upload the created file
with open(upload_file_path, "rb") as data:
blob_client.upload_blob(data)
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python
For reasons I don't fully understand (comments welcome for an explanation!), I can successfully save a .xlsx file to Azure Blob Storage with:
self.request.FILES['name_of_form_field_for_xlsx_file']
I suspect there's a difference in how csv vs. xlsx files are handled between request.FILES and form.cleaned_data.get() in Django, resulting in an authentication error as per the original question.
The full code to save a .csv and then a .xlsx is (note this is within a FormView):
from azure.storage.blob import BlobServiceClient
# Set connection string
blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))
# Upload an xlsx file
blob_client = blob_service_client.get_blob_client(container="my-container", blob=self.request.FILES['xlsx_file'])
blob_client.upload_blob(self.request.FILES['xlsx_file'])
# Upload a CSV file
blob_client = blob_service_client.get_blob_client(container="my-container", blob=form.cleaned_data.get('csv_file'))
blob_client.upload_blob(form.cleaned_data.get('csv_file'))

How can I obtain Etag of a file in a storage account using Python SDK for Azure?

I want to get the etag associated with a file which is uploaded in my storage account in my python code.
Please use the code below:
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='xx', account_key='xx')
myetag = block_blob_service.get_blob_properties("your_container","the_blob_name").properties.etag
print(myetag)
Test result:

Trouble with Google Application Credentials

Hi there first and foremost this is my first time using Googles services. I'm trying to develop an app with the Google AutoML Vision Api (Custom Model). I have already build a custom model and generated the API keys(I hope I did it correctly tho).
After many attempts of developing via Ionics & Android and failing to connect to the to the API.
I have now taken the prediction modelling given codes in Python (on Google Colab) and even with that I still get an error message saying that Could not automatically determine credentials. I'm not sure where I have gone wrong in this. Please help. Dying.
#installing & importing libraries
!pip3 install google-cloud-automl
import sys
from google.cloud import automl_v1beta1
from google.cloud.automl_v1beta1.proto import service_pb2
#import key.json file generated by GOOGLE_APPLICATION_CREDENTIALS
from google.colab import files
credentials = files.upload()
#explicit function given by Google accounts
[https://cloud.google.com/docs/authentication/production#auth-cloud-implicit-python][1]
def explicit():
from google.cloud import storage
# Explicitly use service account credentials by specifying the private key
# file.
storage_client = storage.Client.from_service_account_json(credentials)
# Make an authenticated API request
buckets = list(storage_client.list_buckets())
print(buckets)
#import image for prediction
from google.colab import files
YOUR_LOCAL_IMAGE_FILE = files.upload()
#prediction code from modelling
def get_prediction(content, project_id, model_id):
prediction_client = automl_v1beta1.PredictionServiceClient()
name = 'projects/{}/locations/uscentral1/models/{}'.format(project_id,
model_id)
payload = {'image': {'image_bytes': content }}
params = {}
request = prediction_client.predict(name, payload, params)
return request # waits till request is returned
#print function substitute with values
content = YOUR_LOCAL_IMAGE_FILE
project_id = "REDACTED_PROJECT_ID"
model_id = "REDACTED_MODEL_ID"
print (get_prediction(content, project_id, model_id))
Error Message when run the last line of code:
credentials = files.upload()
storage_client = storage.Client.from_service_account_json(credentials)
these two lines are the issue I think.
The first one actually loads the contents of the file, but the second one expects a path to a file, instead of the contents.
Lets tackle the first line first:
I see that just passing the credentials you get after calling credentials = files.upload() will not work as explained in the docs for it. Doing it like you're doing, the credentials don't actually contain the value of the file directly, but rather a dictionary for filenames & contents.
Assuming you're only uploading the 1 credentials file, you can get the contents of the file like this (stolen from this SO answer):
from google.colab import files
uploaded = files.upload()
credentials_as_string = uploaded[uploaded.keys()[0]]
So now we actually have the contents of the uploaded file as a string, next step is to create an actual credentials object out of it.
This answer on Github shows how to create a credentials object from a string converted to json.
import json
from google.oauth2 import service_account
credentials_as_dict = json.loads(credentials_as_string)
credentials = service_account.Credentials.from_service_account_info(credentials_as_dict)
Finally we can create the storage client object using this credentials object:
storage_client = storage.Client(credentials=credentials)
Please note I've not tested this though, so please give it a go and see if it actually works.

Python Boto3 AWS Multipart Upload Syntax

I am successfully authenticating with AWS and using the 'put_object' method on the Bucket object to upload a file. Now I want to use the multipart API to accomplish this for large files. I found the accepted answer in this question:
How to save S3 object to a file using boto3
But when trying to implement I am getting "unknown method" errors. What am I doing wrong? My code is below. Thanks!
## Get an AWS Session
self.awsSession = Session(aws_access_key_id=accessKey,
aws_secret_access_key=secretKey,
aws_session_token=session_token,
region_name=region_type)
...
# Upload the file to S3
s3 = self.awsSession.resource('s3')
s3.Bucket('prodbucket').put_object(Key=fileToUpload, Body=data) # WORKS
#s3.Bucket('prodbucket').upload_file(dataFileName, 'prodbucket', fileToUpload) # DOESNT WORK
#s3.upload_file(dataFileName, 'prodbucket', fileToUpload) # DOESNT WORK
The upload_file method has not been ported over to the bucket resource yet. For now you'll need to use the client object directly to do this:
client = self.awsSession.client('s3')
client.upload_file(...)
Libcloud S3 wrapper transparently handles all the splitting and uploading of the parts for you.
Use upload_object_via_stream method to do so:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver
# Path to a very large file you want to upload
FILE_PATH = '/home/user/myfile.tar.gz'
cls = get_driver(Provider.S3)
driver = cls('api key', 'api secret key')
container = driver.get_container(container_name='my-backups-12345')
# This method blocks until all the parts have been uploaded.
extra = {'content_type': 'application/octet-stream'}
with open(FILE_PATH, 'rb') as iterator:
obj = driver.upload_object_via_stream(iterator=iterator,
container=container,
object_name='backup.tar.gz',
extra=extra)
For official documentation on S3 Multipart feature, refer to AWS Official Blog.

Categories

Resources