Save .xlsx file to Azure blob storage - python

I have a Django application with a form that accepts an Excel (.xlsx) file and a CSV (.csv) file from a user. I need to save both files to Azure Blob Storage. Handling the .csv file was trivial, but the same code fails when attempting to upload the .xlsx file:
import os
from azure.storage.blob import BlobServiceClient
# This code executes successfully when saving a CSV to blob storage
blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))
blob_client = blob_service_client.get_blob_client(container="my-container-name", blob=form.cleaned_data.get('name_of_form_field_for_csv_file'))
blob_client.upload_blob(form.cleaned_data.get('name_of_form_field_for_csv_file'))
# This code fails when saving xlsx to blob storage
blob_client = blob_service_client.get_blob_client(container="my-container-name", blob=form.cleaned_data.get('name_of_form_field_for_xlsx_file'))
blob_client.upload_blob(form.cleaned_data.get('name_of_form_field_for_xlsx_file'))
However, I've been unable to figure out how to save the .xlsx file. I--perhaps somewhat naively--assumed I could pass the .xlsx file as-is (like the .csv example above) but I get the error:
ClientAuthenticationError at /mypage/create/
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
I found this SO answer about the above error, but there's no consensus at all on what the error means, and I've been unable to progress much further from that link. However, there was some discussion about sending the data to Azure Blob Storage as a byte stream. Is this a possible way forward? I should note here that, ideally, I need to process the files in memory, as my app is deployed within App Service (my understanding is that I don't have access to a file system in which to create and manipulate files).
I have also learned that .xlsx files are compressed, so do I need to first decompress the file and then send it as a byte stream? If so, does anyone have experience with this who could point me in the right direction?
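To illustrate the byte-stream idea, here is a rough sketch of what I had in mind (assuming a Django UploadedFile from request.FILES and the same container as above; I gather blob storage stores bytes as-is, so the compressed .xlsx should not need decompressing first):
import os
from io import BytesIO
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))

# Read the uploaded .xlsx entirely into memory; no local file system needed
uploaded = request.FILES['name_of_form_field_for_xlsx_file']
stream = BytesIO(uploaded.read())

blob_client = blob_service_client.get_blob_client(container="my-container-name", blob=uploaded.name)
blob_client.upload_blob(stream, overwrite=True)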
Storage account connection string:
STORAGE_CONN_STRING=DefaultEndpointsProtocol=https;AccountName=REDACTED;AccountKey=REDACTED;EndpointSuffix=core.windows.net

Did you try something like the below?
import os
import uuid

# Create a local directory to hold blob data
local_path = "./data"
os.mkdir(local_path)

# Create a file in the local data directory to upload and download
local_file_name = str(uuid.uuid4()) + ".xlsx"
upload_file_path = os.path.join(local_path, local_file_name)

# Write text to the file
with open(upload_file_path, 'w') as file:
    file.write("Hello, World!")

# Create a blob client using the local file name as the name for the blob
blob_client = blob_service_client.get_blob_client(container=container_name, blob=local_file_name)

# Upload the created file
with open(upload_file_path, "rb") as data:
    blob_client.upload_blob(data)
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python

For reasons I don't fully understand (comments welcome for an explanation!), I can successfully save a .xlsx file to Azure Blob Storage with:
self.request.FILES['name_of_form_field_for_xlsx_file']
I suspect there's a difference in how csv vs. xlsx files are handled between request.FILES and form.cleaned_data.get() in Django, resulting in an authentication error as per the original question.
The full code to save a .csv and then a .xlsx is (note this is within a FormView):
import os
from azure.storage.blob import BlobServiceClient
# Set connection string
blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))
# Upload an xlsx file
blob_client = blob_service_client.get_blob_client(container="my-container", blob=self.request.FILES['xlsx_file'])
blob_client.upload_blob(self.request.FILES['xlsx_file'])
# Upload a CSV file
blob_client = blob_service_client.get_blob_client(container="my-container", blob=form.cleaned_data.get('csv_file'))
blob_client.upload_blob(form.cleaned_data.get('csv_file'))
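For what it's worth, a slightly more explicit variant (a sketch, not part of the accepted approach) passes the uploaded file's name as the blob name and the file object as the data, which avoids relying on the UploadedFile object being stringified into a blob name:
import os
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))

# Hypothetical field name 'xlsx_file'; the blob name is the uploaded file's name as a string
xlsx_file = self.request.FILES['xlsx_file']
blob_client = blob_service_client.get_blob_client(container="my-container", blob=xlsx_file.name)
blob_client.upload_blob(xlsx_file, overwrite=True)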

Related

Does Python have functionality to upload a tar.gz file from a local PC to Azure blob storage without extracting the files inside?

I have successfully downloaded a tar.gz file from an FTP server and stored it on my local PC using the below piece of code:
import gzip
import logging
from io import BytesIO

# ftp is an existing ftplib.FTP connection
data = BytesIO()
save_file = ftp.retrbinary('RETR ' + filename, data.write, 1024)
data.seek(0)
uncompressed = gzip.decompress(data.read())
with open(filename, 'wb') as file:
    file.write(uncompressed)
logging.info("success")
Now, I only want to upload the same to my azure blob storage without extracting it.
So far, I've tried this, but it is not letting me do so:
with open(filename, "rb") as f:
    blob.upload_blob(f, overwrite=True)
What am I missing here?
to upload tar.gz file from local pc to azure blob
I tried this in my environment and got the below results:
To upload a tar.gz file from a local folder to Azure Blob Storage, you can use the below code.
Code:
from azure.storage.blob import BlobServiceClient

blobservice = BlobServiceClient.from_connection_string(conn_str="<connect-string>")
blob_client = blobservice.get_blob_client(container="test", blob="sample1.tar.gz")

# Upload the created file
with open("C:\\Users\\v-vsettu\\Downloads\\sample.tar.gz", "rb") as data:
    blob_client.upload_blob(data)
print("Uploaded!!!!!")
Reference:
Quickstart: Azure Blob Storage client library for Python - Azure Storage | Microsoft Learn

AzureBlob Upload ERROR: The specified blob already exists

I am trying to upload a file to an Azure container daily.
I get the error "The specified blob already exists" when uploading a file with the same name (I want to overwrite the existing file).
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
conn_str = yml['AZURE_BLOB']['CONN_STR']
container_name = yml['AZURE_BLOB']['CONTAINER_NAME']
# Create the BlobServiceClient that is used to call the Blob service for the storage account
blob_service_client = BlobServiceClient.from_connection_string(conn_str=conn_str)
# Create a blob client using the local file name as the name for the blob
blob_client = blob_service_client.get_blob_client(container=container_name, blob=destination_file_name)
# Upload the created file
data = fs.open(source_path,mode='rb').read()
blob_client.upload_blob(data)
print(destination_file_name+'\t......[DONE]')
Error message:
azure.core.exceptions.ResourceExistsError: The specified blob already exists.
RequestId:13d062cd-801e-00a4-77c7-a81c56000000
Time:2019-12-02T04:18:06.0826908Z
ErrorCode:BlobAlreadyExists
Error:None
If you want to overwrite the existing blob using Blob storage client library v12, just add overwrite=True in the upload_blob method.
Here is the sample code:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

conn_str = "xxx"
container_name = "test6"
blob_service_client = BlobServiceClient.from_connection_string(conn_str=conn_str)
blob_client = blob_service_client.get_blob_client(container=container_name, blob="a1.txt")

with open("F:\\temp\\a1.txt", "rb") as data:
    blob_client.upload_blob(data, overwrite=True)
print("**completed**")
After executing the code, the new blob is uploaded and the existing blob is overwritten.
Check out this blog post about a known issue.
This is a known issue with development storage. It happens when multiple threads are launched to upload the blocks that constitute the blob. Development storage uses SQL Server as its data store, and the first thing it does is make an entry into the table that stores blob information. If multiple threads are working, all of them try to perform the same operation; after the first thread succeeds, the subsequent threads cause this exception to be raised.
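If parallel block uploads are the trigger, one possible mitigation with the v12 SDK (a sketch, reusing blob_client from the answer above) is to keep the upload single-threaded via max_concurrency, alongside overwrite=True:
with open("F:\\temp\\a1.txt", "rb") as data:
    # max_concurrency=1 uploads blocks sequentially; overwrite=True replaces an existing blob
    blob_client.upload_blob(data, overwrite=True, max_concurrency=1)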

Uploading csv file using python to azure blob storage

I'm trying to upload a csv file to a container. It is constantly giving me an error that says - Retry policy did not allow for a retry: , HTTP status code=Unknown, Exception=HTTPSConnectionPool
Here is my code -
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='myAccoutName', account_key='myAccountKey')
block_blob_service.get_blob_to_path(container_name='test1', blob_name='pho.csv', file_path = 'C:\\Users\\A9Q5NZZ\\pho.csv')
I am new to Python so if you can answer with a simple language, that would be really helpful.
Forget uploading a CSV file, it doesn't even let me view existing blobs in an existing container! It gives the same 'Retry Policy' error for the below code -
container_name = 'test1'
generator = block_blob_service.list_blobs(container_name)
for blob in generator:
    print("\t Blob name: " + blob.name)
I understand I've asked two questions, but I think the error is the same. Any help is appreciated. Again, since I am new to Python, an explanation/code with simpler terms would be great!
The method get_blob_to_path you're using is for downloading a blob to a local path. If you want to upload a local file to Azure Blob Storage, you should use the method block_blob_service.create_blob_from_path(container_name="", blob_name="", file_path="") instead.
This sample code works on my side:
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='xxx', account_key='xxxx')
block_blob_service.create_blob_from_path(container_name="mycontainier",blob_name="test2.csv",file_path="D:\\temp\\test2.csv")

Get content_type from Google Cloud file

I have two API endpoints: one that takes a file from an HTTP request and uploads it to a Google Cloud bucket using the Python API, and another that downloads it again. In the first view, I get the file content type from the HTTP request and upload it to the bucket, setting that metadata:
from os import path
from google.cloud import storage

file_obj = request.FILES['file']
client = storage.Client.from_service_account_json(path.join(
    path.realpath(path.dirname(__file__)),
    '..',
    'settings',
    'api-key.json'
))
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.upload_from_string(
    file_text,
    content_type=file_obj.content_type
)
Then in another view, I download the file:
...
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.download_to_filename(path)
How can I access the file metadata I set earlier (content_type) ? It's not available on the blob object anymore since a new one was instantiated, but it still holds the file.
You should try
blob = bucket.get_blob(blob_name)
blob.content_type
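A slightly fuller sketch of that suggestion, in the context of the download view from the question (bucket.get_blob makes an API call and populates the metadata, whereas bucket.blob only builds a local reference):
bucket = client.get_bucket('storage-bucket')
blob = bucket.get_blob(filename)      # fetches the blob's metadata from GCS
content_type = blob.content_type      # the content type set at upload time
blob.download_to_filename(path)       # download the file data as before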

Is this correct usage for loading a zip file into memory from Google Cloud Storage?

I am getting strange HTTP errors after I load a file from GCS in my Python web app:
suspended generator urlfetch(context.py:1214) raised DeadlineExceededError(Deadline exceeded while waiting for HTTP response from URL: https://storage.googleapis.com/[bucketname]/dailyData_2014-01-11.zip)
However, based on what the app is logging below, it has already loaded the file (and based on memory usage, appears to be in memory).
import datetime
import logging
import zipfile
# gcs here is the App Engine GCS client library (import cloudstorage as gcs)

bucket = '/[bucketname]'
filename = bucket + '/dailyData' + datetime.datetime.today().strftime('%Y-%m-%d') + '.zip'
gcs_file = gcs.open(filename, 'r')
gcs_stats = gcs.stat(filename)
logging.info(gcs_stats)
zip_file = zipfile.ZipFile(gcs_file, 'r')
logging.info("zip file loaded")
Is there a way I should close the HTTP request or is it not actually loading the zip_file from memory and is instead trying to pull from GCS all the time...? Thanks!
You should make sure you close the files you're opening. You can use a with context, which will automatically close the file when it goes out of scope:
bucket = '/[bucketname]'
filename = bucket + '/dailyData' + datetime.datetime.today().strftime('%Y-%m-%d') + '.zip'
gcs_stats = gcs.stat(filename)
logging.info(gcs_stats)
with gcs.open(filename, 'r') as gcs_file:
    with zipfile.ZipFile(gcs_file, 'r') as zip_file:
        logging.info("zip file loaded")
