Azure Blob upload error: "The specified blob already exists" - Python

I am trying to upload a file to an Azure container daily.
I get the error "The specified blob already exists" when uploading a file with the same name (I want to overwrite the existing file).
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
conn_str = yml['AZURE_BLOB']['CONN_STR']
container_name = yml['AZURE_BLOB']['CONTAINER_NAME']
# Create the BlobServiceClient that is used to call the Blob service for the storage account
blob_service_client = BlobServiceClient.from_connection_string(conn_str=conn_str)
# Create a blob client, using the destination file name as the name for the blob
blob_client = blob_service_client.get_blob_client(container=container_name, blob=destination_file_name)
# Upload the created file
data = fs.open(source_path,mode='rb').read()
blob_client.upload_blob(data)
print(destination_file_name+'\t......[DONE]')
Error message:
azure.core.exceptions.ResourceExistsError: The specified blob already exists.
RequestId:13d062cd-801e-00a4-77c7-a81c56000000
Time:2019-12-02T04:18:06.0826908Z
ErrorCode:BlobAlreadyExists
Error:None

If you want to overwrite the existing blob using Blob storage client library v12, just add overwrite=True in the upload_blob method.
Here is the sample code:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
conn_str = "xxx"
container_name = "test6"
blob_service_client = BlobServiceClient.from_connection_string(conn_str=conn_str)
blob_client = blob_service_client.get_blob_client(container=container_name,blob="a1.txt")
with open("F:\\temp\\a1.txt","rb") as data:
blob_client.upload_blob(data,overwrite=True)
print("**completed**")
After executing the code, the new blob is uploaded and the existing blob is overwritten. (Screenshot omitted.)

Check out this blog post about a known issue.
This is a known issue with development storage (the storage emulator). It happens when multiple threads are launched to upload the blocks that make up the blob. Development storage uses SQL Server as its data store, and the first thing an upload does is insert a row into the table that stores blob information. When several threads run in parallel, they all attempt that same insert; after the first thread succeeds, the subsequent threads raise this exception.
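If the parallel block uploads are the trigger, one possible workaround (my own sketch, not from the linked post) is to force a single-threaded upload in SDK v12 with the max_concurrency keyword, combined with overwrite=True from the earlier answer. conn_str, container_name, source_path and destination_file_name come from the question; a plain local open() is assumed in place of fs.open():
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(conn_str)
blob_client = blob_service_client.get_blob_client(container=container_name, blob=destination_file_name)

with open(source_path, mode="rb") as data:
    # overwrite=True avoids BlobAlreadyExists; max_concurrency=1 disables the
    # parallel block uploads described above (the SDK may parallelize large uploads).
    blob_client.upload_blob(data, overwrite=True, max_concurrency=1)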

Related

Failed to load data file into Azure blob storage container with Python program

I am using an Azure storage account connection string to load a data file into an Azure Blob Storage container from a Python program. Here is a code snippet from my program:
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
... ...
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_name = "test"
# Create the container
container_client = blob_service_client.create_container(container_name)
upload_file_path = "dummy_data.xlsx"
blob_client = blob_service_client.get_blob_client(container=container_name, blob=upload_file_path)
# Upload file
with open(file=upload_file_path, mode="rb") as data:
    blob_client.upload_blob(data)
My program successfully created a container in the blob storage, but failed to load data into the container, with an error message like this:
ClientAuthenticationError: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:adfasa-asdfa0adfa
Time:2022-10-25T20:32:19.0165690Z
ErrorCode:AuthenticationFailed
authenticationerrordetail:The MAC signature found in the HTTP request 'bacadreRER=' is not the same as any computed signature. Server used following string to sign: 'PUT
I am stuck on this error. I tried using a SAS key and it worked. Why isn't it working with a connection string? I am following Microsoft's code example to write my program:
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?tabs=managed-identity%2Croles-azure-portal%2Csign-in-azure-cli
I tried uploading the data file manually through the Azure portal, and that worked. Using a SAS key string in my Python code also worked, but the access-key connection string did not. It's odd that with the connection string I could still create a container successfully.
I tried this in my environment and got the results below:
I executed the same code and successfully uploaded the file to blob storage.
Code:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
connect_str="DefaultEndpointsProtocol=https;AccountName=storage326123;AccountKey=3Lf7o2+vi3HgGKmUWaIG4xVdyzrzhxW5NxDNaUGVwykBPT5blZNKIyjbQlo0OAfuz0nllLUOGLRs+ASt9gqF+Q==;EndpointSuffix=core.windows.net"
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_name = "test"
# Create the container
container_client = blob_service_client.create_container(container_name)
upload_file_path = "C:\\Users\\v-vsettu\\Downloads\\dog.jpg"
blob_client = blob_service_client.get_blob_client(container=container_name, blob=upload_file_path)
# Upload file
with open(file=upload_file_path, mode="rb") as data:
    blob_client.upload_blob(data)
(Console output and portal screenshots omitted.)
The ClientAuthenticationError / AuthenticationFailed error you posted means the request signature could not be verified: something is missing or malformed in your connection string, so check the account key, since it is used to compute the signature.
You can get the connection string from the Azure portal: open your storage account, go to Access keys, and copy the connection string value (screenshot omitted).
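As a quick sanity check (my own sketch, not part of the original answer), you can split the connection string into its key=value segments, confirm the fields the SDK needs are present, and verify that the AccountKey still decodes as base64; a truncated or edited key produces exactly this kind of signature mismatch:
import base64

connect_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

# Split only on the first '=' of each segment, since the AccountKey itself ends in '=' padding.
parts = dict(segment.split("=", 1) for segment in connect_str.split(";") if segment)

for required in ("DefaultEndpointsProtocol", "AccountName", "AccountKey", "EndpointSuffix"):
    if required not in parts:
        print(f"Connection string is missing {required}")

if "AccountKey" in parts:
    try:
        base64.b64decode(parts["AccountKey"], validate=True)
    except Exception:
        print("AccountKey does not look like valid base64 - it may have been truncated or edited")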

Save .xlsx file to Azure blob storage

I have a Django application and a form which accepts from a user an Excel (.xlsx) and a CSV (.csv) file. I need to save both files to Azure Blob Storage. I found it trivial to handle the .csv file, but the same code fails when attempting to upload an .xlsx file:
from azure.storage.blob import BlobServiceClient
# This code executes successfully when saving a CSV to blob storage
blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))
blob_client = blob_service_client.get_blob_client(container="my-container-name", blob=form.cleaned_data.get('name_of_form_field_for_csv_file'))
blob_client.upload_blob(form.cleaned_data.get('name_of_form_field_for_csv_file'))
# This code fails when saving xlsx to blob storage
blob_client = blob_service_client.get_blob_client(container="my-container-name", blob=form.cleaned_data.get('name_of_form_field_for_xlsx_file'))
blob_client.upload_blob(form.cleaned_data.get('name_of_form_field_for_xlsx_file'))
However, I've been unable to figure out how to save the .xlsx file. I assumed, perhaps somewhat naively, that I could pass the .xlsx file as-is (like the .csv example above), but I get the error:
ClientAuthenticationError at /mypage/create/
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
I found this SO answer about the above error, but there's no consensus at all on what the error means, and I've been unable to progress much further from that link. However, there was some discussion about sending the data to Azure Blob Storage as a byte stream. Is this a possible way forward? I should note that, ideally, I need to process the files in memory, as my app is deployed to App Service (my understanding is that I don't have access to a file system in which to create and manipulate files).
I have also learned that .xlsx files are compressed, so do I need to decompress the file first and then send it as a byte stream? If so, has anyone got experience with this who could point me in the right direction?
Storage account connection string:
STORAGE_CONN_STRING=DefaultEndpointsProtocol=https;AccountName=REDACTED;AccountKey=REDACTED;EndpointSuffix=core.windows.net
Did you try something like the below?
import os, uuid

# Create a local directory to hold blob data
local_path = "./data"
os.mkdir(local_path)
# Create a file in the local data directory to upload and download
local_file_name = str(uuid.uuid4()) + ".xlsx"
upload_file_path = os.path.join(local_path, local_file_name)
# Write text to the file
file = open(upload_file_path, 'w')
file.write("Hello, World!")
file.close()
# Create a blob client using the local file name as the name for the blob
# (blob_service_client and container_name are created earlier in the quickstart)
blob_client = blob_service_client.get_blob_client(container=container_name, blob=local_file_name)
# Upload the created file
with open(upload_file_path, "rb") as data:
    blob_client.upload_blob(data)
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python
For reasons I don't fully understand (comments welcome for an explanation!), I can successfully save a .xlsx file to Azure Blob Storage with:
self.request.FILES['name_of_form_field_for_xlsx_file']
I suspect there's a difference in how csv vs. xlsx files are handled between request.FILES and form.cleaned_data.get() in Django, resulting in an authentication error as per the original question.
The full code to save a .csv and then a .xlsx is (note this is within a FormView):
from azure.storage.blob import BlobServiceClient
# Create the client from the connection string
blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))
# Upload an xlsx file
blob_client = blob_service_client.get_blob_client(container="my-container", blob=self.request.FILES['xlsx_file'])
blob_client.upload_blob(self.request.FILES['xlsx_file'])
# Upload a CSV file
blob_client = blob_service_client.get_blob_client(container="my-container", blob=form.cleaned_data.get('csv_file'))
blob_client.upload_blob(form.cleaned_data.get('csv_file'))
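A hedged variant of the same idea (my own assumption, not part of the accepted answer): read the uploaded file's bytes explicitly and pass its content type, so the blob name is a plain string and the .xlsx is stored with the right MIME type. The field name 'xlsx_file' and the STORAGE_CONN_STRING variable are taken from the answer above:
import os
from azure.storage.blob import BlobServiceClient, ContentSettings

uploaded = self.request.FILES['xlsx_file']  # Django UploadedFile
blob_service_client = BlobServiceClient.from_connection_string(os.getenv('STORAGE_CONN_STRING'))
blob_client = blob_service_client.get_blob_client(container="my-container", blob=uploaded.name)
blob_client.upload_blob(
    uploaded.read(),  # raw bytes, works for .csv and .xlsx alike
    overwrite=True,
    content_settings=ContentSettings(content_type=uploaded.content_type),
)
No decompression is needed: the zipped .xlsx bytes are uploaded and stored exactly as received.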

Azure function apps : [Errno 30] Read-only file system

I'm developing an API using Azure Function Apps. The API works fine locally (using localhost). However, after publishing to Function App, I'm getting this error:
[Errno 30] Read-only file system
This error started after I moved the connection logic into a function, so that a new connection is established every time the API is requested. The data is read from an Azure Blob Storage container.
The code:
DBConnection.py:
import os, uuid
from azure.storage.blob import BlockBlobService, AppendBlobService
from datetime import datetime
import pandas as pd
import dask.dataframe as dd
import logging

def BlobConnection():
    try:
        print("Connecting...")
        # Establish connection
        container_name = 'somecontainer'
        blob_name = 'some_name.csv'
        file_path = 'somepath'
        account_name = 'XXXXXX'
        account_key = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
        blobService = BlockBlobService(account_name=account_name, account_key=account_key)
        blobService.get_blob_to_path(container_name, blob_name, file_path)
        df = dd.read_csv(file_path, dtype={'Bearing': 'int64', 'Speed': 'int64'})
        df = df.compute()
        return df
    except Exception as ex:
        print('Unable to connect!')
        print('Exception:')
        print(ex)
You are probably running from a package or zip. If so, the following line is the problem: it tries to save the blob to the local file system, which is read-only in that mode. If you update it to use get_blob_to_bytes or get_blob_to_stream instead, you should be fine:
blobService.get_blob_to_path(container_name, blob_name, file_path)
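A minimal sketch of that suggestion, assuming the legacy SDK (BlockBlobService) from the question; plain pandas is used here instead of dask, since the in-memory bytes no longer need a file path:
import io
import pandas as pd
from azure.storage.blob import BlockBlobService

blobService = BlockBlobService(account_name=account_name, account_key=account_key)
# get_blob_to_bytes returns a Blob object whose .content holds the raw bytes,
# so nothing is written to the local (read-only) file system.
blob = blobService.get_blob_to_bytes(container_name, blob_name)
df = pd.read_csv(io.BytesIO(blob.content), dtype={'Bearing': 'int64', 'Speed': 'int64'})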
From https://stackoverflow.com/questions/53630773/how-to-disable-read-only-mode-in-azure-function-app:
Part 1 - Disabling read-only mode
You'll likely find if you're using the latest tools that your function app is in run-from-package mode, which means it's reading the files directly from the uploaded ZIP and so there's no way to edit it. You can turn that off by deleting the WEBSITE_RUN_FROM_ZIP or WEBSITE_RUN_FROM_PACKAGE application setting in the portal. Note this will clear your function app until the next time you publish.
If your tools are a little older, or if you've deployed using the latest tools but with func azure functionapp publish my-app-name --nozip then you can use the App Service Editor in Platform Features in the portal to edit the function.json files and remove the "generatedBy" setting, which will stop them being read-only.

How to move millions of file to another file in the same container in Azure Blob Storage?

We have millions of records (both Parquet and JSON files) in Azure Blob Storage with the structure:
/RecordName/Year/Month/Day/Hour/ParquetOrJsonFiles.parquetOrjson
There are approximately 5 million files in that structure, and I want to reshape the folder path to:
/Year/Month/Day/Hour/RecordName/ParquetOrJsonFiles.parquetOrjson
I've created a basic script in a Databricks Python notebook like this (note: the container was already mounted in my workspace):
import os
target_file = '/dbfs/containername/RecordName/Year/Month/Day/Hour/ParquetOrJsonFiles.parquetOrjson'
destination_file = '/dbfs/Year/Month/Day/Hour/RecordName/ParquetOrJsonFiles.parquetOrjson'
os.rename(target_file, destination_file)
However, this script runs very slowly. Is there a faster way to move the files?
Actually, Azure Blob Storage has no REST API that supports a rename operation for a blob, so renaming a blob really means copying it first and then deleting the original. The os.rename function on DBFS also performs a copy followed by a delete, and that is the real reason your script is slow.
The solution using REST APIs is first to do Copy Blob From URL for each blob in a container, and then to do Delete Blob for all original blobs within a Blob Batch.
Here is my sample code using the functions start_copy_from_url and delete_blobs of the latest Azure Storage SDK for Python (v12), which can be installed via pip install azure-storage-blob.
from azure.storage.blob import BlobServiceClient
account_name = '<your account name>'
account_key = '<your account key>'
connection_string = f"AccountName={account_name};AccountKey={account_key};EndpointSuffix=core.windows.net;DefaultEndpointsProtocol=https;"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
container_name = '<your container name>'
container_client = blob_service_client.get_container_client(container_name)
blobs = list(container_client.list_blobs())
# Copy all blobs with a new name to the same container
for blob in blobs:
    blob_name = blob.name
    source_url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}"
    record_name, year, month, day, hour, name = blob_name.split('/')
    new_blob_name = f'{year}/{month}/{day}/{hour}/{record_name}/{name}'
    copied_blob = blob_service_client.get_blob_client(container_name, new_blob_name)
    copied_blob.start_copy_from_url(source_url)
# Delete all original blobs
delete_blob_list = [b.name for b in blobs]
container_client.delete_blobs(*delete_blob_list)
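One hedged follow-up on the delete step (my own note, based on the documented blob batch limit): a single batch request accepts at most 256 sub-requests, so with millions of blobs the deletes should be sent in chunks rather than as one giant delete_blobs call. Also note that start_copy_from_url only starts a server-side copy, so in a real run you would want to confirm each copy has completed before deleting its source.
BATCH_SIZE = 256  # assumed service limit for sub-requests per blob batch

delete_blob_list = [b.name for b in blobs]
for i in range(0, len(delete_blob_list), BATCH_SIZE):
    chunk = delete_blob_list[i:i + BATCH_SIZE]
    container_client.delete_blobs(*chunk)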

Get content_type from Google Cloud file

I have two API endpoints: one that takes a file from an HTTP request and uploads it to a Google Cloud bucket using the Python API, and another that downloads it again. In the first view, I get the file content type from the HTTP request and upload it to the bucket, setting that metadata:
from google.cloud import storage
file_obj = request.FILES['file']
client = storage.Client.from_service_account_json(path.join(
    path.realpath(path.dirname(__file__)),
    '..',
    'settings',
    'api-key.json'
))
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.upload_from_string(
    file_text,
    content_type=file_obj.content_type
)
Then in another view, I download the file:
...
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.download_to_filename(path)
How can I access the file metadata I set earlier (content_type) ? It's not available on the blob object anymore since a new one was instantiated, but it still holds the file.
You should try
blob = bucket.get_blob(blob_name)
blob.content_type
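The reason this works (my own explanation sketch, reusing the names from the question): bucket.blob() only builds a local reference without calling the API, so its metadata fields stay empty, while bucket.get_blob() (or blob.reload()) performs a GET that populates them, including content_type:
from google.cloud import storage

client = storage.Client.from_service_account_json('settings/api-key.json')  # path as in the question
bucket = client.get_bucket('storage-bucket')

blob = bucket.get_blob(filename)   # fetches object metadata from GCS
print(blob.content_type)           # e.g. 'text/csv' or whatever type was set at upload time
blob.download_to_filename(path)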
