I was following the instructions from this link. I did all the steps for uploading a file to the storage account, starting with running this command:
setx AZURE_STORAGE_CONNECTION_STRING "12334455"
Basically, I just copied the code from the Microsoft site for uploading files, but after completing all the prerequisites given there I am still getting errors.
The code I wrote is:
import os, uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__

try:
    print("Azure Blob Storage v" + __version__ + " - Python quickstart sample")

    # local_path = "C:\Users\shang\Desktop\Trial"
    # os.mkdir(local_path)
    #
    # # Create a file in the local data directory to upload and download
    # local_file_name = str(uuid.uuid4()) + ".txt"
    # upload_file_path = os.path.join(local_path, local_file_name)
    #
    # # Write text to the file
    # file = open(upload_file_path, 'w')
    # file.write("Hello, World!")
    # file.close()

    upload_file_path = r"C:\Users\shang\Desktop\Trial\Trial.txt"
    local_file_name = "Trial.txt"

    # Create a blob client using the local file name as the name for the blob
    blob_client = BlobServiceClient.get_blob_client(container="testingnlearning", blob=local_file_name)

    print("\nUploading to Azure Storage as blob:\n\t" + local_file_name)

    # Upload the created file
    with open(upload_file_path, "rb") as data:
        blob_client.upload_blob(data)

    # Quick start code goes here
except Exception as ex:
    print('Exception:')
    print(ex)
When I run the code, I get this error:
TypeError Traceback (most recent call last)
<ipython-input-3-3a6b42061e89> in <module>
----> 1 blob_client = BlobServiceClient.get_blob_client(container="testingnlearning", blob="Trial.txt")
TypeError: get_blob_client() missing 1 required positional argument: 'self'
Now I don't know what I am doing wrong. It would be really wonderful if you could tell me how to upload text files to an Azure Storage container.
Thanks in advance.
The reason you're getting the error is that you are not creating an instance of BlobServiceClient; you are calling the method as if it were static here:
blob_client = BlobServiceClient.get_blob_client(container="testingnlearning", blob=local_file_name)
What you would want to do is create an instance of BlobServiceClient and then use that instance. Something like:
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client(container="testingnlearning", blob=local_file_name)
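Putting it together, here is a minimal sketch of the corrected upload, assuming the connection string is read from the AZURE_STORAGE_CONNECTION_STRING environment variable you set with setx:

import os
from azure.storage.blob import BlobServiceClient

# Assumes AZURE_STORAGE_CONNECTION_STRING was set earlier with setx
# (note that setx only takes effect in newly opened console sessions)
connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")

upload_file_path = r"C:\Users\shang\Desktop\Trial\Trial.txt"
local_file_name = "Trial.txt"

# Create the service client from the connection string, then get a blob client from the instance
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client(container="testingnlearning", blob=local_file_name)

# Upload the local file as the blob
with open(upload_file_path, "rb") as data:
    blob_client.upload_blob(data)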
I am trying to download files from Azure Blob Storage, and I want to do it without using any file handle or open/close methods. This is my approach; I want an alternative that avoids "with open()":
def download_to_blob_storage(CONTAINERNAME, remote_path, local_file_path):
    client = BlobServiceClient(STORAGEACCOUNTURL, credential=default_credential)
    blob_client = client.get_blob_client(container=CONTAINERNAME, blob=remote_path)
    with open(local_file_path, "wb") as my_blob:
        download_stream = blob_client.download_blob()
        my_blob.write(download_stream.readall())
    print('downloaded ' + remote_path + ' file')

download_to_blob_storage(CONTAINERNAME, '/results/wordcount.py', "./wordcount.py")
For this question, first be clear about what you actually want to do.
What kind of "download" do you want to achieve? If you just want to get the file content, I can show you a method. But if you want to avoid Python's I/O mechanism for creating and writing files, that is not possible: open() is Python's most basic native file method, and if you want a file stored on disk, you will inevitably call it.
A python demo for you:
from azure.storage.blob import BlobClient, ContainerClient
account_url = "https://xxx.blob.core.windows.net"
key = "xxx"
container_name = "test"
blob_name = "test.txt"
# Create the ContainerClient object
container_client = ContainerClient(account_url, container_name, credential=key)
# Create the BlobClient object
blob_client = BlobClient(account_url, container_name, blob_name, credential=key)
#download the file without using open
file_data = blob_client.download_blob().readall()
print("file content is: " + str(file_data))
Result: the file content is printed to the console as a byte string.
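If the blob contains text and you want a normal string rather than the b'...' byte representation, you can decode the bytes, e.g. file_data.decode("utf-8").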
Showing all blobs in a (foreign) container is possible with the code below, so I know the provided SAS URL is valid:
from azure.storage.blob import ContainerClient, BlobServiceClient
sas_url = r'[the sas_token]'
container = ContainerClient.from_container_url(sas_url)
blob_list = container.list_blobs()
for blob in blob_list:
    print(blob.name)
How do I download the contents of the container to a local folder?
With our own containers I would connect with a BlobServiceClient using the provided connection-string, which I don't have for this container.
You are almost there. All you need to do is create a BlobClient from the ContainerClient and the blob name, using the get_blob_client method. Once you have that, you will be able to download the blob using the download_blob method.
Your code would be something like:
sas_url = r'[the sas_token]'
container = ContainerClient.from_container_url(sas_url)
blob_list = container.list_blobs()
for blob in blob_list:
    print(blob.name)
    blob_client = container.get_blob_client(blob.name)
    blob_client.download_blob()  # returns a StorageStreamDownloader with the blob's content
Please ensure that your SAS URL has Read permission, otherwise the download operation will fail.
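If you want to save the contents of the whole container to a local folder, a minimal sketch could look like this (assuming the SAS URL grants Read and List permissions; local_folder is a hypothetical target directory):

import os
from azure.storage.blob import ContainerClient

sas_url = r'[the sas_token]'
local_folder = "downloads"  # hypothetical local target folder

container = ContainerClient.from_container_url(sas_url)
for blob in container.list_blobs():
    blob_client = container.get_blob_client(blob.name)
    # Recreate the blob's folder structure locally
    local_path = os.path.join(local_folder, blob.name.replace("/", os.sep))
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    with open(local_path, "wb") as f:
        f.write(blob_client.download_blob().readall())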
If someone else is trying to save CSVs from a blob, here is the code I used with Gaurav's help:
from io import BytesIO
import pandas as pd
from azure.storage.blob import BlobServiceClient, ContainerClient

sas_url = r'[SAS_URL]'
sas_token = r'[SAS_token]'
container = ContainerClient.from_container_url(sas_url)
blob_service_client = BlobServiceClient(account_url="[ACCOUNT NAME]", credential=sas_token)
blob_list = container.list_blobs()
for blob in blob_list:
    name = blob.name
    # Keep only the part of the blob name after the last '/'
    filename = name[name.rfind('/') + 1:]
    if name.endswith('.csv'):
        try:
            blob_client = blob_service_client.get_blob_client(container='[CONTAINER]', blob=name)
            blob_data = blob_client.download_blob()
            file = pd.read_csv(BytesIO(blob_data.readall()))
            file.to_csv(filename)
        except Exception:
            pass
Because I want to deploy this code as an Azure Container Instance, I don't want to read a file from a local path on my computer. Basically, I am retrieving a file from a container in Azure Blob Storage and then saving it in a different container. There will be some processing that changes the file before the processed file is uploaded to the other container; for simplicity, my code currently uploads the file as-is, without processing or changing it. So far I have managed to read a file from blob storage, write it to a local file, and upload that local file to a different container, but I don't want to create a local file at all. I want to process the file from blob storage and upload it directly to a different container without storing it on a local path. Can someone please help me figure this out? I have the following code:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
import json

class FileProcessing:
    def __init__(self):
        self.file_access()

    def file_access(self):
        filename = "data_map.json"
        container_name = "filestorage"
        constr = ""
        blob_service_client = BlobServiceClient.from_connection_string(constr)
        container_client = blob_service_client.get_container_client(container_name)
        blob_client = container_client.get_blob_client(filename)
        streamdownloader = blob_client.download_blob()
        fileReader = json.loads(streamdownloader.readall())

        # Here it stores it in a local directory on my computer; I want it saved on Azure directly
        # For simplicity I am not making any changes to the file yet
        with open('json_data.json', 'w') as outfile:
            json.dump(fileReader, outfile)

        container_name2 = "filedeposit"
        container_client = ContainerClient.from_connection_string(constr, container_name2)
        print("Uploading files to blob storage")
        blob_client = container_client.get_blob_client("json_data.json")
        with open(r"C:\Users\python-test\json_data.json", "rb") as data:
            blob_client.upload_blob(data)
        print("file uploaded")

if __name__ == "__main__":
    FileProcessing()
You don't really need to write the data to a file. You can simply convert the JSON data into a string and then upload that. Something like this (untested code though):
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
import json

class FileProcessing:
    def __init__(self):
        self.file_access()

    def file_access(self):
        filename = "data_map.json"
        container_name = "filestorage"
        constr = ""
        blob_service_client = BlobServiceClient.from_connection_string(constr)
        container_client = blob_service_client.get_container_client(container_name)
        blob_client = container_client.get_blob_client(filename)
        streamdownloader = blob_client.download_blob()
        fileReader = json.loads(streamdownloader.readall())

        # Serialize the data back into a string instead of writing it to a file
        data = json.dumps(fileReader)

        container_name2 = "filedeposit"
        container_client = ContainerClient.from_connection_string(constr, container_name2)
        print("Uploading files to blob storage")
        blob_client = container_client.get_blob_client("json_data.json")
        blob_client.upload_blob(data)
        print("file uploaded")

if __name__ == "__main__":
    FileProcessing()
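One thing to keep in mind: upload_blob raises a ResourceExistsError if the target blob already exists, so if you expect to run this repeatedly you can call blob_client.upload_blob(data, overwrite=True) instead.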
I am new to Azure. I created a Blob Storage account today. I am trying to run some Python code that I found online to bulk download files from this blob container. Here is the code that I am testing:
# download_blobs.py
# Python program to bulk download blob files from Azure Storage
# Uses the latest Python SDK for Azure Blob Storage
# Requires Python 3.6 or above
import os
from azure.storage.blob import BlobServiceClient, BlobClient
from azure.storage.blob import ContentSettings, ContainerClient

# IMPORTANT: Replace connection string with your storage account connection string
# Usually starts with DefaultEndpointsProtocol=https;...
MY_CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=ryanpythonstorage;AccountKey=my_account_key;EndpointSuffix=core.windows.net"

# Replace with blob container name
MY_BLOB_CONTAINER = "ryanpythonstorage"  # copied from Access Keys

# Replace with the local folder where you want files to be downloaded
LOCAL_BLOB_PATH = "C:\\Users\\ryans\\Desktop\\blob\\"

class AzureBlobFileDownloader:
    def __init__(self):
        print("Intializing AzureBlobFileDownloader")
        # Initialize the connection to the Azure storage account
        self.blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
        self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)

    def save_blob(self, file_name, file_content):
        # Get the full path to the file
        download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)
        # For nested blobs, create the local directory structure as well
        os.makedirs(os.path.dirname(download_file_path), exist_ok=True)
        with open(download_file_path, "wb") as file:
            file.write(file_content)

    def download_all_blobs_in_container(self):
        my_blobs = self.my_container.list_blobs()
        for blob in my_blobs:
            print(blob.name)
            content = self.my_container.get_blob_client(blob).download_blob().readall()
            self.save_blob(blob.name, content)

# Initialize the class and download files
azure_blob_file_downloader = AzureBlobFileDownloader()
azure_blob_file_downloader.download_all_blobs_in_container()
Here is the result that I see when I run the code.
Intializing AzureBlobFileDownloader
Traceback (most recent call last):
File "C:\Users\ryans\AppData\Local\Temp\ipykernel_14020\3010845424.py", line 37, in <module>
azure_blob_file_downloader.download_all_blobs_in_container()
File "C:\Users\ryans\AppData\Local\Temp\ipykernel_14020\3010845424.py", line 30, in download_all_blobs_in_container
for blob in my_blobs:
File "C:\Users\ryans\anaconda3\lib\site-packages\azure\core\paging.py", line 129, in __next__
return next(self._page_iterator)
File "C:\Users\ryans\anaconda3\lib\site-packages\azure\core\paging.py", line 76, in __next__
self._response = self._get_next(self.continuation_token)
File "C:\Users\ryans\anaconda3\lib\site-packages\azure\storage\blob\_list_blobs_helper.py", line 79, in _get_next_cb
process_storage_error(error)
File "C:\Users\ryans\anaconda3\lib\site-packages\azure\storage\blob\_shared\response_handlers.py", line 177, in process_storage_error
exec("raise error from None") # pylint: disable=exec-used # nosec
File "<string>", line 1, in <module>
File "C:\Users\ryans\anaconda3\lib\site-packages\azure\storage\blob\_list_blobs_helper.py", line 72, in _get_next_cb
return self._command(
File "C:\Users\ryans\anaconda3\lib\site-packages\azure\storage\blob\_generated\operations\_container_operations.py", line 1479, in list_blob_flat_segment
map_error(status_code=response.status_code, response=response, error_map=error_map)
File "C:\Users\ryans\anaconda3\lib\site-packages\azure\core\exceptions.py", line 105, in map_error
raise error
ResourceNotFoundError: The specified container does not exist.
RequestId:4d2df1c1-a01e-008b-4209-21e493000000
Time:2022-02-13T18:44:14.1140999Z
ErrorCode:ContainerNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
RequestId:4d2df1c1-a01e-008b-4209-21e493000000
Time:2022-02-13T18:44:14.1140999Z</Message></Error>
Ultimately, I want to be able to do two things:
download files from my Blob to my local machine
upload files from my local machine to my Blob.
The error message says that the specified container cannot be found. It could be:
A problem with rights (the connection string)
A problem with the container name
In your MY_CONNECTION_STRING and MY_BLOB_CONTAINER, the account name and the container name are the same ("ryanpythonstorage"). If one of these values is actually incorrect, that would explain the error.
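A quick way to check which value is wrong is to list the containers the connection string can actually see. A small diagnostic sketch, reusing the MY_CONNECTION_STRING from above:

from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)

# MY_BLOB_CONTAINER must match one of these names exactly
for container in blob_service_client.list_containers():
    print(container.name)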
Here is my final working code.
import os
import uuid
import sys
from azure.storage.blob import BlockBlobService, PublicAccess

try:
    # Create the BlockBlobService that is used to call the Blob service for the storage account
    blob_service_client = BlockBlobService(account_name='your_account_name_goes_here', account_key='your_account_key_goes_here')

    # Create a container
    container_name = 'your_container_name_goes_here'
    blob_service_client.create_container(container_name)

    # Set the permission so the blobs are public
    blob_service_client.set_container_acl(container_name, public_access=PublicAccess.Container)

    # Path of the local file to upload
    local_path = os.path.expanduser('C:\\Users\\ryans\\Desktop\\blob\\')
    local_file_name = 'charges.csv'
    # local_file_name = "QuickStart_" + str(uuid.uuid4()) + ".txt"
    full_path_to_file = os.path.join(local_path, local_file_name)

    print("Temp file = " + full_path_to_file)
    print("\nUploading to Blob storage as blob " + local_file_name)

    # Upload the created file, using local_file_name for the blob name
    blob_service_client.create_blob_from_path(container_name, local_file_name, full_path_to_file)

    # List the blobs in the container
    print("\nList blobs in the container")
    generator = blob_service_client.list_blobs(container_name)
    for blob in generator:
        print("\t Blob name: " + blob.name)
except Exception as e:
    print(e)
##############################################################################################
import os
import uuid
import sys
from azure.storage.blob import BlockBlobService, PublicAccess

try:
    # Reuses the blob_service_client created in the script above
    local_path = os.path.expanduser('C:\\Users\\ryans\\Desktop\\blob\\')
    container_name = 'your_container_name_goes_here'
    local_file_name = 'charges.csv'

    # Download the blob(s).
    # Add '_DOWNLOADED' as a prefix to '.txt' so you can see both files in Documents.
    full_path_to_file = os.path.join(local_path, local_file_name)
    print("\nDownloading blob to " + full_path_to_file)
    blob_service_client.get_blob_to_path(container_name, local_file_name, full_path_to_file)

    # sys.stdout.write("Sample finished running. When you hit <any key>, the sample will be deleted and the sample application will exit.")
    sys.stdout.flush()
    input()
except Exception as e:
    print(e)
This link was a great resource for me.
https://github.com/Azure-Samples/azure-sdk-for-python-storage-blob-upload-download/blob/master/example.py
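Note that BlockBlobService comes from the legacy azure-storage-blob 2.x package. For anyone on the current v12 SDK (the one the question's code uses), the same upload and download would look roughly like this sketch, with the account details left as placeholders:

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string('your_connection_string_goes_here')
blob_client = service.get_blob_client(container='your_container_name_goes_here', blob='charges.csv')

# Upload the local file
with open(r'C:\Users\ryans\Desktop\blob\charges.csv', 'rb') as data:
    blob_client.upload_blob(data, overwrite=True)

# Download it back under a different name
with open(r'C:\Users\ryans\Desktop\blob\charges_DOWNLOADED.csv', 'wb') as f:
    f.write(blob_client.download_blob().readall())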
I'm dealing with a transformation from .xlsx files to .csv. I tested locally a Python script that downloads .xlsx files from a container in blob storage, manipulates the data, saves the results as a .csv file (using pandas), and uploads it to a new container. Now I need to bring the Python script into ADF to build a pipeline that automates the task. I'm dealing with two kinds of problems:
First problem: I can't figure out how to complete the task without downloading the file to my local machine.
I found these threads/tutorials, but the "azure" v5.0.0 meta-package is deprecated:
read excel files from "input" blob storage container and export to csv in "output" container with python
Tutorial: Run Python scripts through Azure Data Factory using Azure Batch
So far my code is:
import os
import sys
import pandas as pd
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, PublicAccess

# Create the BlobServiceClient that is used to call the Blob service for the storage account
conn_str = 'XXXX;EndpointSuffix=core.windows.net'
blob_service_client = BlobServiceClient.from_connection_string(conn_str=conn_str)
container_name = "input"
blob_name = "prova/excel/AAA_prova1.xlsx"
container = ContainerClient.from_connection_string(conn_str=conn_str, container_name=container_name)
downloaded_blob = container.download_blob(blob_name)
df = pd.read_excel(downloaded_blob.content_as_bytes(), skiprows=4)
data = df.to_csv(r'C:\mypath/AAA_prova2.csv', encoding='utf-8-sig', index=False)
full_path_to_file = r'C:\mypath/AAA_prova2.csv'
local_file_name = 'prova\csv\AAA_prova2.csv'

# Upload to blob
blob_client = blob_service_client.get_blob_client(container=container_name, blob=local_file_name)
with open(full_path_to_file, "rb") as data:
    blob_client.upload_blob(data)
Second problem: with this method I can only handle one specific blob name, but in the future I'll have to parametrize the script (i.e. select only blob names starting with AAA_). I can't work out whether I have to handle this in the Python script or whether I can filter the files through ADF (i.e. by adding a Filter File task before running the Python script). I can't find any tutorial or code snippet, so any help, hint, or documentation would be very appreciated. (A possible approach on the Python side is sketched below, after the EDIT.)
EDIT
I modified the code to avoid downloading to the local machine, and now it works (problem #1 solved):
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
from io import BytesIO
import pandas as pd

filename = "excel/prova.xlsx"
container_name = "input"

blob_service_client = BlobServiceClient.from_connection_string("XXXX==;EndpointSuffix=core.windows.net")
container_client = blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(filename)
streamdownloader = blob_client.download_blob()
stream = BytesIO()
streamdownloader.download_to_stream(stream)
df = pd.read_excel(stream, skiprows=5)

local_file_name_out = "csv/prova.csv"
container_name_out = "input"
blob_client = blob_service_client.get_blob_client(container=container_name_out, blob=local_file_name_out)
blob_client.upload_blob(df.to_csv(path_or_buf=None, encoding='utf-8-sig', index=False))
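As for the second problem (selecting only blob names starting with AAA_): the v12 SDK can filter blobs server-side by name prefix, so this can be handled directly in the Python script rather than in ADF. A minimal sketch, reusing the blob_service_client and container from the EDIT above (the prefix value is just an example):

container_client = blob_service_client.get_container_client("input")

# List only the blobs whose names start with the given prefix
for blob in container_client.list_blobs(name_starts_with="excel/AAA_"):
    print(blob.name)  # process each matching .xlsx here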
This is the Python 3.8 version of an Azure Function. It waits for a blob trigger from Excel, then does some other work, and uses a good chunk of your code for the final step. Note the split to trim the .xlsx off the file name.
This is what I ended up with:
# uploadedxlsx (the blob-trigger input), account_name and blob_service_client
# come from the surrounding function code that isn't shown here
source_blob = (f"https://{account_name}.blob.core.windows.net/{uploadedxlsx.name}")
file_name = uploadedxlsx.name.split("/")[2]

container_name = "container"
container_client = blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(f"Received/{file_name}")
streamdownloader = blob_client.download_blob()
stream = BytesIO()
streamdownloader.download_to_stream(stream)
df = pd.read_excel(stream)

# Trim the ".xlsx" off the file name
file_name_t = file_name.split(".")[0]
local_file_name_out = f"Converted/{file_name_t}.csv"
container_name_out = "out_container"
blob_client = blob_service_client.get_blob_client(container=container_name_out, blob=local_file_name_out)
blob_client.upload_blob(df.to_csv(path_or_buf=None, encoding='utf-8-sig', index=False))