Showing all blobs in a (foreign) container is possible with the code below, so I know the provided SAS URL is valid:
from azure.storage.blob import ContainerClient, BlobServiceClient
sas_url = r'[the sas_token]'
container = ContainerClient.from_container_url(sas_url)
blob_list = container.list_blobs()
for blob in blob_list:
    print(blob.name)
How do I download the contents of the container to a local folder?
With our own containers I would connect with a BlobServiceClient using the provided connection string, which I don't have for this container.
You are almost there. All you need to do is create a BlobClient from the ContainerClient and the blob name using the get_blob_client method. Once you have that, you will be able to download the blob using the download_blob method.
Your code would be something like:
sas_url = r'[the sas_token]'
container = ContainerClient.from_container_url(sas_url)
blob_list = container.list_blobs()
for blob in blob_list:
    print(blob.name)
    blob_client = container.get_blob_client(blob.name)
    downloader = blob_client.download_blob()
    # Write the downloaded bytes to a local file (added for completeness; flattens folder paths in the name)
    with open(blob.name.replace('/', '_'), 'wb') as f:
        f.write(downloader.readall())
Please ensure that your SAS URL has Read permission, otherwise the download operation will fail.
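A quick way to sanity-check this is to look at the sp query parameter of the SAS URL, which lists the granted permissions as letters (r = read, l = list, and so on); a small sketch:
from urllib.parse import urlparse, parse_qs

sas_url = r'[the sas_token]'
# The 'sp' parameter of a SAS token encodes its permissions, e.g. 'rl' for read + list
permissions = parse_qs(urlparse(sas_url).query).get('sp', [''])[0]
print('SAS permissions:', permissions)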
If someone else wants to save CSVs from a blob container, here is the code I used with Gaurav's help:
from io import BytesIO
import pandas as pd
from azure.storage.blob import BlobServiceClient, ContainerClient

sas_url = r'[SAS_URL]'
sas_token = r'[SAS_token]'
container = ContainerClient.from_container_url(sas_url)
blob_service_client = BlobServiceClient(account_url="[ACCOUNT NAME]", credential=sas_token)
blob_list = container.list_blobs()
for blob in blob_list:
    name = blob.name
    # Keep only the part of the blob name after the last '/'
    filename = name.rsplit('/', 1)[-1]
    if name.endswith('.csv'):
        try:
            blob_client = blob_service_client.get_blob_client(container='[CONTAINER]', blob=name)
            blob_data = blob_client.download_blob()
            file = pd.read_csv(BytesIO(blob_data.readall()))
            file.to_csv(filename)
        except Exception:
            pass
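As an aside, the separate BlobServiceClient isn't strictly required here: the ContainerClient built from the SAS URL can hand out blob clients itself. A minimal sketch under that assumption (the SAS must grant Read and List), saving each CSV to the current folder:
from io import BytesIO
import pandas as pd
from azure.storage.blob import ContainerClient

sas_url = r'[SAS_URL]'
container = ContainerClient.from_container_url(sas_url)

for blob in container.list_blobs():
    if blob.name.endswith('.csv'):
        filename = blob.name.rsplit('/', 1)[-1]
        data = container.get_blob_client(blob.name).download_blob().readall()
        pd.read_csv(BytesIO(data)).to_csv(filename, index=False)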
I am trying to download files from Azure Blob Storage, and I want to do it without using any file handler or open/close methods.
This is my approach; I want an alternative that does not use "with open()":
def download_to_blob_storage(CONTAINERNAME, remote_path, local_file_path):
    client = BlobServiceClient(STORAGEACCOUNTURL, credential=default_credential)
    blob_client = client.get_blob_client(container=CONTAINERNAME, blob=remote_path)
    with open(local_file_path, "wb") as my_blob:
        download_stream = blob_client.download_blob()
        my_blob.write(download_stream.readall())
    print('downloaded ' + remote_path + ' file')

download_to_blob_storage(CONTAINERNAME, '/results/wordcount.py', "./wordcount.py")
First, be clear about what you actually want to achieve. What does "download" mean here? If you just want to get the file content, there is a way to do that, shown below. But if you want to avoid Python's I/O mechanism for creating and writing files, that is not possible: open() is Python's basic primitive for file access, and if you want a file stored on disk, you will inevitably call it.
A Python demo for you:
from azure.storage.blob import BlobClient, ContainerClient
account_url = "https://xxx.blob.core.windows.net"
key = "xxx"
container_name = "test"
blob_name = "test.txt"
# Create the ContainerClient object
container_client = ContainerClient(account_url, container_name, credential=key)
# Create the BlobClient object
blob_client = BlobClient(account_url, container_name, blob_name, credential=key)
# Download the file without using open()
file_data = blob_client.download_blob().readall()
print("file content is: " + str(file_data))
Result: the blob's content is printed to the console.
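If the goal is just to avoid open() because the bytes never need to touch the disk, you can also stream the download into an in-memory buffer; a small sketch reusing blob_client from the demo above:
from io import BytesIO

buffer = BytesIO()
blob_client.download_blob().readinto(buffer)  # no file handle involved
buffer.seek(0)
print(buffer.read())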
As the topic indicates...
I have tried two ways and neither of them works.
First:
I want to talk to GCS programmatically in Python, e.g. reading gs://{bucketname}/{blobname} as a path or a file. The only thing I could find is the gsutil module, but that seems to be meant for the command line rather than a Python application.
I found some code here: Accessing data in google cloud bucket, but I am still confused about how to retrieve the object as the type I need. There is a jpg file in the bucket that I want to download for text detection; this will be deployed on Google Cloud Functions.
Second:
The download_as_bytes() method (link to the Blob documentation). I import the google.cloud.storage module and provide the GCP key, but an error is raised saying that Blob has no attribute download_as_bytes().
Is there anything else I haven't tried? Thank you!
For reference:
def text_detected(user_id):
    bucket = storage_client.bucket('img_platecapture')
    blob = bucket.blob({user_id})
    content = blob.download_as_bytes()

    image = vision.Image(content=content)  # insert a content
    response = vision_client.text_detection(image=image)
    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))

    img = Image.open(input_file)  # insert a path
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype("simsun.ttc", 18)
    for text in response.text_annotations[1::]:
        ocr = text.description
        bound = text.bounding_poly  # assumed: the bounding box of this annotation
        draw.text((bound.vertices[0].x - 25, bound.vertices[0].y - 25), ocr, fill=(255, 0, 0), font=font)
        draw.polygon(
            [
                bound.vertices[0].x,
                bound.vertices[0].y,
                bound.vertices[1].x,
                bound.vertices[1].y,
                bound.vertices[2].x,
                bound.vertices[2].y,
                bound.vertices[3].x,
                bound.vertices[3].y,
            ],
            None,
            'yellow',
        )

    texts = response.text_annotations
    a = str(texts[0].description.split())
    b = re.sub(u"([^\u4e00-\u9fa5\u0030-\u0039])", "", a)
    b1 = "".join(b)
    print("偵測到的地址為:", b1)  # "The detected address is:"
    return b1

#handler.add(MessageEvent, message=ImageMessage)
def handle_content_message(event):
    message_content = line_bot_api.get_message_content(event.message.id)
    user = line_bot_api.get_profile(event.source.user_id)
    data = b''
    for chunk in message_content.iter_content():
        data += chunk

    global bucket_name
    bucket_name = 'img_platecapture'
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(f'{user.user_id}.jpg')
    blob.upload_from_string(data)
    text_detected1 = text_detected(user.user_id)  #### Here's the problem
    line_bot_api.reply_message(
        event.reply_token,
        messages=TextSendMessage(
            text=text_detected1
        ))
Reference code (gcsfs/fsspec):
gcs = gcsfs.GCSFileSystem()
bucket = storage_client.bucket('img_platecapture')
blob = bucket.blob({user_id})
f = fsspec.open("gs://img_platecapture/{user_id}")
with f.open({user_id}, "rb") as fp:
    content = fp.read()
image = vision.Image(content=content)
response = vision_client.text_detection(image=image)
You can do that with the Cloud Storage Python client:
def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The ID of your GCS object
    # source_blob_name = "storage-object-name"

    # The path to which the file should be downloaded
    # destination_file_name = "local/path/to/file"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)

    # blob.download_to_filename(destination_file_name)
    # blob.download_as_string()
    blob.download_as_bytes()

    print(
        "Downloaded storage object {} from bucket {} to local file {}.".format(
            source_blob_name, bucket_name, destination_file_name
        )
    )
You can use the following methods (see the short sketch after this list):
blob.download_to_filename(destination_file_name)
blob.download_as_string()
blob.download_as_bytes()
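Roughly, and assuming blob and destination_file_name are the ones from the sample above:
blob.download_to_filename(destination_file_name)  # writes the object straight to a local file
data = blob.download_as_bytes()                   # returns the content as bytes in memory
text = blob.download_as_string()                  # older name; also returns bytes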
To be able to correctly use this library, you have to install the expected pip package in your virtual env.
Example of project structure:
my-project
    requirements.txt
    your_python_script.py
The requirements.txt file:
google-cloud-storage==2.6.0
Run the following command:
pip install -r requirements.txt
In your case, maybe the package was not installed correctly in your virtual env; that's why you could not access the download_as_bytes method.
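A quick way to confirm which version is actually importable in the environment (download_as_bytes only exists in reasonably recent releases of google-cloud-storage; download_as_string is the older name):
from google.cloud import storage

print(storage.__version__)  # should match the version pinned in requirements.txt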
I'd be using fsspec's GCS filesystem implementation instead.
https://github.com/fsspec/gcsfs/
>>> import gcsfs
>>> fs = gcsfs.GCSFileSystem(project='my-google-project')
>>> fs.ls('my-bucket')
['my-file.txt']
>>> with fs.open('my-bucket/my-file.txt', 'rb') as f:
... print(f.read())
b'Hello, world'
https://gcsfs.readthedocs.io/en/latest/#examples
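Applied to the bucket from the question, a minimal hedged sketch (names are assumptions based on the question's code; gcsfs picks up application default credentials):
import gcsfs
from google.cloud import vision

fs = gcsfs.GCSFileSystem()
vision_client = vision.ImageAnnotatorClient()

user_id = "some-user-id"  # hypothetical; same id used when uploading the jpg
# Read the uploaded image straight from the bucket, no local file involved
with fs.open(f"img_platecapture/{user_id}.jpg", "rb") as f:
    content = f.read()

image = vision.Image(content=content)
response = vision_client.text_detection(image=image)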
Because I want to deploy this code as an Azure Container Instance, I don't want to read a file from a local path on my computer. Basically, I am retrieving a file from a container in Azure Blob Storage, processing it, and uploading the processed file to a different container. For simplicity, my code below uploads the file as-is without any processing. So far I have managed to read a file from blob storage, write it to a local file, and upload that local file to a different container, but I don't want to create a local file at all: I want to process the file from blob storage and upload it directly to a different container without storing it in a local path. Can someone please help me figure this out? I have the following code:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
import json


class FileProcessing:
    def __init__(self):
        self.file_access()

    def file_access(self):
        filename = "data_map.json"
        container_name = "filestorage"
        constr = ""
        blob_service_client = BlobServiceClient.from_connection_string(constr)
        container_client = blob_service_client.get_container_client(container_name)
        blob_client = container_client.get_blob_client(filename)
        streamdownloader = blob_client.download_blob()
        fileReader = json.loads(streamdownloader.readall())

        # Here it stores it in a local directory on my computer; I want it saved on Azure directly
        # For simplicity I am not making any changes to the file yet
        with open('json_data.json', 'w') as outfile:
            json.dump(fileReader, outfile)

        container_name2 = "filedeposit"
        container_client = ContainerClient.from_connection_string(constr, container_name2)
        print("Uploading files to blob storage")
        blob_client = container_client.get_blob_client("json_data.json")
        with open(r"C:\Users\python-test\json_data.json", "rb") as data:
            blob_client.upload_blob(data)
        print("file uploaded")


if __name__ == "__main__":
    FileProcessing()
You don't really need to write the data to a file. You can simply convert the JSON data into a string and then upload it. Something like (untested code though):
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
import json


class FileProcessing:
    def __init__(self):
        self.file_access()

    def file_access(self):
        filename = "data_map.json"
        container_name = "filestorage"
        constr = ""
        blob_service_client = BlobServiceClient.from_connection_string(constr)
        container_client = blob_service_client.get_container_client(container_name)
        blob_client = container_client.get_blob_client(filename)
        streamdownloader = blob_client.download_blob()
        fileReader = json.loads(streamdownloader.readall())

        # Serialize the data back into a string instead of writing it to disk
        data = json.dumps(fileReader)

        container_name2 = "filedeposit"
        container_client = ContainerClient.from_connection_string(constr, container_name2)
        print("Uploading files to blob storage")
        blob_client = container_client.get_blob_client("json_data.json")
        blob_client.upload_blob(data)
        print("file uploaded")


if __name__ == "__main__":
    FileProcessing()
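One small caveat: upload_blob raises a ResourceExistsError if json_data.json already exists in the target container, so if this runs more than once you may want to overwrite explicitly (a hedged tweak, reusing blob_client from the code above):
blob_client.upload_blob(data, overwrite=True)  # replace the blob if it already exists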
I was following the instructions from this link. I did all the steps for uploading the file to the storage account, starting with running this command:
setx AZURE_STORAGE_CONNECTION_STRING "12334455"
Basically, I just copied the code from the Microsoft site for uploading files, but after following all the steps given there I am still facing errors.
The code I wrote is:
import os, uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__

try:
    print("Azure Blob Storage v" + __version__ + " - Python quickstart sample")

    # local_path = "C:\Users\shang\Desktop\Trial"
    # os.mkdir(local_path)
    #
    # # Create a file in the local data directory to upload and download
    # local_file_name = str(uuid.uuid4()) + ".txt"
    # upload_file_path = os.path.join(local_path, local_file_name)
    #
    # # Write text to the file
    # file = open(upload_file_path, 'w')
    # file.write("Hello, World!")
    # file.close()

    upload_file_path = r"C:\Users\shang\Desktop\Trial\Trial.txt"
    local_file_name = "Trial.txt"

    # Create a blob client using the local file name as the name for the blob
    blob_client = BlobServiceClient.get_blob_client(container="testingnlearning", blob=local_file_name)

    print("\nUploading to Azure Storage as blob:\n\t" + local_file_name)

    # Upload the created file
    with open(upload_file_path, "rb") as data:
        blob_client.upload_blob(data)

    # Quick start code goes here

except Exception as ex:
    print('Exception:')
    print(ex)
Now while running the code I am getting the error
TypeError Traceback (most recent call last)
<ipython-input-3-3a6b42061e89> in <module>
----> 1 blob_client = BlobServiceClient.get_blob_client(container="testingnlearning", blob="Trial.txt")
TypeError: get_blob_client() missing 1 required positional argument: 'self'
Now I don't know what I am doing wrong. It would be really wonderful if you could tell me how to upload text files to an Azure Storage container.
Thanks in advance.
The reason you're getting the error is that you are not creating an instance of BlobServiceClient; you're calling get_blob_client as if it were a static method here:
blob_client = BlobServiceClient.get_blob_client(container="testingnlearning", blob=local_file_name)
What you would want to do is create an instance of BlobServiceClient and then use that instance. Something like:
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client(container="testingnlearning", blob=local_file_name)
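Putting it together, a minimal sketch of the corrected upload, assuming the connection string is read from the AZURE_STORAGE_CONNECTION_STRING environment variable you set with setx:
import os
from azure.storage.blob import BlobServiceClient

connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client(container="testingnlearning", blob="Trial.txt")

with open(r"C:\Users\shang\Desktop\Trial\Trial.txt", "rb") as data:
    blob_client.upload_blob(data)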
I'm dealing with a transformation from .xlsx files to .csv. I tested locally a Python script that downloads .xlsx files from a container in blob storage, manipulates the data, saves the result as a .csv file (using pandas) and uploads it to a new container. Now I need to bring the Python script into ADF to build a pipeline that automates the task. I'm dealing with two kinds of problems:
First problem: I can't figure out how to complete the task without downloading the file to my local machine.
I found these threads/tutorials, but the "azure" v5.0.0 meta-package is deprecated:
read excel files from "input" blob storage container and export to csv in "output" container with python
Tutorial: Run Python scripts through Azure Data Factory using Azure Batch
So far my code is:
import os
import sys
import pandas as pd
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, PublicAccess
# Create the BlobServiceClient that is used to call the Blob service for the storage account
conn_str = 'XXXX;EndpointSuffix=core.windows.net'
blob_service_client = BlobServiceClient.from_connection_string(conn_str=conn_str)
container_name = "input"
blob_name = "prova/excel/AAA_prova1.xlsx"
container = ContainerClient.from_connection_string(conn_str=conn_str, container_name=container_name)
downloaded_blob = container.download_blob(blob_name)
df = pd.read_excel(downloaded_blob.content_as_bytes(), skiprows = 4)
data = df.to_csv(r'C:\mypath/AAA_prova2.csv', encoding='utf-8-sig', index=False)
full_path_to_file = r'C:\mypath/AAA_prova2.csv'
local_file_name = 'prova\csv\AAA_prova2.csv'
# upload in blob
blob_client = blob_service_client.get_blob_client(
    container=container_name, blob=local_file_name)
with open(full_path_to_file, "rb") as data:
    blob_client.upload_blob(data)
Second problem: with this method I can only deal with a specific blob name, but in the future I'll have to parametrize the script (i.e. select only blob names starting with AAA_). I can't tell whether I have to handle this in the Python script or whether I can filter the files through ADF (i.e. by adding a Filter File task before running the Python script). I can't find any tutorial/code snippet, so any help, hint or documentation would be very appreciated.
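On the second problem: the filtering can also live in the Python script itself, because ContainerClient.list_blobs accepts a name_starts_with argument. A hedged sketch, reusing conn_str from the code above and assuming the AAA_ files sit under prova/excel/:
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(conn_str=conn_str, container_name="input")
for blob in container.list_blobs(name_starts_with="prova/excel/AAA_"):
    print(blob.name)  # only the blobs whose names start with AAA_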
EDIT
I modified the code to avoid downloading to the local machine, and now it works (problem #1 solved):
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
from io import BytesIO
import pandas as pd
filename = "excel/prova.xlsx"
container_name="input"
blob_service_client = BlobServiceClient.from_connection_string("XXXX==;EndpointSuffix=core.windows.net")
container_client=blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(filename)
streamdownloader=blob_client.download_blob()
stream = BytesIO()
streamdownloader.download_to_stream(stream)
df = pd.read_excel(stream, skiprows = 5)
local_file_name_out = "csv/prova.csv"
container_name_out = "input"
blob_client = blob_service_client.get_blob_client(
    container=container_name_out, blob=local_file_name_out)
blob_client.upload_blob(df.to_csv(path_or_buf=None, encoding='utf-8-sig', index=False))
This is an Azure Functions (Python 3.8) version of the function. It waits for a blob trigger from an Excel upload, then does some processing and uses a good chunk of your code for the final step.
Note the split to trim the .xlsx off the file name.
This is what I ended up with:
source_blob = (f"https://{account_name}.blob.core.windows.net/{uploadedxlsx.name}")
file_name = uploadedxlsx.name.split("/")[2]
container_name = "container"
container_client=blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(f"Received/{file_name}")
streamdownloader=blob_client.download_blob()
stream = BytesIO()
streamdownloader.download_to_stream(stream)
df = pd.read_excel(stream)
file_name_t = file_name.split(".")[0]
local_file_name_out = f"Converted/{file_name_t}.csv"
container_name_out = "out_container"
blob_client = blob_service_client.get_blob_client(
    container=container_name_out, blob=local_file_name_out)
blob_client.upload_blob(df.to_csv(path_or_buf=None, encoding='utf-8-sig', index=False))