How do I delete a folder (blob) inside an Azure container using the delete_blob method of BlockBlobService? - Python

delete_blob() seems to delete only individual files inside the container, including those inside folders and subfolders. But I'm seeing the error below in Python while trying to delete a folder from the container.
Client-Request-ID=7950669c-2c4a-11e8-88e7-00155dbf7128 Retry policy did not allow for a retry: Server-Timestamp=Tue, 20 Mar 2018 14:25:00 GMT, Server-Request-ID=54d1a5d6-b01e-007b-5e57-c08528000000, HTTP status code=404, Exception=The specified blob does not exist. ErrorCode: BlobNotFound
azure.common.AzureMissingResourceHttpError: The specified blob does not exist. ErrorCode: BlobNotFound
RequestId:54d1a5d6-b01e-007b-5e57-c08528000000
Time:2018-03-20T14:25:01.2130063Z
Could anyone please help here?

In Azure Blob Storage, a folder as such doesn't exist. It is just a prefix for a blob's name. For example, if you see a folder named images and it contains a blob called myfile.png, then essentially the blob's name is images/myfile.png. Because the folders don't really exist (they are virtual), you can't delete the folder directly.
What you need to do is delete all blobs in that folder individually (in other words, delete the blobs whose names begin with that virtual folder name/path). Once you have deleted all the blobs, the folder automatically goes away.
In order to accomplish this, first fetch all blobs whose names start with the virtual folder path. For that, use the list_blobs method and specify the virtual folder path in the prefix parameter. This gives you a list of blobs starting with that prefix. Once you have that list, delete the blobs one by one.
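A minimal sketch of that approach with the legacy BlockBlobService SDK; the account name, key, container and virtual folder below are placeholders:
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='myaccount', account_key='mykey')

# List every blob whose name starts with the virtual folder path, then delete each one.
for blob in block_blob_service.list_blobs('mycontainer', prefix='images/'):
    block_blob_service.delete_blob('mycontainer', blob.name)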

There are two things to understand here: you can delete specific files, folders, images, etc. (blobs) using delete_blob, but if you want to delete a container, you have to use delete_container, which deletes all blobs within it. Here's a sample I created that deletes blobs inside a path/virtual folder:
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='youraccountname', account_key='accountkey')

print("Retrieving blobs in specified container...")
blob_list = []
container = "containername"

def list_blobs(container):
    try:
        global blob_list
        content = block_blob_service.list_blobs(container)
        print("******Blobs currently in the container:**********")
        for blob in content:
            blob_list.append(blob.name)
            print(blob.name)
    except Exception:
        print("The specified container does not exist, please check the container name or whether it exists.")

list_blobs(container)
print("The list() is:")
print(blob_list)
print("Delete this blob: ", blob_list[1])

# DELETE A SPECIFIC BLOB FROM THE CONTAINER
block_blob_service.delete_blob(container, blob_list[1], snapshot=None)
list_blobs(container)
Please refer to the code in my repo, with an explanation in the Readme section, as well as new storage scripts: https://github.com/adamsmith0016/Azure-storage

For others searching for the solution in Python, this worked for me.
First make a variable that stores all the blobs in the folder that you want to remove.
Then, for every blob in that folder, delete it by passing the name of the container and the blob's name.
By removing all the files in a folder, the folder itself is deleted in Azure.
def delete_folder(self, containername, foldername):
    folders = [blob for blob in blob_service.block_blob_service.list_blobs(containername) if blob.name.startswith(foldername)]
    if len(folders) > 0:
        for folder in folders:
            blob_service.block_blob_service.delete_blob(containername, folder.name)
            print("deleted folder", folder.name)

Use list_blobs(name_starts_with=folder_name) and delete_blob()
Complete code:
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(conn_str=CONN_STR)
blob_client = blob_service_client.get_container_client(AZURE_BLOBSTORE_CONTAINER)

for blob in blob_client.list_blobs(name_starts_with=FOLDER_NAME):
    blob_client.delete_blob(blob.name)
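If the virtual folder contains many blobs, the v12 ContainerClient also exposes a batch delete_blobs() method; a hedged sketch reusing the same container client from above:
blob_names = [blob.name for blob in blob_client.list_blobs(name_starts_with=FOLDER_NAME)]
if blob_names:
    # delete_blobs() sends the deletions as a single batch request instead of one call per blob
    blob_client.delete_blobs(*blob_names)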

You cannot delete a non-empty folder in Azure Blob Storage directly, but you can achieve it by deleting the files inside the sub-folders first. The workaround below deletes from the deepest files up to the parent folder.
from azure.storage.blob import BlockBlobService

blob_client = BlockBlobService(account_name='', account_key='')
containername = 'XXX'
foldername = 'XXX'

def delete_folder(containername, foldername):
    # Sort by length so the deepest blobs are deleted first
    folders = [blob.name for blob in blob_client.list_blobs(containername, prefix=foldername)]
    folders.sort(reverse=True, key=len)
    if len(folders) > 0:
        for folder in folders:
            blob_client.delete_blob(containername, folder)
            print("deleted folder", folder)

delete_folder(containername, foldername)

Related

Moving files from one bucket to another in GCS

I have written code to move files from one bucket to another in GCS using Python. The bucket has multiple subfolders and I am trying to move only the Day folder to a different bucket.
Source Path: /Bucketname/projectname/XXX/Day
Target Path: /Bucketname/Archive/Day
Is there a way to directly move/copy the Day folder without moving each file inside it one by one? I'm trying to optimize my code, which takes a long time if there are multiple Day folders. Sample code below.
from google.cloud import storage
from google.cloud import bigquery
import glob
import pandas as pd

def Archive_JSON(bucket_name, new_bucket_name, source_prefix_arch, staging_prefix_arch, **kwargs):
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    today_execution_date = kwargs['ds_nodash']
    source_prefix_new = source_prefix_arch + today_execution_date + '/'
    blobs = bucket.list_blobs(prefix=source_prefix_new)
    destination_bucket = storage_client.get_bucket(new_bucket_name)
    for blob in blobs:
        destination_bucket.rename_blob(blob, new_name=blob.name.replace(source_prefix_arch, staging_prefix_arch))
You can't move all the files of a folder to another bucket in one operation, because folders don't exist in Cloud Storage. All the objects live at the bucket level, and the object name is the full path of the object.
By convention, and for human readability, the slash / is used as a folder separator, but it's an illusion!
So you have no option other than moving all the files with the same prefix (the "folder path") and iterating over all of them; one way to speed that up is sketched below.
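Since each object has to be copied and deleted individually, the main optimization left is to run the per-blob moves in parallel. A rough sketch under that assumption (bucket names and prefixes are placeholders):
from concurrent.futures import ThreadPoolExecutor
from google.cloud import storage

def move_prefix(src_bucket_name, dst_bucket_name, src_prefix, dst_prefix):
    client = storage.Client()
    src = client.get_bucket(src_bucket_name)
    dst = client.get_bucket(dst_bucket_name)

    def move_one(blob):
        # Copy the blob to the destination bucket under the new prefix, then delete the original.
        src.copy_blob(blob, dst, blob.name.replace(src_prefix, dst_prefix, 1))
        blob.delete()

    # Run several moves concurrently instead of one blob at a time.
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(move_one, src.list_blobs(prefix=src_prefix)))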

How to find the sub folder id in Google Drive using pydrive in Python?

The directory structure on Google Drive is as follows:
Inside mydrive/BTP/BTP-4
I need to get the folder ID for BTP-4 so that I can transfer a specific file from the folder. How do I do it?
fileList = GoogleDrive(self.driveConn).ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file in fileList:
    if (file['title'] == "BTP-4"):
        fileID = file['id']
        print(remoteFile, fileID)
        return fileID
Will I be able to give a path like /MyDrive/BTP/BTP-4 and a filename like "test.csv" and then directly download the file?
Answer:
Unfortunately, this is not possible.
More Information:
Google Drive supports creating multiple files or folders with the same name in the same location.
As a result of this, in some cases providing a file path isn't enough to identify a file or folder uniquely - a path like mydrive/Parent folder/Child folder/Child doc can point to multiple different files, and mydrive/Parent folder/Child folder/Child folder to multiple different folders.
You either have to reference the folder directly by its ID, or, to get a folder/file's ID from a path, search for children recursively through the folders like you are already doing. A sketch of that recursive search follows.
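A minimal sketch of that path walk with pydrive, assuming drive is an authenticated GoogleDrive instance and that you accept the first match when names are duplicated:
def find_folder_id(drive, path):
    # Walk a path like "BTP/BTP-4" one segment at a time, starting at the Drive root.
    parent_id = 'root'
    for name in path.split('/'):
        query = ("'{0}' in parents and title = '{1}' and "
                 "mimeType = 'application/vnd.google-apps.folder' and "
                 "trashed = false").format(parent_id, name)
        matches = drive.ListFile({'q': query}).GetList()
        if not matches:
            return None
        parent_id = matches[0]['id']  # first match wins if names are duplicated
    return parent_id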

How to recursively upload folder to Azure blob storage with Python

I can upload a single file to Azure Blob Storage with Python. But for a folder with multiple subfolders containing data, is there a way to upload the whole folder with the same directory structure to Azure?
Say I have
FOLDERA
------SUBFOLDERa
----------filea.txt
----------fileb.txt
------SUBFOLDERb
------SUBFOLDERc
I want to put this FOLDERA as above structure to Azure.
Any hints?
@Krumelur is almost right, but here I want to give a working code example, as well as explain why some folders cannot be uploaded to Azure Blob Storage.
1. Code example:
from azure.storage.blob import BlockBlobService, PublicAccess
import os

def run_sample():
    account_name = "your_account_name"
    account_key = "your_account_key"
    block_blob_service = BlockBlobService(account_name, account_key)
    container_name = 'test1'
    path_remove = "F:\\"
    local_path = "F:\\folderA"
    for r, d, f in os.walk(local_path):
        if f:
            for file in f:
                file_path_on_azure = os.path.join(r, file).replace(path_remove, "")
                file_path_on_local = os.path.join(r, file)
                block_blob_service.create_blob_from_path(container_name, file_path_on_azure, file_path_on_local)

# Main method.
if __name__ == '__main__':
    run_sample()
2. You should remember that an empty folder cannot be created / uploaded to Azure Blob Storage, since there is no real "folder" in Azure Blob Storage. The folder or directory is just part of the blob name. So without a real blob file like test.txt inside a folder, there is no way to create/upload an empty folder. In your folder structure, the empty folders SUBFOLDERb and SUBFOLDERc therefore cannot be uploaded to Azure Blob Storage.
In my test, all the non-empty folders were uploaded to Blob Storage in Azure.
There is nothing built in, but you can easily write that functionality in your code (see os.walk).
Another option is to use the subprocess module to call into the azcopy command-line tool, for example:
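A rough sketch of the azcopy route, assuming azcopy is on the PATH and that container_sas_url holds your container URL with a valid SAS token:
import subprocess

container_sas_url = "https://youraccount.blob.core.windows.net/test1?<SAS-token>"  # placeholder

# azcopy walks the local folder itself and preserves the directory structure in the container.
subprocess.run(
    ["azcopy", "copy", r"F:\folderA", container_sas_url, "--recursive=true"],
    check=True,
)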

How to get a List of Files in IBM COS Bucket using Watson Studio

I have a working Python script for consolidating multiple xlsx files that I want to move to a Watson Studio project. My current code uses a path variable which is passed to glob...
path = '/Users/Me/My_Path/*.xlsx'
files = glob.glob(path)
Since credentials in Watson Studio are specific to individual files, how do I get a list of all files in my IBM COS storage bucket? I'm also wondering how to create folders to separate the files in my storage bucket?
Watson Studio Cloud provides a helper library named project-lib for working with objects in your Cloud Object Storage instance. Take a look at this documentation for using the package in Python: https://dataplatform.cloud.ibm.com/docs/content/analyze-data/project-lib-python.html
For your specific question, get_files() should do what you need. This will return a list of all the files in your bucket, then you can do pattern matching to only keep what you need. Based on this filtered list you can then iterate and use get_file(file_name) for each file_name in your list.
To create a "folder" in your bucket, you need to follow a naming convention for files to create a "pseudo folder". For example, if you want to create a "data" folder of assets, you should prefix file names for objects belonging to this folder with data/.
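As a rough sketch of that flow, assuming project is an already-initialized project-lib Project object (Watson Studio's "insert project token" notebook cell creates one for you) and that get_file returns the file contents as a byte buffer:
import pandas as pd

# List every file in the project's bucket, then keep only the .xlsx files.
xlsx_names = [f['name'] for f in project.get_files() if f['name'].endswith('.xlsx')]

# Read each workbook into a DataFrame and concatenate them.
frames = [pd.read_excel(project.get_file(name)) for name in xlsx_names]
combined = pd.concat(frames, ignore_index=True)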
The credentials in IBM Cloud Object Storage (COS) are at the COS instance level, not at the individual file level. Each COS instance can have any number of buckets, with each bucket containing files.
You can get the credentials for the COS instance from Bluemix console.
https://console.bluemix.net/docs/services/cloud-object-storage/iam/service-credentials.html#service-credentials
You can use boto3 python package to access the files.
https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
import boto3

s3c = boto3.client('s3', endpoint_url='XXXXXXXXX', aws_access_key_id='XXXXXXXXXXX', aws_secret_access_key='XXXXXXXXXX')

s3c.list_objects(Bucket=bucket_name, Prefix=file_path)
s3c.download_file(Filename=filename, Bucket=bucket, Key=objectname)
s3c.upload_file(Filename=filename, Bucket=bucket, Key=objectname)
There's probably a more Pythonic way to write this, but here is the code I wrote using project-lib per the answer provided by @Greg Filla:
files = []  # List to hold data file names

# Get list of all file names in the storage bucket
all_files = project.get_files()  # returns a list of dictionaries

# Create list of file names to load based on prefix
for f in all_files:
    if f['name'][:3] == DataFile_Prefix:
        files.append(f['name'])

print("There are " + str(len(files)) + " data files in the storage bucket.")

How to check give directory or folder exist in given s3 bucket and if exist how to delete the folder from s3?

I want to check whether a folder or directory exists in a given S3 bucket, and if it exists, I want to delete it from the S3 bucket using Python code.
For example: s3://bucket124/test
Here "bucket124" is the bucket and "test" is a folder containing some files like test.txt and test1.txt.
I want to delete folder "test" from my s3 bucket.
Here is how you can do that:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mausamrest')
obj = s3.Object('mausamrest', 'test/hello')

# Count the objects under the prefix to check whether the "folder" exists
counter = 0
for key in bucket.objects.filter(Prefix='test/hello/'):
    counter = counter + 1

if counter != 0:
    obj.delete()

print(counter)
mausamrest is the bucket and test/hello/ is the directory you want to check for items. But take care of one thing: after checking, you have to delete test/hello instead of test/hello/ to delete that particular sub-folder, which is why the key name passed to s3.Object is test/hello.
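Note that this only removes the test/hello object itself; if you want to delete every object under the prefix (the usual meaning of "deleting the folder"), boto3's resource API can do it in one call. A sketch with the same placeholder bucket and prefix:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mausamrest')

# objects.filter(...) returns a collection; .delete() issues batched DeleteObjects requests for it.
bucket.objects.filter(Prefix='test/').delete()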
