How to check whether a given directory or folder exists in a given S3 bucket, and if it exists, how to delete that folder from S3? - python

I want to check whether a folder or directory exists in a given S3 bucket, and if it exists, I want to delete that folder from the S3 bucket using Python code.
Example: s3://bucket124/test
Here "bucket124" is the bucket and "test" is a folder containing some files like test.txt and test1.txt.
I want to delete the folder "test" from my S3 bucket.

Here is how you can do that:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mausamrest')
obj = s3.Object('mausamrest', 'test/hello')

# Count the objects under the prefix to check whether the "folder" exists.
counter = 0
for key in bucket.objects.filter(Prefix='test/hello/'):
    counter = counter + 1

# If anything was found, delete the folder placeholder object.
if counter != 0:
    obj.delete()

print(counter)
mausamrest is the bucket and test/hello/ is the prefix you want to check for items. Note one thing: after checking, you delete test/hello instead of test/hello/ to remove that particular subfolder, which is why the key name passed to s3.Object() is test/hello.
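If the goal is to remove everything under the prefix, not just the placeholder key, a minimal sketch (reusing the same bucket and prefix names from above) can batch-delete every matching object:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mausamrest')

# Delete every object whose key starts with the prefix; once they are
# gone, the "folder" no longer appears in the S3 console.
bucket.objects.filter(Prefix='test/hello/').delete()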

Related

Read content of a file located under subfolders of S3 in Python

I'm trying to read a file's contents (not download it) from an S3 bucket. The problem is that the file is located under a multi-level folder. For instance, the full path could be s3://s3-bucket/folder-1/folder-2/my_file.json. How can I get that specific file instead of using my iterative approach, which lists all objects?
Here is the code that I want to change:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('s3-bucket')

# This snippet runs inside a function, hence the return.
for obj in bucket.objects.all():
    key = obj.key
    if key == 'folder-1/folder-2/my_file.json':
        return obj.get()['Body'].read()
Can it be done in a simpler, more direct way?
Yes - there is no need to enumerate the bucket.
Read the file directly using s3.Object, passing the bucket name as the first parameter and the object key as the second.
"Folders" don't really exist in S3 - Amazon S3 doesn't use hierarchy to organize its objects and files. For the sake of organizational simplicity, the Amazon S3 console shows "folders" as a means of grouping objects, but they are ultimately baked into your object key.
This should work:
import boto3
s3 = boto3.resource('s3')
obj = s3.Object("s3-bucket", "folder-1/folder-2/my_file.json")
body = obj.get()['Body'].read()
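Since the object in this example is a .json file, a natural follow-up (assuming the body is UTF-8 encoded JSON) is to parse the bytes directly:
import json

data = json.loads(body)  # body is the bytes returned by obj.get()['Body'].read()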

How can I write the output inside a folder of an S3 bucket using a Lambda function

I am writing Python in AWS Lambda, and I need to write the output that shows up in the output logs to a folder of an S3 bucket. The code is:
client = boto3.client('s3')
output_data = b'Checksums are matching for file'
client.put_object(Body=output_data, Bucket='bucket', Key='key')
Now I want to write Checksums are matching for file and then append the file names. How can I add that?
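The question is left open above, but since the "folder" is just a key prefix in S3, one hedged sketch (bucket name, folder prefix, and file list below are all placeholders) is to format the file names into the body and bake the folder into the key:
import boto3

client = boto3.client('s3')

# Hypothetical file names collected by the Lambda function.
matched_files = ['file1.csv', 'file2.csv']
output_data = 'Checksums are matching for file(s): ' + ', '.join(matched_files)

client.put_object(
    Body=output_data.encode('utf-8'),
    Bucket='my-bucket',               # placeholder bucket name
    Key='output-folder/result.txt'    # the folder lives in the object key
)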

Adding an s3 bucket folder's address to the output bucket + python

How do I link my s3 bucket folder to the output bucket in Python?
I tried several permutations and combinations, but none of them worked. All I need is to link my folder address to the output bucket in Python.
I got an error when I tried the combinations below:
output Bucket = "s3-bucket.folder-name"
output Bucket = "s3-bucket/folder-name/"
output Bucket = "s3-bucket\folder-name\"
None of the above worked; each throws an error like:
Parameter validation failed:
Invalid bucket name "s3-bucket/folder-name/": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"
Is there an alternate way to put the folder address into the Python script?
Please help!
In AWS, the "folder", or the object's file path, is all part of the object key.
So when accessing a bucket, you specify the bucket name, which is strictly the bucket name with nothing else; the object key then carries the file path.
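A minimal sketch of that split, with placeholder names, moving the folder path out of the bucket parameter and into the key:
import boto3

client = boto3.client('s3')
client.put_object(
    Body=b'some output',
    Bucket='s3-bucket',            # bucket name only, no slashes
    Key='folder-name/output.txt'   # the "folder" is part of the object key
)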

Python: recursive glob in s3

I am trying to get a list of parquet file paths from s3 that are inside subdirectories and subdirectories of subdirectories (and so on).
If it was my local file system I would do this:
import glob
glob.glob('C:/Users/user/info/**/*.parquet', recursive=True)
I have tried the glob method of s3fs; however, it doesn't have a recursive kwarg.
Is there a function I can use, or do I need to implement it myself?
You can use s3fs with glob:
import s3fs
s3 = s3fs.S3FileSystem(anon=False)
s3.glob('your/s3/path/here/*.parquet')
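If the files sit at varying depths, the fsspec-style glob that s3fs uses also understands the ** wildcard, so a recursive variant of the same call (path layout assumed) would be:
s3.glob('your/s3/path/here/**/*.parquet')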
I also wanted to download the latest file from an s3 bucket, but located in a specific folder. Initially I tried glob but couldn't find a solution, so I built the following function. You can modify it to work with subfolders.
This function returns a dictionary of all file names and timestamps as key-value pairs
(key: file_name, value: timestamp).
Just pass the bucket name and the prefix (which is the folder name).
import boto3

def get_file_names(bucket_name, prefix):
    """
    Return a dict of {file_name: timestamp} for an S3 bucket folder.
    :param bucket_name: Name of the S3 bucket.
    :param prefix: Only keep keys that start with this prefix (folder name).
    """
    s3_client = boto3.client('s3')
    objs = s3_client.list_objects_v2(Bucket=bucket_name)['Contents']
    shortlisted_files = dict()
    for obj in objs:
        key = obj['Key']
        timestamp = obj['LastModified']
        # If the key starts with the folder name, keep that key.
        if key.startswith(prefix):
            # Add a new key-value pair.
            shortlisted_files.update({key: timestamp})
    return shortlisted_files

shortlisted_files = get_file_names(bucket_name='use_your_bucket_name', prefix='folder_name/')
S3 doesn't actually have subdirectories, per se.
boto3's S3.Client.list_objects() supports a Prefix argument, which should get you all the objects in a given "directory" in a bucket, no matter how "deep" they appear to be.
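A small sketch of that approach with placeholder bucket and prefix names, using the paginated variant so results beyond the first 1000 keys are still returned:
import boto3

client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')

# Collect every .parquet key under the prefix, however deep it is.
parquet_keys = []
for page in paginator.paginate(Bucket='my-bucket', Prefix='info/'):
    for obj in page.get('Contents', []):
        if obj['Key'].endswith('.parquet'):
            parquet_keys.append(obj['Key'])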

How do I delete a folder (blob) inside an Azure container using the delete_blob method of BlockBlobService?

delete_blob() seems to delete only the files inside the container, including those in folders and subfolders. But I'm seeing the error below in Python while trying to delete a folder from the container.
Client-Request-ID=7950669c-2c4a-11e8-88e7-00155dbf7128 Retry policy did not allow for a retry: Server-Timestamp=Tue, 20 Mar 2018 14:25:00 GMT, Server-Request-ID=54d1a5d6-b01e-007b-5e57-c08528000000, HTTP status code=404, Exception=The specified blob does not exist. ErrorCode: BlobNotFound
azure.common.AzureMissingResourceHttpError: The specified blob does not exist. ErrorCode: BlobNotFound
RequestId:54d1a5d6-b01e-007b-5e57-c08528000000
Time:2018-03-20T14:25:01.2130063Z
Could anyone please help here?
In Azure Blob Storage, a folder as such doesn't exist; it is just a prefix in a blob's name. For example, if you see a folder named images and it contains a blob called myfile.png, then the blob's name is essentially images/myfile.png. Because folders don't really exist (they are virtual), you can't delete a folder directly.
What you need to do is delete all the blobs inside that folder individually (in other words, delete the blobs whose names begin with that virtual folder name/path). Once you have deleted all the blobs, the folder automatically goes away.
To accomplish this, first fetch all blobs whose names start with the virtual folder path: use the list_blobs method and pass the virtual folder path in the prefix parameter. This gives you the list of blobs starting with that prefix; then delete those blobs one by one.
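A minimal sketch of that process, using the legacy BlockBlobService that appears elsewhere on this page (account credentials, container, and folder names are placeholders):
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='youraccount', account_key='yourkey')

# List every blob whose name starts with the virtual folder path,
# then delete them one by one; the folder disappears with its blobs.
for blob in block_blob_service.list_blobs('containername', prefix='images/'):
    block_blob_service.delete_blob('containername', blob.name)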
There are two things to understand about the process: you can delete specific files, folders, images, etc. (blobs) using delete_blob, but if you want to delete a container, you have to use delete_container, which deletes all the blobs within it. Here's a sample I created that deletes blobs inside a path/virtual folder:
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='yraccountname', account_key='accountkey')

print("Retrieving blobs in specified container...")
blob_list = []
container = "containername"

def list_blobs(container):
    try:
        global blob_list
        content = block_blob_service.list_blobs(container)
        print("******Blobs currently in the container:**********")
        for blob in content:
            blob_list.append(blob.name)
            print(blob.name)
    except Exception:
        print("The specified container does not exist; please check the container name or whether it exists.")

list_blobs(container)
print("The list() is:")
print(blob_list)
print("Delete this blob: ", blob_list[1])

# DELETE A SPECIFIC BLOB FROM THE CONTAINER
block_blob_service.delete_blob(container, blob_list[1], snapshot=None)
list_blobs(container)
Please refer to the code in my repo, with an explanation in the Readme section, as well as new storage scripts: https://github.com/adamsmith0016/Azure-storage
For others searching for a solution in Python, this worked for me.
First make a variable that stores all the files in the folder that you want to remove.
Then, for every file in that folder, remove the file by stating the name of the container and then the actual folder.name.
By removing all the files in a folder, the folder itself is deleted in Azure.
def delete_folder(self, containername, foldername):
    folders = [blob for blob in blob_service.block_blob_service.list_blobs(containername) if blob.name.startswith(foldername)]
    if len(folders) > 0:
        for folder in folders:
            blob_service.block_blob_service.delete_blob(containername, folder.name)
        print("deleted folder", foldername)
Use list_blobs(name_starts_with=folder_name) and delete_blob().
Complete code:
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(conn_str=CONN_STR)
container_client = blob_service_client.get_container_client(AZURE_BLOBSTORE_CONTAINER)

for blob in container_client.list_blobs(name_starts_with=FOLDER_NAME):
    container_client.delete_blob(blob.name)
You cannot delete a non-empty folder in Azure blobs, but you can achieve it if you delete the files inside the sub-folders first. The workaround below starts deleting from the files up to the parent folder.
from azure.storage.blob import BlockBlobService

blob_client = BlockBlobService(account_name='', account_key='')
containername = 'XXX'
foldername = 'XXX'

def delete_folder(containername, foldername):
    folders = [blob.name for blob in blob_client.list_blobs(containername, prefix=foldername)]
    # Sort by length, longest first, so the deepest blobs are deleted first.
    folders.sort(reverse=True, key=len)
    if len(folders) > 0:
        for folder in folders:
            blob_client.delete_blob(containername, folder)
            print("deleted folder", folder)
