How to delete a GCS folder from Python?

Using https://github.com/googleapis/google-cloud-python/tree/master/storage or https://github.com/GoogleCloudPlatform/appengine-gcs-client, I can delete a file by specifying its name, but there doesn't seem to be a way to delete folders.
Is there any way to delete folders?
I found this (Google Cloud Storage: How to Delete a folder (recursively) in Python) on Stack Overflow, but that answer simply deletes all the files in the folder without deleting the folder itself.

The code mentioned in the answer you referred to works; the prefix should look like this:
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('my-bucket')
blobs = bucket.list_blobs(prefix='my-folder/')
for blob in blobs:
    blob.delete()
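Note that GCS has a flat namespace, so a folder only appears to exist while objects share its prefix; once everything under my-folder/ is gone, the folder disappears from listings. If the folder was created through the Cloud Console it is backed by a zero-byte placeholder object named my-folder/, which also matches the prefix and is removed by the same loop. A minimal single-call variant using delete_blobs, assuming the same bucket and prefix as above:

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('my-bucket')

# Deleting every object that shares the prefix (including any zero-byte
# 'my-folder/' placeholder) removes the "folder" itself from listings.
bucket.delete_blobs(list(bucket.list_blobs(prefix='my-folder/')))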

from google.cloud import storage

def delete_storage_folder(bucket_name, folder):
    """
    This function deletes every blob under a folder (prefix) in GCS.
    :param bucket_name: The bucket containing the folder
    :param folder: Folder name (prefix) to be deleted
    :return: returns nothing
    """
    cloud_storage_client = storage.Client()
    bucket = cloud_storage_client.bucket(bucket_name)
    try:
        bucket.delete_blobs(blobs=list(bucket.list_blobs(prefix=folder)))
    except Exception as e:
        print(str(e))

Related

python delete files from google cloud storage that starts with

Using Python I am able to delete files from a bucket using prefixes too, but in the Python client a prefix effectively means a directory path.
I want to delete files from a GCS bucket whose names start with example.
For example:
example-2022-12-07
example-2022-12-08
I followed this (Delete Files from Google Cloud Storage) but did not get the answer.
I am trying this, but it is not working:
blobs = bucket.list_blobs()
fileList = [file.name for file in blobs if 'example' in file.name]
print(fileList)
for file in fileList:
    blob = blobs.blob(file)
    blob.delete()
    print(f"Blob {blob_name} deleted.")
You can try the following code to delete files from Google Cloud Storage using the blob.delete method, as suggested in the documentation.
Below is an example of what you are looking for:
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket(bucket_name)
# List all objects in the directory by passing the prefix to bucket.list_blobs,
# e.g. prefix='example' for files whose names start with "example"
blobs = bucket.list_blobs(prefix='example')
for blob in blobs:
    blob.delete()
    print(f"Blob {blob.name} deleted.")
You can also check thread1 and thread2.

How to retrieve the folder name from Google Cloud Storage bucket with sub-folder

We have a GCS bucket with a sub-folder at the URL https://storage.googleapis.com/our-bucket/path-to-subfolder. This sub-folder contains files.
from google.cloud import storage

def list_blobs(bucket_name):
    """Lists all the blobs in the bucket."""
    storage_client = storage.Client()
    # Note: Client.list_blobs requires at least package version 1.17.0.
    blobs = storage_client.list_blobs(bucket_name)
    for blob in blobs:
        print(blob.name)
Using this function I can list every object path in the bucket:
'folder1/data.csv',
'folder1/data2.csv',
'folder1/data4.csv',
'folder2/data3.csv',
'folder2/data5.csv',
'folder3/data.csv',
Instead of that, is it possible to retrieve just the folder names?
Output :
folder1
folder2
folder3
Your question description doesn't quite match the title of the Stack Overflow question. Considering the described issue, "is it possible to retrieve the folder name", below is a solution you can try:
from google.cloud import storage

def list_blobs_with_prefix(bucket_name, prefix, delimiter=None):
    storage_client = storage.Client()
    blobs = storage_client.list_blobs(bucket_name, prefix=prefix, delimiter=delimiter)
    return blobs
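To get the folder names themselves (folder1, folder2, folder3 in the output above), one approach that should work is to pass delimiter='/' and read the iterator's prefixes attribute after consuming it; a minimal sketch, assuming the bucket is named my_bucket:

from google.cloud import storage

storage_client = storage.Client()
# With a delimiter, object names that continue past '/' are grouped
# into "prefixes" instead of being returned as individual blobs.
blobs = storage_client.list_blobs("my_bucket", prefix="", delimiter="/")
for blob in blobs:
    pass  # the iterator must be consumed before prefixes is populated
for folder in blobs.prefixes:
    print(folder.rstrip("/"))  # e.g. 'folder1', 'folder2', 'folder3'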
For details you can check the documentation.

FileNotFoundError when trying to access a file in google cloud that exists inside a bucket storage

I get a FileNotFoundError when trying to read/access a file or a folder that exists in a bucket in Google Cloud by referencing gs://BUCKET_NAME/FolderName/.
I am using Python 3 as the kernel in a Jupyter notebook. I have a cluster configured in Google Cloud linked to a bucket. Whenever I try to read/upload a file I get the file-not-found error:
def get_files(bucketName):
    files = [f for f in listdir(localFolder) if
             isfile(join(localFolder, f))]
    for file in files:
        print("file path:", file)

get_files("agriculture-bucket-gl")
I should be able to access the folder contents or to reference any file that exists inside any folder in the bucket.
Error Message:
FileNotFoundError: [Errno 2] No such file or directory: 'gs://agriculture-bucket-gl/Data sets/'
You need to access the bucket using the storage library to get the file and then read its content.
You may find this code template helpful.
from google.cloud import storage
# Instantiates a client
client = storage.Client()
bucket_name = 'your_bucket_name'
bucket = client.get_bucket(bucket_name)
blob = bucket.get_blob('route/to/file.txt')
downloaded_blob = blob.download_as_string()
print(downloaded_blob)
To add to the previous answers, the path in the error message FileNotFoundError: [Errno 2] No such file or directory: 'gs://agriculture-bucket-gl/Data sets/' also contains some issues. I'd try fixing the following:
The folder name "Data sets" has a space. I'd try a name without the space.
There is a / sign at the end of the path. The path should end without a slash.
If you want to access it via the storage library:
from google.cloud import storage
bucket_name = 'your_bucket_name'
blob_path = 'storage/path/fileThatYouWantToAccess'
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob(blob_path)
#this is optional if you want to download it to tmp folder
blob.download_to_filename('/tmp/fileThatYouWantToAccess')
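If the goal, as in the original get_files function, is to list the contents of a folder that lives in the bucket rather than in a local directory, a hedged sketch using list_blobs with a prefix (bucket and folder names taken from the question) might look like this:

from google.cloud import storage

def get_files(bucket_name, folder):
    # List objects under the given folder prefix instead of using
    # local-filesystem calls like listdir/isfile on a gs:// path.
    storage_client = storage.Client()
    for blob in storage_client.list_blobs(bucket_name, prefix=folder):
        print("file path:", blob.name)

get_files("agriculture-bucket-gl", "Data sets/")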

Create Folders inside google cloud storage bucket using python

I am trying to create a new bucket with 2 empty folders within it on Google Cloud storage using python client library.
I referred to the python client library API for GCS (https://google-cloud-python.readthedocs.io/en/latest/storage/client.html) and I found a create_bucket() method, but I would also like to create 2 folders - 'processed' and 'unprocessed' within it, but not able to find a method to create folders. Any help would be appreciated.
Thanks
GCS has a flat namespace, i.e., the concept of a 'folder' is not built into the service but rather an abstraction implemented by various clients. For example, both the Cloud Storage web UI (console.cloud.google.com/storage/browser) and gsutil implement the folder abstraction using an object name that ends with "/"
Thus, you could create folders by creating objects like your-bucket/abc/def/
but that would only be a folder to clients that know about/support that naming convention.
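If you do want 'processed' and 'unprocessed' to show up as empty folders in the Cloud Console right after creating the bucket, one approach that should work is to upload zero-byte objects whose names end with "/". A minimal sketch, assuming a bucket named your-bucket:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('your-bucket')

# Zero-byte objects ending in "/" are what folder-aware clients
# (the Cloud Console, gsutil) display as empty folders.
for folder in ('processed/', 'unprocessed/'):
    bucket.blob(folder).upload_from_string('')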
def copyFilesInFolder(self, file_name, src_blob_name, destination_blob_name):
    """Copies a blob from one folder to another within the same bucket."""
    # file_name = "your-object-name"
    # src_blob_name = "source-folder-name"
    # destination_blob_name = "destination-folder-name"
    srcBlob = src_blob_name + '/' + file_name
    destBlob = destination_blob_name + '/' + file_name
    source_blob = self.bucket.blob(srcBlob)
    blob_copy = self.bucket.copy_blob(source_blob, self.bucket, destBlob)
    print(blob_copy)
    print(
        "File {} in folder {} copied to blob {} in folder {}.".format(
            file_name,
            src_blob_name,
            file_name,
            destination_blob_name,
        )
    )
    return True
In GCS there is no direct folder-creation concept, so we can simply save a new file under the new folder path; even if the destination folder doesn't exist, it will effectively be created.
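For example, a hedged sketch of that idea, with hypothetical bucket and object names:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-new-bucket')

# Writing an object under 'processed/' makes the folder appear in listings,
# even though no folder object was created beforehand.
bucket.blob('processed/first-file.txt').upload_from_string('hello')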

How to move files in Google Cloud Storage from one bucket to another bucket by Python

Is there any API function that allows us to move files in Google Cloud Storage from one bucket to another bucket?
The scenario is that we want Python to move files that have been read in bucket A to bucket B. I know that gsutil can do that, but I am not sure whether Python supports it.
Thanks.
Here's a function I use when moving blobs between directories within the same bucket or to a different bucket.
from google.cloud import storage
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_your_creds.json"

def mv_blob(bucket_name, blob_name, new_bucket_name, new_blob_name):
    """
    Function for moving files between directories or buckets. It will use GCP's copy
    function, then delete the blob from the old location.

    inputs
    -----
    bucket_name: name of bucket
    blob_name: str, name of file
        ex. 'data/some_location/file_name'
    new_bucket_name: name of bucket (can be same as original if we're just moving around directories)
    new_blob_name: str, name of file in new directory in target bucket
        ex. 'data/destination/file_name'
    """
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(bucket_name)
    source_blob = source_bucket.blob(blob_name)
    destination_bucket = storage_client.get_bucket(new_bucket_name)

    # copy to new destination
    new_blob = source_bucket.copy_blob(
        source_blob, destination_bucket, new_blob_name)
    # delete in old destination
    source_blob.delete()

    print(f'File moved from {source_blob} to {new_blob_name}')
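A hypothetical call, reusing the example paths from the docstring (bucket names here are placeholders):

# Move a file from one bucket/directory to another.
mv_blob(
    bucket_name="source-bucket",
    blob_name="data/some_location/file_name",
    new_bucket_name="destination-bucket",
    new_blob_name="data/destination/file_name",
)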
Using the google-api-python-client, there is an example on the storage.objects.copy page. After you copy, you can delete the source with storage.objects.delete.
# `client` here is a googleapiclient discovery client, e.g. discovery.build('storage', 'v1')
destination_object_resource = {}
req = client.objects().copy(
    sourceBucket=bucket1,
    sourceObject=old_object,
    destinationBucket=bucket2,
    destinationObject=new_object,
    body=destination_object_resource)
resp = req.execute()
print(json.dumps(resp, indent=2))

client.objects().delete(
    bucket=bucket1,
    object=old_object).execute()
You can use the GCS Client Library functions documented at [1] to read from one bucket and write to the other, and then delete the source file.
You can even use the GCS REST API documented at [2].
Link:
[1] - https://developers.google.com/appengine/docs/python/googlecloudstorageclient/functions
[2] - https://developers.google.com/storage/docs/concepts-techniques#overview
from google.cloud import storage

def GCP_BUCKET_A_TO_B():
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket("Bucket_A_Name")
    destination_bucket = storage_client.get_bucket("Bucket_B_Name")
    filenames = [blob.name for blob in source_bucket.list_blobs(prefix="")]
    for name in filenames:
        source_blob = source_bucket.blob(name)
        new_blob = source_bucket.copy_blob(
            source_blob, destination_bucket, name)
I just wanted to point out that there's another possible approach, which is using gsutil through the subprocess module.
The advantages of using gsutil like that:
You don't have to deal with individual blobs
gsutil's implementation of move, and especially rsync, will probably be much better and more resilient than what we'd write ourselves.
The disadvantages:
You can't deal with individual blobs easily
It's hacky and generally a library is preferable to executing shell commands
Example:
import subprocess

def move(source_uri: str,
         destination_uri: str) -> None:
    """
    Move file from source_uri to destination_uri.
    :param source_uri: gs:// - like uri of the source file/directory
    :param destination_uri: gs:// - like uri of the destination file/directory
    :return: None
    """
    # Pass the command as a list so no shell is required.
    cmd = ["gsutil", "-m", "mv", source_uri, destination_uri]
    subprocess.run(cmd, check=True)
