Create folders inside a Google Cloud Storage bucket using Python

I am trying to create a new bucket with 2 empty folders within it on Google Cloud Storage using the Python client library.
I referred to the Python client library API for GCS (https://google-cloud-python.readthedocs.io/en/latest/storage/client.html) and found a create_bucket() method, but I would also like to create 2 folders - 'processed' and 'unprocessed' - within the bucket, and I am not able to find a method to create folders. Any help would be appreciated.
Thanks

GCS has a flat namespace, i.e., the concept of a 'folder' is not built into the service but rather an abstraction implemented by various clients. For example, both the Cloud Storage web UI (console.cloud.google.com/storage/browser) and gsutil implement the folder abstraction using an object whose name ends with "/".
Thus, you could create folders by creating objects like your-bucket/abc/def/, but that would only be a folder to clients that know about/support that naming convention.
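As a minimal sketch of that convention using the google-cloud-storage library (the bucket name here is just a placeholder, and the folder names are the ones from the question), you could create the bucket and the two placeholder objects like this:

from google.cloud import storage

client = storage.Client()
bucket = client.create_bucket("your-new-bucket")  # placeholder name, must be globally unique

# Zero-byte objects whose names end with "/" are what the web UI and gsutil
# display as empty folders.
for folder in ("processed/", "unprocessed/"):
    bucket.blob(folder).upload_from_string("")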

def copyFilesInFolder(self, file_name, src_blob_name, destination_blob_name):
    """Copies a blob to a new 'folder' (prefix) within the same bucket."""
    # file_name = "your-file-name"
    # src_blob_name = "source-folder-name"
    # destination_blob_name = "destination-folder-name"
    src_blob = src_blob_name + '/' + file_name
    dest_blob = destination_blob_name + '/' + file_name
    source_blob = self.bucket.blob(src_blob)
    blob_copy = self.bucket.copy_blob(source_blob, self.bucket, dest_blob)
    print(blob_copy)
    print(
        "File {} in folder {} copied to folder {} in bucket {}.".format(
            file_name,
            src_blob_name,
            destination_blob_name,
            self.bucket.name,
        )
    )
    return True
GCP has no direct concept of folder creation. Instead, you save a new object under the new prefix; even if the destination 'folder' does not exist yet, it appears as soon as an object is written with that prefix.
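For example (a sketch with hypothetical bucket and file names), uploading a local file under a prefix that does not exist yet is enough to make the 'folder' appear in the console:

from google.cloud import storage

bucket = storage.Client().bucket("your-bucket")
# "processed/" does not need to exist beforehand; writing this object creates it.
bucket.blob("processed/report.csv").upload_from_filename("report.csv")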

Related

Python: delete files from Google Cloud Storage that start with a prefix

Using Python I am able to delete files from the bucket using prefixes as well, but in the Python code a prefix means a directory.
I want to delete the files from a GCP bucket whose names start with example.
For example:
example-2022-12-07
example-2022-12-08
I followed this (Delete Files from Google Cloud Storage) but did not get the answer.
I am trying this, but it is not working:
blobs = bucket.list_blobs()
fileList = [file.name for file in blobs if 'example' in file.name]
print(fileList)
for file in fileList:
    blob = blobs.blob(file)
    blob.delete()
    print(f"Blob {blob_name} deleted.")
You can try the following code to delete files from Google Cloud Storage by using the blob.delete method, as suggested in the documentation.
Below is an example of what you are looking for:
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket(bucket_name)
# List all objects under the prefix by passing prefix to bucket.list_blobs
blobs = bucket.list_blobs(prefix="example")
for blob in blobs:
    blob.delete()
    print(f"Blob {blob.name} deleted.")
You can also check thread1 and thread2.
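As an aside (a sketch, not part of the original answer), the same cleanup can be done in one batched call with bucket.delete_blobs, which the library also provides:

blobs_to_delete = list(bucket.list_blobs(prefix="example"))
bucket.delete_blobs(blobs_to_delete)
print(f"Deleted {len(blobs_to_delete)} blobs.")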

Recursively copy a child directory to the parent in Google Cloud Storage

I need to recursively move the contents of a sub-folder to a parent folder in Google Cloud Storage. This code works for moving a single file from the sub-folder to the parent:
from pathlib import Path
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket(BUCKET_NAME)
source_path = Path(parent_dir, sub_folder, filename).as_posix()
source_blob = bucket.blob(source_path)
dest_path = Path(parent_dir, filename).as_posix()
bucket.copy_blob(source_blob, bucket, dest_path)
but I don't know how to properly format the command because if my dest_path is "parent_dir", I get the following error:
google.api_core.exceptions.NotFound: 404 POST https://storage.googleapis.com/storage/v1/b/bucket/o/parent_dir%2Fsubfolder/copyTo/b/geo-storage/o/parent_dir?prettyPrint=false: No such object: geo-storage/parent_dir/subfolder
Note: this works for recursive copy with gsutil, but I would prefer to use the blob object:
os.system(f"gsutil cp -r gs://bucket/parent_dir/subfolder/* gs://bucket/parent_dir")
GCS does not have the concept of a "directory" - just a flat namespace of objects. You can have objects named "foo/a.txt" and "foo/b.txt", but there is no actual thing representing "foo/" - it's just a prefix on the object names.
gsutil uses prefixes on the object names to pretend directories exist, but under the hood it is really just acting on all the individual objects with that prefix.
You need to do the same, with a copy for each object:
client = storage.Client()
bucket = client.bucket(BUCKET_NAME)
for source_blob in client.list_blobs(BUCKET_NAME, prefix="old/prefix/"):
    dest_name = source_blob.name.replace("old/prefix/", "new/prefix/")
    # do these in parallel for more speed
    bucket.copy_blob(source_blob, bucket, new_name=dest_name)
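Since the question asks for a move rather than a copy, a natural extension (not part of the original answer, just a sketch) is to delete each source blob once its copy succeeds:

for source_blob in client.list_blobs(BUCKET_NAME, prefix="old/prefix/"):
    dest_name = source_blob.name.replace("old/prefix/", "new/prefix/")
    bucket.copy_blob(source_blob, bucket, new_name=dest_name)
    source_blob.delete()  # remove the original so the net effect is a move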

Automating running Python code using Azure services

Hi everyone on Stack Overflow,
I wrote two Python scripts. One script picks up local files and sends them to GCS (Google Cloud Storage). The other one does the opposite - it takes files that were uploaded to GCS and saves them locally.
I want to automate the process using Azure.
What would you recommend using? Azure Function App, Azure Logic App or other services?
I'm now trying to use a Logic App. I made an .exe file using pyinstaller and am looking for a connector in Logic App that will run my program (the .exe file). I have a trigger in the Logic App - "When a file is added or modified" - but now I am stuck when selecting the next step (connector).
Kind regards,
Anna
Adding code as requested:
from google.cloud import storage
import os
import glob
import json

# Finding path to config file that is called "gcs_config.json" in directory C:/
def find_config(name, path):
    for root, dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)

def upload_files(config_file):
    # Reading 3 parameters for upload from JSON file
    with open(config_file, "r") as file:
        contents = json.loads(file.read())
        print(contents)

    # Setting up login credentials
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = contents['login_credentials']
    # The ID of the GCS bucket
    bucket_name = contents['bucket_name']
    # Setting path to files
    LOCAL_PATH = contents['folder_from']

    for source_file_name in glob.glob(LOCAL_PATH + '/**'):
        # For multiple files upload
        # Setting destination folder according to file name
        if os.path.isfile(source_file_name):
            partitioned_file_name = os.path.split(source_file_name)[-1].partition("-")
            file_type_name = partitioned_file_name[0]
            # Setting folder where files will be uploaded
            destination_blob_name = file_type_name + "/" + os.path.split(source_file_name)[-1]
            # Setting up required variables for GCS
            storage_client = storage.Client()
            bucket = storage_client.bucket(bucket_name)
            blob = bucket.blob(destination_blob_name)
            # Running upload and printing confirmation message
            blob.upload_from_filename(source_file_name)
            print("File from {} uploaded to {} in bucket {}.".format(
                source_file_name, destination_blob_name, bucket_name
            ))

config_file = find_config("gcs_config.json", "C:/")
upload_files(config_file)
config.json:
{
    "login_credentials": "C:/Users/AS/Downloads/bright-velocity-___-53840b2f9bb4.json",
    "bucket_name": "staging.bright-velocity-___.appspot.com",
    "folder_from": "C:/Users/AS/Documents/Test2/",
    "folder_for_downloaded_files": "C:/Users/AnnaShepilova/Documents/DownloadedFromGCS2/",
    "given_date": "",
    "given_prefix": ["Customer", "Account"]
}
Currently, there is no built-in connector in Logic Apps for interacting with Google Cloud services. However, Google Cloud Storage does provide a REST API that you can call from your Logic App or Function App.
But my suggestion is to use an Azure Function for this, because a Function gives you the flexibility to write your own flow for the task.
Refer to how to run your .exe file in an Azure Function, whether you are using a local EXE or a cloud-environment EXE.
Refer here for more information
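As a rough sketch of that suggestion (the module name gcs_upload, the blob container, and the config path are all assumptions, not from the original answer), a blob-triggered Azure Function written in Python could call the existing upload script whenever a new file lands in a storage container; the trigger binding itself is declared in the function's function.json:

import logging
import azure.functions as func

# Hypothetical module holding the find_config/upload_files script shown above
from gcs_upload import find_config, upload_files

def main(myblob: func.InputStream) -> None:
    # Runs whenever a blob appears in the container configured in function.json
    logging.info("Processing new blob: %s (%d bytes)", myblob.name, myblob.length)
    config_file = find_config("gcs_config.json", "C:/")  # path taken from the script above
    upload_files(config_file)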

How to delete GCS folder from Python?

Using https://github.com/googleapis/google-cloud-python/tree/master/storage or https://github.com/GoogleCloudPlatform/appengine-gcs-client, I can delete a file by specifying its file name, but there does not seem to be a way to delete folders.
Is there any way to delete folders?
I found this (Google Cloud Storage: How to Delete a folder (recursively) in Python) on Stack Overflow, but that answer simply deletes all the files in the folder, not the folder itself.
The code mentioned in the answer you referred to works: because a folder is only a name prefix, deleting every object under that prefix (including the zero-byte placeholder object, if one exists) removes the folder itself. The prefix should look like this:
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('my-bucket')
blobs = bucket.list_blobs(prefix='my-folder/')
for blob in blobs:
    blob.delete()
from google.cloud import storage

def delete_storage_folder(bucket_name, folder):
    """
    This function deletes a folder from GCP Storage
    :param bucket_name: The bucket name in which the file is to be placed
    :param folder: Folder name to be deleted
    :return: returns nothing
    """
    cloud_storage_client = storage.Client()
    bucket = cloud_storage_client.bucket(bucket_name)
    try:
        bucket.delete_blobs(blobs=list(bucket.list_blobs(prefix=folder)))
    except Exception as e:
        print(str(e))
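A quick usage example for the helper above (bucket and folder names are hypothetical); including the trailing slash in the prefix avoids matching sibling folders that share the same leading characters:

delete_storage_folder('my-bucket', 'my-folder/')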

How to move files in Google Cloud Storage from one bucket to another bucket by Python

Are there any API functions that allow us to move files in Google Cloud Storage from one bucket to another?
The scenario is that we want Python to move read files from bucket A to bucket B. I know that gsutil can do that, but I am not sure whether Python supports it.
Thanks.
Here's a function I use when moving blobs between directories within the same bucket or to a different bucket.
from google.cloud import storage
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_your_creds.json"

def mv_blob(bucket_name, blob_name, new_bucket_name, new_blob_name):
    """
    Function for moving files between directories or buckets. It will use GCP's copy
    function then delete the blob from the old location.

    inputs
    -----
    bucket_name: name of bucket
    blob_name: str, name of file
        ex. 'data/some_location/file_name'
    new_bucket_name: name of bucket (can be same as original if we're just moving around directories)
    new_blob_name: str, name of file in new directory in target bucket
        ex. 'data/destination/file_name'
    """
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(bucket_name)
    source_blob = source_bucket.blob(blob_name)
    destination_bucket = storage_client.get_bucket(new_bucket_name)

    # copy to new destination
    new_blob = source_bucket.copy_blob(source_blob, destination_bucket, new_blob_name)
    # delete in old destination
    source_blob.delete()

    print(f'File moved from {source_blob} to {new_blob_name}')
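For example (hypothetical bucket and object names), moving a single object from bucket A to bucket B would look like:

mv_blob("bucket-a", "data/some_location/file.csv", "bucket-b", "data/destination/file.csv")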
Using the google-api-python-client, there is an example on the storage.objects.copy page. After you copy, you can delete the source with storage.objects.delete.
destination_object_resource = {}
req = client.objects().copy(
    sourceBucket=bucket1,
    sourceObject=old_object,
    destinationBucket=bucket2,
    destinationObject=new_object,
    body=destination_object_resource)
resp = req.execute()
print(json.dumps(resp, indent=2))

client.objects().delete(
    bucket=bucket1,
    object=old_object).execute()
You can use the GCS client library functions documented at [1] to read from one bucket and write to the other, and then delete the source file.
You can even use the GCS REST API documented at [2].
Links:
[1] - https://developers.google.com/appengine/docs/python/googlecloudstorageclient/functions
[2] - https://developers.google.com/storage/docs/concepts-techniques#overview
from google.cloud import storage

def GCP_BUCKET_A_TO_B():
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket("Bucket_A_Name")
    destination_bucket = storage_client.get_bucket("Bucket_B_Name")
    filenames = [blob.name for blob in source_bucket.list_blobs(prefix="")]
    for name in filenames:
        source_blob = source_bucket.blob(name)
        new_blob = source_bucket.copy_blob(source_blob, destination_bucket, name)
I just want to point out that there is another possible approach, which is using gsutil through the subprocess module.
The advantages of using gsutil like this:
You don't have to deal with individual blobs
gsutil's implementation of move, and especially rsync, will probably be much better and more resilient than what we would write ourselves.
The disadvantages:
You can't deal with individual blobs easily
It's hacky, and generally a library is preferable to executing shell commands
Example:
import subprocess

def move(source_uri: str,
         destination_uri: str) -> None:
    """
    Move file from source_uri to destination_uri.

    :param source_uri: gs:// - like uri of the source file/directory
    :param destination_uri: gs:// - like uri of the destination file/directory
    :return: None
    """
    cmd = f"gsutil -m mv {source_uri} {destination_uri}"
    # gsutil expands gs:// wildcards itself, so no shell is needed
    subprocess.run(cmd.split(), check=True)
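A call with hypothetical URIs, renaming a whole "directory" within one bucket, would look like:

move("gs://my-bucket/old_dir", "gs://my-bucket/new_dir")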
