Google Cloud Storage with gspythonlibrary - python

I have to write a Python script with which I should be able to list protected buckets owned by other people who are willing to grant me access. For this I created a project and enabled Google Cloud Storage in my Google API console, then I installed gsutil and stored my credentials in the '.boto' file. Now I have to list the metadata of all buckets where I have access. My question is: what do I or the other bucket owners have to do to grant access to me/my project so that my script can list the metadata of all buckets and the objects inside them?
I'm following this doc for python scripting: https://developers.google.com/storage/docs/gspythonlibrary

You can only list all buckets you have access to within a project.
There is no way to list all buckets which you might have access to. The list of buckets you have access to would be extremely large, because it would include all buckets marked as public.
I would suggest that the people who grant you access to their buckets also give you their bucket names explicitly.
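As a rough sketch of what that looks like with the boto-based library from the linked doc, once an owner has shared a bucket name with you (the bucket name below is a placeholder, and your credentials are assumed to already be in the '.boto' file):

import boto
import gcs_oauth2_boto_plugin  # activates the OAuth2 plugin that reads your .boto credentials

# Placeholder: a bucket name that its owner has shared with you explicitly
bucket_uri = boto.storage_uri('their-shared-bucket', 'gs')

# List the objects in that bucket using your stored credentials
for obj in bucket_uri.get_bucket():
    print('gs://%s/%s' % (bucket_uri.bucket_name, obj.name))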

Related

S3 to Google Storage using transfer service

In AWS I have a folder format like, e.g., Bucketname/Data/files/abc_01-02-2022.csv
In incrementing order I have files for each date, for all the months in the year.
In Google Cloud Storage I am trying to create a folder structure like, e.g., Bucketname/data/202202/files/abc_01-02-2022.csv for the whole year.
So I am trying to use Storage Transfer Service to derive the folder structure dynamically from the object itself and create it automatically, triggered automatically on the 2nd of the month.
Can we achieve this by using the transfer service?
What is the best way to achieve this? I am trying to keep it as simple as possible.
Storage Transfer Service does not support destination object prefixes. The reason behind this is that Storage Transfer Service doesn't support remapping; that is, you cannot copy the path Bucketname/Data/files/ to Bucketname/data/202202/files.
My recommendation would be to first use the Storage Transfer Service to copy everything from one bucket to another, and then use any of the available methods to rename the objects in the new bucket to Bucketname/data/202202/files (see the sketch below).
Also note that Cloud Storage uses a flat namespace; that is, Cloud Storage does not have folders and subfolders. There are a few documents you can refer to for more information on this: Object name considerations and Folders.
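If you go the rename route, a minimal sketch with the google-cloud-storage Python client could look like this (the bucket name and prefixes are placeholders taken from the question, and the month-based prefix would need to be computed for each run):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket('Bucketname')  # placeholder bucket name

# Materialize the listing first, since renaming deletes the original objects
for blob in list(bucket.list_blobs(prefix='Data/files/')):
    # e.g. Data/files/abc_01-02-2022.csv -> data/202202/files/abc_01-02-2022.csv
    new_name = blob.name.replace('Data/files/', 'data/202202/files/', 1)
    bucket.rename_blob(blob, new_name)  # performs a copy followed by a delete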
This is possible using the STS API. You can specify a "path" on the destination bucket.
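A minimal sketch of that, assuming the google-cloud-storage-transfer client library and that the GcsData sink accepts a path field; the project ID, bucket names, dates, and AWS key are all placeholders:

from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

transfer_job = client.create_transfer_job(
    storage_transfer.CreateTransferJobRequest({
        'transfer_job': {
            'project_id': 'my-project',
            'status': storage_transfer.TransferJob.Status.ENABLED,
            'schedule': {
                'schedule_start_date': {'year': 2022, 'month': 2, 'day': 2},
            },
            'transfer_spec': {
                'aws_s3_data_source': {
                    'bucket_name': 'Bucketname',
                    'aws_access_key': {
                        'access_key_id': 'AWS_ACCESS_KEY_ID',
                        'secret_access_key': 'AWS_SECRET_ACCESS_KEY',
                    },
                },
                'gcs_data_sink': {
                    'bucket_name': 'Bucketname',
                    'path': 'data/202202/',  # destination prefix; must end with '/'
                },
            },
        },
    })
)
print(transfer_job.name)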

Google cloud storage cycling through a directory

I have a storage bucket in google cloud. I have a few directories which I created with files in them.
I know that if I want to cycle through all the files in all the directories, I can use the following command:
for file in list(source_bucket.list_blobs()):
    file_path = f"gs://{file.bucket.name}/{file.name}"
    print(file_path)
Is there a way to only cycle through one of the directories?
I suggest studying the Cloud Storage list API in more detail. It looks like you have only experimented with the most basic use of list_blobs(). As you can see from the linked API documentation, you can pass a prefix parameter to limit the scope of the list to some path: source_bucket.list_blobs(prefix="path").
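For example, a minimal sketch (the bucket name and the "images/" prefix are placeholders):

from google.cloud import storage

client = storage.Client()
source_bucket = client.bucket("my-bucket")  # placeholder bucket name

# Only iterate over objects whose names start with the "images/" prefix,
# i.e. the contents of a single "directory"
for blob in source_bucket.list_blobs(prefix="images/"):
    print(f"gs://{blob.bucket.name}/{blob.name}")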

Python: Syncing between two s3 buckets in different accounts when I can't modify the source bucket

This is a similar question to this: Is it possible to copy between AWS accounts using AWS CLI? The difference is that I want to do this in Python code, and I can't change the S3 bucket policies in the source bucket (it's owned by a 3rd party). I do have credentials for both buckets.
How do I run a sync command between these two buckets in python code?
To directly copy (eg with CopyObject) objects between Amazon S3 buckets in different accounts, you will need to use a single set of credentials that have:
Read permission on the source bucket
Write permission on the destination bucket
These credentials can come from either account. However, since you cannot change the Bucket policy on the source bucket to reference credentials from your account, you will need to use the credentials that they have provided to you.
Let's say the scenario is:
The source is Bucket-A in Account-A
The destination is Bucket-B in Account-B
You have IAM credentials from Account-A — let's call it User-A
User-A has permission to read from Bucket-A
You will need to:
Add a Bucket Policy to Bucket-B that permits User-A to write to the bucket (PutObject)
When performing the copy, specify "ACL": "bucket-owner-full-control", which will make the objects owned by the destination account. Without this, the objects will continue to be 'owned' by Account-A even though they are in a bucket owned by Account-B.
Finally, please note that boto3 does not natively provide a sync command. You will be responsible for all the sync logic, copying one object at a time.
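A rough sketch of that object-by-object copy with boto3, following the scenario above (the bucket names are placeholders; it assumes User-A's credentials are configured and that Bucket-B's bucket policy grants User-A s3:PutObject):

import boto3

s3 = boto3.client('s3')  # uses User-A's credentials from the environment/config

SOURCE_BUCKET = 'Bucket-A'  # placeholder names from the scenario
DEST_BUCKET = 'Bucket-B'

# "Sync" by copying each object individually; boto3 has no built-in sync
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get('Contents', []):
        s3.copy_object(
            Bucket=DEST_BUCKET,
            Key=obj['Key'],
            CopySource={'Bucket': SOURCE_BUCKET, 'Key': obj['Key']},
            ACL='bucket-owner-full-control',  # hand ownership to the destination account
        )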
Or do it in Python like this, calling the AWS CLI:
import subprocess

cmd = 'aws s3 sync s3://mybucket s3://mybucket2'
push = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
push.communicate()  # wait for the sync to finish before reading the return code
print(push.returncode)
Or thereabouts. :-) Wherever you run this from, say an EC2 instance, make sure it has a user or role with valid permissions to access both buckets.

Access google cloud storage account objects from app engine

Overview
I have a GCP storage bucket which contains a .json file and 5 JPEG files. In the .json file the image names match the JPEG file names. I want to know a way in which I can access each of the objects within the storage bucket based upon the image name.
Method 1 (Current Method):
Currently, a Python script is being used to get the images from the storage bucket. This is done by looping through the .json file of image names, getting each individual image name, then building a URL based on the bucket/image name, retrieving the image, and displaying it on a Flask App Engine site.
This current method requires the bucket objects to be public, which poses a security issue since the whole internet is granted access to the bucket. Secondly, it is computationally expensive, with each image having to be pulled down from the bucket separately. The bucket will eventually contain 10,000 images, which will result in the images being slow to load and display on the web page.
Requirement (New Method):
Is there a method by which I can pull down images from the bucket, not all the images at once, and display them on a web page? I want to be able to access individual images from the bucket and display their corresponding image data, retrieved from the .json file.
Lastly, I want to ensure that neither the bucket nor the objects are public, and that they can only be accessed via the App Engine app.
Thanks
Would be helpful to see the Python code that's doing the work right now. You shouldn't need the storage objects to be public. They should be able to be retrieved using the Google Cloud Storage (GCS) API and a service account token that has view-only permissions on storage (although depending on whether or not you know the object names and need to get the bucket name, it might require more permissions on the service account).
As for the performance, you could either do things on the front end to be smart about how many you're showing and fetch only what you want to display as the user scrolls, or you could paginate your results from the GCS bucket. (A short sketch follows after the links below.)
Links to the service account and API pieces here:
https://cloud.google.com/iam/docs/service-accounts
https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-python
Information about pagination for retrieving GCS objects here:
How does paging work in the list_blobs function in Google Cloud Storage Python Client Library
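A rough sketch of the private, service-account-based retrieval with the google-cloud-storage client (the bucket and object names are placeholders; it assumes the App Engine service account has a view-only role such as Storage Object Viewer on the bucket):

from google.cloud import storage

# Credentials are picked up from the App Engine default service account,
# so neither the bucket nor the objects need to be public
client = storage.Client()
bucket = client.bucket('my-private-bucket')  # placeholder bucket name

# Fetch a single image by the name listed in the .json manifest
blob = bucket.blob('image1.jpeg')  # placeholder object name
image_bytes = blob.download_as_bytes()

# For many images, page through the listing rather than fetching everything,
# e.g. bucket.list_blobs(max_results=50)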

How to properly use create_anonymous_client() function in google cloud storage python library for access on public buckets?

I made a publicly listable bucket on Google Cloud Storage. I can see all the keys if I try to list the bucket objects in the browser. I was trying to use the create_anonymous_client() function so that I can list the bucket keys in a Python script, but it is giving me an exception. I have looked everywhere and still can't find the proper way to use the function.
from google.cloud import storage
client = storage.Client.create_anonymous_client()
a = client.lookup_bucket('publically_listable_bucket')
a.list_blobs()
Exception I am getting:
ValueError: Anonymous credentials cannot be refreshed.
Additional Query: Can I list and download contents of public google cloud storage buckets using boto3, If yes, how to do it anonymously?
I was also struggling with this and couldn't find an answer anywhere online. It turns out you can access the bucket with just the bucket() method.
I'm not sure why, but this method can take several seconds sometimes.
client = storage.Client.create_anonymous_client()
bucket = client.bucket('publically_listable_bucket')
blobs = list(bucket.list_blobs())
This error means the bucket you are attempting to list does not grant the right permission. You must give the "Storage Object Viewer" or "Storage Legacy Bucket Reader" role to "allUsers" (a sketch of doing that follows below).
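For reference, a minimal sketch of granting that role with the google-cloud-storage client, run with credentials that can administer the bucket (the bucket name is the one from the question):

from google.cloud import storage

client = storage.Client()  # must be authenticated as someone who can set the bucket's IAM policy
bucket = client.bucket('publically_listable_bucket')

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    'role': 'roles/storage.objectViewer',
    'members': {'allUsers'},  # makes the objects publicly readable and listable
})
bucket.set_iam_policy(policy)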
