Overview
I have a GCP storage bucket, which contains a .json file and 5 JPEG files. The image names in the .json file match the JPEG file names. I want to know a way in which I can access each of the objects within the bucket based on the image name.
Method 1 (Current Method):
Currently, a Python script is used to get the images from the storage bucket. This is done by looping through the .json file of image names, getting each individual image name, building a URL from the bucket and image name, then retrieving the image and displaying it on a Flask App Engine site.
This current method requires the bucket objects to be public, which poses a security issue since the whole internet is granted access to the bucket. Secondly, it is computationally expensive, with each image having to be pulled down from the bucket separately. The bucket will eventually contain 10,000 images, which will result in the images being slow to load and display on the web page.
Requirement (New Method):
Is there a method by which I can pull images down from the bucket, not all the images at once, and display them on a web page? I want to be able to access individual images from the bucket and display their corresponding image data, retrieved from the .json file.
Lastly, I want to ensure that neither the bucket nor the objects are public, and that they can only be accessed via the App Engine app.
Thanks
Would be helpful to see the Python code that's doing the work right now. You shouldn't need the storage objects to be public. They should be able to be retrieved using the Google Cloud Storage (GCS) API and a service account token that has view-only permissions on storage (although depending on whether or not you know the object names and need to get the bucket name, it might require more permissions on the service account).
As for the performance, you could either do things on the front end to be smart about how many you're showing and fetch only what you want to display as the user scrolls, or you could paginate your results from the GCS bucket.
Links to the service account and API pieces here:
https://cloud.google.com/iam/docs/service-accounts
https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-python
Information about pagination for retrieving GCS objects here:
How does paging work in the list_blobs function in Google Cloud Storage Python Client Library
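As a rough sketch of both ideas (not production code), assuming the google-cloud-storage library and Flask routes on App Engine; the bucket name, route names, and page size below are placeholders:
from flask import Flask, Response, abort, request
from google.cloud import storage

app = Flask(__name__)

# On App Engine the client authenticates as the app's service account, so the
# bucket can stay private as long as that account has a viewer role on it.
storage_client = storage.Client()
BUCKET_NAME = "your-private-bucket"  # placeholder


@app.route("/image/<image_name>")
def serve_image(image_name):
    """Fetch a single object privately and return it to the browser."""
    blob = storage_client.bucket(BUCKET_NAME).blob(image_name)
    if not blob.exists():
        abort(404)
    return Response(blob.download_as_bytes(), mimetype="image/jpeg")


@app.route("/images")
def list_images():
    """Return one page of object names instead of everything at once."""
    page_token = request.args.get("page_token")  # omit for the first page
    iterator = storage_client.list_blobs(
        BUCKET_NAME, max_results=20, page_token=page_token
    )
    page = next(iterator.pages)
    return {
        "images": [blob.name for blob in page],
        "next_page_token": iterator.next_page_token,
    }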
Related
I want to migrate files from Digital Ocean Storage into Google Cloud Storage programmatically, without rclone.
I know the exact location of the file that resides in Digital Ocean Storage (DOS), and I have the signed URL for Google Cloud Storage (GCS).
How can I modify the following code so I can copy the DOS file directly into GCS without an intermediate download to my computer?
from google.cloud import storage


def upload_to_gcs_bucket(blob_name, path_to_file, bucket_name):
    """Upload data to a bucket."""
    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json('creds.json')
    # buckets = list(storage_client.list_buckets())
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(path_to_file)
    # returns a public URL
    return blob.public_url
Google's Storage Transfer Service should be an answer for this type of problem (particularly because DigitalOcean Spaces, like most object stores, is S3-compatible). But (!) I think (I'm unfamiliar with it and unsure) it can't be used for this configuration.
There is no way to transfer files from a source to a destination without some form of intermediate transfer, but what you can do is use memory rather than file storage as the intermediary. Memory is generally more constrained than file storage, and if you wish to run multiple transfers concurrently, each will consume some amount of memory.
It's curious that you're using Signed URLs. Generally Signed URLs are provided by a 3rd-party to limit access to 3rd-party buckets. If you own the destination bucket, then it will be easier to use Google Cloud Storage buckets directly from one of Google's client libraries, such as Python Client Library.
The Python examples include uploading from a file and from memory. It will likely be best to stream the files into Cloud Storage if you'd prefer not to create intermediate files. Here's a Python example of what that could look like:
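(This is a rough sketch rather than a tested implementation: it assumes boto3 for the S3-compatible Spaces side, and the endpoint, keys, bucket and object names are placeholders.)
import boto3
from google.cloud import storage


def copy_spaces_object_to_gcs(space_name, object_key, gcs_bucket, blob_name):
    """Stream an object from DigitalOcean Spaces into GCS without a local file."""
    # DigitalOcean Spaces is S3-compatible, so boto3 works with a custom endpoint.
    spaces = boto3.client(
        "s3",
        endpoint_url="https://nyc3.digitaloceanspaces.com",  # your Spaces region
        aws_access_key_id="SPACES_KEY",                       # placeholder
        aws_secret_access_key="SPACES_SECRET",                # placeholder
    )
    source = spaces.get_object(Bucket=space_name, Key=object_key)

    storage_client = storage.Client.from_service_account_json("creds.json")
    blob = storage_client.bucket(gcs_bucket).blob(blob_name)

    # source["Body"] is a file-like stream, so the data passes through memory
    # in chunks and is never written to disk on the machine running this code.
    blob.upload_from_file(source["Body"])
    return blob.name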
In AWS I have a folder format like e.g. Bucketname/Data/files/abc_01-02-2022.csv.
In incrementing order, I have files for each date across all the months of the year.
In Google Cloud Storage I am trying to create a folder structure like e.g. Bucketname/data/202202/files/abc_01-02-2022.csv for the whole year.
So, I am trying to use the Storage Transfer Service to take the date dynamically from the object itself and create the folder structure automatically, triggered automatically on the 2nd of the month.
Can we achieve this using the Transfer Service?
What is the best way to achieve this? I am trying to make it as simple as possible.
Storage Transfer Service does not support destination object prefixes. The reason is that Storage Transfer Service doesn't support remapping; that is, you cannot copy the path Bucketname/Data/files/ to Bucketname/data/202202/files.
My recommendation would be to first use Storage Transfer Service to copy everything from one bucket to another, and then use any of the available methods to rename the objects in the new bucket to Bucketname/data/202202/files.
Also, Cloud Storage uses a flat namespace; that is, Cloud Storage does not have folders and subfolders. There are a few documents you can refer to for more information on this: Object name considerations and Folders.
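If you go the copy-then-rename route, a minimal sketch with the Python client library might look like this (the bucket name and prefixes are placeholders based on the paths in the question):
from google.cloud import storage


def add_month_prefix(bucket_name, old_prefix="Data/files/", new_prefix="data/202202/files/"):
    """Rename copied objects so they sit under the desired month 'folder'."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    for blob in client.list_blobs(bucket_name, prefix=old_prefix):
        new_name = new_prefix + blob.name[len(old_prefix):]
        # rename_blob copies the object to the new name and deletes the original.
        bucket.rename_blob(blob, new_name)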
This is possible using the Storage Transfer Service (STS) API. You can specify a "path" on the destination bucket.
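For what it's worth, a hedged sketch of such a transfer job via the google-api-python-client; the project ID, bucket names, schedule, and credentials below are placeholder assumptions, and whether the path field fits your exact layout is worth verifying against the STS docs:
from googleapiclient import discovery


def create_transfer_job(project_id, aws_bucket, gcs_bucket):
    """Create a Storage Transfer Service job that writes under a destination prefix."""
    sts = discovery.build("storagetransfer", "v1")
    job = {
        "description": "Copy S3 files under a dated prefix",
        "status": "ENABLED",
        "projectId": project_id,
        "schedule": {"scheduleStartDate": {"year": 2022, "month": 2, "day": 2}},
        "transferSpec": {
            "awsS3DataSource": {
                "bucketName": aws_bucket,
                # Real jobs need AWS credentials here (awsAccessKey) or a roleArn.
            },
            "gcsDataSink": {
                "bucketName": gcs_bucket,
                # 'path' is prepended to every transferred object's name.
                "path": "data/202202/files/",
            },
        },
    }
    return sts.transferJobs().create(body=job).execute()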
I am building a mosaic using Google Cloud Storage, and I store the user-uploaded images in a bucket. However, to see any images, you must first log in (with a Google account). I have set the bucket to public, and any user who has already logged in can view the images. But I want the images to show up even if they haven't logged in. What should I do?
I store the url of the image in my database:
# this part is the return function of my image upload
imageurl = 'https://storage.cloud.google.com/fortest098.appspot.com/{}'.format(mosaicLocation)
# and this is how I put my image into the database:
ImageInfo(..., image_url=imageurl).put()
Am I supposed to get a public URL or something?
By default objects uploaded to GCS are access controlled.
You can make your objects publicly readable by setting a predefinedAcl at upload time or after they are uploaded (e.g., see https://cloud.google.com/storage/docs/json_api/v1/objects/update).
You can also do this using the gsutil command:
gsutil acl set public-read gs://your-bucket/your-object
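If you're using the Python client library, a small equivalent sketch could look like this (it reuses the bucket and mosaicLocation from the question's code; the rest is an assumption rather than your exact setup):
from google.cloud import storage

client = storage.Client()
blob = client.bucket("fortest098.appspot.com").blob(mosaicLocation)

# Equivalent to the gsutil command above: sets the object's ACL to public-read.
blob.make_public()
imageurl = blob.public_url  # store this URL in the datastore entity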
In Google App Engine, is it possible to retrieve the blob_key from a Google Cloud Storage file path? The file is uploaded to Cloud Storage directly.
The documentation only shows how to create a file in Cloud Storage and get the blob_key.
https://developers.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage
It's possible you don't actually need such a key: if you uploaded the file directly, I assume you know the bucket name and object name/path? With this you can serve the object by constructing a URL (e.g. http://storage.googleapis.com/<bucket-name>/<object-path>) and you can read it using the Cloud Storage Client Library's open() function.
I've not used blob_keys myself, but the use case for them seems to be where an uploaded file needs to be served from a location which is unknown and therefore a GCS URL cannot be formed.
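For the reading side, a minimal sketch with the legacy App Engine GCS client library (the GoogleAppEngineCloudStorageClient package), assuming you know the bucket and object names; the path below is a placeholder:
import cloudstorage

# GCS client library paths take the form /<bucket-name>/<object-path>.
filename = "/your-bucket/path/to/file.jpg"  # placeholder

with cloudstorage.open(filename) as gcs_file:
    data = gcs_file.read()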
I have to write a Python script with which I should be able to list protected buckets owned by other people who are willing to grant access. For this I created a project and enabled Google Cloud Storage in my Google API console, then I installed gsutil and stored my credentials in the '.boto' file. Now I have to list the metadata of all buckets to which I have access. My question is: what do I or the other bucket owners have to do to grant access to me/my project so that my script can list the metadata of those buckets and the objects inside them?
I'm following this doc for python scripting: https://developers.google.com/storage/docs/gspythonlibrary
You can only list all buckets you have access to within a project.
There is no way to list all buckets which you might have access to. The list of buckets you have access to would be extremely large, because it would include all buckets marked as public.
I would suggest you have people who give you access to their buckets give you their bucket name explicitly.
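For example, once an owner has told you their bucket name and granted your account a role that can read bucket metadata (such as the legacy bucket Reader role), a sketch using the current google-cloud-storage library (rather than the boto-based gspythonlibrary in the linked doc) could look like this; the bucket name is a placeholder:
from google.cloud import storage

client = storage.Client()

# The bucket name must come from the owner; there is no global discovery call.
bucket = client.get_bucket("bucket-name-they-shared")  # placeholder

print(bucket.location, bucket.storage_class, bucket.time_created)
for blob in client.list_blobs(bucket, max_results=10):
    print(blob.name, blob.size, blob.updated)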