get blob_key from an existing file in Cloud Storage - Python

In Google App Engine, is it possible to retrieve the blob_key for a file from its Google Cloud Storage file path? The file was uploaded to Cloud Storage directly.
The documentation only shows how to create a file in Cloud Storage and get the blob_key.
https://developers.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage

It's possible you don't actually need such a key: if you uploaded the file directly, I assume you know the bucket name and object name/path? With those you can serve the object by constructing a URL (e.g. http://storage.googleapis.com/<bucket-name>/<object-path>), and you can read it using the Cloud Storage client library's open() function.
I've not used blob_keys myself, but the use case for them seems to be where an uploaded file needs to be served from a location that is unknown, and therefore a GCS URL cannot be formed.
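For example, a minimal sketch of both options, assuming the bucket name and object path below (placeholders) are known from the direct upload and the GoogleAppEngineCloudStorageClient library is available on App Engine:

import cloudstorage

BUCKET = 'your-bucket-name'
OBJECT_PATH = 'path/to/your-object.txt'

# Option 1: serve the object by URL (the object must be readable by the caller).
public_url = 'http://storage.googleapis.com/{}/{}'.format(BUCKET, OBJECT_PATH)

# Option 2: read the object server-side with the client library's open().
with cloudstorage.open('/{}/{}'.format(BUCKET, OBJECT_PATH)) as gcs_file:
    contents = gcs_file.read()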

Related

How to upload from Digital Ocean Storage to Google Cloud Storage directly and programmatically, without rclone

I want to migrate files from Digital Ocean Storage into Google Cloud Storage programmatically, without rclone.
I know the exact location of the file in Digital Ocean Storage (DOS), and I have a signed URL for Google Cloud Storage (GCS).
How can I modify the following code so I can copy the DOS file directly into GCS without an intermediate download to my computer?
from google.cloud import storage

def upload_to_gcs_bucket(blob_name, path_to_file, bucket_name):
    """Upload data to a bucket."""
    # Explicitly use service account credentials by specifying the
    # private key file.
    storage_client = storage.Client.from_service_account_json('creds.json')
    # buckets = list(storage_client.list_buckets())
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(path_to_file)
    # returns a public url
    return blob.public_url
Google's Storage Transfer Service should be the answer for this type of problem (particularly because DigitalOcean Spaces, like most object stores, is S3-compatible). But (!) I think (I'm unfamiliar with it and unsure) it can't be used for this configuration.
There is no way to transfer files from a source to a destination without some form of intermediate transfer, but what you can do is use memory rather than file storage as the intermediary. Memory is generally more constrained than file storage, and if you wish to run multiple transfers concurrently, each will consume some amount of memory.
It's curious that you're using signed URLs. Generally, signed URLs are provided by a third party to limit access to third-party buckets. If you own the destination bucket, it will be easier to access Google Cloud Storage directly from one of Google's client libraries, such as the Python client library.
The Python examples include uploading from a file and from memory. Uploading from memory will likely be best if you'd prefer not to create intermediate files.
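Here's a rough Python sketch of the in-memory approach; the source URL, credentials file name, and bucket name are placeholders:

import requests
from google.cloud import storage

def copy_url_to_gcs(source_url, bucket_name, blob_name):
    """Fetch an object over HTTP(S) into memory and upload it to GCS."""
    storage_client = storage.Client.from_service_account_json('creds.json')
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    # The object is buffered in memory, never written to local disk.
    response = requests.get(source_url)
    response.raise_for_status()
    blob.upload_from_string(response.content)
    return blob.public_url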

How to write raw JSON data [as a Python dictionary] to Google Cloud Storage using Python?

My use case is very simple: I'm fetching a raw JSON response from my REST API and keeping it as a dictionary in Python, and I have to write this data into Google Cloud Storage. Is there any approach other than the "upload_from_string" option?
For uploading data to Cloud Storage, you have only three methods on the Python Blob object:
Blob.upload_from_file()
Blob.upload_from_filename()
Blob.upload_from_string()
From a dict, it's up to you whether to convert it into a string and use the upload_from_string method, or to store it locally in the /tmp directory (an in-memory file system) and then use the file-based methods, as sketched below.
Files may give you more capabilities, for example if you want to zip the content and/or use a dedicated library that dumps a dict into a file.
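For example, a minimal sketch of the /tmp option, with placeholder bucket and file names:

import json
import os
from google.cloud import storage

def upload_dict_via_tmp_file(data: dict, bucket_name: str, blob_name: str):
    """Write a dict to the in-memory /tmp file system, then upload that file."""
    tmp_path = os.path.join('/tmp', 'payload.json')
    with open(tmp_path, 'w') as f:
        json.dump(data, f)
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    # Use one of the file-based upload methods.
    bucket.blob(blob_name).upload_from_filename(tmp_path)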
I have a similar use case: I want to throw some data, which is in dictionary format, into a Google Cloud Storage bucket.
I'm assuming you have already created a bucket (it is a simple task if you are trying to do it programmatically).
from google.cloud import storage
import json
import os

def upload_to_gcloud(data: dict):
    """
    this function takes a dictionary as input and uploads
    it to a google cloud storage bucket
    """
    ## path to your service-account credentials JSON file
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "YOUR-SERVICE-ACCOUNT-CREDENTIALS-AS-JSON"
    ## instance of the storage client
    storage_client = storage.Client()
    ## instance of a bucket in your google cloud storage
    bucket = storage_client.get_bucket("your-bucket-name")
    ## if you want to create a new file
    blob = bucket.blob("filename-you-want-here")
    ## if there already exists a file, use this instead:
    # blob = bucket.get_blob("filename-of-that-file")
    ## upload the data using the upload_from_string method;
    ## json.dumps() serializes a dictionary object as a string
    blob.upload_from_string(json.dumps(data))
This approach will work with any data that you can represent as a string. If you want to directly upload a file from your local filesystem, use upload_from_filename() instead.
Hope this helps!

Python: Upload file to Google Cloud Storage using URL

I have a URL (https://example.com/myfile.txt) of a file and I want to upload it to my bucket (gs://my-sample-bucket) on Google Cloud Storage.
What I am currently doing is:
Downloading the file to my system using the requests library.
Uploading that file to my bucket using python function.
Is there any way I can upload the file directly using the URL?
You can use the urllib2 or requests library to get the file over HTTP, then your existing Python code to upload it to Cloud Storage. Something like this should work:
import urllib2
from google.cloud import storage
from google.cloud.storage import Blob

client = storage.Client()
filedata = urllib2.urlopen('http://example.com/myfile.txt')
datatoupload = filedata.read()
bucket = client.get_bucket('bucket-id-here')
blob = Blob("myfile.txt", bucket)
blob.upload_from_string(datatoupload)
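On Python 3, where urllib2 is not available, a roughly equivalent sketch using the requests library (the URL and bucket name are placeholders) would be:

import requests
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('bucket-id-here')
# Fetch the file over HTTP into memory, then upload the bytes as the blob's contents.
response = requests.get('https://example.com/myfile.txt')
response.raise_for_status()
bucket.blob('myfile.txt').upload_from_string(response.content)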
It still downloads the file into memory on your system, but I don't think there's a way to tell Cloud Storage to do that for you.
There is a way to do this using a Cloud Storage Transfer job, but depending on your use case it may or may not be worth doing. You would need to create a transfer job that transfers a URL list.
I marked this question as a duplicate of this one.

Access google cloud storage account objects from app engine

Overview
I have a GCP storage bucket which has a .json file and 5 jpeg files. In the .json file the image names match the jpeg file names. I want to know a way in which I can access each of the objects within the storage bucket based upon the image name.
Method 1 (Current Method):
Currently, a Python script is used to get the images from the storage bucket. This is done by looping through the .json file of image names, getting each individual image name, building a URL from the bucket/image name, then retrieving the image and displaying it on a Flask App Engine site.
This current method requires the bucket objects to be public, which poses a security issue, with the internet granted access to this bucket. Secondly, it is computationally expensive, with each image having to be pulled down from the bucket separately. The bucket will eventually contain 10000 images, which will result in the images being slow to load and display on the web page.
Requirement (New Method):
Is there a method by which I can pull down images from the bucket, not all the images at once, and display them on a web page? I want to be able to access individual images from the bucket and display their corresponding image data, retrieved from the .json file.
Lastly, I want to ensure that neither the bucket nor the objects are public, and that they can only be accessed via App Engine.
Thanks
It would be helpful to see the Python code that's doing the work right now. You shouldn't need the storage objects to be public: they should be retrievable using the Google Cloud Storage (GCS) API and a service account token that has view-only permissions on storage (although depending on whether you know the object names or need to get the bucket name, it might require more permissions on the service account).
As for the performance, you could either do things on the front end to be smart about how many you're showing and fetch only what you want to display as the user scrolls, or you could paginate your results from the GCS bucket (see the sketch after the links below).
Links to the service account and API pieces here:
https://cloud.google.com/iam/docs/service-accounts
https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-python
Information about pagination for retrieving GCS objects here:
How does paging work in the list_blobs function in Google Cloud Storage Python Client Library
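For illustration, here is a minimal sketch of the pagination and server-side fetching ideas with the Python client library; the bucket name, page size, and service-account setup are assumptions:

from google.cloud import storage

def list_image_page(bucket_name, page_token=None, page_size=50):
    """Return one page of object names plus the token for the next page."""
    client = storage.Client()  # uses the service account the app runs as
    bucket = client.get_bucket(bucket_name)
    iterator = bucket.list_blobs(max_results=page_size, page_token=page_token)
    names = [blob.name for blob in iterator]
    return names, iterator.next_page_token

def fetch_image_bytes(bucket_name, image_name):
    """Read a single, non-public object server-side so App Engine can serve it."""
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    return bucket.blob(image_name).download_as_string()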

Extract zip archive in Google Cloud Storage Bucket using Python api client

I'm trying to create a cloud function using the API. The source code is provided as a zip archive which includes index.js and package.json files. I have uploaded this archive to a storage bucket and created a cloud function via an API request, but now I need to extract this zip archive to point to the source for the cloud function. How can I achieve that?
Here's what I have done:
From views.py
sclient = storage.Client()
bucket = sclient.get_bucket(func_obj.bucket)
blob = bucket.blob(func_obj.sourceFile.name)
print('Uploading archive...')
print(blob.upload_from_filename(file_name))
name = "projects/{}/locations/us-central1/functions/{}".format(func_obj.project, func_obj.fname,)
print(name)
req_body = {
    "name": name,
    "entryPoint": func_obj.entryPoint,
    "timeout": "3.5s",
    "availableMemoryMb": func_obj.fmemory,
    # "sourceUploadUrl": upload_url,
    "sourceArchiveUrl": "gs://newer_bucket/func.zip",
    "httpsTrigger": {},
}
service = discovery.build('cloudfunctions', 'v1')
func_api = service.projects().locations().functions()
response = func_api.create(location='projects/' + func_obj.project + '/locations/us-central1',
                           body=req_body).execute()
pprint.pprint(response)
According to the Cloud Functions docs, the request body for the functions.create REST endpoint must contain an instance of CloudFunction. One of the fields of a CloudFunction JSON resource, as you can see in the second link, is "sourceArchiveUrl", and it points to the zip archive which contains the function. From my understanding, you don't need to extract the zip archive for Google Cloud Functions to access the code. Check the non-REST API deployment of Cloud Functions; it requires the archive, not an extracted archive.
But anyway, there is no direct way to extract the archive inside the bucket. You will need to download/transfer the archive somewhere (e.g. a local machine, a Google Compute Engine VM instance, etc.), perform the extraction on it, and upload the results back to the bucket.
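For example, a rough sketch of that round trip done in memory rather than on disk (the bucket and object names are placeholders):

import io
import zipfile
from google.cloud import storage

def extract_archive_in_bucket(bucket_name, archive_name, dest_prefix=''):
    """Download a zip from the bucket, extract it in memory, upload the members back."""
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    archive_bytes = bucket.blob(archive_name).download_as_string()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for member in archive.namelist():
            if member.endswith('/'):
                continue  # skip directory entries
            bucket.blob(dest_prefix + member).upload_from_string(archive.read(member))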
Why should you upload an archive to a Google Cloud Storage bucket if you are then required to extract it? Why not upload the files?
You might want to check this answer as well. Same question, different context.
