Object metadata keys are lowercased when uploading to GCS with Apache Libcloud - python

I'm using Apache Libcloud to upload files to a Google Cloud Storage bucket together with object metadata.
In the process, the keys in my metadata dict are being lowercased. I'm not sure whether this is due to Cloud Storage or whether this happens in Libcloud.
The issue can be reproduced following the example from the Libcloud docs:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

cls = get_driver(Provider.GOOGLE_STORAGE)
driver = cls('SA-EMAIL', './SA.json')  # provide service account credentials here

container = driver.get_container('my-bucket')  # target bucket (name used for illustration)
FILE_PATH = '/home/user/file'
extra = {'meta_data': {'camelCase': 'foo'}}

# Upload with metadata
with open(FILE_PATH, 'rb') as iterator:
    obj = driver.upload_object_via_stream(iterator=iterator,
                                          container=container,
                                          object_name='file',
                                          extra=extra)
The file uploads successfully, but in the resulting object metadata the key camelCase has been turned into camelcase.
I don't think GCS disallows camel case in object metadata keys, since editing the metadata manually in the Cloud Console preserves the casing just fine.
I went through Libcloud's source code, but I don't see any explicit lowercasing going on. Any pointers on how to upload camelcased metadata with libcloud are most welcome.

I also checked the library and wasn't able to see anything obvious, but I guess opening a new issue there would be a good start.
As far as the Google Cloud Storage side is concerned, and as you can verify yourself, it does accept camel case. I was able to successfully edit the metadata of a file using the code from their public docs (though I wasn't able to figure out anything for libcloud itself):
from google.cloud import storage


def set_blob_metadata(bucket_name, blob_name):
    """Set a blob's metadata."""
    # bucket_name = 'your-bucket-name'
    # blob_name = 'your-object-name'
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.get_blob(blob_name)
    metadata = {'camelCase': 'foo', 'NaMe': 'TeSt'}
    blob.metadata = metadata
    blob.patch()
    print("The metadata for the blob {} is {}".format(blob.name, blob.metadata))
So I believe this could be a good workaround in your case if you are not able to work it out with libcloud. Do note that the Cloud Storage client libraries base their authentication on environment variables, so the corresponding authentication docs should be followed.
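For example, a minimal sketch of that environment-variable based setup (the key path and bucket name below are placeholders):
import os

from google.cloud import storage

# Point Application Default Credentials at a service account key file.
# Alternatively, export GOOGLE_APPLICATION_CREDENTIALS in the shell instead.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/service-account.json'

client = storage.Client()
bucket = client.bucket('my-bucket')  # placeholder bucket name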
Addition by question author: As hinted at in the comments, metadata can be added to a blob before uploading a file as follows:
from google.cloud import storage

gcs = storage.Client()
bucket = gcs.get_bucket('my-bucket')
blob = bucket.blob('document')
blob.metadata = {'camelCase': 'foobar'}
with open('/path/to/document', 'rb') as fh:
    blob.upload_from_file(fh)
This makes it possible to set metadata without having to patch an existing blob, and provides an effective workaround for the libcloud issue.
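To confirm that the casing survives the upload, a quick check could look like this (assuming the same bucket and object name as in the snippet above):
from google.cloud import storage

gcs = storage.Client()
bucket = gcs.get_bucket('my-bucket')

# Re-fetch the object and inspect its custom metadata.
uploaded = bucket.get_blob('document')
print(uploaded.metadata)  # expected: {'camelCase': 'foobar'}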

Related

How to create Google Cloud storage access token programmatically python

I need a public URL for a file that I am creating inside a Google Cloud Function, so I want to create an access token for it.
I am able to upload the file from a Python Cloud Function with blob.upload_from_string(blob_text), but I do not know how to create a public URL (or an access token) for it.
Could you help me with it?
EDITING WITH THE ANSWER (almost a copy-paste from Marc Anthony B's answer):
from uuid import uuid4

blob = bucket.blob(storage_path)
token = str(uuid4())
metadata = {"firebaseStorageDownloadTokens": token}
blob.metadata = metadata
download_url = 'https://firebasestorage.googleapis.com/v0/b/{}/o/{}?alt=media&token={}' \
    .format(bucket.name, storage_path.replace("/", "%2F"), token)
with open(video_file_path, 'rb') as f:
    blob.upload_from_file(f)
Firebase Storage for Python still doesn't have its own SDK but you can use firebase-admin instead. Firebase Admin SDKs depend on the Google Cloud Storage client libraries to provide Cloud Storage access. The bucket references returned by the Admin SDK are objects defined in these libraries.
When uploading an object to Firebase Storage, you must incorporate a custom access token. You may use UUID4 for this case. See code below:
import firebase_admin
from firebase_admin import credentials
from firebase_admin import storage
from uuid import uuid4

projectId = '<PROJECT-ID>'
storageBucket = '<BUCKET-NAME>'

cred = credentials.ApplicationDefault()
firebase_admin.initialize_app(cred, {
    'projectId': projectId,
    'storageBucket': storageBucket
})

bucket = storage.bucket()

# E.g.: "upload/file.txt"
bucket_path = "<BUCKET-PATH>"
blob = bucket.blob(bucket_path)

# Create a token from a UUID.
# Technically, you can use any string as your token.
token = str(uuid4())
metadata = {"firebaseStorageDownloadTokens": token}

# Assign the token as metadata
blob.metadata = metadata
blob.upload_from_filename(filename="<FILEPATH>")

# Make the file public (OPTIONAL). To be used for the Cloud Storage URL.
blob.make_public()

# Fetches a public URL from GCS.
gcs_storageURL = blob.public_url

# Generates a URL with an access token from Firebase.
firebase_storageURL = 'https://firebasestorage.googleapis.com/v0/b/{}/o/{}?alt=media&token={}'.format(
    storageBucket, bucket_path, token)

print({
    "gcs_storageURL": gcs_storageURL,
    "firebase_storageURL": firebase_storageURL
})
As you can see from the code above, I've mentioned GCS and Firebase URLs. If you want a public URL from GCS then you should make the object public by using the make_public() method. If you want to use the access token generated, then just concatenate the default Firebase URL with the variables required.
If the objects are already in Firebase Storage and already have access tokens incorporated in them, then you can retrieve the token from the object's metadata. See the code below:
# E.g: "upload/file.txt"
bucket_path = "<BUCKET-PATH>"
blob = bucket.get_blob(bucket_path)
# Fetches object metadata
metadata = blob.metadata
# Firebase Access Token
token = metadata['firebaseStorageDownloadTokens']
firebase_storageURL = 'https://firebasestorage.googleapis.com/v0/b/{}/o/{}?alt=media&token={}'.format(storageBucket, bucket_path, token)
print(firebase_storageURL)
For more information, you may check out this documentation:
Google Cloud Storage Library for Python
Introduction to the Admin Cloud Storage API

download_to_filename makes an empty file (google cloud storage)

Answer:
I needed to grant ownership access in the bucket's storage IAM policy, and not just grant general API access to the service account connected to my instance.
I simply followed the tutorial here: https://cloud.google.com/storage/docs/downloading-objects#storage-download-object-python, which says:
from google.cloud import storage


def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    print('Blob {} downloaded to {}.'.format(
        source_blob_name,
        destination_file_name))


download_blob([BUCKETNAME], [FILENAME], "/home/me/documents/file.png")
I don't receive an error, but the last line that executes is blob.download_to_filename(destination_file_name), which creates an empty file.
Additional info:
My bucket name is in the format "mybucketname"
My file name is in the format "sdg-1234-fggr-34234.png"
I hope someone has insight into my issue.
Is it an encoding issue? Does download_to_filename not execute? Or something else?
When working with APIs it's important to grant ownership access in the bucket's storage IAM policy, and not just general API access to the service account connected to an instance.
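One way to check whether the credentials attached to the instance actually hold object-level permissions on the bucket is to test them explicitly. A minimal sketch, with the bucket name as a placeholder:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('mybucketname')  # placeholder bucket name

# Ask GCS which of these permissions the current credentials actually hold.
granted = bucket.test_iam_permissions(['storage.objects.get',
                                       'storage.objects.list'])
print(granted)  # an empty or partial list points to a missing IAM grant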

Get content_type from Google Cloud file

I have two API endpoints: one that takes a file from an HTTP request and uploads it to a Google Cloud bucket using the Python API, and another that downloads it again. In the first view, I get the file content type from the HTTP request and upload it to the bucket, setting that metadata:
from os import path

from google.cloud import storage

file_obj = request.FILES['file']
client = storage.Client.from_service_account_json(path.join(
    path.realpath(path.dirname(__file__)),
    '..',
    'settings',
    'api-key.json'
))
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.upload_from_string(
    file_text,
    content_type=file_obj.content_type
)
Then in another view, I download the file:
...
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.download_to_filename(path)
How can I access the file metadata I set earlier (content_type)? It's not available on the blob object anymore since a new one was instantiated, even though the bucket still holds the file.
You should try
blob = bucket.get_blob(blob_name)
blob.content_type
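Alternatively, if you already hold a blob handle created with bucket.blob(...), you can refresh its server-side properties instead of constructing a new one. A small sketch, assuming the same bucket and filename as above:
blob = bucket.blob(filename)
blob.reload()  # fetches the object's stored properties, including content_type
print(blob.content_type)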

Set metadata in Google Cloud Storage using gcloud-python

I am trying to upload a file to Google Cloud Storage using gcloud-python and set some custom metadata properties. To try this I have created a simple script.
import os

from gcloud import storage

client = storage.Client('super secret app id')
bucket = client.get_bucket('super secret bucket name')

blob = bucket.get_blob('kirby.png')
blob.metadata = blob.metadata or {}
blob.metadata['Color'] = 'Pink'

with open(os.path.expanduser('~/Pictures/kirby.png'), 'rb') as img_data:
    blob.upload_from_file(img_data)
I am able to upload the file contents. After uploading the file I am able to manually set metadata from the developer console and retrieve it.
I can't figure out how to upload the metadata programmatically.
We discussed this on the issue tracker, and it surfaced a "bug" in the implementation, or at the very least something which catches users off guard.
Accessing metadata via blob.metadata is read-only. Thus when mutating that result via
blob.metadata['Color'] = 'Pink'
it doesn't actually change the metadata stored on blob.
The current "fix" is to build up a fresh dict, assign it to blob.metadata, and persist the change with patch(). Other writable properties (such as content_disposition and cache_control) follow the same assign-then-patch pattern:
metadata = {'Color': 'Pink'}
blob.metadata = metadata
blob.patch()

blob.content_disposition = "attachment"
blob.cache_control = "no-store"
blob.patch()
https://cloud.google.com/storage/docs/viewing-editing-metadata#storage-set-object-metadata-python

Python Boto3 AWS Multipart Upload Syntax

I am successfully authenticating with AWS and using the 'put_object' method on the Bucket object to upload a file. Now I want to use the multipart API to accomplish this for large files. I found the accepted answer in this question:
How to save S3 object to a file using boto3
But when trying to implement it, I am getting "unknown method" errors. What am I doing wrong? My code is below. Thanks!
## Get an AWS Session
self.awsSession = Session(aws_access_key_id=accessKey,
                          aws_secret_access_key=secretKey,
                          aws_session_token=session_token,
                          region_name=region_type)
...
# Upload the file to S3
s3 = self.awsSession.resource('s3')
s3.Bucket('prodbucket').put_object(Key=fileToUpload, Body=data) # WORKS
#s3.Bucket('prodbucket').upload_file(dataFileName, 'prodbucket', fileToUpload) # DOESNT WORK
#s3.upload_file(dataFileName, 'prodbucket', fileToUpload) # DOESNT WORK
The upload_file method has not been ported over to the bucket resource yet. For now you'll need to use the client object directly to do this:
client = self.awsSession.client('s3')
client.upload_file(...)
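For example, a minimal sketch of such a client-side upload (the local path, bucket, and key are placeholders); boto3's transfer layer automatically switches to the multipart API once a file exceeds the configured threshold:
import boto3
from boto3.s3.transfer import TransferConfig

client = boto3.client('s3')  # or self.awsSession.client('s3') as above

# Files larger than the threshold are uploaded in parts via the multipart API.
config = TransferConfig(multipart_threshold=8 * 1024 * 1024)

client.upload_file('/path/to/large-file.tar.gz',  # placeholder local path
                   'prodbucket',                  # bucket name
                   'backups/large-file.tar.gz',   # placeholder object key
                   Config=config)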
Libcloud's S3 wrapper transparently handles all the splitting and uploading of the parts for you.
Use the upload_object_via_stream method to do so:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

# Path to a very large file you want to upload
FILE_PATH = '/home/user/myfile.tar.gz'

cls = get_driver(Provider.S3)
driver = cls('api key', 'api secret key')

container = driver.get_container(container_name='my-backups-12345')

# This method blocks until all the parts have been uploaded.
extra = {'content_type': 'application/octet-stream'}

with open(FILE_PATH, 'rb') as iterator:
    obj = driver.upload_object_via_stream(iterator=iterator,
                                          container=container,
                                          object_name='backup.tar.gz',
                                          extra=extra)
For official documentation on the S3 multipart feature, refer to the AWS Official Blog.
