download_to_filename makes an empty file (google cloud storage) - python

Answer:
I needed to grant ownership access on the storage bucket's IAM policy itself, not just general API access to the service account attached to my instance.
I simply followed the tutorial on the website here: https://cloud.google.com/storage/docs/downloading-objects#storage-download-object-python
which shows:
from google.cloud import storage

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    print('Blob {} downloaded to {}.'.format(
        source_blob_name,
        destination_file_name))

download_blob([BUCKETNAME], [FILENAME], "/home/me/documents/file.png")
I don't receive an error, but the last line to be executed is blob.download_to_filename(destination_file_name), which creates an empty file.
Additional info:
My bucket is in the format "mybucketname"
My file is in the format "sdg-1234-fggr-34234.png".
I hope someone has insight into my issue.
Is it an encoding problem? Does download_to_filename not execute? Or is it something else?

When working with APIs, it's important to grant access on the storage bucket's IAM policy itself, not just general API access to the service account attached to the instance.
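As a quick way to check this from the instance itself, here is a minimal sketch (assuming the google-cloud-storage library and reusing the bucket name from the question) that asks GCS which permissions the attached service account actually holds on the bucket:

from google.cloud import storage

BUCKET_NAME = "mybucketname"  # bucket name from the question; replace with your own

storage_client = storage.Client()
bucket = storage_client.bucket(BUCKET_NAME)

# Returns the subset of these permissions that the current credentials hold.
granted = bucket.test_iam_permissions(["storage.objects.get", "storage.objects.list"])
print("Granted permissions:", granted)

If storage.objects.get is not in the output, object downloads will not work no matter which API access scopes the instance itself was given, which is consistent with the empty-file behaviour described above.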

Related

Download Blob From Blob Storage Using Python

I am trying to download an Excel file stored in a blob. However, it keeps generating the error "The specified blob does not exist". The error happens at blob_client.download_blob(), although I can get the blob_client. Any idea why, or other ways I can connect using managed identity?
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
import pandas as pd

default_credential = DefaultAzureCredential()
blob_service_client = BlobServiceClient('url', credential=default_credential)
container_client = blob_service_client.get_container_client('xx-xx-data')
blob_client = container_client.get_blob_client('TEST.xlsx')
downloaded_blob = blob_client.download_blob()
df = pd.read_excel(downloaded_blob.content_as_bytes(), sheet_name='Test', skiprows=2)
It turns out that I also have to provide 'Reader' access on top of 'Storage Blob Data Contributor' to be able to identify the blob. There was no need for a SAS URL.
The reason you're getting this error is that each request to Azure Blob Storage must be an authenticated request. The only exception is when you're reading (downloading) a blob from a public blob container. In all likelihood, the blob container holding this blob has a private ACL, and since you're sending an unauthenticated request, you're getting this error.
I would recommend using a Shared Access Signature (SAS) URL for the blob with Read permission instead of a simple blob URL. Since a SAS URL has the authorization information embedded in the URL itself (the sig portion), you should be able to download the blob provided the SAS is valid and has not expired.
Please see this for more information on Shared Access Signature: https://learn.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature.
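For illustration, here is a minimal sketch (assuming the azure-storage-blob package; the account, container, and SAS token shown are hypothetical) of downloading a blob directly via its SAS URL, so no separate credential object is needed:

from azure.storage.blob import BlobClient

# Hypothetical SAS URL: the blob URL with the SAS token (sv=...&sig=...) appended.
sas_url = "https://myaccount.blob.core.windows.net/xx-xx-data/TEST.xlsx?<sas-token>"

blob_client = BlobClient.from_blob_url(sas_url)
data = blob_client.download_blob().readall()

with open("TEST.xlsx", "wb") as f:
    f.write(data)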

Downloading Files from Google Cloud Storage to Remote Server

Requirement:
I need to execute a script from a remote server which downloads a particular file from Google Cloud Storage.
The script should use a service account whose key is stored in HashiCorp Vault.
I already have a HashiCorp Vault setup established, but I am not sure how to invoke it from a shell/Python script.
So if someone can help me with how to download a file from GCS using a service account key stored in HashiCorp Vault, either a Python or a shell script would be fine.
You can follow the Google documentation to download an object using:
Console
gsutil
Programming Language Script
curl + REST API
For more information visit this link.
For a Python utility you can use this code:
from google.cloud import storage
def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Blob {} downloaded to {}.".format(
            source_blob_name, destination_file_name
        )
    )
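To tie this to the Vault requirement, here is a rough sketch (assuming the hvac client, a KV v2 secrets engine, and a hypothetical secret path gcp/sa-key that stores the full service account key JSON under a field named "key") of fetching the key from Vault and building an authenticated storage client with it:

import json

import hvac
from google.cloud import storage
from google.oauth2 import service_account

# Hypothetical Vault address, token and secret path.
vault = hvac.Client(url="https://vault.example.com:8200", token="s.xxxxxxxx")
secret = vault.secrets.kv.v2.read_secret_version(path="gcp/sa-key")

# Assumes the whole service account key JSON was written to the "key" field.
sa_info = json.loads(secret["data"]["data"]["key"])

credentials = service_account.Credentials.from_service_account_info(sa_info)
client = storage.Client(credentials=credentials, project=sa_info["project_id"])

# Reuse the download_blob helper above, or call the client directly:
client.bucket("my-bucket").blob("path/to/object").download_to_filename("object.bin")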

Object metadata keys are lowercased when uploading to GCS with Apache Libcloud

I'm using Apache Libcloud to upload files to a Google Cloud Storage bucket together with object metadata.
In the process, the keys in my metadata dict are being lowercased. I'm not sure whether this is due to Cloud Storage or whether this happens in Libcloud.
The issue can be reproduced following the example from the Libcloud docs:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver
cls = get_driver(Provider.GOOGLE_STORAGE)
driver = cls('SA-EMAIL', './SA.json')  # provide service account credentials here

container = driver.get_container('my-container')  # not defined in the original snippet; hypothetical bucket name
FILE_PATH = '/home/user/file'
extra = {'meta_data': {'camelCase': 'foo'}}

# Upload with metadata
with open(FILE_PATH, 'rb') as iterator:
    obj = driver.upload_object_via_stream(iterator=iterator,
                                          container=container,
                                          object_name='file',
                                          extra=extra)
The file uploads successfully, but in the resulting metadata the key camelCase has been turned into camelcase.
I don't think GCS disallows camel case for object metadata, since it's possible to set camel-cased keys when editing the metadata manually.
I went through Libcloud's source code, but I don't see any explicit lowercasing going on. Any pointers on how to upload camel-cased metadata with libcloud are most welcome.
I also checked the library and wasn't able to see anything obvious, but I guess opening a new issue there would be a good start.
As far as the Google Cloud Storage side is concerned, and as you can verify yourself, it does allow camel case. I was able to successfully edit the metadata of a file by using the code offered in their public docs (but wasn't able to figure out anything on libcloud itself):
from google.cloud import storage
def set_blob_metadata(bucket_name, blob_name):
    """Set a blob's metadata."""
    # bucket_name = 'your-bucket-name'
    # blob_name = 'your-object-name'
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.get_blob(blob_name)
    metadata = {'camelCase': 'foo', 'NaMe': 'TeSt'}
    blob.metadata = metadata
    blob.patch()

    print("The metadata for the blob {} is {}".format(blob.name, blob.metadata))
So I believe this could be a good workaround in your case if you are not able to work it out with libcloud. Note that the Cloud Storage client libraries base their authentication on environment variables, so the corresponding docs should be followed.
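If environment variables are inconvenient, the client can also be built from a key file directly; a small sketch, assuming a hypothetical path to a service account key JSON:

from google.cloud import storage

# Either export GOOGLE_APPLICATION_CREDENTIALS=/path/to/SA.json before running,
# or point the client at the key file explicitly (hypothetical path shown here):
storage_client = storage.Client.from_service_account_json("./SA.json")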
Addition by question author: As hinted at in the comments, metadata can be added to a blob before uploading a file as follows:
from google.cloud import storage
gcs = storage.Client()
bucket = gcs.get_bucket('my-bucket')
blob = bucket.blob('document')
blob.metadata = {'camelCase': 'foobar'}
blob.upload_from_file(open('/path/to/document', 'rb'))
This makes it possible to set metadata without having to patch an existing blob, and provides an effective workaround for the issue with libcloud.

Video Reading Problem from Google Cloud Bucket

I am trying to deploy my website to Google Cloud. However, I have a problem with video processing. My website takes a video from the user and then shows that video, or previously uploaded videos, to the user. I can show the video on the template page, but I also need to process that video in the background, so I have to read the corresponding video with OpenCV. My code works locally. However, on Google Cloud the video is stored as a URL, and OpenCV cannot read from a URL as expected. According to the sources, the solution is to download the video into the local file system:
from google.cloud import storage
def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Blob {} downloaded to {}.".format(
            source_blob_name, destination_file_name
        )
    )
https://cloud.google.com/storage/docs/downloading-objects#code-samples
I have two problems with this code:
1 - First, I do not want to download the video to my local computer. I need to keep the video in Google Cloud, and I have to read the video with OpenCV from there.
2 - When I try to run the above code, I still get an error because it cannot download the video into the destination_file_name.
Could anyone help me with this problem?
Best.
Edit: I solved the problem with the help of the answers. Thank you. I download the video file to the /tmp folder and then use it with OpenCV. Here is my function:
from google.cloud import storage
def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "path/to/file in the Google Cloud Storage bucket"
    # destination_file_name = "/tmp/path/to/file" (in my case a ".mp4" file)
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    with open(destination_file_name, "wb") as file_obj:
        blob.download_to_file(file_obj)
This is a duplicate question. You cannot write to the production server in the cloud. From: https://cloud.google.com/appengine/docs/standard/php/runtime#filesystem
An App Engine application cannot:
write to the filesystem. Applications can use Google Cloud Storage for storing persistent files. Reading from the filesystem is allowed, and all application files uploaded with the application are available.
You want to use Google Cloud Storage to upload your videos.
You can write to the /tmp directory temporarily; that will not persist, but it may work for your needs:
# destination_file_name = "/tmp/path/to/file"
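As a usage sketch (assuming opencv-python is installed and the download_blob helper shown earlier in this thread; the bucket, object name, and temporary path are hypothetical), the video can be pulled into /tmp and opened from there:

import cv2

# Hypothetical bucket, object name and temporary path.
download_blob("mybucket", "videos/input.mp4", "/tmp/input.mp4")

cap = cv2.VideoCapture("/tmp/input.mp4")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # ... process the frame with OpenCV here ...
cap.release()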

How to read data in Cloud Storage using Cloud Functions

I'm trying to write a Cloud Function in Python which reads JSON files containing table schemas from a directory in Cloud Storage, and from these schemas I need to create tables in BigQuery.
I have made some attempts to access Cloud Storage, but without success. I previously developed something similar in Google Colab, reading these schemas from a directory on Drive, but now things seem quite different.
Can someone help me?
You can check the Streaming data from Cloud Storage into BigQuery using Cloud Functions solution guide from GCP.
If you'd like a different approach, you can refer to the downloading objects guide in the GCP docs to retrieve the data from GCS; see the sample code below.
from google.cloud import storage
def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Blob {} downloaded to {}.".format(
            source_blob_name, destination_file_name
        )
    )
You can create a Cloud Function and read the downloaded data from the file in Cloud Storage:
from google.cloud import storage

def loader(event, context):
    """Triggered by a change to a Cloud Storage bucket.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    try:
        file_name = event['name']
        bucket_name = event['bucket']
        client = storage.Client()
        bucket = client.get_bucket(bucket_name)
        file_blob = storage.Blob(file_name, bucket)
        data = file_blob.download_as_string().decode()
    except Exception as err:
        print("Could not read the triggering object: {}".format(err))
        raise
Once you have the data, you can create the table in BigQuery.
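For example, here is a minimal sketch (assuming the google-cloud-bigquery library, that the downloaded JSON is a list of {"name": ..., "type": ...} field definitions, and a hypothetical fully qualified table ID) of turning that schema into a BigQuery table:

import json

from google.cloud import bigquery

def create_table_from_schema(data, table_id="my-project.my_dataset.my_table"):
    """Create a BigQuery table from a JSON schema string downloaded from GCS."""
    fields = json.loads(data)
    schema = [bigquery.SchemaField(f["name"], f["type"]) for f in fields]

    bq_client = bigquery.Client()
    table = bigquery.Table(table_id, schema=schema)
    table = bq_client.create_table(table)  # raises Conflict if the table already exists
    print("Created table {}".format(table.full_table_id))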
