Write file from GCP Cloud Function to Bucket - python

I am having a very difficult time simply writing a file from a Cloud Function to a Bucket.
I am using this Medium post: https://medium.com/@Tim_Ebbers/import-a-file-to-gcp-cloud-storage-using-cloud-functions-9cf81db353dc
This is the code of the Cloud Function:
#Create function that is triggered by http request
def importFile(request):
    #import libraries
    from google.cloud import storage
    from urllib import request
    #set storage client
    client2 = storage.Client()
    # get bucket
    bucket = client2.get_bucket('YOUR-TEST-BUCKET') #without gs://
    blob = bucket.blob('animals-1.json')
    #See if json exists
    if blob.exists() == False :
        #copy file to google storage
        try:
            ftpfile = request.urlopen('https://raw.githubusercontent.com/LearnWebCode/json-example/master/animals-1.json')
            #for non public ftp file: ftpfile = request.urlopen('ftp://account:password@ftp.domain.com/folder/file.json')
            blob.upload_from_file(ftpfile)
            print('copied animals-1.json to google storage')
        #print error if file doesn't exists
        except:
            print('animals-1.json does not exist')
    #print error if file already exists in google storage
    else:
        print('file already exists in google storage')
The function deploys successfully. When I go to "Test" it, I get the very unhelpful error:
Error: function terminated. Recommended action: inspect logs for termination reason. Additional troubleshooting documentation can be found at https://cloud.google.com/functions/docs/troubleshooting#logging Details:
500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
and
Logs: Not Available
What am I doing wrong here?
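For reference, a variant of the same function that surfaces the real exception instead of swallowing it with a bare except: makes the termination reason visible in the function's logs. This is only a sketch, assuming google-cloud-storage is declared in requirements.txt and the bucket placeholder is replaced; it is not necessarily the cause of the 500:

def import_file(request):
    from google.cloud import storage
    from urllib.request import urlopen

    client = storage.Client()
    bucket = client.get_bucket('YOUR-TEST-BUCKET')
    blob = bucket.blob('animals-1.json')

    if blob.exists():
        return 'file already exists in google storage'

    try:
        source = urlopen('https://raw.githubusercontent.com/LearnWebCode/json-example/master/animals-1.json')
        blob.upload_from_file(source)
        return 'copied animals-1.json to google storage'
    except Exception as e:
        # Log the underlying error so Cloud Logging shows the real cause.
        print(f'upload failed: {e}')
        raise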

Related

Is there a temporary directory or direct way to upload a file in azure storage?

I am trying to build a Python API so the user can upload a PDF file and the API then sends it directly to Azure storage. What I found is that I must have a local directory, i.e.:
container_client = ContainerClient.from_connection_string(conn_str=conn_str, container_name='mycontainer')
with open('mylocalpath/myfile.pdf', "rb") as data:
    container_client.upload_blob(name='myblockblob.pdf', data=data)
Another solution would be to store the file on a VM and then point the local path at it, but I don't want to fill up my VM.
If you want to upload directly from the client side to an Azure storage blob instead of receiving the file in your API, you can use a shared access signature (SAS) on your storage account. Your API can expose a function that generates a pre-signed URL using that shared access signature and returns the URL to the client; the client can then upload the file to your blob via that URL.
To generate the URL, you can follow the code below:
from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

accountname = "<storageaccountname>"
accountkey = "<accountkey>"  # get this from the access keys section in Azure storage.
containername = "<containername>"

def getpushurl(filename):
    token = generate_blob_sas(
        account_name=accountname,
        container_name=containername,
        account_key=accountkey,
        permission=BlobSasPermissions(write=True),
        expiry=datetime.utcnow() + timedelta(seconds=100),
        blob_name=filename,
    )
    url = f"https://{accountname}.blob.core.windows.net/{containername}/{filename}?{token}"
    return url

pdfpushurl = getpushurl("demo.text")
print(pdfpushurl)
After generating this URL, give it to the client; the client can then send the file to that URL with a PUT request, and it will be uploaded directly to Azure storage.
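As an illustration of that last step, a minimal client-side sketch using the requests library (the filename and local path are placeholders; the x-ms-blob-type header is required when a block blob is created with a raw PUT):

import requests

sas_url = getpushurl("demo.text")

with open("demo.text", "rb") as data:
    resp = requests.put(
        sas_url,
        data=data,
        # Azure requires this header when creating a block blob via PUT.
        headers={"x-ms-blob-type": "BlockBlob"},
    )
resp.raise_for_status()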
You can generate a SAS token with write permission for your users so that they can upload .pdf files directly from their side without storing them on the server. For details, please see my previous post here.
Try the code below to generate a SAS token with container write permission:
from azure.storage.blob import BlobServiceClient, ContainerSasPermissions, generate_container_sas
from datetime import datetime, timedelta

storage_connection_string = ''
container_name = ''

block_blob_service = BlobServiceClient.from_connection_string(storage_connection_string)
container_client = block_blob_service.get_container_client(container_name)

sasToken = generate_container_sas(account_name=container_client.account_name,
                                  container_name=container_client.container_name,
                                  account_key=container_client.credential.account_key,
                                  # grant write permission only
                                  permission=ContainerSasPermissions(write=True),
                                  start=datetime.utcnow() - timedelta(minutes=1),
                                  # 1 hour valid time
                                  expiry=datetime.utcnow() + timedelta(hours=1)
                                  )
print(sasToken)
After you have returned this SAS token to your user, see this official guide to upload files from an HTML page; I think it would be helpful if you are developing a web app.
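If the upload happens from Python rather than a browser, the same container SAS token can also be used as the credential for a ContainerClient. A rough sketch, assuming <storageaccountname> is your storage account name and sasToken is the value printed above:

from azure.storage.blob import ContainerClient

account_url = "https://<storageaccountname>.blob.core.windows.net"
container_client = ContainerClient(account_url=account_url,
                                   container_name=container_name,
                                   # The container SAS token acts as the credential.
                                   credential=sasToken)

with open("mylocalpath/myfile.pdf", "rb") as data:
    container_client.upload_blob(name="myblockblob.pdf", data=data)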

Trying to authenticate with Google Cloud Platform (GCP) to use the Speech-to-Text API

I am trying to use the Google Cloud Platform (GCP) Speech-to-Text API in Python, but for some reason I can't seem to get access to GCP to use the API. How do I authenticate my credentials?
I have tried to follow the instructions provided by google to authenticate my credentials but I am just so lost as nothing seems to be working.
I have created a GCP project, set-up billing information, enabled API and created service account without any problems.
I have tried to set the GOOGLE_APPLICATION_CREDENTIALS=[PATH] environment variable from the command line and then run the following code, which is taken straight from the Google tutorial page:
def transcribe_streaming(stream_file):
    """Streams transcription of the given audio file."""
    import io
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    client = speech.SpeechClient()

    with io.open(stream_file, 'rb') as audio_file:
        content = audio_file.read()

    # In practice, stream should be a generator yielding chunks of audio data.
    stream = [content]
    requests = (types.StreamingRecognizeRequest(audio_content=chunk)
                for chunk in stream)

    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code='en-US')
    streaming_config = types.StreamingRecognitionConfig(config=config)

    # streaming_recognize returns a generator.
    responses = client.streaming_recognize(streaming_config, requests)

    for response in responses:
        # Once the transcription has settled, the first result will contain the
        # is_final result. The other results will be for subsequent portions of
        # the audio.
        for result in response.results:
            print('Finished: {}'.format(result.is_final))
            print('Stability: {}'.format(result.stability))
            alternatives = result.alternatives
            # The alternatives are ordered from most likely to least.
            for alternative in alternatives:
                print('Confidence: {}'.format(alternative.confidence))
                print(u'Transcript: {}'.format(alternative.transcript))
I get the following error message:
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
You can also set credentials directly in your script:
from google.cloud import speech
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("/path/to/your/credentials.json")
client = speech.SpeechClient(credentials=credentials)
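If you would rather stick with the environment-variable approach, it can also be set from the script itself before the client is created (the path is a placeholder):

import os

# Application Default Credentials reads this variable when the client is built.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your/credentials.json"

from google.cloud import speech
client = speech.SpeechClient()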

Error while Downloading file to my local device from S3

I am trying to download a file from an Amazon S3 bucket to my local device using the code below, but I get an error saying "Unable to locate credentials".
Given below is the code I have written:
import boto3
import botocore

BUCKET_NAME = 'my-bucket'
KEY = 'my_image_in_s3.jpg'

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
Could anyone help me with this? Thanks in advance.
AWS uses a shared credentials system for the AWS CLI and all of the AWS SDKs, so there is no risk of leaking your AWS credentials into a code repository. AWS security best practices recommend using a shared credentials file, which on Linux is usually located at
~/.aws/credentials
This file contains an access key and secret key used by all SDKs and the AWS CLI. It can be created manually or automatically with the command
aws configure
which will ask a few questions and create the credentials file for you. Note that you need to create a user with appropriate permissions before accessing AWS resources.
For more information, see the link below:
AWS CLI configuration
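Once aws configure has written the credentials file, the original code should work unchanged; a named profile from that file can also be selected explicitly. A small sketch, assuming a profile named default exists:

import boto3

# Picks up ~/.aws/credentials automatically once it exists.
s3 = boto3.resource('s3')

# Or select a specific named profile from the shared credentials file.
session = boto3.Session(profile_name='default')
s3 = session.resource('s3')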
You are not using the session you created to download the file; you're using the s3 client you created. If you want to use the client, you need to specify credentials.
your_bucket.download_file('k.png', '/Users/username/Desktop/k.png')
or
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('your_bucket','k.png','/Users/username/Desktop/k.png')

Uploading large files to Google Storage GCE from a Kubernetes pod

We get this error when uploading a large file (more than 10Mb but less than 100Mb):
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=resumable: ('Response headers must contain header', 'location')
Or this error when the file is more than 5Mb
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=multipart: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
It seems that this API looks at the file size and tries to upload it via the multipart or resumable method. I can't imagine that is something I should be concerned with as a caller of this API. Is the problem somehow related to permissions? Does the bucket need special permission so it can accept multipart or resumable uploads?
from google.cloud import storage

try:
    client = storage.Client()
    bucket = client.get_bucket('my-bucket')
    blob = bucket.blob('blob-name')
    blob.upload_from_filename(zip_path, content_type='application/gzip')
except Exception as e:
    print(f'Error in uploading {zip_path}')
    print(e)
We run this inside a Kubernetes pod, so the permissions get picked up automatically by the storage.Client() call.
We already tried these:
Can't upload with gsutil because the container is Python 3 and gsutil does not run in python 3.
Tried this example, but it runs into the same error: ('Response headers must contain header', 'location')
There is also this library, but it is basically alpha quality, with little activity and no commits for a year.
Upgraded to google-cloud-storage==1.13.0
Thanks in advance
The problem was indeed the credentials. Somehow the error message was very misleading. When we loaded the credentials explicitly, the problem went away.
# Explicitly use service account credentials by specifying the private key file.
storage_client = storage.Client.from_service_account_json('service_account.json')
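Putting this together with the original upload code, a minimal sketch, assuming the key file is mounted into the pod (for example from a Kubernetes Secret) at service_account.json:

from google.cloud import storage

# Load explicit service account credentials instead of relying on the pod's default scopes.
storage_client = storage.Client.from_service_account_json('service_account.json')
bucket = storage_client.get_bucket('my-bucket')
blob = bucket.blob('blob-name')
blob.upload_from_filename(zip_path, content_type='application/gzip')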
I found my node pools had been spec'd with
oauthScopes:
- https://www.googleapis.com/auth/devstorage.read_only
and changing it to
oauthScopes:
- https://www.googleapis.com/auth/devstorage.full_control
fixed the error. As described in this issue, the problem is an uninformative error message.

Google cloud storage client with remote api on local client

I'm trying to use the google remote api on my app engine project to upload local files to my app's default cloud storage bucket.
I have configured my app.yaml to have remote api on. I'm able to access my bucket and upload/access files from it. I run my local python console and try to write to the bucket with the following code:
from google.appengine.ext.remote_api import remote_api_stub
from google.appengine.api import app_identity
import cloudstorage

def auth_func():
    return ('user@gmail.com', '*******')

remote_api_stub.ConfigureRemoteApi('my-app-id', '/_ah/remote_api', auth_func,
                                   'my-app-id.appspot.com')

filename = "/" + app_identity.get_default_gcs_bucket_name() + "/myfile.txt"
gcs_file = cloudstorage.open(filename, 'w', content_type='text/plain',
                             options={'x-goog-meta-foo': 'foo', 'x-goog-meta-bar': 'bar'})
I see the following response:
WARNING:root:suspended generator urlfetch(context.py:1214) raised DownloadError(Unable to fetch URL: http://None/_ah/gcs/my-app-id.appspot.com/myfile.txt)
Notice the
http://None/_ah/gcs.....
I don't think None should be part of the URL. Is there an issue with the GoogleAppEngineCloudStorageClient, v1.9.0.0? I'm also using Google App Engine 1.9.1.
Any ideas?
The Google Cloud Storage client does not respect remote_api_stub and assumes you are running the script locally. Setting
os.environ['SERVER_SOFTWARE'] = 'Development (remote_api)/1.0'
or even
os.environ['SERVER_SOFTWARE'] = ''
will help.
The function that checks your environment, from common.py:
def local_run():
    """Whether we should hit GCS dev appserver stub."""
    server_software = os.environ.get('SERVER_SOFTWARE')
    if server_software is None:
        return True
    if 'remote_api' in server_software:
        return False
    if server_software.startswith(('Development', 'testutil')):
        return True
    return False
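So, assuming that check is what sends your writes to the (non-running) dev appserver stub, setting the variable before opening the file should route them to real Cloud Storage. A sketch combining it with the code from the question:

import os

# Make local_run() return False so the client talks to real GCS.
os.environ['SERVER_SOFTWARE'] = 'Development (remote_api)/1.0'

remote_api_stub.ConfigureRemoteApi('my-app-id', '/_ah/remote_api', auth_func,
                                   'my-app-id.appspot.com')
filename = "/" + app_identity.get_default_gcs_bucket_name() + "/myfile.txt"
gcs_file = cloudstorage.open(filename, 'w', content_type='text/plain')
gcs_file.write('hello from the remote api\n')
gcs_file.close()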
If I understand correctly, you want to upload a local text file to a specific bucket. I do not think what you're doing will work.
The alternative would be to ditch the RemoteAPI and upload it using the Cloud Storage API.
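A rough sketch of that alternative using the standalone Cloud Storage client library, run from the local machine with suitable credentials (the bucket and file names are placeholders; the app's default bucket is usually named after the app ID):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-app-id.appspot.com')
blob = bucket.blob('myfile.txt')
blob.upload_from_filename('myfile.txt', content_type='text/plain')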
