Access from gcloud ml-engine jobs to BigQuery - Python

I have a Python ML process which connects to BigQuery using a local JSON file that the env variable GOOGLE_APPLICATION_CREDENTIALS points to (the file contains my keys supplied by Google, see authentication getting-started).
When running it locally, it works great.
I'm now looking to deploy my model through Google's ML Engine, specifically using the shell command gcloud ml-engine jobs submit training.
However, after I ran my process and looked at the logs in console.cloud.google.com/logs/viewer, I saw that the job can't access BigQuery and I'm getting the following error:
google.auth.exceptions.DefaultCredentialsError: File:
/Users/yehoshaphatschellekens/Desktop/google_cloud_xgboost/....-.....json was not found.
Currently I don't think that gcloud ml-engine jobs submit training takes the JSON file with it (I thought gcloud had automatic access to BigQuery; I guess not).
One possible workaround is to save my personal .json file into my Python dependencies in the other sub-package folder (see packaging-trainer) and import it.
Is this solution feasible/safe?
Is there any other workaround to this issue?

What I did eventually was upload the JSON to a Cloud Storage bucket and then download it into my project each time I launch the ML Engine training process:
os.system('gsutil cp gs://secured_bucket.json .')
os.environ[ "GOOGLE_APPLICATION_CREDENTIALS"] = "......json"

The path should be absolute and use backslashes on Windows:
GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\[FILE_NAME].json"
Set it this way in your Python code (use a raw string so the backslashes aren't treated as escape sequences):
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r"C:\PATH.JSON"
Example with the Google Translate API here.

Related

Unable to write files in a GCP bucket using gcsfuse

I have mounted a storage bucket on a VM using the command:
gcsfuse my-bucket /path/to/mount
After this I'm able to read files from the bucket in Python using Pandas, but I'm not able to write files or create new folders. I have tried from Python and from the terminal using sudo, but I get the same error.
I have also tried using the key_file option with the mount:
sudo mount -t gcsfuse -o implicit_dirs,allow_other,uid=1000,gid=1000,key_file=Notebooks/xxxxxxxxxxxxxx10b3464a1aa9.json <BUCKET> <PATH>
It does not throw errors when I run the command, but I'm still not able to write to the bucket.
I have also tried:
gcloud auth login
But still have the same issue.
I ran into the same thing a while ago, which was really confusing. You have to set the correct access scope for the virtual machine so that anyone using the VM is able to call the storage API. The documentation shows that the default access scope for storage on a VM is read-only:
When you create a new Compute Engine instance, it is automatically
configured with the following access scopes:
Read-only access to Cloud Storage:
https://www.googleapis.com/auth/devstorage.read_only
All you have to do is change this scope so that you are also able to write to storage buckets from the VM. You can find an overview of different scopes here. To apply the new scope to your VM, you have to first shut it down. Then from your local machine execute the following command:
gcloud compute instances set-scopes INSTANCE_NAME \
--scopes=storage-rw \
--zone=ZONE
You can do the same thing from the Cloud Console if you go to the settings of your VM, scroll all the way down, and choose "Set Access for each API". You have the same options when you create the VM for the first time.
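Once the VM is back up with the new scope, a quick sanity check (a sketch; the mount path is the one from the question) is to write a small file through the gcsfuse mount:
# Try creating a file through the mounted bucket; this fails with a permission
# error if the VM still only has the read-only storage scope.
with open("/path/to/mount/write_test.txt", "w") as f:
    f.write("write access works\n")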

How to get the list of Postgres databases on Google Cloud using Django/Python

I intend to get the list of databases available in a PostgreSQL instance on Google Cloud.
With the command-line gcloud tool, the following provides the expected result for my project:
gcloud sql databases list --instance=mysqlinstance
How can I get the same result through Python/Django while using Google Cloud?
This is a good trick for almost anything you do via gcloud and want to reproduce within a script.
With any gcloud call, you can add the flag --log-http and it will print a bunch of useful information, including a uri field which is the REST API call that gcloud uses to fetch the information.
So in your case, you can run:
gcloud --log-http sql databases list --instance=mysqlinstance
and the uri will come back with:
https://sqladmin.googleapis.com/sql/v1beta4/projects/<project-name>/instances/mysqlinstance/databases?alt=json
You can now use that REST call within your Python/Django script to fetch the same data. You'll of course need to handle credentials, because your script won't necessarily be authorized the same way gcloud is, unless you always run the script from within your own environment. If you do, it will work fine, since it will pick up the same user credentials you were running gcloud with. But if you want the script to run elsewhere, you'll need to manage credentials for it.
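A minimal sketch of that call, assuming Application Default Credentials are available (e.g. via gcloud auth application-default login) and using the instance name from the question; google-auth handles the token:
import google.auth
from google.auth.transport.requests import AuthorizedSession

# Application Default Credentials with the Cloud SQL Admin scope.
credentials, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/sqlservice.admin"]
)
session = AuthorizedSession(credentials)

url = (
    f"https://sqladmin.googleapis.com/sql/v1beta4/projects/{project}"
    "/instances/mysqlinstance/databases"
)
response = session.get(url)
response.raise_for_status()

# Each item is a Database resource; print just the names.
for db in response.json().get("items", []):
    print(db["name"])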

Can I Upload a Jar from My Local System using Cloud Dataproc Python API?

With the Google Cloud command-line tool you can specify a local jar with the --jars flag. However, I want to submit a job using the Python API. I have that working, but when I specify the jar, if I use the file: prefix, it looks on the Dataproc master cluster rather than on my local workstation.
There is an easy workaround, which is to just upload the jar using the GCS library first, but I wanted to check whether the Dataproc client libraries already support this convenience feature.
Not at the moment. As you mentioned, the most convenient way to do this strictly using the Python client libraries would be to use the GCS client first and then point to your job file in GCS.
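For reference, a rough sketch of that pattern with a recent google-cloud-dataproc client; the bucket, cluster, region, project, and class names here are placeholders, not values from the question:
from google.cloud import dataproc_v1, storage

bucket_name = "my-staging-bucket"
local_jar = "my-job.jar"

# 1. Upload the local jar to GCS first.
storage.Client().bucket(bucket_name).blob(local_jar).upload_from_filename(local_jar)

# 2. Reference the uploaded jar by its gs:// URI in the job config.
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)
job = {
    "placement": {"cluster_name": "my-cluster"},
    "spark_job": {
        "main_class": "com.example.Main",
        "jar_file_uris": [f"gs://{bucket_name}/{local_jar}"],
    },
}
operation = job_client.submit_job_as_operation(
    request={"project_id": "my-project", "region": "us-central1", "job": job}
)
print(operation.result().reference.job_id)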

GCP deployment of ai model for NLP text classification

I'm trying to deploy a model on Google Cloud Platform, but I've been running into some issues. I created the bucket and, as specified in the docs, I ran:
gcloud ai-platform local predict --model-dir gs://bucket/ \
--json-instances input.json \
--framework SCIKIT_LEARN
But for some reason it doesn't find the input file in the same bucket as the model. So I followed the instructions in another question. I've tried copying the input.json into the main directory, but for some other reason it's not recognizing the JSON as a JSON file...
In reality the model was created using a library called simpletransformers, which I've tried to install and test, with no success.
What is the best way to proceed?
input.json:
{ "document":{ "type":"PLAIN_TEXT", "content":"Protection plan costs, half of any delivery fee, and any Extras or young driver fee costs are always refunded."},"encodingType":"UTF8"}
As specified in the documentation, this command:
gcloud ai-platform local predict --model-dir local-or-cloud-storage-path-to-model-directory/ \
--json-instances local-path-to-prediction-input.json \
--framework name-of-framework
This is used to test your model with local predictions, and it expects to find your input.json file on your local machine rather than in a GCS bucket. Based on what you've mentioned:
But for some reason it doesn't find the input file in the same bucket as the model
I'm assuming that you're expecting it to read the file from a GCS bucket. But it should actually be a local path; in your case, the command you executed doesn't specify a directory, so it expects to find your input.json file in the same directory where you executed the command. I've just tried it and it worked fine for me.
I'm not sure what you mean by:
I've tried copying the input.json into the main directory, but for some other reason it's not recognizing the JSON as a JSON file...
But I'm assuming that you're referring to a GCS bucket here as well; as mentioned before, your input.json path should be a local path rather than a GCS path.
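If it helps, here is a small local sanity check (a sketch) to confirm the file sits where the command will look for it and parses as JSON:
import json
import os

path = "input.json"  # run this from the same directory as the gcloud command
print("exists:", os.path.exists(path))
with open(path) as f:
    json.load(f)  # raises json.JSONDecodeError if the content isn't valid JSON
print("input.json is valid JSON")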

Run a gsutil command in a Google Cloud Function

I would like to run a gsutil command every x minutes as a cloud function. I tried the following:
# main.py
import os

def sync():
    line = "gsutil -m rsync -r gs://some_bucket/folder gs://other_bucket/other_folder"
    os.system(line)
While the Cloud Function gets triggered, the execution of the line does not work (i.e. the files are not copied from one bucket to another). However, it works fine when I run it locally in PyCharm or from cmd. What is the difference with Cloud Functions?
You can use Cloud Run for this. You have very few changes to make in your code.
Create a container with gsutil and Python installed, for example using gcr.io/google.com/cloudsdktool/cloud-sdk as the base image.
Take care with the service account used when you deploy to Cloud Run; grant it the correct permissions for accessing your bucket.
Let me know if you need more guidance.
Cloud Functions server instances don't have gsutil installed. It works on your local machine because you do have it installed and configured there.
I suggest finding a way to do what you want with the Cloud Storage SDK for Python, or figuring out how to deploy gsutil with your function and how to configure and invoke it from your code, but that might be very difficult.
There's no straightforward option for that.
I think the best approach for Cloud Functions is to use the google-cloud-storage Python library.
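For example, a minimal sketch of that approach for a scheduled (Pub/Sub-triggered) function; note that this copies objects under a prefix rather than doing a true rsync (nothing is deleted), and the bucket and folder names are taken from the question:
from google.cloud import storage

def sync(event, context):
    client = storage.Client()
    src_bucket = client.bucket("some_bucket")
    dst_bucket = client.bucket("other_bucket")

    # Copy every object under folder/ into other_folder/ in the other bucket.
    for blob in client.list_blobs(src_bucket, prefix="folder/"):
        new_name = blob.name.replace("folder/", "other_folder/", 1)
        src_bucket.copy_blob(blob, dst_bucket, new_name)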
