Connect to Google Compute Engine instance to run Python script - python

I'm very new to cloud computing and I don't come from a software engineering background, so excuse me if some things I say are incorrect.
I'm used to working in an IDE like Spyder and I'd like to keep it that way. Lately, my organization has been experimenting with Google Cloud, and what I'm trying to do is run a simple script on the cloud instead of on my computer, using Google Cloud's APIs.
Say I want to run this on the cloud through Spyder:
x=3
y=2
print(f'your result is {x+y}')
I'm guessing I could do something like:
from googleapiclient import discovery
compute = discovery.build('compute', 'v1')
request = compute.instances().start(project=project, zone=zone, instance=instance)
request.execute()
#Do something to connect to instance
x=3
y=2
print(f'your result is {x+y}')
Is there any way to do this, or to tell Python to run script.py? Thanks, and please tell me if I'm not being clear.

You needn't apologize; everyone is new to cloud computing at some point.
I encourage you to read around on cloud computing to get more of a feel for what it is and how it compares with your current experience.
The code you included won't work as-is.
There are two parts to working with Compute Engine, which is one of several compute services in Google Cloud Platform: provisioning an instance and then interacting with it.
Fundamentally, interacting with Compute Engine instances is similar to how you'd interact with your laptop. To run the Python program, you'd either start Python's REPL or create a script and then run it through the Python interpreter. This is also how it would work on a Compute Engine instance.
You can do this on Linux in a single line:
python -c "x=2; y=3; print(x+y)"
But, first, you have to tell Compute Engine to create an instance for you. You may do this using the Google Cloud Console (http://console.cloud.google.com), the Google Cloud SDK (aka "gcloud"), or e.g. Google's Python library for Compute Engine (this is what your code does). Regardless of which of these approaches you use, all of them ultimately make REST calls against Google Cloud to e.g. provision an instance:
from googleapiclient import discovery
compute = discovery.build('compute', 'v1')
request = compute.instances().start(project=PROJECT, zone=ZONE, instance=INSTANCE)
request.execute()
#Do something to connect to instance
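One thing worth noting: instances().start() starts an instance that already exists but is stopped. As a rough sketch (not part of the original snippet; the machine type and image below are just illustrative placeholders, as are PROJECT, ZONE and INSTANCE), creating a brand-new instance with the same Python client looks roughly like this:
from googleapiclient import discovery

compute = discovery.build('compute', 'v1')

# Illustrative configuration: a small Debian VM with an external IP.
config = {
    'name': INSTANCE,
    'machineType': f'zones/{ZONE}/machineTypes/e2-micro',
    'disks': [{
        'boot': True,
        'autoDelete': True,
        'initializeParams': {
            'sourceImage': 'projects/debian-cloud/global/images/family/debian-11'
        }
    }],
    'networkInterfaces': [{
        'network': 'global/networks/default',
        'accessConfigs': [{'type': 'ONE_TO_ONE_NAT', 'name': 'External NAT'}]
    }]
}

# insert() returns a long-running operation describing the provisioning work.
operation = compute.instances().insert(project=PROJECT, zone=ZONE, body=config).execute()
print(f"Provisioning started: {operation['name']}")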
Your example ends with "#Do something to connect to instance", and this marks the transition between provisioning an instance and interacting with it. An alternative to the Python code above is Google's command-line tool, gcloud, e.g.:
gcloud compute instances create ${INSTANCE} \
--project=${PROJECT} \
--zone=${ZONE}
gcloud also provides a convenience command that wraps ssh and takes care of authentication for you:
gcloud compute ssh ${INSTANCE} \
--project=${PROJECT} \
--zone=${ZONE} \
--command='python -c "x=2; y=3; print(x+y)"'
NB This command ssh's into the Compute Engine instance and then runs your Python program.
This is not the best way to achieve this, but I hope it shows you one way that you could.
As you learn about Google Cloud Platform, you'll learn that there are other compute services. These other compute services provide a higher level of abstraction: instead of provisioning a virtual machine, you can deploy code directly to e.g. a Python runtime. Google App Engine and Google Cloud Functions both provide a way to deploy your program directly to a compute service without provisioning instances. Because these services operate at a higher level, you can write, test and even deploy code from within an IDE too.
Google Cloud Platform provides a myriad of compute services depending on your requirements, accompanied by storage, machine learning, analytics, internet-of-things and developer tools. It can be overwhelming, but you should start with the basics (follow some "hello world" tutorials) and take it from there.
HTH!

Related

On Premise MLOps Pipeline Stack

My motive is to build an MLOps pipeline which is 100% independent from cloud services like AWS, GCP and Azure. I have a project for a client in a production factory and would like to build a camera-based object-tracking ML service for them. I want to build this pipeline on my own server (or an on-premise computer). I am really confused about what stack I should use; I keep ending up with a cloud-component-based solution. It would be great to get some advice on what components I can use, preferably open source.
Assuming your main objective is to build a 100% cloud-free MLOps pipeline, you can do that with mostly open-source tech. All of the following can be installed on-prem, without cloud services.
For Training: You can use whatever you want. I'd recommend Pytorch because it plays nicer with some of the following suggestions, but Tensorflow is also a popular choice.
For CI/CD: if this is going to be on-prem and you are going to retrain the model with production data, or need to trigger updates to your deployment with each code update, you can use Jenkins (open source) or CircleCI (commercial).
For Model Packaging: Chassis (open source) is the only project I am aware of for generically turning AI/ML model files into something useful that can be run on your intended hardware. It basically takes an AI/ML model file as input and creates a Docker image as its output. It's open source and supports Intel, ARM, CPU, and GPU. The website is here: http://www.chassis.ml and the git repo is here: https://github.com/modzy/chassis
For Deployment: Chassis model containers are automatically built with internal gRPC servers that can be deployed locally as docker containers. If you just want to stream a single source of data through them, the SDK has methods for doing that. If you want something that accepts multiple streams or auto scales to available resources on infrastructure you'll need a Kubernetes cluster with a deployment solution like Modzy or KServe. Chassis containers work out of the box with either.
KServe (https://github.com/kserve/kserve) is free, but basically just gives you a centralized processing platform hosting a bunch of copies of your running model. It doesn't allow later triage of the model's processing history.
Modzy (https://www.modzy.com/) is commercial, but also adds in all the RBAC, job history preservation, auditing, etc. Modzy also has an edge deployment feature if you want to manage your models centrally, but run them in a distributed manner on the camera hardware instead of on a centralized server.
As per your requirement for an on-prem solution, you may go ahead with Kubeflow.
Also use the following:
default storage class: nfs-provisioner
on-prem load balancing: MetalLB

Can I check if a script is running inside a Compute Engine or in a local environment?

I just wanted to know if there is a way to check whether a Python script is running on a Compute Engine instance or in a local environment.
I want to check this in order to know how to authenticate. For example, when a script runs on a Compute Engine instance and I want to initialize a BigQuery client, I do not need to authenticate, but when running a script locally I need to authenticate using a service account JSON file.
If I knew whether a script is running locally or on a Compute Engine instance, I would be able to initialize Google services accordingly.
I could put the initialization into a try/except statement, but maybe there is another way?
Any help is appreciated.
If I understand your question correctly, I think a better solution is the one Google provides, called Application Default Credentials. See Best practices to securely auth apps in Google Cloud (thanks #sethvargo) and Application Default Credentials.
Using this mechanism, authentication becomes consistent regardless of where you run your app (on- or off-GCP). See finding credentials automatically
When you run off-GCP, you set GOOGLE_APPLICATION_CREDENTIALS to point to the service account key file. When you run on-GCP (and, to be clear, you are still authenticating; it's just transparent), you don't set the environment variable, because the library obtains the credentials of e.g. the Compute Engine instance's service account for you.
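As a minimal illustration (my sketch, not from the original answer), the same client code then works both on and off GCP; the only difference is whether the environment variable is set:
from google.cloud import bigquery

# No explicit credentials: the library resolves them automatically, either from
# GOOGLE_APPLICATION_CREDENTIALS (off-GCP) or from the instance's service
# account (on a Compute Engine instance).
client = bigquery.Client()
rows = client.query('SELECT 1 AS x').result()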
So I read a bit on the Google Cloud authentication and came up with this solution:
import google.auth
from google.auth.exceptions import DefaultCredentialsError
from google.cloud import storage
from google.oauth2 import service_account

try:
    # On GCP (e.g. Compute Engine) this picks up the default credentials.
    credentials, project = google.auth.default()
except DefaultCredentialsError:
    # Running locally: fall back to an explicit service account key file.
    credentials = service_account.Credentials.from_service_account_file('/path/to/service_account_json_file.json')
client = storage.Client(credentials=credentials)
This tries to retrieve the default Google Cloud credentials (available in environments such as Compute Engine), and if that fails it falls back to authenticating with a service account JSON file.
It might not be the best solution but it works and I hope it will help someone else too.

Alternative to Cloud Functions for Long Duration Tasks on Google

I have been using Google Cloud Functions (GCF) to setup a serverless environment. This works fine and it covers most of the required functionality that I need.
However, for one specific module, which extracts data from FTP servers, parsing the files from a provider takes longer than 540s. For this reason, the task gets timed out when deployed as a Cloud Function.
In addition, some FTP servers require whitelisting the IP address that makes these requests. With Cloud Functions, this is not possible unless you somehow reserve a static address or a range.
I am therefore looking for an alternative solution to execute a Python script in the cloud on the Google platform. The requirements are:
It needs to support Python 3.7
It has to have the possibility to associate a static IP address to it
One execution should be able to take longer than 540s
Ideally, it should be possible to easily deploy the script (as it is the case with GCF)
What is the best option out there for these kinds of needs?
The notion of a Cloud Function is primarily that of a microservice ... something that runs for a relatively short period of time. In your story, we seem to have actions that can run for an extended period of time. This would seem to lend itself to the notion of running some form of compute engine. The two that immediately come to mind are Google Compute Engine (CE) and Google Kubernetes Engine (GKE).
Let us think about the Compute Engine. Think of this as a Linux VM where you have 100% control over it. This needn't be a heavyweight thing ... Google provides micro compute engines which are pretty darn tiny. You can have one or more of these, including the ability to dynamically scale out the number of instances if load on the set becomes too high.
On your compute engine, you can create any environment you wish ... including installing a Python environment and running Flask (or other) to process incoming requests. You can associate your compute engine with a static IP address, or associate a static IP address with a load balancer front-ending your engines.
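To make that concrete, here is a minimal sketch (my own illustration, not the answer's code) of the kind of Flask service you could run on such an instance; the route, port and handler are placeholders:
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/process', methods=['POST'])
def process():
    payload = request.get_json(silent=True) or {}
    # A long-running task (e.g. parsing files pulled from an FTP server)
    # could run here without a 540s limit, since you control the VM.
    return jsonify(status='ok', received=payload)

if __name__ == '__main__':
    # Listen on all interfaces so a load balancer / static IP can reach it.
    app.run(host='0.0.0.0', port=8080)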
Here is how I download files from FTP with Google Cloud Functions to Google Cloud Storage. It takes less than 30 secs (depending on the file size).
#import libraries
from google.cloud import storage
import wget

def importFile(request):
    #set up the storage client
    client = storage.Client()
    #get the bucket (name without gs://)
    bucket = client.get_bucket('BUCKET-NAME')
    blob = bucket.blob('file-name.csv')
    #see if the file already exists
    if not blob.exists():
        try:
            #copy the file to Google Storage (credentials in the URL for non-public FTP files)
            link = 'ftp://account:password@ftp.domain.com/folder/file.csv'
            #save the downloaded file in the /tmp folder of Cloud Functions
            ftpfile = wget.download(link, out='/tmp/destination-file-name.csv')
            blob.upload_from_filename(ftpfile)
            print('Copied file to Google Storage!')
        #print an error if the download or upload fails
        except BaseException as error:
            print('An exception occurred: {}'.format(error))
    #report if the file already exists in Google Storage
    else:
        print('File already exists in Google Storage')
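For reference (this is my assumption, not something the answer spells out): a function like this is typically deployed with the gcloud CLI, with google-cloud-storage and wget listed in a requirements.txt next to main.py, e.g.:
gcloud functions deploy importFile \
  --runtime python37 \
  --trigger-http \
  --timeout 540s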

setting up a python http function in firestore

I have an app that is meant to integrate with third-party apps. These apps should be able to trigger a function when data changes.
The way I was envisioning this, I would use a node function to safely prepare data for the third parties, and get the URL to call from the app's configuration in Firestore. I would call that URL from the node function and wait for it to return, updating results as necessary (actually, triggering a push notification). These third-party functions would tend to be Python functions, so my demo should be in Python.
I have the initial node function and Firestore set up, so that I am currently triggering an ECONNREFUSED -- because I don't know how to set up the third-party function.
Let's say this is the function I need to trigger:
def hello_world(request):
    request_json = request.get_json()
    if request_json and 'name' in request_json:
        name = request_json['name']
    else:
        name = 'World'
    return 'Hello, {}!\n'.format(name)
Do I need to set up a separate gcloud account to host this function, or can I include it in my firestore functions? If so, how do I deploy this to firestore? Typically with my node functions, I am running firebase deploy and it automagically finds my functions from my index.js file.
If you're asking whether Cloud Functions that are triggered by Cloud Firestore can co-exist in a project with Cloud Functions that are triggered by HTTP(S) requests, then the answer is "yes they can". There is no need to set up a separate (Firebase or Cloud) project for each function type.
However: when you deploy your Cloud Functions through the Firebase CLI with firebase deploy, it will remove any functions it finds in the project that are not in the code being deployed. If you have functions in both Python and Node.js, there is never a single codebase that contains both, so a blanket deploy would always delete some of your functions. So in that case, you should use the granular deploy option of the Firebase CLI.
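For example, a granular deploy looks something like the first line below (the function name is illustrative); deploying the Python function itself would then happen separately, e.g. via the gcloud CLI, though that part is my assumption rather than something stated above:
firebase deploy --only functions:myNodeFunction
gcloud functions deploy hello_world --runtime python37 --trigger-http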

Discovering peer instances in Azure Virtual Machine Scale Set

Problem: Given N instances launched as part of a VMSS, I would like my application code on each Azure instance to discover the IP addresses of the other peer instances. How do I do this?
The overall intent is to cluster the instances so, as to provide active passive HA or keep the configuration in sync.
Seems like there is some support for REST API based querying : https://learn.microsoft.com/en-us/rest/api/virtualmachinescalesets/
I would like to know about other ways to do it, e.g. the Python SDK or the instance metadata URL.
The REST API you mentioned has a Python SDK, the "azure-mgmt-compute" client:
https://learn.microsoft.com/python/api/azure.mgmt.compute.compute.computemanagementclient
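As a rough sketch (my own illustration; it uses the companion azure-mgmt-network package plus azure-identity, and treats the subscription ID, resource group and scale set names as placeholders), listing the private IPs of the peer instances might look like this:
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = 'YOUR-SUBSCRIPTION-ID'
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# Enumerate the NICs of every VM in the scale set and print their private IPs.
nics = client.network_interfaces.list_virtual_machine_scale_set_network_interfaces(
    'my-resource-group', 'my-vmss')
for nic in nics:
    for ip_config in nic.ip_configurations:
        print(ip_config.private_ip_address)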
One way to do this would be to use instance metadata. Right now instance metadata only shows information about the VM it's running on, e.g.
curl -H Metadata:true "http://169.254.169.254/metadata/instance/compute?api-version=2017-03-01"
{"compute":
{"location":"westcentralus","name":"imdsvmss_0","offer":"UbuntuServer","osType":"Linux","platformFaultDomain":"0","platformUpdateDomain":"0",
"publisher":"Canonical","sku":"16.04-LTS","version":"16.04.201703300","vmId":"e850e4fa-0fcf-423b-9aed-6095228c0bfc","vmSize":"Standard_D1_V2"},
"network":{"interface":[{"ipv4":{"ipaddress":[{"ipaddress":"10.0.0.4","publicip":"52.161.25.104"}],"subnet":[{"address":"10.0.0.0","dnsservers":[],"prefix":"24"}]},
"ipv6":{"ipaddress":[]},"mac":"000D3AF8BECE"}]}}
You could do something like have each VM send the info to a listener on VM#0, or to an external service, or you could combine this with Azure Files and have each VM output to a common share. There's an Azure template proof of concept here which outputs information from each VM to an Azure File share: https://github.com/Azure/azure-quickstart-templates/tree/master/201-vmss-azure-files-linux - every VM has a mountpoint which contains info written by every VM.
