Deploy multiple functions to Google Cloud Functions in the terminal? - python

To deploy a single function for a single trigger event we can follow the instructions as outlined in the documentation on deploying Google Cloud Functions:
gcloud functions deploy NAME --runtime RUNTIME TRIGGER [FLAGS...]
It takes on average 30s-2m to deploy, which is fine and reasonable.
However, I was wondering if it's possible to write a script (e.g. in Python) to deploy multiple functions at once?
e.g. :
# somefile.py
gcloud functions deploy function_1 --runtime RUNTIME TRIGGER [FLAGS...]
gcloud functions deploy function_2 --runtime RUNTIME TRIGGER [FLAGS...]

I really like to use the invoke library for problems like this. In particular, it is well suited for running bash commands (e.g. gcloud) within a Python script without mucking about in subprocess.
In your case, you could make a tasks.py file that looks like
from invoke import task

@task
def deploy_cloud_functions(c):
    c.run('gcloud functions deploy function_1 --runtime RUNTIME TRIGGER [FLAGS...]')
    c.run('gcloud functions deploy function_2 --runtime RUNTIME TRIGGER [FLAGS...]')
and then run it by calling
invoke deploy-cloud-functions
Note that if you name your task deploy_cloud_functions, you have to call it as invoke deploy-cloud-functions (underscores become hyphens). You can list the tasks currently available in your directory with invoke --list.
You can also parallelize it using the threading library (though I haven't tested it within invoke myself). It will definitely make for ugly console output, though. For example:
from threading import Thread
from invoke import task

@task
def deploy_cloud_functions(c):
    Thread(
        target=lambda: c.run('gcloud functions deploy function_1 --runtime RUNTIME TRIGGER [FLAGS...]')
    ).start()
    Thread(
        target=lambda: c.run('gcloud functions deploy function_2 --runtime RUNTIME TRIGGER [FLAGS...]')
    ).start()
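If the interleaved output bothers you, a tidier variant is to use concurrent.futures and capture each command's output, printing it once the deploy finishes. This is a sketch only (like the threading version above, running an invoke Context concurrently is untested, and the gcloud invocations are placeholders):
from concurrent.futures import ThreadPoolExecutor
from invoke import task

# Placeholder commands; fill in your runtime, trigger and flags.
COMMANDS = [
    'gcloud functions deploy function_1 --runtime RUNTIME TRIGGER [FLAGS...]',
    'gcloud functions deploy function_2 --runtime RUNTIME TRIGGER [FLAGS...]',
]

@task
def deploy_cloud_functions(c):
    def run_one(cmd):
        # hide=True captures stdout/stderr instead of streaming it;
        # warn=True keeps one failed deploy from aborting the others.
        return c.run(cmd, hide=True, warn=True)

    with ThreadPoolExecutor(max_workers=len(COMMANDS)) as pool:
        results = list(pool.map(run_one, COMMANDS))

    for result in results:
        print(result.command)
        print(result.stdout)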

If you don't want a Python script that simply shells out to the gcloud command (which is essentially the same as a bash script), you can use the Cloud Functions API Client Library for Python.
What this library does is build and execute HTTP calls to the Cloud Functions API. You can check the Cloud Functions REST reference to see how these calls are structured and how to build them.
For example, I did a quick example to test this API library, to list the functions running in my project:
import httplib2
import pprint
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    "key.json",
    scopes="https://www.googleapis.com/auth/cloud-platform")

http = httplib2.Http()
http = credentials.authorize(http)

service = build("cloudfunctions", "v1", http=http)

operation = service.projects().locations().functions().list(
    parent='projects/wave16-joan/locations/europe-west1').execute()
pprint.pprint(operation)
You will have to install the modules oauth2client, google-api-python-client and httplib2. As you can see, you will need to create a service account in order to execute the REST API calls, which needs the "https://www.googleapis.com/auth/cloud-platform" scope to create the CF. I created a service account with project/editor permissions myself, which I believe are the required roles to create CFs.
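For reference, installing the three modules mentioned above would look something like this (these are their standard PyPI package names):
pip install oauth2client google-api-python-client httplib2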
Finally, to execute this script, you can just do python <script_name>.py
Now, since you want to create multiple functions (see how this API call needs to be structured here), the method to call should instead be the following:
operation = service.projects().locations().functions().create(
    location='projects/wave16-joan/locations/europe-west1',
    body={
        "name": "...",
        "entryPoint": "...",
        "httpsTrigger": {
            "url": "..."
        }
    }
).execute()
You will have to populate the body of the request with some of the parameters listed here. For example, the "name" key should read:
"name":"projects/YOUR_PROJECT/locations/YOUR_PROJECT_LOCATION/functions/FUNCTION_NAME"
As a side note, most of the body parameters listed in the previous documentation are optional, but you will need the name, entryPoint, a source, a trigger, etc.
Of course this requires more work than creating a bash script, but the result is more portable and reliable, and it will allow you to create multiple operations to deploy multiple functions in the same way.
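To tie this back to the original question, a rough sketch of deploying several functions this way could look like the following (the project, location, bucket, runtime and function specs below are made-up placeholders, and service/pprint come from the snippet above):
PARENT = 'projects/YOUR_PROJECT/locations/YOUR_LOCATION'  # placeholder

# Hypothetical specs; each entry becomes one create() call
FUNCTION_SPECS = [
    {
        "name": PARENT + "/functions/function_1",
        "entryPoint": "function_1",
        "runtime": "python37",
        "sourceArchiveUrl": "gs://YOUR_BUCKET/function_1.zip",
        "httpsTrigger": {},
    },
    {
        "name": PARENT + "/functions/function_2",
        "entryPoint": "function_2",
        "runtime": "python37",
        "sourceArchiveUrl": "gs://YOUR_BUCKET/function_2.zip",
        "httpsTrigger": {},
    },
]

for spec in FUNCTION_SPECS:
    operation = service.projects().locations().functions().create(
        location=PARENT,
        body=spec,
    ).execute()
    pprint.pprint(operation)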

Related

Cloud Scheduler invokes Cloud Function more than once during schedule

I currently have a Cloud Function that is executing some asynchronous code. It makes a GET request to an endpoint to retrieve some data and then stores that data in Cloud Storage. I have set up the Cloud Function to be triggered by Cloud Scheduler via HTTP. When I use the test option that Cloud Functions has, everything works fine, but when I set up Cloud Scheduler to invoke the Cloud Function, it gets invoked more than once. I was able to tell by looking at the logs, which show multiple execution IDs and the print statements I have in place. Does anyone know why Cloud Scheduler is invoking it more than once? I have Max Retry Attempts set to 0. There is a portion in my code where I use asyncio's create_task and sleep to make sure the tasks get put into the event loop and to slow down the rate of requests, and I was wondering if this is causing Cloud Scheduler to do something unexpected?
async with aiohttp.ClientSession(headers=headers) as session:
    tasks = []
    for i in range(1, total_pages + 1):
        tasks.append(asyncio.create_task(self.get_tasks(session=session, page=i)))
        await asyncio.sleep(delay_per_request)
For my particular case, when testing natively (using the built-in test option Cloud Functions has), my Cloud Function performed as expected. However, when I set up Cloud Scheduler to trigger the Cloud Function via HTTP, it unexpectedly ran more than once. As @EdoAkse mentioned in the original thread here, my event with Cloud Scheduler was running more than once. My solution was to set up a Pub/Sub topic that triggers the Cloud Function, and have Cloud Scheduler publish to that topic. It is essentially how Google describes it in their docs.
Cloud Scheduler -> Pub/Sub Trigger -> Cloud Function
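For reference, a Pub/Sub-triggered (background) Cloud Function in Python has a different signature than an HTTP one. A minimal sketch of the entry point (the function name and payload handling are placeholders for your fetch-and-store logic):
import base64

def handle_scheduled_job(event, context):  # hypothetical entry point name
    # 'event' carries the Pub/Sub message published by Cloud Scheduler
    data = event.get('data')
    payload = base64.b64decode(data).decode('utf-8') if data else ''
    print('Triggered by messageId {} with payload: {}'.format(context.event_id, payload))
    # ... run your fetch-and-store logic here ...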
I observed a behavior where Cloud Functions were being called twice by Cloud Scheduler. Apparently, despite the functions being located/designated in eu-west1, a duplicate schedule entry was present in us-central1 for each scheduled function. Removing the duplicated jobs in us-central1 resolved my issue.

Invoke AWS SAM local function from another SAM local function

I am trying to create an AWS SAM app with multiple AWS Serverless functions.
The app has one template.yaml file which has resources for two different serverless Lambda functions, for instance "Consumer Lambda" and "Worker Lambda". The consumer is triggered at a rate of every 5 minutes. The consumer uses the boto3 library to trigger the worker Lambda function. This code works when the worker Lambda is deployed on AWS.
But I want to test both functions locally with sam local invoke "Consumer", which should also invoke "Worker" locally.
Here's a screenshot of the YAML file:
I am using PyCharm to run the project. There is an option to run only one function at a time, which then creates only one folder in the build folder.
I have to test whether Consumer can invoke Worker locally in PyCharm before deployment. I think there is some way to do it, but I'm not sure how. I did an extensive search but it didn't yield anything.
Any help is appreciated. Thanks in advance
You can start the lambda invoke endpoint in the following way (official docs):
sam local start-lambda
Now you can point your AWS resource client to port 3001 and trigger the functions locally.
For example, if you are doing this in Python, it can be achieved in the following way with boto3:
import boto3

# Create a Lambda client pointed at the local endpoint
lambda_client = boto3.client('lambda',
                             region_name="<localhost>",
                             endpoint_url="http://127.0.0.1:3001",
                             use_ssl=False,
                             verify=False)

# Invoke the function
lambda_client.invoke(FunctionName=<function_name>,
                     Payload=<lambda_payload>)
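To make that concrete, here is a small self-contained sketch; the region string, function name and payload are hypothetical (use the logical resource name from your template.yaml):
import json
import boto3

lambda_client = boto3.client('lambda',
                             region_name="us-east-1",            # any region string; SAM local ignores it
                             endpoint_url="http://127.0.0.1:3001",
                             use_ssl=False,
                             verify=False)

response = lambda_client.invoke(
    FunctionName="WorkerFunction",                               # hypothetical logical name from template.yaml
    Payload=json.dumps({"job_id": 123}).encode())                # hypothetical payload
print(json.loads(response["Payload"].read()))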

Best practice to orchestrate small Python tasks (mostly executing SQL in BigQuery)

We are using pubsub and cloud functions in GCP to orchestrate our data workflow.
Our workflow is something like :
[workflow_gcp diagram]
pubsub1 and pubsub3 can be triggered at different times (e.g. 1am and 4am). They are triggered daily, from an external source (our ETL tool, Talend).
Our cloud functions basically execute SQL in BigQuery.
This is working well, but we had to manually create an orchestration database to log when functions start and end (to answer the question "did function X execute OK?"). And the orchestration logic is strongly coupled with our business logic, since each cloud function must know which functions have to be executed before it, and which pubsub topic to trigger afterwards.
So we're looking for a solution that separate the orchestration logic and the business logic.
I found that composer (airflow) could be a solution, but :
it can't run Cloud Functions natively (and via the API it's very limited: 16 calls per 100 seconds per project)
we can use BigQuery inside Airflow with the BigQuery operators, but then orchestration and business logic would be strongly coupled again
So what is the best practise in our case?
Thanks for your help
You can use Cloud Composer (Airflow) and still reuse most of your existing set-up.
Firstly, you can keep all your existing Cloud Functions and use HTTP triggers (or others you prefer) to trigger them from Airflow. The only change you will need to make is to implement a Pub/Sub sensor in Airflow, so that Airflow triggers your Cloud Functions (thereby ensuring you control orchestration from end to end of your process).
Your solution will be an Airflow DAG that triggers the Cloud Functions based on the Pub/Sub messages, reports back to Airflow whether the functions were successful and then, if both were successful, triggers the third Cloud Function with an HTTP trigger or similar, just the same.
A final note, which is not immediately intuitive: Airflow is not meant to run the jobs itself, it is meant to orchestrate and manage dependencies. The fact that you use Cloud Functions triggered by Airflow is not an anti-pattern; it is actually a best practice.
In your case, you could absolutely rewrite a few things and use the BigQuery operators, as you don't do any processing, just trigger queries/jobs. But the concept stays true: the best practice is leveraging Airflow to make sure things happen when, and in the order, you need them to, not to process those things itself. (Hope that made sense.)
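To illustrate the idea, here is a rough DAG sketch that just chains HTTP calls to the Cloud Functions. The DAG name, connection IDs and operator import path are assumptions and depend on your Airflow/provider versions; a Pub/Sub sensor or the BigQuery operators could be slotted in the same way:
from datetime import datetime
from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    dag_id="bigquery_workflow",          # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,              # triggered externally, not on a cron
    catchup=False,
) as dag:

    run_function_1 = SimpleHttpOperator(
        task_id="run_function_1",
        http_conn_id="cf_function_1",    # Airflow HTTP connection pointing at the function's URL
        endpoint="",
        method="POST",
    )

    run_function_2 = SimpleHttpOperator(
        task_id="run_function_2",
        http_conn_id="cf_function_2",
        endpoint="",
        method="POST",
    )

    # The orchestration lives here, not in the functions themselves
    run_function_1 >> run_function_2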
As an alternative to Airflow, I would have looked at Argo Workflows -> https://github.com/argoproj/argo
It doesn't have the cost overhead that Composer has, especially for smaller workloads.
I would have:
Created a deployment that reads pubsub messages from the external tool and deployed it to Kubernetes.
Based on the message, executed a workflow. Each step in the workflow could be a cloud function, packaged in Docker.
(I would have replaced the cloud function with a Kubernetes job, which is then triggered by the workflow.)
It is pretty straightforward to package a cloud function with Docker and run it in Kubernetes.
There are prebuilt Docker images with gsutil/bq/gcloud, so you could create bash scripts that use the "bq" command line to execute things inside BigQuery.

Python HTTP Server to activate Code

I am trying to set up a Python script that I can have running all the time and then use an HTTP request to activate an action in the script, so that when I type a URL like this into a web browser:
http://localhost:port/open
The script executes a piece of code.
The idea is that I will run this script on a computer on my network and activate the code remotely from elsewhere on the network.
I know this is possible with other programming languages as I've seen it before, but I can't find any documentation on how to do it in python.
Is there an easy way to do this in Python or do I need to look into other languages?
First, you need to select a web framework. I recommend using Flask, since it is lightweight and really easy to get started with.
We begin by initializing your app and setting a route. your_open_func() (in the code below), which is decorated with the @app.route("/open") decorator, will be triggered and run when you send a request to that particular URL (for example http://127.0.0.1:5000/open).
As Flask's website says: Flask is fun. The very first example from there (with minor modifications) suits your needs:
from flask import Flask

app = Flask(__name__)

@app.route("/open")
def your_open_func():
    # Do your stuff right here.
    return 'ok'  # Remember to return, or a ValueError will be raised.
In order to run your app, app.run() is usually enough, but in your case you want other computers on your network to be able to access the app, so you should call the run() method like so: app.run(host="0.0.0.0").
By passing that parameter you are making the server publicly available.
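Putting both pieces together, a minimal complete script might look like this (port 5000 is Flask's default, and the print call is a placeholder for your actual action); from another machine on the network you would then hit http://<machine-ip>:5000/open:
from flask import Flask

app = Flask(__name__)

@app.route("/open")
def your_open_func():
    print("open was requested")   # placeholder for your actual action
    return 'ok'

if __name__ == "__main__":
    # Listen on all interfaces so other machines on the network can reach it
    app.run(host="0.0.0.0", port=5000)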

Execute lambda function on local device using greengrass

I am trying to learn AWS Greengrass, so I was following this tutorial https://docs.aws.amazon.com/greengrass/latest/developerguide/gg-gs.html which explains step by step how to set up Greengrass on a Raspberry Pi and publish some messages using a lambda function.
A simple lambda function is as follows:
import greengrasssdk
import platform
from threading import Timer
import time

# Creating a greengrass core sdk client
client = greengrasssdk.client('iot-data')

# Retrieving platform information to send from Greengrass Core
my_platform = platform.platform()

def greengrass_hello_world_run():
    if not my_platform:
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core.')
    else:
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core running on platform: {}'.format(my_platform))
    # Asynchronously schedule this function to be run again in 5 seconds
    Timer(5, greengrass_hello_world_run).start()

# Execute the function above
greengrass_hello_world_run()

# This is a dummy handler and will not be invoked
# Instead the code above will be executed in an infinite loop for our example
def function_handler(event, context):
    return
This works, but I am trying to understand it better by having the lambda function do some extra work, for example opening a file and writing to it.
I modified the greengrass_hello_world_run() function as follows:
def greengrass_hello_world_run():
    if not my_platform:
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core.')
    else:
        stdout = "hello from greengrass\n"
        with open('/home/pi/log', 'w') as file:
            for line in stdout:
                file.write(line)
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core running on platform: {}'.format(my_platform))
I expect that upon deploying, the daemon running on my local Pi should create that file in the given directory, because I believe Greengrass Core runs this lambda function on the local device. However, it doesn't create any file, nor does it publish anything, because I believe this code might be breaking. I'm not sure how, though; I tried looking into CloudWatch but I don't see any events or errors being reported.
Any help on this would be really appreciated,
cheers !
A few thoughts on this...
If you turn on local logging in your GG group settings, it will start to write logs locally on your Pi.
The logs are located in: /greengrass/ggc/var/log/system
If you tail the python_runtime.log you can see any errors from the lambda execution.
If you want to access local resources, you will need to create a resource in your GG group definition. You can then give it access to a volume, which is where you can write your file.
You do need to deploy your group once this has been done for the changes to take effect.
I think I found the answer: we have to create the resource in the Lambda environment and also make sure to give the Lambda read and write access to that resource. By default, the Lambda can only access the /tmp folder.
Here is the link to the documentation
https://docs.aws.amazon.com/greengrass/latest/developerguide/access-local-resources.html
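As a quick sanity check before wiring up a local resource, you can write to /tmp, which the Lambda can access by default. A minimal sketch, reusing client and Timer from the question's snippet (the file name is arbitrary):
def greengrass_hello_world_run():
    # /tmp is writable by default inside the Greengrass Lambda container
    with open('/tmp/greengrass_test.log', 'a') as f:
        f.write('hello from greengrass\n')
    client.publish(topic='hello/world', payload='wrote a line to /tmp')
    Timer(5, greengrass_hello_world_run).start()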
