Hosting webhook target GCP Cloud Function - python

I am very new to GCP. My plan is to create a webhook target on GCP that listens for events from a third-party application and kicks off scripts that download files from the webhook event and push them to JIRA/GitHub. During my research I read a lot about Cloud Functions, but there are also Cloud Run, App Engine and Pub/Sub. Any suggestions on which path to follow?
Thanks!

There are use cases in which Cloud Functions, Cloud Run and App Engine can be used interchangeably (not Pub/Sub, as it is a messaging service). There are, however, use cases that some of them do not fit well.
Cloud Functions must be triggered, and each execution is (or should be) isolated, which implies you cannot expect one to keep a connection to your third party alive. They also have a limited time per execution. They tend to be atomic, so if you have complex logic spanning several of them you must be careful with your design; otherwise you will end up with a distributed solution that is very difficult to manage.
App Engine is an application you deploy and it is permanently active, therefore you can maintain a connection to your third-party app.
Cloud Run is somewhere in the middle: it is triggered when it is used, but it can share a context, and different requests benefit from that (keeping connections alive temporarily or caching, for instance). It also gives you more freedom in the technologies you can use.
Pub/Sub, as mentioned, is a service where you can send information (fire and forget); it allows you to have one or more listeners on the other side, which may be your Cloud Function, App Engine or Cloud Run service, to process the information and proceed.
BTW, consider using Cloud Storage for your files, especially if you expect them to persist between different service calls.
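To make the Cloud Functions option more concrete, here is a minimal sketch of an HTTP webhook entry point, assuming a Python HTTP Cloud Function. The payload shape ({"id": ..., "files": [{"name": ...}]}), the bucket name and the helper names are hypothetical, and the actual downloads and JIRA/GitHub calls are left out. It only validates the event and decides where each file would be stored in Cloud Storage.

```python
import json

BUCKET = "my-webhook-files"  # hypothetical bucket name


def gcs_destinations(event: dict, bucket: str = BUCKET) -> list:
    """Cloud Storage URIs where each file of the event could be stored."""
    return [
        f"gs://{bucket}/{event['id']}/{item['name']}"
        for item in event.get("files", [])
    ]


def process_event(event):
    """Validate the (assumed) payload shape and plan where its files go."""
    if not isinstance(event, dict) or "id" not in event:
        # Reject payloads that do not match the assumed shape
        return ("invalid payload", 400)
    targets = gcs_destinations(event)
    # Downloading the files and pushing to JIRA/GitHub are not part of this sketch
    return (json.dumps({"event": event["id"], "targets": targets}), 200)


def handle_webhook(request):
    """HTTP entry point; in Python Cloud Functions, `request` is a flask.Request."""
    return process_event(request.get_json(silent=True))
```

Deployed with handle_webhook as the entry point, it answers 400 for malformed events. Keeping the handler this thin and doing the slow download/push work elsewhere (for example in a function triggered by a Pub/Sub message) keeps each execution short, which matters given the per-execution time limit.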

Related

Run & scale simple python scripts on Google Cloud Platform

I have a simple Python script and I would like to run thousands of instances of it on GCP (at the same time). This script is triggered by the $Universe scheduler, with something like "python main.py --date '2022_01'".
What architecture and technology should I use to achieve this?
PS: I cannot drop $Universe but I'm not against suggestions to use another technologies.
My solution:
I already have a $Universe server running all the time.
Create a Pub/Sub topic
Create a permanent Compute Engine instance that listens to Pub/Sub all the time
$Universe sends thousands of events to Pub/Sub
The Compute Engine instance triggers the creation of a Python Docker image on another Compute Engine instance
Scale the creation of the Docker images (I don't know how to do it)
Is it a good architecture?
How to scale this kind of process?
Thank you :)
It might be very difficult to discuss architecture and design questions, as they usually depend heavily on the context, scope, functional and non-functional requirements, cost, available skills and knowledge and so on...
Personally, I would prefer to stay with an entirely serverless approach if possible.
For example, use Cloud Scheduler (serverless cron jobs), which sends messages to a Pub/Sub topic; on the other side of the topic is a Cloud Function (or something else) that is triggered by the message.
Whether it should be a Cloud Function or something else, and what it should do and how, depends on your case.
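A minimal sketch of the function side of the Cloud Scheduler -> Pub/Sub -> Cloud Function chain, assuming a 1st-gen Python background function (signature event, context) and a scheduler message body such as {"date": "2022_01"}. The message format and run_job are hypothetical stand-ins for what "main.py --date" does today.

```python
import base64
import json


def run_job(date: str) -> str:
    """Hypothetical stand-in for the body of main.py (what --date drives today)."""
    return f"processed {date}"


def on_message(event, context=None):
    """Pub/Sub-triggered background function.

    event["data"] holds the base64-encoded message body; here it is assumed
    to be JSON such as {"date": "2022_01"}.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return run_job(payload["date"])
```

Each message becomes one function execution, so thousands of messages fan out into concurrent executions without any VM to manage.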
As I understand it, you will have a lot of simultaneous calls to custom Python code triggered by an orchestrator ($Universe), and you want it on GCP.
Like @al-dann, I would take a serverless approach in order to reduce cost.
As I also understand it, Pub/Sub does not seem necessary: you could easily trigger the function from any HTTP call and avoid Pub/Sub altogether.
Pub/Sub is necessary only to get some delivery guarantee (at-least-once processing), but you can get the same behaviour if $Universe validates the HTTP response for every call (checking the response code and body, and retrying if they do not match the expected result).
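A minimal sketch of that caller-side validation, assuming the function answers 200 with the body "ok" on success (a hypothetical contract). send stands for whatever HTTP call the orchestrator makes, and the backoff values are only illustrative.

```python
import time
from typing import Callable, Tuple

# (status_code, body) as returned by whatever HTTP client the caller uses
Response = Tuple[int, str]


def call_with_retry(send: Callable[[], Response], expected_body: str = "ok",
                    attempts: int = 5, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry until the function answers 200 with the expected body."""
    last: Response = (0, "")
    for attempt in range(attempts):
        try:
            last = send()
            if last[0] == 200 and last[1] == expected_body:
                return last
        except OSError:
            pass  # network errors (urllib's URLError is an OSError) count as failures
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError(f"gave up after {attempts} attempts, last response: {last}")
```

Because a retry can re-run a call whose response was merely lost, this gives at-least-once (not exactly-once) processing, so the function should tolerate being invoked twice for the same date.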
If you want exactly-once processing, you will need more tooling; you are then close to event streaming (which could be a good fit for your use case, as I understand it). In that case, in a fully GCP setup, I would go with Pub/Sub and Dataflow, which can guarantee exactly-once processing, or with Kafka and Kafka Streams or Flink.
If at-least-once processing is fine for you, I would go with the HTTP version, which I think will be simpler to maintain. You have 3 serverless options in that case:
App Engine standard: scales to 0 and you pay for CPU usage; it can be more affordable than the Cloud Function option below if the requests are concentrated in a short period (a few hours per day), since the same hardware will process many requests.
Cloud Function: you pay per request (+ CPU, memory, network, ...) and don't have to think about anything other than the code, but the code is tied to a proprietary runtime.
Cloud Run: my preferred option, since it has the same pricing model as Cloud Functions but you gain portability: the application is a simple Docker image that you can easily move (to Kubernetes, Compute Engine, ...), changing the execution engine depending on cost (if the load changes between the study and the real world).
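To illustrate the portability argument, a minimal sketch of the same job exposed as a plain WSGI app using only the standard library. It assumes the date arrives as a query parameter (?date=2022_01), and run_job is a hypothetical stand-in for the script; the one Cloud Run specific detail is that the listening port is passed in the PORT environment variable.

```python
import json
import os
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def run_job(date: str) -> dict:
    """Hypothetical stand-in for the existing script's logic."""
    return {"date": date, "status": "done"}


def app(environ, start_response):
    """Plain WSGI app: the same code can run on Cloud Run, GKE or a VM."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    date = params.get("date", [None])[0]
    if date is None:
        status, body = "400 Bad Request", {"error": "missing date"}
    else:
        status, body = "200 OK", run_job(date)
    data = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def serve() -> None:
    """Container entry point: Cloud Run provides the port in $PORT."""
    port = int(os.environ.get("PORT", "8080"))
    with make_server("0.0.0.0", port, app) as httpd:
        httpd.serve_forever()
```

Packaged in a Docker image whose command calls serve(), the same image could run on Cloud Run, GKE or a plain Compute Engine VM; in production you would likely run the app under a proper WSGI server such as gunicorn rather than wsgiref.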

Deliver a message from Google Cloud Functions to a virtual machine

Currently I automatically start a VM after running a cloud function via this code:
import googleapiclient.discovery

def start_vm(event, context):
    # Background functions receive (event, context), in that order
    compute = googleapiclient.discovery.build('compute', 'v1')
    result = compute.instances().start(project='PROJECT', zone='ZONE', instance='NAME').execute()
Now I am looking for a way to deliver a message or a parameter at the same time, so that after the VM starts, different code runs depending on that message/parameter. Does anyone know how to achieve this?
I appreciate any help.
Thank you.
You can use instance metadata: the Cloud Function sets a custom metadata value on the instance and then starts the VM.
In the startup script, you read that value from the metadata server and use it to decide what to run.
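A minimal sketch of the metadata approach, using custom instance metadata, which the caller can set through the Compute Engine API (guest attributes, by contrast, are written from inside the VM). The key name "task" and the task names are hypothetical. metadata_body builds the request body for compute.instances().setMetadata, and read_attribute runs inside the VM against the metadata server.

```python
import urllib.request

# Metadata server endpoint for custom instance metadata (reachable only from inside the VM)
ATTR_URL = "http://metadata.google.internal/computeMetadata/v1/instance/attributes/{key}"


def metadata_body(current: dict, key: str, value: str) -> dict:
    """Body for compute.instances().setMetadata(project=..., zone=..., instance=..., body=...).

    `current` is the "metadata" field of compute.instances().get(...).execute();
    its fingerprint must be sent back, otherwise the update is rejected.
    """
    items = [item for item in current.get("items", []) if item["key"] != key]
    items.append({"key": key, "value": value})
    return {"fingerprint": current["fingerprint"], "items": items}


def read_attribute(key: str, timeout: float = 5.0) -> str:
    """Inside the VM (e.g. from the startup script): read one metadata value."""
    req = urllib.request.Request(ATTR_URL.format(key=key),
                                 headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Hypothetical task names: map the value the Cloud Function set to the code to run
TASKS = {
    "report": lambda: "running report job",
    "cleanup": lambda: "running cleanup job",
}


def dispatch(task: str) -> str:
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}")
    return TASKS[task]()
```

In the Cloud Function, the sequence would be: fetch the instance with compute.instances().get(...), pass its "metadata" field to metadata_body(...), call compute.instances().setMetadata(..., body=...).execute(), then start the VM as in the question's code. setMetadata returns a long-running operation, so waiting for it to finish before starting the VM avoids a race. The startup script then only needs something like dispatch(read_attribute("task")).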
The other solution is to start a web server in the VM and then POST a request to it.
This solution is better if you have several tasks to perform on the VM. But take care of security if you expose a web server: expose it only internally and use a Serverless VPC Access connector on your Cloud Function to reach your VM.

Best practice to orchestrate small Python tasks (mostly executing SQL in BigQuery)

We are using pubsub and cloud functions in GCP to orchestrate our data workflow.
Our workflow is something like:
[Diagram: workflow_gcp]
pubsub1 and pubsub3 can be triggered at different times (e.g. 1am and 4am). They are triggered daily by an external source (our ETL, Talend).
Our cloud functions basically execute SQL in BigQuery.
This is working well, but we had to manually create an orchestration database to log when functions start and end (to answer the question "did function X execute OK?"). And the orchestration logic is strongly coupled with our business logic, since each cloud function must know which functions have to be executed before it and which Pub/Sub topic to trigger after it.
So we're looking for a solution that separates the orchestration logic from the business logic.
I found that composer (airflow) could be a solution, but :
it can't run Cloud Functions natively (and through the API it's very limited: 16 calls per 100 seconds per project)
we can use BigQuery inside Airflow with the BigQuery operators, but then orchestration and business logic would be strongly coupled again
So what is the best practice in our case?
Thanks for your help
You can use Cloud Composer (Airflow) and still reuse most of your existing setup.
Firstly, you can keep all your existing Cloud Functions and use HTTP triggers (or others you prefer) to trigger them from Airflow. The only change you will need to make is to implement a Pub/Sub sensor in Airflow so that it triggers your Cloud Functions (ensuring you control orchestration from end to end of your process).
Your solution will be an Airflow DAG that triggers the Cloud Functions based on the Pub/Sub messages, reports back to Airflow whether the functions were successful and then, if both were successful, triggers the third Cloud Function with an HTTP trigger or similar, just the same.
A final note, which is not immediately intuitive: Airflow is not meant to run the jobs itself, it is meant to orchestrate them and manage dependencies. Using Cloud Functions triggered by Airflow is not an anti-pattern; it is actually a best practice.
In your case, you could 100% rewrite a few things and use the BigQuery operators, as you don't do any processing, just trigger queries/jobs, but the concept stays true: the best practice is to leverage Airflow to make sure things happen when and in the order you need, not to process those things itself. (I hope that makes sense.)
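A toy illustration (plain Python, not Airflow) of the separation being described: the dependency map is the orchestration logic, and the business functions know nothing about each other. The task names are hypothetical and follow the question's diagram, where a third function waits for the other two; in an Airflow DAG the same dependency would be declared as [t1, t2] >> t3. Requires Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Orchestration logic only: which task waits for which (no business code here)
DEPENDENCIES: Dict[str, set] = {
    "function1": set(),
    "function2": set(),
    "function3": {"function1", "function2"},
}


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 deps: Dict[str, set] = DEPENDENCIES) -> List[str]:
    """Run tasks in dependency order and return the order used."""
    order = []
    for name in TopologicalSorter(deps).static_order():
        # An exception stops the whole run, so no later task
        # (including anything downstream of the failed one) starts
        tasks[name]()
        order.append(name)
    return order
```

The business functions (here the callables in tasks) could each be a Cloud Function call or a BigQuery job; changing the order only means editing DEPENDENCIES, which is the decoupling the question asks for.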
As an alternative to Airflow, I would have looked at Argo Workflows -> https://github.com/argoproj/argo
It doesn't have the cost overhead that Composer has, especially for smaller workloads.
I would have:
Created a deployment that reads Pub/Sub messages from the external tool and deployed it to Kubernetes.
Based on the message, executed a workflow. Each step in the workflow could be a Cloud Function packaged in Docker.
(I would have replaced the Cloud Function with a Kubernetes job, which is then triggered by the workflow.)
It is pretty straightforward to package a Cloud Function with Docker and run it in Kubernetes.
There are prebuilt Docker images with gsutil/bq/gcloud, so you could create bash scripts that use the "bq" command-line tool to execute stuff inside BigQuery.
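A minimal sketch of such a workflow step, written in Python rather than bash, assuming the container image already has the bq CLI on its PATH (for example a Cloud SDK image). The helper names are hypothetical; the flag --use_legacy_sql=false makes bq use standard SQL.

```python
import subprocess
from typing import List


def bq_query_argv(sql: str) -> List[str]:
    """Build a `bq query` command line using standard SQL."""
    return ["bq", "query", "--use_legacy_sql=false", sql]


def run_query(sql: str) -> str:
    """Run the query with the bq CLI and return its stdout.

    check=True raises CalledProcessError on a non-zero exit, so the step
    exits with an error and the Kubernetes job (and workflow) sees it failed.
    """
    result = subprocess.run(bq_query_argv(sql), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Passing the arguments as a list (no shell) also avoids quoting problems when the SQL contains quotes or backticks.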

Azure functions: Can I implement my architecture and how do I minimize cost?

I am interested in implementing a compute service in the cloud for an application I'm working on. The idea is that there are 3 modules in the service. A compute manager receives requests (with input data) and triggers Azure Function computes (the computes are the 2nd 'module'). Both modules share the same blob storage for the scripts to be run and for the input/output data (JSON) of the compute.
I want to draw up a basic diagram but need to understand a few things first. Is what I described above possible, or must Azure Functions have their own separate storage? Can Azure Functions run concurrent executions of the same script with different data?
I'm new to Azure, so what I've learned about Azure Functions so far hasn't answered my questions. I'm also unsure how to minimise cost. The functions won't run often.
I hope someone could shed some light on this for me :)
Thanks
In fact, Azure Functions supports many kinds of triggers, for example HTTP triggers, Storage triggers, or Service Bus triggers.
So I think you can do without your compute manager if one of the built-in triggers meets your requirements.
At the same time, all functions can share the same storage account. You just need to use the correct storage account connection string.
And finally, as your functions will not run often, I suggest you use the Azure Functions Consumption plan. With the Consumption plan, instances of the Azure Functions host are dynamically added and removed based on the number of incoming events.
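On the concurrency question: the same function can run concurrently with different data as long as each execution reads and writes its own blobs. A minimal sketch of one possible naming scheme, where the container name and path layout are hypothetical: the compute manager (or whatever triggers the function) creates a job id, writes the input under it, and passes the id in the trigger; the function writes its output next to it.

```python
import uuid
from typing import Dict, Optional

CONTAINER = "compute"  # hypothetical blob container shared by both modules


def job_paths(job_id: Optional[str] = None) -> Dict[str, str]:
    """Blob names for one compute run, keyed by a unique job id.

    Concurrent executions of the same script each get their own input and
    output blobs, so they never overwrite each other's data.
    """
    job_id = job_id or uuid.uuid4().hex
    return {
        "job_id": job_id,
        "input": f"{CONTAINER}/jobs/{job_id}/input.json",
        "output": f"{CONTAINER}/jobs/{job_id}/output.json",
    }
```

The shared scripts can stay at a fixed location in the same storage account; only the per-run data needs unique names.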

Azure infrastructure for a Python script triggered by a http request

I'm a bit lost in the jungle of documentation, offers, and services. I'm wondering what the infrastructure should look like, and it would be very helpful to get a nudge in the right direction.
We have a Python script with PyTorch that runs a prediction. The script has to be triggered by an HTTP request. Preferably, the samples to run the prediction on should also come from the same requester. It has to return the prediction as fast as possible.
What is the best / easiest / fastest way of doing this?
We have the script laying in a Container Registry for now. Can we use it? Azure Kubernetes Service? Azure Container Instances (is this fast enough)?
And about the trigger, should we use Azure function, or logic app?
Thank you!
Azure Functions V2 has just launched a private preview for writing Functions using Python. You can find some instructions for how to play around with it here. This would probably be one of the most simple ways to execute this script with an HTTP request. Note that since it is in private preview, I would hesitate to recommend using it in a production scenario.
Another caveat with Azure Functions is that there will be a cold start whenever a new instance of your function application is created. This should be on the order of ~2-4 seconds, and should only happen on the first request after the application has not seen much traffic for a while, or when a new instance has been created to scale up your application to receive more traffic. You can avoid this cold start by running your function on a dedicated App Service plan, but at that point you are losing a lot of the benefits of Azure Functions.
