how to keep a cloud function running until an event occurs? - python

My Google Cloud Function uses a weather API to return the weather for any input location. It is currently behind an HTTP trigger. Instead of calling the HTTP trigger manually to check whether the temperature is below 0°C, how can I make the cloud function notify me whenever the weather drops below that threshold?

Cloud Functions are for relatively short-lived processes that are spun up in response to a supported event. Unless your weather API is a supported Cloud Functions event type, there is no way to use Cloud Functions to listen on the API until the temperature drops below zero.
The closest you can get is to run a Cloud Function periodically, or to use a Cloud Scheduler job to periodically trigger a Cloud Function that calls the weather API, checks the temperature, and performs whatever logic you want in response. And if the weather API supports calling webhooks on temperature changes, you could also point those webhooks at a Cloud Function.
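For illustration, here is a minimal sketch of the periodic-check approach: an HTTP-triggered function that a Cloud Scheduler job could call on a cron schedule. The weather API URL, the response field names, and the notify() helper are placeholders, not a specific API.

# Sketch of an HTTP-triggered function that Cloud Scheduler can call on a schedule.
# WEATHER_API_URL, the "temp_c" field and notify() are placeholders for your own setup.
import requests
import functions_framework

WEATHER_API_URL = "https://example.com/weather"  # placeholder weather API
FREEZE_THRESHOLD_C = 0.0

def notify(temp_c):
    # Placeholder: send an email, SMS, Pub/Sub message, etc.
    print(f"Temperature dropped to {temp_c} C")

@functions_framework.http
def check_weather(request):
    resp = requests.get(WEATHER_API_URL, params={"location": "Berlin"}, timeout=10)
    resp.raise_for_status()
    temp_c = resp.json()["temp_c"]  # adjust to your API's response shape
    if temp_c < FREEZE_THRESHOLD_C:
        notify(temp_c)
    return f"temperature: {temp_c} C", 200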

Related

Hosting webhook target GCP Cloud Function

I am very new to GCP. My plan is to create a webhook target on GCP to listen for events from a third-party application, then kick off scripts that download files from the webhook event and push them to JIRA/GitHub. During my research I read a lot about Cloud Functions, but there are also Cloud Run, App Engine and Pub/Sub. Any suggestions on which path to follow?
Thanks!
There are use cases in which Cloud Functions, Cloud Run and App Engine can be used interchangeably (not Pub/Sub, as it is a messaging service). There are, however, use cases that do not fit some of them well.
Cloud Functions must be triggered, and each execution is (or should be) isolated, which implies you cannot expect one to keep a connection alive to your third party. They also have a limited execution time. They tend to be atomic, so if you have complex logic spread across several of them you must be careful in your design, otherwise you will end up with a distributed solution that is very difficult to manage.
App Engine is an application you deploy that stays permanently active, so you can maintain a connection to your third-party app.
Cloud Run sits somewhere in the middle: it is triggered when used, but instances can share a context, and different requests benefit from that (keeping connections alive temporarily or caching, for instance). It also offers more flexibility in the technologies you can use.
Pub/Sub, as mentioned, is a service to which you can send information (fire and forget); it allows you to have one or more listeners on the other side, which may be your Cloud Function, App Engine or Cloud Run service, to process the information and proceed.
BTW, consider using Cloud Storage for your files, especially if you expect them to persist between different service calls.
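As a rough illustration of the webhook-target pattern described above (a sketch, not a complete solution): an HTTP-triggered Cloud Function can accept the third-party event and hand it off to a Pub/Sub topic, so the slower work (downloading files, calling JIRA/GitHub) happens in a separate subscriber. The project and topic names below are placeholders.

# Sketch of a webhook target: validate the incoming event and forward it to Pub/Sub
# for downstream processing. PROJECT_ID and TOPIC_ID are placeholders.
import json
import functions_framework
from google.cloud import pubsub_v1

PROJECT_ID = "my-project"      # placeholder
TOPIC_ID = "webhook-events"    # placeholder

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

@functions_framework.http
def webhook_target(request):
    event = request.get_json(silent=True)
    if event is None:
        return "expected a JSON body", 400
    # Publish and return quickly; the heavy lifting (file download, JIRA/GitHub calls)
    # happens in whatever subscribes to the topic.
    future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
    future.result()
    return "ok", 204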

Cloud Scheduler invokes Cloud Function more than once during schedule

I currently have a Cloud Function that executes some asynchronous code. It makes a GET request to an endpoint to retrieve some data and then stores that data in Cloud Storage. I have set up the Cloud Function to be triggered by Cloud Scheduler via HTTP. When I use the built-in test option that Cloud Functions has, everything works fine, but when I set up Cloud Scheduler to invoke the Cloud Function, it gets invoked more than once. I could tell by looking at the logs, which show multiple execution IDs and the print statements I have in place. Does anyone know why Cloud Scheduler is invoking it more than once? I have Max Retry Attempts set to 0. There is a portion of my code where I use asyncio's create_task and sleep to make sure the tasks get put into the event loop while slowing down the rate of requests, and I was wondering if this is causing Cloud Scheduler to do something unexpected:
# Inside an async method of the class; headers, total_pages, delay_per_request
# and self.get_tasks() are defined elsewhere in the surrounding code.
async with aiohttp.ClientSession(headers=headers) as session:
    tasks = []
    for i in range(1, total_pages + 1):
        tasks.append(asyncio.create_task(self.get_tasks(session=session, page=i)))
        await asyncio.sleep(delay_per_request)
For my particular case, when testing natively (using the built-in test option Cloud Functions has), my Cloud Function performed as expected. However, when I set up Cloud Scheduler to trigger the Cloud Function via HTTP, it unexpectedly ran more than once. As #EdoAkse mentioned in the original thread here, my event was running more than once with Cloud Scheduler. My solution was to set up a Pub/Sub topic that triggers the Cloud Function, and have Cloud Scheduler publish to that topic instead. It is essentially how Google describes it in their docs:
Cloud Scheduler -> Pub/Sub Trigger -> Cloud Function
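For reference, a Pub/Sub-triggered function in this setup might look roughly like the sketch below (assuming the CloudEvents-style signature from functions_framework); the actual fetch-and-store logic is elided.

# Sketch of a Pub/Sub-triggered function. Cloud Scheduler publishes to the topic on a
# cron schedule; the function then does the GET request and the Cloud Storage write.
import base64
import functions_framework

@functions_framework.cloud_event
def run_job(cloud_event):
    # The Scheduler payload (if any) arrives base64-encoded in the Pub/Sub message.
    msg = cloud_event.data["message"].get("data", "")
    payload = base64.b64decode(msg).decode("utf-8") if msg else ""
    print(f"triggered with payload: {payload}")
    # ... fetch from the endpoint and upload to Cloud Storage here ...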
I observed a behavior where my cloud functions were being called twice by Cloud Scheduler. Apparently, despite the functions being located in eu-west1, a duplicate schedule entry existed in us-central1 for each scheduled function. Removing the duplicated us-central1 jobs resolved my issue.

Azure functions: Can I implement my architecture and how do I minimize cost?

I am interested in implementing a compute service in the cloud for an application I'm working on. The idea is that there are 3 modules in the service: a compute manager that receives requests (with input data) and triggers Azure Functions computes (the computes are the 2nd 'module'). Both modules share the same blob storage for the scripts to be run and for the input/output data (JSON) for the compute.
I want to draw up a basic diagram but need to understand a few things first. Is what I described above possible, or must Azure Functions have their own separate storage? Can Azure Functions have concurrent executions of the same script with different data?
I'm new to Azure, so what I've been learning about Azure Functions hasn't yet answered my questions. I'm also unsure how to minimise cost; the functions won't run often.
I hope someone could shed some light on this for me :)
Thanks
In fact, Azure Functions itself has many kinds of triggers, for example an HTTP trigger, a Storage trigger, or a Service Bus trigger.
So I think you can do without your compute manager if one of the built-in triggers meets your requirements.
At the same time, all functions can share the same storage account. You just need to use the correct storage account connection string.
And finally, as your functions will not run often, I suggest you use the Azure Functions Consumption plan. When you're using the Consumption plan, instances of the Azure Functions host are dynamically added and removed based on the number of incoming events, and you only pay for the executions that actually run.
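To make the shared-storage point concrete, here is a rough Python sketch (names and settings are assumptions, not your actual setup): an HTTP-triggered Azure Function that reads its input JSON from a blob in a storage account shared with other functions, using a connection string kept in an app setting.

# Sketch of an HTTP-triggered Azure Function reading input from a shared storage account.
# "SHARED_STORAGE" is an app setting holding the connection string; container and blob
# names are placeholders.
import json
import os
import azure.functions as func
from azure.storage.blob import BlobServiceClient

def main(req: func.HttpRequest) -> func.HttpResponse:
    blob_name = req.params.get("blob", "input.json")
    service = BlobServiceClient.from_connection_string(os.environ["SHARED_STORAGE"])
    blob = service.get_blob_client(container="compute-inputs", blob=blob_name)
    data = json.loads(blob.download_blob().readall())
    # ... run the compute on `data` here ...
    return func.HttpResponse(json.dumps({"received": len(data)}), mimetype="application/json")

Each invocation runs in its own execution context, so concurrent requests can run the same code against different blobs without interfering with each other.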

HttpIO - Consuming external resources in a Dataflow transform

I'm trying to write a custom Source using the Python Dataflow SDK to read JSON data in parallel from a REST endpoint.
E.g. for a given set of IDs, I need to retrieve the data from:
https://foo.com/api/results/1
https://foo.com/api/results/2
...
https://foo.com/api/results/{maxID}
The key features I need are monitoring & rate limiting: even though I need parallelism (either thread/process-based or using async/coroutines), I also need to make sure my job stays "polite" towards the API endpoint - effectively avoiding involuntary DDoS.
Using psq, I should be able to implement some kind of rate-limit mechanism, but then I'd lose the ability to monitor progress & ETA using the Dataflow Service Monitoring.
It seems that, although they work well together, monitoring isn't unified between Google Cloud Dataflow and Google Cloud Pub/Sub (which uses Google Stackdriver Monitoring).
How should I go about building a massively parallel HTTP consumer workflow which implements rate limiting and has web-based monitoring?
Dataflow does not currently have a built-in way of doing global rate-limiting, but you can use the Source API to do this. The key concept is that each split of a Source will be processed by a single thread at most, so you can implement local rate limiting separately for each split.
This solution does not use Pub/Sub at all, so you can exclusively use the Dataflow Monitoring UI. If you want to set up alerts based on particular events in your pipeline, you could do something like this.
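The example linked in that answer is not reproduced here; as a separate illustration of the per-split/local rate-limiting idea, the sketch below uses a plain DoFn in the current Beam Python SDK (not the legacy custom Source API the answer refers to) and throttles each worker thread with a fixed delay. The endpoint URL, ID range and requests-per-second value are assumptions.

# Sketch: local (per-worker) rate limiting in a Beam pipeline that fans out HTTP GETs.
# The endpoint URL and REQUESTS_PER_SEC value are assumptions.
import time
import requests
import apache_beam as beam

REQUESTS_PER_SEC = 5  # per worker thread

class ThrottledFetch(beam.DoFn):
    def process(self, result_id):
        time.sleep(1.0 / REQUESTS_PER_SEC)  # crude local throttle
        resp = requests.get(f"https://foo.com/api/results/{result_id}", timeout=30)
        resp.raise_for_status()
        yield resp.json()

with beam.Pipeline() as p:
    (p
     | beam.Create(range(1, 1001))   # the IDs 1..maxID
     | beam.ParDo(ThrottledFetch())
     | beam.Map(print))              # replace with your sink

Note that the effective global rate is roughly REQUESTS_PER_SEC times the number of concurrently running worker threads, so you may also want to cap the worker count (for example with --max_num_workers) if you need a hard ceiling.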

How to use AWS EC2 to make a low-latency producer for AWS Kinesis

I'm writing a client on EC2 that puts records to Kinesis, for testing. I'm only sending small records such as {"name":"abc","birthday":"123"}, but each call takes 100ms+. I put all services in Singapore. How do I improve this?
Each call to the Kinesis API must be committed to 3 availability zones to prevent event loss. You should expect a latency of about 50ms in most cases.
If you want to reduce the latency you can batch multiple events into a single call using the PutRecords call (instead of PutRecord). With this API call you can put up to 500 events with a single API call.
Another popular option is to use the Kinesis Producer Library (KPL). It can help with latency (async mode), performance (batching and multithreading), ease of use and cost (aggregation).
Another option is to use an agent installed on your server. The agent is monitoring some log files and can tail them to Kinesis.
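As a concrete example of the batching suggestion, here is a small boto3 sketch using PutRecords; the stream name is a placeholder and ap-southeast-1 is the Singapore region.

# Sketch: batching events with PutRecords instead of one PutRecord call per event.
# The stream name is a placeholder.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="ap-southeast-1")

events = [{"name": "abc", "birthday": "123"} for _ in range(100)]

records = [
    {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": e["name"]}
    for e in events
]

# Up to 500 records per call; check FailedRecordCount and retry failures in production.
resp = kinesis.put_records(StreamName="my-stream", Records=records)
print("failed:", resp["FailedRecordCount"])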
