How to trigger an async job from a Cloud Function - Python

I have a Cloud Function (Python) which is triggered by HTTP from a web client; it has to calculate something and respond FAST. I would like to save the HTTP request parameters into a database (for analytics).
If I just initiate a WRITE to my PostgreSQL database, the function will have to wait for it, and it will be slower.
Using Pub/Sub, the function also needs to publish and wait for the response (docs example):
# Publishes a message (publisher client and topic_path are set up elsewhere)
try:
    publish_future = publisher.publish(topic_path, data=message_bytes)
    publish_future.result()  # Verify the publish succeeded
    return 'Message published.'
except Exception as e:
    return ('Publish failed: {}'.format(e), 500)
Google does not offer a solution that automatically triggers a Pub/Sub publish in the background when an HTTP function is called.
How can I take information from my function and save it to my DB (any DB) without affecting the function execution at all?
def cloud_func(request):
    request_json = request.get_json()
    # save to db async without waiting for a response
    # calculate my stuff...
    return (result, 200, headers)  # to client

If you use an HTTP-triggered Cloud Function, I recommend you migrate to Cloud Run and activate the "CPU always allocated" parameter.
That parameter is designed exactly for this: continuing a process in the background, even outside a request-handling context.
EDIT 1
You can also build an async mechanism while staying on Cloud Functions. The sync function gets the data from the user, publishes a message to Pub/Sub, and answers the user.
Publishing a Pub/Sub message is very fast and won't add much latency.
Then write another function that listens to the Pub/Sub topic and saves the message data into your database. Because that function is async, you are not time-constrained. A sketch of both functions is below.
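A minimal sketch of that two-function pattern; the topic name analytics-events and the helpers calculate_stuff and save_to_db are illustrative, not from the original post:

import base64
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'analytics-events')  # assumed names

def cloud_func(request):
    # HTTP function: only the publish is awaited, never the DB write.
    request_json = request.get_json()
    future = publisher.publish(topic_path, data=json.dumps(request_json).encode('utf-8'))
    future.result()  # quick acknowledgement from Pub/Sub
    result = calculate_stuff(request_json)  # your fast computation (hypothetical)
    return (result, 200)

def save_analytics(event, context):
    # Pub/Sub-triggered function: no user is waiting, so a slow write is fine.
    payload = json.loads(base64.b64decode(event['data']).decode('utf-8'))
    save_to_db(payload)  # hypothetical helper wrapping the PostgreSQL INSERT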

Related

Best way to handle webhook response timeouts in Flask?

I have a Flask application running on a Google Cloud Function that receives a webhook from Shopify when an order is created. The problem is that I'm timing out very often; here's what I mean by that:
@app.route('/', methods=['POST'])
def connectToSheets(request):
    print('Webhook received...')

    # Verify request is coming from Shopify
    data = request.data
    hmac_header = request.headers.get('X-Shopify-Hmac-SHA256')
    verify_webhook(data, hmac_header)
    print('Request validated...')

    # Do some stuff...
Shopify's docs state that there is a 5 second timeout period and a retry period for subscriptions. After I validate the request, there is quite a lot of code, so I'm timing out almost every time.
Is there a way I can send a 200 status code to Shopify after I validate the Webhook and before I start processing the Webhook? Or is there a work-around to this?
One way to do this entirely within Cloud Functions is to set up two functions:
one that handles the initial request
a second one that does the processing and then follows up with the response
In addition to handling the initial request, the first function also invokes the second function via Cloud Pub/Sub.
See https://dev.to/googlecloud/getting-around-api-timeouts-with-cloud-functions-and-cloud-pub-sub-47o3 for a complete example (this uses Slack's webhook, but the behavior should be similar).
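A rough sketch of that split, assuming a topic named shopify-orders; verify_webhook is the validator from the question, the other names are illustrative:

import base64
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'shopify-orders')  # assumed names

def receive_webhook(request):
    # First function: validate and acknowledge within Shopify's 5 second window.
    verify_webhook(request.data, request.headers.get('X-Shopify-Hmac-SHA256'))
    publisher.publish(topic_path, data=request.data).result()
    return ('', 200)  # Shopify gets a fast 200; the slow work happens elsewhere

def process_order(event, context):
    # Second function: triggered by Pub/Sub, free of the webhook timeout.
    order = json.loads(base64.b64decode(event['data']).decode('utf-8'))
    # Do some stuff... (the slow processing from the original handler)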
I used to face the same issue. We moved the processing code from executing inline to executing in a background task, using Celery and RabbitMQ. RabbitMQ was used for queue management; you can also use Redis. A minimal sketch follows the links below.
Celery - https://docs.celeryproject.org/en/stable/getting-started/index.html
RabbitMq - https://www.rabbitmq.com/documentation.html
Asynchronous Tasks Using Flask, Redis, and Celery - https://stackabuse.com/asynchronous-tasks-using-flask-redis-and-celery/
How to Set Up a Task Queue with Celery and RabbitMQ - https://www.linode.com/docs/development/python/task-queue-celery-rabbitmq/
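A minimal Celery sketch of that approach, assuming a local RabbitMQ broker; process_webhook is a hypothetical task name:

# tasks.py
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')  # assumed broker URL

@app.task
def process_webhook(payload):
    # The slow work that used to run inline in the request handler.
    ...

# In the Flask view: enqueue and return immediately.
#   process_webhook.delay(request.get_json())
#   return ('', 200)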

Interrupt backend function and wait for frontend response

I'm looking for a good solution to implement a handshake between a Python backend server and a React frontend connected through a websocket.
The frontend allows the user to upload a file and then sends it to the backend for processing. Now it might be possible that the processing encounters some issues and needs the user's confirmation to proceed or cancel - and that's where I'm stuck.
My current implementation has different "endpoints" in the backend which then call different function implementations, and a queue which is continuously processed, with its content (messages) sent to the frontend. But these are always complete actions: they either succeed or fail, and the returned message reflects that. I have no system in place to interrupt a running task (e.g. file processing), send a request to the frontend, and then wait for a response before continuing the function.
Is there a design pattern or common approach for this kind of problem?
How long does the processing take? Maybe a good solution is to set up a message broker like RabbitMQ and create a queue for this process. In the frontend, create a panel that shows the state of the process, which runs in an async task; if the task has found some issues, let the user know and ask what to do.
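A rough sketch of the broker side with pika, assuming a local RabbitMQ and an illustrative task-status queue; the frontend panel would consume these messages (e.g. relayed over the websocket) and show the confirmation prompt:

import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='task-status')  # assumed queue name

def report(state, detail=None):
    # Publish the task's current state for the frontend panel.
    channel.basic_publish(exchange='', routing_key='task-status',
                          body=json.dumps({'state': state, 'detail': detail}))

def process_file(path):
    report('running')
    issues = validate(path)  # hypothetical check of the uploaded file
    if issues:
        report('needs-confirmation', issues)  # surface the issue; the user decides
        return
    report('done')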

Is there a way to execute non-blocking load_job from BigQuery Python Client library?

I have a Flask API that uses Flask_restful, Flask_CORS and Marshmallow. The API does some work to get a *.csv file into Cloud Storage (using signed URLs), confirms that it has uploaded, and then creates and executes a load job to transfer the CSV from Storage to BigQuery.
The part of the API that is exacerbating my hair loss is the call that executes a load job in GCP to load the CSV file into BigQuery. Here is a snippet of the code:
...
dataset_ref = bq_client.dataset(target_dataset)
job_config.schema = bq_schema
job_config.source_format = SOURCE_FORMAT
job_config.field_delimiter = DELIM
job_config.destination_table_description = TARGET_TABLE
job_config.encoding = ENCODING
job_config.max_bad_records = MAX_BAD_RECORDS
job_config.autodetect = False  # Do not autodetect schema

load_job = bq_client.load_table_from_uri(
    uri, dataset_ref.table(target_table), job_config=job_config
)  # API request

load_job.result()  # <-- This is the concern
return {"message": "Successfully uploaded to Bigquery"}, 200
The file can take some time to transfer, and my concern is that during periods of latency the web server will time out while waiting for the transfer to take place. I would much prefer to start the load job without blocking on load_job.result(), get the job ID, and return a 201 response. Then I can use the job ID to poll GCP to determine whether it succeeded, rather than risk the request timing out on the client-side front-end and leaving the user confused as to whether it succeeded or not.
I understand that the load job itself runs asynchronously on GCP's side, but with Flask that doesn't help me. I was going to change over to Quart to use async/await, but my other dependencies are not supported, so I would have a lot of refactoring to do. Is there another way that anyone has used to approach this type of problem?
Cheers
Quart solves nothing here. Quart still needs a running environment: it waits on and oversees the blocking call, then invokes your callback at the end. Your function must still be running for that to happen.
There is a better design for this. I recommend you have a look at Cloud Tasks. The process is the following:
Run your load job
Create a task with the load job ID as a parameter
Exit the function
The task then triggers another function that checks whether the job is over:
If it isn't finished yet, return an error code (anything other than 2XX)
If it is finished, return an OK code (2XX)
Set up your Cloud Tasks queue with a retry policy that doesn't retry immediately (for example, set the min-backoff to 30s). A sketch of both functions is below.
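A rough sketch of those steps, assuming a Cloud Tasks queue named bq-poll in us-central1 and a second HTTP function deployed at an illustrative URL; uri, dataset_ref, target_table and job_config come from the question's snippet:

import json
from google.cloud import bigquery, tasks_v2

bq_client = bigquery.Client()
tasks_client = tasks_v2.CloudTasksClient()
queue = tasks_client.queue_path('my-project', 'us-central1', 'bq-poll')  # assumed
CHECK_URL = 'https://REGION-my-project.cloudfunctions.net/check_job'     # assumed

def start_load(request):
    load_job = bq_client.load_table_from_uri(
        uri, dataset_ref.table(target_table), job_config=job_config
    )
    # Enqueue a task carrying the job ID, then return without blocking.
    tasks_client.create_task(parent=queue, task={'http_request': {
        'http_method': tasks_v2.HttpMethod.POST,
        'url': CHECK_URL,
        'body': json.dumps({'job_id': load_job.job_id}).encode(),
    }})
    return {"job_id": load_job.job_id}, 201

def check_job(request):
    job = bq_client.get_job(request.get_json()['job_id'])
    if job.state != 'DONE':
        return ('Not done yet', 500)  # non-2XX makes Cloud Tasks retry later
    return ('Load job finished', 200)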

Progress bar in Google App Engine

I have a Google App Engine application that performs about 30-50 calls to a remote API. Each call takes about a second, so the whole operation can easily take a minute. Currently, I do this in a loop inside the post() function of my site, so the response isn't printed until the whole operation completes. Needless to say, the app isn't very usable at the moment.
What I would like to do is to print the response immediately after the operation is started, and then update it as each individual API call completes. How would I achieve this? On a desktop application, I would just kick off a worker thread that would periodically update the front-end. Is there a similar mechanism in the Google App Engine?
I googled around for "progress bar" and "google app engine" but most results are from people that want to monitor the progress of uploading a file. My situation is different: the time-consuming task is being performed on the server, so there isn't much the client can do to monitor its progress. This guy is the closest thing I could find, but he works in Java.
Send the post logic to a task using the Task Queue API (http://code.google.com/appengine/docs/python/taskqueue)
Change the logic of the process to set a status (it could use memcache)
Using AJAX, query the memcache status every 10 seconds or so; it's up to you
A sketch of the handlers is below.
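A minimal sketch on the (Python 2) App Engine runtime with webapp2; remote_api_calls and do_call stand in for the 30-50 slow remote calls:

import json
import webapp2
from google.appengine.api import memcache, taskqueue

class StartHandler(webapp2.RequestHandler):
    def post(self):
        taskqueue.add(url='/worker')    # kick off the long job
        self.response.write('started')  # respond immediately

class WorkerHandler(webapp2.RequestHandler):
    def post(self):
        for i, call in enumerate(remote_api_calls):  # hypothetical list of calls
            do_call(call)                            # hypothetical slow API call
            memcache.set('progress', i + 1)          # record how far we got

class ProgressHandler(webapp2.RequestHandler):
    def get(self):  # polled by the page's AJAX every few seconds
        self.response.write(json.dumps({'done': memcache.get('progress') or 0}))

app = webapp2.WSGIApplication([
    ('/start', StartHandler), ('/worker', WorkerHandler), ('/progress', ProgressHandler),
])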
You could return immediately from your post, and do one of two things:
Poll from your client every second or so to ask your service for its status
Use the Channel API to push status updates down to your client
Short version: Use a task queue that writes to a memcache key as the operation progresses. Your page can then either use the channel API or repeatedly poll the server for a progress report.
Long version: In your post handler, you delegate the big job to a task. The task periodically updates a key that resides in memcache. If you don't have the time to learn the Channel API, you can make the page returned by your post handler periodically GET some URL in the app that returns a progress report based on the memcache data, and then update your progress bar. When the job is complete, your script can go to a results page.
If you have the time, learning the Channel API is worth the effort. In this case, the task would receive the channel token so it could communicate with the JavaScript channel client in your page without the polling.

Indicating that GET response is complete w/ Python AppEngine

When I get a GET request from a user, I send them the response and then spend maybe a second logging stuff about that request. Is there a way to close the connection when I have the response ready, but continue doing that logging part, so that the user wouldn't have to wait for it to complete?
From the Google App Engine docs for the Response object:
"App Engine does not support sending data to the user's browser before exiting the handler. Some web servers use this technique to "stream" data to the user's browser over a period of time in response to a single request. App Engine does not support this streaming technique."
So there's no easy way. If you have a bundle of data that you can pass to a longer-running "process and log" method, try using the deferred library. Note that this will require bundling your data up and sending it to the task queue to do your processing and logging, so
you may not save much time, and
the results may not look much like you'd want - for example, you'd be logging from a different request, so you might need to radically alter the logging
Still, you could try; a minimal sketch is below.
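A minimal sketch of the deferred approach; render_page and record_analytics are hypothetical stand-ins:

import webapp2
from google.appengine.ext import deferred

def log_request(bundle):
    # Runs later on the task queue, in a separate request.
    record_analytics(bundle)  # hypothetical slow logging work

class MainHandler(webapp2.RequestHandler):
    def get(self):
        self.response.write(render_page())  # respond to the user first
        deferred.defer(log_request, {'path': self.request.path})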
You have two options:
Use the Task Queue API. Enqueueing a task should be fast, so long as you have less than 10k of data (which is the limit on a Task Queue payload).
Use the 'sneaky' trick described by Rafe in this video to do processing after the response completes.
