I have a Flask application running on a Google Cloud Function that receives a webhook from Shopify when an order is created. The problem is that I'm timing out very often. Here's what I mean by that:
#@app.route('/', methods=['POST'])
def connectToSheets(request):
    print('Webhook received...')

    # Verify the request is coming from Shopify
    data = request.data
    hmac_header = request.headers.get('X-Shopify-Hmac-SHA256')
    verify_webhook(data, hmac_header)

    print('Request validated...')

    # Do some stuff...
Shopify's docs state that there is a 5-second timeout period and a retry period for subscriptions. After I validate the request there is quite a lot of code, so I'm timing out almost every time.
Is there a way I can send a 200 status code to Shopify after I validate the webhook and before I start processing it? Or is there a workaround?
One way to do this entirely within Cloud Functions is to set up two functions:
one that handles the initial request
a second one that does the processing and then follows up with the response
In addition to handling the initial request, the first function also invokes the second function via Cloud Pub/Sub.
See https://dev.to/googlecloud/getting-around-api-timeouts-with-cloud-functions-and-cloud-pub-sub-47o3 for a complete example (this uses Slack's webhook, but the behavior should be similar).
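A minimal sketch of that pattern for this webhook, assuming the google-cloud-pubsub client library and an illustrative topic name ('shopify-orders'); verify_webhook is the helper from the question, and the legacy background-function signature is used for the subscriber:

import os
import base64
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# GCP_PROJECT is set automatically on older Cloud Functions runtimes;
# the topic name is illustrative.
topic_path = publisher.topic_path(os.environ['GCP_PROJECT'], 'shopify-orders')

def connectToSheets(request):
    data = request.data
    hmac_header = request.headers.get('X-Shopify-Hmac-SHA256')
    verify_webhook(data, hmac_header)
    # Hand the payload to Pub/Sub and ack Shopify right away.
    publisher.publish(topic_path, data=data)
    return ('', 200)

# Deployed separately with a Pub/Sub trigger on the same topic:
def process_order(event, context):
    payload = base64.b64decode(event['data'])
    # Do some stuff... (the slow processing moves here)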
I used to face the same issue. We moved the processing code from being executed inline to a background task, using Celery and RabbitMQ; RabbitMQ handled queue management. You can use Redis for queue management as well (a minimal sketch follows the links below).
Celery - https://docs.celeryproject.org/en/stable/getting-started/index.html
RabbitMQ - https://www.rabbitmq.com/documentation.html
Asynchronous Tasks Using Flask, Redis, and Celery - https://stackabuse.com/asynchronous-tasks-using-flask-redis-and-celery/
How to Set Up a Task Queue with Celery and RabbitMQ - https://www.linode.com/docs/development/python/task-queue-celery-rabbitmq/
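A minimal sketch of that move, assuming a local RabbitMQ broker; the module and task names are illustrative:

from celery import Celery

# Broker URL assumes a local RabbitMQ; swap in redis://... for Redis.
celery_app = Celery('tasks', broker='amqp://guest@localhost//')

@celery_app.task
def process_webhook(payload):
    # The slow post-validation work runs here, off the request path.
    ...

# In the Flask view, validation stays inline and the heavy work is queued,
# so the response goes back well inside the webhook timeout:
#     process_webhook.delay(request.get_json())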
Related
I have a Cloud Function (Python) which is triggered over HTTP from a web client; it has to calculate something and respond FAST. I would like to save the HTTP request parameters into a database (for analytics).
If I just initiate a WRITE to my PostgreSQL, the function will have to wait for it, and it will be slower.
Using Pub/Sub, the function also needs to publish and wait for the response (docs example):
# Publishes a message
try:
    publish_future = publisher.publish(topic_path, data=message_bytes)
    publish_future.result()  # Verify the publish succeeded
    return 'Message published.'
except Exception as e:
    print(e)
    return (str(e), 500)
Google does not offer a solution that automatically triggers a Pub/Sub message in the background when an HTTP function is called.
How can I take information from my function and save it to my DB (any DB) without affecting the function's execution at all?
def cloud_func(request):
    request_json = request.get_json()
    # save to db async without waiting for a response
    # calculate my stuff...
    return (result, 200, headers)  # to client
If you use a Cloud Functions HTTP trigger, I recommend you migrate to Cloud Run and activate the always-on CPU parameter.
That parameter is designed exactly for this: continuing a process in the background, even outside a request-handling context.
EDIT 1
You can also build an async mechanism while staying on Cloud Functions. The sync function gets the data from the user, publishes a message to Pub/Sub, and answers the user.
The Pub/Sub message publication is very fast and won't take much time.
Then write another function that listens to the Pub/Sub topic and saves the message data into your database. Because that function is async, you are not time-constrained.
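A minimal sketch of that split, assuming the google-cloud-pubsub client library; the project and topic names are illustrative, and calculate() and headers stand in for the placeholders in the question:

import base64
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'analytics-events')  # illustrative

def cloud_func(request):
    request_json = request.get_json()
    # Publishing is a quick network hop, far cheaper than a synchronous
    # PostgreSQL write on the request path.
    publisher.publish(topic_path, json.dumps(request_json).encode('utf-8'))
    result = calculate(request_json)  # your fast path, unchanged
    return (result, 200, headers)

# A second function, deployed with a Pub/Sub trigger on the same topic;
# it is not time-constrained, so a plain blocking DB write is fine here.
def save_to_db(event, context):
    params = json.loads(base64.b64decode(event['data']))
    # INSERT params into PostgreSQL...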
Is there a way to use Celery for:
Queuing an HTTP call to an external URL with form parameters (an HTTP POST to the URL)
The external URL will respond with an HTTP status (200, 404, 400, etc.); if the response is a non-200 error, the task should retry a certain number of times and then give up
Adding a task/job into the Celery queue via a REST API, passing the URL to call and the form parameters
For that you need to create a task in your Celery application that performs the request for you and returns the result.
Handling the errors and retries can be done within the code of your task, or can alternatively be taken care of by Celery if you schedule the task with the right arguments: see the arguments of .apply_async().
You can schedule new tasks via a REST API if you run Celery Flower. It has a REST API (see its documentation), in particular a POST endpoint to schedule a task.
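A sketch of such a task with retries handled inside the task body; the broker URL, task name, and retry numbers are illustrative:

import requests
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')  # broker is illustrative

@app.task(bind=True, max_retries=5)
def post_to_url(self, url, form_data):
    response = requests.post(url, data=form_data)
    if response.status_code != 200:
        # Exponential backoff; Celery gives up after max_retries.
        raise self.retry(countdown=2 ** self.request.retries)
    return response.status_code

# Scheduling the task (apply_async's own retry/retry_policy arguments
# govern re-sending the message to the broker, not these task retries):
post_to_url.apply_async(args=('https://example.com/endpoint', {'key': 'value'}))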
Yes, create an I/O class that handles your HTTP requests and processing.
Read about Celery tasks, and remember to set timeouts (e.g. connect_timeout=5.0, read_timeout=30.0) on your I/O operations so they don't block your workers.
There is a precise example of using requests in Celery worker tasks in the Celery documentation.
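For the requests library specifically, the two timeouts are passed as a single (connect, read) tuple; a minimal sketch under that assumption:

import requests
from celery import shared_task

@shared_task
def fetch_url(url):
    # requests takes a (connect, read) timeout tuple, so a hung server
    # can't tie up the worker indefinitely.
    response = requests.get(url, timeout=(5.0, 30.0))
    return response.status_code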
You can use the Flower REST API to do the same. Flower is a monitoring tool for Celery, but it also comes with a REST API for adding tasks and more:
https://flower.readthedocs.io/en/latest/index.html
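For illustration, Flower's async-apply endpoint can be called with plain HTTP; the host, port, and task name below are assumptions:

import requests

# Assumes Flower is running on localhost:5555 and the worker has a task
# registered as 'tasks.post_to_url'; both names are illustrative.
resp = requests.post(
    'http://localhost:5555/api/task/async-apply/tasks.post_to_url',
    json={'args': ['https://example.com/endpoint', {'key': 'value'}]},
)
print(resp.json())  # includes the task id for later status polling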
We're creating a web service (RESTful API) in Django. Our API will wrap both our own internal data and some other APIs that our web services layer will be accessing.
One of the APIs we're using has some long-running calls that take on the order of a minute to return an HTTP response. The API has a separate call to get the status of the current operation, but that means the user has to initiate the long-running operation and then have a separate process poll for status. We don't want our API to work that way; we want the initial request to just return a response saying that it's in progress.
So what we want to do is when we get a long-running request, we kick off an HTTP request of our own to the API asynchronously, then return a response. Then every time we get a status poll we just pass that through and respond with the response we got. When we get the callback that the operation is complete, then the next time we get a status poll we'll just respond that the operation is complete and return the data. This means that we'll need handlers for incoming status requests to check the list of in-progress long-running requests to respond with the status.
Does this seem like a reasonable way to approach this? Which Python libraries should we be looking at to make this sort of thing easier? We're not sure whether to go with something low-level like eventlet or twisted, or something a little heavier-weight like celery. Celery seems to be the normal recommendation for this sort of thing, but I'm not 100% sure what its place would be.
Thanks,
Spencer
I faced the same situation a couple of months ago. You have probably already solved your problem by now, but for anyone else facing the same situation I'll post what I did at the time.
Basically I used the Celery library (http://www.celeryproject.org/), dispatching the long-running operation asynchronously and returning a successful HTTP response containing the Celery job id. The async operation registered its status and job id in a SQLite database (which was enough for what I was doing), and the client queried the status of the job via REST.
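A minimal sketch of that flow; it checks status through Celery's AsyncResult rather than the separate SQLite table described above, and the view and task names are illustrative:

from celery.result import AsyncResult
from django.http import JsonResponse
from .tasks import long_running_call  # a Celery task wrapping the slow API

def start(request):
    job = long_running_call.delay(request.POST.dict())
    return JsonResponse({'status': 'in progress', 'job_id': job.id})

def status(request, job_id):
    result = AsyncResult(job_id)
    body = {'status': result.status}
    if result.ready():
        body['result'] = result.result
    return JsonResponse(body)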
I have a Google App Engine application that performs about 30-50 calls to a remote API. Each call takes about a second, so the whole operation can easily take a minute. Currently, I do this in a loop inside the post() function of my site, so the response isn't printed until the whole operation completes. Needless to say, the app isn't very usable at the moment.
What I would like to do is to print the response immediately after the operation is started, and then update it as each individual API call completes. How would I achieve this? On a desktop application, I would just kick off a worker thread that would periodically update the front-end. Is there a similar mechanism in the Google App Engine?
I googled around for "progress bar" and "google app engine" but most results are from people that want to monitor the progress of uploading a file. My situation is different: the time-consuming task is being performed on the server, so there isn't much the client can do to monitor its progress. This guy is the closest thing I could find, but he works in Java.
Send the post logic to a task using the Task Queue API (http://code.google.com/appengine/docs/python/taskqueue)
Change the logic of the process to set a status (it could be using memcache)
Using AJAX, query the memcache status every 10 seconds or so; it's up to you (see the sketch below)
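A minimal sketch of those three steps using the legacy Python App Engine APIs (taskqueue, memcache, webapp2); the handler paths, memcache key, and call_remote_api() are illustrative:

import json
import webapp2
from google.appengine.api import memcache, taskqueue

class StartHandler(webapp2.RequestHandler):
    def post(self):
        taskqueue.add(url='/worker')         # returns immediately
        self.response.write('started')

class WorkerHandler(webapp2.RequestHandler):
    def post(self):
        for i in range(50):
            call_remote_api(i)               # hypothetical slow call
            memcache.set('progress', i + 1)  # status for the AJAX poll

class ProgressHandler(webapp2.RequestHandler):
    def get(self):
        self.response.write(json.dumps({'done': memcache.get('progress') or 0}))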
You could return immediately from your post, and do one of two things:
Poll from your client every second or so to ask your service for its status
Use the Channel API to push status updates down to your client
Short version: Use a task queue that writes to a memcache key as the operation progresses. Your page can then either use the channel API or repeatedly poll the server for a progress report.
Long version: In your post you delegate the big job to a task. The task will periodically update a key that resides in memcache. If you don't have the time to learn the Channel API, you can have the page returned by your post periodically GET some URL in the app that returns a progress report based on the memcache data, and you can then update your progress bar. When the job is complete, your script can go to a results page.
If you have the time, learning the Channel API is worth the effort. In this case, the task would receive the channel token so it could communicate with the JavaScript channel client in your page without the polling thing.
When I get a GET request from a user, I send them the response and then spend maybe a second logging stuff about that request. Is there a way to close the connection when I have the response ready, but continue doing that logging part, so that the user wouldn't have to wait for it to complete?
From the Google App Engine docs for the Response object:
App Engine does not support sending data to the user's browser before exiting the handler. Some web servers use this technique to "stream" data to the user's browser over a period of time in response to a single request. App Engine does not support this streaming technique.
So there's no easy way. If you have a bundle of data that you can pass to a longer-running "process and log" method, try using the deferred library. Note that this will require bundling your data up and sending it to the task queue to do your processing and logging, so
you may not save much time, and
the results may not look much like you'd want - for example, you'd be logging from a different request, so you might need to radically alter the logging
Still, you could try.
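A minimal sketch of the deferred approach; process_and_log and the bundled fields are illustrative:

import webapp2
from google.appengine.ext import deferred

def process_and_log(data):
    # Runs later in its own task-queue request, so the log lines land
    # under that request rather than the original one.
    ...

class MainHandler(webapp2.RequestHandler):
    def get(self):
        data = {'path': self.request.path,
                'ua': self.request.headers.get('User-Agent')}
        deferred.defer(process_and_log, data)  # fast: just enqueues a task
        self.response.write('response body')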
You have two options:
Use the Task Queue API. Enqueueing a task should be fast, so long as you have less than 10k of data (which is the limit on a Task Queue payload).
Use the 'sneaky' trick described by Rafe in this video to do processing after the response completes.
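For the first option, a minimal sketch of enqueueing with the data as the task payload; the /log-worker URL and the data variable are illustrative:

import json
from google.appengine.api import taskqueue

# Enqueue the logging work with the data carried as the task payload
# (subject to the payload size limit mentioned above).
taskqueue.add(url='/log-worker',
              payload=json.dumps(data),
              headers={'Content-Type': 'application/json'})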