Google Cloud Tasks Celery migration confusion - python

I'm having trouble understanding how to use google cloud tasks to replace celery...
Currently, when you hit endpoint api/v1/longtask celery asyncs the task immediately and returns 200. Task runs, updates, ends.
So with cloud tasks, I would call api/v1/tasks invoke the specific task, and return 200. But the api endpoint api/v1/longtask will timeout now as the task takes 1 hour.
So do I need to adjust the timeout on the endpoint.
At this point I think a better solution is cloud functions but I would like to learn what the other side of the task looks like as the documentation only shows calling a URL. It never shows the long task api endpoint which in my experience times out at 60 seconds.
Thank you,

Related

Cloud Run suddenly starts timing out when processing any request

We've been running a backend application on Cloud Run for about a year and a half now, and a month ago it suddenly stopped properly handling all requests at seemingly random times (about every couple of days), only working again once we redeploy from the latest image from Cloud Build. The application will actually receive the request, however it just doesn't do anything and eventually the request will just time out (504) after 59m59s (the max timeout), even a test endpoint that just returns 'Hello World' times out without sending a response.
The application is written in Python and uses Flask to handle requests. We have a Cloud SQL instance that is used as its database, however we're confident this is not the source of the issue as even requests that don't involve the DB in any form do not work and the Cloud SQL instance is accessible even when the application stops working. Cloud Run is deployed with the following configuration:
CPU: 2
Memory: 8Gi
Timeout: 59m59s
VPC connector
VPC egress: private-ranges-only
Concurrency: 100
The vast majority of endpoints should produce some form of log when they first start, so we're confident that the application isn't executing any of the code after being triggered. We're not seeing any useful error messages in Logs Explorer either, simply just 504 errors from the requests timing out. It's deployed with a 59m59s timeout, so it's not the case that the timeout has been entered incorrectly and even then, that wouldn't explain why it works again when it's redeployed.
We have a Cloud Scheduler schedule that triggers the application every 15 minutes, which sends to an endpoint in the application that checks if any tasks are due to run and creates Cloud Tasks tasks (which send HTTP requests to an endpoint on the same application) for any tasks that need performing at that point in time. Every time the application stops working, it does seem to be during one of these runs, however we're not certain it's the cause as the Cloud Scheduler schedule is the most frequent trigger anyway. There doesn't seem to be a specific time of day that the crashes take place either.
This is a (heavily redacted) screenshot of the logs. The Cloud Scheduler schedule hits the endpoint at 21:00 and creates a number of tasks but then hits the default 3m Cloud Scheduler timeout limit at 21:03. The tasks it created then hit the default 10m Cloud Tasks timeout limit at 21:10 without their endpoint having done anything. After that point, all requests to the service timeout without doing anything.
The closest post I could find on SO was this one, their problem is also temporarily fixed by redeployment, however ours isn't sending 200 responses when it stops working and is instead just timing out without doing anything. We've tried adding retries to Cloud Scheduler + increasing its timeout limit, and we've also tried increasing the CPU and RAM allocation.
Any help is appreciated!

Cloud Scheduler invokes Cloud Function more than once during schedule

I currently have a Cloud Function that is executing some asynchronous code. It is making a Get request to an Endpoint to retrieve some data and then it storing that data into a Cloud Storage. I have set up the Cloud Function to be triggered using Cloud Scheduler via HTTP. When I use the test option that Cloud Function has, everything works fine, but when I set up Cloud Scheduler to invoke the Cloud Function, it gets invoked more than once. I was able to tell by looking at the logs and it showing multiple execution id's and print statements I have in place. Does anyone know why the Cloud Scheduler is invoking more than once? I have the Max Retry Attempts set to 0. There is a portion in my code where I use asyncio's create_task and sleep in order to put make sure the tasks get put into the event loop to slow down the number of requests and I was wondering if this is causing Cloud Scheduler to do something unexpected?
async with aiohttp.ClientSession(headers=headers) as session:
tasks = []
for i in range(1, total_pages + 1):
tasks.append(asyncio.create_task(self.get_tasks(session=session,page=i)))
await asyncio.sleep(delay_per_request)
For my particular case, when natively testing (using the test option cloud function has built-in) my Cloud Function was performing as expected. However, when I set up Cloud Scheduler to trigger the Cloud Function via a HTTP, it unexpectedly ran more than once. As #EdoAkse mentioned in original thread here my event with Cloud Scheduler was running more than once. My solution was to set up Pub/Sub topic that the Cloud Function subscribes to and that topic will trigger that Cloud Function. The Cloud Scheduler would then invoke that Pub/Sub Trigger. It is essentially how Google describes it in their docs.
Cloud Scheduler -> Pub/Sub Trigger -> Cloud Function
Observed a behavior where cloud functions were being called twice by Cloud Scheduler. Apparently, despite them being located/designated as eu-west1, a duplicate entry in schedule was present for each scheduled function located in us-central1. Removing the duplicated calls in us-central1 resolved my issue.

How do I limit access to a cloud storage bucket to one process at a time?

I am fairly new to GCP.
I have some items in a cloud storage bucket.
I have written some python code to access this bucket and perform update operations.
I want to make sure that whenever the python code is triggered, it has exclusive access to the bucket so that I do not run into some sort of race condition.
For example, if I put the python code in a cloud function and trigger it, I want to make sure it completes before another trigger occurs. Is this automatically handled or do I have to do something to prevent this? If I have to add something like a semaphore, will subsequent triggers happen automatically after the the semaphore is released?
Google Cloud
Scheduler is a
fully managed cron jobs scheduling service available in GCP. It's
basically the cron jobs which trigger at a given time. All you need
to do is specify the frequency(The time when the job needs to be
triggered) and the target(HTTP, Pub/Sub, App Engine HTTP) and you can
specify the retry configuration like Max retry attempts, Max retry
duration etc..
App Engine has a built-in cron service that allows you to write a simple cron.yaml containing the time at which you want the job to run and which endpoint it should hit. App Engine will ensure that the cron is executed at the time which you have specified. Here’s a sample cron.yaml that hits the /tasks/summary endpoint in AppEngine deployment every 24 hours.
cron:
- description: "daily summary job"
url: /tasks/summary
schedule: every 24 hours
All of the info supplied has been helpful. The best answer has been to use a max-concurrent-dispatches setting of 1 so that only one task is dispatched at a time.

If Google App Engine cron jobs have a 10 minute limit, then why do I get a DeadlineExceededError after the normal 30 seconds?

According to https://developers.google.com/appengine/docs/python/config/cron cron jobs can run for 10 minutes. However, when I try and test it by going to the url for the cron job when signed in as an admin, it times out with a DeadlineExceededError. Best I can tell this happens about 30 seconds in, which is the non-cron limit for requests. Do I need to do something special to test it with the cron rules versus the normal limits?
Here's what I'm doing:
Going to the url for the cron job
This calls my handler which calls a single function in my py script
This function does a database call to google's cloud sql and loops through the resulting rows, calling a function on each row that use's ebay's api to get some data
The data from the ebay api call is stored in an array to all be written back to the database after all the calls are done.
Once the loop is done, it writes the data to the database and returns back to the handler
The handler prints a done message
It always has issues during the looping ebay api calls. It's something like 500 api calls that have to be made in the loop.
Any idea why I'm not getting the full 10 minutes for this?
Edit: I can post actual code if you think it would help, but I'm assuming it's a process that I'm doing wrong, rather than an error in the code since it works just fine if I limit the query to about 60 api calls.
The way GAE executes a cron job allows it to run for 10 min. This is probably done (i'm just guessing here) through checking the user-agent, IP address, or some other method. Just because you setup a cron job to hit a URL in your application doesn't mean a standard HTTP request from your browser will allow it to run for 10 minutes.
The way to test if the job works is to do so on the local dev server where there is no limit. Or wait until your cron job executes and check the logs for any errors.
Hope this helps!
Here is how you can clarify the exception and tell if it's a urlfetch problem. If the exception is:
* google.appengine.runtime.DeadlineExceededError: raised if the overall request times out, typically after 60 seconds, or 10 minutes for task queue requests;
* google.appengine.runtime.apiproxy_errors.DeadlineExceededError: raised if an RPC exceeded its deadline. This is typically 5 seconds, but it is settable for some APIs using the 'deadline' option;
* google.appengine.api.urlfetch_errors.DeadlineExceededError: raised if the URLFetch times out.
then see https://developers.google.com/appengine/articles/deadlineexceedederrors as it's a urlfetch issue.
If it's the urlfetch that's timing out, try setting a longer duration (ex 60 sec.):
result = urlfetch.fetch(url, deadline=60)

Progress bar in Google App Engine

I have a Google App Engine application that performs about 30-50 calls to a remote API. Each call takes about a second, so the whole operation can easily take a minute. Currently, I do this in a loop inside the post() function of my site, so the response isn't printed until the whole operation completes. Needless to say, the app isn't very usable at the moment.
What I would like to do is to print the response immediately after the operation is started, and then update it as each individual API call completes. How would I achieve this? On a desktop application, I would just kick off a worker thread that would periodically update the front-end. Is there a similar mechanism in the Google App Engine?
I googled around for "progress bar" and "google app engine" but most results are from people that want to monitor the progress of uploading a file. My situation is different: the time-consuming task is being performed on the server, so there isn't much the client can do to monitor its progress. This guy is the closest thing I could find, but he works in Java.
Send the post logic to a task using http://code.google.com/appengine/docs/python/taskqueue
Change the logic of the process to set a status (it could be using memcache)
Using AJAX query memcache status each 10 seconds, more or less, it's up to you
You could return immediately from your post, and do one of two things:
Poll from your client every second or so to ask your service for its status
Use the Channel API to push status updates down to your client
Short version: Use a task queue that writes to a memcache key as the operation progresses. Your page can then either use the channel API or repeatedly poll the server for a progress report.
Long version: In your post you delegate the big job to a task. The task will periodically update a key that resides in memcache. If you don't have the time to learn the channel API, you can make the page returned by your post to periodically GET some URL in the app that returns a progress report based on the memcache data and you can then update your progress bar. When the job is complete your script can go to a results page.
If you have the time, learning the Channel API is worth the effort. In this case, the task would receive the channel token so it could communicate with the JavaScript channel client in your page without the polling thing.

Categories

Resources