App Engine Python UrlFetch.set_default_fetch_deadline

I have looked through the docs here:
https://cloud.google.com/appengine/docs/python/urlfetch/
and here:
https://cloud.google.com/appengine/articles/deadlineexceedederrors?hl=en
I also found this stack overflow question which relates to my question:
How to set timeout for urlfetch in Google App Engine?
I am connecting from my App Engine app to an outside web service that I do not control. Sometimes the requests take longer than 60 seconds. I set my application up to use the deferred Task Queue API.
I am confused. In the docs I've read, it seems as though urlfetch has a maximum deadline of 60 seconds, but if it's running in a task queue it's 10 minutes? I really just need someone to clarify this for me.
Does that mean the task has 10 minutes to complete, but the urlfetch inside the task is still limited to 60 seconds?
Pseudocode:
import urllib

from google.appengine.api import urlfetch
from google.appengine.ext import deferred

class NewTask(object):
    url = "https://example.com"

    def long_process(self):
        # will setting the deadline to more than 60 seconds work or not?
        urlfetch.set_default_fetch_deadline(120)
        data = {}
        resp = urlfetch.fetch(self.url, method="POST",
                              payload=urllib.urlencode(data))
        # do something with resp...

my_task = NewTask()
deferred.defer(my_task.long_process, _queue="myqueue")

You're on the right track. Tiny correction: there is no 60s max for urlfetch.set_default_fetch_deadline(); you might have been misled by the context of the discussion.
You can bump the 120 value up to 600; see the OP's comment on the selected answer in this recent Q&A: Appengine task runs for 5 seconds before throwing DeadlineExceededError

You can control both the urlfetch and the deferred task deadline.
Both can run for up to 600s AFAIK.
The one thing you shouldn't do is set the urlfetch deadline to a higher value than the task's ;)
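For example, here is a minimal sketch of raising the deadline inside a deferred task (the function name and queue are illustrative, not from the question):
from google.appengine.api import urlfetch
from google.appengine.ext import deferred

def long_fetch(url):
    # Inside a task queue request the overall limit is 10 minutes,
    # so a urlfetch deadline of up to 600 seconds is allowed here.
    urlfetch.set_default_fetch_deadline(600)
    resp = urlfetch.fetch(url)
    # process resp here

deferred.defer(long_fetch, "https://example.com", _queue="myqueue")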

Related

Google App Engine urlfetch DeadlineExceededError in push task handler running apiclient batch request

I have a task handler that makes a batch request to the Google Calendar API. After 5 seconds, the request fails with DeadlineExceededError: The API call urlfetch.Fetch() took too long to respond and was cancelled. I have called urlfetch.set_default_fetch_deadline(60) near where I make the batch request, as suggested here, but it does not seem to make a difference: the deadline seems to remain 5 seconds.
I am using the Python Google API Client library which sits on top of oauth2client and httplib2. But my understanding is that GAE intercepts the underlying calls to use urlfetch.Fetch. This is what the stack trace seems to show as well.
Can you see any reason why urlfetch.set_default_fetch_deadline does not seem to be working?
EDIT:
This is the code used to build the batch request:
# note `http` is an oauth2client-authorized http client
cal = apiclient.discovery.build('calendar', 'v3', http=http)
req = cal.new_batch_http_request(callback=_callback)
for event in events:  # anything larger than ~5 events in a batch takes >5 secs
    req.add(
        cal.events().patch(calendarId=calid, eventId=event["id"], body=self._value)
    )
urlfetch.set_default_fetch_deadline(60)  # has no effect
req.execute()
So, urlfetch.set_default_fetch_deadline() did eventually work for me. The problem was that my underlying http client (oauth2client / httplib2) was essentially stored in a global. Once I created it in the task handler's thread, set_default_fetch_deadline worked.
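A minimal sketch of that fix, assuming a webapp2 task handler (get_authorized_http is a hypothetical helper standing in for however your oauth2client-authorized httplib2 client gets built):
import apiclient.discovery
import webapp2

from google.appengine.api import urlfetch

class CalendarTaskHandler(webapp2.RequestHandler):
    def post(self):
        # set_default_fetch_deadline() sets per-thread state, so call it
        # in the thread that will actually perform the fetches.
        urlfetch.set_default_fetch_deadline(60)
        # Build the http client here, not at module import time, so its
        # fetches pick up the deadline set above.
        http = get_authorized_http()  # hypothetical oauth2client helper
        cal = apiclient.discovery.build('calendar', 'v3', http=http)
        # ...build and execute the batch request as before...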
Try adding the deadline parameter:
my_result = urlfetch.fetch(my_url, deadline=15)

Deadline Exceeded Error even with a TaskQueue running on a Backend

I'm having some issues with a DeadlineExceeded error. Basically I'm doing some web scraping of a URL using Mechanize. When trying to perform
br.open(url)
I have this error:
HTTPException: Deadline exceeded while waiting for HTTP response from URL: my-url
I have read the documentation, which says to use Backends (I'm using a dynamic backend, class B4_1G, with 5 instances), but this error still occurs at 60 seconds. According to the docs, when using the TaskQueue and Backends the timeout should be extended to 10 minutes.
Here is how I assign the operation to run on a TaskQueue, with its target on the first instance of my Backend.
taskqueue.add(url='/crons/myworker', target='1.myworker')
Here is the backends.yaml:
backends:
- name: myworker
  class: B4_1G
  instances: 5
  options: dynamic
Any ideas of what might be happening? Thank you.
No request that involves getting data via HTTP can take more than 60 seconds on App Engine.
The 10 minute limit refers to the tasks themselves - they can run for up to 10 minutes.
So GAE might not be the best choice here, since you can only use its provided version of urlfetch etc., if your requests are going to take longer than 60 seconds on average anyway.
You can set a deadline for a request, the most amount of time the service will wait for a response. By default, the deadline for a fetch is 5 seconds. The maximum deadline is 60 seconds for HTTP requests and 10 minutes for task queue and cron job requests.
https://developers.google.com/appengine/docs/python/urlfetch/
So a task can run for up to 10 minutes, and per the quote above, a urlfetch made from a task queue or cron handler can use a deadline of up to 10 minutes as well; from a regular request handler the fetch deadline is capped at 60 seconds, whether it runs on a frontend or a backend.
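To make the two limits concrete, a sketch (the URLs are illustrative):
from google.appengine.api import urlfetch

# In a regular request handler the whole request must finish within
# 60 seconds, so this is the highest useful fetch deadline there:
resp = urlfetch.fetch("http://example.com/slow", deadline=60)

# In a task queue or cron handler the request itself may run for up to
# 10 minutes, so the fetch deadline can be raised accordingly:
resp = urlfetch.fetch("http://example.com/slower", deadline=600)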

If Google App Engine cron jobs have a 10 minute limit, then why do I get a DeadlineExceededError after the normal 30 seconds?

According to https://developers.google.com/appengine/docs/python/config/cron, cron jobs can run for 10 minutes. However, when I try to test it by going to the URL for the cron job while signed in as an admin, it times out with a DeadlineExceededError. Best I can tell, this happens about 30 seconds in, which is the non-cron limit for requests. Do I need to do something special to test it with the cron rules rather than the normal limits?
Here's what I'm doing:
Going to the url for the cron job
This calls my handler which calls a single function in my py script
This function does a database call to Google's Cloud SQL and loops through the resulting rows, calling a function on each row that uses eBay's API to get some data
The data from the eBay API calls is stored in an array, to all be written back to the database after the calls are done.
Once the loop is done, it writes the data to the database and returns back to the handler
The handler prints a done message
It always has issues during the looping eBay API calls. Something like 500 API calls have to be made in the loop.
Any idea why I'm not getting the full 10 minutes for this?
Edit: I can post actual code if you think it would help, but I'm assuming it's my process that's wrong rather than an error in the code, since it works just fine if I limit the query to about 60 API calls.
The way GAE executes a cron job allows it to run for 10 minutes. This is probably done (I'm just guessing here) by checking the user agent, IP address, or some other method. Just because you set up a cron job to hit a URL in your application doesn't mean a standard HTTP request from your browser will be allowed to run for 10 minutes.
The way to test if the job works is to do so on the local dev server where there is no limit. Or wait until your cron job executes and check the logs for any errors.
Hope this helps!
Here is how you can disambiguate the exception and tell whether it's a urlfetch problem. The possibilities are:
* google.appengine.runtime.DeadlineExceededError: raised if the overall request times out, typically after 60 seconds, or 10 minutes for task queue requests;
* google.appengine.runtime.apiproxy_errors.DeadlineExceededError: raised if an RPC exceeded its deadline. This is typically 5 seconds, but it is settable for some APIs using the 'deadline' option;
* google.appengine.api.urlfetch_errors.DeadlineExceededError: raised if the URLFetch times out.
If you are seeing the last one, it's a urlfetch issue; see https://developers.google.com/appengine/articles/deadlineexceedederrors.
If it's the urlfetch that's timing out, try setting a longer deadline (e.g. 60 seconds):
result = urlfetch.fetch(url, deadline=60)
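If you need to handle the timeout rather than just extend it, a small sketch (the fallback is illustrative):
from google.appengine.api import urlfetch
from google.appengine.api import urlfetch_errors

try:
    result = urlfetch.fetch(url, deadline=60)
except urlfetch_errors.DeadlineExceededError:
    # The fetch itself timed out; fall back or re-raise as appropriate.
    result = None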

app engine python urlfetch timing out

I have two App Engine applications running that I want to communicate via a RESTful interface. Once the data of one is updated, it calls a webhook on the second, which retrieves a fresh copy of the data for its own system.
Inside 'site1' I have:
from google.appengine.api import urlfetch

url = "http://www.site2.com/data_updated"
result = urlfetch.fetch(url)
Inside the handler for data_updated on 'site2' I have:
url = "http://www.site1.com/get_new_data"
result = urlfetch.fetch(url)
There is very little data being passed between the two sites but I receive the following error. I've tried increasing the deadline to 10 seconds but this still doesn't work.
DeadlineExceededError: ApplicationError: 5
Can anyone provide any insight into what might be happening?
Thanks - Richard
App Engine's urlfetch doesn't always behave as expected; you have about 10 seconds to fetch the URL. Assuming the URL you're trying to fetch is up and running, you should be able to catch the DeadlineExceededError by importing from google.appengine.runtime import apiproxy_errors and then wrapping the urlfetch call in a try/except block using except apiproxy_errors.DeadlineExceededError:.
Relevant answer here.
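A minimal sketch of that wrapping (the deadline value and fallback are illustrative):
from google.appengine.api import urlfetch
from google.appengine.runtime import apiproxy_errors

try:
    result = urlfetch.fetch(url, deadline=10)
except apiproxy_errors.DeadlineExceededError:
    # The RPC exceeded its deadline; handle or retry here.
    result = None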
Changing the call
from
result = urlfetch.fetch(url)
to
result = urlfetch.fetch(url, deadline=2, method=urlfetch.POST)
has fixed the deadline errors.
From the urlfetch documentation:
deadline
The maximum amount of time to wait for a response from the remote host, as a number of seconds. If the remote host does not respond in this amount of time, a DownloadError is raised.
Time spent waiting for a request does not count toward the CPU quota for the request. It does count toward the request timer. If the app request timer expires before the URL Fetch call returns, the call is canceled.
The deadline can be up to a maximum of 60 seconds for request handlers and 10 minutes for task queue and cron job handlers. If deadline is None, the deadline is set to 5 seconds.
Have you tried manually querying the URLs (www.site2.com/data_updated and www.site1.com/get_new_data) with curl or otherwise to make sure that they're responding within the time limit? Even if the amount of data that needs to be transferred is small, maybe there's a problem with the handler that's causing a delay in returning the results.
The amount of data being transferred is not the problem here, the latency is.
If the app you are talking to is often taking > 10 secs to respond, you will have to use a "proxy callback" server on another cloud platform (EC2, etc.). If you can hold off for a while, the new backend instances are supposed to relax the urlfetch time limits somewhat.
If the average response time is < 10 secs, and only relatively few calls are failing, just retry a few times. I hope for your sake the calls are idempotent (i.e. that a retry doesn't have adverse effects). If not, you might be able to roll your own layer on top; it's a bit painful but it works OK, and it's what we do.
J
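A minimal sketch of such a retry layer (the function name and parameters are illustrative; only safe if the calls really are idempotent):
from google.appengine.api import urlfetch
from google.appengine.api import urlfetch_errors

def fetch_with_retries(url, attempts=3, deadline=10):
    # Retry the fetch a few times before giving up.
    for attempt in range(attempts):
        try:
            return urlfetch.fetch(url, deadline=deadline)
        except urlfetch_errors.DeadlineExceededError:
            if attempt == attempts - 1:
                raise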
The GAE docs now state the deadline can be 60 sec:
result = urlfetch.fetch(url, deadline=60, method=urlfetch.POST)

How to set timeout for urlfetch in Google App Engine?

I'm trying to have Django (on top of GAE) fetch data from another web service. I'm often hit with an error like this:
ApplicationError: 2 timed out
Request Method: GET
Request URL: http://localhost:8080/
Exception Type: DownloadError
Exception Value: ApplicationError: 2 timed out
Exception Location: /google_appengine/google/appengine/api/urlfetch.py in _get_fetch_result, line 325
It seems to time out after only about 12 seconds (I'm not sure, but it's really short).
Question: how can I set a longer timeout?
Seeing as this is a Python question, I thought I'd provide a Python answer for anyone who comes across this problem.
Just import urlfetch and then define a deadline before doing anything else in your code:
from google.appengine.api import urlfetch
urlfetch.set_default_fetch_deadline(60)
You can set it using the deadline argument of the fetch function. From the docs:
The deadline can be up to a maximum of 60 seconds for request handlers and 10 minutes for tasks queue and cron job handlers. If deadline is None, the deadline is set to 5 seconds.
Edit: looks like this has changed now. From here:
You can set a deadline for a request, the most amount of time the service will wait for a response. By default, the deadline for a fetch is 5 seconds. You can adjust the default deadline for requests using the urlfetch.set_default_fetch_deadline() function.
And this page lists the default timeout values:
Currently, there are several errors named DeadlineExceededError for the Python runtime:
google.appengine.runtime.DeadlineExceededError: raised if the overall request times out, typically after 60 seconds, or 10 minutes for task queue requests.
google.appengine.runtime.apiproxy_errors.DeadlineExceededError: raised if an RPC exceeded its deadline. This is typically 5 seconds, but it is settable for some APIs using the 'deadline' option.
google.appengine.api.urlfetch_errors.DeadlineExceededError: raised if the URLFetch times out.
For Go, you might want to try the code below.
// createClient is urlfetch.Client with Deadline
func createClient(context appengine.Context, t time.Duration) *http.Client {
    return &http.Client{
        Transport: &urlfetch.Transport{
            Context:  context,
            Deadline: t,
        },
    }
}
Here is how to use it.
// urlfetch
client := createClient(c, time.Second*60)
It seems short, but you have to know that the timeout of a request on GAE is around 30 seconds. Since you probably need to do some operations on the response of your urlfetch, I think there's no need for a timeout of more than 10 seconds.
