app engine python urlfetch timing out - python

I have two App Engine applications that I want to communicate over a RESTful interface. Once the data on one is updated, it calls a webhook on the second, which then retrieves a fresh copy of the data for its own system.
Inside 'site1' I have:
from google.appengine.api import urlfetch
url = "http://www.site2.com/data_updated"
result = urlfetch.fetch(url)
Inside the handler for data_updated on 'site2' I have:
url = "http://www.site1.com/get_new_data"
result = urlfetch.fetch(url)
There is very little data being passed between the two sites but I receive the following error. I've tried increasing the deadline to 10 seconds but this still doesn't work.
DeadlineExceededError: ApplicationError: 5
Can anyone provide any insight into what might be happening?
Thanks - Richard

App Engine's urlfetch doesn't always behave as expected; you have about 10 seconds to fetch the URL. Assuming the URL you're trying to fetch is up and running, you should be able to catch the DeadlineExceededError: import apiproxy_errors with from google.appengine.runtime import apiproxy_errors and wrap the urlfetch call in a try/except block using except apiproxy_errors.DeadlineExceededError:.
Relevant answer here.
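Something along these lines should work (the 10-second deadline is just an example value, not from the original answer):
from google.appengine.api import urlfetch
from google.appengine.runtime import apiproxy_errors

try:
    result = urlfetch.fetch(url, deadline=10)
except apiproxy_errors.DeadlineExceededError:
    # The RPC ran out of time; log it, retry, or fall back as appropriate.
    result = None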

Changing the method
from
result = urlfetch.fetch(url)
to
result = urlfetch.fetch(url, deadline=2, method=urlfetch.POST)
has fixed the Deadline errors.
From the urlfetch documentation:
deadline
The maximum amount of time to wait for a response from the
remote host, as a number of seconds. If the remote host does not
respond in this amount of time, a DownloadError is raised.
Time spent waiting for a request does not count toward the CPU quota
for the request. It does count toward the request timer. If the app
request timer expires before the URL Fetch call returns, the call is
canceled.
The deadline can be up to a maximum of 60 seconds for request handlers
and 10 minutes for task queue and cron job handlers. If deadline is
None, the deadline is set to 5 seconds.

Have you tried manually querying the URLs (www.site2.com/data_updated and www.site1.com/get_new_data) with curl or otherwise to make sure that they're responding within the time limit? Even if the amount of data that needs to be transferred is small, maybe there's a problem with the handler that's causing a delay in returning the results.
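For example, something like this run from your own machine would show the response times; the requests library is used here purely as an illustration, and the URLs are the ones from the question:
import time
import requests

for url in ("http://www.site2.com/data_updated", "http://www.site1.com/get_new_data"):
    start = time.time()
    resp = requests.get(url, timeout=30)
    print("%s -> HTTP %s in %.2fs" % (url, resp.status_code, time.time() - start))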

The amount of data being transferred is not the problem here; the latency is.
If the app you are talking to often takes more than 10 seconds to respond, you will have to use a "proxy callback" server on another cloud platform (EC2, etc.). If you can hold off for a while, the new backend instances are supposed to relax the urlfetch time limits somewhat.
If the average response time is under 10 seconds and only a relative few requests are failing, just retry a few times. I hope for your sake the calls are idempotent (i.e. that a retry doesn't have adverse effects). If not, you might be able to roll your own layer on top; it's a bit painful, but it works OK, and it's what we do.
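A minimal retry sketch along those lines, assuming the calls really are idempotent (the helper name and limits are illustrative, not from the answer):
from google.appengine.api import urlfetch
from google.appengine.api import urlfetch_errors

def fetch_with_retries(url, attempts=3, deadline=10):
    # Retry a few times on deadline errors; only safe if the handler
    # behind `url` is idempotent.
    for attempt in range(attempts):
        try:
            return urlfetch.fetch(url, deadline=deadline)
        except urlfetch_errors.DeadlineExceededError:
            if attempt == attempts - 1:
                raise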
J

The GAE docs now state the deadline can be up to 60 seconds:
result = urlfetch.fetch(url, deadline=60, method=urlfetch.POST)

Related

BigQuery Python client - meaning of timeout parameter, and how to set query result timeout

This question is about the timeout parameter in the result method of QueryJob objects in the BigQuery Python client.
It looks like the meaning of timeout has changed since version 1.24.0.
For example, the documentation for QueryJob's result in version 1.24.0 states that timeout is:
The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout is interpreted as the approximate total time of all requests.
As I understand it, this could be used as a way to limit the total time that the result method call will wait for the results.
For example, consider the following script:
import logging
from google.cloud import bigquery
# Set logging level to DEBUG in order to see the HTTP requests
# being made by urllib3
logging.basicConfig(level=logging.DEBUG)
PROJECT_ID = "project_id" # replace by actual project ID
client = bigquery.Client(project=PROJECT_ID)
QUERY = ('SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
         'WHERE state = "TX" '
         'LIMIT 100')
TIMEOUT = 30 # in seconds
query_job = client.query(QUERY) # API request - starts the query
assert query_job.state == 'RUNNING'
# Waits for the query to finish
iterator = query_job.result(timeout=TIMEOUT)
rows = list(iterator)
assert query_job.state == 'DONE'
As I understand it, if all the API calls involved in fetching the results added up to more than 30 seconds, the call to result would give up. So, timeout here serves to limit the total execution time of the result method call.
However, later versions introduced a change. For example, the documentation for result in 1.27.2 states that timeout is:
The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, timeout applies to each individual request.
If I'm understanding this correctly, the example above changes meaning completely, and the call to result could potentially take more than 30 seconds.
My questions are:
What exactly is the difference of the script above if I run it with the new version of result versus the old version?
What are the currently recommended use cases for passing a timeout value to result?
What is the currently recommended way to time out after a given total time while waiting for query results?
Thank you.
As you can see in this fix:
A transport layer timeout is made independent of the query timeout,
i.e. the maximum time to wait for the query to complete.
The query timeout is used by the blocking poll so that the backend
does not block for too long when polling for job completion, but the
transport can have different timeout requirements, and we do not want
it to be raising sometimes unnecessary timeout errors.
Apply timeout to each of the underlying requests
As job methods do not split the timeout anymore between all requests a
method might make, the Client methods are adjusted in the same way.
So the basic difference is that in the previous version, if many requests were made in the layer below, they shared the 30-second timeout. In other words, if the first request took 20 seconds, the second would time out after 10 seconds.
In the new version, every single request gets its own 30 seconds.
As for the use case, it basically depends on your application. If you cannot wait a long time for a request that might be lost, you can decrease your timeout.
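For the third question, a rough sketch of one way to bound the total wait yourself, since timeout now applies per underlying request (this is not from the answer above; the project ID, query, and limits are placeholders):
import time
from google.cloud import bigquery

client = bigquery.Client(project="project_id")  # replace with your project
query_job = client.query("SELECT 1")

OVERALL_DEADLINE = 30  # total seconds we are willing to wait
start = time.time()
while not query_job.done():  # each done() call polls the job status once
    if time.time() - start > OVERALL_DEADLINE:
        query_job.cancel()
        raise TimeoutError("query did not finish within %d seconds" % OVERALL_DEADLINE)
    time.sleep(1)

rows = list(query_job.result())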

Google App Engine urlfetch DeadlineExceededError in push task handler running apiclient batch request

I have a task handler that is making a batch request to the Google Calendar API. After 5 seconds, the request fails with DeadlineExceededError: The API call urlfetch.Fetch() took too long to respond and was cancelled. I have added urlfetch.set_default_fetch_deadline(60) near where I make the batch request, as suggested here, but it does not seem to make a difference: the deadline seems to remain 5 seconds.
I am using the Python Google API Client library which sits on top of oauth2client and httplib2. But my understanding is that GAE intercepts the underlying calls to use urlfetch.Fetch. This is what the stack trace seems to show as well.
Can you see any reason why urlfetch.set_default_fetch_deadline does not seem to be working?
EDIT:
This is the code used to build the batch request:
# note `http` is an oauth2client-authorized http client
cal = apiclient.discovery.build('calendar', 'v3', http=http)
req = cal.new_batch_http_request(callback=_callback)
for event in events:  # anything larger than ~5 events in a batch takes >5 secs
    req.add(
        cal.events().patch(calendarId=calid, eventId=event["id"], body=self._value)
    )
urlfetch.set_default_fetch_deadline(60)  # has no effect
req.execute()
So, urlfetch.set_default_fetch_deadline() did eventually work for me. The problem was that my underlying HTTP client (oauth2client / httplib2) was essentially stored in a global. Once I created it in the task handler thread, set_default_fetch_deadline worked.
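A rough sketch of that arrangement, assuming `credentials` (from oauth2client) and `_callback` exist as in the snippet above; the handler class and names are only illustrative:
import httplib2
import webapp2
import apiclient.discovery
from google.appengine.api import urlfetch

class PatchEventsHandler(webapp2.RequestHandler):  # hypothetical task handler
    def post(self):
        # Set the deadline and build the HTTP client inside the handler,
        # not at module level, so the new default is actually picked up.
        urlfetch.set_default_fetch_deadline(60)
        http = credentials.authorize(httplib2.Http())
        cal = apiclient.discovery.build('calendar', 'v3', http=http)
        req = cal.new_batch_http_request(callback=_callback)
        # ... add the batched patch calls exactly as in the snippet above ...
        req.execute()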
Try adding the deadline parameter:
my_result = urlfetch.fetch(my_url, deadline=15)

App Engine Python UrlFetch.set_default_fetch_deadline

I have looked through the docs here:
https://cloud.google.com/appengine/docs/python/urlfetch/
and here:
https://cloud.google.com/appengine/articles/deadlineexceedederrors?hl=en
I also found this stack overflow question which relates to my question:
How to set timeout for urlfetch in Google App Engine?
I am connecting from my App Engine app to an outside web service that I do not have control over. Sometimes the requests take longer than 60 seconds. I set my application up to use the App Engine deferred task queue API.
I am so confused. In the docs I've read, it seems as though urlfetch has a maximum deadline of 60 seconds, but if it's running in a task queue it's 10 minutes? I really just need someone to clarify this for me.
Does that mean the task has 10 minutes to complete, but the urlfetch inside the task is still limited to 60 seconds?
Pseudocode:
from google.appengine.api import urlfetch
from google.appengine.ext import deferred

class newTask:
    url = "https://example.com"

    def long_process(self):
        # will setting the deadline to more than 60 seconds work or not?
        urlfetch.set_default_fetch_deadline(120)
        data = ""  # form-encoded payload string
        resp = urlfetch.fetch(self.url, method="POST", payload=data)
        # do something with resp....

myTask = newTask()
deferred.defer(myTask.long_process, _queue="myqueue")
You're on the right track. Tiny correction: there is no 60s max for urlfetch.set_default_fetch_deadline(); you might have been misled by the context of the discussion.
You can bump the 120 value up to 600; see the OP's comment on the selected answer in this recent Q&A: Appengine task runs for 5 seconds before throwing DeadlineExceededError
You can control both the urlfetch and the deferred task deadline.
Both can run for up to 600s AFAIK.
The one thing you shouldn't do is set the urlfetch deadline to a higher value than the task's ;)

If Google App Engine cron jobs have a 10 minute limit, then why do I get a DeadlineExceededError after the normal 30 seconds?

According to https://developers.google.com/appengine/docs/python/config/cron, cron jobs can run for 10 minutes. However, when I try to test it by going to the URL for the cron job while signed in as an admin, it times out with a DeadlineExceededError. Best I can tell, this happens about 30 seconds in, which is the non-cron limit for requests. Do I need to do something special to test it with the cron rules versus the normal limits?
Here's what I'm doing:
1. Going to the URL for the cron job.
2. This calls my handler, which calls a single function in my .py script.
3. This function does a database call to Google's Cloud SQL and loops through the resulting rows, calling a function on each row that uses eBay's API to get some data.
4. The data from each eBay API call is stored in an array, to all be written back to the database after all the calls are done.
5. Once the loop is done, it writes the data to the database and returns to the handler.
6. The handler prints a done message.
It always has issues during the looping eBay API calls. It's something like 500 API calls that have to be made in the loop.
Any idea why I'm not getting the full 10 minutes for this?
Edit: I can post actual code if you think it would help, but I'm assuming it's the process that I'm doing wrong rather than an error in the code, since it works just fine if I limit the query to about 60 API calls.
The way GAE executes a cron job allows it to run for 10 minutes. This is probably done (I'm just guessing here) by checking the user agent, IP address, or some other method. Just because you set up a cron job to hit a URL in your application doesn't mean a standard HTTP request from your browser will be allowed to run for 10 minutes.
The way to test whether the job works is to run it on the local dev server, where there is no limit, or to wait until your cron job executes and check the logs for any errors.
Hope this helps!
Here is how you can identify the exception and tell whether it's a urlfetch problem. The possible exceptions are:
* google.appengine.runtime.DeadlineExceededError: raised if the overall request times out, typically after 60 seconds, or 10 minutes for task queue requests;
* google.appengine.runtime.apiproxy_errors.DeadlineExceededError: raised if an RPC exceeded its deadline. This is typically 5 seconds, but it is settable for some APIs using the 'deadline' option;
* google.appengine.api.urlfetch_errors.DeadlineExceededError: raised if the URLFetch times out.
If you're seeing the URLFetch one, it's a urlfetch issue; see https://developers.google.com/appengine/articles/deadlineexceedederrors.
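A small sketch for telling the two catchable ones apart around a fetch call (assuming `url` is defined; the logging calls and the 10-second deadline are just illustrative):
import logging
from google.appengine.api import urlfetch
from google.appengine.api import urlfetch_errors
from google.appengine.runtime import apiproxy_errors

try:
    result = urlfetch.fetch(url, deadline=10)
except urlfetch_errors.DeadlineExceededError:
    logging.exception("the URLFetch itself timed out")
except apiproxy_errors.DeadlineExceededError:
    logging.exception("some other RPC exceeded its deadline")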
If it's the urlfetch that's timing out, try setting a longer deadline (e.g. 60 seconds):
result = urlfetch.fetch(url, deadline=60)

How to set timeout for urlfetch in Google App Engine?

I'm trying to have Django (on top of GAE) fetch data from another web service. I'm often hit with errors like this:
ApplicationError: 2 timed out
Request Method: GET
Request URL: http://localhost:8080/
Exception Type: DownloadError
Exception Value: ApplicationError: 2 timed out
Exception Location: /google_appengine/google/appengine/api/urlfetch.py in _get_fetch_result, line 325
It feels as if it times out after only about 12 seconds (I'm not sure exactly, but it's really short).
Question: how can I set a longer timeout?
Seeing as this is a Python question, I thought I'd provide a Python answer for anyone who comes across this problem.
Just import urlfetch and then define a deadline before doing anything else in your code:
from google.appengine.api import urlfetch
urlfetch.set_default_fetch_deadline(60)
You can set it using the deadline argument of the fetch function. From the docs:
The deadline can be up to a maximum of 60 seconds for request handlers and 10 minutes for tasks queue and cron job handlers. If deadline is None, the deadline is set to 5 seconds.
Edit: looks like this has changed now. From here:
You can set a deadline for a request, the most amount of time the service will wait for a response. By default, the deadline for a fetch is 5 seconds. You can adjust the default deadline for requests using the urlfetch.set_default_fetch_deadline() function.
And this page lists the default timeout values:
Currently, there are several errors named DeadlineExceededError for the Python runtime:
google.appengine.runtime.DeadlineExceededError: raised if the overall request times out, typically after 60 seconds, or 10 minutes for task queue requests.
google.appengine.runtime.apiproxy_errors.DeadlineExceededError: raised if an RPC exceeded its deadline. This is typically 5 seconds, but it is settable for some APIs using the 'deadline' option.
google.appengine.api.urlfetch_errors.DeadlineExceededError: raised if the URLFetch times out.
For Go, you might want to try the code below.
// createClient is urlfetch.Client with Deadline
func createClient(context appengine.Context, t time.Duration) *http.Client {
    return &http.Client{
        Transport: &urlfetch.Transport{
            Context:  context,
            Deadline: t,
        },
    }
}
Here is how to use it.
// urlfetch
client := createClient(c, time.Second*60)
It seems short, but you have to know that the timeout of a request on GAE is around 30 seconds. Since you probably need to do some operations on the response of your urlfetch, there's no need for a deadline of more than 10 seconds, I think.
