So I wrote a Python script that iterates over a list of URLs and records the time it takes to get a response. Some of these URLs can take upwards of a minute to respond the first time they are called, which is expected (expensive API calls), but they are practically instantaneous the second time (Redis cache).
When I run the script on my windows machine, it works as expected for all URLs.
On my Linux server it runs as expected until it hits a URL that takes upwards of about 30 seconds to respond. At that point the call to requests.get(url, timeout=600) does not return until the 10-minute timeout is reached, and then comes back with a "Read Timeout". Calling the same URL again afterwards results in a fast, successful request, because the response has now been cached in Redis. (So the request must have finished on the server providing the API.)
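For reference, a minimal version of such a timing loop might look like the following. The HTTP call is injected as a callable (e.g. requests.get) rather than hard-coded, so the names here are illustrative, not the original script:

```python
import time

def time_url(url, fetch, timeout=600):
    """Time a single request.

    `fetch` is any callable with the requests.get(url, timeout=...)
    signature; it is injected so the timing logic stands alone.
    """
    start = time.monotonic()
    response = fetch(url, timeout=timeout)
    return time.monotonic() - start, response

def time_urls(urls, fetch, timeout=600):
    """Map each URL to the number of seconds its request took."""
    return {url: time_url(url, fetch, timeout)[0] for url in urls}
```

In the real script this would be called as `time_urls(urls, requests.get)`.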
I would be thankful for any ideas as to what might be causing this weird behavior.
Related
I have a Python function using cachetools to cache its result with a TTL.
The function itself takes a couple of minutes to run. Is there a way to cache the result the first time and then, whenever the TTL expires, keep returning the stale value while the actual function runs in the background, updating the cached value only when it finishes, instead of making the next caller wait for the real, slow invocation?
Edit: It's a Flask app; when I open a page, it triggers a request to a remote API that takes a long time to respond.
While that happens, the page usually errors because of the timeout.
This is not the end of the world, because I can start the app and call the function once to have the cache ready, but once its TTL expires, the next person to request the page gets the error.
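cachetools has no built-in stale-while-revalidate mode, so here is a hand-rolled sketch of the idea (the class and method names are my own, not a real cachetools API): serve the stale value as soon as the TTL expires and refresh it in a background thread, so only the very first caller ever waits.

```python
import threading
import time

class StaleWhileRevalidateCache:
    """Cache one function's result with a TTL; once the TTL expires,
    keep serving the stale value while a background thread refreshes it."""

    def __init__(self, func, ttl):
        self.func = func
        self.ttl = ttl
        self._lock = threading.Lock()
        self._value = None
        self._expires = 0.0      # monotonic timestamp when the value goes stale
        self._refreshing = False

    def get(self, *args, **kwargs):
        with self._lock:
            have_value = self._expires > 0.0
            if have_value and time.monotonic() < self._expires:
                return self._value                  # still fresh
            if have_value:
                if not self._refreshing:
                    # Serve the stale value; refresh in the background.
                    self._refreshing = True
                    threading.Thread(
                        target=self._refresh, args=args, kwargs=kwargs,
                        daemon=True,
                    ).start()
                return self._value
        # First call ever: this caller has to wait for the real invocation.
        # (Two simultaneous first calls may both compute; acceptable here.)
        value = self.func(*args, **kwargs)
        with self._lock:
            self._store(value)
        return value

    def _refresh(self, *args, **kwargs):
        value = self.func(*args, **kwargs)
        with self._lock:
            self._store(value)

    def _store(self, value):
        self._value = value
        self._expires = time.monotonic() + self.ttl
        self._refreshing = False
```

Wrapping the slow function in this and priming it once at app startup would keep every later page load fast, at the cost of occasionally serving a value up to one refresh older than the TTL.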
I have a Flask/Gunicorn backend running a machine learning process that takes around 20 minutes. A POST request triggers the function and returns the output.
Everything works fine when I run the request through cURL, but when I run the same request from the browser front-end, a few minutes into the request the same Flask process starts again without terminating the first one, so I end up running two requests simultaneously, which increases the run time.
What causes that? I know that cURL doesn't do the initial OPTIONS request; is it possible that OPTIONS triggers the process before the POST arrives?
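One way to rule the preflight in or out is to answer OPTIONS before it ever reaches the slow view, e.g. with a tiny WSGI middleware in front of the Flask app. This is only a sketch; the header values shown are illustrative, not a complete CORS setup:

```python
def preflight_middleware(app):
    """Answer CORS preflight (OPTIONS) requests immediately instead of
    letting them reach the (slow) view function."""
    def wrapped(environ, start_response):
        if environ.get("REQUEST_METHOD") == "OPTIONS":
            start_response("204 No Content", [
                ("Access-Control-Allow-Origin", "*"),
                ("Access-Control-Allow-Methods", "POST, OPTIONS"),
                ("Access-Control-Allow-Headers", "Content-Type"),
            ])
            return [b""]
        return app(environ, start_response)
    return wrapped

# For a Flask app: app.wsgi_app = preflight_middleware(app.wsgi_app)
```

If the duplicate run disappears with this in place, the preflight was indeed reaching the view; if not, the more likely culprit is the browser (or a proxy) silently retrying a request that produced no response before its own timeout.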
I don't know if "queue time" is the right term for what I'm trying to log; maybe TTFB (time to first byte) is more correct.
I'm trying to explain better with a test I did:
I wrote a little Python app (Flask framework) with one function (one endpoint) that needs about 5 seconds to complete (I get the same result with a sleep of 5 seconds).
I used uWSGI as application server, configured with 1 process and 1 thread, and nginx as reverse proxy.
With this configuration, if I make two concurrent requests from the browser, the first finishes in about 5 seconds and the second in about 10 seconds.
That's all right: with only one uWSGI process, the second request must wait until the first is completed. But what I want to log is the time the second request stays in the "queue" waiting to be processed by uWSGI.
I tried all the nginx log variables I could find that seemed relevant to my need:
$request_time
request processing time in seconds with a milliseconds resolution; time elapsed between the first bytes were read from the client and the log write after the last bytes were sent to the client
$upstream_response_time
keeps time spent on receiving the response from the upstream server; the time is kept in seconds with millisecond resolution.
$upstream_header_time
keeps time spent on receiving the response header from the upstream server (1.7.10); the time is kept in seconds with millisecond resolution.
but all of them report the same time, about 5 seconds, for both requests.
I also tried to add the variable $msec to the log
time in seconds with a milliseconds resolution at the time of the log write
and a custom variable $my_start_time, initialized at the start of the server section with set $my_start_time "${msec}";. In this context $msec is:
current time in seconds with the milliseconds resolution
but in this case too, the difference between the two times is about 5 seconds for both requests.
I suppose nginx must know the time I am trying to log, or at least the total time of the request, from which I could subtract the "request time" and get the waiting time.
If I analyze the requests with the Chrome browser and check the waterfall, I see for the first request a total time of about 5 seconds, almost all of it in the row "Waiting (TTFB)". For the second request I see a total time of about 10 seconds: about 5 in the row "Waiting (TTFB)" and about 5 in the row "Stalled".
The time I want to log from the server side is the "Stalled" time reported by Chrome; from this question:
Understanding Chrome network log "Stalled" state
I understand that this time is related to proxy negotiation, so I suppose it is related to nginx acting as a reverse proxy.
The test configuration uses a long process in order to measure these times more easily, but the queue time will be present, albeit shorter, whenever there are more concurrent requests than uWSGI processes.
Did I miss something in my reasoning?
What is the correct name of this "queue time"?
How can I log it?
Thanks in advance for any suggestion
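For what it's worth, one common server-side trick (used by several APM tools) is to have nginx stamp its receive time into a request header, e.g. proxy_set_header X-Request-Start "t=${msec}";, and let the application subtract that timestamp from its own clock: the difference is roughly the queue/stalled time. A sketch of the application side, where the header name and "t=&lt;seconds&gt;" format are an assumed convention rather than anything built into nginx:

```python
import time

def queue_time(header_value, now=None):
    """Seconds a request waited between nginx receiving it and the
    application seeing it, given a header like "t=1700000000.123"
    (nginx's $msec at the moment the request arrived)."""
    if now is None:
        now = time.time()
    if not header_value or not header_value.startswith("t="):
        return None
    try:
        start = float(header_value[2:])
    except ValueError:
        return None
    # Clamp at zero in case of clock skew between nginx and the app.
    return max(0.0, now - start)
```

In a Flask view this would be called with request.headers.get("X-Request-Start"), and the result logged alongside the request.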
According to https://developers.google.com/appengine/docs/python/config/cron, cron jobs can run for 10 minutes. However, when I try to test it by going to the URL for the cron job while signed in as an admin, it times out with a DeadlineExceededError. As best I can tell, this happens about 30 seconds in, which is the non-cron limit for requests. Do I need to do something special to test it with the cron rules rather than the normal limits?
Here's what I'm doing:
Going to the url for the cron job
This calls my handler which calls a single function in my py script
This function does a database call to Google's Cloud SQL and loops through the resulting rows, calling a function on each row that uses eBay's API to get some data
The data from each eBay API call is stored in an array to be written back to the database after all the calls are done.
Once the loop is done, it writes the data to the database and returns to the handler
The handler prints a done message
It always has issues during the looping eBay API calls; something like 500 API calls have to be made in the loop.
Any idea why I'm not getting the full 10 minutes for this?
Edit: I can post the actual code if you think it would help, but I'm assuming it's my process that's wrong rather than an error in the code, since it works just fine if I limit the query to about 60 API calls.
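One pattern that helps regardless of where the deadline comes from is to write results back in batches instead of holding all ~500 results until the end, so a timeout mid-loop loses at most one batch of work. A sketch, where the function names are placeholders for the handler's real helpers, not anything from the question:

```python
def refresh_rows(rows, fetch_item, write_back, batch_size=50):
    """Fetch data for each row and write results back in batches, so a
    request deadline in mid-loop loses at most one batch of work.

    fetch_item(row)  -> the remote API call for one row
    write_back(list) -> persists a batch of results to the database
    """
    batch = []
    for row in rows:
        batch.append(fetch_item(row))
        if len(batch) >= batch_size:
            write_back(batch)
            batch = []
    if batch:
        write_back(batch)   # flush the final partial batch
```

Combined with recording which rows are already done, this also makes it possible to resume from where a timed-out run stopped.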
The way GAE executes a cron job allows it to run for 10 minutes. This is probably done (I'm just guessing here) by checking the user agent, IP address, or some other method. Just because you set up a cron job to hit a URL in your application doesn't mean a standard HTTP request from your browser will be allowed to run for 10 minutes.
The way to test whether the job works is to do so on the local dev server, where there is no limit. Or wait until your cron job executes and check the logs for any errors.
Hope this helps!
Here is how you can classify the exception and tell whether it's a urlfetch problem. If the exception is:
* google.appengine.runtime.DeadlineExceededError: raised if the overall request times out, typically after 60 seconds, or 10 minutes for task queue requests;
* google.appengine.runtime.apiproxy_errors.DeadlineExceededError: raised if an RPC exceeded its deadline. This is typically 5 seconds, but it is settable for some APIs using the 'deadline' option;
* google.appengine.api.urlfetch_errors.DeadlineExceededError: raised if the URLFetch times out.
then see https://developers.google.com/appengine/articles/deadlineexceedederrors as it's a urlfetch issue.
If it's the urlfetch that's timing out, try setting a longer deadline (e.g. 60 seconds):
result = urlfetch.fetch(url, deadline=60)
I have two instances of App Engine applications running that I want to communicate via a RESTful interface. Once the data on one is updated, it calls a webhook on the second, which retrieves a fresh copy of the data for its own system.
Inside 'site1' I have:
from google.appengine.api import urlfetch
url = "http://www.site2.com/data_updated"
result = urlfetch.fetch(url)
Inside the handler for data_updated on 'site2' I have:
url = "http://www.site1.com/get_new_data"
result = urlfetch.fetch(url)
There is very little data being passed between the two sites but I receive the following error. I've tried increasing the deadline to 10 seconds but this still doesn't work.
DeadlineExceededError: ApplicationError: 5
Can anyone provide any insight into what might be happening?
Thanks - Richard
App Engine's urlfetch doesn't always behave as expected; you have about 10 seconds to fetch the URL. Assuming the URL you're trying to fetch is up and running, you should be able to catch the DeadlineExceededError by importing from google.appengine.runtime import apiproxy_errors and then wrapping the urlfetch call in a try/except block using except apiproxy_errors.DeadlineExceededError:.
Relevant answer here.
Changing the method
from
result = urlfetch.fetch(url)
to
result = urlfetch.fetch(url, deadline=2, method=urlfetch.POST)
has fixed the Deadline errors.
From the urlfetch documentation:
deadline
The maximum amount of time to wait for a response from the
remote host, as a number of seconds. If the remote host does not
respond in this amount of time, a DownloadError is raised.
Time spent waiting for a request does not count toward the CPU quota
for the request. It does count toward the request timer. If the app
request timer expires before the URL Fetch call returns, the call is
canceled.
The deadline can be up to a maximum of 60 seconds for request handlers
and 10 minutes for tasks queue and cron job handlers. If deadline is
None, the deadline is set to 5 seconds.
Have you tried manually querying the URLs (www.site2.com/data_updated and www.site1.com/get_new_data) with curl or otherwise to make sure that they're responding within the time limit? Even if the amount of data that needs to be transferred is small, maybe there's a problem with the handler that's causing a delay in returning the results.
The amount of data being transferred is not the problem here, the latency is.
If the app you are talking to often takes > 10 seconds to respond, you will have to use a "proxy callback" server on another cloud platform (EC2, etc.). If you can hold off for a while, the new backend instances are supposed to relax the urlfetch time limits somewhat.
If the average response time is < 10 seconds and only relatively few requests are failing, just retry a few times. I hope for your sake the calls are idempotent (so that a retry doesn't have adverse effects). If not, you might be able to roll your own layer on top; it's a bit painful, but it works OK, and it's what we do.
J
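The retry layer described above can be sketched as follows. The fetch callable is injected and the backoff policy is illustrative; on App Engine it would wrap urlfetch.fetch:

```python
import time

def call_with_retries(fetch, url, attempts=3, backoff=1.0):
    """Call an idempotent `fetch(url)` up to `attempts` times.

    Sleeps a little longer before each retry; re-raises the last
    exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (attempt + 1))
```

Because this retries on any exception, it is only safe for calls with no side effects (or with idempotent ones), exactly as the answer warns.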
The GAE docs now state the deadline can be up to 60 seconds:
result = urlfetch.fetch(url, deadline=60, method=urlfetch.POST)