Log nginx "queue time" - python

I don't know if "queue time" is the right term for what i'm trying to log, maybe TTFB (time to first byte) is more correct.
I'm trying to explain better with a test I did:
I wrote a little python app (flask framework), with one function (one endpoint) that need about 5 seconds to complete the process (but same result with a sleep of 5 seconds).
I used uWSGI as application server, configured with 1 process and 1 thread, and nginx as reverse proxy.
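Roughly, the app looks like this (simplified sketch; the route name is only illustrative):
import time
from flask import Flask

app = Flask(__name__)

@app.route("/slow")
def slow():
    time.sleep(5)  # stands in for the ~5 second processing step
    return "done"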
With this configuration, if I make two concurrent requests from the browser, the first finishes in about 5 seconds and the second in about 10 seconds.
That is expected: with only one uWSGI process, the second request must wait until the first has completed. What I want to log is the time the second request spends in the "queue" waiting to be processed by uWSGI.
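The same behaviour can be reproduced outside the browser with a small script that fires the two requests concurrently and times them (the local URL is just an example):
import threading
import time
import requests

URL = "http://localhost/slow"  # example address served by nginx

def timed_get(label):
    start = time.time()
    requests.get(URL)
    print("%s finished in %.1f s" % (label, time.time() - start))

threads = [threading.Thread(target=timed_get, args=("request %d" % i,)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()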
I tried all the nginx log variables I could find that seemed relevant to my need:
$request_time
request processing time in seconds with a milliseconds resolution; time elapsed between the first bytes were read from the client and the log write after the last bytes were sent to the client
$upstream_response_time
keeps time spent on receiving the response from the upstream server; the time is kept in seconds with millisecond resolution.
$upstream_header_time
keeps time spent on receiving the response header from the upstream server (1.7.10); the time is kept in seconds with millisecond resolution.
but all of them report the same time, about 5 seconds, for both requests.
I also tried adding the variable $msec to the log
time in seconds with a milliseconds resolution at the time of the log write
and a custom variable $my_start_time, initialized at the start of the server section with set $my_start_time "${msec}"; in this context $msec is:
current time in seconds with the milliseconds resolution
but in this case, too, the difference between the two times is about 5 seconds for both requests.
I suppose nginx should know the time I am trying to log, or at least the total time of the request, from which I could subtract the "request time" and obtain the waiting time.
If I analyze the requests with the Chrome browser and check the waterfall, I see, for the first request, a total time of about 5 seconds, almost all of it in the "Waiting (TTFB)" row, while for the second request I see a total time of about 10 seconds, with about 5 in the "Waiting (TTFB)" row and about 5 in the "Stalled" row.
The time I want to log from the server side is the "Stalled" time reported by Chrome; from this question:
Understanding Chrome network log "Stalled" state
I understand that this time is related to proxy negotiation, so I suppose it is related to nginx acting as a reverse proxy.
The test configuration uses a long-running process in order to make these times easier to measure, but the time will be present, albeit shorter, whenever there are more concurrent requests than uWSGI processes.
Did I miss something in my reasoning?
What is the correct name of this "queue time"?
How can I log it?
Thanks in advance for any suggestion.

Related

Python requests time out on a linux machine but not on windows

So I wrote a Python script that iterates over a list of URLs and records the time it takes to get a response. Some of these URLs can take upwards of a minute to respond, which is expected the first time they are called (expensive API calls), but they are practically instantaneous the second time (Redis cache).
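In essence, the script is a loop like this (the URL list here is only a placeholder):
import time
import requests

urls = ["https://api.example.com/expensive", "https://api.example.com/cheap"]  # placeholders

for url in urls:
    start = time.time()
    response = requests.get(url, timeout=600)
    print("%s -> %s in %.1f s" % (url, response.status_code, time.time() - start))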
When I run the script on my Windows machine, it works as expected for all URLs.
On my Linux server it runs as expected until it hits a URL that takes upwards of about 30 seconds to respond. At that point the call to requests.get(url, timeout=600) does not return until the 10-minute timeout is reached, and then it comes back with a "Read Timeout". Calling the same URL again afterwards results in a fast, successful request, because the response has now been cached in Redis. (So the request must have finished on the server providing the API.)
I would be thankful for any ideas as to what might be causing this weird behavior.

Free Heroku Server: Does Sleep Count as Active Time?

I am planning on having a Python app run on a free Heroku server, but I have read that there is a maximum 18-hour execution time before the process is put to sleep. However, what if my app runs like this:
process something (which should take less than a second),
sleep for 5 minutes.
I plan on having this script run continuously (all day long).
Does the 5-minute sleep count towards the 18-hour time limit?
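In code, the loop would look roughly like this (do_work is just a placeholder for the sub-second task):
import time

def do_work():
    # placeholder for the sub-second processing step
    pass

while True:
    do_work()
    time.sleep(5 * 60)  # idle for five minutes between runs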
I think it will be counted as processing time, because you are using a single thread to process the request, which is different from Heroku's "sleep".
The timeout value is not configurable. If your server requires longer than 30 seconds to complete a given request, we recommend moving that work to a background task or worker to periodically ping your server to see if the processing request has been finished. This pattern frees your web processes up to do more work, and decreases overall application response times.
You can read more here: https://devcenter.heroku.com/articles/request-timeout
If you are willing to wait for 10 minutes, you can try https://elements.heroku.com/addons/scheduler or use some kind of monitoring service like http://godrb.com/.
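With the scheduler add-on, a rough sketch of the alternative design is a script that does one unit of work per run and then exits, letting the scheduler handle the cadence (do_work is again a placeholder):
def do_work():
    # placeholder for the sub-second processing step
    pass

if __name__ == "__main__":
    do_work()  # the scheduler invokes this script at a fixed interval, so no sleep loop is needed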

Limit speed of urlfetch per domain

Is there a way to limit the number of requests that urlfetch makes to any single server, per time unit?
I accidentally DoS'd a site I was crawling, since the async urlfetch API made it branch out until it died (each request spawns more than one new request on average). The logs contain ~200 DeadlineExceeded errors, a millisecond apart.
You could use the time.sleep() method, which suspends execution of the current thread for the given number of seconds.
import time
import urllib2

# ... build the list of urls to fetch ...

for u in urls:
    urllib2.urlopen(u, timeout=4)  # fetch the page with a 4-second timeout
    time.sleep(1)  # pause one second between requests so the target server isn't hammered
https://docs.python.org/2/library/time.html#time.sleep

Django, sleep() pauses all processes, but only if no GET parameter?

Using Django (hosted by Webfaction), I have the following code
import time
from django.http import HttpResponse

def my_function(request):
    time.sleep(10)
    return HttpResponse("Done")
This is executed via Django when I go to my URL, www.mysite.com.
I enter the URL twice, immediately after each other. The way I see it, both of these should finish after 10 seconds. However, the second call waits for the first one and finishes after 20 seconds.
If, however, I add some dummy GET parameter, www.mysite.com?dummy=1 and www.mysite.com?dummy=2, then they both finish after 10 seconds. So it is possible for both of them to run simultaneously.
It's as though the scope of sleep() is somehow global. Maybe adding a parameter makes them run as different processes instead of the same one?
It is hosted by Webfaction. httpd.conf has:
KeepAlive Off
Listen 30961
MaxSpareThreads 3
MinSpareThreads 1
ServerLimit 1
SetEnvIf X-Forwarded-SSL on HTTPS=1
ThreadsPerChild 5
I do need to be able to use sleep() and trust that it isn't stopping everything. So, what's going on, and how do I fix it?
Edit: Webfaction runs this using Apache.
As Gjordis pointed out, sleep will pause the current thread. I have looked at Webfaction and it looks like they are using WSGI to run the serving instance of Django. This means that every time a request comes in, Apache will look at how many worker processes (processes that each run an instance of Django) are currently running. If there are none or too few, it will spawn additional workers and hand the requests to them.
Here is what I think is happening in your situation:
the first GET request for resource A comes in. Apache uses a running worker (or starts a new one)
the worker sleeps 10 seconds
during this, a new request for resource A comes in. Apache sees that it is asking for the same resource and sends it to the same worker as the first request. I guess the assumption here is that a worker that recently processed a request for a specific resource is likely to have some information cached/preprocessed, so it can handle the new request faster
this results in a 20-second block, since there is only one worker that waits 2 × 10 seconds
This behavior makes complete sense 99% of the time so it's logical to do this by default.
However, if you change the requested resource for the second request (by adding a GET parameter), Apache will assume that it is a different resource and will start another worker (since the first one is already "busy"; Apache cannot know that you are not doing any hard work). Since there are now two workers, each waiting 10 seconds, the total time goes down to 10 seconds.
Additionally, I assume that something is **wrong** with your design. There are almost no cases I can think of where it would be sensible not to respond to an HTTP request as fast as you can. After all, you want to serve as many requests as possible in the shortest amount of time, so sleeping for 10 seconds is the most counterproductive thing you can do. I would recommend that you create a new question and state what the actual goal is that you are trying to achieve. I'm pretty sure there is a more sensible solution to this!
Assuming you run your Django server just with run(), by default this creates a single-threaded server. If you use sleep in a single-threaded process, the whole application freezes for that sleep time.
It may simply be that your browser is queuing the second request to be performed only after the first one completes. If you are opening your URLs in the same browser, try using two different ones (e.g. Firefox and Chrome), or try performing the requests from the command line using wget or curl instead.

app engine python urlfetch timing out

I have two instances of App Engine applications running that I want to communicate via a RESTful interface. Once the data of one is updated, it calls a web hook on the second, which will retrieve a fresh copy of the data for its own system.
Inside 'site1' I have:
from google.appengine.api import urlfetch
url = "http://www.site2.com/data_updated"
result = urlfetch.fetch(url)
Inside the handler for data_updated on 'site2' I have:
url = "http://www.site1.com/get_new_data"
result = urlfetch.fetch(url)
There is very little data being passed between the two sites, but I receive the following error. I've tried increasing the deadline to 10 seconds, but this still doesn't work.
DeadlineExceededError: ApplicationError: 5
Can anyone provide any insight into what might be happening?
Thanks - Richard
App Engine's urlfetch doesn't always behave as expected; you have about 10 seconds to fetch the URL. Assuming the URL you're trying to fetch is up and running, you should be able to catch the DeadlineExceededError by adding from google.appengine.runtime import apiproxy_errors and then wrapping the urlfetch call in a try/except block using except apiproxy_errors.DeadlineExceededError:.
Relevant answer here.
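A rough sketch of that pattern (the deadline value here is only an example):
from google.appengine.api import urlfetch
from google.appengine.runtime import apiproxy_errors

url = "http://www.site2.com/data_updated"

try:
    result = urlfetch.fetch(url, deadline=10)
except apiproxy_errors.DeadlineExceededError:
    # the remote host did not answer within the deadline; handle or retry here
    result = None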
Changing the method
from
result = urlfetch.fetch(url)
to
result = urlfetch.fetch(url, deadline=2, method=urlfetch.POST)
has fixed the Deadline errors.
From the urlfetch documentation:
deadline
The maximum amount of time to wait for a response from the remote host, as a number of seconds. If the remote host does not respond in this amount of time, a DownloadError is raised.
Time spent waiting for a request does not count toward the CPU quota for the request. It does count toward the request timer. If the app request timer expires before the URL Fetch call returns, the call is canceled.
The deadline can be up to a maximum of 60 seconds for request handlers and 10 minutes for task queue and cron job handlers. If deadline is None, the deadline is set to 5 seconds.
Have you tried manually querying the URLs (www.site2.com/data_updated and www.site1.com/get_new_data) with curl or otherwise to make sure that they're responding within the time limit? Even if the amount of data that needs to be transferred is small, maybe there's a problem with the handler that's causing a delay in returning the results.
The amount of data being transferred is not the problem here, the latency is.
If the app you are talking to is often taking > 10 secs to respond, you will have to use a "proxy callback" server on another cloud platform (EC2, etc.). If you can hold off for a while, the new backend instances are supposed to relax the urlfetch time limits somewhat.
If the average response time is < 10 secs, and only relatively few requests are failing, just retry a few times. I hope for your sake the calls are idempotent (i.e. a retry doesn't have adverse effects). If not, you might be able to roll your own layer on top; it's a bit painful but it works OK, and it's what we do.
J
The GAE docs now state that the deadline can be up to 60 seconds:
result = urlfetch.fetch(url, deadline=60, method=urlfetch.POST)
