I would like to profile a Flask app's endpoint to see where it slows down while executing the endpoint's functions. I have tried PyCharm's built-in profiler, but its output tells me that most of the time is spent in the wait function, i.e. waiting for user input. I also tried installing flask-profiler but could not set it up because my project structure differs from what the package expects. Any help is appreciated. Thank you!
Werkzeug has a built-in application profiler based on cProfile.
With help from this gist I managed to set it up as follows:
from flask import Flask
from werkzeug.middleware.profiler import ProfilerMiddleware
from time import sleep
app = Flask(__name__)
app.wsgi_app = ProfilerMiddleware(app.wsgi_app)
@app.route('/')
def index():
    print('begin')
    sleep(3)
    print('end')
    return 'success'
A request to this endpoint results in the following summary in the terminal:
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
begin
end
--------------------------------------------------------------------------------
PATH: '/'
298 function calls in 2.992 seconds
Ordered by: internal time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 2.969 2.969 2.969 2.969 {built-in method time.sleep}
1 0.002 0.002 0.011 0.011 /usr/local/lib/python3.7/site-packages/flask/app.py:1955(finalize_request)
1 0.002 0.002 0.008 0.008 /usr/local/lib/python3.7/site-packages/werkzeug/wrappers/base_response.py:173(__init__)
35 0.002 0.000 0.002 0.000 {built-in method builtins.isinstance}
4 0.001 0.000 0.001 0.000 /usr/local/lib/python3.7/site-packages/werkzeug/datastructures.py:910(_unicodify_header_value)
2 0.001 0.000 0.003 0.002 /usr/local/lib/python3.7/site-packages/werkzeug/datastructures.py:1298(__setitem__)
1 0.001 0.001 0.001 0.001 /usr/local/lib/python3.7/site-packages/werkzeug/datastructures.py:960(__getitem__)
6 0.001 0.000 0.001 0.000 /usr/local/lib/python3.7/site-packages/werkzeug/_compat.py:210(to_unicode)
2 0.000 0.000 0.002 0.001 /usr/local/lib/python3.7/site-packages/werkzeug/datastructures.py:1212(set)
4 0.000 0.000 0.000 0.000 {method 'decode' of 'bytes' objects}
1 0.000 0.000 0.002 0.002 /usr/local/lib/python3.7/site-packages/werkzeug/wrappers/base_response.py:341(set_data)
10 0.000 0.000 0.001 0.000 /usr/local/lib/python3.7/site-packages/werkzeug/local.py:70(__getattr__)
8 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
1 0.000 0.000 0.008 0.008 /usr/local/lib/python3.7/site-packages/flask/app.py:2029(make_response)
1 0.000 0.000 0.004 0.004 /usr/local/lib/python3.7/site-packages/werkzeug/routing.py:1551(bind_to_environ)
1 0.000 0.000 0.000 0.000 /usr/local/lib/python3.7/site-packages/werkzeug/_internal.py:67(_get_environ)
1 0.000 0.000 0.001 0.001 /usr/local/lib/python3.7/site-packages/werkzeug/routing.py:1674(__init__)
[snipped for brevity]
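For anyone curious what such a middleware does, here is a rough stdlib-only sketch of the idea. It is not Werkzeug's implementation: SimpleProfilerMiddleware and demo_app are made-up names, and the real ProfilerMiddleware has more options. It simply runs the wrapped WSGI app under cProfile and prints the stats for each request.

```python
import cProfile
import io
import pstats
import time


class SimpleProfilerMiddleware:
    """Hypothetical stand-in that shows the idea; not Werkzeug's code."""

    def __init__(self, app, stream):
        self.app = app
        self.stream = stream

    def __call__(self, environ, start_response):
        body = []

        def run_app():
            # Consume the response inside the profiled call so that
            # lazily generated bodies are measured as well.
            app_iter = self.app(environ, start_response)
            try:
                body.extend(app_iter)
            finally:
                close = getattr(app_iter, "close", None)
                if close is not None:
                    close()

        profiler = cProfile.Profile()
        profiler.runcall(run_app)
        self.stream.write("PATH: %r\n" % environ.get("PATH_INFO", ""))
        stats = pstats.Stats(profiler, stream=self.stream)
        stats.sort_stats("tottime", "calls").print_stats()
        return body


def demo_app(environ, start_response):
    # Stands in for a slow Flask view.
    time.sleep(0.05)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"success"]


report = io.StringIO()
wrapped = SimpleProfilerMiddleware(demo_app, report)
statuses = []
response = wrapped({"PATH_INFO": "/"}, lambda status, headers: statuses.append(status))
print(report.getvalue())
```

Because the profiler wraps the WSGI call itself, the time is attributed to the view and whatever it calls, rather than to the server waiting for requests.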
You could narrow the results by passing a restrictions argument:
restrictions (Iterable[Union[str, int, float]]) – A tuple of restrictions to filter stats by. See pstats.Stats.print_stats().
So, for example, if you were interested specifically in the Python file living at /code/app.py, you could instead define the profiler like this:
app.wsgi_app = ProfilerMiddleware(app.wsgi_app, restrictions=('/code/app.py',))
Resulting in the output:
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
begin
end
--------------------------------------------------------------------------------
PATH: '/'
300 function calls in 3.016 seconds
Ordered by: internal time, call count
List reduced from 131 to 2 due to restriction <'/code/app.py'>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 3.007 3.007 /code/app.py:12(index)
1 0.000 0.000 2.002 2.002 /code/app.py:9(slower)
--------------------------------------------------------------------------------
With some tweaking this could prove useful to solve your issue.
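Since the restrictions are handed to pstats.Stats.print_stats(), you can try them out without Flask at all. Below is a minimal sketch using plain cProfile and pstats; handler, slow_step and profile_with_restrictions are hypothetical names used only for illustration. A string restriction is treated as a regular expression matched against the filename:lineno(function) column, an integer keeps that many lines, and a float between 0 and 1 keeps that fraction of the list.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    time.sleep(0.05)


def handler():
    slow_step()
    return "success"


def profile_with_restrictions(func, *restrictions):
    """Profile func() and return the report filtered by the restrictions."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer).sort_stats("tottime", "calls")
    stats.print_stats(*restrictions)
    return buffer.getvalue()


# Keep only entries whose "filename:lineno(function)" text matches the regex.
report = profile_with_restrictions(handler, "slow_step")
print(report)
```

Several restrictions can be combined, e.g. profile_with_restrictions(handler, "app", 10) would first filter by the pattern and then keep at most ten lines.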
I ran this code with Python 3.7 to see what happens when I import numpy.
import cProfile, pstats
profiler = cProfile.Profile()
profiler.enable()
import numpy
profiler.disable()
# Get and print table of stats
stats = pstats.Stats(profiler).sort_stats('time')
stats.print_stats()
The first few lines of output look like this:
79557 function calls (76496 primitive calls) in 0.120 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
32/30 0.015 0.000 0.017 0.001 {built-in method _imp.create_dynamic}
318 0.015 0.000 0.015 0.000 {built-in method builtins.compile}
115 0.011 0.000 0.011 0.000 {built-in method marshal.loads}
648 0.006 0.000 0.006 0.000 {built-in method posix.stat}
119 0.004 0.000 0.005 0.000 <frozen importlib._bootstrap_external>:914(get_data)
246/244 0.004 0.000 0.007 0.000 {built-in method builtins.__build_class__}
329 0.002 0.000 0.012 0.000 <frozen importlib._bootstrap_external>:1356(find_spec)
59 0.002 0.000 0.002 0.000 {built-in method posix.getcwd}
It spends a lot of time on builtins.compile. Is it creating the bytecode for NumPy for pycache? Why would that happen every time?
I'm on Mac OS. What I really want is to speed up the import, and it seems to me that compile should not be necessary.
User L3viathan pointed out in a comment that the code for numpy contains explicit calls to compile. This explains why builtins.compile is getting called. Thanks!
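To illustrate the point (this is not NumPy's code, just a small made-up example): any explicit call to compile() is charged to builtins.compile in the profile, whether or not cached bytecode exists in __pycache__. build_function below is a hypothetical helper.

```python
import cProfile
import io
import pstats


def build_function(source):
    """Compile and exec a source string, then return the function it defines."""
    namespace = {}
    code = compile(source, "<generated>", "exec")  # explicit compile() call
    exec(code, namespace)
    return namespace["generated"]


profiler = cProfile.Profile()
profiler.enable()
generated = build_function("def generated(x):\n    return x + 1\n")
profiler.disable()

buffer = io.StringIO()
# The "compile" restriction keeps only rows that mention compile.
pstats.Stats(profiler, stream=buffer).sort_stats("tottime").print_stats("compile")
report = buffer.getvalue()
print(report)
```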
When running "cProfile" in "IPython" I can't get the sort option ("-s") to work, whereas the equivalent command run in the system shell sorts as expected (there I redirect the output to a file so that I can see its first lines). What am I missing?
E.g. running the following command:
%run -m cProfile -s cumulative myscript.py
gives me the following output (Ordered by: standard name):
9885548 function calls (9856804 primitive calls) in 17.054 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 <string>:1(DeprecatedOption)
1 0.000 0.000 0.000 0.000 <string>:1(RegisteredOption)
6 0.000 0.000 0.001 0.000 <string>:1(non_reentrant)
1 0.000 0.000 0.000 0.000 <string>:2(<module>)
32 0.000 0.000 0.000 0.000 <string>:8(__new__)
1 0.000 0.000 0.000 0.000 ImageFilter.py:106(MinFilter)
1 0.000 0.000 0.000 0.000 ImageFilter.py:122(MaxFilter)
1 0.000 0.000 0.000 0.000 ImageFilter.py:140(ModeFilter)
... rest omitted
What I believe to be the equivalent command, run from the system shell (Win7):
python -m cProfile -s cumulative myscript.py > outputfile.txt
gives me the following sorted output:
9997772 function calls (9966740 primitive calls) in 17.522 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.116 0.116 17.531 17.531 reprep.py:1(<module>)
6 0.077 0.013 11.700 1.950 reprep.py:837(add_biorep2treatment)
9758 0.081 0.000 6.927 0.001 ops.py:538(wrapper)
33592 0.100 0.000 4.209 0.000 frame.py:1635(__getitem__)
23918 0.010 0.000 3.834 0.000 common.py:111(isnull)
23918 0.041 0.000 3.823 0.000 common.py:128(_isnull_new)
... rest omitted
I also noticed that there is a difference in the number of function calls. Why?
I'm running Python 2.7.6 64bit (from Enthought) and have made sure that the exact same version of Python is used for both executions (though of course the first one has an additional "IPython" "layer").
I know I've got a working solution, but the interactive version would be a time saver and I would like to understand why there's a difference.
Thank you for your time and help!!
%run has its own options for profiling. From the docs for %prun:
If you want to run complete programs under the profiler's control, use
%run -p [prof_opts] filename.py [args to program] where prof_opts
contains profiler specific options as described here.
That is probably a better way to do it.
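If you would rather not depend on how the magic passes options along, another route that works in any interactive session is to collect the profile yourself and sort it explicitly with pstats. A minimal sketch, where main() and busy() are hypothetical stand-ins for whatever myscript.py does:

```python
import cProfile
import io
import pstats


def busy(n):
    return sum(i * i for i in range(n))


def main():
    return [busy(2000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.runcall(main)

buffer = io.StringIO()
# Sort explicitly by cumulative time and show only the first 10 rows.
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
output = buffer.getvalue()
print(output)
```

The header of the report states the sort order that was actually applied, so it is easy to check that the sort key took effect.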
I have a Python script in a file which takes just over 30 seconds to run. I am trying to profile it as I would like to cut down this time dramatically.
I am trying to profile the script using cProfile, but essentially all it seems to be telling me is that yes, the main script took a long time to run, but doesn't give the kind of breakdown I was expecting. At the terminal, I type something like:
cat my_script_input.txt | python -m cProfile -s time my_script.py
The results I get are:
<my_script_output>
683121 function calls (682169 primitive calls) in 32.133 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 31.980 31.980 32.133 32.133 my_script.py:18(<module>)
121089 0.050 0.000 0.050 0.000 {method 'split' of 'str' objects}
121090 0.038 0.000 0.049 0.000 fileinput.py:243(next)
2 0.027 0.014 0.036 0.018 {method 'sort' of 'list' objects}
121089 0.009 0.000 0.009 0.000 {method 'strip' of 'str' objects}
201534 0.009 0.000 0.009 0.000 {method 'append' of 'list' objects}
100858 0.009 0.000 0.009 0.000 my_script.py:51(<lambda>)
952 0.008 0.000 0.008 0.000 {method 'readlines' of 'file' objects}
1904/952 0.003 0.000 0.011 0.000 fileinput.py:292(readline)
14412 0.001 0.000 0.001 0.000 {method 'add' of 'set' objects}
182 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
1 0.000 0.000 0.000 0.000 fileinput.py:80(<module>)
1 0.000 0.000 0.000 0.000 fileinput.py:197(__init__)
1 0.000 0.000 0.000 0.000 fileinput.py:266(nextfile)
1 0.000 0.000 0.000 0.000 {isinstance}
1 0.000 0.000 0.000 0.000 fileinput.py:91(input)
1 0.000 0.000 0.000 0.000 fileinput.py:184(FileInput)
1 0.000 0.000 0.000 0.000 fileinput.py:240(__iter__)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
This doesn't seem to be telling me anything useful. The vast majority of the time is simply listed as:
ncalls tottime percall cumtime percall filename:lineno(function)
1 31.980 31.980 32.133 32.133 my_script.py:18(<module>)
In my_script.py, Line 18 is nothing more than the closing """ of the file's header block comment, so it's not that there is a whole load of work concentrated in Line 18. The script as a whole is mostly made up of line-based processing with mostly some string splitting, sorting and set work, so I was expecting to find the majority of time going to one or more of these activities. As it stands, seeing all the time grouped in cProfile's results as occurring on a comment line doesn't make any sense or at least does not shed any light on what is actually consuming all the time.
EDIT: I've constructed a minimum working example similar to my above case to demonstrate the same behavior:
mwe.py
import fileinput
for line in fileinput.input():
    for i in range(10):
        y = int(line.strip()) + int(line.strip())
And call it with:
perl -e 'for(1..1000000){print "$_\n"}' | python -m cProfile -s time mwe.py
To get the result:
22002536 function calls (22001694 primitive calls) in 9.433 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 8.004 8.004 9.433 9.433 mwe.py:1(<module>)
20000000 1.021 0.000 1.021 0.000 {method 'strip' of 'str' objects}
1000001 0.270 0.000 0.301 0.000 fileinput.py:243(next)
1000000 0.107 0.000 0.107 0.000 {range}
842 0.024 0.000 0.024 0.000 {method 'readlines' of 'file' objects}
1684/842 0.007 0.000 0.032 0.000 fileinput.py:292(readline)
1 0.000 0.000 0.000 0.000 fileinput.py:80(<module>)
1 0.000 0.000 0.000 0.000 fileinput.py:91(input)
1 0.000 0.000 0.000 0.000 fileinput.py:197(__init__)
1 0.000 0.000 0.000 0.000 fileinput.py:184(FileInput)
1 0.000 0.000 0.000 0.000 fileinput.py:266(nextfile)
1 0.000 0.000 0.000 0.000 {isinstance}
1 0.000 0.000 0.000 0.000 fileinput.py:240(__iter__)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Am I using cProfile incorrectly somehow?
As I mentioned in a comment, when you can't get cProfile to work externally, you can often use it internally instead. It's not that hard.
For example, when I run with -m cProfile in my Python 2.7, I get effectively the same results you did. But when I manually instrument your example program:
import fileinput
import cProfile
pr = cProfile.Profile()
pr.enable()
for line in fileinput.input():
    for i in range(10):
        y = int(line.strip()) + int(line.strip())
pr.disable()
pr.print_stats(sort='time')
… here's what I get:
22002533 function calls (22001691 primitive calls) in 3.352 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
20000000 2.326 0.000 2.326 0.000 {method 'strip' of 'str' objects}
1000001 0.646 0.000 0.700 0.000 fileinput.py:243(next)
1000000 0.325 0.000 0.325 0.000 {range}
842 0.042 0.000 0.042 0.000 {method 'readlines' of 'file' objects}
1684/842 0.013 0.000 0.055 0.000 fileinput.py:292(readline)
1 0.000 0.000 0.000 0.000 fileinput.py:197(__init__)
1 0.000 0.000 0.000 0.000 fileinput.py:91(input)
1 0.000 0.000 0.000 0.000 {isinstance}
1 0.000 0.000 0.000 0.000 fileinput.py:266(nextfile)
1 0.000 0.000 0.000 0.000 fileinput.py:240(__iter__)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
That's a lot more useful: It tells you what you probably already expected, that more than half your time is spent calling str.strip().
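If you want that fraction computed rather than eyeballed, here is a rough sketch. It assumes Python 3 and relies on the stats dictionary and total_tt attribute of pstats.Stats, which exist in CPython but are only lightly documented; process_lines and time_share are made-up names.

```python
import cProfile
import pstats


def process_lines(lines):
    total = 0
    for line in lines:
        for _ in range(10):
            total += int(line.strip()) + int(line.strip())
    return total


def time_share(profiler, name_fragment):
    """Fraction of total internal time spent in functions matching name_fragment."""
    stats = pstats.Stats(profiler)
    if not stats.total_tt:
        return 0.0
    matching = sum(
        timings[2]  # tottime; each value is (cc, nc, tt, ct, callers)
        for (filename, lineno, funcname), timings in stats.stats.items()
        if name_fragment in funcname
    )
    return matching / stats.total_tt


profiler = cProfile.Profile()
profiler.enable()
process_lines(["%d\n" % n for n in range(2000)])
profiler.disable()

share = time_share(profiler, "method 'strip'")
print("share of internal time in str.strip: %.1f%%" % (share * 100))
```

The exact percentage depends on the machine and Python version, so treat the number as a rough indicator only.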
Also, note that if you can't edit the file containing code you wish to profile (mwe.py), you can always do this:
import cProfile
pr = cProfile.Profile()
pr.enable()
import mwe
pr.disable()
pr.print_stats(sort='time')
Even that doesn't always work. If your program calls exit(), for example, you'll have to use a try:/finally: wrapper and/or an atexit. And if it calls os._exit(), or segfaults, you're probably completely hosed. But that isn't very common.
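A rough sketch of the try:/finally: idea, assuming Python 3. work() is a made-up function that stands in for a program calling exit(), and run_profiled is a hypothetical helper. Note that this version swallows the SystemExit so the stats can be returned; re-raise it if the exit should still happen.

```python
import cProfile
import io
import pstats


def work():
    total = 0
    for value in range(1000):
        total += int(str(value).strip())
    raise SystemExit(0)  # stands in for a program that calls exit()


def run_profiled(func):
    """Run func under cProfile and still report stats if it exits."""
    profiler = cProfile.Profile()
    buffer = io.StringIO()
    exit_code = None
    profiler.enable()
    try:
        func()
    except SystemExit as exc:
        exit_code = exc.code
    finally:
        # Runs whether func returned, raised, or asked to exit.
        profiler.disable()
        pstats.Stats(profiler, stream=buffer).sort_stats("tottime").print_stats()
    return exit_code, buffer.getvalue()


exit_code, report = run_profiled(work)
print(report)
```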
However, something I discovered later: If you move all code out of the global scope, -m cProfile seems to work, at least for this case. For example:
import fileinput
def f():
    for line in fileinput.input():
        for i in range(10):
            y = int(line.strip()) + int(line.strip())
f()
Now the output from -m cProfile includes, among other things:
2000000 4.819 0.000 4.819 0.000 :0(strip)
100001 0.288 0.000 0.295 0.000 fileinput.py:243(next)
I have no idea why this also made it twice as slow… or maybe that's just a cache effect; it's been a few minutes since I last ran it, and I've done lots of web browsing in between. But that's not important, what's important is that most of the time is getting charged to reasonable places.
But if I change this to move the outer loop to the global level, and only its body into a function, most of the time disappears again.
Another alternative, which I wouldn't suggest except as a last resort…
I notice that if I use profile instead of cProfile, it works both internally and externally, charging time to the right calls. However, those calls are also about 5x slower. And there seems to be an additional 10 seconds of constant overhead (which gets charged to import profile if used internally, or to whatever's on line 1 if used externally). So, to find out that strip is using 70% of my time, instead of waiting 4 seconds and doing 2.326 / 3.352, I have to wait 27 seconds, and do 10.93 / (26.34 - 10.01). Not much fun…
One last thing: I get the same results with a CPython 3.4 dev build—correct results when used internally, everything charged to the first line of code when used externally. But PyPy 2.2/2.7.3 and PyPy3 2.1b1/3.2.3 both seem to give me correct results with -m cProfile. This may just mean that PyPy's cProfile is faked on top of profile because the pure-Python code is fast enough.
Anyway, if someone can figure out/explain why -m cProfile isn't working, that would be great… but otherwise, this is usually a perfectly good workaround.
When running:
./manage.py test appname
How can I disable all the stats/logging/output after "OK"?
I have already commented out the entire logging section - no luck.
Also commented out any print_stat calls - no luck
my manage.py is pretty bare so it likely isn't that.
I run many tests and constantly have to scroll up thousands of terminal lines to view results.
Clearly, I am new to Python/Django and its testing framework, so I would appreciate any help.
----------------------------------------------------------------------
Ran 2 tests in 2.133s
OK
1933736 function calls (1929454 primitive calls) in 2.133 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 2.133 2.133 <string>:1(<module>)
30 0.000 0.000 0.000 0.000 <string>:8(__new__)
4 0.000 0.000 0.000 0.000 Cookie.py:315(_quote)
26 0.000 0.000 0.000 0.000 Cookie.py:333(_unquote)
10 0.000 0.000 0.000 0.000 Cookie.py:432(__init__)
28 0.000 0.000 0.000 0.000 Cookie.py:441(__setitem__)
.
.
.
2 0.000 0.000 0.000 0.000 {time.gmtime}
18 0.000 0.000 0.000 0.000 {time.localtime}
18 0.000 0.000 0.000 0.000 {time.strftime}
295 0.000 0.000 0.000 0.000 {time.time}
556 0.000 0.000 0.000 0.000 {zip}
If it helps, I am importing:
from django.utils import unittest
class TestEmployeeAdd(unittest.TestCase):
    def setUp(self):
If you use a Unix-like shell (macOS has one), you can use the head command to do the trick, like this:
python manage.py test appname | head -n 3
Replace 3 with however many lines you need so that the output is cut off after the OK line.
You can also check whether you prefer the output produced with the command's verbosity set to minimal, like this:
python manage.py test appname -v 0
Hope this helps!
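If counting lines for head gets tedious, the same cut can be made on the OK line itself. A small sketch, assuming the test output and the stats arrive on the same stream; truncate_after_ok is a made-up helper, and the sample lines are copied from the output in the question.

```python
def truncate_after_ok(lines):
    """Keep everything up to and including the first line that is just 'OK'."""
    kept = []
    for line in lines:
        kept.append(line)
        if line.strip() == "OK":
            break
    return kept


sample = [
    "----------------------------------------------------------------------",
    "Ran 2 tests in 2.133s",
    "OK",
    "1933736 function calls (1929454 primitive calls) in 2.133 seconds",
    "Ordered by: standard name",
]
result = truncate_after_ok(sample)
print("\n".join(result))
```

To use it as a filter, you could feed it sys.stdin and print the result, but that wiring is left out here.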
I am testing DynamoDB via boto and have found it to be surprisingly slow at retrieving data sets based on hashkey, rangekey condition queries. I have seen some discussion about the oddity that causes SSL (is_secure) to perform about 6x faster than non-SSL, and I can confirm that finding. But even using SSL I am seeing 1-2 seconds to retrieve 300 records using a hashkey/range key condition on a fairly small data set (less than 1K records).
Running the profilehooks profiler I see a lot of extraneous time spent in ssl.py, on the order of 20617 ncalls to retrieve the 300 records. Even at 10 calls per record, that is still 6x more than I would expect. This is on a medium instance, though the same results occur on a micro instance. The table is provisioned at 500 reads/sec and 1000 writes/sec, with no throttles logged.
I have looked at doing a batch request but the inability to use range key conditions eliminates that option for me.
Any ideas on where I'm losing time would be greatly appreciated!!
144244 function calls in 2.083 CPU seconds
Ordered by: cumulative time, internal time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 2.083 2.083 eventstream.py:427(session_range)
107 0.006 0.000 2.081 0.019 dynamoDB.py:36(rangeQ)
408 0.003 0.000 2.073 0.005 layer2.py:493(query)
107 0.001 0.000 2.046 0.019 layer1.py:435(query)
107 0.002 0.000 2.040 0.019 layer1.py:119(make_request)
107 0.006 0.000 1.988 0.019 connection.py:699(_mexe)
107 0.001 0.000 1.916 0.018 httplib.py:956(getresponse)
107 0.002 0.000 1.913 0.018 httplib.py:384(begin)
662 0.049 0.000 1.888 0.003 socket.py:403(readline)
20617 0.040 0.000 1.824 0.000 ssl.py:209(recv)
20617 0.036 0.000 1.785 0.000 ssl.py:130(read)
20617 1.748 0.000 1.748 0.000 {built-in method read}
107 0.002 0.000 1.738 0.016 httplib.py:347(_read_status)
107 0.001 0.000 0.170 0.002 mimetools.py:24(__init__)
107 0.000 0.000 0.165 0.002 rfc822.py:88(__init__)
107 0.007 0.000 0.165 0.002 httplib.py:230(readheaders)
107 0.001 0.000 0.031 0.000 __init__.py:332(loads)
107 0.001 0.000 0.028 0.000 decoder.py:397(decode)
107 0.008 0.000 0.026 0.000 decoder.py:408(raw_decode)
107 0.001 0.000 0.026 0.000 httplib.py:910(request)
107 0.003 0.000 0.026 0.000 httplib.py:922(_send_request)
107 0.001 0.000 0.025 0.000 connection.py:350(authorize)
107 0.004 0.000 0.024 0.000 auth.py:239(add_auth)
3719 0.011 0.000 0.019 0.000 layer2.py:31(item_object_hook)
301 0.010 0.000 0.018 0.000 item.py:38(__init__)
22330 0.015 0.000 0.015 0.000 {method 'append' of 'list' objects}
107 0.001 0.000 0.012 0.000 httplib.py:513(read)
214 0.001 0.000 0.011 0.000 httplib.py:735(send)
856 0.002 0.000 0.010 0.000 __init__.py:1034(debug)
214 0.001 0.000 0.009 0.000 ssl.py:194(sendall)
107 0.000 0.000 0.008 0.000 httplib.py:900(endheaders)
107 0.001 0.000 0.008 0.000 httplib.py:772(_send_output)
107 0.001 0.000 0.008 0.000 auth.py:223(string_to_sign)
856 0.002 0.000 0.008 0.000 __init__.py:1244(isEnabledFor)
137 0.001 0.000 0.008 0.000 httplib.py:603(_safe_read)
214 0.001 0.000 0.007 0.000 ssl.py:166(send)
214 0.007 0.000 0.007 0.000 {built-in method write}
3311 0.006 0.000 0.006 0.000 item.py:186(__setitem__)
107 0.001 0.000 0.006 0.000 auth.py:95(sign_string)
137 0.001 0.000 0.006 0.000 socket.py:333(read)
This isn't a complete answer but I thought it was worth posting it at this time.
I've heard reports like this from a couple of people over the last few weeks. I was able to reproduce the anomaly of HTTPS being considerably faster than HTTP but wasn't able to track it down. It seemed that the problem was unique to Python/boto, but it turns out the same issue was found with C#/.NET. Investigation there showed that the underlying problem was the use of Nagle's algorithm in the Python and .NET libraries. In .NET it's easy to turn this off, but unfortunately it's not as easy in Python.
To test this, I wrote a simple script that performed 1000 GetItem requests in a loop. The item being fetched was very small, well under 1K. Running this on Python 2.6.7 on an m1.medium instance in the us-east-1 region produced these results:
>>> http_data = speed_test(False, 1000)
dynamoDB_speed_test - RUNTIME = 53.120193
Throttling exceptions: 0
>>> https_data = speed_test(True, 1000)
dynamoDB_speed_test - RUNTIME = 8.167652
Throttling exceptions: 0
Note that there is sufficient provisioned capacity in the table to avoid any throttling from the service and the unexpected gap between HTTP and HTTPS is clear.
I next ran the same test in Python 2.7.2:
>>> http_data = speed_test(False, 1000)
dynamoDB_speed_test - RUNTIME = 5.668544
Throttling exceptions: 0
>>> https_data = speed_test(True, 1000)
dynamoDB_speed_test - RUNTIME = 7.425210
Throttling exceptions: 0
So, 2.7 seems to have fixed this issue. I then applied a simple patch to httplib.py in 2.6.7. The patch simply sets the TCP_NODELAY option of the socket associated with the HTTPConnection object, like this:
self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
I then re-ran the test on 2.6.7:
>>> http_data = speed_test(False, 1000)
dynamoDB_speed_test - RUNTIME = 5.914109
Throttling exceptions: 0
>>> https_data = speed_test(True, 1000)
dynamoDB_speed_test - RUNTIME = 5.137570
Throttling exceptions: 0
Even better, although HTTPS is still unexpectedly faster than HTTP. It's hard to know whether that difference is significant or not.
So, I'm looking for ways to programmatically configure the socket for HTTPConnection objects so that TCP_NODELAY is set correctly. It's not easy to get at that in httplib.py. My best advice for the moment is to use Python 2.7, if possible.
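One way this could be done without patching the library is to subclass the connection class and set the option right after connecting. The sketch below assumes Python 3, where httplib is called http.client; NoDelayHTTPConnection and enable_nodelay are made-up names, and this is not something boto does for you. Newer Python versions may already set this option themselves, in which case the subclass is harmless but unnecessary.

```python
import http.client
import socket


def enable_nodelay(sock):
    """Disable Nagle's algorithm on a TCP socket and report whether it took effect."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


class NoDelayHTTPConnection(http.client.HTTPConnection):
    """Hypothetical subclass that sets TCP_NODELAY once the socket exists."""

    def connect(self):
        super().connect()
        enable_nodelay(self.sock)


# Check the helper on a bare, unconnected TCP socket (no network access needed).
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
    nodelay_on = enable_nodelay(probe)
print("TCP_NODELAY enabled:", nodelay_on)
```

The same connect() override would apply to an HTTPS connection class, since the option is set on the underlying TCP socket.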