I am profiling a simple Python function such as
import numpy as np
from scipy import signal

def prof_func():
    x = 100
    y = 100
    a = np.random.rand(x, y)
    c = signal.convolve2d(a, a, boundary='symm', mode='same')
using cProfile but the times I get out only have a resolution of three decimal places.
How can I make cProfile print a greater number of decimal places so I can see how long the a = np.random.rand(x, y) line takes? Currently cProfile tells me that it takes 0.000s.
I had hoped that there would be an easy way to increase the resolution of the cProfile output but I have checked the documentation and not found one: https://docs.python.org/3/library/profile.html
I just found your question when searching for a solution to the same problem :)
In case it might be useful, I share the workaround I used to get more significant figures out of cProfile and pstats.
The pstats.Stats.print_stats() method uses a module-level function called f8 (see the pstats source linked below), which formats the numbers with a fixed width.
By defining a similar function in the script where the profiling is done, such as:
def f8_alt(x):
    return "%14.9f" % x
and monkey patching the module-level function
pstats.f8 = f8_alt
the print_stats() method will produce output with more decimal places.
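Put together, a minimal end-to-end sketch (reusing prof_func from the question; the 'prof_output' stats filename is arbitrary):
import cProfile
import pstats

import numpy as np
from scipy import signal

def f8_alt(x):
    return "%14.9f" % x

pstats.f8 = f8_alt  # apply the monkey patch before printing stats

def prof_func():
    a = np.random.rand(100, 100)
    signal.convolve2d(a, a, boundary='symm', mode='same')

cProfile.run('prof_func()', 'prof_output')
pstats.Stats('prof_output').sort_stats('cumulative').print_stats()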
Hope it helps!
https://github.com/python/cpython/blob/main/Lib/pstats.py
Can someone help me understand how process_time() works?
My code is
from time import process_time

t = process_time()

def fibonacci_of(n):
    if n in cache:  # Base case
        return cache[n]
    # Compute and cache the Fibonacci number
    cache[n] = fibonacci_of(n - 1) + fibonacci_of(n - 2)  # Recursive case
    return cache[n]

cache = {0: 0, 1: 1}
fib = [fibonacci_of(n) for n in range(1500)]
print(fib[-1])
print(process_time() - t)
And the last print is always 0.0.
My expected result is something like 0.764891862869
Docs at https://docs.python.org/3/library/time.html#time.process_time don't help newbie me :(
I tried some other functions and read the docs, but without success.
I'd assume this is OS dependent. Linux lets me get down to ~5 microseconds using process_time, but other operating systems may not resolve differences this small and will return zero instead.
It's for this reason that Python exposes other timers that are designed to be more accurate over shorter time scales. Specifically, perf_counter is specified as using:
the highest available resolution to measure a short duration
Using this lets me measure down to ~80 nanoseconds, whether I'm using perf_counter or perf_counter_ns.
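As a sketch, comparing the two clocks side by side makes the difference visible (the workload below is an arbitrary CPU-bound stand-in for the fibonacci example):
from time import perf_counter, process_time

def work():
    # arbitrary CPU-bound stand-in workload
    return sum(i * i for i in range(100_000))

t0 = process_time()
p0 = perf_counter()
work()
print("process_time:", process_time() - t0)  # may print 0.0 on coarse clocks
print("perf_counter:", perf_counter() - p0)  # high-resolution short-duration timer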
As the documentation says:
time.process_time() → float
Return the value (in fractional seconds) of the sum of the system and user
CPU time of the current process. It does not include time
elapsed during sleep. It is process-wide by definition. The reference
point of the returned value is undefined, so that only the difference
between the results of two calls is valid.
Use process_time_ns() to avoid the precision loss caused by the float type.
This last sentence is the most important: it distinguishes the very precise function process_time_ns() from the less precise time.process_time(), which is more appropriate for long-running processes.
It turns out that when you measure a couple of nanoseconds (nano means 10**-9) and try to express them in seconds by dividing by 10**9, you can exceed the precision of a 64-bit float and end up rounding down to zero. The float limitations are described in the Python documentation.
To learn more, you can also read a general introduction to precision in floating-point arithmetic and its pitfalls.
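For example, a minimal sketch using the nanosecond variant (the workload is an arbitrary stand-in):
from time import process_time_ns

t = process_time_ns()
total = sum(i * i for i in range(100_000))  # arbitrary CPU-bound work
print(process_time_ns() - t, "ns")  # integer nanoseconds, no float precision loss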
Introduction
Today I found some weird behaviour in Python while running experiments with exponentiation, and I was wondering if someone here knows what's happening. I was trying to check which is faster in Python: int**int or float**float. To check that, I ran some small snippets and found some really weird behaviour.
Weird results
My first approach was just to write some for loops and prints to check which one is faster. The snippet I used is this one:
import time

EXPERIMENTS = 1_000_000  # iteration count (value assumed; not shown in the question)

# Run powers outside a method
ti = time.time()
for i in range(EXPERIMENTS):
    x = 2**2
tf = time.time()
print(f"int**int took {tf-ti:.5f} seconds")

ti = time.time()
for i in range(EXPERIMENTS):
    x = 2.**2.
tf = time.time()
print(f"float**float took {tf-ti:.5f} seconds")
After running it I got
int**int took 0.03004 seconds
float**float took 0.03070 seconds
Cool, it seems that data types do not affect the execution time. However, since I try to be a clean coder, I refactored the repeated logic into a function power_time:
import time

EXPERIMENTS = 1_000_000  # same iteration count as above

# Run powers in a method
def power_time(base, exponent):
    ti = time.time()
    for i in range(EXPERIMENTS):
        x = base ** exponent
    tf = time.time()
    return tf - ti

print(f"int**int took {power_time(2, 2):.5f} seconds")
print(f"float**float took {power_time(2., 2.):.5f} seconds")
And what a surprise when I got these results:
int**int took 0.20140 seconds
float**float took 0.05051 seconds
The refactor didn't affect the float case much, but it multiplied the time required for the int case by ~7.
Conclusions and questions
Apparently, running something in a method can slow down your process depending on your data types, and that's really weird to me.
Also, if I run the same experiments but replace ** with * or +, the weird results disappear and all the approaches give more or less the same results.
Does someone know why this is happening? Am I missing something?
Apparently, running something in a method can slow down your process depending on your data types, and that's really weird to me.
It would be really weird if it were not like this! You can write your own class that has its own ** operator (by implementing the __pow__(self, other) method), and you could, for example, sleep 1s in there. Why should that take as long as raising a float to the power of another?
So, yeah, Python is a dynamically typed language: the operations performed on data depend on the type of that data, and different types can generally take different times.
In your first example, the difference never arises, because a) most probably the value gets constant-folded: right after parsing it's clear that 2**2 is a constant, so it does not need to be evaluated on every loop iteration. Even if that were not the case, b) the time it costs to run a loop in Python is hundreds of times what it takes to actually execute the math here – again, dynamically typed, dynamically named.
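One way to check the constant-folding claim on CPython (the exact bytecode varies by version):
import dis

def f():
    return 2 ** 2

# On recent CPython versions the disassembly shows LOAD_CONST with the
# folded value 4: the exponentiation happens at compile time, not per loop.
dis.dis(f)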
base**exponent is a whole different story. Nothing about it is constant. So there's actually going to be a calculation every iteration.
Now, the ** operator (__pow__ in the Python data model) for Python's built-in float type is specified to do float exponentiation (which is implemented in highly optimized C and assembler), as exponentiation can elegantly be done on floating-point numbers. Look for nb_power in CPython's floatobject.c. So, for the float case, the actual calculation is "free" for all that matters – again, because your loop is limited by how much effort it takes to resolve all the names, types and functions to call in your loop, not by doing the actual math, which is trivial.
The ** operator on Python's built-in int type is not as neatly optimized. It's a lot more complicated – it needs to do checks like "if the exponent is negative, return a float," and it does not do elementary math that your computer can execute with a single instruction; it handles arbitrary-length integers (remember, a Python integer uses as many bytes as it needs – you can store numbers larger than 64 bits in a Python integer!), which comes with allocations and deallocations. (I encourage you to read long_pow in CPython's longobject.c; it has 200 lines.)
All in all, integer exponentiation is expensive in Python because of Python's type system.
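A quick way to reproduce the effect without hand-rolled timing loops (exact numbers are machine-dependent) is timeit; the setup keeps the operands non-constant, so no folding can happen:
import timeit

# Non-constant operands force a real exponentiation on every iteration,
# just like the power_time refactor in the question.
print(timeit.timeit("base ** exponent", setup="base, exponent = 2, 2"))
print(timeit.timeit("base ** exponent", setup="base, exponent = 2., 2."))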
This function works fine, but it takes too much time to solve. Please suggest how I can improve the solving time.
from sympy.solvers import solve
from sympy import Symbol

QD = 25.45
CDI = 0.65
AIN = 33.6
GTL = 10
GTSELV = 2300.1
CDGT = 1.9

def fun(HWE):
    TWE = Symbol('TWE')
    expression = (CDI*AIN*(2*9.81*(HWE-TWE))**0.5) - (CDGT*GTL*(TWE-GTSELV)**1.5) - QD
    solution = solve(expression)
    return solution
Function fun(2303) gives [2302.23386564786], which is correct, but the solving time is about 30 seconds. I need to run this for many arguments.
The dReal system can handle these sorts of problems, using the notion of delta-satisfiability. (See http://dreal.github.io for details.)
This is how your program is coded using dReal's Python interface (to install, see the notes at https://github.com/dreal/dreal4#python-binding):
from dreal import *

QD = 25.45
CDI = 0.65
AIN = 33.6
GTL = 10
GTSELV = 2300.1
CDGT = 1.9

def fun(HWE):
    TWE = Variable("TWE")
    expression = (CDI*AIN*(2*9.81*(HWE-TWE))**0.5) - (CDGT*GTL*(TWE-GTSELV)**1.5) - QD
    return (expression == 0)

result = CheckSatisfiability(fun(2303), 0.001)
print(result)
When I run it on my now 3 year old computer, I get:
$ time python a.py
TWE : [2302.2338656478555, 2302.2338656478582]
python3 a.py 0.03s user 0.01s system 92% cpu 0.044 total
So, it takes about 0.044 seconds to go through, and that includes loading the entire Python ecosystem. (So, if you run many problems one after another, each instance should go even faster.)
Note that dReal shows you an interval for the acceptable solution, within a user-specified numerical error bound. The bound is the second argument to CheckSatisfiability, which we set to 0.001 for this problem. You can increase this precision at the cost of potentially more computation time, but 0.001 seems to be doing quite well in this case. Also note that you get an "interval" for the solution for each variable. If you increase the precision, this interval might get smaller. For instance, when I change the call to:
result = CheckSatisfiability(fun(2303), 0.0000000000001)
I get:
$ time python a.py
TWE : [2302.2338656478569, 2302.2338656478569]
python3 a.py 0.03s user 0.01s system 84% cpu 0.050 total
where the interval has been reduced to a single point, but the program took slightly longer to run. For each problem, you should experiment with an appropriate delta to make sure the interval you get for the results is reasonable.
Use solve when you want a solution in terms of symbols. Use nsolve when you want a numerical solution. In your case, replace solve with nsolve and add the argument HWE to the call (i.e. nsolve(expression, HWE)). That 2nd argument is a guess for where the solution lies. Alternatively, give fun a 2nd argument, guess, and use that as the 2nd argument for nsolve. If you slowly change some parameter, using the last solution as the guess for the next solution will speed up the process (which is already quite fast).
If you know that the solution is real then you might want to take the real part of it with re(solution) since the solution comes back with a small imaginary component.
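A minimal sketch of the nsolve variant (the guess argument is an assumption here; pick something near the expected solution):
from sympy import Symbol, nsolve, re

QD = 25.45
CDI = 0.65
AIN = 33.6
GTL = 10
GTSELV = 2300.1
CDGT = 1.9

def fun(HWE, guess):
    TWE = Symbol('TWE')
    expression = (CDI*AIN*(2*9.81*(HWE-TWE))**0.5) - (CDGT*GTL*(TWE-GTSELV)**1.5) - QD
    solution = nsolve(expression, TWE, guess)  # numerical root near the guess
    return re(solution)  # drop any tiny spurious imaginary component

print(fun(2303, 2302))  # roughly 2302.2338...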
So I recently decided to learn Python, and as an exercise (plus making something useful) I decided to write a Euler's Modified Method solver for higher-than-first-order differential equations. An example input would be:
python script_name.py -y[0] [10,0]
where the first argument is the differential equation (here: y'' = -y), and the second one the initial conditions (here: y(0)=10, y'(0)=0). It is then meant to output the results to two files (x-data.txt and y-data.txt).
Here's the problem:
When I run the code with the specified input, the final line (at t=1) reads -0.0, but if you solve the ODE (y = 10*cos(x)), it should read 5.4. Even if you go through the program with pen and paper and execute the code, your results and the computer's start to diverge by the second iteration. Any idea what could have caused this?
NB: I'm using Python 2.7 on OS X.
Here's my code:
#! /usr/bin/python
# A higher order differential equation solver using Euler's Modified Method
import math
import sys

step_size = 0.01
x = 0
x_max = 1

def derivative(x, y):
    d = eval(sys.argv[1])
    return d

y = eval(sys.argv[2])
order = len(y)
y_derivative = y

xfile = open('x-data.txt', 'w+')
yfile = open('y-data.txt', 'w+')

while (x < x_max):
    xfile.write(str(x) + "\n")
    yfile.write(str(y[0]) + "\n")
    for i in range(order - 1):
        y_derivative[i] = y[(i + 1)]
    y_derivative[(order - 1)] = derivative(x, y)
    for i in range(order):
        y[i] = y[i] + step_size * y_derivative[i]
    x = x + step_size

xfile.close()
yfile.close()
print('done')
When you say y_derivative = y, they are the SAME list with different names, i.e. when you change y_derivative[i] = y[i+1], both names see the change. You want to use y_derivative = y[:] to make a copy of y to put in y_derivative.
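A tiny illustration of the difference:
y = [10, 0]
alias = y     # same list object under a second name
copy = y[:]   # an independent copy

alias[0] = 99
print(y)      # [99, 0] -- modified through the alias
print(copy)   # [10, 0] -- unaffected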
See How to clone or copy a list? for more info
Also see http://effbot.org/zone/python-list.htm
Note, I was able to debug this in IDLE by replacing sys.argv with your provided example. Then if you turn on the debugger and step through the code, you can see both lists change.
Given log(a) and log(b), I want to compute log(a+b) (in a numerically stable way).
I wrote a little function for this:
def log_add(logA, logB):
    if logA == log(0):
        return logB
    if logA < logB:
        return log_add(logB, logA)
    return log(1 + math.exp(logB - logA)) + logA
I wrote a program where this is by far the most time-consuming piece of code. Obviously I could try to optimize it (eliminate the recursive call, for instance).
Do you know of a standard math or numpy function for computing log(a+b) from log(a) and log(b)?
If not, do you know of a simple way to make a single C++ hook for this function? It's not a complicated function (it uses floats), and as I said, it's taking up the majority of my runtime.
Thanks in advance, numerical methods ninja!
Note: the best answer so far is to simply use numpy.logaddexp(logA, logB).
Why exactly do you compare with log(0)? This is equal to -numpy.inf; in that case you end up with log(1 + math.exp(-inf - logB)) + logB, which reduces to logB. That log(0) call will always emit a warning message, which is extremely slow.
I came up with this one-liner; however, you'll really need to measure to see if it is actually faster. It uses only one 'complex' calculation function instead of the two that you use, and no recursion happens – the if is still there, but hidden (and maybe optimized) in fabs/maximum:
import numpy

def log_add(logA, logB):
    return numpy.logaddexp(0, -numpy.fabs(logB - logA)) + numpy.maximum(logA, logB)
edit:
I did a quick timeit() with the following results:
Your original version took about 120s
My version took about 30s
I removed the compare with log(0) from your version and it came down to 20s
I edited my code to keep the logaddexp but also use your recursive if, and it went down to 18s.
Updated code – you could also replace the recursive call with an inlined formula, but this made little difference in my timing tests:
def log_add2(logA, logB):
    if logA < logB:
        return log_add2(logB, logA)
    return numpy.logaddexp(0, logB - logA) + logA
Edit 2:
As pv noted in the comments, you could actually just do numpy.logaddexp(logA, logB), which comes down to calculating log(exp(logA) + exp(logB)), which is of course equal to log(A+B). I timed it (on the same machine as above) and it went further down to about 10s. So we've come down to about 1/12 of the original time, not bad ;).
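For what it's worth, here is a small check of why logaddexp is also the numerically safe choice (the values are chosen so that the naive exp() route underflows):
import math
import numpy as np

logA, logB = -1000.0, -1001.0
print(np.logaddexp(logA, logB))         # about -999.6867, the correct log(A+B)
print(math.exp(logA) + math.exp(logB))  # 0.0 -- both exp() calls underflow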
def log_add(logA, logB):
    return math.log(math.exp(logA) + math.exp(logB))
Is this too slow? Or inaccurate?