Speeding up scipy.integrate.quad()?

I'm trying to speed up the following code, which computes a sum of integrals. To get good accuracy I need to increase L_max, but that also makes the execution time much longer. The specific case below computes 0.999999 of the probability curve and takes about 65 seconds. I have heard of Cython and its ability to speed up code, but I don't know how to use it or whether it could help in this case. Any ideas?
import math
import time
from decimal import Decimal, getcontext

from scipy import integrate

start_time = time.time()
getcontext().prec = 100

################################
def pt(t):
    first_term = math.exp(-lam*t) * ((0.0001*I)**ni) * (t**(ni-1)) * math.exp(-(0.0001*I)*t) / math.factorial(ni-1)
    sum_term = 0.0
    i = 0
    while i < ni:
        sum_term = sum_term + ((0.0001*I)**i) * (t**i) * math.exp(-(0.0001*I)*t) / math.factorial(i)
        i = i + 1
    sum_term = lam * math.exp(-lam*t) * sum_term
    return first_term + sum_term

#################################
def pLgt(t):
    # Poisson probability of exactly L events in time t, computed in Decimal
    return Decimal((Decimal((0.0001*O)*t))**Decimal(L) * Decimal(math.exp(-(0.0001*O)*t))) / Decimal(math.factorial(int(L)))

######################################
def pL_t(t):
    return pLgt(t) * Decimal(pt(t))

################################
lam = 0.0001
beta = 0.0001
ni = 10
I = 5969
O = 48170
L_max = 300.0
L = 0.0
sum_term = 0.0
sum_probability = 0.0
while L < L_max:
    probability = integrate.quad(lambda t: pL_t(t), 0, 800)[0]
    sum_probability = sum_probability + probability
    sum_term = sum_term + L*probability
    L = L + 1.0

print(time.time() - start_time)
print(sum_probability)
print(sum_term)
print((sum_term - 1)*0.46 + 6.5)

Doing calculations in Decimal is probably slowing you down a lot without providing any benefit. Decimal arithmetic is much slower than float arithmetic (roughly 100x slower, as noted by Kozyarchuk on Stack Overflow), and using Decimal values in NumPy arrays keeps you from getting the speed benefits of NumPy.
Meanwhile, it's unclear to me that the results from scipy.integrate.quad would actually be at the level of precision you want; if you really do need arbitrary precision, you may have to write your quadrature code from scratch.
If you do need to use Decimal numbers, at least cache the Decimal representations of the constants; that will provide some speed benefit. That is, define
O = Decimal(48170)
L = Decimal(0.0)
and have pLgt just use O and L directly; that will be faster.

The pL_t function looks like a sum of gamma distributions, in which case, you should be able to evaluate the integral as a sum of partial incomplete gamma functions: http://docs.scipy.org/doc/scipy/reference/generated/scipy.special.gdtr.html#scipy.special.gdtr
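For example, here is a minimal sketch of that idea for a single gamma term, using the rate and shape constants from the question (the values are illustrative, not a drop-in replacement for the full sum):
from scipy.special import gdtr

# gdtr(a, b, x) is the integral from 0 to x of the gamma pdf with
# rate a and shape b, so the quadrature over [0, 800] collapses
# to a single closed-form evaluation.
rate, shape = 0.0001 * 5969, 10   # illustrative: 0.0001*I and ni from the question
mass = gdtr(rate, shape, 800.0)   # P(T <= 800) for a Gamma(shape, rate) variable
print(mass)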

Related

process_time() function in python not working as expected

Can someone help me understand how process_time() works?
My code is:
from time import process_time

t = process_time()

def fibonacci_of(n):
    if n in cache:  # Base case
        return cache[n]
    # Compute and cache the Fibonacci number
    cache[n] = fibonacci_of(n - 1) + fibonacci_of(n - 2)  # Recursive case
    return cache[n]

cache = {0: 0, 1: 1}
fib = [fibonacci_of(n) for n in range(1500)]
print(fib[-1])
print(process_time() - t)
And last print is always 0.0.
My expected result is something like 0.764891862869
Docs at https://docs.python.org/3/library/time.html#time.process_time don't help newbie me :(
I tried some other functions and read the docs, but without success.
I'd assume this is OS dependent. Linux lets me get down to ~5 microseconds using process_time, but other operating systems might not handle differences this small and will return zero instead.
It's for this reason that Python exposes other timers that are designed to be more accurate over shorter time scales. Specifically, perf_counter is specified as using:
the highest available resolution to measure a short duration
Using this lets me measure down to ~80 nanoseconds, whether I'm using perf_counter or perf_counter_ns.
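For instance, a minimal sketch of swapping in the nanosecond counter (the workload is just a placeholder):
from time import perf_counter_ns

t0 = perf_counter_ns()
total = sum(range(10_000))           # any short workload
print(perf_counter_ns() - t0, "ns")  # integer nanoseconds, no float rounding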
As the documentation says:
time.process_time() → float
Return the value (in fractional seconds) of the sum of the system and user
CPU time of the current process. It does not include time
elapsed during sleep. It is process-wide by definition. The reference
point of the returned value is undefined, so that only the difference
between the results of two calls is valid.
Use process_time_ns() to avoid the precision loss caused by the float type.
This last sentence is the most important: it distinguishes the very precise process_time_ns() from time.process_time(), which is less precise but more appropriate for long-running processes.
It turns out that when you measure a couple of nanoseconds (nano means 10**-9) and express the result in seconds by dividing by 10**9, you can run out of the precision of a float (64 bits) and end up rounding to zero. The float limitations are described in the Python documentation.
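A quick sketch of that rounding effect:
# Once the base value is large enough, a 64-bit float silently drops
# nanosecond-sized increments, while Python ints never lose precision.
print(1e9 + 1e-9 == 1e9)     # True: the added nanosecond is lost
print(10**18 + 1 == 10**18)  # False: integer arithmetic stays exact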
To get to know more, you can also read a general introduction to precision in floating point arithmetic (ambitious) and its perils (caveats).

Running something in a method takes much longer depending on the data types

Introduction
Today I found some weird behaviour in Python while running some experiments with exponentiation, and I was wondering if someone here knows what's happening. In my experiments, I was trying to check which is faster in Python: int**int or float**float. To check that, I ran some small snippets and found some really weird behaviour.
Weird results
My first approach was just to write some for loops with prints to check which one is faster. The snippet I used is this one:
import time

EXPERIMENTS = 10**6  # iteration count; the original post doesn't show this value

# Run powers outside a method
ti = time.time()
for i in range(EXPERIMENTS):
    x = 2**2
tf = time.time()
print(f"int**int took {tf-ti:.5f} seconds")

ti = time.time()
for i in range(EXPERIMENTS):
    x = 2.**2.
tf = time.time()
print(f"float**float took {tf-ti:.5f} seconds")
After running it I got:
int**int took 0.03004 seconds
float**float took 0.03070 seconds
Cool, it seems that data types do not affect the execution time. However, since I try to be a clean coder, I refactored the repeated logic into a function power_time:
import time

# Run powers in a method
def power_time(base, exponent):
    ti = time.time()
    for i in range(EXPERIMENTS):
        x = base ** exponent
    tf = time.time()
    return tf - ti

print(f"int**int took {power_time(2, 2):.5f} seconds")
print(f"float**float took {power_time(2., 2.):.5f} seconds")
And what a surprise it was when I got these results:
int**int took 0.20140 seconds
float**float took 0.05051 seconds
The refactor barely affected the float case, but it made the int case roughly 7 times slower.
Conclusions and questions
Apparently, running something in a method can slow down your process depending on your data types, and that's really weird to me.
Also, if I run the same experiments but change ** to * or +, the weird results disappear and all the approaches give more or less the same results.
Does anyone know why this is happening? Am I missing something?
Apparently, running something in a method can slow down your process depending on your data types, and that's really weird to me.
It would be really weird if it were not like this! You can write your own class with its own ** operator (by implementing the __pow__(self, other) method), and you could, for example, sleep for 1s in there. Why should that take as long as raising a float to the power of another?
So, yeah, Python is a dynamically typed language: the operations performed on data depend on the type of that data, and different operations can take different amounts of time.
In your first example, the difference never arises because (a) the values most probably get cached: right after parsing, it's clear that 2**2 is a constant and does not need to be evaluated on every loop iteration. Even if that were not the case, (b) the cost of running a loop in Python is hundreds of times what it takes to actually execute the math here; again, dynamically typed, dynamically named.
base**exponent is a whole different story. Nothing about it is constant, so there's actually going to be a calculation on every iteration.
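You can make both effects visible with the dis module; a quick sketch (the exact bytecode names vary across CPython versions):
import dis

dis.dis(lambda: 2**2)       # the compiler folds this to loading the constant 4
dis.dis(lambda b, e: b**e)  # a real power operation executed on every call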
Now, the ** operator (__pow__ in the Python data model) for Python's built-in float type is specified to do float exponentiation (which is implemented in highly optimized C and assembly), as exponentiation can be done elegantly on floating point numbers. Look for nb_power in CPython's floatobject.c. So, for the float case, the actual calculation is "free" for all that matters: your loop is limited by the effort of resolving all the names, types and functions to call, not by doing the actual math, which is trivial.
The ** operator on Python's built-in int type is not as neatly optimized. It's a lot more complicated: it needs to do checks like "if the exponent is negative, return a float", and it does not do elementary math that your computer can execute in a single instruction; it handles arbitrary-length integers (remember, a Python integer has as many bytes as it needs; you can store numbers larger than 64 bits in a Python integer!), which comes with allocations and deallocations. (I encourage you to read long_pow in CPython's longobject.c; it has 200 lines.)
All in all, integer exponentiation is expensive in Python because of Python's type system.
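As a sketch, timeit with non-constant operands isolates the operator cost from the loop overhead:
import timeit

# Binding b and e in setup prevents compile-time constant folding,
# so the ** operator itself is what gets measured.
print(timeit.timeit("b ** e", setup="b, e = 2, 2"))      # int ** int
print(timeit.timeit("b ** e", setup="b, e = 2.0, 2.0"))  # float ** float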

For loop vs Numpy vectorization computation time

I was randomly comparing the computation time of an explicit for-loop with a vectorized implementation in numpy. I ran exactly 1 million iterations and found some astounding differences: the for-loop took about 646ms, while the np.exp() function computed the same result in less than 20ms.
import time
import math
import numpy as np

iter = 1000000
x = np.zeros((iter, 1))
v = np.random.randn(iter, 1)

before = time.time()
for i in range(iter):
    x[i] = math.exp(v[i])
after = time.time()
print(x)
print("Non vectorized= " + str((after-before)*1000) + "ms")

before = time.time()
x = np.exp(v)
after = time.time()
print(x)
print("Vectorized= " + str((after-before)*1000) + "ms")
The result I got:
[[0.9256753 ]
[1.2529006 ]
[3.47384978]
...
[1.14945181]
[0.80263805]
[1.1938528 ]]
Non vectorized= 646.1577415466309ms
[[0.9256753 ]
[1.2529006 ]
[3.47384978]
...
[1.14945181]
[0.80263805]
[1.1938528 ]]
Vectorized= 19.547224044799805ms
My questions are:
1. What exactly is happening in the second case? The first one uses an explicit for-loop, so its computation time is justified, but what is happening "behind the scenes" in the second case?
2. How can one implement such computations (the second case) without using numpy, in plain Python?
What is happening is that NumPy dispatches the work to compiled, highly optimized numerical code (its own C ufunc loops, plus libraries such as BLAS for linear algebra), which is very good at vector arithmetic.
You could in principle call the underlying libraries yourself, but NumPy likely knows best which to use.
NumPy is a Python wrapper over libraries and code written in C; this is a large part of its efficiency. C code compiles directly to instructions that are executed by your processor or GPU, whereas Python code must be interpreted as it executes. Despite the ever-increasing speed we can get from interpreted languages through advances like just-in-time compilers, for some tasks they will never approach the speed of compiled languages.
It comes down to the fact that pure Python does not have direct access to the hardware level. In particular, it can't use the SIMD (single instruction, multiple data) instructions that most modern CPUs and GPUs provide. SIMD instructions allow a single operation to be applied to a whole vector of data at once (within a single clock cycle) at the hardware level. NumPy, on the other hand, has its functions built in C, and C can use SIMD instructions, so NumPy can take advantage of the vectorization hardware in your processor.
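As a rough sketch of why plain Python can't close the gap: even the tightest interpreted variants still pay one interpreted call per element, whereas np.exp is a single call into compiled, SIMD-capable C:
import math
import numpy as np

v = np.random.randn(1_000_000)
x1 = [math.exp(t) for t in v.tolist()]  # one interpreted call per element
x2 = np.exp(v)                          # one call into compiled C for all elements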

Numeric function for log of sum in Python

Given log(a) and log(b), I want to compute log(a+b) (in a numerically stable way).
I wrote a little function for this:
import math
from numpy import log  # numpy's log(0) returns -inf (with a warning) rather than raising

def log_add(logA, logB):
    if logA == log(0):
        return logB
    if logA < logB:
        return log_add(logB, logA)
    return log(1 + math.exp(logB - logA)) + logA
I wrote a program where this is by far the most time-consuming piece of code. Obviously I could try to optimize it (eliminate the recursive call, for instance).
Do you know of a standard math or numpy function for computing log(a+b) from log(a) and log(b)?
If not, do you know of a simple way to make a single C++ hook for this function? It's not a complicated function (it uses floats), and as I said, it's taking up the majority of my runtime.
Thanks in advance, numerical methods ninja!
Note: the best answer so far is to simply use numpy.logaddexp(logA, logB).
Why exactly do you compare with log(0)? That's equal to -numpy.inf, in which case you end up with log(1 + math.exp(-inf - logB)) + logB, which reduces to logB. That call will also always emit a warning message, which is extremely slow.
I came up with this one-liner, though you'll need to measure to see if it's actually faster. It uses only one 'complex' calculation function instead of the two you use, no recursion happens, and the if is still there but hidden (and maybe optimized) inside fabs/maximum.
def log_add(logA, logB):
    return numpy.logaddexp(0, -numpy.fabs(logB - logA)) + numpy.maximum(logA, logB)
Edit:
I did a quick timeit() comparison with the following results:
Your original version took about 120s
My version took about 30s
I removed the compare with log(0) from your version and it came down to 20s
I edited my code to keep the logaddexp but also use your recursive if, and it went down to 18s.
Updated code (you could also replace the recursive call with an inline updated formula, but this made little difference in my timing tests):
def log_add2(logA, logB):
    if logA < logB:
        return log_add2(logB, logA)
    return numpy.logaddexp(0, logB - logA) + logA
Edit 2:
As pv noted in the comments, you could actually just do numpy.logaddexp(logA, logB), which comes down to calculating log(exp(logA) + exp(logB)), which is of course equal to log(A+B). I timed it (on the same machine as above) and it went further down to about 10s. So we've come down to about 1/12 of the original runtime; not bad ;).
def log_add(logA, logB):
    return math.log(math.exp(logA) + math.exp(logB))
Is this too slow? Or inaccurate?
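For what it's worth, a sketch of the overflow that the stable form avoids (the values are illustrative):
import numpy as np

logA, logB = 800.0, 801.0
print(np.logaddexp(logA, logB))  # ~801.3133, computed stably in log space
# math.log(math.exp(800.0) + math.exp(801.0)) would raise OverflowError,
# since exp(800.0) is far beyond the float64 range (~1.8e308).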

Accurate signal delay calculation in Python

I'm trying to calculate the lag between two signals in Python using cross-correlation. The two signals are almost identical except for a very small time lag. I've tried numpy.correlate and scipy.convolve (a lot faster), and both work relatively well but give a small error. I'm starting to suspect that the error is the result of Python/scipy/numpy truncating a float somewhere. Has anyone been able to get high-accuracy signal delay calculations working in Python?
Best regards
Fredrik
Depending on the power spectrum of the two signals, you do get a small error due to the fact that the cross-correlation is not properly normalised at each lag. Here is a little function that I use; it normalises the overlap region at each lag, and I found it gives accurate results:
import numpy

def NormCrossCorrSlow(x1, x2, nlags=400):
    # Normalise the overlap region at each lag so all lags are comparable
    res = []
    for i in range(-(nlags//2), nlags//2, 1):  # // keeps this an int under Python 3
        if i < 0:
            xx1 = x1[:i]
            xx2 = x2[-i:]
        elif i == 0:
            xx1 = x1
            xx2 = x2
        else:
            xx1 = x1[i:]
            xx2 = x2[:-i]
        res.append((xx1*xx2).sum() / ((xx1**2).sum() * (xx2**2).sum())**0.5)
    return numpy.array(res)
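A sketch of how you might use it, with a synthetic shift of 5 samples (the signal and values are illustrative):
import numpy

s = numpy.random.randn(10000)
x1, x2 = s[5:], s[:-5]                # x1 leads x2 by 5 samples
r = NormCrossCorrSlow(x1, x2, nlags=400)
lags = numpy.arange(-200, 200)        # the lags the loop above iterates over
print(lags[r.argmax()])               # recovers the shift (here -5)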
