numpy polyfit alternative for speed - python

I am calling numpy's polyfit numerous times to compute the slope between two datasets. However, it does not perform these calculations fast enough for my needs.
Two things to note about the calculations:
The value of x in the call numpy.polyfit(x, y, n) is always the same across calls, and
The value of n is 1, so it is a linear regression.
I know there are many alternatives, including numpy.polynomial.polynomial.polyfit(x, y, n), but they seem to be just as slow. I have had little luck getting np.linalg to work properly. So, what alternative could speed up these calculations?

As others have commented, this can be done using linear least squares.
Using numpy.linalg.lstsq, this could look like:
import numpy as np
def lstsq(x, y):
    # Design matrix: one column of x values, one column of ones for the intercept
    a = np.stack((x, np.ones_like(x)), axis=1)
    return np.linalg.lstsq(a, y, rcond=None)[0]
This offers a slight speed improvement over polyfit. To obtain a significant speed increase (at the expense of numerical stability - for a summary of methods see Numerical methods for linear least squares) you can instead solve the normal equations:
def normal(x, y):
    a = np.stack((x, np.ones_like(x)), axis=1)
    aT = a.T
    return np.linalg.solve(aT @ a, aT @ y)
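For reference, these are the normal equations of the least-squares problem, with A the design matrix built above and beta = (slope, intercept):
A^T A beta = A^T y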
As you say that x is constant, you can precompute a.T @ a, providing a further speed increase:
def normal2(aT, aTa, y):
    return np.linalg.solve(aTa, aT @ y)
Make up some test data and time the alternatives:
rng = np.random.default_rng()
N = 1000
x = rng.random(N)
y = rng.random(N)
a = np.stack((x, np.ones_like(x)), axis=1)
aT = a.T
aTa = aT @ a
assert np.allclose(lstsq(x, y), np.polyfit(x, y, 1))
assert np.allclose(normal(x, y), np.polyfit(x, y, 1))
assert np.allclose(normal2(aT, aTa, y), np.polyfit(x, y, 1))
%timeit np.polyfit(x, y, 1)
%timeit lstsq(x, y)
%timeit normal(x, y)
%timeit normal2(aT, aTa, y)
Output:
256 µs ± 270 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
220 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
20.2 µs ± 32.3 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
6.54 µs ± 13.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
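Since x (and therefore a, aT and aTa) never changes, the precomputation only has to happen once, and each new dataset y then costs a single small matmul and a 2x2 solve. A minimal usage sketch, reusing aT and aTa from the test setup above (the list of y arrays is hypothetical, standing in for your many datasets):
ys = [rng.random(N) for _ in range(100)]          # stand-in for the many y datasets
slopes = [normal2(aT, aTa, y)[0] for y in ys]     # index 0 is the slope, index 1 the intercept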

Related

How can I speed up iterating over a large list and summing values

import random

# Generate test data
test = list(range(150))
groups = []
for _ in range(75_000):
    groups.append(random.sample(test, 6))
Set up the variables as numpy arrays:
# Best version
import numpy as np
import random
from numba import jit # Kind of optional see below
# Generate test data
test = list(range(150))
groups = np.array([random.sample(test, 6) for _ in range(75_000)])
# This will change every run, but it is kept fixed here for the example
scores_dict = {i: random.uniform(0, 120) for i in range(150)}
scores = np.array(list(scores_dict.items()))
Here's the vectorized version using numpy's sum and take:
def fun1(scores, groups):
    for _ in range(6250):
        c = np.sum(np.take(scores[:, 1], groups), axis=1)
    return c
%timeit fun1(scores, groups) # Takes ~2.5 mins to run
18.6 s ± 625 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If you really want to go all out you can try using numba on top of numpy:
@jit(nopython=True)
def fun2(scores, groups):
    for _ in range(6250):
        c = np.sum(np.take(scores[:, 1], groups), axis=1)
    return c
%timeit fun2(scores, groups) # Takes ~1.2 mins to run
10.1 s ± 1.32 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
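One further tweak worth trying (my own sketch, not part of the answer above) is to slice the score column out once before the loop, so each iteration only does the fancy indexing and the sum:
def fun3(scores, groups):
    score_values = np.ascontiguousarray(scores[:, 1])   # take the score column once
    for _ in range(6250):
        c = score_values[groups].sum(axis=1)             # same result as np.take + np.sum
    return c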

Fast alternative to conditionally set array elements

I have two given 3d arrays x_dist and y_dist each of shape (36,50,50). Elements in x_dist and y_dist are of type np.float32 and can be positive, negative or zero. I need to create a new array res_array, where I set its value to (1-y_dist)*(x_dist) at all indexes except for where the condition ((x_dist <= 0) | ((x_dist > 0) & (y_dist > (1 + x_dist)))) is True. My current implementation is as follows.
res_array = (1-y_dist)*(x_dist)
res_array[((x_dist <= 0) | ((x_dist > 0) & (y_dist > (1 + x_dist))))] = 0.0
However, I need to run the code containing this snippet thousands of times, and I am sure there is a smarter and faster way to do the same thing. Can you please help me find better-performing code or a one-liner?
I appreciate your help in advance!
Numba JIT can be used to do that efficiently. Here is an implementation:
import numpy as np
from numba import njit

@njit
def fastImpl(x_dist, y_dist):
    # Start from zeros so the masked positions stay 0.0
    res_array = np.zeros(x_dist.shape, dtype=np.float32)
    for z in range(res_array.shape[0]):
        for y in range(res_array.shape[1]):
            for x in range(res_array.shape[2]):
                xDist = x_dist[z, y, x]
                yDist = y_dist[z, y, x]
                if xDist > 0.0 and yDist <= (1.0 + xDist):
                    res_array[z, y, x] = (1.0 - yDist) * xDist
    return res_array
Here are performance results on random input matrices:
Original implementation: 494 µs ± 6.23 µs per loop (mean ± std. dev. of 7 runs, 500 loops each)
New implementation: 37.8 µs ± 236 ns per loop (mean ± std. dev. of 7 runs, 500 loops each)
The new implementation is about 13 times faster (without taking into account the compilation/warm-up time).
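If you would rather stay in pure NumPy, the same logic can be written as the requested one-liner with np.where (a sketch; note that it still evaluates the product everywhere, so it is unlikely to beat the Numba version):
res_array = np.where(
    (x_dist <= 0) | ((x_dist > 0) & (y_dist > (1 + x_dist))),   # condition from the question
    0.0,
    (1 - y_dist) * x_dist,
)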

Why is numpy.var O(N) space?

I have an array of ~13 GB. I call numpy.var on it to compute the variance. However, it allocates another ~13 GB to do this. Why does it need O(N) space? Or am I calling numpy.var the wrong way?
import numpy as np
# data = ...
print('Variance: ', np.var(data))
NumPy will create an intermediate array to compute abs(data - data.mean()) ** 2 in order to compute the variance. You can write your own variance function with a loop and make it fast with Numba:
import numpy as np
import numba as nb
@nb.njit(parallel=True)
def var_nb(a, ddof=0):
    n = len(a)
    m = a.sum() / n              # mean uses n, not n - ddof
    v = 0.0
    for i in nb.prange(n):
        v += abs(a[i] - m) ** 2
    return v / (n - ddof)
np.random.seed(100)
a = np.random.rand(100_000)
print(np.var(a))
# 0.08349747560941487
print(var_nb(a))
# 0.08349747560941487
%timeit np.var(a)
# 143 µs ± 414 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit var_nb(a)
# 40.2 µs ± 530 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
This is faster without parallelization:
import numpy as np

def var(a: np.ndarray, axis: int = 0):
    return np.sum(abs(a - (a.sum(axis=axis) / len(a))) ** 2, axis=axis) / len(a)
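If the real goal is to avoid the O(N) temporary on a ~13 GB array, a two-pass, chunked computation is one option in plain NumPy (a sketch assuming a 1-D array; chunk_size is an arbitrary choice):
import numpy as np

def var_chunked(data, chunk_size=1_000_000, ddof=0):
    n = data.size
    mean = data.mean()                        # first pass: a reduction, no large temporary
    sq_dev_sum = 0.0
    for start in range(0, n, chunk_size):     # second pass: only chunk-sized temporaries
        chunk = data[start:start + chunk_size]
        sq_dev_sum += np.sum((chunk - mean) ** 2)
    return sq_dev_sum / (n - ddof)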

speeding up sympy matrix operations

I have written code that uses sympy to set up a matrix and a vector whose elements are sympy symbols. I then invert the matrix and multiply the inverse by the vector. This is meant to be a generic solver for linear equation systems with n variables; I am interested in the symbolic solution of these systems.
The problem is that my code is too slow.
For instance, for n=4 it takes roughly 30 seconds, but for n=7 I have not been able to get a result: the code ran all night (8 h) and still had not finished in the morning.
This is my code.
from sympy import *

niso = 4  # number of variables; set as needed

MM = Matrix(niso, 1, lambda i, j: var('MM_%s' % (i + 1)))
MA = Matrix(niso, 1, lambda i, j: var('m_%s%s' % ('A', chr(66 + i))))
MX = Matrix(niso, 1, lambda i, j: var('m_%s%s' % (chr(66 + i), 'A')))
RB = Matrix(niso, 1, lambda i, j: var('R_%s%s' % ('A' + chr(66 + i), i + 2)))
R = Matrix(niso, niso - 1, lambda i, j: var('R_%s%d' % (chr(65 + i), j + 2)))
K = Matrix(niso - 1, 1, lambda i, j: var('K_%d' % (i + 2)))
C = Matrix(niso - 1, 1, lambda i, j: var('A_%d' % i))
A = Matrix(niso - 1, niso - 1, lambda i, j: var('A_%d' % i))
b = Matrix(niso - 1, 1, lambda i, j: var('A_%d' % i))

for i in range(0, niso - 1):
    for j in range(0, niso - 1):
        A[i, j] = MM[j + 1, 0] * (Add(Mul(R[0, j], 1 / MA[i, 0] / (RB[i, 0] - R[0, i]))) + R[i + 1, j] / MX[i, 0] / (-RB[i, 0] + R[0, i]))

for i in range(0, niso - 1):
    b[i, 0] = MM[0, 0] * (Add(Mul(1, 1 / MA[i, 0] / (RB[i, 0] - R[0, i]))) + 1 / MX[i, 0] / (-RB[i, 0] + R[0, i]))

A_in = Inverse(A)
if niso <= 4:
    X = simplify(A_in * b)
if niso > 4:
    X = A_in * b
pprint(X)
Is there a way to speed it up?
Don't invert! With n=4
%timeit soln = A.LUsolve(b)
697 µs ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
With n=10
%timeit soln = A.LUsolve(b)
431 ms ± 13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
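In context, this means replacing the Inverse(A)*b step at the end of the question's script with a direct solve (a sketch, reusing the A, b and niso built above):
X = A.LUsolve(b)        # solves A*X = b without forming the inverse
if niso <= 4:           # simplify() becomes very expensive for larger systems
    X = simplify(X)
pprint(X)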

Numpy efficient matrix self-multiplication (gram matrix)

I want to multiply B = A @ A.T in numpy. Obviously, the answer would be a symmetric matrix (i.e. B[i, j] == B[j, i]).
However, it is not clear to me how to leverage this easily to cut the computation time down in half (by only computing the lower triangle of B and then using that to get the upper triangle for free).
Is there a way to perform this optimally?
As noted in @PaulPanzer's link, dot can detect this case. Here's the timing proof:
In [355]: A = np.random.rand(1000,1000)
In [356]: timeit A.dot(A.T)
57.4 ms ± 960 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [357]: B = A.T.copy()
In [358]: timeit A.dot(B)
98.6 ms ± 805 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numpy dot too clever about symmetric multiplications
You can always use sklearn's pairwise_distances
Usage:
from sklearn.metrics.pairwise import pairwise_distances
gram = pairwise_distances(x, metric=metric)
Where metric is a callable or a string defining one of their implemented metrics (full list in the link above)
But, I wrote this for myself a while back so I can share what I did:
import numpy as np
def computeGram(elements, dist):
    n = len(elements)
    gram = np.zeros([n, n])
    for i in range(n):
        for j in range(i + 1):
            gram[i, j] = dist(elements[i], elements[j])
    # Mirror the lower triangle into the upper triangle
    upTriIdxs = np.triu_indices(n)
    gram[upTriIdxs] = gram.T[upTriIdxs]
    return gram
Where dist is a callable, in your case np.inner
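For example, with the random A from the timings above (a usage sketch; the Python loop trades BLAS speed for generality, so it will be slower on plain float matrices than A.dot(A.T)):
A = np.random.rand(1000, 1000)
gram = computeGram(A, np.inner)     # gram[i, j] is the inner product of rows i and j, i.e. A @ A.T
assert np.allclose(gram, A @ A.T)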
