Python: fast residuals computing

What is the most efficient way to compute the residuals of two numpy arrays?
I'm currently doing it this way:
def residuals(array1, array2):
    total = 0.
    for i in range(len(array1)):
        total += (array1[i] - array2[i])**2
    return total
Is there a better solution?

Yes, note that you can perform mathematical operations directly on arrays and they are applied element-wise:
>>> import numpy as np
>>> arr1 = np.array((1, 2, 3))
>>> arr2 = np.array((4, 5, 6))
# differences
>>> arr1 - arr2
array([-3, -3, -3])
# squared differences
>>> (arr1 - arr2) ** 2
array([9, 9, 9])
# sum of squared differences
>>> np.sum((arr1 - arr2) ** 2)
27
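The loop from the question can be wrapped up as a single vectorized function. This is a minimal sketch, assuming both inputs are equal-length 1D arrays; the helper name sum_squared_residuals is illustrative and not part of the original answer. It uses np.dot of the difference with itself, which for 1D inputs gives the same number as np.sum(diff ** 2) without building the intermediate squared array.

```python
import numpy as np

def sum_squared_residuals(array1, array2):
    """Sum of squared element-wise differences (hypothetical helper name)."""
    diff = np.asarray(array1, dtype=float) - np.asarray(array2, dtype=float)
    # For 1D inputs, np.dot(diff, diff) equals np.sum(diff ** 2)
    return float(np.dot(diff, diff))

arr1 = np.array((1, 2, 3))
arr2 = np.array((4, 5, 6))
result = sum_squared_residuals(arr1, arr2)
print(result)  # 27.0
```

Whether np.dot is actually faster than np.sum on your data is worth measuring with timeit; both avoid the Python-level loop.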

Related

Numpy fast for loop

I have a list named "y" containing 8 numpy arrays of shape (180000,).
I want to create a new numpy array named "collisions" with the same shape that counts, for each position, how many of the 8 arrays are not 0 there (the code below tests for values > 0). See the following example:
import numpy as np
collisions = np.zeros(len(y[0]), dtype=np.uint8)
for yi in y:
    collisions[np.where(yi > 0)] += 1
This calculation takes a relatively long time. Is there a faster implementation?

I am not sure why your calculation takes so long. To make this concrete, suppose your list of arrays looks like this:
import numpy as np
y = [np.random.normal(0,1,180000) for i in range(8)]
Running your code on it works fine:
collisions = np.zeros(len(y[0]), dtype=np.uint8)
for yi in y:
    collisions[np.where(yi > 0)] += 1
collisions
array([4, 2, 4, ..., 4, 4, 5], dtype=uint8)
You can make it a bit faster by turning the list of arrays into a matrix and summing the > 0 mask across the 8 rows (axis=0), although I don't see a problem with the loop above:
(np.array(y)>0).sum(axis=0)
array([4, 2, 4, ..., 4, 4, 5])

I'm assuming you're looking for something like this:
import numpy as np
# simulating your data by randomly generating numbers in [-0.5, 0.5)
y = np.random.rand(8, 180_000) - 0.5
print(y.shape) # (8, 180000)
collisions = np.sum(y > 0, axis=0, dtype=np.uint8)
print(collisions.shape) # (180000,)
print(collisions) # [4 4 4 ... 1 6 7]
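To check that the vectorized form really matches the loop from the question, the two can be compared directly. The sketch below is an illustration under stated assumptions: the data is randomly generated, the seed and the shortened length of 1000 are arbitrary choices, and np.count_nonzero is swapped in as an alternative to summing the boolean mask.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
y = [rng.normal(0, 1, 1000) for _ in range(8)]  # shorter than 180000 to keep the check quick

# Loop version from the question (boolean mask instead of np.where)
loop_counts = np.zeros(len(y[0]), dtype=np.uint8)
for yi in y:
    loop_counts[yi > 0] += 1

# Vectorized version: stack to shape (8, 1000), count positives down each column
vec_counts = np.count_nonzero(np.stack(y) > 0, axis=0)

print(np.array_equal(loop_counts, vec_counts))  # True
```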

Understanding numpy vectorization

I came across a calculation of Euclidean distance using numpy vectorization, here. The calculation is:
>>> tri = np.array([[1, 1],
...                 [3, 1],
...                 [2, 3]])
>>> np.sum(tri**2, axis=1) ** 0.5 # Or: np.sqrt(np.sum(np.square(tri), 1))
array([1.4142, 3.1623, 3.6056])
So, to understand, I tried:
>>> np.sum(tri**2, axis=1)
array([ 2, 10, 13])
So basically, tri**2 squares each element: [[1,1],[9,1],[4,9]]. Next, we sum the elements of each sub-array to get [1+1, 9+1, 4+9] = [2,10,13].
Then we take the square root of each of them.
But I didn't get where we are doing the subtraction qi-pi from the formula. Also, I felt we should be getting a single value: √((1-1)^2+(9-1)^2+(4-9)^2)=9.43
Am I missing some maths here, or some Python / numpy understanding?

Assuming you have two vectors p and q represented as np.array:
dist = np.sqrt(np.sum((q - p) ** 2))
There is also np.linalg.norm which computes the same thing:
assert np.isclose(dist, np.linalg.norm(q - p))
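One way to see where the subtraction went: the snippet in the question measures each row's distance from the origin, so q - p with p = (0, 0) is just the row itself, and axis=1 yields one distance per row rather than a single value. The sketch below illustrates this using the tri array from the question; the names origin and from_origin are introduced here for illustration only.

```python
import numpy as np

tri = np.array([[1, 1],
                [3, 1],
                [2, 3]])

# Distance of each row from the origin: subtracting (0, 0) changes nothing,
# which is why no subtraction is visible in the original one-liner.
origin = np.zeros(2)
from_origin = np.sqrt(np.sum((tri - origin) ** 2, axis=1))

# Distance between two specific points, here the first two rows of tri
p, q = tri[0], tri[1]
dist = np.sqrt(np.sum((q - p) ** 2))

print(from_origin)  # one distance per row
print(dist)         # 2.0
```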

sum through specific values in an array

I have an array of data-points, for example:
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
and I need to perform the following sum on the values:
However, the problem is that I need to perform this sum on each value > i. For example, using the last 3 values in the set the sum would be:
and so on up to 10.
If I run something like:
import numpy as np
x = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
alpha = 1/np.log(2)
for i in x:
    y = sum(x**(alpha)*np.log(x))
print(y)
It returns a single value, y = 247.7827060452275, whereas I need an array of values. I think I need to reverse the order of the data to achieve what I want, but I'm having trouble visualising the problem as a whole (I hope I explained it properly), so any suggestions would be much appreciated.

The following computes all the partial sums of the grand sum in your formula:
import numpy as np
# Generate the numpy array of integers 1 to 10
x = np.arange(1, 11)
alpha = 1 / np.log(2)
# Compute parts of the sum
parts = x ** alpha * np.log(x)
# Compute all partial sums
part_sums = np.cumsum(parts)
print(part_sums)
You really do not need any explicit loop or any non-numpy operation (like the built-in sum()) here. numpy takes care of all your needs.
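Since the question mentions possibly needing to reverse the data, it may help to see both directions of accumulation side by side. This is a sketch under an assumption: the exact formula from the question is not shown, so which direction is wanted is a guess, and the names prefix_sums and suffix_sums are introduced here for illustration.

```python
import numpy as np

x = np.arange(1, 11)
alpha = 1 / np.log(2)
parts = x ** alpha * np.log(x)

# Sums over x = 1..k for k = 1..10
prefix_sums = np.cumsum(parts)
# Sums over x = k..10 for k = 1..10 (accumulate from the other end, then flip back)
suffix_sums = np.cumsum(parts[::-1])[::-1]

print(prefix_sums[-1])  # the grand total, matching the single value in the question
print(suffix_sums)
```

Both arrays share the same grand total: it is the last entry of prefix_sums and the first entry of suffix_sums.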

Calculating "generating functions" with numpy

In mathematics, a "generating function" is defined from a sequence of numbers c0, c1, c2, ..., cn by c0+c1*x+c2*x^2 + ... + cn*x^n. These come as "moment generating functions", "probability generating functions" and various other types, depending on the source of the coefficient.
I have an array of the coefficients and I'd like a quick way to create the corresponding generating function.
I could do
import numpy as np
myArray = np.array([1,2,3,4])
x=0.2
sum([c*x**k for k,c in enumerate(myArray)])
or I could have an array having c[k] in the kth entry. It seems there should be a fast numpy way to do this.
Unfortunately attempts to look this up are complicated by the fact that "generate" and "function" are common words in programming, as is the combination "generating function" so I haven't had any luck with search engines.

import numpy as np
x = .2
coeffs = np.array([1,2,3,4])
Make an array of the degree of each term:
degrees = np.arange(len(coeffs))
Raise x to each degree:
terms = np.power(x, degrees)
Multiply by the coefficients and sum:
result = np.sum(coeffs*terms)
>>> coeffs
array([1, 2, 3, 4])
>>> degrees
array([0, 1, 2, 3])
>>> terms
array([ 1. , 0.2 , 0.04 , 0.008])
>>> result
1.552
>>>
As a function:
def f(coeffs, x):
    degrees = np.arange(len(coeffs))
    terms = np.power(x, degrees)
    return np.sum(coeffs*terms)
Or simply use the NumPy Polynomial package:
from numpy.polynomial import Polynomial as P
p = P(coeffs)
result = p(x)
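If only the value is needed and not a Polynomial object, NumPy also offers plain evaluation functions. The sketch below assumes the same coeffs and x as above and shows the one detail that is easy to get wrong: numpy.polynomial.polynomial.polyval expects coefficients from lowest degree up, while the older np.polyval expects the highest degree first. The variable names low_first and high_first are illustrative.

```python
import numpy as np
from numpy.polynomial import polynomial as poly

coeffs = np.array([1, 2, 3, 4])  # c0, c1, c2, c3
x = 0.2

# Coefficients ordered c0, c1, c2, ... (lowest degree first)
low_first = poly.polyval(x, coeffs)
# Legacy function: coefficients ordered highest degree first, so reverse them
high_first = np.polyval(coeffs[::-1], x)

print(low_first, high_first)  # both approximately 1.552
```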

If you are looking for performance, np.einsum is another option:
np.einsum('i,i->',myArray,x**np.arange(myArray.size))

>>> coeffs = np.random.random(5)
>>> coeffs
array([ 0.70632473, 0.75266724, 0.70575037, 0.49293719, 0.66905641])
>>> x = np.random.random()
>>> x
0.7252944971757169
>>> powers = np.arange(0, coeffs.shape[0], 1)
>>> powers
array([0, 1, 2, 3, 4])
>>> result = coeffs * x ** powers
>>> result
array([ 0.70632473, 0.54590541, 0.37126147, 0.18807659, 0.18514853])
>>> np.sum(result)
1.9967167252487628

Using NumPy's Polynomial class is probably the easiest way.
from numpy.polynomial import Polynomial
coefficients = [1,2,3,4]
f = Polynomial(coefficients)
You can then use the object like any other function.
import numpy as np
import matplotlib.pyplot as plt
print(f(0.2))
x = np.linspace(-5, 5, 51)
plt.plot(x, f(x))
plt.show()

Decrease array size by averaging adjacent values with numpy

I have a large array of thousands of vals in numpy. I want to decrease its size by averaging adjacent values.
For example:
a = [2,3,4,8,9,10]
#average down to 2 values here
a = [3,9]
#it averaged 2,3,4 and 8,9,10 together
So, basically, I have n elements in an array, and I want to average it down to X values, as in the example above.
Is there some way to do that with numpy? (I'm already using it for other things, so I'd like to stick with it.)

Using reshape and mean, you can average every m adjacent values of a 1D array of size N*m, where N is any positive integer. For example:
import numpy as np
m = 3
a = np.array([2, 3, 4, 8, 9, 10])
b = a.reshape(-1, m).mean(axis=1)
#array([3., 9.])
1) a.reshape(-1, m) creates a 2D view of the array without copying data:
array([[ 2,  3,  4],
       [ 8,  9, 10]])
2) Taking the mean along the second axis (axis=1) then calculates the mean of each row, resulting in:
array([3., 9.])
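The reshape approach requires the length to be an exact multiple of m; otherwise reshape raises a ValueError. One possible workaround, sketched below under the assumption that a shorter final group should still be averaged on its own, is to pad with NaN and use np.nanmean. The function name block_mean is hypothetical.

```python
import numpy as np

def block_mean(values, m):
    """Average each run of m adjacent values; the last run may be shorter.

    Hypothetical helper: pads with NaN so the array reshapes cleanly,
    then ignores the padding via np.nanmean.
    """
    a = np.asarray(values, dtype=float)
    pad = (-len(a)) % m  # number of NaNs needed to reach a multiple of m
    padded = np.concatenate([a, np.full(pad, np.nan)])
    return np.nanmean(padded.reshape(-1, m), axis=1)

print(block_mean([2, 3, 4, 8, 9, 10], 3))      # [3. 9.]
print(block_mean([2, 3, 4, 8, 9, 10, 20], 3))  # [ 3.  9. 20.]
```

If the trailing values should be discarded instead, slicing the array down to a multiple of m before reshaping is the simpler choice.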

Try this:
import numpy as np
n_averaged_elements = 3
averaged_array = []
a = np.array([ 2, 3, 4, 8, 9, 10])
for i in range(0, len(a), n_averaged_elements):
    slice_from_index = i
    slice_to_index = slice_from_index + n_averaged_elements
    averaged_array.append(np.mean(a[slice_from_index:slice_to_index]))
>>> averaged_array
[3.0, 9.0]

Looks like a simple non-overlapping moving window average to me. How about:
In [3]:
import numpy as np
a = np.array([2,3,4,8,9,10])
window_sz = 3
# Trim the array to a multiple of window_sz so that it can be reshaped properly
a[:len(a)//window_sz*window_sz].reshape(-1,window_sz).mean(1)
Out[3]:
array([ 3., 9.])

In this example, I presume that a is the 1D numpy array that needs to be averaged. In the method below, we first find the factors of the length of a, and then choose an appropriate factor as the step size for averaging the array.
Here is the code.
import numpy as np
from functools import reduce
# Function to find the factors of a given number n, in ascending order
def factors(n):
    return sorted(set(reduce(list.__add__,
        ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0))))
a = [2,3,4,8,9,10] # Given array.
# fac: list of factors of the length of a.
# In this example, len(a) = 6, so fac = [1, 2, 3, 6]
fac = factors(len(a))
# step: choose an appropriate step size from the list fac.
# In this example, we choose one of the middle numbers in fac (3).
step = fac[int( len(fac)/3 )+1]
# avg: initialize an empty array.
avg = np.array([])
for i in range(0, len(a), step):
    avg = np.append( avg, np.mean(a[i:i+step]) ) # append averaged values to avg
print(avg) # Prints the final result
[3. 9.]
