How to sum every 2 consecutive vectors using numpy - python

How to sum every 2 consecutive vectors using numpy? Or take the mean of every 2 consecutive vectors? The input is a list of lists (which can have an even or uneven number of vectors).
Example:
[[2,2], [1,2], [1,1], [2,2]] --> [[3,4], [3,3]]
Maybe something like the function below, but using numpy, and something that actually works on an array of vectors rather than an array of integers. Or maybe some sort of array comprehension, if such a thing exists.
def pairwiseSum(lst, n):
    # collect the sum of each pair of consecutive elements
    sums = []
    for i in range(0, len(lst) - 1, 2):
        sums.append(lst[i] + lst[i + 1])
    return sums

import numpy as np

def mean_consecutive_vectors(lst, step):
    idx_list = list(range(step, len(lst), step))
    new_lst = np.split(lst, idx_list)
    return np.mean(new_lst, axis=1)
Same could be done with np.sum() instead of np.mean().
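A quick check (added for illustration) with the vectors from the question; note that np.mean over the split chunks requires equal-sized chunks, so this assumes len(lst) is divisible by step:
>>> import numpy as np
>>> a = np.array([[2,2], [1,2], [1,1], [2,2]])
>>> mean_consecutive_vectors(a, 2)
array([[1.5, 2. ],
       [1.5, 1.5]])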

You can reshape your array into pairs, which will allow you to use np.sum() or np.mean() directly by providing the correct axis:
import numpy as np
a = np.array([[2,2], [1,2], [1,1], [2,2]])
np.sum(a.reshape(-1, 2, 2), axis=1)
# array([[3, 4],
#        [3, 3]])
Edit to address comment:
To get the means of each adjacent pair, you can add slices of the original array and broadcast division by 2:
>>> a = np.array([[2,2], [1,2], [1,1], [2,2], [11, 10], [20, 30]])
>>> (a[:-1] + a[1:])/2
array([[ 1.5,  2. ],
       [ 1. ,  1.5],
       [ 1.5,  1.5],
       [ 6.5,  6. ],
       [15.5, 20. ]])
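The reshape approach above assumes an even number of vectors. For the uneven case mentioned in the question, one option (a sketch added here, not part of the original answers) is np.add.reduceat, which sums groups of rows starting at the given indices and leaves a shorter trailing group as-is:
import numpy as np
a = np.array([[2,2], [1,2], [1,1], [2,2], [5,5]])  # uneven: 5 vectors
np.add.reduceat(a, np.arange(0, len(a), 2), axis=0)
# array([[3, 4],
#        [3, 3],
#        [5, 5]])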

Related

How to apply a special function to a numpy array based on neighbour elements?

I want to apply a function to a 2d-array that works like a cumulative sum, but instead uses the max() function applied to the upper, left and upper-left-diagonal neighbours and the element itself. It should start from the upper-left element and accumulate previous results to calculate the next element. Is there a way to do it without nested loops?
Example:
>>> x = np.arange(9)[np.random.permutation(9)].reshape((3,3))
>>> x
array([[3, 4, 8],
       [0, 6, 5],
       [7, 2, 1]])
>>> res = np.zeros(x.shape)
>>> for i in range(0, x.shape[0]):
...     for j in range(0, x.shape[1]):
...         if i==0:
...             if j==0:
...                 res[i,j] = x[i,j]
...             else:
...                 res[i,j] = max(x[i,j], res[i,j-1])
...         else:
...             if j==0:
...                 res[i,j] = max(x[i,j], res[i-1,j])
...             else:
...                 res[i,j] = max(x[i,j], res[i-1,j], res[i,j-1], res[i-1,j-1])
...
>>> res
array([[3., 4., 8.],
       [3., 6., 8.],
       [7., 7., 8.]])
Because values of "further" res elements depend on values of "previous" elements (computed earlier), you cannot use NumPy vectorization here, as it operates only on elements of the source array. But you can simplify your code as follows:
res = x.copy()
for i in range(x.shape[0]):
    i1 = max(i-1, 0)  # Start of row range
    for j in range(x.shape[1]):
        j1 = max(j-1, 0)  # Start of column range
        res[i,j] = res[i1:i+1, j1:j+1].max()
The result is:
array([[3, 4, 8],
       [3, 6, 8],
       [7, 7, 8]])
Note that due to res = x.copy() the result has the same dtype as the original array.
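An aside (an addition, not part of the original answer): this particular recurrence computes a 2-D prefix maximum, which is separable, so it can in fact be vectorized by running np.maximum.accumulate along each axis in turn:
np.maximum.accumulate(np.maximum.accumulate(x, axis=0), axis=1)
# array([[3, 4, 8],
#        [3, 6, 8],
#        [7, 7, 8]])
The caveat about dependent elements holds for general recurrences, but not for this separable one.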
NumPy has a built-in cumsum() function, which we can use here. Passing axis=1 gives the cumulative sum across each row; for the cumulative sum down each column, use axis=0:
import numpy as np
x = np.arange(9)[np.random.permutation(9)].reshape((3,3))
print(x)
x = np.cumsum(x, dtype=float, axis=1)
print(x)
Output:
[[0 2 8]
 [3 1 6]
 [5 4 7]]
[[ 0.  2. 10.]
 [ 3.  4. 10.]
 [ 5.  9. 16.]]

How to add different random values to n elements of a numpy array?

I am trying to add random values to a specific number of values in a numpy array to mutate the weights of my neural network. For example, 2 of the values in this array
[[0 1 2]
 [3 4 5]
 [6 7 8]]
are supposed to be mutated (i.e. a random value between -1 and 1 is added to them). The result may then look something like this:
[[0   0.7 2]
 [3   4   5]
 [6.9 7   8]]
I would prefer a solution without looping, as my real problem is a little bigger than a 3x3 matrix and looping usually is inefficient.
Here's one way based on np.random.choice -
import numpy as np

def add_random_n_places(a, n):
    # Generate a float version
    out = a.astype(float)
    # Generate unique flattened indices along the size of a
    idx = np.random.choice(a.size, n, replace=False)
    # Assign random numbers in [-1,1) into those places
    out.flat[idx] += np.random.uniform(low=-1, high=1, size=n)
    return out
Sample runs -
In [89]: a # input array
Out[89]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [90]: add_random_n_places(a, 2)
Out[90]:
array([[0.        , 1.        , 2.        ],
       [2.51523009, 4.        , 5.        ],
       [6.        , 7.        , 8.36619255]])
In [91]: add_random_n_places(a, 4)
Out[91]:
array([[0.67792859, 0.84012682, 2.        ],
       [3.        , 3.71209157, 5.        ],
       [6.        , 6.46088001, 8.        ]])
You can use np.random.rand(3,3) to create a 3x3 matrix with [0,1) random values.
To get (-1,1) values, try np.random.rand(3,3) - np.random.rand(3,3) and add this to the matrix you want to mutate.
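Note (an addition to the answer above): the difference of two uniform samples is triangularly distributed, not uniform. If uniform noise on (-1, 1) is wanted, np.random.uniform generates it directly:
import numpy as np
noise = np.random.uniform(-1, 1, size=(3, 3))  # uniform on [-1, 1)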

Numpy subset matrix based on another with binary data

I have an n x m matrix X and an n x p matrix Y, where Y is binary data. In the end I want a p x n matrix Z, where each column of Z is a function of a column of X, subset to the entries where the corresponding column of Y is 1.
For example
>>> X
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> Y
array([[1, 0],
       [1, 0],
       [0, 1]])
n_x, m = X.shape
n_y, p = Y.shape
Z = np.zeros([p, n_x])
for i in range(n_x):
    col = X[:,[i]]
    for j in range(p):
        # this is where I subset col with Y[:,[j]]
        Z[j][i] = my_func(subsetted_column)
The iterations would produce
i=0, j=0: subsetted_column = [[1],[4]]
i=0, j=1: subsetted_column = [[7]]
i=1, j=0: subsetted_column = [[2],[5]]
i=1, j=1: subsetted_column = [[8]]
i=2, j=0: subsetted_column = [[3],[6]]
i=2, j=1: subsetted_column = [[9]]
I assume there is some way to do that nested loop in a single list comprehension. The function my_func also takes a long time, so it would be nice to parallelize it somehow.
Edit: I could do something like
for i in range(n_x):
    for j in range(p):
        subsetted_column = np.trim_zeros(np.multiply(X[:,i], Y[:,j]))
        Z[j][i] = my_func(subsetted_column)
But I still believe there is an easier solution
Is this what you want?
import numpy as np
N, M, P = 4, 3, 2
a = np.random.random((N, M))
b = np.random.randint(2, size=(N, P)).astype(bool)
your_func = lambda x: x # insert proper function here
flat = [your_func(ai[bj]) for bj in b.T for ai in a.T]
out = np.empty((P, M), dtype=object)
out.ravel()[:] = flat
print(a)
print(b)
print(out)
Remarks:
It is easiest to convert your masking array to dtype bool because this allows you to use logical indexing.
If your_func returns just a number it's better not to use dtype=object for out.
If you want to parallelise, a list comprehension is perhaps not the best thing to do, but I'm no expert on that. It's just that the loop looks like an obvious parallelisation target, since the order of iterations is irrelevant (see the sketch after the sample output below).
Sample output:
[[ 0.62739382  0.85774837  0.81958524]
 [ 0.99690996  0.71202879  0.97636715]
 [ 0.89235107  0.91739852  0.39537849]
 [ 0.0413107   0.11662271  0.72419308]]
[[False  True]
 [ True  True]
 [False False]
 [ True  True]]
[[array([ 0.99690996,  0.0413107 ]) array([ 0.71202879,  0.11662271])
  array([ 0.97636715,  0.72419308])]
 [array([ 0.62739382,  0.99690996,  0.0413107 ])
  array([ 0.85774837,  0.71202879,  0.11662271])
  array([ 0.81958524,  0.97636715,  0.72419308])]]
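To follow up on the parallelisation remark above, here is a minimal sketch (an addition, not part of the original answer) using the standard-library concurrent.futures. It assumes your_func is a picklable top-level function (a lambda will not work with process pools) and expensive enough to amortise the process overhead:
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def your_func(x):   # must be a top-level function so it can be pickled
    return x.sum()  # placeholder for the real, expensive function

def apply_parallel(a, b):
    # build the subsetted columns first, then farm them out to worker processes
    pairs = [ai[bj] for bj in b.T for ai in a.T]
    with ProcessPoolExecutor() as ex:
        flat = list(ex.map(your_func, pairs))
    out = np.empty((b.shape[1], a.shape[1]), dtype=object)
    out.ravel()[:] = flat
    return out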
It may help to perform the subsetting in a pre-processing loop
In [112]: xs = [X[y,:] for y in Y.astype(bool).T]
In [113]: xs
Out[113]:
[array([[1, 2, 3],
        [4, 5, 6]]),
 array([[7, 8, 9]])]
(.T is used to iterate on columns in the list comprehension; bool allows 'masked' selection)
Let's say, for example, that my_func takes the mean on axis=0 for the subsets:
In [116]: [np.mean(s, axis=0) for s in xs]
Out[116]: [array([ 2.5, 3.5, 4.5]), array([ 7., 8., 9.])]
In [117]: np.array(_)
Out[117]:
array([[ 2.5,  3.5,  4.5],
       [ 7. ,  8. ,  9. ]])
I could combine it into one loop, but it's harder to think about:
np.array([np.mean(X[y,:],axis=0) for y in Y.astype(bool).T])
With this xs list, you can focus your efforts on applying my_func efficiently to all the columns of xs[i] as np.mean(xs[i], axis=0) does.
The double-loop version of this mean:
In [121]: p = np.zeros((2,3))
In [122]: for i in range(2):
     ...:     for j in range(3):
     ...:         p[i,j] = np.mean(xs[i][:,j])
     ...:
In [123]: p
Out[123]:
array([[ 2.5,  3.5,  4.5],
       [ 7. ,  8. ,  9. ]])
Equivalent double list comprehension:
In [125]: [[np.mean(i) for i in j.T] for j in xs]
Out[125]: [[2.5, 3.5, 4.5], [7.0, 8.0, 9.0]]

python mean of list of lists

I want to find the means of all the negative numbers from a list that has a mix of positive and negative numbers. I can find the means of the sublists as follows:
import numpy as np
listA = [ [2,3,-7,-4] , [-2,3,4,-5] , [-5,-6,-8,2] , [9,5,13,2] ]
listofmeans = [np.mean(i) for i in listA ]
I want to create a similar one line code that only takes the mean of the negative numbers in the list. So for example the first element of the new list would be (-7 + -4)/2 = -5.5
My complete list would be:
listofnegativemeans = [ -5.5, -3.5, -6.333333, 0 ]
You could use the following:
listA = [[2,3,-7,-4], [-2,3,4,-5], [-5,-6,-8,2], [9,5,13,2]]
means = [np.mean([el for el in sublist if el < 0] or 0) for sublist in listA]
print(means)
Output
[-5.5, -3.5, -6.333333333333333, 0.0]
If none of the elements in sublist are less than 0, the list comprehension will evaluate to []. By including the expression [] or 0 we handle your scenario where you want to evaluate the mean of an empty list to be 0.
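To see why the guard matters (a small illustration added here): np.mean of an empty list is nan and emits a RuntimeWarning, whereas [] or 0 falls back to np.mean(0):
>>> import numpy as np
>>> np.mean([])        # RuntimeWarning: Mean of empty slice
nan
>>> np.mean([] or 0)
0.0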
If you're using numpy at all, you should strive for numpythonic code rather than falling back to python logic. That means using numpy's ndarray data structure, and the usual indexing style for arrays, rather than python loops.
For the usual means:
>>> listA
[[2, 3, -7, -4], [-2, 3, 4, -5], [-5, -6, -8, 2], [9, 5, 13, 2]]
>>> A = np.array(listA)
>>> np.mean(A, axis=1)
array([-1.5 , 0. , -4.25, 7.25])
Negative means:
>>> [np.mean(row[row<0]) for row in A]
[-5.5, -3.5, -6.333333333333333, nan]
The pure numpy way:
In [2]: np.ma.masked_greater(listA,0).mean(1).data
Out[2]: array([-5.5 , -3.5 , -6.33333333, 0. ])
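One caveat (an addition, not from the original answer): masked_greater(listA, 0) masks only strictly positive values, so a 0 in the data would be averaged in as if it were negative. np.ma.masked_greater_equal excludes zeros as well, and gives the same result here since this data contains none:
In [3]: np.ma.masked_greater_equal(listA, 0).mean(1).data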
That would be something like:
listA = np.array( [ [2,3,-7,-4] , [-2,3,4,-5] , [-5,-6,-8,2] , [9,5,13,2] ] )
listofnegativemeans = [np.mean(i[i<0]) for i in listA ]
output:
[-5.5, -3.5, -6.333333333333333, nan]
Zero is misleading; I definitely prefer nan when a row has no negative elements.
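If you do want zeros in place of nan, np.nan_to_num converts them afterwards (a small sketch; the empty-slice RuntimeWarning still fires, but the nan becomes 0):
>>> np.nan_to_num(np.array([np.mean(i[i<0]) for i in listA]))
array([-5.5       , -3.5       , -6.33333333,  0.        ])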

what does numpy.apply_along_axis perform exactly?

I have come across the numpy.apply_along_axis function in some code, and I don't understand the documentation for it.
This is an example of the documentation:
>>> def new_func(a):
...     """Divide elements of a by 2."""
...     return a * 0.5
...
>>> b = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> np.apply_along_axis(new_func, 0, b)
array([[ 0.5,  1. ,  1.5],
       [ 2. ,  2.5,  3. ],
       [ 3.5,  4. ,  4.5]])
As far I as thought I understood the documentation, I would have expected:
array([[ 0.5,  1. ,  1.5],
       [ 4  ,  5  ,  6  ],
       [ 7  ,  8  ,  9  ]])
i.e. having applied the function along the axis [1,2,3] which is axis 0 in [[1,2,3], [4,5,6], [7,8,9]]
Obviously I am wrong. Could you correct me?
apply_along_axis applies the supplied function along 1-D slices of the input array, with the slices taken along the axis you specify. So in your example, new_func is applied over each slice of the array along the first axis. It becomes clearer if you use a vector-valued function, rather than a scalar, like this:
In [20]: b = np.array([[1,2,3], [4,5,6], [7,8,9]])
In [21]: np.apply_along_axis(np.diff,0,b)
Out[21]:
array([[3, 3, 3],
       [3, 3, 3]])
In [22]: np.apply_along_axis(np.diff,1,b)
Out[22]:
array([[1, 1],
       [1, 1],
       [1, 1]])
Here, numpy.diff (i.e. the arithmetic difference of adjacent array elements) is applied along each slice of either the first or second axis (dimension) of the input array.
Here the function is performed on the 1-D slices taken along axis=0; you pick a different axis via the second (axis) argument. A usage of this paradigm is:
np.apply_along_axis(np.cumsum, 0, b)
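For the b defined above, this accumulates each column top to bottom (output added for illustration):
# array([[ 1,  2,  3],
#        [ 5,  7,  9],
#        [12, 15, 18]])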
The function was performed on each subarray along dimension 0, so it is meant for 1-D functions and returns a 1-D array for each 1-D input.
Another example is:
np.apply_along_axis(np.sum, 0, b)
Provides a scalar output for a 1-D array.
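With the same b, each column therefore collapses to its total (output added for illustration): array([12, 15, 18]).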
Of course you could just set the axis parameter in cumsum or sum to do the above, but the point here is that it can be used for any 1-D function you write.
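To illustrate that last point, here is a sketch (added here; peak_to_peak is a hypothetical helper, not from the original answer) applying a hand-written 1-D function:
import numpy as np

def peak_to_peak(v):
    # spread (max - min) of one 1-D slice
    return v.max() - v.min()

b = np.array([[1,2,3], [4,5,6], [7,8,9]])
np.apply_along_axis(peak_to_peak, 0, b)
# array([6, 6, 6])  -- spread of each column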
