selective row sum matrix in numpy


Is there an efficient numpy way to do the following?
Assume I have some matrix M of size R x C. Now assume I have another matrix
E of shape R x a (where a is just some constant, a < C), which contains row indices of
M (and -1 for padding, i.e., every element of E is in {-1, 0, ..., R-1}). For example,
M=array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
E = array([[ 0, 1],
[ 2, -1],
[-1, 0]])
Now, given those matrices, I want to generate a third matrix P, where the i'th row of P will
contain the sum of the following rows of M : E[i,:]. In the example, P will be,
P[0,:] = M[0,:] + M[1,:]
P[1,:] = M[2,:]
P[2,:] = M[0,:]
Yes, doing it with a loop is pretty straightforward and easy; I was wondering if there is
any fancy numpy way to make it more efficient (assuming that I want to do it with large matrices,
e.g., 200 x 200).
Thanks!
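For reference, the loop version mentioned in the question could look like the sketch below. The function name selective_row_sum_loop is made up for illustration; it is useful mainly as a baseline to check the vectorized answers against.

```python
import numpy as np

def selective_row_sum_loop(M, E):
    """Reference implementation: for each row of E, sum the rows of M it lists."""
    P = np.zeros((E.shape[0], M.shape[1]), dtype=M.dtype)
    for i, row in enumerate(E):
        for idx in row:
            if idx >= 0:          # -1 is padding and contributes nothing
                P[i] += M[idx]
    return P

M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
E = np.array([[0, 1],
              [2, -1],
              [-1, 0]])
P = selective_row_sum_loop(M, E)
print(P)
```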

One way would be to index into the original array and sum, then subtract the contributions of M[-1] that were picked up wherever E holds a -1 -
out = M[E].sum(1) - M[-1]*(E==-1).sum(1)[:,None]
Another way would be to pad a row of zeros at the end of M, so that those -1 entries index into the zeros and hence have no effect on the final sum after indexing -
M1 = np.vstack((M, np.zeros((1,M.shape[1]), dtype=M.dtype)))
out = M1[E].sum(1)
If there is at most one -1 per row in E, we can optimize further -
out = M[E].sum(1)
m = (E==-1).any(1)
out[m] -= M[-1]
Another, based on tensor multiplication (E is one-hot encoded over the row indices of M, so the range must span M.shape[0]) -
np.einsum('ij,kli->kj',M, (E[...,None]==np.arange(M.shape[0])))
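A quick way to gain confidence in these variants is to compare them with an explicit loop on a non-square random input. The sketch below is only a sanity check with made-up sizes (R, C, a); note that the einsum variant one-hot encodes E over M.shape[0], the number of rows, which is what a non-square M requires.

```python
import numpy as np

rng = np.random.default_rng(0)
R, C, a = 6, 4, 3                      # deliberately non-square
M = rng.integers(0, 10, size=(R, C))
E = rng.integers(-1, R, size=(R, a))   # entries in {-1, 0, ..., R-1}

# Reference: explicit loop that skips the -1 padding
ref = np.array([M[row[row >= 0]].sum(axis=0) for row in E])

# 1) index, then subtract the copies of M[-1] picked up by the -1 entries
out1 = M[E].sum(1) - M[-1] * (E == -1).sum(1)[:, None]

# 2) append a zero row so that -1 indexes into zeros
M1 = np.vstack((M, np.zeros((1, M.shape[1]), dtype=M.dtype)))
out2 = M1[E].sum(1)

# 3) one-hot encode E over the row indices of M and contract with einsum
onehot = (E[..., None] == np.arange(M.shape[0])).astype(M.dtype)
out3 = np.einsum('ij,kli->kj', M, onehot)

print(np.array_equal(out1, ref), np.array_equal(out2, ref), np.array_equal(out3, ref))
```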

You could index M with E, and np.sum only where the actual indices in E are greater than or equal to 0. For that we have the where parameter:
np.sum(M[E], where=(E>=0)[...,None], axis=1)
array([[5, 7, 9],
[7, 8, 9],
[1, 2, 3]])
Here, indexing M with E gives:
M[E]
array([[[1, 2, 3],
[4, 5, 6]],
[[7, 8, 9],
[7, 8, 9]],
[[7, 8, 9],
[1, 2, 3]]])
which is summed only over the rows where this mask is True:
(E>=0)[...,None]
array([[[ True],
[ True]],
[[ True],
[False]],
[[False],
[ True]]])
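The where argument of np.sum needs a reasonably recent NumPy (I believe 1.17 or later). If it is not available, the same masking can be sketched with np.where, zeroing the padded rows before summing. This is an alternative under that assumption, not part of the answer above.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
E = np.array([[0, 1],
              [2, -1],
              [-1, 0]])

mask = (E >= 0)[..., None]                 # shape (R, a, 1), broadcasts over columns
P = np.where(mask, M[E], 0).sum(axis=1)    # padded entries are replaced by 0 before the sum
print(P)
```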

Probably not the fastest but maybe educational: The operation you are describing can be thought of as matrix multiplication with a certain adjacency matrix:
import numpy as np
from scipy import sparse
# construct adjacency matrix in CSR form
indices = E[E!=-1]
indptr = np.concatenate([[0],np.count_nonzero(E!=-1,axis=1).cumsum()])
data = np.ones_like(indices)  # one stored value per index
aux = sparse.csr_matrix((data,indices,indptr), shape=(E.shape[0], M.shape[0]))
# multiply
aux @ M
# array([[5, 7, 9],
# [7, 8, 9],
# [1, 2, 3]], dtype=int64)
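The same adjacency-matrix idea can be sketched without scipy by building a dense count matrix with np.add.at. The name A is introduced here for illustration; for large, sparse E the CSR version above is likely the better fit, since the dense matrix has R x R entries.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
E = np.array([[0, 1],
              [2, -1],
              [-1, 0]])

rows, cols = np.nonzero(E >= 0)            # positions of the real (non-padding) entries
A = np.zeros((E.shape[0], M.shape[0]), dtype=M.dtype)
np.add.at(A, (rows, E[rows, cols]), 1)     # A[i, k] = how often row k of M appears in E[i]
P = A @ M                                  # row sums as a matrix product
print(A)
print(P)
```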

Related

"Vectorized" Matrix-Vector multiplication in numpy

I have an $I$-indexed array $V = (V_i)_{i \in I}$ of (column) vectors $V_i$, which I want to multiply pointwise (along $i \in I$) by a matrix $M$. So I'm looking for a "vectorized" operation, wherein the individual operation is a multiplication of a matrix with a vector; that is
$W = (M V_i)_{i \in I}$
Is there a numpy way to do this?
numpy.dot unfortunately assumes that $V$ is a matrix, instead of an $I$-indexed family of vectors, which obviously fails.
So basically I want to "vectorize" the operation
W = [np.dot(M, V[i]) for i in range(N)]
Considering the 2D array V as a list (first index) of column vectors (second index).
If
shape(M) == (2, 2)
shape(V) == (N, 2)
Then
shape(W) == (N, 2)

EDIT:
Based on your iterative example, it seems it can be done with a dot product with some transposes to match the shapes. This is the same as (M@V.T).T, which is the transpose of M @ V.T.
# Step by step
((2,2) @ (5,2).T).T
-> ((2,2) @ (2,5)).T
-> (2,5).T
-> (5,2)
Code to prove this is as follows. Your iterative output results in a matrix W which is exactly equal to the solutions matrix.
M = np.random.random((2,2))
V = np.random.random((5,2))
# YOUR ITERATIVE SOLUTION (STACKED AS MATRIX)
W = np.stack([np.dot(M, V[i]) for i in range(5)])
print(W)
#array([[0.71663319, 0.84053871],
# [0.28626354, 0.36282745],
# [0.26865497, 0.55552295],
# [0.40165606, 0.10177711],
# [0.33950909, 0.54215385]])
# PROPOSED DOT PRODUCT
solution = (M@V.T).T #<---------------
print(solution)
#array([[0.71663319, 0.84053871],
# [0.28626354, 0.36282745],
# [0.26865497, 0.55552295],
# [0.40165606, 0.10177711],
# [0.33950909, 0.54215385]])
np.allclose(W, solution) #compare the 2 matrices
True
IIUC, you are looking for a pointwise multiplication of a matrix M and vector V (with broadcasting).
The matrix here is (3,3), while V is an array with 4 column vectors, each of which you want to independently multiply with the matrix while obeying broadcasting rules.
# Broadcasting Rules
M -> 3, 3
V -> 4, 1, 3 #V.T[:,None,:]
----------------
R -> 4, 3, 3
----------------
Code for this -
M = np.array([[1,1,1],
[0,0,0],
[1,1,1]]) #3,3 matrix M
V = np.array([[1,2,3,4],
[1,2,3,4], #3,4 array that
[1,2,3,4]]) #stores 4 column vectors
R = M * V.T[:,None,:] #<--------------
R
array([[[1, 1, 1],
[0, 0, 0],
[1, 1, 1]],
[[2, 2, 2],
[0, 0, 0],
[2, 2, 2]],
[[3, 3, 3],
[0, 0, 0],
[3, 3, 3]],
[[4, 4, 4],
[0, 0, 0],
[4, 4, 4]]])
After this, if you need any aggregation, you can reduce the array with the required operations.
For example, Matrix M * Column vector [1,1,1] results in -
array([[1, 1, 1],
[0, 0, 0],
[1, 1, 1]])
while Matrix M * Column vector [4,4,4] results in -
array([[4, 4, 4],
[0, 0, 0],
[4, 4, 4]])
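As one example of such a reduction: summing the broadcast product over its last axis turns each elementwise slice into an ordinary matrix-vector product. The sketch below reuses the M and V from this answer; W is a name chosen here for the reduced result.

```python
import numpy as np

M = np.array([[1, 1, 1],
              [0, 0, 0],
              [1, 1, 1]])
V = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4]])          # 4 column vectors of length 3

R = M * V.T[:, None, :]              # shape (4, 3, 3): R[n, i, j] = M[i, j] * V[j, n]
W = R.sum(axis=-1)                   # shape (4, 3): W[n] equals M @ V[:, n]
print(W)
```

Here W agrees with (M @ V).T, so when only the products are needed the matrix product avoids building the (4, 3, 3) intermediate.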

With
shape(M) == (2, 2)
shape(V) == (N, 2)
and
W = [np.dot(M, V[i]) for i in range(N)]
V[i] is (2,), so np.dot(M,V[i]) is (2,2) with (2,) => (2,), a sum-of-products over the last dimension of M. np.array(W) then has shape (N,2).
For 2d A,B, np.dot(A,B) does sum-of-products with the last dimension of A and 2nd to the last of B. You want the last dim of M with the last of V.
One way is:
np.dot(M,V.T).T # (2,2) with (2,N) => (2,N) => (N,2)
(M@V.T).T # with the matmul operator
Sometimes einsum makes the relation between axes clearer:
np.einsum('ij,nj->ni',M,V)
np.einsum('ij,jn->in',M,V.T).T # with j in last/2nd last positions
Or switching the order of V and M:
V @ M.T # 'nj,ji->ni'
Or treating the N dimension as a batch, we could make V[:,:,None] (N,2,1). This could be thought of as N (2,1) "column vectors".
M @ V[:,:,None] # (N,2,1)
np.einsum('ij,njk->nik', M, V[:,:,None]) # again j is in the last/2nd last slots
Numerically:
In [27]: M = np.array([[1,2],[3,4]]); V = np.array([[1,2],[2,3],[3,4]])
In [28]: [M@V[i] for i in range(3)]
Out[28]: [array([ 5, 11]), array([ 8, 18]), array([11, 25])]
In [30]: (M@V.T).T
Out[30]:
array([[ 5, 11],
[ 8, 18],
[11, 25]])
In [31]: V@M.T
Out[31]:
array([[ 5, 11],
[ 8, 18],
[11, 25]])
Or the batched:
In [32]: M@V[:,:,None]
Out[32]:
array([[[ 5],
[11]],
[[ 8],
[18]],
[[11],
[25]]])
In [33]: np.squeeze(M@V[:,:,None])
Out[33]:
array([[ 5, 11],
[ 8, 18],
[11, 25]])
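To confirm that these formulations agree with the list comprehension from the question, a small check on random data might look like this (the sizes and the variants dictionary are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
M = rng.random((2, 2))
V = rng.random((N, 2))

W = np.stack([M @ V[i] for i in range(N)])   # loop reference, shape (N, 2)

variants = {
    "(M @ V.T).T": (M @ V.T).T,
    "V @ M.T": V @ M.T,
    "einsum 'ij,nj->ni'": np.einsum('ij,nj->ni', M, V),
    "batched matmul": (M @ V[:, :, None])[:, :, 0],
}
for name, result in variants.items():
    print(name, np.allclose(W, result))
```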

how to multiply each row from one matrix to every rows to another matrix on Python?

The A and B matrices will be different each time I run the program.
A = np.array([[1, 1, 1], [2, 2, 2]])
B = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
The output matrix (C) should be the same dimension as matrix A.
As the title says, I'm trying to multiply each row of one matrix (A) by every row of another matrix (B) and would like to sum the results.
For example,
Dimension of C = (2,3)
C = [[A(0)*B(0) + A(1)*B(0)], [A(0)*B(1) + A(1)*B(1)], [A(0)*B(2) + A(1)*B(2)]]
I would like to know if there is a numpy function that does that.

Use numpy broadcasting:
C = (A * B[:, None]).sum(axis=1)
Output:
>>> C
array([[3, 3, 3],
[6, 6, 6],
[9, 9, 9]])
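Because each row of B is a common factor in its own sum, the same result can also be sketched without the three-dimensional intermediate: sum the rows of A first, then multiply. This is an algebraic rearrangement of the broadcasting answer, shown here on the arrays from the question.

```python
import numpy as np

A = np.array([[1, 1, 1], [2, 2, 2]])
B = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])

C_broadcast = (A * B[:, None]).sum(axis=1)   # builds a (3, 2, 3) intermediate
C_factored = B * A.sum(axis=0)               # sum_k A[k] * B[i] == B[i] * sum_k A[k]
print(C_factored)
```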

Using Numpy to generate random combinations of two arrays without repetition

Given two arrays, for example [0,0,0] and [1,1,1], it is already clear (see here) how to generate all the combinations, i.e., [[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]]. itertools (combinations or product) and numpy.meshgrid are the most common ways as far as I know.
However, I couldn't find any discussion of how to generate these combinations randomly, without repetitions.
An easy solution could be to generate all the combinations and then choose some of them randomly. For example:
# Three random combinations of [0,0,0] and [1,1,1]
comb = np.array(np.meshgrid([0,1],[0,1],[0,1])).T.reshape(-1,3)
result = comb[np.random.choice(len(comb),3,replace=False),:]
However, this is infeasible when the number of combinations is too big.
Is there a way to generate random combinations without replacement in Python (possibly with Numpy) without generating all the combinations?
EDIT: You can notice in the accepted answer that we also got for free a technique to generate random binary vectors without repetitions, which is just a single line (described in the Bonus Section).

Here's a vectorized approach without generating all combinations -
def unique_combs(A, N):
    # A : 2D Input array with each row representing one group
    # N : No. of combinations needed
    m,n = A.shape
    dec_idx = np.random.choice(2**m,N,replace=False)
    idx = ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int)
    return A[np.arange(m),idx]
Please note that this assumes every group has the same number of elements, and specifically two per group (hence the 2**m possible combinations).
Explanation
To give it a bit of explanation, let's say the groups are stored in a 2D array -
In [44]: A
Out[44]:
array([[4, 2], <-- group #1
[3, 5], <-- group #2
[8, 6]]) <-- group #3
We have two elems per group. Let's say we are looking for 4 unique group combinations : N = 4. Selecting one of the two numbers from each of those three groups gives a total of 8 unique combinations.
Let's generate N unique numbers in that interval of 8 using np.random.choice(8, N, replace=False) -
In [86]: dec_idx = np.random.choice(8,N,replace=False)
In [87]: dec_idx
Out[87]: array([2, 3, 7, 0])
Then, convert those to binary equivalents as later on we need those to index into each row of A -
In [88]: idx = ((dec_idx[:,None] & (1 << np.arange(3)))!=0).astype(int)
In [89]: idx
Out[89]:
array([[0, 1, 0],
[1, 1, 0],
[1, 1, 1],
[0, 0, 0]])
Finally, with fancy-indexing, we get those elems off A -
In [90]: A[np.arange(3),idx]
Out[90]:
array([[4, 5, 8],
[2, 5, 8],
[2, 5, 6],
[4, 3, 8]])
Sample run
In [80]: # Original code that generates all combs
...: comb = np.array(np.meshgrid([4,2],[3,5],[8,6])).T.reshape(-1,3)
...: result = comb[np.random.choice(len(comb),4,replace=False),:]
...:
In [81]: A = np.array([[4,2],[3,5],[8,6]]) # 2D array of groups
In [82]: unique_combs(A, 3) # 3 combinations
Out[82]:
array([[2, 3, 8],
[4, 3, 6],
[2, 3, 6]])
In [83]: unique_combs(A, 4) # 4 combinations
Out[83]:
array([[2, 3, 8],
[4, 3, 6],
[2, 5, 6],
[4, 5, 8]])
Bonus section
Explanation on ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int) :
That step is basically converting decimal numbers to binary equivalents. Let's break it down to smaller steps for a closer look.
1) Input array of decimal numbers -
In [18]: dec_idx
Out[18]: array([7, 6, 4, 0])
2) Convert to 2D upon inserting new axis with None/np.newaxis -
In [19]: dec_idx[:,None]
Out[19]:
array([[7],
[6],
[4],
[0]])
3) Let's assume m = 3, i.e. we want to convert to 3 binary digit number equivalents.
We create 2-powered range array with bit-shift operation -
In [16]: (1 << np.arange(m))
Out[16]: array([1, 2, 4])
Alternatively, an explicit way would be -
In [20]: 2**np.arange(m)
Out[20]: array([1, 2, 4])
4) Now, the crux of the cryptic step there. We perform broadcasted bitwise AND-ing between 2D dec_idx and 2-powered range array.
Consider the first element from dec_idx : 7. We are performing bitwise AND-ing of 7 against 1, 2, 4. Think of it as a filtering process, as we filter 7 at each binary interval of 1, 2, 4 as they represent the three binary digits. Similarly, we do this for all elems off dec_idx in a vectorized manner with broadcasting.
Thus, we would get the bit-wise AND-ing results like so -
In [43]: (dec_idx[:,None] & (1 << np.arange(m)))
Out[43]:
array([[1, 2, 4],
[0, 2, 4],
[0, 0, 4],
[0, 0, 0]])
The filtered numbers thus obtained are either 0 or the 2-powered range array numbers themselves. So, to have the binary equivalents, we just need to consider all non-zeros as 1s and zeros as 0s.
In [44]: ((dec_idx[:,None] & (1 << np.arange(m)))!=0)
Out[44]:
array([[ True, True, True],
[False, True, True],
[False, False, True],
[False, False, False]], dtype=bool)
In [45]: ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int)
Out[45]:
array([[1, 1, 1],
[0, 1, 1],
[0, 0, 1],
[0, 0, 0]])
Thus, we have the binary numbers with MSBs to the right.
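The same decoding idea extends from binary to base k when every group has k elements. The sketch below is a hypothetical generalization (the function name unique_combs_any_size and the seed parameter are not from the answer). It draws the integer codes with random.sample on a range object, which does not build the full population; as far as I know, the legacy np.random.choice with replace=False permutes the whole population internally, so this may matter when k**m is large. It assumes k**m fits in a 64-bit integer and that the elements within each group are distinct.

```python
import random
import numpy as np

def unique_combs_any_size(A, N, seed=None):
    """Pick N distinct combinations, taking one element from each row (group) of A.

    A : 2D array of shape (m, k), i.e. m groups with k elements each
    N : number of distinct combinations to draw (N <= k**m)
    """
    m, k = A.shape
    rng = random.Random(seed)
    # Sampling from a range object avoids materialising all k**m codes
    codes = np.array(rng.sample(range(k ** m), N), dtype=np.int64)
    # Decode each code into m base-k digits, least significant digit first
    digits = (codes[:, None] // k ** np.arange(m)) % k
    return A[np.arange(m), digits]

A = np.array([[4, 2, 9],
              [3, 5, 1],
              [8, 6, 7]])
combs = unique_combs_any_size(A, 5, seed=0)
print(combs)
```

Distinct codes give distinct digit vectors, so the returned rows are unique as long as no group contains repeated values.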

How to compare two numpy arrays and add missing values to the other with a tweak

I have two numpy arrays of different sizes. I want to add the rows that appear only in the bigger array to the smaller array, keeping the 0th element of each added row and setting its 1st element to 0.
For example :
a = [ [2,4],[4,5], [8,9],[7,5]]
b = [ [2,5], [4,6]]
After adding the missing elements to b, b would become as follows :
b = [ [2,5], [4,6], [8,0], [7,0] ]
I have tried the logic up to some extent, however some values are getting redundantly added as I am not able to check whether that element has already been added to b or not.
Secondly, I am doing it with the help of an additional array c which is the copy of b and then doing the desired operations to c. If somebody can show me how to do it without the third array c , would be very helpful.
import numpy as np
a = [[2,3],[4,5],[6,8], [9,6]]
b = [[2,3],[4,5]]
a = np.array(a)
b = np.array(b)
c = np.array(b)
for i in range(len(b)):
    for j in range(len(a)):
        if a[j,0] == b[i,0]:
            print("matched ")
        else:
            print("not matched")
            c = np.insert(c, len(c), [a[j,0], 0], axis = 0)
print(c)

#####For explanation#####
#basic set operation to get the missing elements
c = set([i[0] for i in a]) - set([i[0] for i in b])
#c will just store the missing elements....
#then just append the elements
for i in c:
    b.append([i, 0])
Output -
[[2, 5], [4, 6], [8, 0], [7, 0]]
Edit -
But as they are numpy arrays you can just do this (and without using c as an intermediate) - just two lines
for i in set(a[:, 0]) - (set(b[:, 0])):
    b = np.append(b, [[i, 0]], axis = 0)
Output -
array([[2, 5],
[4, 6],
[8, 0],
[7, 0]])

You can use np.in1d to find which first-column values of a already appear in b. The inverted mask selects the rows of a to append, and multiplying by [1,0] sets their second column to zero. Thus, we would have a vectorized approach as shown below -
np.vstack((b,a[~np.in1d(a[:,0],b[:,0])]*[1,0]))
Sample run -
In [47]: a
Out[47]:
array([[2, 4],
[4, 5],
[8, 9],
[7, 5]])
In [48]: b
Out[48]:
array([[8, 7],
[4, 6]])
In [49]: np.vstack((b,a[~np.in1d(a[:,0],b[:,0])]*[1,0]))
Out[49]:
array([[8, 7],
[4, 6],
[2, 0],
[7, 0]])
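Wrapped into a helper, the same idea might look like the sketch below. The function name add_missing_keys is made up for illustration, and it uses np.isin, the newer spelling of the membership test, in place of np.in1d.

```python
import numpy as np

def add_missing_keys(a, b):
    """Append to b one [key, 0, ...] row for every first-column key of a missing from b."""
    missing = ~np.isin(a[:, 0], b[:, 0])          # True where a's key is not in b
    extra = np.zeros((missing.sum(), b.shape[1]), dtype=b.dtype)
    extra[:, 0] = a[missing, 0]                   # keep the key, leave the rest at 0
    return np.vstack((b, extra))

a = np.array([[2, 4], [4, 5], [8, 9], [7, 5]])
b = np.array([[2, 5], [4, 6]])
result = add_missing_keys(a, b)
print(result)
```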

First we should clear up one misconception. c does not have to be a copy. A new variable assignment is sufficient.
c = b
...
c= np.insert(c, len(c), [a[j,0], 0], axis = 0)
np.insert is not modifying any of its inputs. Rather it makes a new array. And the c=... just assigns that to c, replacing the original assignment. So the original c assignment just makes writing the iteration easier.
Since you are adding this new [a[j,0],0] at the end, you could use concatenate (the underlying function used by insert and the stack functions). The new row has to be 2d to match c:
c = np.concatenate((c, [[a[j,0],0]]), axis=0)
That won't make much of a change in the run time. It's better to find all the a[j] and add them all at once.
In this case you want to add a[2,0] and a[3,0]. Leaving aside, for the moment, the question of how we find [2,3], we can do:
In [595]: a=np.array([[2,3],[4,5],[6,8],[9,6]])
In [596]: b=np.array([[2,3],[4,5]])
In [597]: ind = [2,3]
An assign and fill approach would look like:
In [605]: c = np.zeros_like(a) # target array
In [607]: c[0:b.shape[0],:] = b # fill in the b values
In [608]: c[b.shape[0]:,0] = a[ind,0] # fill in the selected a column
In [609]: c
Out[609]:
array([[2, 3],
[4, 5],
[6, 0],
[9, 0]])
A variation would be to construct a temporary array with the new a values, and concatenate
In [613]: a1 = np.zeros((len(ind),2),a.dtype)
In [614]: a1[:,0] = a[ind,0]
In [616]: np.concatenate((b,a1),axis=0)
Out[616]:
array([[2, 3],
[4, 5],
[6, 0],
[9, 0]])
I'm using the a1 create and fill approach because I'm too lazy to figure out how to concatenate a[ind,0] with enough 0s to make the same thing. :)
As Divakar shows, np.in1d is a handy way of finding the matches
In [617]: np.in1d(a[:,0],b[:,0])
Out[617]: array([ True, True, False, False], dtype=bool)
In [618]: np.nonzero(~np.in1d(a[:,0],b[:,0]))
Out[618]: (array([2, 3], dtype=int32),)
In [619]: np.nonzero(~np.in1d(a[:,0],b[:,0]))[0]
Out[619]: array([2, 3], dtype=int32)
In [620]: ind=np.nonzero(~np.in1d(a[:,0],b[:,0]))[0]
If you don't care about the order a[ind,0] can also be gotten with np.setdiff1d(a[:,0],b[:,0]) (the values will be sorted).
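One possible way to pair a[ind,0] with a column of zeros directly, instead of the create-and-fill step, is np.column_stack. This is a sketch of that shortcut on the same a and b, using np.isin as the membership test; new_rows is a name chosen here.

```python
import numpy as np

a = np.array([[2, 3], [4, 5], [6, 8], [9, 6]])
b = np.array([[2, 3], [4, 5]])

ind = np.nonzero(~np.isin(a[:, 0], b[:, 0]))[0]    # rows of a whose key is not in b
# Pair each missing key with a zero to form the new (len(ind), 2) block
new_rows = np.column_stack((a[ind, 0], np.zeros(len(ind), dtype=a.dtype)))
c = np.concatenate((b, new_rows), axis=0)
print(c)
```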

Assuming you are working on a single dimensional array:
import numpy as np
a = np.linspace(1, 90, 90)
b = np.array([1,2,3,4,5,6,7,8,9,10,11,13,14,15,16,17,18,19,20,
21,22,23,24,25,27,28,31,32,33,34,35,36,37,38,39,
40,41,42,43,44,46,47,48,49,50,51,52,53,54,55,56,
57,58,59,60,61,62,63,64,65,67,70,72,73,74,75,76,
77,78,79,80,81,82,84,85,86,87,88,89,90])
m_num = np.setxor1d(a, b).astype(np.uint8)
print("Total {0} numbers missing: {1}".format(len(m_num), m_num))
This also works in a 2D space:
t1 = np.reshape(a, (10, 9))
t2 = np.reshape(b, (10, 8))
m_num2 = np.setxor1d(t1, t2).astype(np.uint8)
print("Total {0} numbers missing: {1}".format(len(m_num2), m_num2))

Calculating Mean of arrays with different lengths

Is it possible to calculate the mean of multiple arrays, when they may have different lengths? I am using numpy. So let's say I have:
numpy.array([[1, 2, 3, 4, 8], [3, 4, 5, 6, 0]])
numpy.array([[5, 6, 7, 8, 7, 8], [7, 8, 9, 10, 11, 12]])
numpy.array([[1, 2, 3, 4], [5, 6, 7, 8]])
Now I want to calculate the mean, but ignoring elements that are 'missing' (Naturally, I can not just append zeros as this would mess up the mean)
Is there a way to do this without iterating through the arrays?
PS. These arrays are all 2-D, but will always have the same amount of coordinates for that array. I.e. the 1st array is 5 and 5, 2nd is 6 and 6, 3rd is 4 and 4.
An example:
np.array([[1, 2], [3, 4]])
np.array([[1, 2, 3], [3, 4, 5]])
np.array([[7], [8]])
This must give
(1+1+7)/3 (2+2)/2 3/1
(3+3+8)/3 (4+4)/2 5/1
And graphically:
[1, 2] [1, 2, 3] [7]
[3, 4] [3, 4, 5] [8]
Now imagine that these 2-D arrays are placed on top of each other with coordinates overlapping contributing to that coordinate's mean.

I often needed this for plotting the mean of performance curves with different lengths.
Solved it with a simple function (based on the answer of @unutbu):
def tolerant_mean(arrs):
    lens = [len(i) for i in arrs]
    arr = np.ma.empty((np.max(lens),len(arrs)))
    arr.mask = True
    for idx, l in enumerate(arrs):
        arr[:len(l),idx] = l
    return arr.mean(axis = -1), arr.std(axis=-1)
y, error = tolerant_mean(list_of_ys_diff_len)
ax.plot(np.arange(len(y))+1, y, color='green')
Applying that function to a list of curves of different lengths yields their mean curve (plot omitted).

numpy.ma.mean allows you to compute the mean of non-masked array elements. However, to use numpy.ma.mean, you have to first combine your three numpy arrays into one masked array:
import numpy as np
x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 2, 3], [3, 4, 5]])
z = np.array([[7], [8]])
arr = np.ma.empty((2,3,3))
arr.mask = True
arr[:x.shape[0],:x.shape[1],0] = x
arr[:y.shape[0],:y.shape[1],1] = y
arr[:z.shape[0],:z.shape[1],2] = z
print(arr.mean(axis = 2))
yields
[[3.0 2.0 3.0]
[4.66666666667 4.0 5.0]]
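For an arbitrary number of 2-D arrays, the same masked-array approach can be sketched as a small helper that sizes the stack from the inputs. The function name masked_mean is hypothetical, and the arrays are stacked along the first axis here rather than the last.

```python
import numpy as np

def masked_mean(arrays):
    """Element-wise mean of 2-D arrays of differing shapes, ignoring missing cells."""
    rows = max(arr.shape[0] for arr in arrays)
    cols = max(arr.shape[1] for arr in arrays)
    stack = np.ma.masked_all((len(arrays), rows, cols))   # everything starts masked
    for k, arr in enumerate(arrays):
        # Assigning real values unmasks exactly the cells this array covers
        stack[k, :arr.shape[0], :arr.shape[1]] = arr
    return stack.mean(axis=0)

x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 2, 3], [3, 4, 5]])
z = np.array([[7], [8]])
result = masked_mean([x, y, z])
print(result)
```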

The below function also works by adding columns of arrays of different lengths:
def avgNestedLists(nested_vals):
    """
    Averages a 2-D array and returns a 1-D array of all of the columns
    averaged together, regardless of their dimensions.
    """
    output = []
    maximum = 0
    for lst in nested_vals:
        if len(lst) > maximum:
            maximum = len(lst)
    for index in range(maximum): # Go through each index of longest list
        temp = []
        for lst in nested_vals: # Go through each list
            if index < len(lst): # If not an index error
                temp.append(lst[index])
        output.append(np.nanmean(temp))
    return output
Going off of your first example:
avgNestedLists([[1, 2, 3, 4, 8], [5, 6, 7, 8, 7, 8], [1, 2, 3, 4]])
Outputs:
[2.3333333333333335,
3.3333333333333335,
4.333333333333333,
5.333333333333333,
7.5,
8.0]
The reason np.amax(nested_lst) or np.max(nested_lst) was not used in the beginning to find the max value is because it will return an array if the nested lists are of different sizes.
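A more compact variant of the same idea, sketched here as an alternative rather than as the function above, pads the shorter lists with NaN via itertools.zip_longest and lets np.nanmean skip the padding. The name avg_nested_lists is chosen for this example.

```python
from itertools import zip_longest
import numpy as np

def avg_nested_lists(nested_vals):
    """Column-wise mean of lists of different lengths, ignoring missing entries."""
    # Each tuple from zip_longest holds one index position across all lists,
    # with NaN standing in where a list is too short
    padded = np.array(list(zip_longest(*nested_vals, fillvalue=np.nan)), dtype=float)
    return np.nanmean(padded, axis=1)

means = avg_nested_lists([[1, 2, 3, 4, 8], [5, 6, 7, 8, 7, 8], [1, 2, 3, 4]])
print(means)
```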

OP, I know you were looking for a non-iterative built-in solution, but the following really only takes 3 lines (2 if you combine transpose and means but then it just gets messy):
arrays = [
    np.array([[1,2], [3,4]]),
    np.array([[1,2,3], [3,4,5]]),
    np.array([[7], [8]])
]
mean = lambda x: sum(x)/float(len(x))
transpose = [[item[i] for item in arrays] for i in range(len(arrays[0]))]
means = [[mean([j[i] for j in t if i < len(j)]) for i in range(len(max(t, key = len)))] for t in transpose]
Outputs:
>>>means
[[3.0, 2.0, 3.0], [4.666666666666667, 4.0, 5.0]]
