Numpy filter matrix based on column

I have a matrix with several different values for each row:
arr1 = np.array([[1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18],[19,20,21,22,23,24,25,26,27]])
arr2 = np.array([["A"],["B"],["C"]])
This produces the following matrices:
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24, 25, 26, 27]])
array([['A'],
['B'],
['C']])
A represents the first 3 columns, B represents the next 3 columns, and C represents the last 3 columns. So the result I'd like here is:
array([[1,2,3],
[13,14,15],
[25,26,27]])
I was thinking about converting arr2 to a mask array, but I'm not even sure how to do this. If it was a 1darray I could do something like this:
arr[0,1,2]
but for a 2darray I'm not even sure how to mask like this. I tried this and got errors:
arr[[0,1,2],[3,4,5],[6,7,8]]
What's the best way to do this?
Thanks.

You could use string.ascii_uppercase to map each letter to its position in the alphabet, and reshape arr1 into chunks of 3 columns:
from string import ascii_uppercase
reshaped = np.reshape(arr1, (len(arr1), -1, 3))
reshaped[np.arange(len(arr1)), np.vectorize(ascii_uppercase.index)(arr2).ravel()]
Or just directly map A to 0 and so on...
reshaped = np.reshape(arr1, (len(arr1), -1, 3))
reshaped[np.arange(len(arr1)), np.vectorize(['A', 'B', 'C'].index)(arr2).ravel()]
Both Output:
array([[ 1, 2, 3],
[13, 14, 15],
[25, 26, 27]])
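A loop-free variant of the same idea can be sketched as follows, assuming the labels are sorted and each label covers a fixed-width block of columns. The names labels, width, block and cols are illustrative and not from the question; np.take_along_axis needs NumPy 1.15 or later.

```python
import numpy as np

arr1 = np.arange(1, 28).reshape(3, 9)
arr2 = np.array([["A"], ["B"], ["C"]])

labels = np.array(["A", "B", "C"])  # assumption: sorted, one label per column block
width = 3                           # assumption: every label covers 3 columns

# Position of each row's label in the sorted label list gives the block number
block = np.searchsorted(labels, arr2.ravel())

# Column indices per row: block start plus the offsets 0..width-1
cols = block[:, None] * width + np.arange(width)

result = np.take_along_axis(arr1, cols, axis=1)
print(result)
```

Because the column indices are computed by broadcasting, the same code should work for any number of rows as long as every label appears in labels.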

If the shape of arr1 is fixed at (3, 9) as shown above, this can be done in a single line of code:
result = np.array([arr1[0][0:3], arr1[1][3:6], arr1[2][6:9]])
The output will be as follows:
[[ 1 2 3]
[13 14 15]
[25 26 27]]

You can use 'advanced indexing', which indexes the target array by coordinate arrays.
rows = np.array([[0,0,0],[1,1,1],[2,2,2]])
cols = np.array([[0,1,2],[3,4,5],[6,7,8]])
arr1[rows, cols]
>>> array([[ 1, 2, 3],
[13, 14, 15],
[25, 26, 27]])
and you can wrap this in a function like
def diagonal(arr, step):
    rows = np.array([[x]*step for x in range(step)])
    cols = np.array([[y for y in range(x, x+step)] for x in range(0, step**2, step)])
    return arr[rows, cols]
diagonal(arr1, 3)
>>> array([[ 1, 2, 3],
[13, 14, 15],
[25, 26, 27]])
reference: https://numpy.org/devdocs/user/basics.indexing.html

Related

Numpy indexing by range of arrays

Say I have an array myarr such that myarr.shape = (2,64,64,2). Now if I define myarr2 = myarr[[0,1,0,0,1],...], then the following is true
myarr2.shape #(5,64,64,2)
myarr2[0,...] == myarr[0,...] # = True
myarr2[1,...] == myarr[1,...] # = True
myarr2[2,...] == myarr[0,...] # = True
...
Can this be generalized so the slices are arrays? That is, is there a way to make the following hypothetical code work?
myarr2 = myarr[...,[20,30,40]:[30,40,50],[15,25,35]:[25,35,45],..]
myarr2[0,] == myarr[...,20:30,15:25,...] # = True
myarr2[1,] == myarr[...,30:40,25:35,...] # = True
myarr2[2,] == myarr[...,40:50,35:45,...] # = True

You can feed the coordinates of the subarrays to a loop that cuts them out of myarray. I don't know how you store the indices of the subarrays, so I put them into a nested list idx_list:
idx_list = [[[20,30,40],[30,40,50]],[[15,25,35],[25,35,45]]] # [[row starts, row stops], [col starts, col stops]], assuming 2D cutouts
idx_array = np.array(idx_list).reshape(4,-1).T # one (a, b, c, d) row per cutout
myarray2 = np.array([myarray[a:b,c:d] for a,b,c,d in idx_array]) # cut and combine

Let's simplify the problem a bit; first by removing the two outer dimensions that don't affect the core indexing issue; and by reducing the size so we can see and understand the results.
The setup
In [540]: arr = np.arange(7*7).reshape(7,7)
In [541]: arr
Out[541]:
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27],
[28, 29, 30, 31, 32, 33, 34],
[35, 36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47, 48]])
In [542]: idx =np.array([[0,2,4,6],[1,3,5,7]])
Now a straightforward iteration approach:
In [543]: alist = []
...: for i in range(idx.shape[1]-1):
...:     j,k = idx[:,i]
...:     sub = arr[j:j+2, k:k+2]
...:     alist.append(sub)
...:
In [544]: np.array(alist)
Out[544]:
array([[[ 1, 2],
[ 8, 9]],
[[17, 18],
[24, 25]],
[[33, 34],
[40, 41]]])
In [545]: _.shape
Out[545]: (3, 2, 2)
I simplified the iteration from:
...: for i in range(idx.shape[1]-1):
...:     sub = arr[idx[0,i]:idx[0,i+1],idx[1,i]:idx[1,i+1]]
...:     alist.append(sub)
to highlight the fact that we are generating ranges of a consistent size, and make the next transformation more obvious.
So I start with a (7,7) array, and create 3 (2,2) slices.
As I demonstrated in Slicing a different range at each index of a multidimensional numpy array, we can use linspace to expand a set of slices, or ranges.
In [567]: ranges = np.linspace(idx[:,:3],idx[:,:3]+1,2).astype(int)
In [568]: ranges
Out[568]:
array([[[0, 2, 4],
[1, 3, 5]],
[[1, 3, 5],
[2, 4, 6]]])
So ranges[0] expands on the idx[0] slices, etc. But if I simply index with these I get the 'diagonal' values of the blocks in Out[544]:
In [569]: arr[ranges[0], ranges[1]]
Out[569]:
array([[ 1, 17, 33],
[ 9, 25, 41]])
To get blocks I have to add a dimension to the first indices:
In [570]: arr[ranges[0,:,None], ranges[1]]
Out[570]:
array([[[ 1, 17, 33],
[ 2, 18, 34]],
[[ 8, 24, 40],
[ 9, 25, 41]]])
These are the same values as in Out[544], but need to be transposed:
In [571]: _.transpose(2,0,1)
Out[571]:
array([[[ 1, 2],
[ 8, 9]],
[[17, 18],
[24, 25]],
[[33, 34],
[40, 41]]])
The code is a bit clunky and needs to be generalized, but it gives the general idea of how one can substitute a single indexing operation for the iterative one, provided the slices are regular enough. For this small example it probably isn't faster, but it will probably come out ahead as the problem size gets larger.
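One way the generalization might look is sketched below, assuming every block has the same square size. The function name block_slices and its parameters are hypothetical; it builds the row and column index arrays by broadcasting instead of linspace, so no transpose is needed at the end.

```python
import numpy as np

def block_slices(arr, row_starts, col_starts, size):
    """Return an (n, size, size) array of equally sized blocks of a 2-D array.

    Block i covers arr[row_starts[i]:row_starts[i]+size,
                       col_starts[i]:col_starts[i]+size].
    """
    offsets = np.arange(size)
    # Row indices vary along axis 1, column indices along axis 2
    rows = np.asarray(row_starts)[:, None, None] + offsets[None, :, None]
    cols = np.asarray(col_starts)[:, None, None] + offsets[None, None, :]
    return arr[rows, cols]

arr = np.arange(7 * 7).reshape(7, 7)
blocks = block_slices(arr, [0, 2, 4], [1, 3, 5], 2)
print(blocks.shape)
print(blocks)
```

The result is a copy, as with any advanced indexing, so memory use grows with the number and size of the blocks.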

What is the easiest way in NumPy to index vectors of a matrix excluding one index in each row? [duplicate]

This question already has answers here:
How do I get all the values from a NumPy array excluding a certain index?
(5 answers)
Closed 4 years ago.
Suppose I have a NumPy ndarray M with the following content at M[0,:]:
[2, 3.9, 7, 9, 0, 1, 8.1, 3.2]
and I am given an integer, k, at runtime between 0 and 7. I want to produce the vector consisting of all items in this row except at column k. (Example: if k=3, then the desired vector is [2,3.9,7,0,1,8.1,3.2])
Is there an easy way to do this?
What if I have a vector of indices k, one for each row of M, representing the column I want to exclude from the row?
I'm kind of lost, other than a non-vectorized loop that fills a result matrix:
nrows = M.shape[0]
result = np.zeros((nrows, M.shape[1]-1))
for irow in range(nrows):
    result[irow,:k[irow]] = M[irow,:k[irow]] # content before the split point
    result[irow,k[irow]:] = M[irow,k[irow]+1:] # content after the split point

One approach would be with masking/boolean-indexing -
mask = np.ones(M.shape,dtype=bool)
mask[np.arange(len(k)),k] = 0
out = M[mask].reshape(len(M),-1)
Alternatively, we could use broadcasting to get that mask -
np.not_equal.outer(k,np.arange(M.shape[1]))
# or k[:,None]!=np.arange(M.shape[1])
Thus, giving us a one-liner/compact version -
out = M[k[:,None]!=np.arange(M.shape[1])].reshape(len(M),-1)
To exclude multiple ones per row, edit the advanced-indexing part for the first method -
def exclude_multiple(M,*klist):
    k = np.stack(klist).T
    mask = np.ones(M.shape,dtype=bool)
    mask[np.arange(len(k))[:,None],k] = 0
    out = M[mask].reshape(len(M),-1)
    return out
Sample run -
In [185]: M = np.arange(40).reshape(4,10)
In [186]: exclude_multiple(M,[1,3,2,0],[4,5,8,1])
Out[186]:
array([[ 0, 2, 3, 5, 6, 7, 8, 9],
[10, 11, 12, 14, 16, 17, 18, 19],
[20, 21, 23, 24, 25, 26, 27, 29],
[32, 33, 34, 35, 36, 37, 38, 39]])
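For the single-row case in the question, np.delete is probably the shortest route. The sketch below shows it next to the broadcasting mask for the per-row case; the small matrix M and the index vector ks are made-up illustration data.

```python
import numpy as np

# Single row, single excluded column: np.delete returns a new array
row = np.array([2, 3.9, 7, 9, 0, 1, 8.1, 3.2])
k = 3
without_k = np.delete(row, k)
print(without_k)

# One excluded column per row, using the broadcasting mask
M = np.arange(12).reshape(3, 4)
ks = np.array([0, 2, 3])                      # column to drop in each row
keep = ks[:, None] != np.arange(M.shape[1])   # True where the entry is kept
out = M[keep].reshape(len(M), -1)
print(out)
```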

Improvement on @Divakar's answer to extend this to zero or more excluded indices per row:
def excluding(A, *klist):
    """
    excludes column k from each row of A, for each k in klist
    (make sure the index vectors have no common elements)
    """
    mask = np.ones(A.shape,dtype=bool)
    for k in klist:
        mask[np.arange(len(k)),k] = 0
    return A[mask].reshape(len(A),-1)
Test:
M = np.arange(40).reshape(4,10)
excluding(M,[1,3,2,0],[4,5,8,1])
returns
array([[ 0, 2, 3, 5, 6, 7, 8, 9],
[10, 11, 12, 14, 16, 17, 18, 19],
[20, 21, 23, 24, 25, 26, 27, 29],
[32, 33, 34, 35, 36, 37, 38, 39]])
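The docstring's warning matters because overlapping index vectors leave rows with different numbers of kept entries, and the final reshape then fails or silently misaligns rows. A sketch of a guarded variant follows; the name excluding_checked and the error message are assumptions added for illustration.

```python
import numpy as np

def excluding_checked(A, *klist):
    """Like excluding(), but verifies every row keeps the same number of entries."""
    mask = np.ones(A.shape, dtype=bool)
    rows = np.arange(A.shape[0])
    for k in klist:
        mask[rows, np.asarray(k)] = False
    kept = mask.sum(axis=1)
    if not (kept == kept[0]).all():
        raise ValueError("index vectors overlap in at least one row")
    return A[mask].reshape(A.shape[0], -1)

M = np.arange(40).reshape(4, 10)
ok = excluding_checked(M, [1, 3, 2, 0], [4, 5, 8, 1])
print(ok)

try:
    # Row 0 excludes column 1 twice, so it keeps 9 entries instead of 8
    excluding_checked(M, [1, 3, 2, 0], [1, 5, 8, 1])
except ValueError as err:
    overlap_error = str(err)
    print("rejected:", overlap_error)
```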

Merging rows in numpy to form new array

This is a sample of what I am trying to accomplish. I am very new to python and have searched for hours to find out what I am doing wrong. I haven't been able to find what my issue is. I am still new enough that I may be searching for the wrong phrases. If so, could you please point me in the right direction?
I want to combine n arrays into one array. The first row of x should be the first row of combined, the first row of y the second row, the first row of z the third row, the second row of x the fourth row, and so on.
So it would look something like this:
x = [x1 x2 x3]
[x4 x5 x6]
[x7 x8 x9]
y = [y1 y2 y3]
[y4 y5 y6]
[y7 y8 y9]
z = [z1 z2 z3]
[z4 z5 z6]
[z7 z8 z9]
combined = [x1 x2 x3]
[y1 y2 y3]
[z1 z2 z3]
[x4 x5 x6]
[...]
[z7 z8 z9]
The best I can come up with is this:
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((9,3))
for rows in range(len(x)):
    combined[0::3] = x[rows,:]
    combined[1::3] = y[rows,:]
    combined[2::3] = z[rows,:]
print(combined)
All this does is write the last value of the input array to every third row in the output array instead of what I wanted. I am not sure if this is even the best way to do this. Any advice would help out.
I just figured out that this works, but if someone knows a higher-performance method, please let me know.
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((18,3))
for rows in range(6):
    combined[rows*3,:] = x[rows,:]
    combined[rows*3+1,:] = y[rows,:]
    combined[rows*3+2,:] = z[rows,:]
print(combined)

You can do this using a list comprehension and zip:
combined = np.array([row for row_group in zip(x, y, z) for row in row_group])

Using vectorised operations only:
A = np.vstack((x, y, z))
idx = np.arange(A.shape[0]).reshape(-1, x.shape[0]).T.flatten()
A = A[idx]
Here's a demo:
import numpy as np
x, y, z = np.random.rand(3,3), np.random.rand(3,3), np.random.rand(3,3)
print(x, y, z)
[[ 0.88259564 0.17609363 0.01067734]
[ 0.50299357 0.35075811 0.47230915]
[ 0.751129 0.81839586 0.80554345]]
[[ 0.09469396 0.33848691 0.51550685]
[ 0.38233976 0.05280427 0.37778962]
[ 0.7169351 0.17752571 0.49581777]]
[[ 0.06056544 0.70273453 0.60681583]
[ 0.57830566 0.71375038 0.14446909]
[ 0.23799775 0.03571076 0.26917939]]
A = np.vstack((x, y, z))
idx = np.arange(A.shape[0]).reshape(-1, x.shape[0]).T.flatten()
print(idx) # [0 3 6 1 4 7 2 5 8]
A = A[idx]
print(A)
[[ 0.88259564 0.17609363 0.01067734]
[ 0.09469396 0.33848691 0.51550685]
[ 0.06056544 0.70273453 0.60681583]
[ 0.50299357 0.35075811 0.47230915]
[ 0.38233976 0.05280427 0.37778962]
[ 0.57830566 0.71375038 0.14446909]
[ 0.751129 0.81839586 0.80554345]
[ 0.7169351 0.17752571 0.49581777]
[ 0.23799775 0.03571076 0.26917939]]

I have changed your code a little bit to get the desired output
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((18,3))
combined[0::3] = x
combined[1::3] = y
combined[2::3] = z
print(combined)
You had the shape of the combined matrix wrong and there is no real need for the for loop.
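Since the question asks about n arrays, the strided assignment can be wrapped in a small helper. This is a sketch that assumes all inputs are 2-D with identical shapes; the name interleave_rows is hypothetical.

```python
import numpy as np

def interleave_rows(*arrays):
    """Interleave the rows of equally shaped 2-D arrays: a0[0], a1[0], ..., a0[1], ..."""
    n = len(arrays)
    n_rows, n_cols = arrays[0].shape
    out = np.empty((n * n_rows, n_cols), dtype=np.result_type(*arrays))
    for i, a in enumerate(arrays):
        out[i::n] = a          # array i fills every n-th row, starting at row i
    return out

x = np.arange(9).reshape(3, 3)
y = x + 10
z = x + 100
combined = interleave_rows(x, y, z)
print(combined)
```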

This might not be the most pythonic way to do it, but you could loop over blocks of three output rows:
for block in range(len(combined)//3):
    combined[block*3+0] = x[block,:]
    combined[block*3+1] = y[block,:]
    combined[block*3+2] = z[block,:]

A simple numpy solution is to stack the arrays on a new middle axis, and reshape the result to 2d:
In [5]: x = np.arange(9).reshape(3,3)
In [6]: y = np.arange(9).reshape(3,3)+10
In [7]: z = np.arange(9).reshape(3,3)+100
In [8]: np.stack((x,y,z),axis=1).reshape(-1,3)
Out[8]:
array([[ 0, 1, 2],
[ 10, 11, 12],
[100, 101, 102],
[ 3, 4, 5],
[ 13, 14, 15],
[103, 104, 105],
[ 6, 7, 8],
[ 16, 17, 18],
[106, 107, 108]])
It may be easier to see what's happening if we give each dimension a different value; e.g. 2 3x4 arrays:
In [9]: x = np.arange(12).reshape(3,4)
In [10]: y = np.arange(12).reshape(3,4)+10
np.array combines them on a new 1st axis, making a 2x3x4 array. To get the interleaving you want, we can transpose the first 2 dimensions, producing a 3x2x4. Then reshape to a 6x4.
In [13]: np.array((x,y))
Out[13]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[10, 11, 12, 13],
[14, 15, 16, 17],
[18, 19, 20, 21]]])
In [14]: np.array((x,y)).transpose(1,0,2)
Out[14]:
array([[[ 0, 1, 2, 3],
[10, 11, 12, 13]],
[[ 4, 5, 6, 7],
[14, 15, 16, 17]],
[[ 8, 9, 10, 11],
[18, 19, 20, 21]]])
In [15]: np.array((x,y)).transpose(1,0,2).reshape(-1,4)
Out[15]:
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[ 4, 5, 6, 7],
[14, 15, 16, 17],
[ 8, 9, 10, 11],
[18, 19, 20, 21]])
np.vstack produces a 6x4, but with the wrong order. We can't transpose that directly.
np.stack with default axis behaves just like np.array. But with axis=1, it creates a 3x2x4, which we can reshape:
In [16]: np.stack((x,y), 1)
Out[16]:
array([[[ 0, 1, 2, 3],
[10, 11, 12, 13]],
[[ 4, 5, 6, 7],
[14, 15, 16, 17]],
[[ 8, 9, 10, 11],
[18, 19, 20, 21]]])
The list zip in the accepted answer is a list version of transpose, creating a list of 3 2-element tuples.
In [17]: list(zip(x,y))
Out[17]:
[(array([0, 1, 2, 3]), array([10, 11, 12, 13])),
(array([4, 5, 6, 7]), array([14, 15, 16, 17])),
(array([ 8, 9, 10, 11]), array([18, 19, 20, 21]))]
np.array(list(zip(x,y))) produces the same thing as the stack, a 3x2x4 array.
As for speed, I suspect the allocate and assign (as in Ash's answer) is fastest:
In [27]: z = np.zeros((6,4),int)
...: for i, arr in enumerate((x,y)):
...:     z[i::2,:] = arr
...:
In [28]: z
Out[28]:
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[ 4, 5, 6, 7],
[14, 15, 16, 17],
[ 8, 9, 10, 11],
[18, 19, 20, 21]])
For serious timings, use much larger examples than this.
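A rough way to compare the approaches on larger inputs is sketched below with the standard timeit module. The array sizes, repeat count and function names are arbitrary choices for illustration, and the numbers will vary by machine, so no timings are quoted here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x, y, z = (rng.random((2000, 50)) for _ in range(3))

def by_stack():
    # Stack on a new middle axis, then flatten the first two axes
    return np.stack((x, y, z), axis=1).reshape(-1, x.shape[1])

def by_assign():
    # Allocate the result once and fill it with strided assignments
    out = np.empty((3 * len(x), x.shape[1]), dtype=x.dtype)
    out[0::3], out[1::3], out[2::3] = x, y, z
    return out

def by_zip():
    # Pure-Python interleaving of rows via zip
    return np.array([row for group in zip(x, y, z) for row in group])

# All three must agree before the timings mean anything
assert np.array_equal(by_stack(), by_assign())
assert np.array_equal(by_stack(), by_zip())

repeats = 20
for func in (by_stack, by_assign, by_zip):
    seconds = timeit.timeit(func, number=repeats)
    print(f"{func.__name__}: {seconds / repeats * 1e3:.3f} ms per call")
```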

I'd like to produce a function like split (arr, i, j) which divides array arr by axis i, j

I'd like to write a function like split(arr, i, j) that divides the array arr along axes i and j, but I do not know how to do it. The method below uses array_split, but merely dividing an N-dimensional array into (N-1)-dimensional arrays does not give me the two-dimensional arrays I am after.
import numpy as np
arr = np.arange(36).reshape(4,9)
dim = arr.ndim
ax = np.arange(dim)
arritr = [np.array_split(arr, arr.shape[ax[i]], ax[i]) for i in range(dim)]
print(arritr[0])
print(arritr[1])
How can I achieve this?

I believe you would like to slice the array by axis (row, column). Here is the documentation: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
arr[1,:] # returns all values in row 1
arr[:,1] # returns all values in column 1

I'm guessing a bit here, but it sounds like you want to divide the array into 4 blocks.
In [120]: arr = np.arange(36).reshape(6,6)
In [122]: [arr[:3,:4], arr[:3,4:], arr[3:, :4], arr[3:,4:]]
Out[122]:
[array([[ 0, 1, 2, 3],
[ 6, 7, 8, 9],
[12, 13, 14, 15]]),
array([[ 4, 5],
[10, 11],
[16, 17]]),
array([[18, 19, 20, 21],
[24, 25, 26, 27],
[30, 31, 32, 33]]),
array([[22, 23],
[28, 29],
[34, 35]])]
Don't worry about efficiency. array_split does the same sort of slicing. Check its code to verify that.
If you want more slices, you could add more arr[i1:i2, j1:j2], for any mix of indices.
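The four slices can be wrapped into the function the question asks for. This sketch assumes split(arr, i, j) is meant to cut a 2-D array at row i and column j, which is a guess at the intent; np.block is used only to check that the pieces reassemble into the original.

```python
import numpy as np

def split(arr, i, j):
    """Split a 2-D array into a 2x2 grid of blocks at row i and column j."""
    return [[arr[:i, :j], arr[:i, j:]],
            [arr[i:, :j], arr[i:, j:]]]

arr = np.arange(36).reshape(6, 6)
blocks = split(arr, 3, 4)
print(blocks[0][1])          # top-right block

# The nested list has the layout np.block expects, so the split is reversible
restored = np.block(blocks)
print(np.array_equal(restored, arr))
```

The blocks are views into arr, so modifying one of them also modifies the original array.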

Are you looking for something like matlab's mat2cell? Then you could do:
import numpy as np
def ndsplit(a, splits):
    assert len(splits) <= a.ndim
    splits = [np.r_[0, s, m] for s, m in zip(splits, a.shape)]
    return np.frompyfunc(lambda *x: a[tuple(slice(s[i],s[i+1]) for s, i in zip(splits, x))], len(splits), 1)(*np.indices(tuple(len(s) - 1 for s in splits)))
# demo
a = np.arange(56).reshape(7, 8)
print(ndsplit(a, [(2, 4), (1, 5, 6)]))
# [[array([[0],
# [8]])
# array([[ 1, 2, 3, 4],
# [ 9, 10, 11, 12]])
# array([[ 5],
# [13]]) array([[ 6, 7],
# [14, 15]])]
# [array([[16],
# [24]])
# array([[17, 18, 19, 20],
# [25, 26, 27, 28]])
# array([[21],
# [29]]) array([[22, 23],
# [30, 31]])]
# [array([[32],
# [40],
# [48]])
# array([[33, 34, 35, 36],
# [41, 42, 43, 44],
# [49, 50, 51, 52]])
# array([[37],
# [45],
# [53]])
# array([[38, 39],
# [46, 47],
# [54, 55]])]]
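For the 2-D case, a plainer alternative is to nest two np.split calls, which yields a list of lists rather than an object array. This is a sketch with the same split points as the demo above; the names row_cuts, col_cuts and grid are illustrative.

```python
import numpy as np

a = np.arange(56).reshape(7, 8)
row_cuts, col_cuts = [2, 4], [1, 5, 6]

# First cut into horizontal bands, then cut each band into column blocks
grid = [np.split(band, col_cuts, axis=1)
        for band in np.split(a, row_cuts, axis=0)]

print(len(grid), len(grid[0]))   # number of block rows and block columns
print(grid[0][1])
print(grid[2][3])
```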

Replace the diagonal elements of a matrix with sum of other elements in the row in Python

import pandas as pd
import numpy as np
rates = pd.read_excel(r"C:\Anaconda3\RateMatrix.xlsx", sheet_name="Pu239Test", skiprows=0).to_numpy() # read the matrix values from the Excel spreadsheet and convert them to a NumPy array (older pandas versions used sheetname= and .as_matrix())
rates is a 22 x 22 matrix.
I would like to replace the diagonal elements of the rates matrix with the sum of all other elements in the same row.
For example,
rates.item(0,0) = rates.item(0,1)+rates.item(0,2)+rates.item(0,3)+....rates.item(0,21)
rates.item(1,1) = rates.item(1,0)+rates.item(1,2)+rates.item(1,3)+....rates.item(1,21)
.....
rates.item(21,21) = rates.item(21,0)+rates.item(21,1)+rates.item(21,2)+....rates.item(21,20)
I was wondering how I can do that. Thanks a lot in advance.

Here's a vectorized approach on a NumPy array a as input -
In [171]: a # Input array
Out[171]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
# Get row and column indices of diag elements
In [172]: row,col = np.diag_indices_from(a)
# Assign the sum of each row except the diag elems into diag positions
In [173]: a[row,col] = a.sum(axis=1) - a[row,col]
# Updated array
In [174]: a
Out[174]:
array([[10, 1, 2, 3, 4],
[ 5, 29, 7, 8, 9],
[10, 11, 48, 13, 14],
[15, 16, 17, 67, 19],
[20, 21, 22, 23, 86]])
Let's manually compute the summations and cross-check against the diagonal elements -
In [175]: a[0,1] + a[0,2] + a[0,3] + a[0,4]
Out[175]: 10
In [176]: a[1,0] + a[1,2] + a[1,3] + a[1,4]
Out[176]: 29
In [177]: a[2,0] + a[2,1] + a[2,3] + a[2,4]
Out[177]: 48
In [178]: a[3,0] + a[3,1] + a[3,2] + a[3,4]
Out[178]: 67
In [179]: a[4,0] + a[4,1] + a[4,2] + a[4,3]
Out[179]: 86
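An equivalent formulation uses np.diag and np.fill_diagonal, which avoids handling the index arrays explicitly. The sketch below uses a small float matrix standing in for the 22 x 22 rates matrix from the question.

```python
import numpy as np

# Stand-in for the rates matrix read from the spreadsheet
rates = np.arange(25, dtype=float).reshape(5, 5)

# Row sums without the diagonal entry: total row sum minus the current diagonal
off_diagonal_sums = rates.sum(axis=1) - np.diag(rates)

# Write the sums onto the diagonal in place
np.fill_diagonal(rates, off_diagonal_sums)
print(rates)
```

The sums are computed before the diagonal is overwritten, so each new diagonal entry depends only on the original off-diagonal values.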
