Extracting subarray based on logical indexing - python

Take a look at the following code in MATLAB:
a = [1,2; 5,6]
b = [-1,1; -1,1]
d = a(b(:)>0)
Now d will be the 2x1 array [2;6]. This is because b has positive entries only at positions (1,2) and (2,2), and the third line of the code extracts the elements of a at those positions.
Is there an equivalent method in Python that does this? I searched numpy documentation but could not find any. In my actual code, I have multiple large, multidimensional arrays from which I would want to extract elements based on the elements of other arrays. Of course, this can be done with nested for loops but it would be much better if there is a nicer way like MATLAB does.

Assuming a and b are numpy arrays use:
d = a[b > 0]
In numpy, indexing is done with the [] operator.

Without using any libraries:
a = [[1, 2], [5, 6]]
b = [[-1, 1], [-1, 1]]
d = [
    a_xy
    for a_x, b_x in zip(a, b)
    for a_xy, b_xy in zip(a_x, b_x)
    if b_xy > 0
]
Using numpy:
import numpy as np
a = np.array([[1, 2], [5, 6]])
b = np.array([[-1, 1], [-1, 1]])
d = a[b > 0]
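One caveat when porting from MATLAB: a(b(:)>0) walks the arrays in column-major order, while NumPy's boolean indexing returns the selected elements in row-major order. For the arrays above both give 2 and 6, but in general the order can differ. Here is a minimal sketch, using a made-up mask source b2 that is not in the question, of how transposing both arrays reproduces MATLAB's ordering:

```python
import numpy as np

a = np.array([[1, 2], [5, 6]])
# b2 is a hypothetical array chosen so that the two orderings differ
b2 = np.array([[-1, 1], [1, -1]])

row_major = a[b2 > 0]      # NumPy order: scans row by row
col_major = a.T[b2.T > 0]  # MATLAB-like order: scans column by column

print(row_major)  # [2 5]
print(col_major)  # [5 2]
```

If the order of the extracted elements does not matter, plain a[b > 0] is enough.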

Related

Determine indexes where A is a submatrix of matrix B

I'm coding in python and I would like to know how I can get the indexes of a matrix A where the matrix B is contained in A. For example, if we have
A = [[1,2,3],
     [4,5,6],
     [7,8,9]]
and
B = [[2,3],
     [5,6]]
Then it should return the indexes ([0,0,1,1], [1,2,1,2]), where the first list corresponds to the x-axis and the second to the y-axis, or something like this.
Thank you for your help!
You can check this question to find if a matrix is a submatrix of another one.
Then you can get the coordinates of each element using NumPy's where function:
import numpy as np
A = np.linspace(1, 9, 9).reshape([3, 3])
B = np.asarray([2, 3, 5, 6]).reshape([2, 2])
# Assumes every value of B occurs exactly once in A
submatrix_tuple_coord = [np.where(A == b) for bb in B for b in bb]
submatrix_xy = [[int(x[0]), int(y[0])] for x, y in submatrix_tuple_coord]
# Return a list of list with the row-column indices
submatrix_xy
>>> [[0, 1], [0, 2], [1, 1], [1, 2]]
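The value lookup above relies on each entry of B appearing only once in A. As a hedged alternative, here is a sketch that compares B against every window of A with sliding_window_view (this assumes NumPy 1.20 or newer; the variable names are illustrative) and then builds the two index lists in the form the question asks for:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

A = np.arange(1, 10).reshape(3, 3)
B = np.array([[2, 3], [5, 6]])

# windows[i, j] is the B-shaped block of A whose top-left corner is (i, j)
windows = sliding_window_view(A, B.shape)
# Top-left corners of every block that equals B
corners = np.argwhere((windows == B).all(axis=(2, 3)))

# Expand the first match into row and column indices of all its cells
r0, c0 = corners[0]
rows, cols = np.indices(B.shape)
row_idx = (rows + r0).ravel()
col_idx = (cols + c0).ravel()

print(row_idx.tolist(), col_idx.tolist())  # [0, 0, 1, 1] [1, 2, 1, 2]
```

If B does not occur in A, corners is empty, so check its length before indexing it.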

Multiplication of 2 lists of array with matching indices

Given 2 lists of arrays (or 2 3D arrays) is there a smarter way in numpy, besides a loop, to get the multiplication of the first array of the first list times the first array of the second list and so on? I have a feeling I am overlooking the obvious. This is my current implementation:
import numpy as np
r = []
for i in range(np.shape(rz)[2]):
    r.append(ry[..., i] @ rz[..., i])
r = np.array(r)
Assuming that the last dimension is the same, numpy.einsum should do the trick:
import numpy as np
np.einsum('ijk,jmk-> imk', ry, rz)
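Note that the loop in the question stacks the products along the first axis, whereas 'ijk,jmk->imk' leaves the shared axis last. Here is a small sketch with made-up random inputs (the shapes are assumptions, not taken from the question) that checks an einsum output order matching the loop, along with an equivalent batched matmul:

```python
import numpy as np

rng = np.random.default_rng(0)
ry = rng.normal(size=(2, 3, 4))  # hypothetical shapes: (i, j, k)
rz = rng.normal(size=(3, 5, 4))  # (j, m, k)

# Reference: the loop from the question, result shape (k, i, m)
loop = np.array([ry[..., i] @ rz[..., i] for i in range(rz.shape[2])])

# einsum with the shared axis k moved to the front of the output
via_einsum = np.einsum('ijk,jmk->kim', ry, rz)

# Same thing with matmul, which broadcasts over the leading axis
via_matmul = np.moveaxis(ry, -1, 0) @ np.moveaxis(rz, -1, 0)

print(np.allclose(loop, via_einsum), np.allclose(loop, via_matmul))  # True True
```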
import numpy as np
A = np.array([[3, 6, 7], [5, -3, 0]])
B = np.array([[1, 1], [2, 1], [3, -3]])
C = A.dot(B)
print(C)
Output:
[[ 36 -12]
 [ -1   2]]

Make 1 dimensional array 2D using numpy

I have a list of numbers to which I wish to add a second column, so that the array becomes 2D like in the example below:
a = [1,1,1,1,1]
b = [2,2,2,2,2]
should become:
c = [[1,2],[1,2],[1,2],[1,2],[1,2]]
I am not sure how to do this using numpy?
I would just stack them and then transpose the resulting array with .T:
import numpy as np
a = np.array([1, 1, 1, 1, 1])
b = np.array([2, 2, 2, 2, 2])
c = np.stack((a, b)).T
Use numpy built-in functions:
import numpy as np
c = np.vstack((np.array(a),np.array(b))).T.tolist()
np.vstack stacks arrays vertically. .T transposes the array and tolist() converts it back to a list.
Another similar way to do it is to add a dimension using [:,None]; then you can stack them horizontally without needing to transpose:
c = np.hstack((np.array(a)[:,None],np.array(b)[:,None])).tolist()
output:
[[1, 2], [1, 2], [1, 2], [1, 2], [1, 2]]
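For reference, np.column_stack does the "treat each 1D array as a column" step directly, so neither the transpose nor the extra axis is needed. A minimal sketch using the lists from the question:

```python
import numpy as np

a = [1, 1, 1, 1, 1]
b = [2, 2, 2, 2, 2]

# Each 1D input becomes one column of the 2D result
c = np.column_stack((a, b))

print(c.shape)     # (5, 2)
print(c.tolist())  # [[1, 2], [1, 2], [1, 2], [1, 2], [1, 2]]
```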

NumPy apply function to groups of rows corresponding to another numpy array

I have a NumPy array with each row representing some (x, y, z) coordinate like so:
a = array([[0, 0, 1],
           [1, 1, 2],
           [4, 5, 1],
           [4, 5, 2]])
I also have another NumPy array with unique values of the z-coordinates of that array like so:
b = array([1, 2])
How can I apply a function, let's call it "f", to each of the groups of rows in a which correspond to the values in b? For example, the first value of b is 1 so I would get all rows of a which have a 1 in the z-coordinate. Then, I apply a function to all those values.
In the end, the output would be an array the same shape as b.
I'm trying to vectorize this to make it as fast as possible. Thanks!
Example of an expected output (assuming that f is count()):
c = array([2, 2])
because there are 2 rows in array a which have a z value of 1 in array b and also 2 rows in array a which have a z value of 2 in array b.
A trivial solution would be to iterate over array b like so:
for val in b:
    apply function to a based on val
    append to an array c
My attempt:
I tried doing something like this, but it just returns an empty array.
func(a[a[:, 2]==b])
The problem is that groups of rows with the same z can have different sizes, so you cannot stack them into one 3D NumPy array, which would let you apply a function along an axis easily. One solution is a for loop; another is np.split:
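import numpy as np

a = np.array([[0, 0, 1],
              [1, 1, 2],
              [4, 5, 1],
              [4, 5, 2],
              [4, 3, 1]])
# Sort rows by z; a stable sort keeps the original row order within each group
a_sorted = a[a[:, 2].argsort(kind='stable')]
# Index of the first row of each distinct z value
inds = np.unique(a_sorted[:, 2], return_index=True)[1]
# inds[0] is 0, so the first chunk is empty and is dropped
a_split = np.split(a_sorted, inds)[1:]
# [array([[0, 0, 1],
#         [4, 5, 1],
#         [4, 3, 1]]),
#  array([[1, 1, 2],
#         [4, 5, 2]])]
f = np.sum  # example of a function
result = list(map(f, a_split))
# [19, 15]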
But imho the best solution is to use pandas and groupby as suggested by FBruzzesi. You can then convert the result to a numpy array.
EDIT: For completeness, here are the other two solutions
List comprehension:
b = np.unique(a[:,2])
result = [f(a[a[:,2] == z]) for z in b]
Pandas:
import pandas as pd
df = pd.DataFrame(a, columns=list('XYZ'))
result = df.groupby(['Z']).apply(lambda x: f(x.values)).tolist()
This is the performance plot I got for a = np.random.randint(0, 100, (n, 3)):
As you can see, approximately up to n = 10^5 the "split solution" is the fastest, but after that the pandas solution performs better.
If you are allowed to use pandas:
import pandas as pd
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').agg(f)
Here f can be any custom function working on grouped data.
Numeric example:
a = np.array([[0, 0, 1],
              [1, 1, 2],
              [4, 5, 1],
              [4, 5, 2]])
df = pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').size()
z
1 2
2 2
dtype: int64
Note that .size() is how you count the number of rows per group.
To keep it in pure NumPy, something like this may suit your case (it builds a regular array only when every group has the same number of rows):
tmp = np.array([a[a[:,2]==i] for i in b])
tmp
array([[[0, 0, 1],
        [4, 5, 1]],
       [[1, 1, 2],
        [4, 5, 2]]])
which is an array holding each group of rows.
c = np.array([])
for x in np.nditer(b):
    # Count the rows of a whose z value equals x
    c = np.append(c, np.where((a[:,2] == x))[0].shape[0])
Output:
[2. 2.]
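A loop-free variant for the counting case is to compare the z column against b with broadcasting, which gives one boolean column per group. This is only a sketch: it uses memory proportional to len(a) * len(b), and the per-group sum shown is just one example of a reduction that can be written this way, not a general replacement for an arbitrary f.

```python
import numpy as np

a = np.array([[0, 0, 1],
              [1, 1, 2],
              [4, 5, 1],
              [4, 5, 2]])
b = np.array([1, 2])

# mask[i, j] is True when row i of a has a z value equal to b[j]
mask = a[:, 2, None] == b  # shape (4, 2)

# Number of rows in each group
counts = mask.sum(axis=0)

# Sum of all entries in each group, as an example of another reduction
sums = mask.T.astype(int) @ a.sum(axis=1)

print(counts)  # [2 2]
print(sums)    # [11 15]
```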

python numpy `np.take` with 2 dimensional array

I'm trying to take a list of elements from a 2D numpy array given a list of coordinates, and I want to avoid using a loop. I saw that np.take works with 1D arrays, but I can't make it work with 2D arrays.
Example:
a = np.array([[1,2,3], [4,5,6]])
print(a)
# [[1 2 3]
#  [4 5 6]]
np.take(a, [[1,2]])
# gives [[2, 3]] but I want just [6]
I want to avoid a loop because I think it will be slower (I need speed). But if you can persuade me that a loop is as fast as an existing numpy function, then I can go for it.
If I understand it correctly, you have a list of coordinates like this:
coords = [[y0, x0], [y1, x1], ...]
To get the values of array a at these coordinates you need:
a[[y0, y1, ...], [x0, x1, ...]]
So a[coords] will not work. One way to do it is:
Y = [c[0] for c in coords]
X = [c[1] for c in coords]
or
Y = np.transpose(coords)[0]
X = np.transpose(coords)[1]
Then
a[Y, X]
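The two steps above can be folded into a single expression by transposing the coordinate list and passing it as a tuple. A hedged sketch follows; the coords values are made up for illustration, and the np.ravel_multi_index variant is shown only for the case where np.take itself is required:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
# Hypothetical list of (row, col) pairs
coords = np.array([[1, 2], [0, 0]])

# tuple(coords.T) is (rows, cols), the form that fancy indexing expects
values = a[tuple(coords.T)]

# Same lookup through np.take, using indices into the flattened array
flat = np.ravel_multi_index(tuple(coords.T), a.shape)
values_take = np.take(a, flat)

print(values)       # [6 1]
print(values_take)  # [6 1]
```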
Does fancy indexing do what you want? np.take seems to flatten the array before operating.
import numpy as np
a = np.arange(1, 10).reshape(3,3)
a
# array([[1, 2, 3],
#        [4, 5, 6],
#        [7, 8, 9]])
rows = [1, 1, 2, 0]
cols = [0, 1, 1, 2]
# Use the indices to access items in a
a[rows, cols]
# array([4, 5, 8, 3])
a[1,0], a[1,1], a[2,1], a[0,2]
# (4, 5, 8, 3)
