Extract a subset of data from numpy array - python

I have a 2D numpy array from which I need to extract the rows where the value in the 2nd column is higher than a certain value. What's the best way to do this?
E.g. given the array:
array1 = [[1, 5], [2, 6], [3, 7], [4, 8]]
I would want to extract all rows where the 2nd column was higher than 6, so I'd get:
[3, 7], [4, 8]

Or, even more simply:
a[a[:,1] > 6]
Output:
array([[3, 7], [4, 8]])
Where a is the array.
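To see why this works, it helps to look at the intermediate mask. The sketch below reuses the example array, and the names mask and subset are only illustrative. The comparison produces one boolean per row, and indexing with it keeps the rows marked True. The result is a copy, not a view of a.

```python
import numpy as np

a = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])

# Element-wise comparison on the second column: one boolean per row
mask = a[:, 1] > 6
print(mask)  # [False False  True  True]

# Boolean indexing keeps only the rows where mask is True (returns a copy)
subset = a[mask]
print(subset)
```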

Use numpy.where:
import numpy as np
a = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
# all rows where the second item is greater than 6:
print(a[np.where(a[:, 1] > 6)])
# output:
# [[3 7]
#  [4 8]]
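With a single argument, np.where behaves like np.nonzero: it returns a tuple of index arrays, one per dimension of the condition. Indexing with that tuple therefore selects the same rows as the boolean mask. A minimal check, with illustrative variable names:

```python
import numpy as np

a = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])

# Tuple of index arrays; the condition is 1D, so there is one array
idx = np.where(a[:, 1] > 6)
print(idx[0].tolist())  # [2, 3]

# Selecting with the indices or with the boolean mask gives the same rows
rows_where = a[idx]
rows_mask = a[a[:, 1] > 6]
print(np.array_equal(rows_where, rows_mask))  # True
```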

Use a list comprehension:
array1 = [[1, 5], [2, 6], [3, 7], [4, 8]]
threshold = 6
print([elem for elem in array1 if elem[1] > threshold])
# [[3, 7], [4, 8]]
Or, using numpy:
import numpy as np
array1 = np.array(array1)
print(array1[array1[:, 1] > threshold])
# [[3 7]
#  [4 8]]

Related

Divide a 2d numpy in 3D according to a window of size w and a step p

I can do it with a loop, but it is very slow. Is there a way to do it without a loop, or at least much faster? Here is my code: "data" is my 2D array of shape (M, N), "seq" is my window size (e.g., 40), and size = data.shape[0] = M.
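import numpy as np

X = list()
for j in range(size):
    end_idx = j + seq
    # use > rather than >=, otherwise the last full window data[size - seq:size] is skipped
    if end_idx > size:
        break
    seq_x = data[j:end_idx, :]
    X.append(seq_x)
final_data = np.array(X)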
For example, with:
data = [[0, 1],
        [2, 3],
        [3, 4],
        [4, 5],
        [5, 6],
        [6, 7],
        [7, 8],
        [8, 9],
        [9, 7]]
For a window of size w = 2, the result should be:
res = [[[0, 1],
        [2, 3]],
       [[2, 3],
        [3, 4]],
       [[3, 4],
        [4, 5]],
       ...
       [[8, 9],
        [9, 7]]]
Does anyone have an idea of how to do this so that it runs quickly?
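import numpy as np
data = np.array([[0, 1],
                 [2, 3],
                 [3, 4],
                 [4, 5],
                 [5, 6],
                 [6, 7],
                 [7, 8],
                 [8, 9],
                 [9, 7]])
w = 2
window_width = data.shape[1]  # number of columns: each window spans the full row width
# sliding_window_view (NumPy >= 1.20) returns a read-only view of shape (M - w + 1, 1, w, N);
# squeeze() drops the size-1 axis, giving (M - w + 1, w, N)
out = np.lib.stride_tricks.sliding_window_view(data, window_shape=(w, window_width)).squeeze()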
Output (out):
array([[[0, 1],
        [2, 3]],
       [[2, 3],
        [3, 4]],
       ...
       [[7, 8],
        [8, 9]],
       [[8, 9],
        [9, 7]]])
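If sliding_window_view is not available (it was added in NumPy 1.20), a similar result can be sketched with plain fancy indexing: build a 2D array of row indices, one row per window, and index data with it. Unlike the view, this copies the data, which costs memory for large M and w. The sketch also compares against sliding_window_view using [:, 0] in place of .squeeze(), which avoids also dropping the window axis when w == 1. The names idx, windows and view are illustrative.

```python
import numpy as np

data = np.array([[0, 1], [2, 3], [3, 4], [4, 5], [5, 6],
                 [6, 7], [7, 8], [8, 9], [9, 7]])
w = 2
M = data.shape[0]

# Row k of idx is [k, k + 1, ..., k + w - 1]
idx = np.arange(M - w + 1)[:, None] + np.arange(w)[None, :]

# Fancy indexing with a 2D index array gives shape (M - w + 1, w, N) (a copy)
windows = data[idx]
print(windows.shape)  # (8, 2, 2)

# Same windows from sliding_window_view; [:, 0] removes only the size-1 axis
view = np.lib.stride_tricks.sliding_window_view(data, (w, data.shape[1]))[:, 0]
print(np.array_equal(windows, view))  # True
```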

Seeking missing element in an array in Python

I have an array I with shape=(10,2). I want to probe this array for missing values of j, here j=1. By missing, I mean that no row has that value of j. As is evident from I, there are indices with j=2,3,4,5,6,7, but none with j=1.
For the purpose of notation, in [0,3], i=0,j=3.
import numpy as np
I = np.array([[0, 3],
              [1, 2],
              [1, 4],
              [2, 5],
              [3, 4],
              [4, 5],
              [4, 6],
              [5, 7],
              [6, 7]])
The expected output is
Missing_j=[1]
If you want to check a single index, you can use the code from ArrowRise's comment:
if 1 not in I[:, 1]: print('1 is missing')
If you want to get all missing indices, you can use set() for this.
Convert the second column to a set
set1 = set(I[:,1])
and generate a set with all expected indices
max_j = max(I[:,1])
set2 = set( range(1, max_j+1) )
Then take the difference:
missing_j = set2 - set1
Full working example (I added [6, 10] to have more missing indices):
import numpy as np
I = np.array([
[0, 3],
[1, 2],
[1, 4],
[2, 5],
[3, 4],
[4, 5],
[4, 6],
[5, 7],
[6, 7],
[6, 10],
])
set1 = set( I[:,1] )
max_j = max(I[:,1])
set2 = set( range(1, max_j+1) )
missing_j = sorted( set2 - set1 )
print( missing_j )
Result:
[1, 8, 9]
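A NumPy-only variant of the same idea can be sketched with np.setdiff1d, which returns the sorted unique values of its first argument that do not appear in the second. Like the set version, it assumes the valid j values run from 1 up to the largest j present.

```python
import numpy as np

I = np.array([[0, 3], [1, 2], [1, 4], [2, 5], [3, 4],
              [4, 5], [4, 6], [5, 7], [6, 7], [6, 10]])

j = I[:, 1]
expected = np.arange(1, j.max() + 1)  # assumed range of valid j: 1..max(j)

# Sorted unique values of `expected` that are absent from `j`
missing_j = np.setdiff1d(expected, j)
print(missing_j.tolist())  # [1, 8, 9]
```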

Picking rows from two NumPy arrays at random

Starting from two numpy arrays:
A = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
B = np.array([[9, 8], [8, 7], [7, 6], [6, 5]])
I would like to create a new array C by picking, for each index, the row at that index from either A or B at random. The idea is that at each index of random_selector, if the value is higher than 0.1, we choose the same-index row from A; otherwise, we choose the same-index row from B.
random_selector = np.random.random(size=len(A))
C = np.where(random_selector > .1, A, B)
# example of desired result picking rows from respectively A, B, B, A:
# [[1, 2], [8, 7], [7, 6], [4, 5]]
Running the above code, however, produces the following error:
ValueError: operands could not be broadcast together with shapes (4,) (4,2) (4,2)
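Try adding a new dimension to the mask. random_selector > .1 has shape (4,), and broadcasting compares trailing axes first (4 against 2), so it fails. With [:, None] the mask has shape (4, 1), and its size-1 axis is stretched across the two columns: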
import numpy as np
A = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
B = np.array([[9, 8], [8, 7], [7, 6], [6, 5]])
random_selector = np.random.random(size=len(A))
C = np.where((random_selector > .1)[:, None], A, B)
print(C)
Output (of a single run)
[[1 2]
[8 7]
[3 4]
[4 5]]
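Because the selection is random, the output changes on every run. The sketch below seeds a np.random.default_rng generator (the seed 0 is arbitrary) so the result is reproducible, and checks the np.where result against an equivalent boolean-mask assignment that copies B and overwrites the selected rows with those of A.

```python
import numpy as np

A = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
B = np.array([[9, 8], [8, 7], [7, 6], [6, 5]])

rng = np.random.default_rng(0)   # seeded so that runs are reproducible
mask = rng.random(len(A)) > 0.1  # shape (4,)

# (4, 1) broadcasts against (4, 2): one choice per row
print(mask.shape, mask[:, None].shape)  # (4,) (4, 1)
C = np.where(mask[:, None], A, B)

# Equivalent: start from B and replace the selected rows with A's rows
C2 = B.copy()
C2[mask] = A[mask]
print(np.array_equal(C, C2))  # True
```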

Cartesian product of rows of a very big array

I have an array of shape (100, 50). I need to generate an output array that represents the Cartesian product of the rows of the input array.
For simplification purposes, let's have an input array:
array([[2, 6, 5],
[7, 3, 6]])
As output I would like to have:
array([[2, 7],
[2, 3],
[2, 6],
[6, 7],
[6, 3],
[6, 6],
[5, 7],
[5, 3],
[5, 6]])
Note: itertools.product doesn't work here because of the size of the input. Also, all the other similar answers assume that the number of rows is smaller than 32, which is not the case here.
This question has been asked many times, for example here.
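An array of shape (100, 50) is far too big for this: the Cartesian product of 100 rows with 50 values each has 50**100 combinations, which cannot be materialized with numpy (or anything else). Smaller inputs, however, can be handled.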
Anyway, I prefer to use itertools for this kind of stuff:
import itertools
import numpy as np
a = np.array([[2, 6, 5], [7, 3, 6]])
np.array(list(itertools.product(*a)))
array([[2, 7],
[2, 3],
[2, 6],
[6, 7],
[6, 3],
[6, 6],
[5, 7],
[5, 3],
[5, 6]])
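For inputs as large as the one in the question, the number of combinations is the product of the row lengths (50**100 for a (100, 50) array), so nothing can hold them all. itertools.product itself is lazy, though, so a sketch like the one below can stream combinations one at a time; here itertools.islice takes only the first four. The names n_combos and first_four are illustrative.

```python
import itertools
import math

import numpy as np

a = np.array([[2, 6, 5], [7, 3, 6]])

# Number of combinations = product of the row lengths (here 3 * 3)
n_combos = math.prod(len(row) for row in a)
print(n_combos)  # 9

# itertools.product yields tuples lazily, so islice can take a few
# without building the full result
first_four = list(itertools.islice(itertools.product(*a), 4))
print([tuple(int(x) for x in combo) for combo in first_four])
# [(2, 7), (2, 3), (2, 6), (6, 7)]
```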
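import numpy as np
a = np.array([[2, 6, 5],[7, 3, 6]])
# meshgrid of the two rows, transposed and flattened into pairs (handles exactly two rows)
out = np.array(np.meshgrid(a[0], a[1])).T.reshape(-1,2)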
print(out)
"""
prints
[[2 7]
[2 3]
[2 6]
[6 7]
[6 3]
[6 6]
[5 7]
[5 3]
[5 6]]
"""

Filling an array with arrays or vectors in python using numpy without a loop

I'm trying to find a way to fill an array with rows of values. It's much easier to express my desired output with an example. Given the input of an N x M matrix, array1,
array1 = np.array([[2, 3, 4],
[4, 8, 3],
[7, 6, 3]])
I would like to output a 3D array in which the i-th block is an N x M array whose rows are all copies of row i of array1. The output would be
[[[2, 3, 4],
[2, 3, 4],
[2, 3, 4]],
[[4, 8, 3],
[4, 8, 3],
[4, 8, 3]],
[[7, 6, 3],
[7, 6, 3],
[7, 6, 3]]]
You can add a new axis to turn the array from 2D into 3D, then use numpy.repeat() along that axis:
np.repeat(array1[:, None, :], 3, axis=1)
#array([[[2, 3, 4],
# [2, 3, 4],
# [2, 3, 4]],
# [[4, 8, 3],
# [4, 8, 3],
# [4, 8, 3]],
# [[7, 6, 3],
# [7, 6, 3],
# [7, 6, 3]]])
Or equivalently you can use numpy.tile:
np.tile(array1[:, None, :], (1,3,1))
Another solution that is sometimes useful is to preallocate the output and let the assignment broadcast each row into it:
out = np.empty((3,3,3), dtype=array1.dtype)
out[...] = array1[:, None, :]
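If the repeated rows only need to be read, np.broadcast_to can produce the same 3D shape without copying anything: it returns a read-only view in which the new middle axis has stride 0. A minimal sketch, with view and writable as illustrative names:

```python
import numpy as np

array1 = np.array([[2, 3, 4],
                   [4, 8, 3],
                   [7, 6, 3]])
N, M = array1.shape

# Read-only view: the middle axis has stride 0, so every repeated row
# points at the same memory as the original row
view = np.broadcast_to(array1[:, None, :], (N, N, M))
print(view.shape)       # (3, 3, 3)
print(view.strides[1])  # 0
print(np.array_equal(view, np.repeat(array1[:, None, :], N, axis=1)))  # True

# Copy it if you need a writable array
writable = view.copy()
```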
