Seeking missing element in an array in Python - python

I have an array I with shape=(9,2). I want to probe this array for a missing j=1. By missing, I mean that no row has j=1. As is evident from I, there are rows with j=2,3,4,5,6,7.
For the purpose of notation, in [0,3], i=0,j=3.
import numpy as np
I=np.array([[0, 3],
[1, 2],
[1, 4],
[2, 5],
[3, 4],
[4, 5],
[4, 6],
[5, 7],
[6, 7]])
The expected output is
Missing_j=[1]

If you want to check a single index, you can use the code from @ArrowRise's comment.
if 1 not in I[:,1]: print('1 is missing')
If you want to get all missing indexes, you can use set() for this.
You can convert the second column to a set
set1 = set(I[:,1])
and generate a set with all expected indexes
max_j = max(I[:,1])
set2 = set( range(1, max_j+1) )
And later you can do
missing_j = set2 - set1
Full working example - I added [6, 10] to have more missing indexes.
import numpy as np
I = np.array([
[0, 3],
[1, 2],
[1, 4],
[2, 5],
[3, 4],
[4, 5],
[4, 6],
[5, 7],
[6, 7],
[6, 10],
])
set1 = set( I[:,1] )
max_j = max(I[:,1])
set2 = set( range(1, max_j+1) )
missing_j = sorted( set2 - set1 )
print( missing_j )
Result:
[1, 8, 9]
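As an alternative to Python sets, here is a minimal sketch that stays inside NumPy by using np.setdiff1d. It assumes, as in the answer above, that the expected j values run from 1 to the largest j present in the second column.

```python
import numpy as np

I = np.array([
    [0, 3], [1, 2], [1, 4], [2, 5], [3, 4],
    [4, 5], [4, 6], [5, 7], [6, 7], [6, 10],
])

# Assumption: every j from 1 up to the largest observed j is expected.
expected_j = np.arange(1, I[:, 1].max() + 1)

# setdiff1d returns the sorted unique values of the first array
# that do not occur in the second one.
missing_j = np.setdiff1d(expected_j, I[:, 1])
print(missing_j.tolist())  # [1, 8, 9]
```

The result is already sorted, so no extra sorted() call is needed.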

Related

Divide a 2d numpy in 3D according to a window of size w and a step p

I can do it with a loop but it takes me forever. Is there a way to do it without a loop or much faster? Here is my code explained. "data" is my 2D-array (M, N). "seq" is my window size (e.g., 40) and size = data.shape[0] = M.
X = list()
for j in range(size):
    end_idx = j + seq
    # stop once the window would run past the last row
    if end_idx > size:
        break
    seq_x = data[j:end_idx, :]
    X.append(seq_x)
final_data = np.array(X)
It will look like below:
data = [[0, 1]
[2, 3]
[3, 4]
[4, 5]
[5, 6]
[6, 7]
[7, 8]
[8, 9]
[9, 7]]
For a window of size w = 2 we have
res = [[[0, 1]
[2, 3]]
[[2, 3]
[3, 4]]
[[3, 4]
[4, 5]]
...
[[8, 9]
[9, 7]]]
Does anyone have an idea of how to do this so that it executes quickly?
import numpy as np
data = np.array([[0, 1],
[2, 3],
[3, 4],
[4, 5],
[5, 6],
[6, 7],
[7, 8],
[8, 9],
[9, 7]])
w = 2
window_width = data.shape[1]
out = np.lib.stride_tricks.sliding_window_view(data, window_shape=(w, window_width)).squeeze()
out:
array([[[0, 1],
[2, 3]],
[[2, 3],
[3, 4]],
...
[[7, 8],
[8, 9]],
[[8, 9],
[9, 7]]])
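A small variation on the answer above, offered as a sketch: indexing the singleton axis with [:, 0] instead of calling .squeeze() keeps the result 3D even when only one window fits. The check against a plain loop is only there to confirm the shapes agree, and the step value p = 2 is an assumed example. sliding_window_view requires NumPy 1.20 or newer and returns a read-only view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.array([[0, 1], [2, 3], [3, 4], [4, 5], [5, 6],
                 [6, 7], [7, 8], [8, 9], [9, 7]])
w = 2  # window size

# Shape is (M - w + 1, 1, w, N); drop only the known singleton axis.
windows = sliding_window_view(data, window_shape=(w, data.shape[1]))[:, 0]

# Reference result built with an explicit loop, for comparison only.
looped = np.array([data[j:j + w] for j in range(data.shape[0] - w + 1)])

print(windows.shape)                    # (8, 2, 2)
print(np.array_equal(windows, looped))  # True

# A step p between windows can be applied by slicing the first axis.
p = 2  # assumed example value
stepped = windows[::p]
print(stepped.shape)                    # (4, 2, 2)
```

Because the result is a view, call .copy() on it before writing to it.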

Split a NumPy array into subarrays according to the values (not sorted, but grouped) of another array

Suppose I have two NumPy arrays
x = [[1, 2, 8],
[2, 9, 1],
[3, 8, 9],
[4, 3, 5],
[5, 2, 3],
[6, 4, 7],
[7, 2, 3],
[8, 2, 2],
[9, 5, 3],
[10, 2, 3],
[11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]
Note:
(The values in x are not sorted in any way. I chose these values only to make the example easier to follow.)
(These are just two examples of x and y. Both can contain arbitrarily many different values, but x always has as many rows as y has entries.)
I want to efficiently split the array x into sub-arrays according to the values in y.
My desired outputs would be
z_0 = [[1, 2, 8],
[2, 9, 1],
[4, 3, 5],
[10, 2, 3],
[11, 2, 4]]
z_1 = [[3, 8, 9],
[5, 2, 3],
[6, 4, 7],]
z_2 = [[7, 2, 3],
[8, 2, 2],
[9, 5, 3]]
Assuming that y starts with zero and is not sorted but grouped, what is the most efficient way to do this?
Note: This question is the unsorted version of this question:
Split a NumPy array into subarrays according to the values (sorted in ascending order) of another array
One way to solve this is to build up a list of filter indexes for each y value and then simply select those elements of x. For example:
z_0 = x[[i for i, v in enumerate(y) if v == 0]]
z_1 = x[[i for i, v in enumerate(y) if v == 1]]
z_2 = x[[i for i, v in enumerate(y) if v == 2]]
Output
array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]])
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]])
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])
If you want to be more generic and support different sets of numbers in y, you could use a comprehension to produce a list of arrays e.g.
z = [x[[i for i, v in enumerate(y) if v == m]] for m in set(y)]
Output:
[array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]]),
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]]),
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])]
If y is also an np.array and the same length as x you can simplify this to use boolean indexing:
z = [x[y==m] for m in set(y)]
Output is the same as above.
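Since a set does not promise any particular iteration order, one possible refinement is to take the labels from np.unique, which returns them sorted, and to key the groups by label. This is only a sketch; the dictionary layout and the name groups are my own choices, not part of the question.

```python
import numpy as np

x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5],
              [5, 2, 3], [6, 4, 7], [7, 2, 3], [8, 2, 2],
              [9, 5, 3], [10, 2, 3], [11, 2, 4]])
y = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0])

# np.unique returns the distinct labels in ascending order.
labels = np.unique(y)

# One boolean mask per label selects the matching rows of x.
groups = {int(m): x[y == m] for m in labels}

print(labels.tolist())     # [0, 1, 2]
print(groups[1].tolist())  # [[3, 8, 9], [5, 2, 3], [6, 4, 7]]
```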
Just use list comprehension and boolean indexing
x = np.array(x)
y = np.array(y)
z = [x[y == i] for i in range(y.max() + 1)]
z
Out[]:
[array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]]),
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]]),
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])]
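The boolean-mask approach scans y once per label. When there are many distinct labels, a sort-based split may be worth trying instead. The sketch below is a different technique from the answer above: it uses a stable argsort so rows keep their original order within each label, then cuts the reordered x wherever the sorted label changes. Whether it is actually faster depends on the data, so it is worth timing.

```python
import numpy as np

x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5],
              [5, 2, 3], [6, 4, 7], [7, 2, 3], [8, 2, 2],
              [9, 5, 3], [10, 2, 3], [11, 2, 4]])
y = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0])

# Stable sort keeps the original row order inside each label.
order = np.argsort(y, kind="stable")
sorted_y = y[order]

# Positions where the sorted label changes mark the split points.
boundaries = np.flatnonzero(np.diff(sorted_y)) + 1
z = np.split(x[order], boundaries)

print(boundaries.tolist())  # [5, 8]
print(z[2].tolist())        # [[7, 2, 3], [8, 2, 2], [9, 5, 3]]
```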
Slight variation.
from operator import itemgetter
label = itemgetter(1)
Associate the implied information with the label ... (index,label)
y1 = [thing for thing in enumerate(y)]
Sort on the label
y1.sort(key=label)
Group by label and construct the results
import itertools
d = {}
for key,group in itertools.groupby(y1,label):
    d[f'z{key}'] = [x[i] for i,k in group]
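For anyone who wants to run the steps above end to end, here is the same idea as one self-contained sketch, using the x and y from the question as plain lists. It relies on sorted() being stable, so rows stay in their original order within each label.

```python
from itertools import groupby
from operator import itemgetter

x = [[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5],
     [5, 2, 3], [6, 4, 7], [7, 2, 3], [8, 2, 2],
     [9, 5, 3], [10, 2, 3], [11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]

label = itemgetter(1)

# (index, label) pairs, sorted by label; groupby needs sorted input.
pairs = sorted(enumerate(y), key=label)

# Collect the rows of x for each run of equal labels.
d = {f'z{key}': [x[i] for i, _ in group]
     for key, group in groupby(pairs, key=label)}

print(sorted(d))  # ['z0', 'z1', 'z2']
print(d['z1'])    # [[3, 8, 9], [5, 2, 3], [6, 4, 7]]
```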
Pandas solution:
>>> import pandas as pd
>>> df = pd.DataFrame({'points':[thing for thing in x],'cat':y})
>>> z = df.groupby('cat').agg(list)
>>> z
points
cat
0 [[1, 2, 8], [2, 9, 1], [4, 3, 5], [10, 2, 3], ...
1 [[3, 8, 9], [5, 2, 3], [6, 4, 7]]
2 [[7, 2, 3], [8, 2, 2], [9, 5, 3]]

Get the list of all possible numpy array column deletions

Given the following numpy array:
>>> a = np.arange(9).reshape((3, 3))
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
How can I get the list of all possible column deletions? So in this case:
array([[[1, 2],
[4, 5],
[7, 8]],
[[0, 2],
[3, 5],
[6, 8]],
[[0, 1],
[3, 4],
[6, 7]]])
You can use itertools.combinations:
>>> from itertools import combinations
>>> np.array([a[:, list(comb)] for comb in combinations(range(a.shape[1]), r=2)])
array([[[0, 1],
[3, 4],
[6, 7]],
[[0, 2],
[3, 5],
[6, 8]],
[[1, 2],
[4, 5],
[7, 8]]])
Alternatively you can create a list of needed column indices first and then use integer array indexing to pick up the required columns from the original array:
r = range(a.shape[1])
cols = [[j for j in r if i != j] for i in r]
cols
# [[1, 2], [0, 2], [0, 1]]
a[:, cols].swapaxes(0, 1)
#[[[1 2]
# [4 5]
# [7 8]]
#
# [[0 2]
# [3 5]
# [6 8]]
#
# [[0 1]
# [3 4]
# [6 7]]]
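Another option, shown here as a minimal sketch, is to call np.delete once per column. It is not necessarily faster than the index-based approach, but it reads close to the problem statement and produces the deletions in the same order as the desired output in the question.

```python
import numpy as np

a = np.arange(9).reshape((3, 3))

# np.delete returns a copy of a with column i removed (axis=1).
deletions = np.array([np.delete(a, i, axis=1) for i in range(a.shape[1])])

print(deletions.shape)        # (3, 3, 2)
print(deletions[0].tolist())  # [[1, 2], [4, 5], [7, 8]]
```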

Extract a subset of data from numpy array

I have a 2D numpy array that I need to extract a subset of data from where the value of the 2nd column is higher than a certain value. What's the best way to do this?
E.g. given the array:
array1 = [[1, 5], [2, 6], [3, 7], [4, 8]]
I would want to extract all rows where the 2nd column was higher than 6, so I'd get:
[3, 7], [4, 8]
Or, even more simply:
a[a[:,1] > 6]
Output:
array([[3, 7], [4, 8]])
Where a is the array.
Use numpy.where:
import numpy as np
a = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
# all rows where the second item is greater than 6:
print(a[np.where(a[:, 1] > 6)])
# output: [[3 7], [4 8]]
Use list comprehension:
array1 = [[1, 5], [2, 6], [3, 7], [4, 8]]
threshold = 6
print([elem for elem in array1 if elem[1] > threshold])
# [[3, 7], [4, 8]]
Or using numpy:
import numpy as np
array1 = np.array(array1)
print(array1[array1[:,1] > 6])
# array([[3, 7], [4, 8]])
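To make it clearer what the answers above are doing, here is a short sketch that prints the intermediate boolean mask and checks that plain boolean indexing and np.where select the same rows. The variable names mask, via_mask, and via_where are only for illustration.

```python
import numpy as np

a = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
threshold = 6

# Comparing the second column yields one boolean per row.
mask = a[:, 1] > threshold
print(mask.tolist())  # [False, False, True, True]

# Boolean indexing keeps the rows where the mask is True.
via_mask = a[mask]

# np.where(mask) returns the indices of the True entries instead.
via_where = a[np.where(mask)]

print(via_mask.tolist())                    # [[3, 7], [4, 8]]
print(np.array_equal(via_mask, via_where))  # True
```

Both forms give the same result here; the mask version simply skips the conversion to indices.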

Pairing up all possible objects in a list

Consider the following code:
list_example = [1,2,3,4,5,6,7,8,9]
List_of_ball_permutations = []
for i in list_example:
    for j in list_example:
        if j > i:
            List_of_ball_permutations.append([i, j])
This will result in a list being formed as follows:
[[1, 2],
[1, 3],
[1, 4],
[1, 5],
[1, 6],
[1, 7],
[1, 8],
[1, 9],
[2, 3],
[2, 4],
[2, 5],
[2, 6],
[2, 7],
[2, 8],
[2, 9],
[3, 4],
[3, 5],
[3, 6],
[3, 7],
[3, 8],
[3, 9],
[4, 5],
[4, 6],
[4, 7],
[4, 8],
[4, 9],
[5, 6],
[5, 7],
[5, 8],
[5, 9],
[6, 7],
[6, 8],
[6, 9],
[7, 8],
[7, 9],
[8, 9]]
Here each number is paired with every other number in the list without repeats: if [1,2] exists then [2,1] is not created, and pairs of the same number such as [1,1] are not created either.
Now consider a list of objects, where I would like to pair each object with every other object (not itself, and no repeats) in the same way as the numbers. My code does not allow this: it raises the error '>' not supported between instances of 'Ball' and 'Ball'. (Ball is the class I wrote to create the objects.)
Any help to resolve this issue would be very much appreciated.
Of course, itertools is the proper "pythonic" solution:
import itertools
list(itertools.combinations(["a", "b", "c"], 2))
However, you have the right idea: you can generate all the index pairs for the objects and then retrieve the objects by index:
def get_pairs(n):
    for i in range(n):
        for j in range(i+1, n):
            yield (i, j)

def get_objects_pairs(objects):
    for first, second in get_pairs(len(objects)):
        yield objects[first], objects[second]

objects = ['a', 'ball', 'toothbrush']
for pair in get_objects_pairs(objects):
    print(pair)
output:
('a', 'ball')
('a', 'toothbrush')
('ball', 'toothbrush')
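To connect this back to the original error: itertools.combinations pairs items by position, so the objects never need to support the > operator. The sketch below uses a hypothetical stand-in for the Ball class, since the real one is not shown in the question.

```python
from itertools import combinations

class Ball:
    """Hypothetical stand-in for the asker's class; defines no comparison operators."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Ball({self.name!r})"

balls = [Ball("a"), Ball("b"), Ball("c")]

# combinations pairs items by position, so Ball needs no __gt__ method.
pairs = list(combinations(balls, 2))
print(pairs)
# [(Ball('a'), Ball('b')), (Ball('a'), Ball('c')), (Ball('b'), Ball('c'))]
```

Each object appears with every other object exactly once, and never with itself.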
