Get the list of all possible numpy array column deletions - python

Given the following numpy array:
>>> a = np.arange(9).reshape((3, 3))
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
How can I get the list of all possible column deletions? So in this case:
array([[[1, 2],
[4, 5],
[7, 8]],
[[0, 2],
[3, 5],
[6, 8]],
[[0, 1],
[3, 4],
[6, 7]]])

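You can use itertools.combinations to pick every set of columns that leaves exactly one out (the sub-arrays come out in the reverse order of the example in the question):
>>> from itertools import combinations
>>> np.array([a[:, list(comb)] for comb in combinations(range(a.shape[1]), r=a.shape[1] - 1)])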
array([[[0, 1],
[3, 4],
[6, 7]],
[[0, 2],
[3, 5],
[6, 8]],
[[1, 2],
[4, 5],
[7, 8]]])

Alternatively you can create a list of needed column indices first and then use integer array indexing to pick up the required columns from the original array:
r = range(a.shape[1])
cols = [[j for j in r if i != j] for i in r]
cols
# [[1, 2], [0, 2], [0, 1]]
a[:, cols].swapaxes(0, 1)
#[[[1 2]
# [4 5]
# [7 8]]
#
# [[0 2]
# [3 5]
# [6 8]]
#
# [[0 1]
# [3 4]
# [6 7]]]
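A third option, sketched here on the assumption that readability matters more than speed, is to call np.delete once per column and stack the results. np.delete returns a new array with the given column removed, so the sub-arrays appear in the same order as in the question. The name `deletions` is only illustrative.

```python
import numpy as np

a = np.arange(9).reshape((3, 3))

# Drop column i for each i in turn, then stack the 2-D results into one 3-D array.
deletions = np.stack([np.delete(a, i, axis=1) for i in range(a.shape[1])])

print(deletions.shape)  # (3, 3, 2)
print(deletions[0])     # the array with column 0 removed
```

This makes one copy per column, so for very wide arrays the integer-indexing answer above is likely the better choice.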

Related

Divide a 2d numpy in 3D according to a window of size w and a step p

I can do it with a loop but it takes me forever. Is there a way to do it without a loop or much faster? Here is my code explained. "data" is my 2D-array (M, N). "seq" is my window size (e.g., 40) and size = data.shape[0] = M.
X = list()
for j in range(size):
    end_idx = j + seq
    if end_idx >= size:
        break
    seq_x = data[j:end_idx, :]
    X.append(seq_x)
final_data = np.array(X)
It will look like below:
data = [[0, 1]
[2, 3]
[3, 4]
[4, 5]
[5, 6]
[6, 7]
[7, 8]
[8, 9]
[9, 7]]
For a window of size w = 2 we have
res = [[[0, 1]
[2, 3]]
[[2, 3]
[3, 4]]
[[3, 4]
[4, 5]]
...
[[8, 9]
[9, 7]]]
Does anyone have an idea of how to do this so that it runs quickly?
import numpy as np
data = np.array([[0, 1],
[2, 3],
[3, 4],
[4, 5],
[5, 6],
[6, 7],
[7, 8],
[8, 9],
[9, 7]])
w = 2
window_width = data.shape[1]
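# Each window spans w rows and every column, so the raw result has shape (M - w + 1, 1, w, N);
# squeeze() drops the length-1 axis. sliding_window_view requires NumPy 1.20 or later.
out = np.lib.stride_tricks.sliding_window_view(data, window_shape=(w, window_width)).squeeze()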
out:
array([[[0, 1],
[2, 3]],
[[2, 3],
[3, 4]],
...
[[7, 8],
[8, 9]],
[[8, 9],
[9, 7]]])
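The title also asks for a step p, which the answer above does not cover. A minimal sketch, assuming the step simply means "keep every p-th window": slide along axis 0 only, move the window axis into place, then slice. The value p = 3 is a hypothetical choice for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.array([[0, 1], [2, 3], [3, 4], [4, 5], [5, 6],
                 [6, 7], [7, 8], [8, 9], [9, 7]])
w = 2  # window size (rows per window)
p = 3  # step between window starts (hypothetical value)

# Windowing along axis 0 appends the window axis last: shape (M - w + 1, N, w).
windows = sliding_window_view(data, w, axis=0)
# Move the window axis in front of the column axis: shape (M - w + 1, w, N).
windows = windows.transpose(0, 2, 1)
# Keep every p-th window.
stepped = windows[::p]

print(windows.shape)  # (8, 2, 2)
print(stepped.shape)  # (3, 2, 2)
```

The result is a read-only view of data, so no rows are copied; call .copy() on it if you need to modify the windows.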

How can I sort the lists in the list?

I'd like to know how to sort each list inside a list. I don't want to sort the outer list by a key; I want each inner list sorted on its own, like this:
arr = [[2, 3], [5, 1], [4, 1], [5, 3], [4, 2]]
# solution...
I_want_arr = [[2, 3], [1, 5], [1, 4], [3, 5], [2, 4]]
I tried this:
for i in arr:
    i.sort()
but it didn't work.
Using a list comprehension:
arr = [[2, 3], [5, 1], [4, 1], [5, 3], [4, 2]]
sorted_output = [sorted(l) for l in arr]
Using map():
sorted_output = list(map(sorted, arr))
# Gabip's solution includes this and a more time-efficient one; check that out first!
How about:
arr = [[2, 3], [5, 1], [4, 1], [5, 3], [4, 2]]
I_want_arr = [sorted(x) for x in arr]
This outputs
[[2, 3], [1, 5], [1, 4], [3, 5], [2, 4]]
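For what it's worth, the loop in the question does sort each inner list in place. One possible reason it appeared not to work (an assumption, since the question does not show how the result was checked) is that list.sort() returns None, so using its return value discards the data:

```python
arr = [[2, 3], [5, 1], [4, 1], [5, 3], [4, 2]]

# list.sort() sorts in place and returns None, so collecting its results gives only None values.
results = [inner.sort() for inner in arr]

print(results)  # [None, None, None, None, None]
print(arr)      # [[2, 3], [1, 5], [1, 4], [3, 5], [2, 4]]
```

Use sorted() when you want a new list back, and .sort() when you want to modify the existing lists.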

row-wise Cartesian product between a 1d array and a 2d array

I think I'm missing something obvious. I want to find a cartesian product of arr1 (a 1d numpy array), and the ROWS of arr2 (a 2d numpy array). So, if arr1 has 4 elements and arr2 has shape (5,2), the output should have shape (20,3). (see below)
import numpy as np
arr1 = np.array([1, 4, 7, 3])
arr2 = np.array([[0, 1],
[2, 3],
[4, 5],
[4, 0],
[9, 9]])
The desired output is:
arr3 = np.array([[1, 0, 1],
[1, 2, 3],
[1, 4, 5],
[1, 4, 0],
[1, 9, 9],
[4, 0, 1],
[4, 2, 3],
[4, 4, 5],
[4, 4, 0],
[4, 9, 9],
[7, 0, 1],
[7, 2, 3],
[7, 4, 5],
[7, 4, 0],
[7, 9, 9],
[3, 0, 1],
[3, 2, 3],
[3, 4, 5],
[3, 4, 0],
[3, 9, 9]])
I've been trying to use transpose and reshape with code like np.array(np.meshgrid(arr1,arr2)), but no success yet.
I'm hoping the solution can be generalized because I also need to deal with situations like this: Get all combinations of the ROWS of a 2d (10,2) array and the ROWS of a 2d array (20, 5) to get an output array (200,7).
Here is a vectorized solution that works for your general case as well:
arr1 = np.array([[1, 4],
[7, 3]])
arr2 = np.array([[0, 1],
[2, 3],
[4, 5],
[4, 0],
[9, 9]])
np.hstack((np.repeat(arr1, len(arr2), 0),
           np.stack((arr2,) * len(arr1)).reshape(-1, arr2.shape[1])))
Output, for input shapes (2, 2) and (5, 2), has shape (10, 4):
[[1 4 0 1]
[1 4 2 3]
[1 4 4 5]
[1 4 4 0]
[1 4 9 9]
[7 3 0 1]
[7 3 2 3]
[7 3 4 5]
[7 3 4 0]
[7 3 9 9]]
You can use hstack to add columns to arr2, and vstack to get the final array.
np.vstack(np.apply_along_axis(lambda x: np.hstack([np.repeat(x[0], arr2.shape[0]).reshape(-1, 1),
                                                   arr2]),
                              1,
                              arr1[:, None]))
I think this should do it:
import numpy as np
arr0 = np.array([1, 4, 7, 3])
arr1 = np.reshape(arr0, (len(arr0),1))
arr2 = np.array([[0, 1],
[2, 3],
[4, 5],
[4, 0],
[9, 9]])
r1,c1 = arr1.shape
r2,c2 = arr2.shape
arrOut = np.zeros((r1,r2,c1+c2), dtype=arr1.dtype)
arrOut[:,:,:c1] = arr1[:,None,:]
arrOut[:,:,c1:] = arr2
arrOut.reshape(-1,c1+c2)
The output is:
array([[1, 0, 1],
[1, 2, 3],
[1, 4, 5],
[1, 4, 0],
[1, 9, 9],
[4, 0, 1],
[4, 2, 3],
[4, 4, 5],
[4, 4, 0],
[4, 9, 9],
[7, 0, 1],
[7, 2, 3],
[7, 4, 5],
[7, 4, 0],
[7, 9, 9],
[3, 0, 1],
[3, 2, 3],
[3, 4, 5],
[3, 4, 0],
[3, 9, 9]])
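The answers above can be folded into one small helper that accepts either a 1d or a 2d first argument. This is a sketch, not a definitive implementation: the name rows_product is hypothetical, and it assumes both inputs are at most 2d.

```python
import numpy as np

def rows_product(a, b):
    """Pair every row of a with every row of b (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Treat a 1d input as a single column so both inputs are 2d.
    a = a.reshape(len(a), -1)
    b = b.reshape(len(b), -1)
    left = np.repeat(a, len(b), axis=0)   # each row of a repeated len(b) times
    right = np.tile(b, (len(a), 1))       # all of b repeated len(a) times
    return np.hstack((left, right))

arr1 = np.array([1, 4, 7, 3])
arr2 = np.array([[0, 1], [2, 3], [4, 5], [4, 0], [9, 9]])

arr3 = rows_product(arr1, arr2)
big = rows_product(np.zeros((10, 2)), np.ones((20, 5)))

print(arr3.shape)  # (20, 3)
print(big.shape)   # (200, 7)
```

np.repeat and np.tile are complementary here: the first varies slowly and the second quickly, which yields the row order requested in the question.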

Numpy: Check for duplicates in first column and keep row with highest value [duplicate]

I have a large n x 2 numpy array that is formatted as (x, y) coordinates. I would like to filter this array so as to:
Identify coordinate pairs with duplicated x-values.
Keep only the coordinate pair of those duplicates with the highest y-value.
For example, in the following array:
arr = [[1, 4]
[1, 8]
[2, 3]
[4, 6]
[4, 2]
[5, 1]
[5, 2]
[5, 6]]
I would like the result to be:
arr = [[1, 8]
[2, 3]
[4, 6]
[5, 6]]
I've explored np.unique and np.where but cannot figure out how to leverage them to solve this problem. Thanks so much!
Here's one way based on np.maximum.reduceat -
def grouby_maxY(a):
    b = a[a[:,0].argsort()] # if first col is already sorted, skip this
    # Start index of each run of equal x-values in the sorted array
    grp_idx = np.flatnonzero(np.r_[True,(b[:-1,0] != b[1:,0])])
    # Maximum y within each run
    grp_maxY = np.maximum.reduceat(b[:,1], grp_idx)
    return np.c_[b[grp_idx,0], grp_maxY]
Alternatively, if you want to bring in np.unique, we can use it to find grp_idx with np.unique(b[:,0], return_index=True)[1].
Sample run -
In [453]: np.random.seed(0)
In [454]: arr = np.random.randint(0,5,(10,2))
In [455]: arr
Out[455]:
array([[4, 0],
[3, 3],
[3, 1],
[3, 2],
[4, 0],
[0, 4],
[2, 1],
[0, 1],
[1, 0],
[1, 4]])
In [456]: grouby_maxY(arr)
Out[456]:
array([[0, 4],
[1, 4],
[2, 1],
[3, 3],
[4, 0]])
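Another way to get the same result, sketched here as an alternative rather than as the answerer's method, is to sort by x and then by y with np.lexsort and keep the last row of each x group. The names order, is_last and result are illustrative.

```python
import numpy as np

arr = np.array([[1, 4], [1, 8], [2, 3], [4, 6],
                [4, 2], [5, 1], [5, 2], [5, 6]])

# np.lexsort uses the LAST key as the primary key: sort by x, ties broken by y.
order = np.lexsort((arr[:, 1], arr[:, 0]))
b = arr[order]

# A row is the last of its group when the next row has a different x (or there is no next row).
is_last = np.r_[b[1:, 0] != b[:-1, 0], True]
result = b[is_last]

print(result)
```

Because y is ascending within each x group, the last row of a group is the one with the highest y-value.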

How to gather data in my case using gather_nd in tensorflow?

I need to gather some data from a tensor, so I used gather_nd. My current code is below:
import tensorflow as tf
indices = [[[0, 4], [0, 1], [0, 6], [0, 2]],
[[1, 1], [1, 4], [1, 0], [1, 9]],
[[2, 5], [2, 1], [2, 9], [2, 6]]]
params = [[4,6,3,6,7,8,4,5,3,8], [9,5,6,2,6,5,1,9,6,4], [4,6,6,1,3,2,6,7,1,8]]
output = tf.gather_nd(params, indices)
sess = tf.Session()
print(sess.run(output))
The output is
[[7 6 4 3]
[5 6 9 4]
[2 6 8 6]]
Yep, that's what I want. I want to take out the values located at 4,1,6,2 in params[0]. They are 7, 6, 4, 3 because params[0][4] = 7, params[0][1] = 6, params[0][6] = 4, params[0][2] = 3.
However, tf.gather_nd only accepts indices in the form shown above. My raw_indices look like this:
[[4, 1, 6, 2],
[1, 4, 0, 9],
[5, 1, 9, 6]]
How can I convert raw_indices into indices in TensorFlow? I have to do this step inside the graph, since raw_indices is generated in the middle of the graph.
A mixture of tf.range() and some tiling seems to work:
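def index_matrix_to_pairs(index_matrix):
    # Row number of every entry, tiled to the same shape as index_matrix
    replicated_first_indices = tf.tile(
        tf.expand_dims(tf.range(tf.shape(index_matrix)[0]), axis=1),
        [1, tf.shape(index_matrix)[1]])
    # tf.pack was renamed to tf.stack in TensorFlow 1.0
    return tf.stack([replicated_first_indices, index_matrix], axis=2)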
start = [[4, 1, 6, 2],
[1, 4, 0, 9],
[5, 1, 9, 6]]
with tf.Session():
    print(index_matrix_to_pairs(start).eval())
Gives:
[[[0 4]
[0 1]
[0 6]
[0 2]]
[[1 1]
[1 4]
[1 0]
[1 9]]
[[2 5]
[2 1]
[2 9]
[2 6]]]
It's just generating the first part of each pair with a tiled tf.range() op, then packing that with the specified indices.
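To see the index construction without running a TensorFlow session, here is a NumPy analogue of the same idea. It is only an illustration of the pairing logic, not TensorFlow code: np.broadcast_to stands in for the tiled range, np.stack for the packing step, and fancy indexing for gather_nd.

```python
import numpy as np

params = np.array([[4, 6, 3, 6, 7, 8, 4, 5, 3, 8],
                   [9, 5, 6, 2, 6, 5, 1, 9, 6, 4],
                   [4, 6, 6, 1, 3, 2, 6, 7, 1, 8]])
raw_indices = np.array([[4, 1, 6, 2],
                        [1, 4, 0, 9],
                        [5, 1, 9, 6]])

# Row number of every entry, expanded to the shape of raw_indices.
rows = np.arange(raw_indices.shape[0])[:, None]
row_ids = np.broadcast_to(rows, raw_indices.shape)

# Pair each row number with its column index: shape (3, 4, 2).
pairs = np.stack([row_ids, raw_indices], axis=2)

# Equivalent of gather_nd for these (row, column) pairs.
gathered = params[pairs[..., 0], pairs[..., 1]]

print(pairs[0].tolist())  # [[0, 4], [0, 1], [0, 6], [0, 2]]
print(gathered)
```

The gathered values match the output shown in the question: [[7 6 4 3], [5 6 9 4], [2 6 8 6]].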
