Cartesian product of rows of a very big array - python

I have an array of size (100, 50). I need to generate an output array which represents a cartesian product of input array rows.
For simplification purposes, let's have an input array:
array([[2, 6, 5],
[7, 3, 6]])
As output I would like to have:
array([[2, 7],
[2, 3],
[2, 6],
[6, 7],
[6, 3],
[6, 6],
[5, 7],
[5, 3],
[5, 6]])
Note: itertools.product doesn't work here, because of the size of the input vector. Also all another similar answers, assumes number of rows smaller than 32, what is not the case here

This question has been asked many times, for example here.
The array of a size (100, 50) is too big and can't be handled by numpy. However, smaller array size might be solved.
Anyway, I prefer to use itertools for this kind of stuff:
import itertools
a = np.array([[2, 6, 5], [7, 3, 6]])
np.array(list(itertools.product(*a)))
array([[2, 7],
[2, 3],
[2, 6],
[6, 7],
[6, 3],
[6, 6],
[5, 7],
[5, 3],
[5, 6]])

a = np.array([[2, 6, 5],[7, 3, 6]])
out = np.array(np.meshgrid(a[0], a[1])).T.reshape(-1,2)
print(out)
"""
prints
[[2 7]
[2 3]
[2 6]
[6 7]
[6 3]
[6 6]
[5 7]
[5 3]
[5 6]]
"""

Related

Divide a 2d numpy in 3D according to a window of size w and a step p

I can do it with a loop but it takes me forever. Is there a way to do it without a loop or much faster? Here is my code explained. "data" is my 2D-array (M, N). "seq" is my window size (e.g., 40) and size = data.shape[0] = M.
X = list()
for j in range(size):
end_idx = j + seq
if end_idx >= size:
break
seq_x = data[j:end_idx, :]
X.append(seq_x)
final_data = np.array(X)
It will look like below:
data = [[0, 1]
[2, 3]
[3, 4]
[4, 5]
[5, 6]
[6, 7]
[7, 8]
[8, 9]
[9, 7]]
For a window of size w = 2 we have
res = [[[0, 1]
[2, 3]]
[[2, 3]
[3, 4]]
[[3, 4]
[4, 5]]
...
[[8, 9]
[9, 7]]]
Is any one as an idea of how to do it so that it can be executed quickly?
import numpy as np
data = np.array([[0, 1],
[2, 3],
[3, 4],
[4, 5],
[5, 6],
[6, 7],
[7, 8],
[8, 9],
[9, 7]])
w = 2
window_width = data.shape[1]
out = np.lib.stride_tricks.sliding_window_view(data, window_shape=(w, window_width)).squeeze()
out:
array([[[0, 1],
[2, 3]],
[[2, 3],
[3, 4]],
...
[[7, 8],
[8, 9]],
[[8, 9],
[9, 7]]])

Picking rows from two NumPy arrays at random

Starting from two numpy arrays:
A = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
B = np.array([[9, 8], [8, 7], [7, 6], [6, 5]])
I would like to create a new array C picking, for each index, one row from the same index but randomly from A or B. The idea is that at each index of random_selector, if the value is higher than 0.1, then we chose the same-index row from A, otherwise, the same-index row from B.
random_selector = np.random.random(size=len(A))
C = np.where(random_selector > .1, A, B)
# example of desired result picking rows from respectively A, B, B, A:
# [[1, 2], [8, 7], [7, 6], [4, 5]]
Running the above code, however, produces the following error:
ValueError: operands could not be broadcast together with shapes (4,) (4,2) (4,2)
Try adding a new dimension:
import numpy as np
A = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
B = np.array([[9, 8], [8, 7], [7, 6], [6, 5]])
random_selector = np.random.random(size=len(A))
C = np.where((random_selector > .1)[:, None], A, B)
print(C)
Output (of a single run)
[[1 2]
[8 7]
[3 4]
[4 5]]

Get the list of all possible numpy array column deletions

Given the following numpy array:
>>> a = np.arange(9).reshape((3, 3))
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
How can get the list of all possible column deletions? So in this case:
array([[[1, 2],
[4, 5],
[7, 8]],
[[0, 2],
[3, 5],
[6, 8]],
[[0, 1],
[3, 4],
[6, 7]]])
You can use itertools.combinations:
>>> from itertools import combinations
>>> np.array([a[:, list(comb)] for comb in combinations(range(a.shape[1]), r=2)])
array([[[0, 1],
[3, 4],
[6, 7]],
[[0, 2],
[3, 5],
[6, 8]],
[[1, 2],
[4, 5],
[7, 8]]])
Alternatively you can create a list of needed column indices first and then use integer array indexing to pick up the required columns from the original array:
r = range(a.shape[1])
cols = [[j for j in r if i != j] for i in r]
cols
# [[1, 2], [0, 2], [0, 1]]
a[:, cols].swapaxes(0, 1)
#[[[1 2]
# [4 5]
# [7 8]]
#
# [[0 2]
# [3 5]
# [6 8]]
#
# [[0 1]
# [3 4]
# [6 7]]]

Extract a subset of data from numpy array

I have a 2D numpy array that I need to extract a subset of data from where the value of the 2nd column is higher than a certain value. What's the best way to do this?
E.g. given the array:
array1 = [[1, 5], [2, 6], [3, 7], [4, 8]]
I would want to extract all rows where the 2nd column was higher than 6, so I'd get:
[3, 7], [4, 8]
Or, even more simply:
a[a[:,1] > 6]
Output:
array([[3, 7], [4, 8]])
Where a is the array.
Use numpy.where:
import numpy as np
a = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
# all elements where the second item it greater than 6:
print(a[np.where(a[:, 1] > 6)])
# output: [[3 7], [4 8]]
Use list comprehension:
array1 = [[1, 5], [2, 6], [3, 7], [4, 8]]
threshold = 6
print([elem for elem in array1 if elem[1] > threshold])
# [[3, 7], [4, 8]]
Or using numpy:
import numpy as np
array1 = np.array(array1)
print(array1[array1[:,1] > 6])
# array([[3, 7], [4, 8]])

Filling an array with arrays or vectors in python using numpy without a loop

I'm trying to find a way to fill an array with rows of values. It's much easier to express my desired output with an example. Given the input of an N x M matrix, array1,
array1 = np.array([[2, 3, 4],
[4, 8, 3],
[7, 6, 3]])
I would like to output an array of arrays in which each row is an N x N consisting of the values from the respective row. The output would be
[[[2, 3, 4],
[2, 3, 4],
[2, 3, 4]],
[[4, 8, 3],
[4, 8, 3],
[4, 8, 3]],
[[7, 6, 3],
[7, 6, 3],
[7, 6, 3]]]
You can reshape the array from 2d to 3d, then use numpy.repeat() along the desired axis:
np.repeat(array1[:, None, :], 3, axis=1)
#array([[[2, 3, 4],
# [2, 3, 4],
# [2, 3, 4]],
# [[4, 8, 3],
# [4, 8, 3],
# [4, 8, 3]],
# [[7, 6, 3],
# [7, 6, 3],
# [7, 6, 3]]])
Or equivalently you can use numpy.tile:
np.tile(array1[:, None, :], (1,3,1))
Another solution which is sometimes useful is the following
out = np.empty((3,3,3), dtype=array1.dtype)
out[...] = array1[:, None, :]

Categories

Resources