Slice an array into segments - python

Suppose I have an array [1,2,3,4,5,6,7,8], composed of two samples [1,2,3,4] and [5,6,7,8]. For each sample, I want to apply a sliding window of size n, and if there are not enough elements, pad the result with the last element. Each row in the return value should be the window starting from the element in that row.
For example:
if n=3, then the result should be:
[[1,2,3],
[2,3,4],
[3,4,4],
[4,4,4],
[5,6,7],
[6,7,8],
[7,8,8],
[8,8,8]]
How can I achieve this with efficient slicing instead of a for loop? Thanks.

A similar approach to @hpaulj's, using some NumPy built-in functionality:
import numpy as np
samples = [[1,2,3,4],[5,6,7,8]]
ws = 3  # window size
# add padding
samples = [s + [s[-1]]*(ws-1) for s in samples]
# rolling window function for arrays
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1]-window+1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
result = sum([rolling_window(np.array(s), ws).tolist() for s in samples], [])
result
[[1, 2, 3],
[2, 3, 4],
[3, 4, 4],
[4, 4, 4],
[5, 6, 7],
[6, 7, 8],
[7, 8, 8],
[8, 8, 8]]
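If you are on NumPy 1.20 or newer, the hand-rolled as_strided helper above can be replaced by the built-in sliding_window_view; a minimal sketch of the same pad-then-window idea (not part of the original answer):
import numpy as np
samples = [[1,2,3,4],[5,6,7,8]]
ws = 3
# pad each sample with its last element, then take every window of length ws
padded = [s + [s[-1]]*(ws-1) for s in samples]
result = np.concatenate(
    [np.lib.stride_tricks.sliding_window_view(np.array(s), ws) for s in padded]
)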

A Python list approach:
In [201]: order = [1,3,2,3,5,8]
In [202]: samples = [[1,2,3,4],[5,6,7,8]]
Expand samples to take care of the padding issue (here n is the window size, 3):
In [203]: samples = [row+([row[-1]]*n) for row in samples]
In [204]: samples
Out[204]: [[1, 2, 3, 4, 4, 4, 4], [5, 6, 7, 8, 8, 8, 8]]
Define a function:
def foo(i, samples):
    # return the window of length n starting at the first occurrence of i
    for row in samples:
        try:
            j = row.index(i)
        except ValueError:
            continue
        return row[j:j+n]
In [207]: foo(3,samples)
Out[207]: [3, 4, 4]
In [208]: foo(9,samples)   # the not-found case isn't handled well: the function falls off the end and returns None
For all the order elements:
In [209]: [foo(i,samples) for i in order]
Out[209]: [[1, 2, 3], [3, 4, 4], [2, 3, 4], [3, 4, 4], [5, 6, 7], [8, 8, 8]]
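A slightly hardened version of foo for the not-found case (returning None explicitly is just one reasonable choice, not part of the original answer):
def foo(i, samples):
    # return the window of length n starting at the first occurrence of i,
    # or None if i does not appear in any sample
    for row in samples:
        try:
            j = row.index(i)
        except ValueError:
            continue
        return row[j:j+n]
    return None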

I have a simple one-liner:
import numpy as np
samples = np.array([[1,2,3,4],[5,6,7,8]])
n, d = samples.shape  # number of samples, length of each sample
ws = 3  # window size
result = samples[:, np.minimum(np.arange(d)[:,None] + np.arange(ws)[None,:], d-1)]
No loop, only broadcasting, which probably makes this the most efficient way of doing it. The dimension of the output is not exactly what you asked for, but that is easy to correct with a simple np.reshape.
The output is:
[[[1 2 3]
[2 3 4]
[3 4 4]
[4 4 4]]
[[5 6 7]
[6 7 8]
[7 8 8]
[8 8 8]]]
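The reshape mentioned above collapses the first two axes into the flat 8 x 3 layout the question asked for:
result.reshape(-1, ws)
# array([[1, 2, 3],
#        [2, 3, 4],
#        [3, 4, 4],
#        [4, 4, 4],
#        [5, 6, 7],
#        [6, 7, 8],
#        [7, 8, 8],
#        [8, 8, 8]])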

Related

Cross-reference between numpy arrays

I have a 1d array of ids, for example:
a = [1, 3, 4, 7, 9]
Then another 2d array:
b = [[1, 4, 7, 9], [3, 7, 9, 1]]
I would like to have a third array with the same shape of b where each item is the index of the corresponding item from a, that is:
c = [[0, 2, 3, 4], [1, 3, 4, 0]]
What's a vectorized way to do that using numpy?
This may not make sense, but you can use np.interp to do that:
a = [1, 3, 4, 7, 9]
sorting = np.argsort(a)
positions = np.arange(0,len(a))
xp = np.array(a)[sorting]
fp = positions[sorting]
b = [[1, 4, 7, 9], [3, 7, 9, 1]]
c = np.rint(np.interp(b,xp,fp)) # rint is better than astype(int) because floats are tricky.
# but astype(int) should work faster for small len(a) but not recommended.
This should work as long as len(a) is smaller than the largest integer exactly representable by a float (16,777,217). The algorithm runs in O(n*log(n)) time, or more precisely O(len(b)*log(len(a))).
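To get integer indices in the same shape as b, the rounded result can then be cast, e.g. (a small follow-up to the snippet above, not part of the original answer):
c = np.rint(np.interp(b, xp, fp)).astype(int)
# array([[0, 2, 3, 4],
#        [1, 3, 4, 0]])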
Effectively, this solution is a one-liner. The only catch is that you need to reshape the array before you do the one-liner, and then reshape it back again:
import numpy as np
a = np.array([1, 3, 4, 7, 9])
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])
original_shape = b.shape
c = np.where(b.reshape(b.size, 1) == a)[1]
c = c.reshape(original_shape)
This results in:
[[0 2 3 4]
[1 3 4 0]]
Broadcasting to the rescue!
>>> ((np.arange(1, len(a) + 1)[:, None, None]) * (a[:, None, None] == b)).sum(axis=0) - 1
array([[0, 2, 3, 4],
[1, 3, 4, 0]])

Use numpy to stack combinations of a 1D and 2D array

I have 2 numpy arrays, one 2D and the other 1D, for example like this:
import numpy as np
a = np.array(
[
[1, 2],
[3, 4],
[5, 6]
]
)
b = np.array(
[7, 8, 9, 10]
)
I want to get all possible combinations of the elements in a and b, treating a like a 1D array, so that it leaves the rows in a intact, but also joins the rows in a with the items in b. It would look something like this:
>>> combine1d(a, b)
[ [1 2 7] [1 2 8] [1 2 9] [1 2 10]
[3 4 7] [3 4 8] [3 4 9] [3 4 10]
[5 6 7] [5 6 8] [5 6 9] [5 6 10] ]
I know that there are slow solutions for this (like a for loop), but I need a fast solution to this as I am working with datasets with millions of integers.
Any ideas?
This is one of those cases where it's easier to build a higher dimensional object, and then fix the axes when you're done. The first two dimensions are the length of b and the length of a. The third dimension is the number of elements in each row of a plus 1. We can then use broadcasting to fill in this array.
x, y = a.shape   # number of rows and columns of a
z, = b.shape     # length of b
result = np.empty((z, x, y + 1))
result[..., :y] = a          # every slice along the first axis gets a copy of a
result[..., y] = b[:, None]  # the last column of each slice is filled with one value of b
At this point, to get the exact answer you asked for, you'll need to swap the first two axes, and then merge those two axes into a single axis.
result.swapaxes(0, 1).reshape(-1, y + 1)
An hour later...
I realized by being a little bit more clever, I didn't need to swap axes. This also has the nice benefit that the result is a contiguous array.
def convert1d(a, b):
    x, y = a.shape
    z, = b.shape
    result = np.empty((x, z, y + 1))
    result[..., :y] = a[:, None, :]
    result[..., y] = b
    return result.reshape(-1, y + 1)
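A quick check with the arrays from the question (note that np.empty defaults to float64, so pass dtype=a.dtype or cast afterwards if integer output is wanted):
convert1d(a, b)
# array([[ 1.,  2.,  7.],
#        [ 1.,  2.,  8.],
#        [ 1.,  2.,  9.],
#        [ 1.,  2., 10.],
#        [ 3.,  4.,  7.],
#        ...
#        [ 5.,  6., 10.]])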
This is a very "scotch tape" solution:
import numpy as np
a = np.array(
[
[1, 2],
[3, 4],
[5, 6]
]
)
b = np.array(
[7, 8, 9, 10]
)
z = []
for y in a:      # loop over rows of a first so the result is grouped by a's rows
    for x in b:
        z.append(np.append(y, x))
np.array(z).reshape(3, 4, 3)
You can use np.c_ to join the two arrays column-wise. I also used np.full to generate a constant column from each value of the second array (b). The result is as follows:
result = [np.c_[a, np.full((a.shape[0],1), x)] for x in b]
result
Output
[array([[1, 2, 7],
[3, 4, 7],
[5, 6, 7]]),
array([[1, 2, 8],
[3, 4, 8],
[5, 6, 8]]),
array([[1, 2, 9],
[3, 4, 9],
[5, 6, 9]]),
array([[ 1, 2, 10],
[ 3, 4, 10],
[ 5, 6, 10]])]
The output might look a bit messy, but it's exactly the desired output you described. To make sure, you can run the line below to see what the first element of the result list looks like:
print(result[0])
Output
array([[1, 2, 7],
[3, 4, 7],
[5, 6, 7]])
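If a single 2-D array is preferred over the list of blocks, the pieces can be stacked, e.g. with np.vstack; note that this groups rows by the values of b, which is a different row order than the one shown in the question:
np.vstack(result)  # shape (12, 3), rows grouped by each value of b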

Comparing three numpy arrays so that wherever one is not a given value, the others are not that given value either

Is there an effective way in which to compare all three numpy arrays at once?
For example, if the given value to check is 5, then wherever the value is not 5 in one array, it should not be 5 in the other arrays either.
The only way I've thought of to do this would be checking that the number of occurrences of arr1 != 5 & arr2 == 5 is 0. However, this only checks one direction between two of the arrays, and I would then also need to incorporate arr3. This seems inefficient and might leave a logical hole.
This should pass:
arr1 = numpy.array([[1, 7, 3],
[4, 5, 6],
[4, 5, 2]])
arr2 = numpy.array([[1, 2, 3],
[4, 5, 6],
[8, 5, 6]])
arr3 = numpy.array([[1, 1, 3],
[4, 5, 6],
[9, 5, 6]])
However, this should fail because arr2 has a 3 where the other arrays have 5s:
arr1 = numpy.array([[1, 2, 3],
[8, 5, 6],
[4, 5, 6]])
arr2 = numpy.array([[1, 2, 3],
[2, 3, 1],
[2, 5, 6]])
arr3 = numpy.array([[1, 2, 3],
[4, 5, 6],
[4, 5, 3]])
There is a general solution (regardless of the number of arrays), and it's quite educational:
import numpy as np  # the recommended way to import numpy
arr = np.array([arr1, arr2, arr3])
is_valid = np.all(arr==5, axis=0) == np.any(arr==5, axis=0) #introduce axis
out = np.all(is_valid)
#True for the first case, False for the second one
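Wrapped into a small helper for convenience (the function name is just illustrative, not from the original answer), the same check can be run directly against the two examples above:
def all_or_none(value, *arrays):
    arr = np.array(arrays)
    mask = (arr == value)
    # at every position, either every array holds `value` or none of them does
    return bool(np.all(mask.all(axis=0) == mask.any(axis=0)))
all_or_none(5, arr1, arr2, arr3)  # True for the first example, False for the second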
Is this a valid solution?
numpy.logical_and(((arr1==5)==(arr2==5)).all(), ((arr2==5)==(arr3==5)).all())
You could AND all comparisons to 5 and compare to any one of the comparisons:
A = (arr1==5)
(A==(A&(arr2==5)&(arr3==5))).all()
Output: True for the first example, False for the second
NB. This works for any number of arrays

How to get the index of a subarray in PyTorch?

a and b are torch tensors with no repeating elements.
a's shape is [n,2], like:
[[1,2]
[2,3]
[4,6]
...]
b's shape is [m,2], like:
[[1,2]
[4,6]
....
]
How do I get the indices of the rows of b in a? Example:
a = [[1,2]
[2,4]
[6,7]
]
b = [[1,2]
[6,7]]
the indices should be (0, 2), and we can use the GPU.
I can think of the following trick that can work for you.
Since we have two tensors with different numbers of rows (n and m), we first expand both to the same shape (m x n x 2) and then subtract. If two rows match, their difference is entirely zero, so the sum of absolute differences is zero exactly for matching rows; we then identify the indices of those rows.
n = a.shape[0]  # 3
m = b.shape[0]  # 2
_a = a.unsqueeze(0).repeat(m, 1, 1)  # m x n x 2
_b = b.unsqueeze(1).repeat(1, n, 1)  # m x n x 2
# sum the absolute differences so opposite-sign entries cannot cancel out
match = (_a - _b).abs().sum(-1)  # m x n
indices = (match == 0).nonzero()
if indices.nelement() > 0:  # empty tensor check
    row_indices = indices[:, 1]
else:
    row_indices = []
print(row_indices)
Sample Input/Output
Example 1
a = torch.tensor([[1, 2], [2, 4], [6, 7]])
b = torch.tensor([[1, 2], [6, 7]])
tensor([0, 2])
Example 2
a = torch.tensor([[1, 2], [2, 4], [6, 7]])
b = torch.tensor([[1, 3], [6, 7]])
tensor([2])
Example 3
a = torch.tensor([[1, 2], [2, 4], [6, 7]])
b = torch.tensor([[1, 2], [6, 5], [8, 9]])
tensor([0])
Example 4
a = torch.tensor([[1, 2], [2, 4], [6, 7]])
b = torch.tensor([[1, 3], [6, 5], [8, 9]])
[]
Here, @jpp's NumPy solution is almost your answer.
After this, you just need to get the indices using nonzero and flatten the tensor using flatten to get the expected shape.
a = torch.tensor([[1, 2], [2, 4], [6, 7]])
b = torch.tensor([[1, 2], [6, 7]])
(a[:, None] == b).all(-1).any(-1).nonzero().flatten()
tensor([0, 2])

Search Numpy array with multiple values

I have a numpy 2d array with duplicate values.
I am searching the array like this:
In [104]: import numpy as np
In [105]: array = np.array
In [106]: a = array([[1, 2, 3],
...: [1, 2, 3],
...: [2, 5, 6],
...: [3, 8, 9],
...: [4, 8, 9],
...: [4, 2, 3],
...: [5, 2, 3]])
In [107]: num_list = [1, 4, 5]
In [108]: for i in num_list:
...:     print(a[np.where(a[:,0] == i)])
...:
[[1 2 3]
[1 2 3]]
[[4 8 9]
[4 2 3]]
[[5 2 3]]
The input is a list of numbers matching the column-0 values of the array.
The end result I want is the matching rows, in any format (array, list, or tuple), for example:
array([[1, 2, 3],
[1, 2, 3],
[4, 8, 9],
[4, 2, 3],
[5, 2, 3]])
My code works fine but doesn't seem pythonic. Is there a better search strategy for multiple values,
something like a[np.where(a[:,0] == l)], where a single lookup fetches all the matching rows?
My real array is large.
Approach #1 : Using np.in1d -
a[np.in1d(a[:,0], num_list)]
Approach #2 : Using np.searchsorted -
num_arr = np.sort(num_list) # Sort num_list and get as array
# Get indices of occurrences of first column in num_list
idx = np.searchsorted(num_arr, a[:,0])
# Take care of out of bounds cases
idx[idx==len(num_arr)] = 0
out = a[a[:,0] == num_arr[idx]]
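A quick sanity check with the arrays from the question; both approaches return the same rows:
out
# array([[1, 2, 3],
#        [1, 2, 3],
#        [4, 8, 9],
#        [4, 2, 3],
#        [5, 2, 3]])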
You can do
a[numpy.in1d(a[:, 0], num_list), :]
