I want to make a list of integer sequences with random start points. The way I would do this in pure python is
x = np.zeros((1000, 10)) # 1000 sequences of 10 elements each
starts = np.random.randint(1, 1000, 1000)
for i in range(len(x)):
    x[i] = np.arange(starts[i], starts[i] + 10)
I wonder if there is a more elegant way of doing this using Numpy functionality.
You can use broadcasting after extending starts to a 2D version and adding in the 1D range array, like so -
x = starts[:,None] + np.arange(10)
Explanation
Let's take a small example for starts to see what that broadcasting does in this case.
In [382]: starts
Out[382]: array([3, 1, 3, 2])
In [383]: starts.shape
Out[383]: (4,)
In [384]: starts[:,None]
Out[384]:
array([[3],
[1],
[3],
[2]])
In [385]: starts[:,None].shape
Out[385]: (4, 1)
In [386]: np.arange(10).shape
Out[386]: (10,)
Thus, looking at the shapes and putting those together, a schematic diagram of the same would look something like this -
starts : 4
np.arange(10) : 10
After extending starts :
starts[:,None] : 4 x 1
np.arange(10) : 10
Thus, when we add starts[:,None] to np.arange(10), the elements of starts[:,None] are broadcast along its second axis 10 times, matching the length of the other array along that axis. np.arange(10) is treated as 2D with a singleton first dimension, and its elements are broadcast along that dimension 4 times, matching the length of starts[:,None] along that axis. Note that no explicit replication takes place: under the hood, the elements are broadcast and added on the fly.
Functionally, the result is the same as if the arrays had been replicated like so -
In [391]: np.repeat(starts[:,None],10,axis=1)
Out[391]:
array([[3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])
In [392]: np.repeat(np.arange(10)[None],4,axis=0)
Out[392]:
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
These broadcast elements are then added to give us the desired output x.
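As a sanity check, the broadcast expression can be compared against the original loop. This is a minimal sketch; the seeded generator and the names x_loop and x_bcast are only there for illustration and are not part of the question.

```python
import numpy as np

# Seeded generator so the comparison is reproducible (illustrative choice).
rng = np.random.default_rng(0)
starts = rng.integers(1, 1000, size=1000)

# Loop version from the question: fill one row at a time.
x_loop = np.zeros((1000, 10), dtype=starts.dtype)
for i in range(len(x_loop)):
    x_loop[i] = np.arange(starts[i], starts[i] + 10)

# Broadcast version: a (1000, 1) column plus a (10,) row gives (1000, 10).
x_bcast = starts[:, None] + np.arange(10)

print(np.array_equal(x_loop, x_bcast))
```

Both arrays should hold the same values; the broadcast version simply avoids the Python-level loop.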
Say I have some time-series data in the form of a simple array.
X1 = np.array([1, 2, 3, 4])
The Hankel matrix can be obtained by using scipy.linalg.hankel, which would look something like this:
hankel(X1)
array([[1, 2, 3, 4],
[2, 3, 4, 0],
[3, 4, 0, 0],
[4, 0, 0, 0]])
Now assume I had a larger array in the form of
X2 = np.array([1, 2, 3, 4, 5, 6, 7])
What I want to do is fill in the zeros in this matrix with the values that come next in the array (specific to each row). Starting from the Hankel matrix built from the first four values of X2, I'd like to get the following output:
hankel(X2[:4])
array([[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6],
[4, 5, 6, 7]])
How would I do this? I'd ideally like to use this for larger data.
Appreciate any tips or pointers given. Thanks!
If you have a matrix with the appropriate index values into your dataset, you can use integer array indexing directly into your dataset.
To create the index matrix, you can simply use the upper-left quadrant of a double-sized Hankel array. There are likely simpler ways to create the index matrix, but this does the trick.
>>> import numpy as np
>>> import scipy.linalg
>>> X = np.array([9, 8, 7, 6, 5, 4, 3])
>>> N = 4 # the size of the "window"
>>> indices = scipy.linalg.hankel(np.arange(N*2))[:N, :N]
>>> indices
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6]])
>>> X[indices]
array([[9, 8, 7, 6],
[8, 7, 6, 5],
[7, 6, 5, 4],
[6, 5, 4, 3]])
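One of the simpler ways to build the same index matrix is plain broadcasting, without SciPy. The sketch below reuses X and N from the example above; the name windows is just an illustrative choice.

```python
import numpy as np

X = np.array([9, 8, 7, 6, 5, 4, 3])
N = 4  # the size of the "window"

# Row i of the index matrix is [i, i+1, ..., i+N-1]:
# an (N, 1) column of row offsets plus an (N,) row of column offsets.
indices = np.arange(N)[:, None] + np.arange(N)

# Integer array indexing picks the corresponding values out of X.
windows = X[indices]
print(windows)
```

This relies on len(X) being at least 2*N - 1, so that the largest index stays in bounds.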
import torch
import numpy as np
a = torch.tensor([[1, 4], [2, 5],[3, 6]])
bb=a.detach().numpy()
b = a.view(6).detach().numpy()
Element b is like:
[1 4 2 5 3 6]
How do I reshape back to the following:
[1 2 3 4 5 6]
This is just an example, want some generic answers, even 3D.
In Pytorch you can use reshape and permute as in this example:
import torch
a = torch.randn((3,3,2))
b = a.permute(2,0,1).reshape(-1)
a
tensor([[[ 0.2372, 0.5550],
[ 0.7700, -0.3693],
[-0.4151, 0.6247]],
[[ 1.2179, 0.6992],
[ 0.5033, 1.6290],
[-1.2165, -0.4180]],
[[ 0.3189, 0.3208],
[ 0.3894, 2.5544],
[-1.3069, -0.6905]]])
b
tensor([ 0.2372, 0.7700, -0.4151, 1.2179, 0.5033, -1.2165, 0.3189, 0.3894,
-1.3069, 0.5550, -0.3693, 0.6247, 0.6992, 1.6290, -0.4180, 0.3208,
2.5544, -0.6905])
I think this solves the problem.
If you want to remain in PyTorch, you can view b in a's shape, then apply a transpose and flatten:
>>> b.view(-1,2).T.flatten()
tensor([1, 2, 3, 4, 5, 6])
In the 3D case, you can perform similar manipulations using torch.transpose, which swaps two axes. You get the desired result by combining it with Tensor.view:
First case (extra dimension last):
>>> b = a.view(-1, 1).expand(-1,3).flatten()
tensor([1, 1, 1, 4, 4, 4, 2, 2, 2, 5, 5, 5, 3, 3, 3, 6, 6, 6])
>>> b.view(-1,2,3).transpose(0,1).flatten()
tensor([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6])
Second case (extra dimension first):
>>> b = a.view(1,-1).expand(3,-1).flatten()
tensor([1, 4, 2, 5, 3, 6, 1, 4, 2, 5, 3, 6, 1, 4, 2, 5, 3, 6])
>>> b.view(3,-1).T.view(-1,2,3).transpose(0,1).flatten()
tensor([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6])
I can't help with the torch step, but starting with a numpy array:
In [70]: a=np.array([[1, 4], [2, 5],[3, 6]])
In [71]: a
Out[71]:
array([[1, 4],
[2, 5],
[3, 6]])
In [72]: a.ravel() # can also use reshape
Out[72]: array([1, 4, 2, 5, 3, 6])
To get a column major copy:
In [73]: a.ravel(order='F')
Out[73]: array([1, 2, 3, 4, 5, 6])
In [74]: a.T.ravel()
Out[74]: array([1, 2, 3, 4, 5, 6])
the transpose:
In [79]: a.T
Out[79]:
array([[1, 2, 3],
[4, 5, 6]])
For 3D arrays, you can pass an explicit axes order to transpose.
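A small sketch of the 3D case, assuming the flattened array was produced by moving the last axis to the front before flattening (the permutation perm below is an assumption chosen for illustration). Undoing it means reshaping to the permuted shape and applying the inverse permutation, which np.argsort provides.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Forward: reorder the axes, then flatten in row-major order.
perm = (2, 0, 1)
b = a.transpose(perm).ravel()

# Backward: reshape to the permuted shape, then invert the permutation.
permuted_shape = tuple(a.shape[p] for p in perm)  # (4, 2, 3)
inverse_perm = np.argsort(perm)                   # array([1, 2, 0])
restored = b.reshape(permuted_shape).transpose(inverse_perm)

print(np.array_equal(restored, a))
```

The same pattern works for any number of dimensions, as long as the permutation used when flattening is known.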
Consider an array, M, made up of pairs of elements. (I've used spaces to emphasize that we will be dealing with element PAIRS). The actual arrays will have a large number of rows, and 4,6,8 or 10 columns.
import numpy as np
M = np.array([[1,3, 2,1, 4,2, 3,3],
[3,5, 6,9, 5,1, 3,4],
[1,3, 2,4, 3,4, 7,2],
[4,5, 1,2, 2,1, 2,3],
[6,4, 4,1, 6,1, 4,7],
[6,7, 7,6, 9,7, 6,2],
[5,3, 1,5, 3,3, 3,3]])
PROBLEM: I want to eliminate rows from M having an element pair that has no common elements with any of the other pairs in that row.
In array M, the 2nd row and the 4th row should be eliminated. Here's why:
2nd row: the pair (6,9) has no common element with (3,5), (5,1), or (3,4)
4th row: the pair (4,5) has no common element with (1,2), (2,1), or (2,3)
I'm sure there's a nice broadcasting solution, but I can't see it.
This is a broadcasting solution. Hopefully it's self-explanatory:
a = M.reshape(M.shape[0],-1,2)
mask = ~np.eye(a.shape[1], dtype=bool)[...,None]
is_valid = (((a[...,None,:]==a[:,None,...])&mask).any(axis=(-1,-2))
            |((a[...,None,:]==a[:,None,:,::-1])&mask).any(axis=(-1,-2))
           ).all(-1)
M[is_valid]
Output:
array([[1, 3, 2, 1, 4, 2, 3, 3],
[1, 3, 2, 4, 3, 4, 7, 2],
[6, 4, 4, 1, 6, 1, 4, 7],
[6, 7, 7, 6, 9, 7, 6, 2],
[5, 3, 1, 5, 3, 3, 3, 3]])
Another way of solving this would be the following -
M = np.array([[1,3, 2,1, 4,2, 3,3],
[3,5, 6,9, 5,1, 3,4],
[1,3, 2,4, 3,4, 7,2],
[4,5, 1,2, 2,1, 2,3],
[6,4, 4,1, 6,1, 4,7],
[6,7, 7,6, 9,7, 6,2],
[5,3, 1,5, 3,3, 3,3]])
MM = M.reshape(M.shape[0],-1,2)
matches_M = np.any(MM[:,:,None,:,None] == MM[:,None,:,None,:], axis=(-1,-2))
mask = ~np.eye(MM.shape[1], dtype=bool)[None,:]
is_valid = np.all(np.any(matches_M&mask, axis=-1), axis=-1)
M[is_valid]
array([[1, 3, 2, 1, 4, 2, 3, 3],
[1, 3, 2, 4, 3, 4, 7, 2],
[6, 4, 4, 1, 6, 1, 4, 7],
[6, 7, 7, 6, 9, 7, 6, 2],
[5, 3, 1, 5, 3, 3, 3, 3]])
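For checking a broadcasting solution on small inputs, a plain-Python reference can be handy. The sketch below is not vectorized; row_is_valid is a hypothetical helper name, and it implements the rule as stated in the question: every pair in a row must share at least one element with some other pair in that row.

```python
import numpy as np

M = np.array([[1,3, 2,1, 4,2, 3,3],
              [3,5, 6,9, 5,1, 3,4],
              [1,3, 2,4, 3,4, 7,2],
              [4,5, 1,2, 2,1, 2,3],
              [6,4, 4,1, 6,1, 4,7],
              [6,7, 7,6, 9,7, 6,2],
              [5,3, 1,5, 3,3, 3,3]])

def row_is_valid(row):
    # Split the row into consecutive pairs, stored as sets.
    pairs = [set(row[i:i + 2].tolist()) for i in range(0, len(row), 2)]
    # Each pair must intersect at least one *other* pair in the row.
    return all(
        any(p & q for j, q in enumerate(pairs) if j != i)
        for i, p in enumerate(pairs)
    )

keep = np.array([row_is_valid(r) for r in M])
filtered = M[keep]
print(filtered)
```

On the sample M this drops the 2nd and 4th rows, matching the output shown above.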
(short version of my question: In numpy, is there an elegant way of emulating tf.sequence_mask from tensorflow?)
I have a 2d array a (each row represents a sequence of different length). Next, there is a 1d array b (representing sequence lengths). Is there an elegant way to get a (flattened) array that would contain only such elements of a that belong to the sequences as specified by their length b:
a = np.array([
[1, 2, 3, 2, 1], # I want just [:3] from this row
[4, 5, 5, 5, 1], # [:2] from this row
[6, 7, 8, 9, 0] # [:4] from this row
])
b = np.array([3,2,4]) # 3 elements from the 1st row, 2 from the 2nd, 4 from the 3rd row
the desired result:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
By elegant way I mean something that avoids loops.
Use broadcasting to create a mask of the same shape as the 2D array and then simply mask and extract valid elements -
a[b[:,None] > np.arange(a.shape[1])]
Sample run -
In [360]: a
Out[360]:
array([[1, 2, 3, 2, 1],
[4, 5, 5, 5, 1],
[6, 7, 8, 9, 0]])
In [361]: b
Out[361]: array([3, 2, 4])
In [362]: a[b[:,None] > np.arange(a.shape[1])]
Out[362]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
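The same comparison can be wrapped in a small helper that loosely mimics tf.sequence_mask. This is only a sketch: the function name and the maxlen default are assumptions made for illustration, not a NumPy API.

```python
import numpy as np

def sequence_mask(lengths, maxlen=None):
    """Boolean mask of shape (len(lengths), maxlen); True where col < length."""
    lengths = np.asarray(lengths)
    if maxlen is None:
        maxlen = int(lengths.max())
    # (maxlen,) row of column indices compared against a (n, 1) column of lengths.
    return np.arange(maxlen) < lengths[:, None]

a = np.array([[1, 2, 3, 2, 1],
              [4, 5, 5, 5, 1],
              [6, 7, 8, 9, 0]])
b = np.array([3, 2, 4])

mask = sequence_mask(b, a.shape[1])
flat = a[mask]  # boolean indexing flattens in row-major order
print(flat)
```

Passing a.shape[1] as maxlen keeps the mask the same shape as a, which boolean indexing requires.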
I work with large data sets in my research.
I need to duplicate an element in a Numpy array. The code below achieves this, but is there a function in Numpy that performs the operation in a more efficient manner?
"""
Example output
>>> (executing file "example.py")
Choose a number between 1 and 10:
2
Choose number of repetitions:
9
Your output array is:
[1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>
"""
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = int(input('Choose the number you want to repeat (1-10):\n'))
repetitions = int(input('Choose number of repetitions:\n'))
output = []
for i in range(len(x)):
    if x[i] != y:
        output.append(x[i])
    else:
        for j in range(repetitions):
            output.append(x[i])
print('Your output array is:\n', output)
One approach would be to find the index of the element to be repeated with np.searchsorted (this assumes x is sorted and contains y). Use that index to slice the left and right sides of the array and insert the repeated array in between.
Thus, one solution would be -
idx = np.searchsorted(x,y)
out = np.concatenate(( x[:idx], np.repeat(y, repetitions), x[idx+1:] ))
Let's consider a bit more generic sample case with x as -
x = [2, 4, 5, 6, 7, 8, 9, 10]
Let the number to be repeated be y = 5, with repetitions = 7.
Now, using the proposed code -
In [57]: idx = np.searchsorted(x,y)
In [58]: idx
Out[58]: 2
In [59]: np.concatenate(( x[:idx], np.repeat(y, repetitions), x[idx+1:] ))
Out[59]: array([ 2, 4, 5, 5, 5, 5, 5, 5, 5, 6, 7, 8, 9, 10])
For the specific case of x always being [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], we would have a more compact/elegant solution, like so -
np.r_[x[:y-1], [y]*repetitions, x[y:]]
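Since np.searchsorted returns an insertion position even when y is absent, a guarded version may be safer. The sketch below uses a hypothetical helper name, repeat_value_sorted, and assumes x is sorted in ascending order; returning an unchanged copy when y is missing is a design choice made here, not something the answer specifies.

```python
import numpy as np

def repeat_value_sorted(x, y, repetitions):
    """Repeat the first occurrence of y in sorted x; return a copy if y is absent."""
    x = np.asarray(x)
    idx = np.searchsorted(x, y)
    # searchsorted gives an insertion point, so confirm y is really there.
    if idx == len(x) or x[idx] != y:
        return x.copy()
    return np.concatenate((x[:idx], np.repeat(y, repetitions), x[idx + 1:]))

print(repeat_value_sorted([2, 4, 5, 6, 7, 8, 9, 10], 5, 7))
print(repeat_value_sorted([2, 4, 6], 5, 3))
```

The first call reproduces the sample run above; the second leaves the array untouched because 5 is not in it.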
There is the numpy.repeat function:
>>> np.repeat(3, 4)
array([3, 3, 3, 3])
>>> x = np.array([[1,2],[3,4]])
>>> np.repeat(x, 2)
array([1, 1, 2, 2, 3, 3, 4, 4])
>>> np.repeat(x, 3, axis=1)
array([[1, 1, 1, 2, 2, 2],
[3, 3, 3, 4, 4, 4]])
>>> np.repeat(x, [1, 2], axis=0)
array([[1, 2],
[3, 4],
[3, 4]])
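Applied to the question, np.repeat with a per-element count array can do the whole job in one call. This is a sketch using the example values from the question; the name counts is illustrative, and unlike the searchsorted approach it does not need x to be sorted.

```python
import numpy as np

x = np.arange(1, 11)   # [1, 2, ..., 10]
y = 2                  # value to repeat
repetitions = 9

# One count per element: `repetitions` where x equals y, otherwise 1.
counts = np.where(x == y, repetitions, 1)
out = np.repeat(x, counts)
print(out)
```

If y occurs more than once in x, every occurrence is repeated, which may or may not be what is wanted.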