Numpy operation to expand array into sequential slices of given length? - python

my_function must expand a 1D NumPy array into a 2D NumPy array whose rows are the consecutive slices of a given length, starting at each index from the first until the last position where a full slice still fits. Example:
import numpy as np
a = np.arange(10)
print (my_function(a, length=3))
Expected output
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
I can achieve this using a for loop, but I was wondering if there is a numpy vectorization technique for this.
def my_function(a, length):
    b = np.zeros((len(a)-(length-1), length))
    for i in range(len(b)):
        b[i] = a[i:i+length]
    return b

If you're careful with the math and heed the warning in the docs, you can use np.lib.stride_tricks.as_strided(). You need to calculate the correct dimensions for your array so you don't read past the end of its memory. Also note that as_strided() shares memory, so you will have multiple references to the same memory in the final output. (You can, of course, copy this to a new array.)
>>> import numpy as np
>>> def my_function(a, length):
...     stride = a.strides[0]
...     l = len(a) - length + 1
...     return np.lib.stride_tricks.as_strided(a, (l, length), (stride, stride))
...
>>> np.array(my_function(np.arange(10), 3))
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])
>>> np.array(my_function(np.arange(15), 7))
array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 1,  2,  3,  4,  5,  6,  7],
       [ 2,  3,  4,  5,  6,  7,  8],
       [ 3,  4,  5,  6,  7,  8,  9],
       [ 4,  5,  6,  7,  8,  9, 10],
       [ 5,  6,  7,  8,  9, 10, 11],
       [ 6,  7,  8,  9, 10, 11, 12],
       [ 7,  8,  9, 10, 11, 12, 13],
       [ 8,  9, 10, 11, 12, 13, 14]])
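On NumPy 1.20 or later, the same windows are available without computing strides by hand, via np.lib.stride_tricks.sliding_window_view. The following is a minimal sketch that assumes such a NumPy version is installed; the variable names are only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(10)

# Read-only view of shape (len(a) - 3 + 1, 3); no data is copied.
windows = sliding_window_view(a, 3)
print(windows)

# Copy if you need an independent, writable array.
windows_copy = windows.copy()
```

Because the view is read-only by default, accidental writes into the shared memory raise an error instead of silently changing several rows at once.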

How about this function?
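import numpy as np

def my_function(a, length):
    n = len(a) - length + 1  # number of complete windows
    # Row i holds a shifted by i positions; transposing puts one window per row.
    result = []
    for i in range(length):
        result.append(a[i:i + n])
    return np.vstack(result).T

a = np.arange(10)
length = 3
my_function(a, length)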
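Another loop-free option is fancy indexing with a broadcast grid of indices. This is a sketch rather than a drop-in replacement: the helper name window_index is made up for illustration, and unlike a strided view it returns a copy.

```python
import numpy as np

def window_index(a, length):
    """Return all windows of the given length from 1D array `a` as rows (a copy)."""
    n = len(a) - length + 1            # number of complete windows
    starts = np.arange(n)[:, None]     # shape (n, 1): start index of each window
    offsets = np.arange(length)        # shape (length,): offset inside a window
    return a[starts + offsets]         # indices broadcast to an (n, length) grid

a = np.array([10, 4, 6, 8, 3])
print(window_index(a, 3))
```

Since it indexes into the array rather than adding to its values, this works for arbitrary contents, not only for np.arange output.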

Related

Python - how to resize an array and duplicate the elements

I've got an array with data like this
a = [[1,2,3],[4,5,6],[7,8,9]]
and I want to change it to
b = [[1,1,2,2,3,3],[1,1,2,2,3,3],[4,4,5,5,6,6],[4,4,5,5,6,6],[7,7,8,8,9,9],[7,7,8,8,9,9]]
I've tried to use numpy.resize() function but after resizing, it gives [[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]]. I can use a for loop to put the numbers at the indexes I need but just wondering if there is any easier way of doing that?
My initial thought was that np.tile would work, but what you are actually looking for is np.repeat applied twice, once along each axis.
For example:
#!/usr/bin/env python
import numpy as np
a = [[1,2,3],[4,5,6],[7,8,9]]
b = np.repeat(np.repeat(a, 2, axis=1), 2, axis=0)
b
You can think of your problem as resizing each 1x1 block to a 2x2 block. This can simply be done using numpy.kron(a, b), which operates on each element of a – each 1x1 block – and "expands" it according to b – which should thus be a 2x2 block.
>>> import numpy as np
>>> a = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> np.kron(a, [[1, 1], [1, 1]])
array([[1, 1, 2, 2, 3, 3],
[1, 1, 2, 2, 3, 3],
[4, 4, 5, 5, 6, 6],
[4, 4, 5, 5, 6, 6],
[7, 7, 8, 8, 9, 9],
[7, 7, 8, 8, 9, 9]])
An efficient way to create the second operand for larger structures is using np.ones and related functions.
>>> np.kron(a, np.ones((2,4), dtype=int))
array([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
[4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6],
[4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6],
[7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9],
[7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9]])
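To see how the two answers relate, here is a small sketch that wraps the double np.repeat in a helper and compares it with np.kron for a non-square block. The helper name upscale and its parameter names are hypothetical, chosen only for this example.

```python
import numpy as np

def upscale(a, row_factor, col_factor):
    """Repeat every element into a row_factor x col_factor block."""
    a = np.asarray(a)
    return np.repeat(np.repeat(a, row_factor, axis=0), col_factor, axis=1)

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
via_repeat = upscale(a, 2, 4)
via_kron = np.kron(a, np.ones((2, 4), dtype=int))

# Both approaches should produce the same 6 x 12 array.
print(np.array_equal(via_repeat, via_kron))
```

np.repeat only copies values, while np.kron multiplies each element by the block, so np.repeat also works for dtypes where multiplication by 1 is not meaningful.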

Is there a better method to create such a numpy array?

I want a numpy array like this:
b = np.array([[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7],
[8, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 9, 9]])
Is there a faster way to create a NumPy array like this instead of typing them manually?
You can do something like this:
>>> np.repeat(np.arange(1, 10).reshape(-1,1), 6, axis=1)
array([[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7],
[8, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 9, 9]])
Explanation:
np.arange(1, 10).reshape(-1,1) creates an array
array([[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9]])
np.repeat(_, 6, axis=1) repeats each element 6 times along axis 1, that is, the second axis (the columns).
Yes. There are plenty of methods. This is one:
np.repeat(np.arange(1,10),6,axis=0).reshape(9,6)
Another method is to use broadcasting:
>>> np.arange(1,10)[:,None] * np.ones(6, dtype=int)
array([[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7],
[8, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 9, 9]])
For any w*l size, convert a list of lists into an np.array like so:
w = 6
l = 9
np.array([[1+i]*w for i in range(l)])
array([[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7],
[8, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 9, 9]])
np.transpose(np.array(([np.arange(1,10)] * 6)))
np.arange(1,10) creates a NumPy array holding the values 1 to 9.
[] puts the array into a list.
*6 repeats that list entry 6 times.
np.array() converts the resulting structure (a list of arrays) into a 6x9 NumPy array.
np.transpose() swaps the axes to give the vertical 9x6 layout.
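If the array only needs to be read, or if a copy is wanted without any arithmetic, np.broadcast_to and np.tile are two further options. This is a minimal sketch; the variable names are made up for illustration.

```python
import numpy as np

column = np.arange(1, 10)[:, None]      # shape (9, 1)

# Read-only view: all 6 columns alias the same 9 values, nothing is copied.
view = np.broadcast_to(column, (9, 6))

# Independent, writable copy with the same contents.
tiled = np.tile(column, (1, 6))

print(view.shape, tiled.shape)
```

The view costs no extra memory but cannot be written to; use the tiled copy (or view.copy()) when the result has to be modified.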

Numpy: replace each element in a row by the maximum of other elements in the same row

Let's say we have a 2-D array like this:
>>> a
array([[1, 1, 2],
[0, 2, 2],
[2, 2, 0],
[0, 2, 0]])
For each line I want to replace each element by the maximum of the 2 others in the same line.
I've found how to do it for each column separately, using numpy.amax and an identity array, like this:
>>> np.amax(a*(1-np.eye(3)[0]), axis=1)
array([ 2., 2., 2., 2.])
>>> np.amax(a*(1-np.eye(3)[1]), axis=1)
array([ 2., 2., 2., 0.])
>>> np.amax(a*(1-np.eye(3)[2]), axis=1)
array([ 1., 2., 2., 2.])
But I would like to know if there is a way to avoid a for loop and get directly the result which in this case should look like this:
>>> numpy_magic(a)
array([[2, 2, 1],
[2, 2, 2],
[2, 2, 2],
[2, 0, 2]])
Edit: after a few hours playing in the console, I've finally come up with the solution I was looking for. Be ready for some mind blowing one line code:
np.amax(a[[range(a.shape[0])]*a.shape[1],:][(np.eye(a.shape[1]) == 0)[:,[range(a.shape[1])*a.shape[0]]].reshape(a.shape[1],a.shape[0],a.shape[1])].reshape((a.shape[1],a.shape[0],a.shape[1]-1)),axis=2).transpose()
array([[2, 2, 1],
[2, 2, 2],
[2, 2, 2],
[2, 0, 2]])
Edit2: Paul has suggested a much more readable and faster alternative which is:
np.max(a[:, np.where(~np.identity(a.shape[1], dtype=bool))[1].reshape(a.shape[1], -1)], axis=-1)
After timing these 3 alternatives, both of Paul's solutions are 4 times faster in every context I tested (2, 3 and 4 columns with 200 rows). Congratulations on these amazing pieces of code!
Last Edit (sorry): after replacing np.identity with np.eye which is faster, we now have the fastest and most concise solution:
np.max(a[:, np.where(~np.eye(a.shape[1], dtype=bool))[1].reshape(a.shape[1], -1)], axis=-1)
Here are two solutions, one that is specifically designed for max and a more general one that works for other operations as well.
Within each row, the maximum of the other elements equals the maximum of the entire row at every position except possibly the position of the row maximum itself. Using this fact, we can use argpartition to cheaply find the indices of the two largest elements. Then we put the value of the second largest in the position of the largest, and the largest value everywhere else. This also works for more than 3 columns.
>>> a
array([[6, 0, 8, 8, 0, 4, 4, 5],
[3, 1, 5, 0, 9, 0, 3, 6],
[1, 6, 8, 3, 4, 7, 3, 7],
[2, 1, 6, 2, 9, 1, 8, 9],
[7, 3, 9, 5, 3, 7, 4, 3],
[3, 4, 3, 5, 8, 2, 2, 4],
[4, 1, 7, 9, 2, 5, 9, 6],
[5, 6, 8, 5, 5, 3, 3, 3]])
>>>
>>> M, N = a.shape
>>> result = np.empty_like(a)
>>> largest_two = np.argpartition(a, N-2, axis=-1)
>>> rng = np.arange(M)
>>> result[...] = a[rng, largest_two[:, -1], None]
>>> result[rng, largest_two[:, -1]] = a[rng, largest_two[:, -2]]
>>> result
array([[8, 8, 8, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 6, 9, 9, 9],
[8, 8, 7, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 9, 9, 9, 9],
[9, 9, 7, 9, 9, 9, 9, 9],
[8, 8, 8, 8, 5, 8, 8, 8],
[9, 9, 9, 9, 9, 9, 9, 9],
[8, 8, 6, 8, 8, 8, 8, 8]])
This solution depends on specific properties of max.
A more general solution, which also works for sum instead of max, for example, goes as follows. Glue two copies of a together (side by side, not on top of each other), so that each row looks like a0 a1 a2 a3 a0 a1 a2 a3. For an index x we can then get all elements but ax by slicing [x+1:x+4] (in general, [x+1:x+N] for N columns). To do this vectorized we use stride_tricks:
>>> a
array([[2, 6, 0],
[5, 0, 0],
[5, 0, 9],
[6, 4, 4],
[5, 0, 8],
[1, 7, 5],
[9, 7, 7],
[4, 4, 3]])
>>> M, N = a.shape
>>> aa = np.c_[a, a]
>>> ast = np.lib.stride_tricks.as_strided(aa, (M, N+1, N-1), aa.strides + aa.strides[1:])
>>> result = np.max(ast[:, 1:, :], axis=-1)
>>> result
array([[6, 2, 6],
[0, 5, 5],
[9, 9, 5],
[4, 6, 6],
[8, 8, 5],
[7, 5, 7],
[7, 9, 9],
[4, 4, 4]])
# use sum instead of max
>>> result = np.sum(ast[:, 1:, :], axis=-1)
>>> result
array([[ 6, 2, 8],
[ 0, 5, 5],
[ 9, 14, 5],
[ 8, 10, 10],
[ 8, 13, 5],
[12, 6, 8],
[14, 16, 16],
[ 7, 7, 8]])
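The doubled-row idea can also be written with sliding_window_view (NumPy 1.20 or later), which avoids hand-computed strides. This is a sketch under that version assumption, not the answer's own code: the helper name reduce_others is hypothetical, and it assumes at least 2 columns.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def reduce_others(a, func=np.max):
    """Apply func to each row with one element left out at a time."""
    a = np.asarray(a)
    M, N = a.shape
    aa = np.concatenate([a, a], axis=1)           # each row: a0..a(N-1) a0..a(N-1)
    # All windows of N-1 consecutive elements along each row: shape (M, N+2, N-1).
    windows = sliding_window_view(aa, N - 1, axis=1)
    # Window j+1 covers aa[j+1 : j+N], i.e. every element of the row except a[j].
    return func(windows[:, 1:N + 1], axis=-1)

a = np.array([[2, 6, 0],
              [5, 0, 0]])
print(reduce_others(a, np.max))
print(reduce_others(a, np.sum))
```

Any reduction that accepts an axis argument (np.min, np.prod, np.mean and so on) can be passed as func.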
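List comprehension solution. It zeroes out the excluded column before taking the maximum, so it assumes the entries are non-negative.
np.array([np.amax(a * (1 - np.eye(a.shape[1])[j]), axis=1) for j in range(a.shape[1])]).T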
Similar to @Ethan's answer but with np.delete(), np.max(), and np.dstack():
np.dstack([np.max(np.delete(a, i, 1), axis=1) for i in range(a.shape[1])])
array([[[2, 2, 1],
        [2, 2, 2],
        [2, 2, 2],
        [2, 0, 2]]])
delete() "filters" out each column in turn;
max() finds the row-wise maximum of the remaining two columns;
dstack() stacks the resulting 1D arrays along a third axis, so the result has shape (1, 4, 3).
If you have more than 3 columns, note that this will find the maximum of "all other" columns rather than the "2-greatest" columns per row. For example:
a2 = np.arange(25).reshape(5,5)
np.dstack([np.max(np.delete(a2, i, 1), axis=1) for i in range(a2.shape[1])])
array([[[ 4, 4, 4, 4, 3],
[ 9, 9, 9, 9, 8],
[14, 14, 14, 14, 13],
[19, 19, 19, 19, 18],
[24, 24, 24, 24, 23]]])

Stack slices of numpy array from given indices

I'm struggling to perform the below operation on a numpy vector.
I want to take previous_n samples from vector finishing at indices.
It's like I want to perform a np.take with slicing of the previous_n samples.
Example:
import numpy as np
vector = np.array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
# number of previous samples
previous_n = 3
indices = np.array([ 5, 7, 12])
result
array([[ 3, 4, 5],
[ 5, 6, 7],
[10, 11, 12]])
Ok, this seems to do what I want:
def stack_slices(arr, previous_n, indices):
    all_idx = indices[:, None] + np.arange(previous_n) - (previous_n - 1)
    return arr[all_idx]
>>> stack_slices(vector, 3, indices)
array([[ 3, 4, 5],
[ 5, 6, 7],
[10, 11, 12]])
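One caveat with plain index arithmetic is that an index smaller than previous_n - 1 produces negative positions, which NumPy silently wraps around to the end of the array. The sketch below adds an explicit bounds check and uses sliding_window_view (NumPy 1.20 or later) for the lookup; the name stack_slices_checked is hypothetical, and indices is assumed to be non-empty.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stack_slices_checked(arr, previous_n, indices):
    """Rows are arr[i - previous_n + 1 : i + 1] for each i in indices."""
    indices = np.asarray(indices)
    if indices.min() < previous_n - 1 or indices.max() >= len(arr):
        raise IndexError("an index does not leave room for a full window")
    # Window j of the view starts at position j, so the window that ends
    # at index i starts at i - previous_n + 1.
    windows = sliding_window_view(arr, previous_n)
    return windows[indices - previous_n + 1]

vector = np.arange(15)
print(stack_slices_checked(vector, 3, np.array([5, 7, 12])))
```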

Better way to get sublist in python

I am working on the following problem:
This function returns a list of all possible sublists in L of length n without skipping elements in L. The sublists in the returned list should be ordered in the way they appear in L, with those sublists starting from a smaller index being at the front of the list.
Example 1, if L = [10, 4, 6, 8, 3, 4, 5, 7, 7, 2] and n = 4 then your function should return the list [[10, 4, 6, 8], [4, 6, 8, 3], [6, 8, 3, 4], [8, 3, 4, 5], [3, 4, 5, 7], [4, 5, 7, 7], [5, 7, 7, 2]]
My solution works but how can I make it shorter? What is a better way to do this?
def getSublists(L, n):
    newN = n
    myList = []
    for i in range(len(L)):
        orginalLen = L[i:n]
        if(len(orginalLen) == n):
            myList.append(L[i:n])
            n = n + 1
        else:
            myList.append(L[i:n])
            n = n + 1
    if(newN == 1):
        print(myList)
    else:
        print(myList[:len(myList)-(n-1)])
getSublists([10, 4, 6, 8, 3, 4, 5, 7, 7, 2],4)
getSublists([1], 1)
getSublists([0, 0, 0, 0, 0], 2)
OUTPUT
[[10, 4, 6, 8], [4, 6, 8, 3], [6, 8, 3, 4], [8, 3, 4, 5], [3, 4, 5, 7], [4, 5, 7, 7], [5, 7, 7, 2]]
[[1]]
[[0, 0], [0, 0], [0, 0], [0, 0]]
l = [1,2,3,4,5,6,87,9]
n = ..
print([l[i:i+n] for i in range(len(l)-n+1)])
Maybe this is what you need.
In one line:
get_sublists = lambda ls, n: [ls[x:x+n] for x in range(len(ls)-n+1)]
get_sublists([10, 4, 6, 8, 3, 4, 5, 7, 7, 2], 4)
[[10, 4, 6, 8], [4, 6, 8, 3], [6, 8, 3, 4], [8, 3, 4, 5], [3, 4, 5, 7], [4, 5, 7, 7], [5, 7, 7, 2]]
def get_sublists(L, n):
    # The last valid start index is len(L) - n, so range() must stop at len(L) - n + 1.
    return [L[i:i+n] for i in range(len(L)-n+1)]
I expanded the program a little so that it is easier for the reader to understand.
def getSublists(L, n):
    new_list = []
    for i in range(len(L)-n+1):
        a = L[i:i+n]
        new_list.append(a)
    return new_list
answer:
[[10, 4, 6, 8],
[4, 6, 8, 3],
[6, 8, 3, 4],
[8, 3, 4, 5],
[3, 4, 5, 7],
[4, 5, 7, 7],
[5, 7, 7, 2]]
This is pretty readable, I think, and shows the concept. The idea is to iterate over the start indices from 0 up to and including len(L) - 4, and to take the sublist of L from the current index i to i+4. Stopping at len(L) - 4 ensures every sublist has the full 4 elements, since a slice that runs past the end of the list would simply come back shorter.
>>> for i in range(len(L)-4+1):
...     print(L[i:i+4])
...
[10, 4, 6, 8]
[4, 6, 8, 3]
[6, 8, 3, 4]
[8, 3, 4, 5]
[3, 4, 5, 7]
[4, 5, 7, 7]
[5, 7, 7, 2]
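When the sublists are consumed one at a time, a generator avoids building the whole result list up front. This is a small sketch of that variant; the name iter_sublists is made up for this example.

```python
def iter_sublists(seq, n):
    """Yield every contiguous sublist of length n, in order of start index."""
    if n <= 0:
        raise ValueError("n must be positive")
    # range() is empty when n > len(seq), so nothing is yielded in that case.
    for start in range(len(seq) - n + 1):
        yield seq[start:start + n]

print(list(iter_sublists([10, 4, 6, 8, 3, 4, 5, 7, 7, 2], 4)))
```

Wrapping the call in list() gives the same result as the list comprehension versions above.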
