I want a concise way to sample n consecutive elements with stride m from a NumPy array. The simplest case is sampling 1 element with stride 2, i.e. taking every other element, which can be done like this:
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[::2]
array([0, 2, 4, 6, 8])
However, what if I wanted to slice n consecutive elements with a stride of m where n and m can be any integers? For example, if I wanted to slice 2 consecutive elements with a stride of 3 I would get something like this:
array([0, 1, 3, 4, 6, 7, 9])
Is there a pythonic and concise way of doing this? Thank you!
If the length of a is a multiple of the stride, you can reshape, slice, and ravel:
a.reshape(-1,3)[:,:2].ravel()
But for a stride of 3, a has to have a shape like (9,) or (12,), not (10,). And the result will still be a copy.
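When len(a) is not a multiple of the stride, one alternative to reshaping is a boolean mask on each position modulo m. This is only a minimal sketch for 1-D input; the helper name take_n_every_m is hypothetical and not part of NumPy:

```python
import numpy as np

def take_n_every_m(arr, n, m):
    # Keep positions whose offset inside each block of m is below n
    keep = np.arange(arr.shape[0]) % m < n
    return arr[keep]  # boolean indexing returns a copy, not a view

a = np.arange(10)
result = take_n_every_m(a, n=2, m=3)
print(result)  # [0 1 3 4 6 7 9]
```

Like the reshape version, this produces a copy, but it works for any length and keeps the trailing partial block.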
The suggested:
np.lib.stride_tricks.as_strided(a, (4,2), (8*3, 8)).ravel()[:-1]
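is also a copy. The as_strided part is a view, but ravel will make a copy. The hard-coded 8 assumes 8-byte elements (a.strides[0] avoids that assumption), and there is the ugliness of that extra element, which is read from past the end of a's buffer and has to be trimmed off.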
sliding_window_view was added as a safer version:
In [81]: np.lib.stride_tricks.sliding_window_view(a,(3))
Out[81]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
In [82]: np.lib.stride_tricks.sliding_window_view(a,(3))[::3,:2]
Out[82]:
array([[0, 1],
[3, 4],
[6, 7]])
Again ravel will make a copy. This omits the "extra" 9.
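The view-versus-copy distinction can be checked with np.shares_memory. A small sketch, assuming NumPy 1.20 or later (where sliding_window_view is available); the names n, m, windows and flat are just illustrative:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(10)
n, m = 2, 3

# Windows of length m, every m-th window, first n entries of each
windows = sliding_window_view(a, m)[::m, :n]  # still a (read-only) view of a
flat = windows.ravel()                        # non-contiguous, so ravel copies

print(np.shares_memory(a, windows))  # True
print(np.shares_memory(a, flat))     # False
print(flat)                          # [0 1 3 4 6 7]
```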
np.resize does a reshape with padding (repeating a as needed):
In [83]: np.resize(a, (4,3))
Out[83]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8],
[9, 0, 1]])
In [84]: np.resize(a, (4,3))[:,:2]
Out[84]:
array([[0, 1],
[3, 4],
[6, 7],
[9, 0]])
This code might be useful, I tested it on the example in the question (n=2, m=3)
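import numpy as np

def get_slice(arr, n, m):
    # Start empty with arr's dtype so integers are not promoted to float
    b = np.array([], dtype=arr.dtype)
    # Take n consecutive elements starting at every m-th position
    for i in range(0, len(arr), m):
        b = np.concatenate((b, arr[i:i + n]))
    return b

sliced_arr = get_slice(np.arange(10), n=2, m=3)
print(sliced_arr)
Output
[0 1 3 4 6 7 9]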
So I've created a numpy array:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
I'm trying to delete the end element of this array's subarray:
a[0] = (a[0])[:-1]
And encounter this issue:
a[0] = (a[0])[:-1]
ValueError: could not broadcast input array from shape (2) into shape (3)
Why can't I change it?
How do I do it?
Given:
>>> a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
You can do:
>>> a[:,0:2]
array([[1, 2],
[4, 5],
[7, 8]])
Or:
>>> np.delete(a,2,1)
array([[1, 2],
[4, 5],
[7, 8]])
Then in either case, assign that back to a since the result is a new array.
So:
>>> a=a[:,0:2]
>>> a
array([[1, 2],
[4, 5],
[7, 8]])
If you wanted only to delete 3 in the first row, that is a different problem. You can only do that if you have an array of Python lists, since the sublists are not the same length. Recent NumPy versions require dtype=object to build such an array.
Example:
>>> a = np.array([[1,2],[4,5,6],[7,8,9]], dtype=object)
>>> a
array([list([1, 2]), list([4, 5, 6]), list([7, 8, 9])], dtype=object)
If you do that, just stick to Python. You will have lost all the speed and other advantages of Numpy.
If by 'universal' you mean the last element of each row of a N x M array, just use .shape to find the dimensions:
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> a.shape
(3, 4)
>>> np.delete(a,a.shape[1]-1,1)
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]])
Or,
>>> a[:,0:a.shape[1]-1]
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]])
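Both forms can also be written with a negative index, which avoids looking up .shape. The sketch below also checks which result shares memory with a; the names trimmed_view and trimmed_copy are illustrative only:

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)

trimmed_view = a[:, :-1]                 # basic slicing: a view, no data copied
trimmed_copy = np.delete(a, -1, axis=1)  # np.delete returns a new array

print(np.array_equal(trimmed_view, trimmed_copy))  # True
print(np.shares_memory(a, trimmed_view))           # True
print(np.shares_memory(a, trimmed_copy))           # False
```

Because the slice is a view, writing into trimmed_view would also change a; take a .copy() if that is not wanted.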
>>> a = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> type(a)
<class 'numpy.ndarray'>
>>> a.shape
(3, 3)
The variable a is a matrix (2D array) with a fixed number of rows and columns, and all rows of a matrix must have the same length. In the example above, the matrix could not exist if the first row had length 2 and the others length 3. So deleting the last element of only the first (or any other subset of) sub-array is not possible.
Instead you have to delete the last element of all the sub-arrays at the same time.
That can be done as
>>> a[:,0:2]
array([[1, 2],
[4, 5],
[7, 8]])
Or,
>>> np.delete(a,2,1)
array([[1, 2],
[4, 5],
[7, 8]])
This also applies to elements at other positions: any element can be deleted from the sub-arrays, as long as all the sub-arrays keep the same length.
However, you can modify the last element (or any other) of any sub-array, as long as the shape remains constant.
>>> a[0][-1] = 19
>>> a
array([[ 1, 2, 19],
[ 4, 5, 6],
[ 7, 8, 9]])
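The difference between modifying values and changing the shape can be seen directly. In this small sketch the flag name failed is only illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

a[0] = [10, 20, 30]        # same length as the row: allowed
try:
    a[0] = a[0][:-1]       # shape (2,) cannot fill a row of shape (3,)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

print(a.shape)  # (3, 3)
```

The assignment writes into the existing row rather than replacing it, which is why the right-hand side has to broadcast to the row's shape.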
If you try to form a matrix with rows of unequal length, you get a 1D array of lists, on which NumPy operations such as vectorized arithmetic and 2D slicing do not work (the list operations do). Recent NumPy versions require dtype=object to create such an array:
>>> b = np.array([[1,2,3],[1,2,3]])
>>> c = np.array([[1,2],[1,2,3]], dtype=object)
>>> b
array([[1, 2, 3],
[1, 2, 3]])
>>> b.shape
(2, 3)
>>> c
array([list([1, 2]), list([1, 2, 3])], dtype=object)
>>> c.shape
(2,)
>>> print(type(b),type(c))
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
Both are ndarray, but you can see that the second variable c is a 1D array of lists.
>>> b+b
array([[2, 4, 6],
[2, 4, 6]])
>>> c+c
array([list([1, 2, 1, 2]), list([1, 2, 3, 1, 2, 3])], dtype=object)
Similarly, b+b performs element-wise addition of b with itself, but c+c concatenates each list with itself.
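If the rows genuinely need different lengths, one option is to skip the object array and keep a plain Python list of 1-D arrays, so each row still supports fast NumPy operations. A minimal sketch; the names rows and row_sums are illustrative:

```python
import numpy as np

rows = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]

rows[0] = rows[0][:-1]                   # shorten only the first row
row_sums = [int(r.sum()) for r in rows]  # per-row NumPy operations still work

print([r.tolist() for r in rows])  # [[1, 2], [4, 5, 6], [7, 8, 9]]
print(row_sums)                    # [3, 15, 24]
```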
For further reference:
How to make a multidimension numpy array with a varying row size?
Here is how to remove the last row (the last sub-array) of a:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
a = a[:-1]
print(a)
Output:
[[1 2 3]
[4 5 6]]
a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=int)
b = np.array([[8], [9]], dtype=int)
result wanted:
alist = [[0, 1, 2, 3, 8], [4, 5, 6, 7, 9]] # as np.array
I tried:
np.concatenate(alist,blist)
np.concatenate((alist,blist))
np.concatenate(alist, blist[0])
for a,b in zip(alist,blist): np.concatenate(a,b)
alist = [*map(np.concatenate, alist, blist)])
This got me various error messages I tried to fix by using the next trial. Nothing worked so far.
You are just missing the axis=1 keyword argument.
np.concatenate((a, b), axis=1)
Normally np.concatenate works on axis 0 (going down the array). But in this case you want to concatenate along axis 1 (going across the array). See the glossary for more information.
You can achieve this by using np.hstack, which concatenates the two arrays along the second axis (axis 1).
a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=int)
b = np.array([[8], [9]], dtype=int)
>>> np.hstack((a,b))
array([[0, 1, 2, 3, 8],
[4, 5, 6, 7, 9]])
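Checking the shapes makes it clear why the axis matters. A short sketch comparing the two calls; the names joined and axis0_failed are illustrative:

```python
import numpy as np

a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])
b = np.array([[8], [9]])
print(a.shape, b.shape)  # (2, 4) (2, 1)

# Along axis 1 only the row counts have to match
joined = np.concatenate((a, b), axis=1)
print(joined.shape)      # (2, 5)

# Along the default axis 0 the column counts would have to match
try:
    np.concatenate((a, b))
    axis0_failed = False
except ValueError:
    axis0_failed = True
print(axis0_failed)      # True
```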
I have two arrays, values and indexes
>>> values
array([[5, 4, 2, 4, 6],
[7, 9, 7, 3, 6]])
>>> indexes
array([[2, 4],
[0, 3],
[0, 1],
[1, 3]])
What I would like is a fast way (as my arrays are very large) to get, for each row of values, the sums of the elements selected by each row of indexes.
I.e., for the first row [5, 4, 2, 4, 6] I want to get
>>> values[0][indexes.flatten()].reshape(indexes.shape)
array([[2, 6],
[5, 4],
[5, 4],
[4, 4]])
>>> values[0][indexes.flatten()].reshape(indexes.shape).sum(axis=1)
array([8, 9, 9, 8])
Using this technique and looping over all rows of values is the fastest I could come up with. Is there a better way? Thank you in advance for your time.
Approach #1
Simply index into columns and sum along the last axis -
values[:,indexes].sum(axis=-1)
Sample run -
In [39]: values
Out[39]:
array([[5, 4, 2, 4, 6],
[7, 9, 7, 3, 6]])
In [40]: indexes
Out[40]:
array([[2, 4],
[0, 3],
[0, 1],
[1, 3]])
In [41]: values[:,indexes].sum(axis=-1)
Out[41]:
array([[ 8, 9, 9, 8],
[13, 10, 16, 12]])
Approach #2
If there are no duplicates in each row of indexes, we can simply use matrix-multiplication to get the sum-reductions and this would be much faster -
m,n = indexes.shape[0], values.shape[1]
mask = np.zeros((n,m),dtype=bool) # faster with float dtype
mask[indexes, np.arange(m)[:,None]] = 1
out = values.dot(mask)
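As a sanity check, both approaches can be run on the sample data and compared. This sketch uses an integer mask so the results compare exactly; the names direct and via_dot are illustrative:

```python
import numpy as np

values = np.array([[5, 4, 2, 4, 6],
                   [7, 9, 7, 3, 6]])
indexes = np.array([[2, 4], [0, 3], [0, 1], [1, 3]])

# Approach #1: gather, then sum along the last axis
direct = values[:, indexes].sum(axis=-1)

# Approach #2: mask[k, i] is 1 when column k belongs to index row i.
# A duplicate inside one index row would be counted only once here.
m, n = indexes.shape[0], values.shape[1]
mask = np.zeros((n, m), dtype=values.dtype)
mask[indexes, np.arange(m)[:, None]] = 1
via_dot = values.dot(mask)

print(via_dot)
print(np.array_equal(direct, via_dot))  # True
```

Which one is faster depends on the array sizes and dtype, so it is worth timing both on realistic data.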
I'm having a bit of a difficulty. I'm trying to vectorize some code in python in order to make it faster. I have an array which I sort (A) and get the index list (Ind). I have another array (B) which I would like to sort by the index list, without using loops which I think bottlenecks the computation.
A = array([[2, 1, 9],
[1, 1, 5],
[7, 4, 1]])
Ind = np.argsort(A)
This is the result of Ind:
Ind = array([[1, 0, 2],
[0, 1, 2],
[2, 1, 0]], dtype=int64)
B is the array I would like to sort by Ind:
B = array([[ 6, 3, 9],
[ 1, 5, 3],
[ 2, 7, 13]])
I would like to use Ind to rearrange my elements in B as such (B rows sorted by A rows indexes):
B = array([[ 3, 6, 9],
[ 1, 5, 3],
[13, 7, 2]])
Any Ideas? I would be glad to get any good suggestion. I want to mention I am using millions of values, I mean arrays of 30000*5000.
Cheers,
Robert
I would do something like this:
import numpy as np
from numpy import array
A = array([[2, 1, 9],
[1, 1, 5],
[7, 4, 1]])
Ind = np.argsort(A)
B = array([[ 6,  3,  9],
           [ 1,  5,  3],
           [ 2,  7, 13]])
# an array of the same shape as A and B with row numbers for each element
rownums = np.tile(np.arange(3), (3, 1)).T
# flat index of each wanted element: row * number_of_columns + column
new_B = np.take(B, rownums * 3 + Ind)
print(new_B)
# [[ 3  6  9]
#  [ 1  5  3]
#  [13  7  2]]
You can replace the magic number 3 with the array shape.
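An alternative that avoids the flat-index arithmetic is np.take_along_axis (available in NumPy 1.15 and later), or plain integer-array indexing with a column of row numbers. A minimal sketch using the arrays from the question; kind="stable" is my addition so that the tie in the second row of A is ordered deterministically:

```python
import numpy as np

A = np.array([[2, 1, 9],
              [1, 1, 5],
              [7, 4, 1]])
B = np.array([[6, 3, 9],
              [1, 5, 3],
              [2, 7, 13]])

Ind = np.argsort(A, axis=1, kind="stable")

# Reorder each row of B by the matching row of Ind
sorted_B = np.take_along_axis(B, Ind, axis=1)

# Equivalent fancy indexing: row numbers broadcast against Ind
same = B[np.arange(B.shape[0])[:, None], Ind]

print(sorted_B)
```

Neither form uses a Python loop, and neither needs the number of columns written out by hand.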
I want to extract the second column and the third to fifth columns of a NumPy array. How would I go about it?
A = array([[0, 1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 4, 5, 6]])
A[:, [1, 4:6]]
This obviously doesn't work.
Assuming I've understood you -- it's usually a good idea to explicitly specify the output you want, because it's not obvious -- you could use numpy.r_:
In [27]: A
Out[27]:
array([[0, 1, 2, 3, 4, 5, 6],
[4, 5, 6, 7, 4, 5, 6]])
In [28]: A[:, [1,3,4,5]]
Out[28]:
array([[1, 3, 4, 5],
[5, 7, 4, 5]])
In [29]: A[:, r_[1, 3:6]]
Out[29]:
array([[1, 3, 4, 5],
[5, 7, 4, 5]])
In [37]: A[1:, r_[1, 3:6]]
Out[37]: array([[5, 7, 4, 5]])
which you can then flatten or reshape as you like. r_ is basically a convenience function to generate the right indices, e.g.
In [30]: r_[1, 3:6]
Out[30]: array([1, 3, 4, 5])
Perhaps you are looking for this?
In [10]: A[1:, [1]+list(range(3,6))]
Out[10]: array([[5, 7, 4, 5]])
Note this gives you the second, fourth, fifth and sixth columns of all rows but the first. (In Python 3, range has to be wrapped in list() before it can be added to a list.)
The second column is A[:,1]. Columns 3-5 (I'm assuming you want inclusive) are A[:,2:5]. You won't be able to extract them with a single basic slice. To get them as an array, you could do
import numpy as np
A = np.array([[0, 1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 4, 5, 6]])
my_cols = np.hstack((A[:,1][...,np.newaxis], A[:,2:5]))
The np.newaxis stuff is just to make A[:,1] a 2D array, consistent with A[:,2:5].
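For comparison, the same columns can be picked with one integer-array index built by np.r_. A small sketch, assuming the inclusive reading of "3rd to fifth" (column indices 2, 3 and 4); the names cols, picked and stacked are illustrative:

```python
import numpy as np

A = np.array([[0, 1, 2, 3, 4, 5, 6],
              [4, 5, 6, 7, 4, 5, 6]])

cols = np.r_[1, 2:5]   # array([1, 2, 3, 4])
picked = A[:, cols]    # integer-array indexing returns a copy

# Same result as stacking the two slices
stacked = np.hstack((A[:, 1][..., np.newaxis], A[:, 2:5]))

print(picked)
print(np.array_equal(picked, stacked))  # True
```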
Hope this helps.