Build a list of array without ',' with sliding window - python

I have an array a as follow:
import numpy as np
a= np.array([[1, 3, 5, 7, 8, 7, 1],
[11, 13, 51, 17, 18, 17, 10]])
I want to build a list of that array with a sliding window. Here is the output that I want:
I have using the following code, however it does not provide the output which I want:
lag = 3
out = []
for i in range(2):
eachrow =[]
for col in range(a.shape[1]-lag):
X_row = []
xtmp = a[i, col:col+lag]
X_row.append(xtmp)
ytmp = a[i, col+lag]
X_row.append(ytmp)
eachrow.append(X_row)
out.append(eachrow)
Any help appreciated. Thanks.

You can use numpy.lib.stride_tricks.sliding_window_view and numpy.apply_along_axis like below:
numpy.lib.stride_tricks.sliding_window_view is New in version 1.20.0. you need update your numpy
a = np.array([[1, 3, 5, 7, 8, 7, 1],[11, 13, 51, 17, 18, 17, 10]])
b = np.lib.stride_tricks.sliding_window_view(a.ravel(),4)
def create_array(row):
return np.array([row[:3],np.array(row[-1])], dtype=object)
c = np.apply_along_axis(create_array, 1, b)
print(c)
Output:
[[array([1, 3, 5]) array(7)]
[array([3, 5, 7]) array(8)]
[array([5, 7, 8]) array(7)]
[array([7, 8, 7]) array(1)]
[array([8, 7, 1]) array(11)]
[array([ 7, 1, 11]) array(13)]
[array([ 1, 11, 13]) array(51)]
[array([11, 13, 51]) array(17)]
[array([13, 51, 17]) array(18)]
[array([51, 17, 18]) array(17)]
[array([17, 18, 17]) array(10)]]

Your code produces:
In [3]: out
Out[3]:
[[[array([1, 3, 5]), 7],
[array([3, 5, 7]), 8],
[array([5, 7, 8]), 7],
[array([7, 8, 7]), 1]],
[[array([11, 13, 51]), 17],
[array([13, 51, 17]), 18],
[array([51, 17, 18]), 17],
[array([17, 18, 17]), 10]]]
That's a list of length 2. Within that lists.
If we make an array from that - with object dtype, we get a 3d array, where some elements are arrays, and some are integers:
In [6]: arr = np.array(out, object)
In [7]: arr
Out[7]:
array([[[array([1, 3, 5]), 7],
[array([3, 5, 7]), 8],
[array([5, 7, 8]), 7],
[array([7, 8, 7]), 1]],
[[array([11, 13, 51]), 17],
[array([13, 51, 17]), 18],
[array([51, 17, 18]), 17],
[array([17, 18, 17]), 10]]], dtype=object)
In [8]: arr.shape
Out[8]: (2, 4, 2)
Change one line of your code to
X_row.append(np.array([ytmp]))
In [11]: np.array(out,object)
Out[11]:
array([[[array([1, 3, 5]), array([7])],
[array([3, 5, 7]), array([8])],
[array([5, 7, 8]), array([7])],
[array([7, 8, 7]), array([1])]],
[[array([11, 13, 51]), array([17])],
[array([13, 51, 17]), array([18])],
[array([51, 17, 18]), array([17])],
[array([17, 18, 17]), array([10])]]], dtype=object)
or displayed with the str/print array formatting:
In [12]: print(_)
[[[array([1, 3, 5]) array([7])]
[array([3, 5, 7]) array([8])]
[array([5, 7, 8]) array([7])]
[array([7, 8, 7]) array([1])]]
[[array([11, 13, 51]) array([17])]
[array([13, 51, 17]) array([18])]
[array([51, 17, 18]) array([17])]
[array([17, 18, 17]) array([10])]]]
We could reshape that to a (8,2) array (still object dtype):
In [14]: print(Out[11].reshape(-1,2))
[[array([1, 3, 5]) array([7])]
[array([3, 5, 7]) array([8])]
[array([5, 7, 8]) array([7])]
[array([7, 8, 7]) array([1])]
[array([11, 13, 51]) array([17])]
[array([13, 51, 17]) array([18])]
[array([51, 17, 18]) array([17])]
[array([17, 18, 17]) array([10])]]
Since the inner most arrays have a mixed size - some 3 some 1, the result can only be object dtype - or list of lists. That's isn't optimal for array calculations.
Commas are part of the display, along with [] and words like array. Together they give us clues as to the underlying objects, whether they are lists or arrays. Equally important are the shape and dtype (if the object is an array) or length if a list.
===
Another answer uses a striding_tricks function. Here's that method in more detail. While x is a view, slicing and reshaping will make copies, so it's hard to say whether this is any faster. For this small example I bet your code is faster, but for larger case it might not be.
In [16]: np.lib.stride_tricks.sliding_window_view(a,(1,4))
Out[16]:
array([[[[ 1, 3, 5, 7]],
[[ 3, 5, 7, 8]],
[[ 5, 7, 8, 7]],
[[ 7, 8, 7, 1]]],
[[[11, 13, 51, 17]],
[[13, 51, 17, 18]],
[[51, 17, 18, 17]],
[[17, 18, 17, 10]]]])
In [17]: x = np.lib.stride_tricks.sliding_window_view(a,(1,4))
In [18]: x.shape
Out[18]: (2, 4, 1, 4)
That's a 4d view of the original 1d array.
Your size 3 'arrays' can be sliced from that:
In [19]: x[:,:,0,:3]
Out[19]:
array([[[ 1, 3, 5],
[ 3, 5, 7],
[ 5, 7, 8],
[ 7, 8, 7]],
[[11, 13, 51],
[13, 51, 17],
[51, 17, 18],
[17, 18, 17]]])
In [20]: x[:,:,0,:3].reshape(-1,3)
Out[20]:
array([[ 1, 3, 5],
[ 3, 5, 7],
[ 5, 7, 8],
[ 7, 8, 7],
[11, 13, 51],
[13, 51, 17],
[51, 17, 18],
[17, 18, 17]])
and the 1 element column:
In [21]: x[:,:,0,-1].reshape(-1,1)
Out[21]:
array([[ 7],
[ 8],
[ 7],
[ 1],
[17],
[18],
[17],
[10]])
These 2 arrays may be more useful than your object out.
The arrays shown in [14] could be split into 2 similar arrays:
In [27]: np.stack(arr.reshape(-1,2)[:,0])
Out[27]:
array([[ 1, 3, 5],
[ 3, 5, 7],
[ 5, 7, 8],
[ 7, 8, 7],
[11, 13, 51],
[13, 51, 17],
[51, 17, 18],
[17, 18, 17]])
In [28]: arr.reshape(-1,2)[:,1].astype(int)
Out[28]: array([ 7, 8, 7, 1, 17, 18, 17, 10])

There are two issues I see:
The last element in the window isn't enclosed in an np.array.
Then, a quick fix would be to change this line:
X_row.append(ytmp)
to
X_row.append(np.array([ytmp]))
which produces the desired output.
The result is two dimensions, since you create a separate sublist for each row in the array, and then append that sublist to the result. To resolve, change:
out.append(eachrow)
to
out.extend(eachrow)

You can use numpy.lib.stride_tricks.sliding_window_view for a fast, vectorized solution:
x = np.lib.stride_tricks.sliding_window_view(a, (1,3))[:, :-1]
x.shape = (*x.shape[:2], *x.shape[3:])
y = a[:, -x.shape[1]:, None]
Output:
>>> x
array([[[ 1, 3, 5],
[ 3, 5, 7],
[ 5, 7, 8],
[ 7, 8, 7]],
[[11, 13, 51],
[13, 51, 17],
[51, 17, 18],
[17, 18, 17]]])
>>> y
array([[[ 7],
[ 8],
[ 7],
[ 1]],
[[17],
[18],
[17],
[10]]])
Now, just use zip + list:
out = [list(zip(x[i], y[i])) for i in range(len(y))]
Output:
>>> out
[[(array([1, 3, 5]), array([7])),
(array([3, 5, 7]), array([8])),
(array([5, 7, 8]), array([7])),
(array([7, 8, 7]), array([1]))],
[(array([11, 13, 51]), array([17])),
(array([13, 51, 17]), array([18])),
(array([51, 17, 18]), array([17])),
(array([17, 18, 17]), array([10]))]]

Related

Python numpy array of numpy arrays as object

I have some np arrays. I want to concatenate them as objects in an np array.
coords1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
coords2 = np.array([[13, 14, 15, 16], [17, 18, 19, 20]])
I want to obtain coordsAll
coordsAll = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
[[13, 14, 15, 16], [17, 18, 19, 20]]]
This is my code:
coords1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
coords2 = np.array([[13, 14, 15, 16], [17, 18, 19, 20]])
coordsAll = np.empty(np.array(np.array((0, 4), int)), object)
coordsAll = np.append (coordsAll, coords1, axis=0)
coordsAll = np.append(coordsAll, coords2, axis=0)
coordsAll is now
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19, 20]]
but i want two objects in my output array like
[[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], [[13, 14, 15, 16], [17, 18, 19, 20]]]
Many thanks.
Maybe something like that:
coordsAll = np.array([coords2, coords1], dtype=object)
print(coordsAll)
print(coordsAll.dtype)
In [457]: coordsAll
Out[457]:
array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20]], dtype=object)
Your repeated use of np.append joins a (0,4) and (3,4) to make a (3,4), and then adds a (2,4), resulting in a (5,4). Specifying object dtype doesn't change that behavior. You might as well do:
In [458]: np.concatenate((coords1, coords2), axis=0)
Out[458]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20]])
Where concatenate takes a whole list of arrays. Repeated calls in a loop is inefficient. Plus you have to make that weird (0,4) array to start with. Modeling array operations on list ones is not a good idea.
The safest way to make an object array of a desired size, is to initial it, and then fill:
In [459]: res = np.empty(2, object)
In [460]: res
Out[460]: array([None, None], dtype=object)
In [461]: res[0] = coords1
In [462]: res[1] = coords2
In [463]: res
Out[463]:
array([array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]]), array([[13, 14, 15, 16],
[17, 18, 19, 20]])], dtype=object)
np.array with object dtype also works in this case, but may fail with other combinations of shapes:
In [464]: np.array((coords1, coords2), object)
Out[464]:
array([array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]]), array([[13, 14, 15, 16],
[17, 18, 19, 20]])], dtype=object)
arrays of the same shape produces a 3d array:
In [465]: np.array((coords1, coords1), object)
Out[465]:
array([[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]],
[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]]], dtype=object)
and (4,3) with (4,2) produces an error:
In [466]: np.array((coords1.T, coords2.T), object)
Traceback (most recent call last):
File "<ipython-input-466-dff6d2a13fa4>", line 1, in <module>
np.array((coords1.T, coords2.T), object)
ValueError: could not broadcast input array from shape (4,3) into shape (4,)
Keep in mind that your desired array is not a "normal" ndarray. It's much closer to the simple list [coords1, coords2], with few of multidimensional advantages of Out[458].

How do we keep specific dimension unchanged while reshaping in numpy?

I have a huge (N*20) matrix where every 5 rows is a valid sample, ie. every (5*20) matrix. I'm trying to reshape it into a (N/5,1,20,5) matrix where the dimension 20 is kept unchanged. I could do it in tensroflow using keep_dim, but how can I achieve this in numpy?
Thanks in advance.
Reshape and then swap the axes around:
arr1 = arr.reshape(N/5,5,1,20)
arr2 = arr1.transpose(0,2,3,1)
for example
In [476]: arr = np.arange(24).reshape(6,4)
In [477]: arr
Out[477]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
In [478]: arr1 = arr.reshape(2,3,1,4)
In [479]: arr2 = arr1.transpose(0,2,3,1)
In [480]: arr2.shape
Out[480]: (2, 1, 4, 3)
In [482]: arr2
Out[482]:
array([[[[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]]],
[[[12, 16, 20],
[13, 17, 21],
[14, 18, 22],
[15, 19, 23]]]])

Different shapes for the 'same' parts of ndarray

I'm considering a numpy array:
import numpy as np
b = np.empty((10,11,12))
Now I would expect the following shapes to be the same, but they apparently are not:
>>> b[0,:,1].shape
>>> (11,)
and
>>> b[0][:][1].shape
>>> (12,)
Can somebody explain to me why the shapes are different? I read the Numpy documentation on indexing but there it says that writing a[k][l] is the same as a[k,l].
This happens because b[0][:] is a view of b[0], so that b[0][:][1] is really b[0, 1, :]. A numeric example may help to highlight what is going on:
In [5]: b = np.arange(3*4*5).reshape((3, 4, 5))
In [6]: b[0]
Out[6]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
In [7]: b[0, :]
Out[7]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
In [8]: b[0, :, 1]
Out[8]: array([ 1, 6, 11, 16])
In [10]: b[0][:]
Out[10]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
In [11]: b[0][:][1]
Out[11]: array([5, 6, 7, 8, 9])
In [13]: b[0, 1, :]
Out[13]: array([5, 6, 7, 8, 9])
In [32]: b[0][:, 1]
Out[32]: array([ 1, 6, 11, 16])

numpy reshaping multdimensional array through arbitrary axis

so this is a question regarding the use of reshape and how this functions uses each axis on a multidimensional scale.
Suppose I have the following array that contains matrices indexed by the first index. What I want to achieve is to instead index the columns of each matrix with the first index. In order to illustrate this problem, consider the following example where the given numpy array that indexes matrices with its first index is z.
x = np.arange(9).reshape((3, 3))
y = np.arange(9, 18).reshape((3, 3))
z = np.dstack((x, y)).T
Where z looks like:
array([[[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8]],
[[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]]])
And its shape is (2, 3, 3). Here, the first index are the two images and the three x three is a matrix.
The question more specifically phrased then, is how to use reshape to obtain the following desired output:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
Whose shape is (6, 3). This achieves that the dimension of the array indexes the columns of the matrix x and y as presented above. My natural inclination was to use reshape directly on z in the following way:
out = z.reshape(2 * 3, 3)
But its output is the following which indexes the rows of the matrices and not the columns:
array([[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8],
[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]]
Could reshape be used to obtain the desired output above? Or more general, can you control how each axis is used when you use the reshape function?
Two things:
I know how to solve the problem. I can go through each element of the big matrix (z) transposed and then apply reshape in the way above. This increases computation time a little bit and is not really problematic. But it does not generalize and it does not feel python. So I was wondering if there is a standard enlightened way of doing this.
I was not clear on how to phrase this question. If anyone has suggestion on how to better phrase this problem I am all ears.
Every array has a natural (1D flattened) order to its elements. When you reshape an array, it is as though it were flattened first (thus obtaining the natural order), and then reshaped:
In [54]: z.ravel()
Out[54]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [55]: z.ravel().reshape(2*3, 3)
Out[55]:
array([[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8],
[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]])
Notice that in the "natural order", 0 and 1 are far apart. However you reshape it, 0 and 1 will not be next to each other along the last axis, which is what you want in the desired array:
desired = np.array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
This requires some reordering, which in this case can be done by swapaxes:
In [53]: z.swapaxes(1,2).reshape(2*3, 3)
Out[53]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
because swapaxes(1,2) places the values in the desired order
In [56]: z.swapaxes(1,2).ravel()
Out[56]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17])
In [57]: desired.ravel()
Out[57]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17])
Note that the reshape method also has a order parameter which can be used to control the (C- or F-) order with which the elements are read from the array and placed in the reshaped array. However, I don't think this helps in your case.
Another way to think about the limits of reshape is to say that all reshapes followed by ravel are the same:
In [71]: z.reshape(3,3,2).ravel()
Out[71]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [72]: z.reshape(3,2,3).ravel()
Out[72]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [73]: z.reshape(3*2,3).ravel()
Out[73]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [74]: z.reshape(3*3,2).ravel()
Out[74]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
So if the ravel of the desired array is different, there is no way to obtain it only be reshaping.
The same goes for reshaping with order='F', provided you also ravel with order='F':
In [109]: z.reshape(2,3,3, order='F').ravel(order='F')
Out[109]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
In [110]: z.reshape(2*3*3, order='F').ravel(order='F')
Out[110]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
In [111]: z.reshape(2*3,3, order='F').ravel(order='F')
Out[111]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
It is possible to obtain the desired array using two reshapes:
In [83]: z.reshape(2, 3*3, order='F').reshape(2*3, 3)
Out[83]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
but I stumbled upon this serendipidously.
If I've totally misunderstood your question and x and y are the givens (not z) then you could obtain the desired array using row_stack instead of dstack:
In [88]: z = np.row_stack([x, y])
In [89]: z
Out[89]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
It you look at dstack code you'll discover that
np.dstack((x, y)).T
is effectively:
np.concatenate([i[:,:,None] for i in (x,y)],axis=2).transpose([2,1,0])
It reshapes each component array and then joins them along this new axis. Finally it transposes axes.
Your target is the same as (row stack)
np.concatenate((x,y),axis=0)
So with a bit of reverse engineering we can create it from z with
np.concatenate([i[...,0] for i in np.split(z.T,2,axis=2)],axis=0)
np.concatenate([i.T[:,:,0] for i in np.split(z,2,axis=0)],axis=0)
or
np.concatenate(np.split(z.T,2,axis=2),axis=0)[...,0]
or with a partial transpose we can keep the split-and-rejoin axis first, and just use concatenate:
np.concatenate(z.transpose(0,2,1),axis=0)
or its reshape equivalent
(z.transpose(0,2,1).reshape(-1,3))

How to slice and extend a 2D numpy array?

I have a numpy array of size nxm. I want the number of columns to be limited to k and rest of the columns to be extended in new rows. Following is the scenario -
Initial array: nxm
Final array: pxk
where p = (m/k)*n
Eg. n = 2, m = 6, k = 2
Initial array:
[[1, 2, 3, 4, 5, 6,],
[7, 8, 9, 10, 11, 12]]
Final array:
[[1, 2],
[7, 8],
[3, 4],
[9, 10],
[5, 6],
[11, 12]]
I tried using reshape but not getting the desired result.
Here's one way to do it
q=array([[1, 2, 3, 4, 5, 6,],
[7, 8, 9, 10, 11, 12]])
r=q.T.reshape(-1,2,2)
s=r.swapaxes(1,2)
t=s.reshape(-1,2)
as a one liner,
q.T.reshape(-1,2,2).swapaxes(1,2).reshape(-1,2)
array([[ 1, 2],
[ 7, 8],
[ 3, 4],
[ 9, 10],
[ 5, 6],
[11, 12]])
EDIT: for the general case, use
q=arange(1,1+n*m).reshape(n,m) #example input
r=q.T.reshape(-1,k,n)
s=r.swapaxes(1,2)
t=s.reshape(-1,k)
one liner is:
q.T.reshape(-1,k,n).swapaxes(1,2).reshape(-1,k)
example for n=3,m=12,k=4
q=array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
[25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]])
result is
array([[ 1, 2, 3, 4],
[13, 14, 15, 16],
[25, 26, 27, 28],
[ 5, 6, 7, 8],
[17, 18, 19, 20],
[29, 30, 31, 32],
[ 9, 10, 11, 12],
[21, 22, 23, 24],
[33, 34, 35, 36]])
Using numpy.vstack and numpy.hsplit:
a = np.array([[1, 2, 3, 4, 5, 6,],
[7, 8, 9, 10, 11, 12]])
n, m, k = 2, 6, 2
np.vstack(np.hsplit(a, m/k))
result array:
array([[ 1, 2],
[ 7, 8],
[ 3, 4],
[ 9, 10],
[ 5, 6],
[11, 12]])
UPDATE As flebool commented, above code is very slow, because hsplit returns a python list, and then vstack reconstructs the final array from a list of arrays.
Here's alternative solution that is much faster.
a.reshape(-1, m/k, k).transpose(1, 0, 2).reshape(-1, k)
or
a.reshape(-1, m/k, k).swapaxes(0, 1).reshape(-1, k)

Categories

Resources