How to properly stack numpy arrays? - python

I am having trouble understanding how data is being stacked in a numpy array and why I cannot match the last data that I added to an array with the last generated data. Here is a MWE:
import numpy as np
np.random.seed(1)
# build storage
container = []
# gen data
x = np.random.random((13, 1, 64, 768))
# add to container
container.append(x)
# gen data
x2 = np.random.random((13, 1, 64, 768))
# add to container
container.append(x2)
# convert to np array
container = np.asarray(container)
# reshape to [13, 2, 64, 768]
container = container.reshape(13, 2, 64, 768)
# check that the last generated data matches the last appended data
assert np.all(x2.flatten() == container[:, -1, :, :].flatten()), 'not a match'

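Instead of stacking manually by appending to lists and then reshaping, you could use numpy's concatenate function. Note that vstack joins along the first axis, which would give a (26, 1, 64, 768) array; to get [13, 2, 64, 768], join along axis 1:
import numpy as np
# gen data
x1 = np.random.random((13, 1, 64, 768))
x2 = np.random.random((13, 1, 64, 768))
# join along axis 1 -> shape (13, 2, 64, 768)
container = np.concatenate((x1, x2), axis=1)
# compare element-wise: the last entry along axis 1 must equal x2
assert np.array_equal(x2[:, 0], container[:, -1]), 'not a match'
To answer your question: your original code does not work because reshape keeps the values in memory order; it does not move the list axis from position 0 to position 1. Also note that np.all(a) == np.all(b) only compares two booleans, so it can pass even when the data differ. It's always a good idea to make your input much smaller (say (2,1,2,2)) so you can see what actually happens.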
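A scaled-down sketch, using the (2,1,2,2) size suggested above, of how the join axis changes the result. The names a and b are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((2, 1, 2, 2))
b = rng.random((2, 1, 2, 2))

# vstack joins along axis 0: the two blocks are placed one after the other
v = np.vstack((a, b))
print(v.shape)  # (4, 1, 2, 2)

# concatenate along axis 1 gives the (13, 2, ...)-style layout from the question
c = np.concatenate((a, b), axis=1)
print(c.shape)  # (2, 2, 2, 2)

# element-wise comparison: the last entry along axis 1 is exactly b
print(np.array_equal(c[:, -1], b[:, 0]))  # True
```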

In [152]: alist = []
In [154]: alist.append(np.random.random((2,1,3)))
In [155]: alist.append(np.random.random((2,1,3)))
In [156]: alist
Out[156]:
[array([[[0.85221826, 0.56088315, 0.06232853]],
        [[0.0966469 , 0.89513922, 0.44814579]]]),
 array([[[0.86207845, 0.88895573, 0.62069196]],
        [[0.11475614, 0.29473531, 0.11179268]]])]
Using np.array to join the list elements produces a 4d array - it has joined them on a new leading dimension:
In [157]: arr = np.array(alist)
In [158]: arr.shape
Out[158]: (2, 2, 1, 3)
In [159]: arr[-1,] # same as alist[-1]
Out[159]:
array([[[0.86207845, 0.88895573, 0.62069196]],
       [[0.11475614, 0.29473531, 0.11179268]]])
If we concatenate on one of the dimensions:
In [160]: arr = np.concatenate(alist, axis=1)
In [161]: arr
Out[161]:
array([[[0.85221826, 0.56088315, 0.06232853],
        [0.86207845, 0.88895573, 0.62069196]],
       [[0.0966469 , 0.89513922, 0.44814579],
        [0.11475614, 0.29473531, 0.11179268]]])
In [162]: arr.shape
Out[162]: (2, 2, 3) # note the shape - that 2nd 2 is the join axis
In [163]: arr[:,-1]
Out[163]:
array([[0.86207845, 0.88895573, 0.62069196],
       [0.11475614, 0.29473531, 0.11179268]])
[163] has the same numbers as [159], but a (2,3) shape.
reshape keeps the values in their original order, which may 'shuffle' them relative to the concatenate result:
In [164]: np.array(alist).reshape(2,2,3)
Out[164]:
array([[[0.85221826, 0.56088315, 0.06232853],
        [0.0966469 , 0.89513922, 0.44814579]],
       [[0.86207845, 0.88895573, 0.62069196],
        [0.11475614, 0.29473531, 0.11179268]]])
We have to transpose the leading 2 axes before reshaping to match [161]:
In [165]: np.array(alist).transpose(1,0,2,3)
Out[165]:
array([[[[0.85221826, 0.56088315, 0.06232853]],
        [[0.86207845, 0.88895573, 0.62069196]]],
       [[[0.0966469 , 0.89513922, 0.44814579]],
        [[0.11475614, 0.29473531, 0.11179268]]]])
In [166]: np.array(alist).transpose(1,0,2,3).reshape(2,2,3)
Out[166]:
array([[[0.85221826, 0.56088315, 0.06232853],
        [0.86207845, 0.88895573, 0.62069196]],
       [[0.0966469 , 0.89513922, 0.44814579],
        [0.11475614, 0.29473531, 0.11179268]]])
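The same point as a quick self-check, assuming small random (2,1,3) inputs like the ones above: np.stack with axis=1 builds the transposed (2,2,1,3) array directly, and reshaping that matches concatenate, while reshaping the plain np.array(alist) result does not.

```python
import numpy as np

rng = np.random.default_rng(0)
alist = [rng.random((2, 1, 3)), rng.random((2, 1, 3))]

joined = np.concatenate(alist, axis=1)              # (2, 2, 3), join on axis 1
stacked = np.stack(alist, axis=1)                   # (2, 2, 1, 3), new axis at position 1
transposed = np.array(alist).transpose(1, 0, 2, 3)  # (2, 2, 1, 3)

print(np.array_equal(stacked, transposed))               # True
print(np.array_equal(stacked.reshape(2, 2, 3), joined))  # True
# a plain reshape keeps memory order, so the list axis stays in front
print(np.array_equal(np.array(alist).reshape(2, 2, 3), joined))  # False
```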

Related

Python | Addition of numpy arrays with different shapes [duplicate]

I have a question regarding the conversion between (N,) and (N,1) shaped arrays. For example, y below has shape (2,).
A=np.array([[1,2],[3,4]])
x=np.array([1,2])
y=np.dot(A,x)
y.shape
Out[6]: (2,)
But the following gives y2 a (2,1) shape:
x2=x[:,np.newaxis]
y2=np.dot(A,x2)
y2.shape
Out[14]: (2, 1)
What would be the most efficient way of converting y2 back to y without copying?
Thanks,
Tom
reshape works for this
a = np.arange(3) # a.shape = (3,)
b = a.reshape((3,1)) # b.shape = (3,1)
b2 = a.reshape((-1,1)) # b2.shape = (3,1)
c = b.reshape((3,)) # c.shape = (3,)
c2 = b.reshape((-1,)) # c2.shape = (3,)
note also that reshape doesn't copy the data unless it needs to for the new shape (which it doesn't need to do here):
a.__array_interface__['data'] # (22356720, False)
b.__array_interface__['data'] # (22356720, False)
c.__array_interface__['data'] # (22356720, False)
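Instead of comparing raw addresses from __array_interface__, np.shares_memory (or the .base attribute) can confirm that reshape returned a view. A minimal sketch reusing the a, b, c names from above:

```python
import numpy as np

a = np.arange(3)
b = a.reshape((3, 1))
c = b.reshape((-1,))

print(np.shares_memory(a, b), np.shares_memory(a, c))  # True True
print(b.base is a)  # True: b is a view on a's buffer

# writing through a view changes the original array
c[0] = 99
print(a)  # [99  1  2]
```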
Use numpy.squeeze:
>>> x = np.array([[[0], [1], [2]]])
>>> x.shape
(1, 3, 1)
>>> np.squeeze(x).shape
(3,)
>>> np.squeeze(x, axis=(2,)).shape
(1, 3)
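Applied to the question's y2, a sketch (A and x as in the question): passing an explicit axis to squeeze removes only that axis, and NumPy raises an error if that axis does not have length 1, which guards against accidentally dropping other dimensions.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
x = np.array([1, 2])
y2 = np.dot(A, x[:, np.newaxis])  # shape (2, 1)

y = np.squeeze(y2, axis=1)        # shape (2,)
print(y, y.shape, np.shares_memory(y, y2))  # [ 5 11] (2,) True

try:
    np.squeeze(y2, axis=0)        # axis 0 has length 2, not 1
except ValueError as err:
    print("ValueError:", err)
```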
Slice along the dimension you want, as in the example below. To go in the reverse direction, you can use None as the index for any dimension that should be treated as a singleton (length-1) dimension needed to make the shapes work.
In [786]: yy = np.asarray([[11],[7]])
In [787]: yy
Out[787]:
array([[11],
       [ 7]])
In [788]: yy.shape
Out[788]: (2, 1)
In [789]: yy[:,0]
Out[789]: array([11, 7])
In [790]: yy[:,0].shape
Out[790]: (2,)
In [791]: y1 = yy[:,0]
In [792]: y1.shape
Out[792]: (2,)
In [793]: y1[:,None]
Out[793]:
array([[11],
       [ 7]])
In [794]: y1[:,None].shape
Out[794]: (2, 1)
Alternatively, you can use reshape:
In [795]: yy.reshape((2,))
Out[795]: array([11, 7])
The opposite conversion, from (2,) to (2,1), can be made with:
np.atleast_2d(y).T
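Several of the (2,) to (2,1) routes mentioned in these answers give the same result; a quick comparison sketch, with y standing in for the question's 1d result:

```python
import numpy as np

y = np.array([5, 11])  # shape (2,), e.g. np.dot(A, x) from the question

candidates = [
    y[:, None],                 # index with None (np.newaxis)
    y.reshape(-1, 1),           # reshape, letting NumPy infer the length
    np.expand_dims(y, axis=1),  # insert a length-1 axis at position 1
    np.atleast_2d(y).T,         # (1, 2) row, then transpose to (2, 1)
]
for col in candidates:
    print(col.shape, np.array_equal(col, candidates[0]))  # (2, 1) True
```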
Another option in your toolbox could be ravel:
>>> y2.shape
(2, 1)
>>> y_ = y2.ravel()
>>> y_.shape
(2,)
Again, a copy is made only if needed, which is not the case here:
>>> y2.__array_interface__["data"]
(2700295136768, False)
>>> y_.__array_interface__["data"]
(2700295136768, False)
For further details, you can take a look at this answer.

How to multiply a numpy array by a list to get a multidimensional array?

In Python, I have a list and a numpy array.
I would like to multiply the array by the list in such a way that I get an array with an extra leading dimension, where each slice along that dimension is the input array multiplied by one element of the list. Therefore:
in_list = [2,4,6]
in_array = np.random.rand(5,5)
result = ...
np.shape(result) ---> (3,5,5)
where (0,:,:) is the input array multiplied by the first element of the list (2);
(1,:,:) is the input array multiplied by the second element of the list (4), etc.
I have a feeling this question will be answered by broadcasting, but I'm not sure how to go about doing this.
You want np.multiply.outer. The outer method is defined for any NumPy "ufunc", including multiplication. Here's a demonstration:
In [1]: import numpy as np
In [2]: in_list = [2, 4, 6]
In [3]: in_array = np.random.rand(5, 5)
In [4]: result = np.multiply.outer(in_list, in_array)
In [5]: result.shape
Out[5]: (3, 5, 5)
In [6]: (result[1, :, :] == in_list[1] * in_array).all()
Out[6]: True
As you suggest, broadcasting gives an alternative solution: if you convert in_list to a 1d NumPy array of length 3, you can then reshape to an array of shape (3, 1, 1), and then a multiplication with in_array will broadcast appropriately:
In [9]: result2 = np.array(in_list)[:, None, None] * in_array
In [10]: result2.shape
Out[10]: (3, 5, 5)
In [11]: (result2[1, :, :] == in_list[1] * in_array).all()
Out[11]: True
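Both approaches produce the same array; a short check, using np.allclose instead of == so the comparison stays robust for floating-point data:

```python
import numpy as np

in_list = [2, 4, 6]
in_array = np.random.rand(5, 5)

outer = np.multiply.outer(in_list, in_array)             # (3, 5, 5)
broadcast = np.array(in_list)[:, None, None] * in_array  # (3, 1, 1) * (5, 5) -> (3, 5, 5)

print(outer.shape, np.allclose(outer, broadcast))  # (3, 5, 5) True
# each slice along the leading axis is the array times one list element
for k, factor in enumerate(in_list):
    assert np.allclose(outer[k], factor * in_array)
```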

numpy apply along n-spaces

I have a 4d array, and I would like to apply a function to each 2d slice obtained by iterating over the last two dimensions. That is, apply f(2d_array) to (x,y,0,0), then f(2d_array) to (x,y,0,1), etc. My function operates on the array in place, so the dimensions would stay the same, but a general solution would return an array of shape (x',y',w,z), where w and z are the last two dimensions of the original array.
This could obviously be generalized to mD slices over an nD array.
Is there any built-in functionality that does this thing?
The 'basic' apply-along-axis model is to iterate on one axis, and pass the other to your function:
In [197]: def foo(x): # return same size
   ...:     return x*2
...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
...:
Out[197]:
array([[ 0,  2,  4,  6],
       [ 8, 10, 12, 14],
       [16, 18, 20, 22]])
In [198]: def foo(x):
   ...:     return x.sum() # return one less dim
...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
...:
Out[198]: array([ 6, 22, 38])
In [199]: def foo(x):
   ...:     return x.sum(keepdims=True) # condense the dim
...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
...:
Out[199]:
array([[ 6],
       [22],
       [38]])
Your 4d problem can be massaged to fit this.
In [200]: arr_4d = np.arange(24).reshape(2,3,2,2)
In [201]: arr_2d = arr_4d.reshape(6,4).T
In [202]: res = np.array([foo(x) for x in arr_2d])
In [203]: res
Out[203]:
array([[60],
       [66],
       [72],
       [78]])
In [204]: res.reshape(2,2)
Out[204]:
array([[60, 66],
       [72, 78]])
which is the equivalent of doing:
In [205]: arr_4d[:,:,0,0].sum()
Out[205]: 60
In [206]: foo(arr_4d[:,:,0,0].ravel())
Out[206]: array([60])
apply_along_axis requires a function that takes a 1d array, but can be applied thus:
In [209]: np.apply_along_axis(foo,0,arr_4d.reshape(6,2,2))
Out[209]:
array([[[60, 66],
        [72, 78]]])
foo could reshape its input to 2d, and pass it to a function that takes 2d. apply_along_axis uses np.ndindex to generate the indices for the iteration axes.
In [212]: list(np.ndindex(2,2))
Out[212]: [(0, 0), (0, 1), (1, 0), (1, 1)]
np.vectorize normally works with a function that takes a scalar. But recent versions have a signature parameter, which I believe could be used to work with your case. It may require transposing the input so it iterates on the first two axes, passing the last two to function. See my answer at https://stackoverflow.com/a/46004266/901925.
None of these approaches offers a speed advantage over a plain Python loop.
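A sketch of the np.vectorize idea mentioned above, assuming a function that maps a 2d slice to a 2d slice of the same shape (center_slice is a hypothetical stand-in for f). Core dimensions must be the last ones, so the (x, y) axes are moved to the end first and moved back afterwards. As noted, this is a convenience rather than a speed-up; the explicit ndindex loop is included only to check the result.

```python
import numpy as np

def center_slice(x):
    # hypothetical 2d -> 2d function: subtract the slice mean
    return x - x.mean()

arr_4d = np.arange(24, dtype=float).reshape(2, 3, 2, 2)

# '(m,n)->(m,n)': each call receives one (m, n) core slice taken from the LAST two axes
vf = np.vectorize(center_slice, signature='(m,n)->(m,n)')

# move (x, y) to the end so they become the core dims, then move them back
moved = np.moveaxis(arr_4d, (0, 1), (-2, -1))   # (w, z, x, y) = (2, 2, 2, 3)
res = np.moveaxis(vf(moved), (-2, -1), (0, 1))  # back to (x, y, w, z)

# reference: explicit loop over the last two axes
ref = np.empty_like(arr_4d)
for i, j in np.ndindex(arr_4d.shape[-2:]):
    ref[..., i, j] = center_slice(arr_4d[..., i, j])

print(res.shape, np.allclose(res, ref))  # (2, 3, 2, 2) True
```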
Without reshaping or swapping, I can iterate with the help of ndindex.
Define a function that expects a 2d input:
def foo2(x):
    return x.sum(axis=1, keepdims=True) # 2d
Index iterator for the last 2 dim of arr_4d:
In [260]: idx = np.ndindex(arr_4d.shape[-2:])
Do a test calculation to determine the shape of the return value; np.vectorize and apply_along_axis do this sort of test too.
In [261]: r1 = foo2(arr_4d[:,:,0,0]).shape
In [262]: r1
Out[262]: (2, 1)
The result array:
In [263]: res = np.zeros(r1+arr_4d.shape[-2:])
In [264]: res.shape
Out[264]: (2, 1, 2, 2)
Now iterate:
In [265]: for i,j in idx:
   ...:     res[...,i,j] = foo2(arr_4d[...,i,j])
...:
In [266]: res
Out[266]:
array([[[[12., 15.],
         [18., 21.]]],
       [[[48., 51.],
         [54., 57.]]]])
I guess you're looking for something like numpy.apply_over_axes coupled with a for loop to iterate over the varying axes.
I rolled my own. I'd be interested to know if there are any performance differences between this and @hpaulj's method, and if there is reason to believe that writing a custom C module would offer a significant improvement. Of course @hpaulj's method is more general, since this is specific to my need to perform an operation on the array in place.
import itertools
def apply_along_space(f, np_array, axes):
    # apply the function f on each subspace given by iterating over the axes listed in axes, e.g. axes=(0,2)
    # f must modify its argument in place; each subspace is a view of np_array
    for slic in itertools.product(*map(lambda ax: range(np_array.shape[ax]) if ax in axes else [slice(None,None,None)], range(len(np_array.shape)))):
        f(np_array[slic])
    return np_array
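A sketch of the same in-place idea written with np.ndindex instead of itertools.product. apply_inplace_over_axes and double_in_place are hypothetical names. The function f must modify its argument in place: indexing with integers and slices returns a view, so the changes land in the original array.

```python
import numpy as np

def apply_inplace_over_axes(f, arr, axes):
    """Call f on every sub-array obtained by fixing the indices along `axes` (hypothetical helper)."""
    axes = tuple(ax % arr.ndim for ax in axes)        # allow negative axes
    loop_shape = tuple(arr.shape[ax] for ax in axes)
    for idx in np.ndindex(*loop_shape):
        slic = [slice(None)] * arr.ndim
        for ax, i in zip(axes, idx):
            slic[ax] = i
        f(arr[tuple(slic)])  # a view, so in-place edits reach arr
    return arr

def double_in_place(sub):
    sub *= 2  # modifies the view in place

a = np.arange(24).reshape(2, 3, 2, 2)
expected = a * 2
apply_inplace_over_axes(double_in_place, a, axes=(2, 3))
print(np.array_equal(a, expected))  # True
```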

Inserting newaxis at variable position in NumPy arrays

Normally, when we know where we should insert the newaxis, we can do a[:, np.newaxis,...]. Is there a good way to insert the newaxis at a variable axis position?
Here is how I do it now. I think there must be better ways than this:
import numpy as np
def addNewAxisAt(x, axis):
    _s = list(x.shape)
    _s.insert(axis, 1)
    return x.reshape(tuple(_s))
def addNewAxisAt2(x, axis):
    ind = [slice(None)]*x.ndim
    ind.insert(axis, np.newaxis)
    return x[tuple(ind)]  # index with a tuple; recent NumPy rejects a list of slices
That singleton dimension (length 1) can be inserted into the original array's shape with np.insert, and the result assigned back to x.shape to change its shape in place, like so:
x.shape = np.insert(x.shape,axis,1)
We can extend this to insert more than one new axis with a bit of an np.diff and np.cumsum trick, like so:
insert_idx = (np.diff(np.append(0,axis))-1).cumsum()+1
x.shape = np.insert(x.shape,insert_idx,1)
Sample runs:
In [151]: def addNewAxisAt(x, axis):
   ...:     insert_idx = (np.diff(np.append(0,axis))-1).cumsum()+1
   ...:     x.shape = np.insert(x.shape,insert_idx,1)
   ...:
In [152]: A = np.random.rand(4,5)
In [153]: addNewAxisAt(A, axis=1)
In [154]: A.shape
Out[154]: (4, 1, 5)
In [155]: A = np.random.rand(5,6,8,9,4,2)
In [156]: addNewAxisAt(A, axis=5)
In [157]: A.shape
Out[157]: (5, 6, 8, 9, 4, 1, 2)
In [158]: A = np.random.rand(5,6,8,9,4,2,6,7)
In [159]: addNewAxisAt(A, axis=(1,3,4,6))
In [160]: A.shape
Out[160]: (5, 1, 6, 1, 1, 8, 1, 9, 4, 2, 6, 7)
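Since NumPy 1.18, np.expand_dims also accepts a tuple of axes, where each entry is a position in the output array. For the axis=(1,3,4,6) sample above it gives the same shape, though it returns a new view instead of modifying x.shape in place. A sketch:

```python
import numpy as np

A = np.random.rand(5, 6, 8, 9, 4, 2, 6, 7)
B = np.expand_dims(A, axis=(1, 3, 4, 6))  # positions of the new axes in the result
print(B.shape)                 # (5, 1, 6, 1, 1, 8, 1, 9, 4, 2, 6, 7)
print(np.shares_memory(A, B))  # True: a view; A itself keeps its shape
```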
np.insert does (abridged):
slobj = [slice(None)]*ndim
...
slobj[axis] = slice(None, index)
...
new[slobj] = arr[slobj2]
Like your second function, it constructs a list of slices and modifies one or more elements.
apply_along_axis constructs an index array and converts it to an indexing tuple:
outarr[tuple(i.tolist())] = res
Other numpy functions work this way as well.
My suggestion is to make the initial list large enough to hold the None. Then I don't need to use insert:
In [1076]: x=np.ones((3,2,4),int)
In [1077]: ind=[slice(None)]*(x.ndim+1)
In [1078]: ind[2]=None
In [1080]: x[ind].shape   # older NumPy only; recent versions raise an error for a list of slices
Out[1080]: (3, 2, 1, 4)
In [1081]: x[tuple(ind)].shape   # converting the list to a tuple is required in recent NumPy
Out[1081]: (3, 2, 1, 4)
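Wrapping that idea in a function gives a variable-position version of a[:, np.newaxis, ...]. A sketch; add_new_axis_at is a hypothetical name, and negative axes are normalized the same way np.expand_dims does (axis + ndim + 1).

```python
import numpy as np

def add_new_axis_at(x, axis):
    """Insert a length-1 axis at position `axis` of the result (hypothetical helper)."""
    if axis < 0:
        axis += x.ndim + 1          # same normalization as np.expand_dims
    ind = [slice(None)] * (x.ndim + 1)
    ind[axis] = None                # None is np.newaxis
    return x[tuple(ind)]            # recent NumPy requires a tuple, not a list

x = np.ones((3, 2, 4), int)
for ax in (0, 2, 3, -1):
    print(ax, add_new_axis_at(x, ax).shape, np.expand_dims(x, ax).shape)
```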
It turns out there is np.expand_dims:
In [1090]: np.expand_dims(x,2).shape
Out[1090]: (3, 2, 1, 4)
It uses reshape like you do, but creates the new shape with tuple concatenation.
def expand_dims(a, axis):
    a = asarray(a)
    shape = a.shape
    if axis < 0:
        axis = axis + len(shape) + 1
    return a.reshape(shape[:axis] + (1,) + shape[axis:])
Timings don't tell me much about which is better. They are all in the 2 µs range, where simply wrapping the code in a function makes a difference.

Python: numpy shape confusion

I have a numpy array:
>>> type(myArray1)
<class 'numpy.ndarray'>
>>> myArray1.shape
(500,)
I have another array:
>>> type(myArray2)
<class 'numpy.ndarray'>
>>> myArray2.shape
(500, 1)
(1) What is the difference between (500,) and (500,1)?
(2) How do I change (500,) to (500,1)?
(1) The difference between (500,) and (500,1) is that the first is the shape of a one-dimensional array, while the second is the shape of a 2-dimensional array whose 2nd dimension has length 1. This may be confusing at first since other languages don't make that distinction.
(2) You can use np.reshape to do that:
myArray1.reshape(-1,1).
You can also add a dimension to your array using np.expand_dims: np.expand_dims(myArray1, axis = 1).
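One practical reason the distinction matters: arithmetic between an (n,) and an (n,1) array broadcasts to (n,n) instead of working element-wise. A small sketch with n = 3 instead of 500:

```python
import numpy as np

a1 = np.arange(3)                 # shape (3,)
a2 = np.arange(3).reshape(-1, 1)  # shape (3, 1)

print((a1 + a2).shape)                 # (3, 3): broadcast, probably not what was intended
print((a1.reshape(-1, 1) + a2).shape)  # (3, 1): matching shapes, element-wise
print((a1 + a2.ravel()).shape)         # (3,)
```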
The difference between (500,) and (500,1) is the number of dimensions (the first one is "totally flat").
You can try it yourself:
import numpy as np
arr = np.array([i for i in range(250)])
arr.shape
# (250,)
new_arr = np.array([i for i in range(250)], ndmin=2).T
new_arr.shape
# (250, 1)
# You can also reshape it directly:
arr.shape = (250, 1)
# And look the result:
arr
# array([[ 0],
# [ 1],
# [ 2],
# [ 3],
# [ 4],
# (...)
Also try the reversed shape, (1, 500) instead of (500, 1).
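For the reversed (1, n) row shape, the same tools apply with the new axis in front; a short sketch with n = 250 as above:

```python
import numpy as np

arr = np.arange(250)                 # (250,)
row_a = arr.reshape(1, -1)           # (1, 250)
row_b = arr[None, :]                 # (1, 250)
row_c = np.expand_dims(arr, axis=0)  # (1, 250)
print(row_a.shape, row_b.shape, row_c.shape)  # (1, 250) (1, 250) (1, 250)
# transposing the row gives the (250, 1) column
print(np.array_equal(row_a.T, arr.reshape(-1, 1)))  # True
```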
