Multidimensional array in numpy - python

I have an array of shape (5,2) in which each row consists of an array of shape (4,3,2) and a float.
After I slice that array with [:,0], I get an array of shape (5,) in which each element has shape (4,3,2), instead of an array of shape (5,4,3,2) (even if I wrap the result in np.array()).
Why?
Edit:
Example:
import numpy as np

a1 = np.arange(50).reshape(5, 5, 2)
a2 = np.arange(50).reshape(5, 5, 2)
b1 = 15.0
b2 = 25.0
h = []
h.append(np.array([a1, b1]))
h.append(np.array([a2, b2]))
h = np.array(h)[:,0]
np.shape(h) # (2,)
np.shape(h[0]) # (5, 5, 2)
np.shape(h[1]) # (5, 5, 2)
h = np.array(h)
np.shape(h) # (2,) Why not (2, 5, 5, 2)?

You have an array of objects; you can use np.stack to convert it to the shape you need, provided all the sub-elements have the same shape:
np.stack(a[:,0])
a = np.array([[np.arange(24).reshape(4,3,2), 1.]]*5)
a.shape
# (5, 2)
a[:,0].shape
# (5,)
a[:,0][0].shape
# (4, 3, 2)
np.stack(a[:,0]).shape
# (5, 4, 3, 2)

In [121]: a1.dtype, a1.shape
Out[121]: (dtype('int32'), (5, 5, 2))
In [122]: c1 = np.array([a1,b1])
In [123]: c1.dtype, c1.shape
Out[123]: (dtype('O'), (2,))
Because a1 and b1 are differently shaped objects (b1 isn't even an array), an array made from them has dtype object. And the h made from several of these continues to be object dtype.
In [124]: h = np.array(h)
In [125]: h.dtype, h.shape
Out[125]: (dtype('O'), (2, 2))
In [126]: h[:,1]
Out[126]: array([15.0, 25.0], dtype=object)
In [127]: h[:,0].dtype
Out[127]: dtype('O')
After the appends, h (as an array) is object dtype. The 2nd column is the b1 and b2 values, the 1st column the a1 and a2.
Some form of concatenate is required to combine those a1 a2 arrays into one. stack does it on a new axis.
In [128]: h[0,0].shape
Out[128]: (5, 5, 2)
In [129]: np.array(h[:,0]).shape # np.array doesn't cross the object boundary
Out[129]: (2,)
In [130]: np.stack(h[:,0]).shape
Out[130]: (2, 5, 5, 2)
In [131]: np.concatenate(h[:,0],0).shape
Out[131]: (10, 5, 2)
Turning the (2,) array into a list does allow np.array to recombine the elements into a higher-dimensional array, just as np.stack does:
In [133]: np.array(list(h[:,0])).shape
Out[133]: (2, 5, 5, 2)

You appear to believe that NumPy can magically divine your intent. As @Barmar explains in the comments, when you slice a shape (5,2) array with [:, 0] you get all rows of the first column of that array. Each element of that slice is a shape (4,3,2) array. NumPy is giving you exactly what you asked for.
If you want to convert that into a shape (5,4,3,2) array, you'll need further processing to pull the elements out of the shape (4,3,2) arrays.
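A minimal sketch of that further processing, under the assumption that the (5,2) object array looks like the one described in the question (each row holds a (4,3,2) array and a float); np.stack, as in the answers above, gathers the object column back into a numeric (5,4,3,2) array:

import numpy as np

# Hypothetical reconstruction of the question's (5, 2) object array:
# each row holds a (4, 3, 2) array and a float.
arr = np.empty((5, 2), dtype=object)
for i in range(5):
    arr[i, 0] = np.arange(24).reshape(4, 3, 2)
    arr[i, 1] = float(i)

col = arr[:, 0]           # shape (5,), dtype object
stacked = np.stack(col)   # shape (5, 4, 3, 2), ordinary numeric array
print(col.shape, stacked.shape)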

Related

How to reorder dstack

I have 6 files, each with shape (6042,), i.e. a single column. I used dstack to stack the 6 files in the hope of getting shape (6042, 1, 6), but after stacking I get shape (1, 6042, 6). Then I tried to change the order using
new_train = np.reshape(train_x,(train_x[1],1,train_x[2]))
error appears:
IndexError: index 1 is out of bounds for axis 0 with size 1
This is my dstack code:
train_x = dstack([train_data['gx'],train_data['gy'], train_data['gz'], train_data['ax'],train_data['ay'], train_data['az']])
The error is because
train_x[1]
tries to access the 2nd element along axis 0, but train_x has only 1 there since, as you said, its shape is (1, 6042, 6). You need to index the shape instead:
new_train = np.reshape(train_x, (train_x.shape[1], 1, train_x.shape[2]))
but this can also be done with transpose:
new_train = train_x.transpose(1, 0, 2)
which swaps the positions of axes 0 and 1.
Another solution is to fix what dstack sees. It gives the "wrong" shape because your data arrays have shape (6042,), not (6042, 1), as you say. So if you reshape them before dstack it also works:
datas = [train_data['gx'], train_data['gy'], train_data['gz'],
         train_data['ax'], train_data['ay'], train_data['az']]
#this list comprehension makes all shape (6042, 1) now
new_datas = [td[:, np.newaxis] for td in datas]
new_train = dstack(new_datas)
You can use np.moveaxis(X, 0, -2), where X is your (1, 6042, 6) array.
This function moves an axis: 0 is your source axis and -2 is your destination axis.
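A quick shape check of that suggestion; the array here is just a zero-filled stand-in with the question's (1, 6042, 6) shape:

import numpy as np

X = np.zeros((1, 6042, 6))    # stand-in for the dstack result
Y = np.moveaxis(X, 0, -2)     # move axis 0 to the second-to-last position
print(Y.shape)                # (6042, 1, 6)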
np.dstack uses:
arrs = atleast_3d(*tup)
to convert the list of arrays to a list of 3d arrays.
In [51]: alist = [np.ones(3,int),np.zeros(3,int)]
In [52]: alist
Out[52]: [array([1, 1, 1]), array([0, 0, 0])]
In [53]: np.atleast_3d(*alist)
Out[53]:
[array([[[1],
         [1],
         [1]]]),
 array([[[0],
         [0],
         [0]]])]
In [54]: _[0].shape
Out[54]: (1, 3, 1)
Concatenating those on the last dimension produces the (1,n,6) kind of result.
With expand_dims we can adjust the shape of all arrays to (n,1,1), and then do the concatenate:
In [62]: np.expand_dims(alist[0],[1,2]).shape
Out[62]: (3, 1, 1)
In [63]: np.concatenate([np.expand_dims(a,[1,2]) for a in alist], axis=2)
Out[63]:
array([[[1, 0]],
[[1, 0]],
[[1, 0]]])
In [64]: _.shape
Out[64]: (3, 1, 2)
direct reshape or newaxis would work just as well:
In [65]: np.concatenate([a[:,None,None] for a in alist], axis=2).shape
Out[65]: (3, 1, 2)
stack is another cover function that adjusts shapes before calling concatenate:
In [67]: np.stack(alist,1).shape
Out[67]: (3, 2)
In [68]: np.stack(alist,1)[:,None].shape
Out[68]: (3, 1, 2)
So there are lots of ways to get what you want, whether it means adjusting shapes before the concatenate, or after.
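For completeness, a small sketch (with a placeholder length n and only two signal columns instead of six) showing that the approaches above all land on the same (n, 1, k) layout:

import numpy as np

n = 10
cols = [np.arange(n, dtype=float), np.arange(n, dtype=float) * 2]  # stand-ins for gx, gy, ...

via_reshape   = np.dstack([c[:, np.newaxis] for c in cols])   # (n, 1, 2)
via_moveaxis  = np.moveaxis(np.dstack(cols), 0, -2)           # (1, n, 2) -> (n, 1, 2)
via_transpose = np.dstack(cols).transpose(1, 0, 2)            # (1, n, 2) -> (n, 1, 2)

print(via_reshape.shape, via_moveaxis.shape, via_transpose.shape)
assert via_reshape.shape == via_moveaxis.shape == via_transpose.shape == (n, 1, 2)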

Numpy Matrix Subtraction Different Dimensions

I currently have a 5D numpy array of dimensions 40 x 3 x 3 x 5 x 1000 where the dimensions are labelled by a x b x c x d x e respectively.
I have another 2D numpy array of dimensions 3 x 1000 where the dimensions are labelled by b x e respectively.
I wish to subtract the 5D array from the 2D array.
One way I was thinking of was to expand the 2D into a 5D array (since the 2D array does not change for all combinations of the other 3 dimensions). I am not sure what array method/numpy function I can use to do this.
I tend to start getting lost with nD array manipulations. Thank you for assisting.
In [217]: a,b,c,d,e = 2,3,4,5,6
In [218]: A = np.ones((a,b,c,d,e),int); B = np.ones((b,e),int)
In [219]: A.shape
Out[219]: (2, 3, 4, 5, 6)
In [220]: B.shape
Out[220]: (3, 6)
In [221]: B[None,:,None,None,:].shape # could also use reshape()
Out[221]: (1, 3, 1, 1, 6)
In [222]: C = B[None,:,None,None,:]-A
In [223]: C.shape
Out[223]: (2, 3, 4, 5, 6)
The first None isn't essential; numpy will add it as needed, but as a human it might help to see it.
IIUC, suppose your arrays are a and b:
np.swapaxes(np.swapaxes(a, 1, 3) - b, 1, 3)
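A quick check of that one-liner against the explicit-None broadcasting above, using random data in the question's shapes; note the two snippets subtract in opposite orders (a - b here versus b - a above):

import numpy as np

A = np.random.rand(40, 3, 3, 5, 1000)   # 5D: a x b x c x d x e
B = np.random.rand(3, 1000)             # 2D: b x e

via_swap  = np.swapaxes(np.swapaxes(A, 1, 3) - B, 1, 3)   # computes A - B
via_index = B[None, :, None, None, :] - A                 # computes B - A

print(via_swap.shape, via_index.shape)    # both (40, 3, 3, 5, 1000)
assert np.allclose(via_swap, -via_index)  # same values, opposite sign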

combine numpy array "element-wise"

Currently I have two arrays: the shape of a1 is (5,4,6,3), the second one a2 is (5,4,6), and I want to get a merged array of shape (5,4,6,4).
Currently I "for-loop" over each (6,3) array and hstack it with the corresponding column, reshaped to (6,1), to get (6,4).
for i in range(a1.shape[0]):
    for j in range(a1.shape[1]):
        a = np.hstack((a1[i,j], a2[i,j].reshape(6,1)))
However, this isn't very efficient when the leading dimensions are much bigger than 5*4.
Do you have a better way?
Is this what you want?
import numpy as np
a1 = np.ones((5, 4, 6, 3))
a2 = np.ones((5, 4, 6))
result = np.concatenate((a1, a2[..., np.newaxis]), axis=-1)
print(result.shape)
(5, 4, 6, 4)
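To confirm this matches the loop from the question, a small check with random data of the stated shapes:

import numpy as np

a1 = np.random.rand(5, 4, 6, 3)
a2 = np.random.rand(5, 4, 6)

merged = np.concatenate((a1, a2[..., np.newaxis]), axis=-1)   # (5, 4, 6, 4)

# Compare one (i, j) block against the loop body from the question.
i, j = 2, 1
loop_block = np.hstack((a1[i, j], a2[i, j].reshape(6, 1)))    # (6, 4)
assert np.allclose(merged[i, j], loop_block)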

broadcasting arrays in numpy

I have an array and reshaped it to the following shapes: (-1,1,1,1) and (-1,1):
Array A:
[-0.888788523827 0.11842529285 0.319928774626 0.319928774626 0.378755429421 1.225877519716 3.830653798838]
A.reshape(-1,1,1,1):
[[[[-0.888788523827]]]
[[[ 0.11842529285 ]]]
[[[ 0.319928774626]]]
[[[ 0.319928774626]]]
[[[ 0.378755429421]]]
[[[ 1.225877519716]]]
[[[ 3.830653798838]]]]
A.reshape(-1,1):
[[-0.888788523827]
[ 0.11842529285 ]
[ 0.319928774626]
[ 0.319928774626]
[ 0.378755429421]
[ 1.225877519716]
[ 3.830653798838]]
Then I subtracted them and broadcasting kicked in, so my resulting array is 7x1x7x1.
I have a hard time visualizing the intermediate step of what broadcasting does; I cannot picture which elements of the arrays are repeated and what they look like during broadcasting.
Could somebody shed some light on this problem, please?
In [5]: arr = np.arange(4)
In [6]: A = arr.reshape(-1,1,1,1)
In [7]: B = arr.reshape(-1,1)
In [8]: C = A + B
In [9]: C.shape
Out[9]: (4, 1, 4, 1)
In [10]: A.shape
Out[10]: (4, 1, 1, 1)
In [11]: B.shape
Out[11]: (4, 1)
There are 2 basic broadcasting rules:
expand the dimensions to match - by adding size 1 dimensions at the start
adjust all size 1 dimensions to match
So in this example:
(4,1,1,1) + (4,1)
(4,1,1,1) + (1,1,4,1) # add 2 size 1's to B
(4,1,4,1) + (4,1,4,1) # adjust 2 of the 1's to 4
(4,1,4,1)
The first step is, perhaps, the most confusing. The (4,1) is expanded to (1,1,4,1), not (4,1,1,1). The rule is intended to avoid ambiguity - by expanding in a consistent manner, not necessarily what a human might intuitively want.
Imagine the case where both arrays need expansion to match, and it could add a dimension in either direction:
(4,) and (3,)
(1,4) and (3,1) or (4,1) and (1,3)
(3,4) or (4,3)
confusion
The rule requires that the programmer choose which operand expands to the right, to (4,1) or (3,1); numpy can then add the other operand unambiguously.
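A concrete sketch of that choice: once one operand is reshaped to add the trailing 1, broadcasting resolves the rest, and the operand you reshape decides whether the result is (4, 3) or (3, 4).

import numpy as np

x = np.arange(4)            # shape (4,)
y = np.arange(3)            # shape (3,)

# x + y raises: operands could not be broadcast together
print((x[:, None] + y).shape)   # (4, 1) + (3,) -> (4, 1) + (1, 3) -> (4, 3)
print((y[:, None] + x).shape)   # (3, 1) + (4,) -> (3, 1) + (1, 4) -> (3, 4)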
For a simpler example:
In [22]: A=np.arange(3).reshape(-1,1)
In [23]: B=np.arange(3)
In [24]: C = A+B    # (3,1)+(3,) => (3,1)+(1,3) => (3,3)
In [25]: C
Out[25]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4]])
In [26]: C.shape
Out[26]: (3, 3)
The element-wise sums [0, 2, 4] are still present, but on the diagonal of C.
When broadcasting like this, the result is a kind of outer sum:
In [27]: np.add.outer(B,B)
Out[27]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4]])

Explaining the differences between dim, shape, rank, dimension and axis in numpy

I'm new to Python and NumPy in general. I have read several tutorials and am still confused about the differences between dim, rank, shape, axes and dimensions. My mind seems to be stuck at the matrix representation. So if you say that A is a matrix that looks like this:
A =
1 2 3
4 5 6
then all I can think of is a 2x3 matrix (two rows and three columns). I understand that the shape here is 2x3, but I am unable to think outside of 2D matrices. For example, I don't understand the dot() documentation when it says "For N dimensions it is a sum product over the last axis of a and the second-to-last of b". I also don't understand, if V is an N:1 vector and M is an N:N matrix, how dot(V,M) or dot(M,V) work and what the difference between them is.
Can anyone please explain what an N-dimensional array is, what a shape is, what an axis is, and how they relate to the documentation of the dot() function? It would be great if the explanation visualized the ideas.
Dimensionality of NumPy arrays must be understood in the data structures sense, not the mathematical sense, i.e. it's the number of scalar indices you need to obtain a scalar value.(*)
E.g., this is a 3-d array:
>>> X = np.arange(24).reshape(2, 3, 4)
>>> X
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
Indexing once gives a 2-d array (matrix):
>>> X[0]
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Indexing twice gives a 1-d array (vector), and indexing three times gives a scalar.
The rank of X is its number of dimensions:
>>> X.ndim
3
>>> np.rank(X)  # deprecated in newer NumPy; use X.ndim or np.ndim(X)
3
Axis is roughly synonymous with dimension; it's used to specify which dimension an operation such as sum should act along:
>>> X.sum(axis=0)
array([[12, 14, 16, 18],
[20, 22, 24, 26],
[28, 30, 32, 34]])
>>> X.sum(axis=1)
array([[12, 15, 18, 21],
[48, 51, 54, 57]])
>>> X.sum(axis=2)
array([[ 6, 22, 38],
[54, 70, 86]])
To be honest, I find this definition of "rank" confusing since it matches neither the name of the attribute ndim nor the linear algebra definition of rank.
Now regarding np.dot, what you have to understand is that there are three ways to represent a vector in NumPy: 1-d array, a column vector of shape (n, 1) or a row vector of shape (1, n). (Actually, there are more ways, e.g. as a (1, n, 1)-shaped array, but these are quite rare.) np.dot performs vector multiplication when both arguments are 1-d, matrix-vector multiplication when one argument is 1-d and the other is 2-d, and otherwise it performs a (generalized) matrix multiplication:
>>> A = np.random.randn(2, 3)
>>> v1d = np.random.randn(2)
>>> np.dot(v1d, A)
array([-0.29269547, -0.52215117, 0.478753 ])
>>> vrow = np.atleast_2d(v1d)
>>> np.dot(vrow, A)
array([[-0.29269547, -0.52215117, 0.478753 ]])
>>> vcol = vrow.T
>>> np.dot(vcol, A)
Traceback (most recent call last):
File "<ipython-input-36-98949c6de990>", line 1, in <module>
np.dot(vcol, A)
ValueError: matrices are not aligned
The rule "sum product over the last axis of a and the second-to-last of b" matches and generalizes the common definition of matrix multiplication.
(*) Arrays of dtype=object are a bit of an exception, since they treat any Python object as a scalar.
np.dot is a generalization of matrix multiplication.
In regular matrix multiplication, an (N,M)-shape matrix multiplied with a (M,P)-shaped matrix results in a (N,P)-shaped matrix. The resultant shape can be thought of as being formed by squashing the two shapes together ((N,M,M,P)) and then removing the middle numbers, M (to produce (N,P)). This is the property that np.dot preserves while generalizing to arrays of higher dimension.
When the docs say, "For N dimensions it is a sum product over the last axis of a and the second-to-last of b", it is speaking to this point. An array of shape (u,v,M) dotted with an array of shape (w,x,y,M,z) results in an array of shape (u,v,w,x,y,z).
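A quick check of that shape rule, with small arbitrary sizes standing in for u, v, M, w, x, y, z:

import numpy as np

u, v, M, w, x, y, z = 2, 3, 4, 2, 3, 2, 5
a = np.ones((u, v, M))
b = np.ones((w, x, y, M, z))

print(np.dot(a, b).shape)   # (2, 3, 2, 3, 2, 5) == (u, v, w, x, y, z)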
Let's see how this rule looks when applied to V and M:
In [25]: V = np.arange(2); V
Out[25]: array([0, 1])
In [26]: M = np.arange(4).reshape(2,2); M
Out[26]:
array([[0, 1],
[2, 3]])
First, the easy part:
In [27]: np.dot(M, V)
Out[27]: array([1, 3])
There is no surprise here; this is just matrix-vector multiplication.
Now consider
In [28]: np.dot(V, M)
Out[28]: array([2, 3])
Look at the shape of V and M:
In [29]: V.shape
Out[29]: (2,)
In [30]: M.shape
Out[30]: (2, 2)
So np.dot(V,M) is like matrix multiplication of a (2,)-shaped matrix with a (2,2)-shaped matrix, which should result in a (2,)-shaped matrix.
The last (and only) axis of V and the second-to-last axis of M (aka the first axis of M) are multiplied and summed over, leaving only the last axis of M.
If you want to visualize this: np.dot(V, M) looks as though V has 1 row and 2 columns:
[[0, 1]] * [[0, 1],
            [2, 3]]
and so, when V is multiplied by M, np.dot(V, M) equals
[[0*0 + 1*2, 0*1 + 1*3]] = [[2, 3]]
However, I don't really recommend trying to visualize NumPy arrays this way -- at least I never do. I focus almost exclusively on the shape.
(2,) * (2,2)
   \    /
    \  /
    (2,)
You just think about the "middle" axes being dotted, and disappearing from the resultant shape.
np.sum(arr, axis=0) tells NumPy to sum the elements in arr, eliminating the 0th axis. If arr is 2-dimensional, the 0th axis is the rows. So for example, if arr looks like this:
In [1]: arr = np.arange(6).reshape(2,3); arr
Out[1]:
array([[0, 1, 2],
[3, 4, 5]])
then np.sum(arr, axis=0) will sum along the columns, thus eliminating the 0th axis (i.e. the rows).
In [2]: np.sum(arr, axis=0)
Out[2]: array([3, 5, 7])
The 3 is the result of 0+3, the 5 equals 1+4, the 7 equals 2+5.
Notice arr had shape (2,3), and after summing, the 0th axis is removed so the result is of shape (3,). The 0th axis had length 2, and each sum is composed of adding those 2 elements. The shape (2,3) "becomes" (3,). You can know the resultant shape in advance! This can help guide your thinking.
To test your understanding, consider np.sum(arr, axis=1). Now the 1-axis is removed. So the resultant shape will be (2,), and each element in the result will be the sum of 3 values.
In [3]: np.sum(arr, axis=1)
Out[3]: array([ 3, 12])
The 3 equals 0+1+2, and the 12 equals 3+4+5.
So we see that summing an axis eliminates that axis from the result. This has bearing on np.dot, since the calculation performed by np.dot is a sum of products. Since np.dot performs a summing operation along certain axes, that axis is removed from the result. That is why applying np.dot to arrays of shape (2,) and (2,2) results in an array of shape (2,). The first 2 in both arrays is summed over, eliminating both, leaving only the second 2 in the second array.
In your case,
A is a 2D array, namely a matrix, with shape (2, 3). From the docstring of numpy.matrix:
A matrix is a specialized 2-D array that retains its 2-D nature through operations.
numpy.rank returns the number of dimensions of an array, which is quite different from the concept of rank in linear algebra; e.g. A is an array of dimension/rank 2.
np.dot(V, M), or V.dot(M), multiplies matrix V with M. Note that numpy.dot does the multiplication as far as the shapes allow. If V is N:1 and M is N:N, V.dot(M) raises a ValueError.
e.g.:
In [125]: a
Out[125]:
array([[1],
[2]])
In [126]: b
Out[126]:
array([[2, 3],
[1, 2]])
In [127]: a.dot(b)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-127-9a1f5761fa9d> in <module>()
----> 1 a.dot(b)
ValueError: objects are not aligned
EDIT:
I don't understand the difference between a shape of (N,) and (N,1), and how it relates to the dot() documentation.
V of shape (N,) implies a 1D array of length N, whilst shape (N, 1) implies a 2D array with N rows and 1 column:
In [2]: V = np.arange(2)
In [3]: V.shape
Out[3]: (2,)
In [4]: Q = V[:, np.newaxis]
In [5]: Q.shape
Out[5]: (2, 1)
In [6]: Q
Out[6]:
array([[0],
[1]])
As the docstring of np.dot says:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D
arrays to inner product of vectors (without complex conjugation).
It also performs vector-matrix multiplication if one of the parameters is a vector. Say V.shape==(2,); M.shape==(2,2):
In [17]: V
Out[17]: array([0, 1])
In [18]: M
Out[18]:
array([[2, 3],
[4, 5]])
In [19]: np.dot(V, M) #treats V as a 1*N 2D array
Out[19]: array([4, 5]) #note the result is a 1D array of shape (2,), not (1, 2)
In [20]: np.dot(M, V) #treats V as a N*1 2D array
Out[20]: array([3, 5]) #result is still a 1D array of shape (2,), not (2, 1)
In [21]: Q #a 2D array of shape (2, 1)
Out[21]:
array([[0],
[1]])
In [22]: np.dot(M, Q) #matrix multiplication
Out[22]:
array([[3], #gets a result of shape (2, 1)
[5]])
