I have a 4D numpy array. When I index one dimension with a list of several indices, my axes get interchanged. Am I missing something trivial here?
import numpy as np
from smartprint import smartprint as prints
a = np.random.rand(50, 60, 70, 80)
b = a[:, :, :, [2,3,4]]
prints (b.shape) # this works as expected
c = a[1, :, :, [2,3,4]]
prints (c.shape) # here, I see the axes are interchanged
Output:
b.shape : (50, 60, 70, 3)
c.shape : (3, 60, 70)
Here are some observations that may help explain the problem.
Start with a 3d array, with the expected strides:
In [158]: x=np.arange(24).reshape(2,3,4)
In [159]: x.shape,x.strides
Out[159]: ((2, 3, 4), (48, 16, 4))
Advanced indexing on the last axis:
In [160]: y=x[:,:,[0,1,2,3]]
In [161]: y.shape, y.strides
Out[161]: ((2, 3, 4), (12, 4, 24))
Notice that the strides are not in the normal C-contiguous order. For a 2d array we'd describe this as F-contiguous. It's an obscure indexing detail that usually doesn't matter.
Apparently this indexing first makes an array with the last (indexed) dimension first:
In [162]: y.base.shape
Out[162]: (4, 2, 3)
In [163]: y.base.strides
Out[163]: (24, 12, 4)
y is a view of this base, with the axes reordered.
Here is the case with a slice in the middle:
In [164]: z=x[1,:,[0,1,2,3]]
In [165]: z.shape, z.strides
Out[165]: ((4, 3), (12, 4))
In [166]: z.base # its own base, not a view
Transposing z to the expected (3,4) shape would switch the strides to (4,12), F-contiguous.
With two-step indexing, we get an array with the expected shape, but F strides. And its base looks a lot like z:
In [167]: w=x[1][:,[0,1,2,3]]
In [168]: w.shape, w.strides
Out[168]: ((3, 4), (4, 12))
In [169]: w.base.shape, w.base.strides
Out[169]: ((4, 3), (12, 4))
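A quick check of the relationship described above, as a sketch using the same x: z holds the same values as w, just transposed. Strides are deliberately left out, since they depend on the integer itemsize of your platform (the session above used 4-byte ints).

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Scalar and list separated by a slice: the broadcast index dimension goes first
z = x[1, :, [0, 1, 2, 3]]
# Two-step indexing: the scalar is applied first, so only one advanced index remains
w = x[1][:, [0, 1, 2, 3]]

print(z.shape, w.shape)        # (4, 3) (3, 4)
print(np.array_equal(z.T, w))  # True
```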
The docs justify the switch in axes by saying that there's an ambiguity when performing advanced indexing with a slice in the middle. It's perhaps clearest when using (2,1) and (4,) indices:
In [171]: w=x[[[0],[1]],:,[0,1,2,3]]
In [172]: w.shape, w.strides
Out[172]: ((2, 4, 3), (48, 12, 4))
The middle, size-3 dimension is "tacked on last". With x[1,:,[0,1,2,3]] that ambiguity argument isn't as strong, but apparently the same indexing method is used. When this was raised in GitHub issues, the claim was that reworking the indexing to correct this was too difficult: individual cases might be corrected, but a comprehensive change was too complicated.
This dimension switch seems to come up on SO a couple of times a year, an annoyance, but not a critical issue.
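Applied to the original question, here is a sketch of three ways to get the expected axis order back (using a smaller random array with the same number of axes, to keep it light). The two-step form works because the scalar is applied first, leaving a single advanced index; the `1:2` form works because a slice is a basic index, so the list is the only advanced index and its dimension stays in place.

```python
import numpy as np

# Smaller stand-in for the question's (50, 60, 70, 80) array; same number of axes
a = np.random.rand(5, 6, 7, 8)
idx = [2, 3, 4]

# Scalar + slices + list in one step: the list's dimension moves to the front
c = a[1, :, :, idx]
print(c.shape)  # (3, 6, 7)

# Workaround 1: index in two steps, so only one advanced index is involved
c1 = a[1][:, :, idx]
# Workaround 2: keep axis 0 as a length-1 slice (a basic index), then drop it
c2 = a[1:2, :, :, idx][0]
# Workaround 3: accept the reordering and move that axis back to the end
c3 = np.moveaxis(a[1, :, :, idx], 0, -1)

for arr in (c1, c2, c3):
    print(arr.shape)  # (6, 7, 3)
print(np.array_equal(c1, c2) and np.array_equal(c1, c3))  # True
```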
Related
I'm working on a problem where I have to reshape each element of a (63,16,3) array into a (4,4,3) array, and I'm stuck.
I generated a (63,16,3) array using NumPy's random function. How can I reshape each element of that array into a (4,4,3) array and store the results in a single array?
import numpy as np
a = np.random.rand(63, 16, 3)
# goal: an array b in which each element has shape (4, 4, 3)
I can already convert a single element of the (63, 16, 3) array into (4, 4, 3), as this snippet shows:
a_resize_0th_element = a[0].reshape(4,4,3)
But I'm looking for a way to apply this transformation from (16, 3) to (4, 4, 3) to all 63 elements of a and store the results in b.
You just need reshape(). The size of the array is 63 * 16 * 3 = 3,024 elements. If you want to divide that into 4x4x3 arrays, that's 3,024 / (4 * 4 * 3) = 63 elements.
So:
b = np.reshape(a, (63, 4, 4, 3))
print(b[0].shape)
Result:
(4, 4, 3)
So b holds 63 arrays of shape (4, 4, 3).
Note: obviously 4 * 4 = 16 here, but this works in general. If you don't want to do the math yourself, you can also just use this:
b = np.reshape(a, (-1, 4, 4, 3))
The -1 will cause numpy to figure it out itself and it will give you the same result.
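A sketch confirming that the single reshape gives the same result as reshaping each element on its own. This holds because reshape reads the data in C (row-major) order, and each a[i] is one contiguous block of 16 * 3 = 48 values.

```python
import numpy as np

a = np.random.rand(63, 16, 3)

# Reshape the whole array at once; -1 lets NumPy infer the leading length (63)
b = a.reshape(-1, 4, 4, 3)
print(b.shape)  # (63, 4, 4, 3)

# Every b[i] equals the element-wise reshape from the question
print(all(np.array_equal(b[i], a[i].reshape(4, 4, 3)) for i in range(len(a))))  # True
```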
I have 6 files, each with shape (6042,), i.e. 1 column. I used dstack to stack the 6 files, hoping to get shape (6042, 1, 6). But after stacking I get shape (1, 6042, 6). Then I tried to change the order using
new_train = np.reshape(train_x,(train_x[1],1,train_x[2]))
This error appears:
IndexError: index 1 is out of bounds for axis 0 with size 1
This is my dstack code:
train_x = dstack([train_data['gx'],train_data['gy'], train_data['gz'], train_data['ax'],train_data['ay'], train_data['az']])
The error occurs because
train_x[1]
tries to access the 2nd row of train_x, but it has only 1 row (its shape is (1, 6042, 6), as you said). You need to index its shape instead:
new_train = np.reshape(train_x, (train_x.shape[1], 1, train_x.shape[2]))
This can also be done with transpose:
new_train = train_x.transpose(1, 0, 2)
This swaps the positions of axes 0 and 1.
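The reshape and the transpose agree here only because axis 0 has length 1, so moving it does not change the order of the elements in memory. A sketch with a random stand-in for train_x (the data is made up); with a longer leading axis the two would differ, so transpose is the safer general choice.

```python
import numpy as np

train_x = np.random.rand(1, 6042, 6)  # stand-in for the dstack result

r = np.reshape(train_x, (train_x.shape[1], 1, train_x.shape[2]))
t = train_x.transpose(1, 0, 2)
print(r.shape, t.shape)      # (6042, 1, 6) (6042, 1, 6)
print(np.array_equal(r, t))  # True, only because axis 0 has length 1

# With a leading axis of length 2, reshape and transpose give different arrays
y = np.arange(12).reshape(2, 3, 2)
print(np.array_equal(y.reshape(3, 2, 2), y.transpose(1, 0, 2)))  # False
```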
Another solution is to fix the input to dstack. It gives the "wrong" shape because your data arrays have shape (6042,), not (6042, 1), as you say. So if you reshape them before calling dstack, it also works:
datas = [train_data['gx'],train_data['gy'], train_data['gz'],
train_data['ax'],train_data['ay'], train_data['az']]
# this list comprehension gives every array shape (6042, 1)
new_datas = [td[:, np.newaxis] for td in datas]
new_train = dstack(new_datas)
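A self-contained sketch of this fix. The train_data dict here is a hypothetical stand-in filled with random numbers; only the shapes matter. atleast_3d (used inside dstack) turns a 1-D (n,) array into (1, n, 1), but a 2-D (n, 1) array into (n, 1, 1).

```python
import numpy as np

# Hypothetical stand-in for train_data: six 1-D columns of length 6042
keys = ['gx', 'gy', 'gz', 'ax', 'ay', 'az']
train_data = {k: np.random.rand(6042) for k in keys}
datas = [train_data[k] for k in keys]

# 1-D inputs: each (6042,) becomes (1, 6042, 1) before stacking
print(np.dstack(datas).shape)  # (1, 6042, 6)

# 2-D inputs: each (6042, 1) becomes (6042, 1, 1), so the stack is (6042, 1, 6)
new_train = np.dstack([td[:, np.newaxis] for td in datas])
print(new_train.shape)  # (6042, 1, 6)
```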
You can use np.moveaxis(X, 0, -2), where X is your (1,6042,6) array.
This function moves an axis: 0 is the source axis and -2 is the destination axis.
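A short sketch (with a random stand-in for X): moveaxis takes one axis out and reinserts it at the destination, keeping the other axes in their relative order. For a 3-D array, moving axis 0 to position -2 is the same reordering as transpose(1, 0, 2).

```python
import numpy as np

X = np.random.rand(1, 6042, 6)  # stand-in for the (1, 6042, 6) array

# Move axis 0 to position -2 (second from the end)
Y = np.moveaxis(X, 0, -2)
print(Y.shape)  # (6042, 1, 6)

print(np.array_equal(Y, X.transpose(1, 0, 2)))  # True
```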
np.dstack uses:
arrs = atleast_3d(*tup)
to convert the list of arrays to a list of 3d arrays.
In [51]: alist = [np.ones(3,int),np.zeros(3,int)]
In [52]: alist
Out[52]: [array([1, 1, 1]), array([0, 0, 0])]
In [53]: np.atleast_3d(*alist)
Out[53]:
[array([[[1],
         [1],
         [1]]]),
 array([[[0],
         [0],
         [0]]])]
In [54]: _[0].shape
Out[54]: (1, 3, 1)
Concatenating those on the last dimension produces the (1,n,6) kind of result.
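A sketch checking that claim with the same alist: concatenating the atleast_3d results on the last axis reproduces np.dstack.

```python
import numpy as np

alist = [np.ones(3, int), np.zeros(3, int)]

# atleast_3d maps each (3,) array to (1, 3, 1)
parts = np.atleast_3d(*alist)
print([p.shape for p in parts])  # [(1, 3, 1), (1, 3, 1)]

# Concatenating on the last axis gives the same (1, 3, 2) result as dstack
joined = np.concatenate(parts, axis=2)
print(joined.shape)                              # (1, 3, 2)
print(np.array_equal(joined, np.dstack(alist)))  # True
```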
With expand_dims we can adjust the shape of all arrays to (n,1,1), and then concatenate:
In [62]: np.expand_dims(alist[0],[1,2]).shape
Out[62]: (3, 1, 1)
In [63]: np.concatenate([np.expand_dims(a,[1,2]) for a in alist], axis=2)
Out[63]:
array([[[1, 0]],
       [[1, 0]],
       [[1, 0]]])
In [64]: _.shape
Out[64]: (3, 1, 2)
A direct reshape or newaxis would work just as well:
In [65]: np.concatenate([a[:,None,None] for a in alist], axis=2).shape
Out[65]: (3, 1, 2)
stack is another cover function that adjusts shapes before concatenating:
In [67]: np.stack(alist,1).shape
Out[67]: (3, 2)
In [68]: np.stack(alist,1)[:,None].shape
Out[68]: (3, 1, 2)
So there are lots of ways to get what you want, whether it means adjusting shapes before the concatenate, or after.
I don't understand how tensordot works. I read the official documentation, but I don't understand at all what is happening there.
a = np.arange(60.).reshape(3,4,5)
b = np.arange(24.).reshape(4,3,2)
c = np.tensordot(a,b, axes=([1,0],[0,1]))
c.shape
(5, 2)
Why is the shape (5, 2)? What exactly is happening?
I also read this article but the answer is confusing me.
In [7]: A = np.random.randint(2, size=(2, 6, 5))
...: B = np.random.randint(2, size=(3, 2, 4))
...:
In [9]: np.tensordot(A, B, axes=((0),(1))).shape
Out[9]: (6, 5, 3, 4)
A : (2, 6, 5) -> reduction of axis=0
B : (3, 2, 4) -> reduction of axis=1
Output : `(2, 6, 5)`, `(3, 2, 4)` ===(2 gone)==> `(6,5)` + `(3,4)` => `(6,5,3,4)`
Why is the shape (6, 5, 3, 4)?
In [196]: a = np.arange(60.).reshape(3,4,5)
...: b = np.arange(24.).reshape(4,3,2)
...: c = np.tensordot(a,b, axes=([1,0],[0,1]))
In [197]: c
Out[197]:
array([[4400., 4730.],
[4532., 4874.],
[4664., 5018.],
[4796., 5162.],
[4928., 5306.]])
I find the einsum equivalent to be easier to "read":
In [198]: np.einsum('ijk,jil->kl',a,b)
Out[198]:
array([[4400., 4730.],
[4532., 4874.],
[4664., 5018.],
[4796., 5162.],
[4928., 5306.]])
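Spelled out with plain loops, the einsum string 'ijk,jil->kl' says: for each output position (k, l), multiply a[i, j, k] by b[j, i, l] and sum over the contracted indices i and j. A small sketch, slow and for illustration only:

```python
import numpy as np

a = np.arange(60.).reshape(3, 4, 5)
b = np.arange(24.).reshape(4, 3, 2)

# c[k, l] = sum over i, j of a[i, j, k] * b[j, i, l]
# (a's axes 1, 0 are paired with b's axes 0, 1; a's axis 2 and b's axis 2 survive)
c = np.zeros((5, 2))
for k in range(5):
    for l in range(2):
        for i in range(3):
            for j in range(4):
                c[k, l] += a[i, j, k] * b[j, i, l]

print(c[0])  # [4400. 4730.]
print(np.allclose(c, np.tensordot(a, b, axes=([1, 0], [0, 1]))))  # True
```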
tensordot works by transposing and reshaping the inputs to reduce the problem to a simple dot:
In [204]: a1 = a.transpose(2,1,0).reshape(5,12)
In [205]: b1 = b.reshape(12,2)
In [206]: np.dot(a1,b1) # or a1@b1
Out[206]:
array([[4400., 4730.],
[4532., 4874.],
[4664., 5018.],
[4796., 5162.],
[4928., 5306.]])
tensordot can do further manipulation to the result, but that's not needed here.
I had to try several things before I got a1/b1 right. For example a.transpose(2,0,1).reshape(5,12) produces the right shape, but different values.
Yet another version:
In [210]: (a.transpose(1,0,2)[:,:,:,None]*b[:,:,None,:]).sum((0,1))
Out[210]:
array([[4400., 4730.],
[4532., 4874.],
[4664., 5018.],
[4796., 5162.],
[4928., 5306.]])
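The transpose/reshape/dot idea can be written as a general helper, which also explains the shapes in both questions: the summed axes disappear, and the result keeps a's remaining axes followed by b's remaining axes, each in their original order. `tensordot_sketch` is a hypothetical name; this is an illustrative sketch of the approach, not NumPy's source (the real function also accepts an integer `axes`).

```python
import numpy as np

def tensordot_sketch(a, b, axes_a, axes_b):
    """Hypothetical helper mimicking np.tensordot(a, b, axes=(axes_a, axes_b))
    by transposing, reshaping to 2-D and calling a plain matrix product."""
    axes_a = [ax % a.ndim for ax in axes_a]
    axes_b = [ax % b.ndim for ax in axes_b]
    # Axes that are not summed over ("free" axes) keep their original order
    free_a = [ax for ax in range(a.ndim) if ax not in axes_a]
    free_b = [ax for ax in range(b.ndim) if ax not in axes_b]
    # Total length of the summed axes (paired sizes must match)
    n = int(np.prod([a.shape[ax] for ax in axes_a]))
    # a -> (free part, summed part); b -> (summed part, free part), paired in order
    a2 = a.transpose(free_a + axes_a).reshape(-1, n)
    b2 = b.transpose(axes_b + free_b).reshape(n, -1)
    # Result shape: free axes of a, followed by free axes of b
    out_shape = [a.shape[ax] for ax in free_a] + [b.shape[ax] for ax in free_b]
    return (a2 @ b2).reshape(out_shape)

a = np.arange(60.).reshape(3, 4, 5)
b = np.arange(24.).reshape(4, 3, 2)
c = tensordot_sketch(a, b, [1, 0], [0, 1])
print(c.shape)                                                     # (5, 2)
print(np.allclose(c, np.tensordot(a, b, axes=([1, 0], [0, 1]))))  # True

A = np.random.randint(2, size=(2, 6, 5))
B = np.random.randint(2, size=(3, 2, 4))
D = tensordot_sketch(A, B, [0], [1])
print(D.shape)  # (6, 5, 3, 4): (6, 5) left from A, then (3, 4) left from B
```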
I have an array of shape (5,2) in which each row consists of an array of shape (4,3,2) and a float.
After I slice it with array[:,0], I get an array of shape (5,) whose elements each have shape (4,3,2), instead of an array of shape (5,4,3,2) (even if I use np.array()).
Why?
Edited
Example:
a1 = np.arange(50).reshape(5, 5, 2)
a2 = np.arange(50).reshape(5, 5, 2)
b1 = 15.0
b2 = 25.0
h = []
h.append(np.array([a1, b1], dtype=object))  # ragged: needs dtype=object on NumPy >= 1.24
h.append(np.array([a2, b2], dtype=object))
h = np.array(h)[:,0]
np.shape(h) # (2,)
np.shape(h[0]) # (5, 5, 2)
np.shape(h[1]) # (5, 5, 2)
h = np.array(h)
np.shape(h) # (2,) Why not (2, 5, 5, 2)?
You have an array of objects. You can use np.stack to convert it to the shape you need, provided all the sub-elements have the same shape:
np.stack(a[:,0])
a = np.array([[np.arange(24).reshape(4,3,2), 1.]]*5, dtype=object)
a.shape
# (5, 2)
a[:,0].shape
# (5,)
a[:,0][0].shape
# (4, 3, 2)
np.stack(a[:,0]).shape
# (5, 4, 3, 2)
In [121]: a1.dtype, a1.shape
Out[121]: (dtype('int32'), (5, 5, 2))
In [122]: c1 = np.array([a1,b1])
In [123]: c1.dtype, c1.shape
Out[123]: (dtype('O'), (2,))
Because a1 and b1 are objects of different shapes (b1 isn't even an array), an array made from them has dtype object. And the h made from several of them remains object dtype.
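On recent NumPy (1.24 and later), building such a ragged array without saying so raises a ValueError; you have to ask for dtype=object explicitly (older versions only warned). A sketch with the question's a1 and b1:

```python
import numpy as np

a1 = np.arange(50).reshape(5, 5, 2)
b1 = 15.0

try:
    np.array([a1, b1])  # ragged: a (5, 5, 2) array next to a scalar
except ValueError as err:
    print("ValueError:", err)

c1 = np.array([a1, b1], dtype=object)
print(c1.dtype, c1.shape)  # object (2,)
print(c1[0].shape)         # (5, 5, 2)
```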
In [124]: h = np.array(h)
In [125]: h.dtype, h.shape
Out[125]: (dtype('O'), (2, 2))
In [126]: h[:,1]
Out[126]: array([15.0, 25.0], dtype=object)
In [127]: h[:,0].dtype
Out[127]: dtype('O')
After the appends, h (as an array) is object dtype. The 2nd column holds the b1 and b2 values; the 1st column holds a1 and a2.
Some form of concatenate is required to combine those a1 a2 arrays into one. stack does it on a new axis.
In [128]: h[0,0].shape
Out[128]: (5, 5, 2)
In [129]: np.array(h[:,0]).shape # np.array doesn't cross the object boundary
Out[129]: (2,)
In [130]: np.stack(h[:,0]).shape
Out[130]: (2, 5, 5, 2)
In [131]: np.concatenate(h[:,0],0).shape
Out[131]: (10, 5, 2)
Turning the (2,) array into a list does allow np.array to recombine the elements into a higher-dimensional array, just as np.stack does:
In [133]: np.array(list(h[:,0])).shape
Out[133]: (2, 5, 5, 2)
You appear to believe that Numpy can magically divine your intent. As @Barmar explains in the comments, when you slice a shape (5,2) array with [:, 0] you get all rows of the first column of that array. Each element of that slice is a shape (4,3,2) array. Numpy is giving you exactly what you asked for.
If you want to convert that into a shape (5,4,3,2) array, you'll need to perform further processing to extract the elements from the shape (4,3,2) arrays.
Suppose that I have two numpy arrays of the form
x = np.array([[1, 2],
              [2, 4],
              [3, 6],
              [4, np.nan],
              [5, 10]])
y = np.array([[0, -5],
              [1, 0],
              [2, 5],
              [5, 20],
              [6, 25]])
is there an efficient way to merge them such that I have
xmy = np.array([[0, np.nan, -5],
                [1, 2, 0],
                [2, 4, 5],
                [3, 6, np.nan],
                [4, np.nan, np.nan],
                [5, 10, 20],
                [6, np.nan, 25]])
I can implement a simple function that uses search to find the indices, but this is not elegant and potentially inefficient for many arrays and large dimensions. Any pointers are appreciated.
See numpy.lib.recfunctions.join_by
It only works on structured arrays or recarrays, so there are a couple of kinks.
First you need to be at least somewhat familiar with structured arrays. See here if you're not.
import numpy as np
import numpy.lib.recfunctions
# Define the starting arrays as structured arrays with two fields ('key' and 'field')
dtype = [('key', int), ('field', float)]  # np.int / np.float were removed in NumPy 1.24
x = np.array([(1, 2),
(2, 4),
(3, 6),
(4, np.nan),
(5, 10)],
dtype=dtype)
y = np.array([(0, -5),
(1, 0),
(2, 5),
(5, 20),
(6, 25)],
dtype=dtype)
# You want an outer join, rather than the default inner join
# (all values are returned, not just ones with a common key)
join = np.lib.recfunctions.join_by('key', x, y, jointype='outer')
# Now we have a structured array with three fields: 'key', 'field1', and 'field2'
# (since 'field' was in both arrays, it renamed x['field'] to 'field1', and
# y['field'] to 'field2')
# This returns a masked array, if you want it filled with
# NaN's, do the following...
join.fill_value = np.nan
join = join.filled()
# Just displaying it... Keep in mind that as a structured array,
# it has one dimension, where each row contains the 3 fields
for row in join:
    print(row)
This outputs:
(0, nan, -5.0)
(1, 2.0, 0.0)
(2, 4.0, 5.0)
(3, 6.0, nan)
(4, nan, nan)
(5, 10.0, 20.0)
(6, nan, 25.0)
Hope that helps!
Edit1: Added example
Edit2: Really shouldn't join with floats... Changed 'key' field to an int.
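For comparison, here is a plain-NumPy sketch of the search-based approach mentioned in the question, vectorized with np.union1d and np.searchsorted instead of join_by (a different technique from the answer above). It assumes the keys are in column 0 and each key appears at most once per array; `merge_on_keys` is a hypothetical helper name.

```python
import numpy as np

def merge_on_keys(x, y):
    """Outer-join two (n, 2) arrays on column 0; missing values become NaN.
    Assumes keys are unique within each array."""
    keys = np.union1d(x[:, 0], y[:, 0])  # sorted union of all keys
    out = np.full((len(keys), 3), np.nan)
    out[:, 0] = keys
    # keys is sorted, so searchsorted finds the output row of each input key
    out[np.searchsorted(keys, x[:, 0]), 1] = x[:, 1]
    out[np.searchsorted(keys, y[:, 0]), 2] = y[:, 1]
    return out

x = np.array([[1, 2], [2, 4], [3, 6], [4, np.nan], [5, 10]])
y = np.array([[0, -5], [1, 0], [2, 5], [5, 20], [6, 25]])
xmy = merge_on_keys(x, y)
print(xmy)
```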