Suppose I have 3 numpy arrays a, b, c, of the same shape, say
a.shape == b.shape == c.shape == (7,9)
Now I'd like to create a 3-dimensional array of size (7,9,3), say x, such that
x[:,:,0] == a
x[:,:,1] == b
x[:,:,2] == c
What is the "pythonic" way of doing it (perhaps in one line)?
Thanks in advance!
There's a function that does exactly that: numpy.dstack ("d" for "depth"). For example:
In [10]: import numpy as np
In [11]: a = np.ones((7, 9))
In [12]: b = a * 2
In [13]: c = a * 3
In [15]: x = np.dstack((a, b, c))
In [16]: x.shape
Out[16]: (7, 9, 3)
In [17]: (x[:, :, 0] == a).all()
Out[17]: True
In [18]: (x[:, :, 1] == b).all()
Out[18]: True
In [19]: (x[:, :, 2] == c).all()
Out[19]: True
TL;DR:
Use numpy.stack (docs), which joins a sequence of arrays along a new axis of your choice.
Although @NPE's answer is very good and covers many cases, there are some scenarios in which numpy.dstack isn't the right choice (I've just found that out while trying to use it). That's because numpy.dstack, according to the docs:
Stacks arrays in sequence depth wise (along third axis).
This is equivalent to concatenation along the third axis after 2-D
arrays of shape (M,N) have been reshaped to (M,N,1) and 1-D arrays of
shape (N,) have been reshaped to (1,N,1).
Let's walk through an example in which this function isn't desirable. Suppose you have a list with 512 numpy arrays of shape (3, 3, 3) and want to stack them in order to get a new array of shape (3, 3, 3, 512). In my case, those 512 arrays were filters of a 2D-convolutional layer. If you use numpy.dstack:
>>> len(arrays_list)
512
>>> arrays_list[0].shape
(3, 3, 3)
>>> numpy.dstack(arrays_list).shape
(3, 3, 1536)
That's because numpy.dstack always stacks the arrays along the third axis! Instead, use numpy.stack (docs), which joins a sequence of arrays along a new axis of your choice:
>>> numpy.stack(arrays_list, axis=-1).shape
(3, 3, 3, 512)
In my case, I passed -1 to the axis parameter because I wanted the arrays stacked along the last axis.
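To make the difference concrete, here is a minimal sketch. It assumes a small stand-in list of 5 arrays (instead of the 512 filters above) so the shapes are easy to read; the variable names are only illustrative.

```python
import numpy as np

# Hypothetical stand-in for the filter list: 5 arrays of shape (3, 3, 3).
arrays_list = [np.full((3, 3, 3), i, dtype=float) for i in range(5)]

# dstack concatenates along the existing third axis, so axis 2 grows.
depth_stacked = np.dstack(arrays_list)
print(depth_stacked.shape)   # (3, 3, 15)

# stack inserts a brand-new axis at the position you ask for.
new_axis_last = np.stack(arrays_list, axis=-1)
new_axis_first = np.stack(arrays_list, axis=0)
print(new_axis_last.shape)   # (3, 3, 3, 5)
print(new_axis_first.shape)  # (5, 3, 3, 3)

# For 2-D inputs, as in the original question, both give the same result.
a = np.ones((7, 9))
two_d = [a, a * 2, a * 3]
same = np.array_equal(np.dstack(two_d), np.stack(two_d, axis=-1))
print(same)                  # True
```

So for 2-D inputs either function works, and the choice only starts to matter once the inputs already have three or more dimensions.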
Related
For example, I have a matrix with shape:
x = np.random.rand(3, 10, 2, 6)
As you can see, there are only two arrays along an axis=2.
I have a function that accepts these two arrays:
def f(arr1, arr2): # arr1 with shape (6, ) and arr2 with (6, )
    return arr1 + arr2 # for simplicity
How can I apply this function along axis=2 of the x array in a vectorized way, such that the resulting array has shape [3, 10, dim of output]?
I came across apply_along_axis routine, but it requires that f accepts only 1D slice.
You can't do it entirely arbitrarily, but your particular case reduces to
x.sum(axis=2)
If you want to add the arrays as in your code:
x[:, :, 0, :] + x[:, :, 1, :]
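As a rough sketch of the slicing idea: if f is built only from elementwise NumPy operations, it can be called once on the two slices along axis 2 rather than once per (i, j) pair. The function f below is a hypothetical stand-in that simply adds its arguments.

```python
import numpy as np

def f(arr1, arr2):
    # Hypothetical two-argument function; elementwise, so it also
    # accepts whole (3, 10, 6) slices instead of single (6,) vectors.
    return arr1 + arr2

rng = np.random.default_rng(0)
x = rng.random((3, 10, 2, 6))

# Take the two slices along axis 2; each has shape (3, 10, 6).
first = x[:, :, 0, :]
second = x[:, :, 1, :]

result = f(first, second)
print(result.shape)  # (3, 10, 6)

# For plain addition this matches the reduction over axis 2.
print(np.allclose(result, x.sum(axis=2)))  # True
```

This only works when f itself broadcasts; a function that relies on Python-level logic per vector would still need a loop.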
Let's suppose I have these two variables
matrices = np.random.rand(4,3,3)
vectors = np.random.rand(4,3,1)
What I would like to perform is the following:
dot_products = [matrix @ vector for (matrix,vector) in zip(matrices,vectors)]
Therefore, I've tried using the np.tensordot method, which at first seemed to make sense, but this happened when testing
>>> np.tensordot(matrices,vectors,axes=([-2,-1],[-2,-1]))
...
ValueError: shape-mismatch for sum
>>> np.tensordot(matrices,vectors,axes=([-2,-1]))
...
ValueError: shape-mismatch for sum
Is it possible to achieve these multiple dot products with the mentioned Numpy method? If not, is there another way that I can accomplish this using Numpy?
The documentation for @ is found at np.matmul. It is specifically designed for this kind of 'batch' processing:
In [76]: matrices = np.random.rand(4,3,3)
...: vectors = np.random.rand(4,3,1)
In [77]: dot_products = [matrix @ vector for (matrix,vector) in zip(matrices,vectors)]
In [79]: np.array(dot_products).shape
Out[79]: (4, 3, 1)
In [80]: (matrices # vectors).shape
Out[80]: (4, 3, 1)
In [81]: np.allclose(np.array(dot_products), matrices @ vectors)
Out[81]: True
There are a couple of problems with tensordot. The axes parameter specifies which dimensions are summed ("dotted"). In your case it would be the last axis of matrices and the second-to-last axis of vectors. That's the standard dot pairing.
In [82]: np.dot(matrices, vectors).shape
Out[82]: (4, 3, 4, 1)
In [84]: np.tensordot(matrices, vectors, (-1,-2)).shape
Out[84]: (4, 3, 4, 1)
You tried to specify 2 pairs of axes for summing. Also dot/tensordot does a kind of outer product on the other dimensions. You'd have to take the "diagonal" on the 4's. tensordot is not what you want for this operation.
We can be more explicit about the dimensions with einsum:
In [83]: np.einsum('ijk,ikl->ijl',matrices, vectors).shape
Out[83]: (4, 3, 1)
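To illustrate the "diagonal" remark, here is a small sketch comparing the three routes. It uses a seeded random generator for reproducibility, and the names outer, idx, and diagonal are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
matrices = rng.random((4, 3, 3))
vectors = rng.random((4, 3, 1))

# Batched product: one matrix-vector product per leading index.
batched = matrices @ vectors
via_einsum = np.einsum('ijk,ikl->ijl', matrices, vectors)

# tensordot pairs every matrix with every vector: shape (4, 3, 4, 1).
outer = np.tensordot(matrices, vectors, axes=(-1, -2))

# Keep only the entries where both batch indices agree.
idx = np.arange(4)
diagonal = outer[idx, :, idx, :]

print(batched.shape, outer.shape, diagonal.shape)  # (4, 3, 1) (4, 3, 4, 1) (4, 3, 1)
print(np.allclose(batched, via_einsum))            # True
print(np.allclose(batched, diagonal))              # True
```

The tensordot route computes all 16 matrix-vector combinations and throws away 12 of them, which is why @ or einsum is the better fit here.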
I have a 3D numpy array A representing a batch of images:
A.shape -> (batch_size, height, width)
I want to access this array using two other arrays Hs,Ws, of size batch_size.
They contain the x index and y index of each image that I want to access.
Example with 2 images of size 3x3:
A.shape -> (2, 3, 3)
A = [[[1,2,3],[5,6,7],[8,9,10]], [[10,20,30],[50,60,70],[80,90,100]]]
Hs = [0,2]
Ws = [1,2]
I want to access A so that I get:
A[:, Hs,Ws] = [2,100]
Doing it like this (A[:, Hs,Ws]) unfortunately results in a 2x2 array (batch_size x batch_size)
Executed with a for loop this would look like this:
Result = np.zeros(batch_size)
for b in range(0, batch_size):
    Result[b] = A[b, Hs[b], Ws[b]]
Is it possible to do this without a for loop by accessing A directly in a vectorized manner?
Do you mean this:
In [6]: A = np.array(A); Hs=np.array(Hs); Ws=np.array(Ws)
In [7]: A.shape
Out[7]: (2, 3, 3)
In [8]: A[np.arange(2), Hs, Ws]
Out[8]: array([ 2, 100])
When using indexing arrays, they 'broadcast' against each other. Here, with shapes (2,), (2,), (2,), the broadcasting is easy.
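A short sketch of why the slice version gives a 2x2 result while the np.arange version gives the wanted vector, using the example data from the question (the names batch_idx, picked, and crossed are only illustrative):

```python
import numpy as np

A = np.array([[[1, 2, 3], [5, 6, 7], [8, 9, 10]],
              [[10, 20, 30], [50, 60, 70], [80, 90, 100]]])
Hs = np.array([0, 2])
Ws = np.array([1, 2])

# Three index arrays of shape (2,) broadcast together:
# element b of the result is A[b, Hs[b], Ws[b]].
batch_idx = np.arange(A.shape[0])
picked = A[batch_idx, Hs, Ws]
print(picked)   # [  2 100]

# A slice on the first axis keeps every image for every (h, w) pair:
# crossed[b, k] is A[b, Hs[k], Ws[k]].
crossed = A[:, Hs, Ws]
print(crossed)  # [[  2  10]
                #  [ 20 100]]
```

The wanted values sit on the diagonal of the 2x2 result, which is what indexing with np.arange selects directly.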
If I create an array X = np.random.rand(D, 1) it has shape (3,1):
[[ 0.31215124]
[ 0.84270715]
[ 0.41846041]]
If I create my own array A = np.array([0,1,2]) then it has shape (1,3) and looks like
[0 1 2]
How can I force the shape (3, 1) on my array A?
You can assign a shape tuple directly to numpy.ndarray.shape.
A.shape = (3,1)
As of 2022, the docs state:
Setting arr.shape is discouraged and may be deprecated in the future.
Using ndarray.reshape is the preferred approach.
The current best solution would be
A = np.reshape(A, (3,1))
A=np.array([0,1,2])
A.shape=(3,1)
or
A=np.array([0,1,2]).reshape((3,1)) #reshape takes the tuple shape as input
The numpy module has a reshape function and the ndarray has a reshape method, either of these should work to create an array with the shape you want:
import numpy as np
A = np.reshape([1, 2, 3, 4], (4, 1))
# Now change the shape to (2, 2)
A = A.reshape(2, 2)
Numpy will check that the size of the array does not change, i.e. prod(old_shape) == prod(new_shape). Because of this relation, you're allowed to replace one of the values in shape with -1 and numpy will figure it out for you:
A = np.reshape([1, 2, 3, 4], (-1, 1))
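A minimal sketch of the size check and the -1 placeholder, assuming a 12-element array chosen just for illustration:

```python
import numpy as np

A = np.arange(12)

# -1 is replaced by whatever length keeps the total size at 12.
col = A.reshape(-1, 1)
three_rows = A.reshape(3, -1)
print(col.shape)         # (12, 1)
print(three_rows.shape)  # (3, 4)

# A shape that cannot hold exactly 12 elements is rejected.
try:
    A.reshape(5, -1)
    failed = False
except ValueError:
    failed = True
print(failed)            # True
```

Only one dimension may be given as -1, since otherwise the missing lengths would be ambiguous.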
You can set the shape directly, i.e.
A.shape = (3, 1)
or you can use the resize method, which modifies the array in place:
A.resize((3, 1))
or during creation with reshape
A = np.array([0,1,2]).reshape((3, 1))
Your 1-D array has the shape (3,):
>>> A = np.array([0,1,2]) # create 1-D array
>>> print(A.shape) # print array shape
(3,)
If you create an array with shape (1,3), you can use the numpy.reshape mentioned in other answers or numpy.swapaxes:
>>> A = np.array([[0,1,2]]) # create 2-D array
>>> print(A.shape) # print array shape
(1, 3)
>>> A = np.swapaxes(A,0,1) # swap 0th and 1st axes
>>> A # display array with swapped axes
array([[0],
       [1],
       [2]])
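For comparison, here is a sketch of a few other ways to get from the 1-D array to a (3, 1) column. All of these are standard NumPy calls; the variable names are only illustrative.

```python
import numpy as np

A = np.array([0, 1, 2])
print(A.shape)    # (3,)

# Transposing a 1-D array does nothing: there is only one axis to reorder.
print(A.T.shape)  # (3,)

# Three equivalent ways to add a trailing axis of length 1.
col_a = A.reshape(3, 1)
col_b = A[:, np.newaxis]
col_c = np.expand_dims(A, axis=1)

# Starting from a 2-D row of shape (1, 3), a transpose is enough.
col_d = np.array([[0, 1, 2]]).T

print(col_a.shape, col_b.shape, col_c.shape, col_d.shape)  # (3, 1) (3, 1) (3, 1) (3, 1)
```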
How do I get the dimensions of an array? For instance, this is 2x2:
a = np.array([[1,2],[3,4]])
Use .shape to obtain a tuple of array dimensions:
>>> a.shape
(2, 2)
First:
By convention, in Python world, the shortcut for numpy is np, so:
In [1]: import numpy as np
In [2]: a = np.array([[1,2],[3,4]])
Second:
In Numpy, dimension, axis/axes, shape are related and sometimes similar concepts:
dimension
In Mathematics/Physics, dimension or dimensionality is informally defined as the minimum number of coordinates needed to specify any point within a space. But in Numpy, according to the numpy doc, it's the same as axis/axes:
In Numpy dimensions are called axes. The number of axes is rank.
In [3]: a.ndim # num of dimensions/axes, *Mathematics definition of dimension*
Out[3]: 2
axis/axes
the nth coordinate to index an array in Numpy. And multidimensional arrays can have one index per axis.
In [4]: a[1,0] # to index `a`, we specify 1 on the first axis and 0 on the second axis.
Out[4]: 3 # which results in 3 (located at row 1 and column 0, 0-based index)
shape
describes how many elements (i.e. the range of valid indices) there are along each available axis.
In [5]: a.shape
Out[5]: (2, 2) # both the first and second axis have 2 (columns/rows/pages/blocks/...) data
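To tie the three terms together, here is a small sketch on a 3-axis array (the array b is a made-up example, not from the question):

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

print(b.ndim)   # 3, the number of axes, equal to len(b.shape)
print(b.shape)  # (2, 3, 4), the length along each axis
print(b.size)   # 24, the product of the shape entries

# One index per axis picks out a single element.
print(b[1, 0, 2])  # 14

# Reducing over an axis removes that axis from the shape.
print(b.sum(axis=1).shape)       # (2, 4)
print(b.sum(axis=(0, 2)).shape)  # (3,)
```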
>>> import numpy as np
>>> np.shape(a)
(2, 2)
Also works if the input is not a numpy array but a list of lists
>>> a = [[1,2],[1,2]]
>>> np.shape(a)
(2, 2)
Or a tuple of tuples
>>> a = ((1,2),(1,2))
>>> np.shape(a)
(2, 2)
Use .shape:
In: a = np.array([[1,2,3],[4,5,6]])
In: a.shape
Out: (2, 3)
In: a.shape[0] # size of axis 0 (number of rows)
Out: 2
In: a.shape[1] # size of axis 1 (number of columns)
Out: 3
You can use .ndim to get the number of dimensions and .shape to get the size along each dimension:
>>> var = np.array([[1,2,3,4,5,6], [1,2,3,4,5,6]])
>>> var.ndim
2
>>> var.shape
(2, 6)
You can change the dimensions using the .reshape method:
>>> var_ = var.reshape(3, 4)
>>> var_.ndim
2
>>> var_.shape
(3, 4)
The .shape attribute requires that a be a Numpy ndarray. But Numpy can also calculate the shape of iterables of pure Python objects:
np.shape([[1,2],[1,2]])
a.shape is just a limited version of np.info(). Check this out:
import numpy as np
a = np.array([[1,2],[1,2]])
np.info(a)
Out
class: ndarray
shape: (2, 2)
strides: (8, 4)
itemsize: 4
aligned: True
contiguous: True
fortran: False
data pointer: 0x27509cf0560
byteorder: little
byteswap: False
type: int32
rows = a.shape[0] # 2
cols = a.shape[1] # 2
a.shape #(2,2)
a.size # rows * cols = 4
Run the code block below in a Python notebook:
import numpy as np
a = np.array([[1,2],[1,2]])
print(a.shape)
print(type(a.shape))
print(a.shape[0])
output
(2, 2)
<class 'tuple'>
2
This shows that a.shape is a tuple,
so you can get the size of any dimension with a.shape[index of dimension].
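Because the shape is an ordinary tuple, the usual tuple tricks apply. A minimal sketch, with an example array picked just for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# Unpack the tuple directly into one name per axis.
rows, cols = a.shape
print(rows, cols)  # 2 3

# Negative indices count from the last axis.
last = a.shape[-1]
print(last)        # 3

# len() of the shape tuple is the number of dimensions.
print(len(a.shape) == a.ndim)  # True
```

Unpacking raises a ValueError if the number of names does not match the number of axes, which can serve as a cheap sanity check on the input.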