Understanding the shape of a numpy.array - python

If the array x is declared as:
x = np.array([[1, 2], [3, 4]])
the shape of x is (2, 2) because it is a 2x2 matrix.
However, for a 1-dimensional vector, such as:
x = np.array([1, 2, 3])
why does the shape of x gives (3,) and not (1,3)?
Is it my mistake to understand the shape as (row, column)?

Because np.array([1,2,3]) is one-dimensional array. (3,) means that this is single dimension with three elements.
(1,3) means that this is a two-dimensional array.
If you use reshape() method on the array, and give it arguments (1,3), additional brackets will be added to it.
>>> np.array([1,2,3]).reshape(1,3)
array([[1, 2, 3]])

As per the docs, np.array's are multidimensional containers. Consider:
np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]).shape
# (2, 2, 2)
Also, np.shape always returns a tuple. In your example, (3,) is the representation of a one-element tuple, which is the correct value for the shape of a one-dimensional array.

np.array represents an n-dimensional array. This may include a 2-dimensional array to represent a matrix, for which (row, column) is appropriate. It may also include 1-dimensional, 3-dimensional or other arrays for which (row, column) are too many/few dimensions. Compare:
>>> # 1-dimensional
>>> np.array([1, 2, 3]).shape
(3,)
>>> np.array([1, 2, 3])[1]
2
>>> # 2-dimensional
>>> np.array([[1, 2, 3]]).shape
(1, 3)
>>> np.array([[1, 2, 3]])[0,1]
2
>>> np.array([[1], [2], [3]]).shape
(3, 1)
>>> np.array([[1], [2], [3]])[1, 0]
2
>>> # 3-dimensional
>>> np.array([[[1, 2, 3]]]).shape
(1, 1, 3)
>>> np.array([[[1, 2, 3]]])[0,0,1]
2
>>> np.array([[[1,2],[3,4]],[[5, 6], [7, 8]]]).shape
(2, 2, 2)
Note how the shapes (3,), (1, 3), (3, 1), (1, 1, 3), ... represent different logical layouts, as exemplified by the different position at which a specific element resides.

Related

Repeat specific row or column of Python numpy 2D array [duplicate]

I'd like to copy a numpy 2D array into a third dimension. For example, given the 2D numpy array:
import numpy as np
arr = np.array([[1, 2], [1, 2]])
# arr.shape = (2, 2)
convert it into a 3D matrix with N such copies in a new dimension. Acting on arr with N=3, the output should be:
new_arr = np.array([[[1, 2], [1,2]],
[[1, 2], [1, 2]],
[[1, 2], [1, 2]]])
# new_arr.shape = (3, 2, 2)
Probably the cleanest way is to use np.repeat:
a = np.array([[1, 2], [1, 2]])
print(a.shape)
# (2, 2)
# indexing with np.newaxis inserts a new 3rd dimension, which we then repeat the
# array along, (you can achieve the same effect by indexing with None, see below)
b = np.repeat(a[:, :, np.newaxis], 3, axis=2)
print(b.shape)
# (2, 2, 3)
print(b[:, :, 0])
# [[1 2]
# [1 2]]
print(b[:, :, 1])
# [[1 2]
# [1 2]]
print(b[:, :, 2])
# [[1 2]
# [1 2]]
Having said that, you can often avoid repeating your arrays altogether by using broadcasting. For example, let's say I wanted to add a (3,) vector:
c = np.array([1, 2, 3])
to a. I could copy the contents of a 3 times in the third dimension, then copy the contents of c twice in both the first and second dimensions, so that both of my arrays were (2, 2, 3), then compute their sum. However, it's much simpler and quicker to do this:
d = a[..., None] + c[None, None, :]
Here, a[..., None] has shape (2, 2, 1) and c[None, None, :] has shape (1, 1, 3)*. When I compute the sum, the result gets 'broadcast' out along the dimensions of size 1, giving me a result of shape (2, 2, 3):
print(d.shape)
# (2, 2, 3)
print(d[..., 0]) # a + c[0]
# [[2 3]
# [2 3]]
print(d[..., 1]) # a + c[1]
# [[3 4]
# [3 4]]
print(d[..., 2]) # a + c[2]
# [[4 5]
# [4 5]]
Broadcasting is a very powerful technique because it avoids the additional overhead involved in creating repeated copies of your input arrays in memory.
* Although I included them for clarity, the None indices into c aren't actually necessary - you could also do a[..., None] + c, i.e. broadcast a (2, 2, 1) array against a (3,) array. This is because if one of the arrays has fewer dimensions than the other then only the trailing dimensions of the two arrays need to be compatible. To give a more complicated example:
a = np.ones((6, 1, 4, 3, 1)) # 6 x 1 x 4 x 3 x 1
b = np.ones((5, 1, 3, 2)) # 5 x 1 x 3 x 2
result = a + b # 6 x 5 x 4 x 3 x 2
Another way is to use numpy.dstack. Supposing that you want to repeat the matrix a num_repeats times:
import numpy as np
b = np.dstack([a]*num_repeats)
The trick is to wrap the matrix a into a list of a single element, then using the * operator to duplicate the elements in this list num_repeats times.
For example, if:
a = np.array([[1, 2], [1, 2]])
num_repeats = 5
This repeats the array of [1 2; 1 2] 5 times in the third dimension. To verify (in IPython):
In [110]: import numpy as np
In [111]: num_repeats = 5
In [112]: a = np.array([[1, 2], [1, 2]])
In [113]: b = np.dstack([a]*num_repeats)
In [114]: b[:,:,0]
Out[114]:
array([[1, 2],
[1, 2]])
In [115]: b[:,:,1]
Out[115]:
array([[1, 2],
[1, 2]])
In [116]: b[:,:,2]
Out[116]:
array([[1, 2],
[1, 2]])
In [117]: b[:,:,3]
Out[117]:
array([[1, 2],
[1, 2]])
In [118]: b[:,:,4]
Out[118]:
array([[1, 2],
[1, 2]])
In [119]: b.shape
Out[119]: (2, 2, 5)
At the end we can see that the shape of the matrix is 2 x 2, with 5 slices in the third dimension.
Use a view and get free runtime! Extend generic n-dim arrays to n+1-dim
Introduced in NumPy 1.10.0, we can leverage numpy.broadcast_to to simply generate a 3D view into the 2D input array. The benefit would be no extra memory overhead and virtually free runtime. This would be essential in cases where the arrays are big and we are okay to work with views. Also, this would work with generic n-dim cases.
I would use the word stack in place of copy, as readers might confuse it with the copying of arrays that creates memory copies.
Stack along first axis
If we want to stack input arr along the first axis, the solution with np.broadcast_to to create 3D view would be -
np.broadcast_to(arr,(3,)+arr.shape) # N = 3 here
Stack along third/last axis
To stack input arr along the third axis, the solution to create 3D view would be -
np.broadcast_to(arr[...,None],arr.shape+(3,))
If we actually need a memory copy, we can always append .copy() there. Hence, the solutions would be -
np.broadcast_to(arr,(3,)+arr.shape).copy()
np.broadcast_to(arr[...,None],arr.shape+(3,)).copy()
Here's how the stacking works for the two cases, shown with their shape information for a sample case -
# Create a sample input array of shape (4,5)
In [55]: arr = np.random.rand(4,5)
# Stack along first axis
In [56]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[56]: (3, 4, 5)
# Stack along third axis
In [57]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[57]: (4, 5, 3)
Same solution(s) would work to extend a n-dim input to n+1-dim view output along the first and last axes. Let's explore some higher dim cases -
3D input case :
In [58]: arr = np.random.rand(4,5,6)
# Stack along first axis
In [59]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[59]: (3, 4, 5, 6)
# Stack along last axis
In [60]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[60]: (4, 5, 6, 3)
4D input case :
In [61]: arr = np.random.rand(4,5,6,7)
# Stack along first axis
In [62]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[62]: (3, 4, 5, 6, 7)
# Stack along last axis
In [63]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[63]: (4, 5, 6, 7, 3)
and so on.
Timings
Let's use a large sample 2D case and get the timings and verify output being a view.
# Sample input array
In [19]: arr = np.random.rand(1000,1000)
Let's prove that the proposed solution is a view indeed. We will use stacking along first axis (results would be very similar for stacking along the third axis) -
In [22]: np.shares_memory(arr, np.broadcast_to(arr,(3,)+arr.shape))
Out[22]: True
Let's get the timings to show that it's virtually free -
In [20]: %timeit np.broadcast_to(arr,(3,)+arr.shape)
100000 loops, best of 3: 3.56 µs per loop
In [21]: %timeit np.broadcast_to(arr,(3000,)+arr.shape)
100000 loops, best of 3: 3.51 µs per loop
Being a view, increasing N from 3 to 3000 changed nothing on timings and both are negligible on timing units. Hence, efficient both on memory and performance!
This can now also be achived using np.tile as follows:
import numpy as np
a = np.array([[1,2],[1,2]])
b = np.tile(a,(3, 1,1))
b.shape
(3,2,2)
b
array([[[1, 2],
[1, 2]],
[[1, 2],
[1, 2]],
[[1, 2],
[1, 2]]])
A=np.array([[1,2],[3,4]])
B=np.asarray([A]*N)
Edit #Mr.F, to preserve dimension order:
B=B.T
Here's a broadcasting example that does exactly what was requested.
a = np.array([[1, 2], [1, 2]])
a=a[:,:,None]
b=np.array([1]*5)[None,None,:]
Then b*a is the desired result and (b*a)[:,:,0] produces array([[1, 2],[1, 2]]), which is the original a, as does (b*a)[:,:,1], etc.
Summarizing the solutions above:
a = np.arange(9).reshape(3,-1)
b = np.repeat(a[:, :, np.newaxis], 5, axis=2)
c = np.dstack([a]*5)
d = np.tile(a, [5,1,1])
e = np.array([a]*5)
f = np.repeat(a[np.newaxis, :, :], 5, axis=0) # np.repeat again
print('b='+ str(b.shape), b[:,:,-1].tolist())
print('c='+ str(c.shape),c[:,:,-1].tolist())
print('d='+ str(d.shape),d[-1,:,:].tolist())
print('e='+ str(e.shape),e[-1,:,:].tolist())
print('f='+ str(f.shape),f[-1,:,:].tolist())
b=(3, 3, 5) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
c=(3, 3, 5) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
d=(5, 3, 3) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
e=(5, 3, 3) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
f=(5, 3, 3) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
Good luck

Transforming a row vector into a column vector in Numpy

Let's say I have a row vector of the shape (1, 256). I want to transform it into a column vector of the shape (256, 1) instead. How would you do it in Numpy?
you can use the transpose operation to do this:
Example:
In [2]: a = np.array([[1,2], [3,4], [5,6]])
In [5]: a.shape
Out[5]: (3, 2)
In [6]: a_trans = a.T #or: np.transpose(a), a.transpose()
In [8]: a_trans.shape
Out[8]: (2, 3)
In [7]: a_trans
Out[7]:
array([[1, 3, 5],
[2, 4, 6]])
Note that the original array a will still remain unmodified. The transpose operation will just make a copy and transpose it.
If your input array is rather 1D, then you can promote the array to a column vector by introducing a new (singleton) axis as the second dimension. Below is an example:
# 1D array
In [13]: arr = np.arange(6)
# promotion to a column vector (i.e., a 2D array)
In [14]: arr = arr[..., None] #or: arr = arr[:, np.newaxis]
In [15]: arr
Out[15]:
array([[0],
[1],
[2],
[3],
[4],
[5]])
In [12]: arr.shape
Out[12]: (6, 1)
For the 1D case, yet another option would be to use numpy.atleast_2d() followed by a transpose operation, as suggested by ankostis in the comments.
In [9]: np.atleast_2d(arr).T
Out[9]:
array([[0],
[1],
[2],
[3],
[4],
[5]])
We can simply use the reshape functionality of numpy:
a=np.array([[1,2,3,4]])
a:
array([[1, 2, 3, 4]])
a.shape
(1,4)
b=a.reshape(-1,1)
b:
array([[1],
[2],
[3],
[4]])
b.shape
(4,1)
Some of the ways I have compiled to do this are:
>>> import numpy as np
>>> a = np.array([1, 2, 3], [2, 4, 5])
>>> a
array([[1, 2],
[2, 4],
[3, 5]])
Another way to do it:
>>> a.T
array([[1, 2],
[2, 4],
[3, 5]])
Another way to do this will be:
>>> a.reshape(a.shape[1], a.shape[0])
array([[1, 2],
[3, 2],
[4, 5]])
I have used a 2-dimensional array in all of these problems, the real problem arises when there is a 1-dimensional row vector which you want to columnize elegantly.
Numpy's reshape has a functionality where you pass the one of the dimension (number of rows or number of columns) you want, numpy can figure out the other dimension by itself if you pass the other dimension as -1
>>> a.reshape(-1, 1)
array([[1],
[2],
[3],
[2],
[4],
[5]])
>>> a = np.array([1, 2, 3])
>>> a.reshape(-1, 1)
array([[1],
[2],
[3]])
>>> a.reshape(2, -1)
...
ValueError: cannot reshape array of size 3 into shape (2,newaxis)
So, you can give your choice of 1-dimension without worrying about the other dimension as long as (m * n) / your_choice is an integer.
If you want to know more about this -1, head over to:
What does -1 mean in numpy reshape?
Note: All these operations return a new array and do not modify the original array.
You can use reshape() method of numpy object.
To transform any row vector to column vector, use
array.reshape(-1, 1)
To convert any column vector to row vector, use
array.reshape(1, -1)
reshape() is used to change the shape of the matrix.
So if you want to create a 2x2 matrix you can call the method like a.reshape(2, 2).
So why this -1 in the answer?
If you dont want to explicitly specify one dimension(or unknown dimension) and wants numpy to find the value for you, you can pass -1 to that dimension. So numpy will automatically calculate the the value for you from the ramaining dimensions. Keep in mind that you can not pass -1 to more than one dimension.
Thus in the first case(array.reshape(-1, 1)) the second dimension(column) is one(1) and the first(row) is unknown(-1). So numpy will figure out how to represent a 1-by-4 to x-by-1 and finds the x for you.
An alternative solutions with reshape method will be a.reshape(a.shape[1], a.shape[0]). Here you are explicitly specifying the diemsions.
Using np.newaxis can be a bit counterintuitive. But it is possible.
>>> a = np.array([1,2,3])
>>> a.shape
(3,)
>>> a[:,np.newaxis].shape
(3, 1)
>>> a[:,None]
array([[1],
[2],
[3]])
np.newaxis is equal to None internally. So you can use None.
But it is not recommended because it impairs readability
To convert a row vector into a column vector in Python can be important e.g. to use broadcasting:
import numpy as np
def colvec(rowvec):
v = np.asarray(rowvec)
return v.reshape(v.size,1)
colvec([1,2,3]) * [[1,2,3], [4,5,6], [7,8,9]]
Multiplies the first row by 1, the second row by 2 and the third row by 3:
array([[ 1, 2, 3],
[ 8, 10, 12],
[ 21, 24, 27]])
In contrast, trying to use a column vector typed as matrix:
np.asmatrix([1, 2, 3]).transpose() * [[1,2,3], [4,5,6], [7,8,9]]
fails with error ValueError: shapes (3,1) and (3,3) not aligned: 1 (dim 1) != 3 (dim 0).

how to understand empty dimension in python numpy array?

In python numpy package, I am having trouble understanding the situation where an ndarray has the 2nd dimension being empty. Here is an example:
In[1]: d2 = np.random.rand(10)
In[2]: d2.shape = (-1, 1)
In[3]: print d2.shape
In[4]: print(d2)
In[5]: print d2[::2, 0].shape
In[6]: print d2[::2, 0]
Out[3]:(10, 1)
Out[4]:
[[ 0.12362278]
[ 0.26365227]
[ 0.33939172]
[ 0.91501369]
[ 0.97008342]
[ 0.95294087]
[ 0.38906367]
[ 0.1012371 ]
[ 0.67842086]
[ 0.23711077]]
Out[5]: (5,)
Out[6]: [ 0.12362278 0.33939172 0.97008342 0.38906367 0.67842086]
My understanding is that d2 is a 10 rows by 1 column ndarray.
Out[6] is obviously a 1 by 5 array, how can the dimensions be (5,) ?
What does the empty 2nd dimension mean?
Let me just give you one example that illustrate one important difference.
d1 = np.array([1,2,3,4,5]) # array([1, 2, 3, 4, 5])
d1.shape -> (5,) # row array.
d1.size -> 5
# Note: d1.T is the same as d1.
d2 = d1[np.newaxis] # array([[1, 2, 3, 4, 5]]). Note extra []
d2.shape -> (1,5)
d2.size -> 5
# Note: d2.T will give a column array
array([[1],
[2],
[3],
[4],
[5]])
d2.T.shape -> (5,1)
I also thought ndarrays would represent even 1-d arrays as 2-d arrays with a thickness of 1. Maybe because of the name "ndarray" makes us think high dimensional, however, n can be 1, so ndarrays can just have one dimension.
Compare these
x = np.array([[1], [2], [3], [4]])
x.shape
# (4, 1)
x = np.array([[1, 2, 3, 4]])
x.shape
#(1, 4)
x = np.array([1, 2, 3, 4])
x.shape
#(4,)
and (4,) means (4).
If I reshape x and back to (4), it comes back to original
x.shape = (2,2)
x
# array([[1, 2],
# [3, 4]])
x.shape = (4)
x
# array([1, 2, 3, 4])
The main thing to understand here is that indexing with an integer is different than indexing with a slice. For example, when you index a 1d array or a list with an integer you get a scalar but when you index with a slice, you get an array or a list respectively. The same thing applies to 2d+ arrays. So for example:
# Make a 3d array:
import numpy as np
array = np.arange(60).reshape((3, 4, 5))
# Indexing with ints gives a scalar
print array[2, 3, 4] == 59
# True
# Indexing with slices gives a 3d array
print array[:2, :2, :2].shape
# (2, 2, 2)
# Indexing with a mix of slices and ints will give an array with < 3 dims
print array[0, :2, :3].shape
# (2, 3)
print array[:, 2, 0:1].shape
# (3, 1)
This can be really useful conceptually, because sometimes its great to think of an array as a collection of vectors, for example I can represent N points in space as an (N, 3) array:
n_points = np.random.random([10, 3])
point_2 = n_points[2]
print all(point_2 == n_points[2, :])
# True

Attach two 3d numpy arrays in Python

I was wondering how I can attach two 3d numpy arrays in python?
For example, I have one with shape (81,81,61) and I would like to get instead a (81,81,122) shape array by attaching the original array to itself in the z direction.
One way is to use np.dstack which concatenates the arrays along the third axis (d is for depth).
For example:
>>> a = np.arange(8).reshape(2,2,2)
>>> np.dstack((a, a))
array([[[0, 1, 0, 1],
[2, 3, 2, 3]],
[[4, 5, 4, 5],
[6, 7, 6, 7]]])
>>> np.dstack((a, a)).shape
(2, 2, 4)
You could also use np.concatenate((a, a), axis=2).

numpy array that is (n,1) and (n,)

What is the difference between a numpy array (lets say X) that has a shape of (N,1) and (N,). Aren't both of them Nx1 matrices ? The reason I ask is because sometimes computations return either one or the other.
This is a 1D array:
>>> np.array([1, 2, 3]).shape
(3,)
This array is a 2D but there is only one element in the first dimension:
>>> np.array([[1, 2, 3]]).shape
(1, 3)
Transposing gives the shape you are asking for:
>>> np.array([[1, 2, 3]]).T.shape
(3, 1)
Now, look at the array. Only the first column of this 2D array is filled.
>>> np.array([[1, 2, 3]]).T
array([[1],
[2],
[3]])
Given these two arrays:
>>> a = np.array([[1, 2, 3]])
>>> b = np.array([[1, 2, 3]]).T
>>> a
array([[1, 2, 3]])
>>> b
array([[1],
[2],
[3]])
You can take advantage of broadcasting:
>>> a * b
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
The missing numbers are filled in. Think for rows and columns in table or spreadsheet.
>>> a + b
array([[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
Doing this with higher dimensions gets tougher on your imagination.

Categories

Resources