Understanding the syntax of numpy.r_() concatenation


I read the following in the numpy documentation for the function r_:
A string integer specifies which axis to stack multiple comma
separated arrays along. A string of two comma-separated integers
allows indication of the minimum number of dimensions to force each
entry into as the second integer (the axis to concatenate along is
still the first integer).
and they give this example:
>>> np.r_['0,2', [1,2,3], [4,5,6]] # concatenate along first axis, dim>=2
array([[1, 2, 3],
[4, 5, 6]])
I don't follow: what exactly does the string '0,2' instruct numpy to do?
Other than the documentation page quoted above, is there another site with more documentation about this function?

'n,m' tells r_ to concatenate along axis=n, and produce a shape with at least m dimensions:
In [28]: np.r_['0,2', [1,2,3], [4,5,6]]
Out[28]:
array([[1, 2, 3],
[4, 5, 6]])
So we are concatenating along axis=0, and we would normally therefore expect the result to have shape (6,), but since m=2, we are telling r_ that the shape must be at least 2-dimensional. So instead we get shape (2,3):
In [32]: np.r_['0,2', [1,2,3,], [4,5,6]].shape
Out[32]: (2, 3)
Look at what happens when we increase m:
In [36]: np.r_['0,3', [1,2,3,], [4,5,6]].shape
Out[36]: (2, 1, 3) # <- 3 dimensions
In [37]: np.r_['0,4', [1,2,3,], [4,5,6]].shape
Out[37]: (2, 1, 1, 3) # <- 4 dimensions
Anything you can do with r_ can also be done with one of the more readable array-building functions such as np.concatenate, np.row_stack (deprecated in NumPy 2.0 in favor of np.vstack), np.column_stack, np.hstack, np.vstack or np.dstack, though it may also require a call to reshape.
Even with the call to reshape, those other functions may be faster:
In [38]: %timeit np.r_['0,4', [1,2,3,], [4,5,6]]
10000 loops, best of 3: 38 us per loop
In [43]: %timeit np.concatenate(([1,2,3,], [4,5,6])).reshape(2,1,1,3)
100000 loops, best of 3: 10.2 us per loop
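A minimal sketch checking that the r_ results above match plain np.concatenate plus reshape, and np.vstack for the 2-D case (the variable names are illustrative only):

```python
import numpy as np

x, y = [1, 2, 3], [4, 5, 6]

# '0,2': join along axis 0 after forcing each input to at least 2-D
r2 = np.r_['0,2', x, y]
assert r2.shape == (2, 3)
assert np.array_equal(r2, np.vstack((x, y)))

# '0,4': each (3,) input becomes (1, 1, 1, 3); joining along axis 0 gives (2, 1, 1, 3)
r4 = np.r_['0,4', x, y]
assert r4.shape == (2, 1, 1, 3)
assert np.array_equal(r4, np.concatenate((x, y)).reshape(2, 1, 1, 3))

# Without the minimum-dimension integer, r_ simply joins 1-D inputs end to end
assert np.r_[x, y].shape == (6,)
```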

The paragraph you've highlighted describes the two comma-separated integers syntax, which is a special case of the three comma-separated integers syntax. Once you understand the three-integer syntax, the two-integer syntax falls into place.
The equivalent three comma-separated integers syntax for your example would be:
np.r_['0,2,-1', [1,2,3], [4,5,6]]
In order to provide a better explanation I will change the above to:
np.r_['0,2,-1', [1,2,3], [[4,5,6]]]
The above has two parts:
A comma-separated integer string
Two comma-separated arrays
The comma-separated arrays have the following shapes:
np.array([1,2,3]).shape
(3,)
np.array([[4,5,6]]).shape
(1, 3)
In other words the first 'array' is '1-dimensional' while the second 'array' is '2-dimensional'.
First the 2 in 0,2,-1 means that each array should be upgraded so that it's forced to be at least 2-dimensional. Since the second array is already 2-dimensional it is not affected. However the first array is 1-dimensional and in order to make it 2-dimensional np.r_ needs to add a 1 to its shape tuple to make it either (1,3) or (3,1). That is where the -1 in 0,2,-1 comes into play. It basically decides where the extra 1 needs to be placed in the shape tuple of the array. -1 is the default and places the 1 (or 1s if more dimensions are required) in the front of the shape tuple (I explain why further below). This turns the first array's shape tuple into (1,3) which is the same as the second array's shape tuple. The 0 in 0,2,-1 means that the resulting arrays need to be concatenated along the '0' axis.
Since both arrays now have a shape tuple of (1,3), concatenation is possible: if you set aside the concatenation axis (dimension 0 in the above example, which has size 1 in both arrays), the remaining dimensions are equal (here the remaining dimension has size 3 in both arrays). If this were not the case, the following error would be produced:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Now if you concatenate two arrays having the shape (1,3) the resulting array will have shape (1+1,3) == (2,3) and therefore:
np.r_['0,2,-1', [1,2,3], [[4,5,6]]].shape
(2, 3)
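A short sketch of both outcomes described above: the (3,) input is padded to (1, 3) and joins the (1, 3) input, while a (1, 4) input (a made-up variation of the example) differs outside the concatenation axis and triggers the ValueError. The exact wording of the error message varies between NumPy versions.

```python
import numpy as np

# (3,) is padded to (1, 3); [[4, 5, 6]] is already (1, 3) and left alone
out = np.r_['0,2,-1', [1, 2, 3], [[4, 5, 6]]]
print(out.shape)  # (2, 3)

# (1, 3) and (1, 4) differ in dimension 1, which is not the
# concatenation axis, so the arrays cannot be joined along axis 0
mismatch_raised = False
try:
    np.r_['0,2,-1', [1, 2, 3], [[4, 5, 6, 7]]]
except ValueError as exc:
    mismatch_raised = True
    print(type(exc).__name__, exc)
```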
When a 0 or a positive integer is used for the third integer in the comma-separated string, that integer determines the start of each array's shape tuple in the upgraded shape tuple (only for those arrays which need to have their dimensions upgraded). For example 0,2,0 means that for arrays requiring a shape upgrade the array's original shape tuple should start at dimension 0 of the upgraded shape tuple. For array [1,2,3] which has a shape tuple (3,) the 1 would be placed after the 3. This would result in a shape tuple equal to (3,1) and as you can see the original shape tuple (3,) starts at dimension 0 of the upgraded shape tuple. 0,2,1 would mean that for [1,2,3] the array's shape tuple (3,) should start at dimension 1 of the upgraded shape tuple. This means that the 1 needs to be placed at dimension 0. The resulting shape tuple would be (1,3).
When a negative number is used for the third integer in the comma-separated string, it determines where the original shape tuple should end, counted from the end of the upgraded shape tuple. When the original shape tuple is (3,), 0,2,-1 means that the original shape tuple should end at the last dimension of the upgraded shape tuple; therefore the 1 is placed at dimension 0 and the upgraded shape tuple is (1,3). Now (3,) ends at dimension 1 of the upgraded shape tuple, which is also its last dimension (the original array is [1,2,3] and the upgraded array is [[1,2,3]]).
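The padding rule can be written as a small pure-Python function and compared against what np.r_ actually returns. The helper upgraded_shape is hypothetical, a sketch of the rule as explained above rather than NumPy's own code, and it assumes the computed start position lies between 0 and the number of added dimensions.

```python
import numpy as np

def upgraded_shape(shape, ndmin, trans1d=-1):
    """Hypothetical helper: pad `shape` with 1s following the rule above.

    trans1d >= 0: the original shape starts at dimension trans1d.
    trans1d < 0:  the original shape ends at that position, counted from the end.
    """
    shape = tuple(shape)
    extra = ndmin - len(shape)
    if extra <= 0:
        return shape  # already has enough dimensions; left untouched
    start = trans1d if trans1d >= 0 else trans1d + extra + 1
    if not 0 <= start <= extra:
        raise ValueError("trans1d out of range for this sketch")
    return (1,) * start + shape + (1,) * (extra - start)

# Compare with np.r_ for a single (3,) input and each third-integer value
for spec, t in [('0,2,0', 0), ('0,2,1', 1), ('0,2,-1', -1)]:
    real = np.r_[spec, [1, 2, 3]].shape
    print(spec, real, upgraded_shape((3,), 2, t))
# 0,2,0 (3, 1) (3, 1)
# 0,2,1 (1, 3) (1, 3)
# 0,2,-1 (1, 3) (1, 3)
```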
np.r_['0,2', [1,2,3], [4,5,6]]
Is the same as
np.r_['0,2,-1', [1,2,3], [4,5,6]]
Finally here's an example with more dimensions:
np.r_['2,4,1',[[1,2],[4,5],[10,11]],[7,8,9]].shape
(1, 3, 3, 1)
The comma-separated arrays are:
[[1,2],[4,5],[10,11]] which has shape tuple (3,2)
[7,8,9] which has shape tuple (3,)
Both arrays need to be upgraded to 4-dimensional arrays, and the third integer, 1, means that each original shape tuple starts at dimension 1.
Therefore for the first array the shape becomes (1,3,2,1) as 3,2 starts at dimension 1 and because two 1s need to be added to make it 4-dimensional one 1 is placed before the original shape tuple and one 1 after.
Using the same logic the second array's shape tuple becomes (1,3,1,1).
Now the two arrays are concatenated using dimension 2 as the concatenation axis. Removing dimension 2 from each array's upgraded shape tuple results in the tuple (1,3,1) for both arrays. As the resulting tuples are identical, the arrays can be concatenated, and their sizes along the concatenation axis are summed to produce (1, 3, 2+1, 1) == (1, 3, 3, 1).
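The same steps can be reproduced by hand with reshape and np.concatenate, which makes each intermediate shape explicit. This is a sketch for checking the walkthrough, not a description of how r_ is implemented internally:

```python
import numpy as np

first = np.array([[1, 2], [4, 5], [10, 11]])   # shape (3, 2)
second = np.array([7, 8, 9])                    # shape (3,)

# Upgrade both to 4-D with the original shape starting at dimension 1
first_4d = first.reshape(1, 3, 2, 1)
second_4d = second.reshape(1, 3, 1, 1)

# Concatenate along axis 2: sizes 2 + 1 = 3 along that axis
manual = np.concatenate((first_4d, second_4d), axis=2)
via_r = np.r_['2,4,1', first, second]

print(manual.shape)                    # (1, 3, 3, 1)
print(np.array_equal(manual, via_r))  # True
```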

The string '0,2' tells numpy to concatenate along axis 0 (the first axis) and to wrap the elements in enough brackets to ensure a two-dimensional array. Consider the following results:
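import numpy as np
# Note: in recent NumPy releases the axis=1, minDim=1 case raises an AxisError
# (1-D inputs have no axis 1); the output below was produced by an older version.
for axis in (0, 1):
    for minDim in (1, 2, 3):
        print(np.r_['{},{}'.format(axis, minDim), [1,2,30,31], [4,5,6,61], [7,8,90,91], [10,11,12,13]], 'axis={}, minDim={}\n'.format(axis, minDim))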
[ 1 2 30 31 4 5 6 61 7 8 90 91 10 11 12 13] axis=0, minDim=1
[[ 1 2 30 31]
[ 4 5 6 61]
[ 7 8 90 91]
[10 11 12 13]] axis=0, minDim=2
[[[ 1 2 30 31]]
[[ 4 5 6 61]]
[[ 7 8 90 91]]
[[10 11 12 13]]] axis=0, minDim=3
[ 1 2 30 31 4 5 6 61 7 8 90 91 10 11 12 13] axis=1, minDim=1
[[ 1 2 30 31 4 5 6 61 7 8 90 91 10 11 12 13]] axis=1, minDim=2
[[[ 1 2 30 31]
[ 4 5 6 61]
[ 7 8 90 91]
[10 11 12 13]]] axis=1, minDim=3

Related

Flatten Numpy 3D Array to 2D

Suppose I have a 3D numpy array with shape (10, 20, 3), representing an image with 10 rows and 20 columns, where the 3rd dimension contains an array of length 3 of either all zeros, or all ones, for example [0 0 0] or [1 1 1].
What numpy method would be most suitable to convert the 3D array to 2D, where the third dimension has been reduced to a single value of either 0 or 1, depending on whether the array was previously [0 0 0] or [1 1 1]?
The new shape should be (10, 20), where the value of each cell is either 0 or 1. So instead of the third dimension being an array, it becomes an integer.
I have had a look at reshape, and also flatten, however it looks like both of these methods maintain the same total number of 'cells' (i.e. 10 x 20 x 3 = 600). What I want is to reduce one dimension down to a single value so that the total number of cells is now 10 x 20 = 200.
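Assuming every length-3 vector really is all zeros or all ones, as the question states, any single channel already carries the answer; a reduction over the last axis is a more defensive alternative. A sketch with made-up random data standing in for the image:

```python
import numpy as np

# Made-up example data: shape (10, 20, 3), each pixel [0 0 0] or [1 1 1]
rng = np.random.default_rng(0)
mask = rng.integers(0, 2, size=(10, 20))          # the 0/1 values we hope to recover
img = np.repeat(mask[:, :, None], 3, axis=2)      # copy each value into 3 channels

# Option 1: take a single channel (relies on the channels being identical)
flat_a = img[:, :, 0]

# Option 2: reduce over the last axis; 1 if any channel is 1, else 0
flat_b = img.any(axis=2).astype(int)

print(img.shape, flat_a.shape, flat_b.shape)  # (10, 20, 3) (10, 20) (10, 20)
```

Unlike reshape or flatten, both options discard information, which is why the total number of cells drops from 600 to 200.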

Python NumPy shape

Can someone please explain this code to me?
case = np.array([[1,2], [2,4], [3,5]])
I understand the above gives 2 columns and 3 rows.
But the code below I don't understand. Please help me to understand it.
np.arange(0, case.shape[0]+4)

np.arange() returns evenly spaced values within a given interval.
In this case, since case.shape[0] is the length of the first axis of the array (which holds 3 rows), the range goes from 0 to 3+4=7 (end not included).

case = np.array([[1,2], [2,4], [3,5]])
case
array([[1, 2],
[2, 4],
[3, 5]])
np.arange produces a series of numbers from 0 up to case.shape[0] + 4. Here case.shape is (3, 2) (three rows and two columns), so case.shape[0] is 3 and case.shape[1] is 2. np.arange therefore produces the numbers from 0 to 3 + 4 = 7, where 0 is included and 7 is excluded, so the output is 0, 1, 2, 3, 4, 5, 6.
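A quick check of the explanation above, printing each intermediate value:

```python
import numpy as np

case = np.array([[1, 2], [2, 4], [3, 5]])
print(case.shape)      # (3, 2): 3 rows, 2 columns
print(case.shape[0])   # 3, the number of rows

# arange(start, stop) excludes stop, so this is arange(0, 7)
r = np.arange(0, case.shape[0] + 4)
print(r)               # [0 1 2 3 4 5 6]
```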

Why is slicing using "colon and comma" different than using a collection of indexes

Why is slicing using "colon and comma" different than using a collection of indexes?
Here is an example of what I expected to yield the same result, but it does not:
import numpy as np
a = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
print(a[[0,1],[0,1]])
# Output
# [[ 1 2 3]
# [10 11 12]]
print(a[:,[0,1]])
# Output
# [[[ 1 2 3]
# [ 4 5 6]]
# [[ 7 8 9]
# [10 11 12]]]
Why are they not equivalent?

In the first case, you are indexing the array a with 2 lists of the same length, which would be equivalent to indexing with 2 arrays of the same shape (see numpy docs on arrays as indices).
Therefore, the output is a[0,0] (which is the same as a[0,0,:]) and a[1,1], the elementwise combinations of the index arrays. This is expected to return an array of shape (2, 3): 2 because that is the length of the index arrays, and 3 because that is the length of the axis that is not indexed.
In the second case however, the result is a[:,0] (equivalent to a[:,0,:]) and a[:,1]. Thus, here the expected result is an array with the first and third dimensions equivalent to the original array, and the second dimension equal to 2, the length of the index array (which here is the same as the original size of the second axis).
To show clearly that these two operations are not the same, we can try treating : as equivalent to a list covering the whole axis and apply that to the third axis, which results in:
print(a[[0,1],[0,1],[0,1,2]])
IndexError Traceback (most recent call last)
<ipython-input-8-110de8f5f6d8> in <module>()
----> 1 print(a[[0,1],[0,1],[0,1,2]])
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,) (3,)
That is because no elementwise combination of the index arrays is possible: their shapes (2,), (2,) and (3,) cannot be broadcast together. By contrast, a[:,:,:] returns the whole array, and a[[0,1],[0,1],[0,2]] returns [ 1 12], which, as expected, is a one-dimensional array of length 2, like the index arrays.
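A sketch contrasting the two indexing styles on the same array. np.ix_ is used here as one way to request "every combination" (outer) indexing from two lists, which is what the slice-based expression effectively does:

```python
import numpy as np

a = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# Two index lists are paired element by element: picks a[0, 0] and a[1, 1]
paired = a[[0, 1], [0, 1]]
assert np.array_equal(paired, np.stack([a[0, 0], a[1, 1]]))
print(paired.shape)   # (2, 3)

# A slice keeps the whole first axis; the list then selects rows from each block
sliced = a[:, [0, 1]]
print(sliced.shape)   # (2, 2, 3)

# np.ix_ turns the two lists into an outer (every combination) index,
# which reproduces the slice-based result
outer = a[np.ix_([0, 1], [0, 1])]
print(np.array_equal(outer, sliced))  # True
```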

1-D arrays in NumPy

As far as I know 1-D arrays are those arrays which either have just 1 column and any number of rows or vice versa.
If I run this code:
import numpy as np
a = np.arange(10).reshape(1,10)
b = np.arange(10).reshape(10,1)
print(a.ndim, b.ndim)
It returns that both are 2-D arrays.
Why? I know the computer is working fine, but can you please tell me what a 1-D array is?

A 1-D array is an array with just a single dimension. There are no columns or rows; it is just a number of values in a line, say a=[1,2,3,4,5,6]. The very concept of two separate dimensions, rows and columns, does not apply to a 1-D array. Hence when you defined your first array with .reshape(1,10), you gave it the dimensions 1 and 10. Thus, you actually defined a 2-D array of shape 1x10.
If you execute this code:
import numpy as np
a = np.arange(10).reshape(1,10)
b = np.arange(10).reshape(10,1)
print(a.ndim, b.ndim)
print(a)
print(b)
You will get this output:
2 2
[[0 1 2 3 4 5 6 7 8 9]]
[[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]]
This clearly shows that the array a has 2 dimensions, a row and a column, and hence is a 2-D array.

This .reshape(10,1) reshapes the array to a 2-d array with 10 rows and 1 column. However, if using .reshape(10) instead you will get a 1-d array.

The problem is the reshape: you wrote reshape(1,10). This means reshape the array into a 2-D matrix with 1 row and 10 columns. What you want is a 1-D array, so you need reshape(10).
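A small sketch printing ndim and shape for each variant discussed in these answers; ravel is mentioned as one common alternative to reshape(10):

```python
import numpy as np

flat = np.arange(10)                 # already 1-D
row = flat.reshape(1, 10)            # 2-D: 1 row, 10 columns
col = flat.reshape(10, 1)            # 2-D: 10 rows, 1 column
back = row.reshape(10)               # 1-D again (row.ravel() also works)

for name, arr in [('flat', flat), ('row', row), ('col', col), ('back', back)]:
    print(name, arr.ndim, arr.shape)
# flat 1 (10,)
# row 2 (1, 10)
# col 2 (10, 1)
# back 1 (10,)
```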

Do numpy 1D arrays follow row/column rules?

I have just started using numpy and I am getting confused about how to use arrays. I have seen several Stack Overflow answers on numpy arrays but they all deal with how to get the desired result (I know how to do this, I just don't know why I need to do it this way). The consensus that I've seen is that arrays are better than matrices because they are a more basic class and less restrictive. I understand you can transpose an array which to me means there is a distinction between a row and a column, but the multiplication rules all produce the wrong outputs (compared to what I am expecting).
Here is the test code I have written along with the outputs:
a = numpy.array([1,2,3,4])
print(a)
>>> [1 2 3 4]
print(a.T) # Transpose
>>> [1 2 3 4] # No apparent effect
b = numpy.array( [ [1], [2], [3], [4] ] )
print(b)
>>> [[1]
[2]
[3]
[4]] # Column (Expected)
print(b.T)
>>> [[1 2 3 4]] # Row (Expected, transpose seems to work here)
print((b.T).T)
>>> [[1]
[2]
[3]
[4]] # Column (All of these are as expected,
# unlike for declaring the array as a row vector)
# The following are element wise multiplications of a
print(a*a)
>>> [ 1 4 9 16]
print(a * a.T) # Row*Column
>>> [ 1 4 9 16] # Inner product scalar result expected
print(a.T * a) # Column*Row
>>> [ 1 4 9 16] # Outer product matrix result expected
print(b*b)
>>> [[1]
[4]
[9]
[16]] # Expected result, element wise multiplication in a column
print(b * b.T) # Column * Row (Outer product)
>>> [[ 1 2 3 4]
[ 2 4 6 8]
[ 3 6 9 12]
[ 4 8 12 16]] # Expected matrix result
print(b.T * (b.T)) # Column * Column (Doesn't make much sense so I expected elementwise multiplication
>>> [[ 1 4 9 16]]
print(b.T * (b.T).T) # Row * Column, inner product expected
>>> [[ 1 2 3 4]
[ 2 4 6 8]
[ 3 6 9 12]
[ 4 8 12 16]] # Outer product result
I know that I can use numpy.inner() and numpy.outer() to achieve the effect (that is not a problem), I just want to know if I need to keep track of whether my vectors are rows or columns.
I also know that I can create a 1D matrix to represent my vectors and the multiplication works as expected. I'm trying to work out the best way to store my data so that when I look at my code it is clear what is going to happen - right now the maths just looks confusing and wrong.
I only need to use 1D and 2D tensors for my application.

I'll try annotating your code
a = numpy.array([1,2,3,4])
print(a)
>>> [1 2 3 4]
print(a.T) # Transpose
>>> [1 2 3 4] # No apparent effect
a.shape will show (4,), and a.T.shape is the same. The transpose kept the same number of dimensions and performed the only meaningful transpose: no change. Making it (4,1) would have added a dimension and destroyed the a.T.T roundtrip.
b = numpy.array( [ [1], [2], [3], [4] ] )
print(b)
>>> [[1]
[2]
[3]
[4]] # Column (Expected)
print(b.T)
>>> [[1 2 3 4]] # Row (Expected, transpose seems to work here)
b.shape is (4,1), b.T.shape is (1,4). Note the extra set of []. If you'd created a as a = numpy.array([[1,2,3,4]]) its shape too would have been (1,4).
The easy way to make b would be b=np.array([[1,2,3,4]]).T (or b=np.array([1,2,3,4])[:,None] or b=np.array([1,2,3,4]).reshape(-1,1))
Compare this to MATLAB
octave:3> a=[1,2,3,4]
a =
1 2 3 4
octave:4> size(a)
ans =
1 4
octave:5> size(a.')
ans =
4 1
Even without the extra [], it has initialized the matrix as 2-D.
numpy has a matrix class that imitates MATLAB, dating from the time when MATLAB allowed only 2-D arrays.
In [75]: m=np.matrix('1 2 3 4')
In [76]: m
Out[76]: matrix([[1, 2, 3, 4]])
In [77]: m.shape
Out[77]: (1, 4)
In [78]: m=np.matrix('1 2; 3 4')
In [79]: m
Out[79]:
matrix([[1, 2],
[3, 4]])
I don't recommend using np.matrix unless it really adds something useful to your code.
Note that MATLAB talks of vectors, but they are really just matrices with only one non-singleton dimension.
# The following are element wise multiplications of a
print(a*a)
>>> [ 1 4 9 16]
print(a * a.T) # Row*Column
>>> [ 1 4 9 16] # Inner product scalar result expected
This behavior follows from a.T being the same as a. As you noted, * produces element-by-element multiplication, equivalent to MATLAB's .*. np.dot(a,a) gives the dot (or matrix) product of 2 arrays.
print(a.T * a) # Column*Row
>>> [ 1 4 9 16] # Outer product matrix result expected
No, it is still doing elementwise multiplication.
I'd use broadcasting, a[:,None]*a[None,:], to get the outer product. Octave added this in imitation of numpy; MATLAB has supported the same behavior, which it calls implicit expansion, since R2016b.
In the following * is always element by element multiplication. It's broadcasting that produces matrix/outer product results.
print(b*b)
>>> [[1]
[4]
[9]
[16]] # Expected result, element wise multiplication in a column
A (4,1) * (4,1)=>(4,1). Same shapes all around.
print(b * b.T) # Column * Row (Outer product)
>>> [[ 1 2 3 4]
[ 2 4 6 8]
[ 3 6 9 12]
[ 4 8 12 16]] # Expected matrix result
Here (4,1)*(1,4)=>(4,4). The two size-1 dimensions have been replicated, so it effectively becomes a (4,4)*(4,4). How would you replicate this in MATLAB with .*?
print(b.T * (b.T)) # Column * Column (Doesn't make much sense so I expected elementwise multiplication
>>> [[ 1 4 9 16]]
* is elementwise regardless of expectations. Think b' .* b' in MATLAB.
print(b.T * (b.T).T) # Row * Column, inner product expected
>>> [[ 1 2 3 4]
[ 2 4 6 8]
[ 3 6 9 12]
[ 4 8 12 16]] # Outer product result
Again * is elementwise; inner requires a summation in addition to multiplication. Here broadcasting again applies (1,4)*(4,1)=>(4,4).
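np.dot(b.T,b) (which returns [[30]]), np.trace(b.T*b) and np.sum(b*b) all give 30; np.dot(b,b) itself would fail, since (4,1) and (4,1) are not aligned for a matrix product.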
When I worked in MATLAB I frequently checked the size, and created test matrices that would catch dimension mismatches (e.g. a 2x3 instead of a 2x2 matrix). I continue to do that in numpy.
The key things are:
numpy arrays may be 1d (or even 0d)
A (4,) array is not exactly the same as a (4,1) or (1,4).
* is elementwise, always.
broadcasting usually accounts for outer-product-like behavior
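The key points above can be seen side by side in a short sketch. It uses the @ operator (matrix multiplication, available since Python 3.5) alongside np.outer; these are standard NumPy tools, not something the original answer used:

```python
import numpy as np

a = np.array([1, 2, 3, 4])            # shape (4,): neither row nor column

elementwise = a * a                   # [ 1  4  9 16]
inner = a @ a                         # 30, same as np.dot(a, a) or np.inner(a, a)
outer = a[:, None] * a[None, :]       # (4, 1) * (1, 4) broadcasts to (4, 4)

assert np.array_equal(outer, np.outer(a, a))
print(elementwise, inner, outer.shape)

# With explicit 2-D shapes, @ follows matrix rules
b = a[:, None]                        # column, shape (4, 1)
print((b.T @ b).shape, (b @ b.T).shape)   # (1, 1) (4, 4)
```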

"Transposing" is, from a numpy perspective, really only a meaningful concept for two-dimensional structures:
>>> import numpy
>>> arr = numpy.array([1,2,3,4])
>>> arr.shape
(4,)
>>> arr.transpose().shape
(4,)
So, if you want to transpose something, you'll have to make it two-dimensional:
>>> arr_2d = arr.reshape((4,1)) ## four rows, one column -> two-dimensional
>>> arr_2d.shape
(4, 1)
>>> arr_2d.transpose().shape
(1, 4)
Also, numpy.array(object, **kwargs) has a keyword argument ndmin; setting ndmin=2 prepends as many 1s to your shape as necessary:
>>> arr_ndmin = numpy.array([1,2,3,4],ndmin=2)
>>> arr_ndmin.shape
(1, 4)
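For comparison, a sketch of several equivalent ways to get an explicit row or column from a 1-D array; np.atleast_2d and None indexing are standard alternatives to ndmin, not part of the answer above:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Several ways to get an explicit row (1, 4)
print(np.array([1, 2, 3, 4], ndmin=2).shape)  # (1, 4)
print(np.atleast_2d(arr).shape)               # (1, 4)
print(arr[None, :].shape)                     # (1, 4)

# ...and an explicit column (4, 1), whose transpose is a row
col = arr[:, None]                            # same as arr.reshape(-1, 1)
print(col.shape, col.T.shape)                 # (4, 1) (1, 4)
```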

Yes, they do.
Your question is already answered, but if you are a MATLAB user, you may find this guide useful: Moving from MATLAB matrices to NumPy arrays

Develop Reference
