1-D arrays in NumPy - python

As far as I know 1-D arrays are those arrays which either have just 1 column and any number of rows or vice versa.
If I run this code:
import numpy as np
a = np.arange(10).reshape(1,10)
b = np.arange(10).reshape(10,1)
print(a.ndim, b.ndim)
It returns that both are 2-D arrays.
Why? I know the computer is working fine, but can you please tell me what a 1-D array is?

A 1-D array is an array with just a single dimension. There are no columns or rows; it simply holds a number of values in a line, for example a=[1,2,3,4,5,6]. The concept of two separate dimensions, rows and columns, does not apply to a 1-D array. Hence, when you defined your first array with .reshape(1,10), you gave it two dimensions, 1 and 10. You therefore defined a 2-D array of shape 1x10.
If you execute this code-
import numpy as np
a = np.arange(10).reshape(1,10)
b = np.arange(10).reshape(10,1)
print(a.ndim, b.ndim)
print(a)
print(b)
You will get this output-
2 2
[[0 1 2 3 4 5 6 7 8 9]]
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]
This shows that the array a has 2 dimensions, one row by ten columns, and hence is a 2-D array. The same holds for b, which has ten rows and one column.

This .reshape(10,1) reshapes the array to a 2-D array with 10 rows and 1 column. However, if you use .reshape(10) instead, you will get a 1-D array.

The problem is the reshape: you wrote reshape(1,10), which means "reshape the array into a 2-D matrix with 1 row and 10 columns". What you want is a 1-D array, so you need reshape(10).
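To make the difference concrete, here is a small sketch (the variable names are only illustrative) that compares ndim and shape for the plain 1-D array and the two reshaped 2-D versions, and shows two ways of getting back to one dimension.

```python
import numpy as np

v = np.arange(10)            # 1-D: a single axis of length 10
row = v.reshape(1, 10)       # 2-D: 1 row, 10 columns
col = v.reshape(10, 1)       # 2-D: 10 rows, 1 column

print(v.ndim, v.shape)       # 1 (10,)
print(row.ndim, row.shape)   # 2 (1, 10)
print(col.ndim, col.shape)   # 2 (10, 1)

# Either 2-D array can be turned back into a 1-D array.
flat_a = row.reshape(10)     # explicit target shape with a single axis
flat_b = col.ravel()         # flatten regardless of the current shape
print(flat_a.ndim, flat_b.ndim)  # 1 1
```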

Related

Indexing 2d array with 2d matrix specified row and column

I have a 6×6 matrix A, and I'm trying to index it using two 2×2 matrices B and C. Each row of B and C specifies a set of row and column indices in A: each row of B gives the rows to select, and the corresponding row of C gives the columns.
For example,
A = np.arange(0,36).reshape(6,6)
B = np.array([[0,1],
              [2,4]])
C = np.array([[1,2],
              [3,4]])
I need to get a 2×2×2 matrix like this:
results =
[[[ 1  2]
  [ 7  8]]
 [[15 16]
  [27 28]]]
If I only need one matrix, using indices such as B=[0,1] and C=[1,2], it can be done with:
d = A[B,:]
results = d[:,C]
But things are different when I need two 2×2 matrices (2×2×2), where each matrix is indexed using one row of B and the matching row of C.
p.s. Please change the title of this question if you can think of a more precise one.
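One possible approach, offered as a sketch rather than a definitive answer, is to let NumPy broadcast the two index arrays against each other. Adding a trailing axis to B and a middle axis to C (the names rows and cols below are only illustrative) pairs every row index in B[i] with every column index in C[i]:

```python
import numpy as np

A = np.arange(36).reshape(6, 6)
B = np.array([[0, 1],
              [2, 4]])
C = np.array([[1, 2],
              [3, 4]])

rows = B[:, :, None]   # shape (2, 2, 1)
cols = C[:, None, :]   # shape (2, 1, 2)

# Broadcasting gives results[i, j, k] == A[B[i, j], C[i, k]].
results = A[rows, cols]
print(results.shape)   # (2, 2, 2)
print(results)
```

For the arrays in the question this reproduces the requested blocks [[1, 2], [7, 8]] and [[15, 16], [27, 28]].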

Flatten Numpy 3D Array to 2D

Suppose I have a 3D numpy array with shape (10, 20, 3), representing an image with 10 rows and 20 columns, where the 3rd dimension contains an array of length 3 of either all zeros, or all ones, for example [0 0 0] or [1 1 1].
What numpy method would be most suitable to convert the 3D array to 2D, where the third dimension has been reduced to a single value of either 0 or 1, depending on whether the array was previously [0 0 0] or [1 1 1]?
The new shape should be (10, 20), where the value of each cell is either 0 or 1. So instead of the third dimension being an array, it becomes an integer.
I have had a look at reshape, and also flatten, however it looks like both of these methods maintain the same total number of 'cells' (i.e. 10 x 20 x 3 = 600). What I want is to reduce one dimension down to a single value so that the total number of cells is now 10 x 20 = 200.
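Since the three values of each pixel are identical, one option (a sketch, assuming the array really contains only all-zero or all-one triples) is to keep a single channel or to reduce along the last axis. The names img and mask and the test pattern are made up for illustration.

```python
import numpy as np

# Hypothetical image: shape (10, 20, 3), each pixel is [0 0 0] or [1 1 1].
mask = np.arange(200).reshape(10, 20) % 2        # made-up 0/1 pattern
img = np.repeat(mask[:, :, None], 3, axis=2)

first_channel = img[:, :, 0]   # keep one of the three identical values
reduced = img.max(axis=2)      # or reduce along the last axis

print(img.shape, first_channel.shape, reduced.shape)
# (10, 20, 3) (10, 20) (10, 20)
```

Both results have 200 cells instead of 600. Indexing with img[:, :, 0] returns a view and does no arithmetic, while a reduction such as max would also cope with triples that are not perfectly uniform.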

Why is slicing using "colon and comma" different than using a collection of indexes

Why is slicing using "colon and comma" different than using a collection of indexes?
Here is an example of what I expected to yield the same result, but it does not:
import numpy as np
a = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
print(a[[0,1],[0,1]])
# Output
# [[ 1  2  3]
#  [10 11 12]]
print(a[:,[0,1]])
# Output
# [[[ 1  2  3]
#   [ 4  5  6]]
#  [[ 7  8  9]
#   [10 11 12]]]
Why are they not equivalent?
In the first case, you are indexing the array a with 2 lists of the same length, which would be equivalent to indexing with 2 arrays of the same shape (see numpy docs on arrays as indices).
Therefore, the output is a[0,0] (which is the same as a[0,0,:]) and a[1,1], the elementwise combinations of the index arrays. This is expected to return an array of shape (2, 3): 2 because it is the length of the index arrays, and 3 because it is the length of the axis that is not indexed.
In the second case, however, the result is a[:,0] (equivalent to a[:,0,:]) and a[:,1]. Here the expected result is an array whose first and third dimensions match the original array, and whose second dimension is 2, the length of the index array (which here happens to equal the original size of the second axis).
To show that these two operations are not the same, we can try treating : as equivalent to a list of all indices along an axis, and apply that to the third axis. This results in:
print(a[[0,1],[0,1],[0,1,2]])
IndexError Traceback (most recent call last)
<ipython-input-8-110de8f5f6d8> in <module>()
----> 1 print(a[[0,1],[0,1],[0,1,2]])
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,) (3,)
That is because no elementwise combination of these index arrays is possible. In contrast, a[:,:,:] returns the whole array, and a[[0,1],[0,1],[0,2]] returns [ 1 12], which as expected is a one-dimensional array of length 2, the same length as the index arrays.
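The difference can also be seen by comparing result shapes. The sketch below additionally uses np.ix_, which builds index arrays that broadcast to every combination of the given indices; in this example that is the index-list counterpart of slicing with a colon. The variable names are only illustrative.

```python
import numpy as np

a = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])

paired = a[[0, 1], [0, 1]]           # picks a[0, 0] and a[1, 1]
sliced = a[:, [0, 1]]                # every block, rows 0 and 1
outer = a[np.ix_([0, 1], [0, 1])]    # all (block, row) combinations

print(paired.shape)                   # (2, 3)
print(sliced.shape)                   # (2, 2, 3)
print(np.array_equal(outer, sliced))  # True
```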

Perform function on multiple columns in python

I have a data array of 30 trials (columns), each of 256 data points (rows), and would like to run a wavelet transform (which requires a 1D array) on each column, with the eventual aim of obtaining the mean coefficients of the 30 trials.
Can someone point me in the right direction please?
If you have a multidimensional numpy array then you can use a for loop:
import numpy as np
A = np.array([[1,2,3], [4,5,6]])
# A is the matrix: 1 2 3
#                  4 5 6
for col in A.transpose():
    print("Column:", col)
    # Perform your wavelet transform here; you can save the
    # results to another multidimensional array.
This gives you access to each column as a 1D array.
Output:
Column: [1 4]
Column: [2 5]
Column: [3 6]
If you want to access the rows rather than the columns then loop through A rather than A.transpose().
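As an alternative to the explicit loop, np.apply_along_axis calls a function on every 1-D slice along a given axis. The sketch below assumes the data has shape (256, 30) and uses a made-up stand-in function, toy_transform, in place of the real wavelet transform; the random data is also only for illustration.

```python
import numpy as np

def toy_transform(signal):
    """Hypothetical stand-in for a wavelet transform: 1-D array in, 1-D array out."""
    return signal - signal.mean()

rng = np.random.default_rng(0)
data = rng.normal(size=(256, 30))   # 256 data points (rows) x 30 trials (columns)

# axis=0 hands each column to toy_transform as a 1-D array of length 256.
coeffs = np.apply_along_axis(toy_transform, 0, data)

# Average the per-trial results across the 30 trials.
mean_coeffs = coeffs.mean(axis=1)
print(coeffs.shape, mean_coeffs.shape)   # (256, 30) (256,)
```

This is mostly a convenience: apply_along_axis still calls the function once per column, so it is not expected to be faster than the loop.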

Understanding the syntax of numpy.r_() concatenation

I read the following in the numpy documentation for the function r_:
A string integer specifies which axis to stack multiple comma
separated arrays along. A string of two comma-separated integers
allows indication of the minimum number of dimensions to force each
entry into as the second integer (the axis to concatenate along is
still the first integer).
and they give this example:
>>> np.r_['0,2', [1,2,3], [4,5,6]] # concatenate along first axis, dim>=2
array([[1, 2, 3],
       [4, 5, 6]])
I don't follow. What exactly does the string '0,2' instruct numpy to do?
Other than the link above, is there another site with more documentation about this function?
'n,m' tells r_ to concatenate along axis=n, and produce a shape with at least m dimensions:
In [28]: np.r_['0,2', [1,2,3], [4,5,6]]
Out[28]:
array([[1, 2, 3],
       [4, 5, 6]])
So we are concatenating along axis=0, and we would normally therefore expect the result to have shape (6,), but since m=2, we are telling r_ that the shape must be at least 2-dimensional. So instead we get shape (2,3):
In [32]: np.r_['0,2', [1,2,3,], [4,5,6]].shape
Out[32]: (2, 3)
Look at what happens when we increase m:
In [36]: np.r_['0,3', [1,2,3,], [4,5,6]].shape
Out[36]: (2, 1, 3) # <- 3 dimensions
In [37]: np.r_['0,4', [1,2,3,], [4,5,6]].shape
Out[37]: (2, 1, 1, 3) # <- 4 dimensions
Anything you can do with r_ can also be done with one of the more readable array-building functions such as np.concatenate, np.row_stack, np.column_stack, np.hstack, np.vstack or np.dstack, though it may also require a call to reshape.
Even with the extra call to reshape, those other functions may be faster:
In [38]: %timeit np.r_['0,4', [1,2,3,], [4,5,6]]
10000 loops, best of 3: 38 us per loop
In [43]: %timeit np.concatenate(([1,2,3,], [4,5,6])).reshape(2,1,1,3)
100000 loops, best of 3: 10.2 us per loop
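For the example from the question, a quick check (a sketch; np.vstack is just one of the alternatives listed above, and the variable names are illustrative) shows the more explicit functions producing the same arrays as r_:

```python
import numpy as np

x, y = [1, 2, 3], [4, 5, 6]

via_r = np.r_['0,2', x, y]
via_vstack = np.vstack([x, y])      # stacks the 1-D inputs as rows
print(np.array_equal(via_r, via_vstack))   # True

via_r4 = np.r_['0,4', x, y]
via_concat = np.concatenate((x, y)).reshape(2, 1, 1, 3)
print(np.array_equal(via_r4, via_concat))  # True
```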
The paragraph that you've highlighted is the two comma-separated integers syntax which is a special case of the three comma-separated syntax. Once you understand the three comma-separated syntax the two comma-separated syntax falls into place.
The equivalent three comma-separated integers syntax for your example would be:
np.r_['0,2,-1', [1,2,3], [4,5,6]]
In order to provide a better explanation I will change the above to:
np.r_['0,2,-1', [1,2,3], [[4,5,6]]]
The above has two parts:
A comma-separated integer string
Two comma-separated arrays
The comma-separated arrays have the following shapes:
np.array([1,2,3]).shape
(3,)
np.array([[4,5,6]]).shape
(1, 3)
In other words the first 'array' is '1-dimensional' while the second 'array' is '2-dimensional'.
First the 2 in 0,2,-1 means that each array should be upgraded so that it's forced to be at least 2-dimensional. Since the second array is already 2-dimensional it is not affected. However the first array is 1-dimensional and in order to make it 2-dimensional np.r_ needs to add a 1 to its shape tuple to make it either (1,3) or (3,1). That is where the -1 in 0,2,-1 comes into play. It basically decides where the extra 1 needs to be placed in the shape tuple of the array. -1 is the default and places the 1 (or 1s if more dimensions are required) in the front of the shape tuple (I explain why further below). This turns the first array's shape tuple into (1,3) which is the same as the second array's shape tuple. The 0 in 0,2,-1 means that the resulting arrays need to be concatenated along the '0' axis.
Since both arrays now have a shape tuple of (1,3) concatenation is possible because if you set aside the concatenation axis (dimension 0 in the above example which has a value of 1) in both arrays the remaining dimensions are equal (in this case the value of the remaining dimension in both arrays is 3). If this was not the case then the following error would be produced:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Now if you concatenate two arrays having the shape (1,3) the resulting array will have shape (1+1,3) == (2,3) and therefore:
np.r_['0,2,-1', [1,2,3], [[4,5,6]]].shape
(2, 3)
When a 0 or a positive integer is used for the third integer in the comma-separated string, that integer determines the start of each array's shape tuple in the upgraded shape tuple (only for those arrays which need to have their dimensions upgraded). For example 0,2,0 means that for arrays requiring a shape upgrade the array's original shape tuple should start at dimension 0 of the upgraded shape tuple. For array [1,2,3] which has a shape tuple (3,) the 1 would be placed after the 3. This would result in a shape tuple equal to (3,1) and as you can see the original shape tuple (3,) starts at dimension 0 of the upgraded shape tuple. 0,2,1 would mean that for [1,2,3] the array's shape tuple (3,) should start at dimension 1 of the upgraded shape tuple. This means that the 1 needs to be placed at dimension 0. The resulting shape tuple would be (1,3).
When a negative number is used for the third integer in the comma-separated string, the integer following the negative sign determines where the original shape tuple should end. When the original shape tuple is (3,), 0,2,-1 means that the original shape tuple should end at the last dimension of the upgraded shape tuple. The 1 is therefore placed at dimension 0 of the upgraded shape tuple, which becomes (1,3). Now (3,) ends at dimension 1 of the upgraded shape tuple, which is also its last dimension (the original array is [1,2,3] and the upgraded array is [[1,2,3]]).
np.r_['0,2', [1,2,3], [4,5,6]]
Is the same as
np.r_['0,2,-1', [1,2,3], [4,5,6]]
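The effect of the third integer can be checked by comparing result shapes. In this sketch only the third integer changes; the inputs are the two 1-D lists from the question, and the variable names are illustrative.

```python
import numpy as np

x, y = [1, 2, 3], [4, 5, 6]

# Third integer 0: each (3,) array is upgraded to (3, 1) before concatenation.
shape_start0 = np.r_['0,2,0', x, y].shape
print(shape_start0)    # (6, 1)

# Third integer 1 or -1: each (3,) array is upgraded to (1, 3).
shape_start1 = np.r_['0,2,1', x, y].shape
shape_end = np.r_['0,2,-1', x, y].shape
print(shape_start1, shape_end)   # (2, 3) (2, 3)

# Leaving the third integer out gives the same result as passing -1.
same = np.array_equal(np.r_['0,2', x, y], np.r_['0,2,-1', x, y])
print(same)            # True
```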
Finally here's an example with more dimensions:
np.r_['2,4,1',[[1,2],[4,5],[10,11]],[7,8,9]].shape
(1, 3, 3, 1)
The comma-separated arrays are:
[[1,2],[4,5],[10,11]] which has shape tuple (3,2)
[7,8,9] which has shape tuple (3,)
Both of the arrays need to be upgraded to 4-dimensional arrays. The original array's shape tuples need to start from dimension 1.
Therefore for the first array the shape becomes (1,3,2,1) as 3,2 starts at dimension 1 and because two 1s need to be added to make it 4-dimensional one 1 is placed before the original shape tuple and one 1 after.
Using the same logic the second array's shape tuple becomes (1,3,1,1).
Now the two arrays need to be concatenated using dimension 2 as the concatenation axis. Eliminating dimension 2 from each array's upgraded shape tuple results in the tuple (1,3,1) for both arrays. As the resulting tuples are identical, the arrays can be concatenated, and the lengths along the concatenation axis are summed to produce (1, 3, 2+1, 1) == (1, 3, 3, 1).
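The walk-through above can be reproduced by hand. This sketch reshapes each input to the upgraded shape described in the text and concatenates along axis 2 with np.concatenate, then compares the result with what r_ returns; the names first, second, manual and via_r are only illustrative.

```python
import numpy as np

# Upgrade each input to 4 dimensions, with the original shape starting at dimension 1.
first = np.array([[1, 2], [4, 5], [10, 11]]).reshape(1, 3, 2, 1)   # (3, 2) -> (1, 3, 2, 1)
second = np.array([7, 8, 9]).reshape(1, 3, 1, 1)                   # (3,)   -> (1, 3, 1, 1)

manual = np.concatenate([first, second], axis=2)
via_r = np.r_['2,4,1', [[1, 2], [4, 5], [10, 11]], [7, 8, 9]]

print(manual.shape)                    # (1, 3, 3, 1)
print(np.array_equal(manual, via_r))   # True
```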
The string '0,2' tells numpy to concatenate along axis 0 (the first axis) and to wrap the elements in enough brackets to ensure a two-dimensional array. Consider the following results:
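import numpy as np
for axis in (0, 1):
    for minDim in (1, 2, 3):
        # Note: recent NumPy versions raise an AxisError for axis=1, minDim=1 (the inputs are only 1-D there); the output below comes from an older release.
        print(np.r_['{},{}'.format(axis, minDim), [1,2,30, 31], [4,5,6, 61], [7,8,90, 91], [10,11, 12, 13]], 'axis={}, minDim={}\n'.format(axis, minDim))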
[ 1  2 30 31  4  5  6 61  7  8 90 91 10 11 12 13] axis=0, minDim=1
[[ 1  2 30 31]
 [ 4  5  6 61]
 [ 7  8 90 91]
 [10 11 12 13]] axis=0, minDim=2
[[[ 1  2 30 31]]
 [[ 4  5  6 61]]
 [[ 7  8 90 91]]
 [[10 11 12 13]]] axis=0, minDim=3
[ 1  2 30 31  4  5  6 61  7  8 90 91 10 11 12 13] axis=1, minDim=1
[[ 1  2 30 31  4  5  6 61  7  8 90 91 10 11 12 13]] axis=1, minDim=2
[[[ 1  2 30 31]
  [ 4  5  6 61]
  [ 7  8 90 91]
  [10 11 12 13]]] axis=1, minDim=3
