How to understand numpy's combined slicing and indexing example - python

I am trying to understand how numpy combines slicing and indexing, but I am not sure how to reproduce the results below by hand. Working through them manually should show how numpy processes a combined slice and index, and which of the two is applied first:
>>> import numpy as np
>>> a=np.arange(12).reshape(3,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> i=np.array([[0,1],[2,2]])
>>> a[i,:]
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[ 8, 9, 10, 11]]])
>>> j=np.array([[2,1],[3,3]])
>>> a[:,j]
array([[[ 2, 1],
[ 3, 3]],
[[ 6, 5],
[ 7, 7]],
[[10, 9],
[11, 11]]])
>>> aj=a[:,j]
>>> aj.shape
(3L, 2L, 2L)
I am a bit confused about how aj's shape becomes (3,2,2) with the above output. Any detailed explanation would be much appreciated, thanks!

Whenever you use an array of indices, the result has the same shape as the indices; for example:
>>> x = np.ones(5)
>>> i = np.array([[0, 1], [1, 0]])
>>> x[i]
array([[ 1., 1.],
[ 1., 1.]])
We've indexed with a 2x2 array, and the result is a 2x2 array.
When combined with a slice, the size of the slice is preserved. For example:
>>> x = np.ones((5, 3))
>>> x[i, :].shape
(2, 2, 3)
Where the first example was a 2x2 array of items, this example is a 2x2 array of (length-3) rows.
The same is true when you switch the order of the slice:
>>> x = np.ones((5, 3))
>>> x[:, i].shape
(5, 2, 2)
This can be thought of as a list of five 2x2 arrays.
Just remember: when any dimension is indexed with a list or array, that dimension takes on the shape of the indices rather than its size in the input, while sliced dimensions keep their size.
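As a quick check of this rule on the arrays from the question, here is a minimal sketch (the names rows and cols are only illustrative):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [2, 2]])
j = np.array([[2, 1], [3, 3]])

# Indexing axis 0 with i: the axis of length 3 is replaced by i.shape.
rows = a[i, :]
# Indexing axis 1 with j: the axis of length 4 is replaced by j.shape.
cols = a[:, j]

print(rows.shape)  # (2, 2, 4)
print(cols.shape)  # (3, 2, 2)
```

In both cases the sliced axis keeps its length, and the indexed axis contributes the (2, 2) shape of the index array.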

a[:,j][0] is equivalent to a[0,j] or np.array([0, 1, 2, 3])[j], which gives you [[2, 1], [3, 3]]
a[:,j][1] is equivalent to a[1,j] or np.array([4, 5, 6, 7])[j], which gives you [[6, 5], [7, 7]]
a[:,j][2] is equivalent to a[2,j] or np.array([8, 9, 10, 11])[j], which gives you [[10, 9], [11, 11]]
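The row-by-row reading above can be sketched as follows. This is only a way of reproducing the result by hand, not a description of how numpy computes it internally, and by_hand is an illustrative name:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
j = np.array([[2, 1], [3, 3]])

# Index each row of a separately with j, then stack the three 2x2 results.
by_hand = np.stack([row[j] for row in a])

print(by_hand.shape)                   # (3, 2, 2)
print(np.array_equal(by_hand, a[:, j]))  # True
```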

Related

How do I delete the end element of a subarray?

So I've created a numpy array:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
I'm trying to delete the end element of this array's subarray:
a[0] = (a[0])[:-1]
And encounter this issue:
a[0] = (a[0])[:-1]
ValueError: could not broadcast input array from shape (2) into shape (3)
Why can't I change it?
How do I do it?
Given:
>>> a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
You can do:
>>> a[:,0:2]
array([[1, 2],
[4, 5],
[7, 8]])
Or:
>>> np.delete(a,2,1)
array([[1, 2],
[4, 5],
[7, 8]])
Then in either case, assign that back to a since the result is a new array.
So:
>>> a=a[:,0:2]
>>> a
array([[1, 2],
[4, 5],
[7, 8]])
If you wanted only to delete 3 in the first row, that is a different problem. You can only do that if you have an array of Python lists, since the sublists are not the same length.
Example (recent NumPy versions require dtype=object to be passed explicitly for ragged input):
>>> a = np.array([[1,2],[4,5,6],[7,8,9]], dtype=object)
>>> a
array([list([1, 2]), list([4, 5, 6]), list([7, 8, 9])], dtype=object)
If you do that, just stick to Python. You will have lost all the speed and other advantages of Numpy.
If by 'universal' you mean the last element of each row of an N x M array, just use .shape to find the dimensions:
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> a.shape
(3, 4)
>>> np.delete(a,a.shape[1]-1,1)
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]])
Or,
>>> a[:,0:a.shape[1]-1]
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]])
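Assuming the goal is simply to drop the last column, a negative slice avoids computing the column count at all. The sketch below also checks that the slice is a view of the original data while np.delete returns a copy (the names trimmed_view and trimmed_copy are illustrative):

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)

# Basic slicing: everything except the last column, returned as a view.
trimmed_view = a[:, :-1]
# np.delete with a negative index: same values, but a new array.
trimmed_copy = np.delete(a, -1, axis=1)

print(trimmed_view.shape)                  # (3, 3)
print(np.shares_memory(a, trimmed_view))   # True
print(np.shares_memory(a, trimmed_copy))   # False
```

Because the slice shares memory with a, writing into it also changes a; use .copy() or np.delete if an independent array is needed.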
>>> a = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> type(a)
<class 'numpy.ndarray'>
>>> a.shape
(3, 3)
The variable a is a matrix (2D array) with a fixed number of rows and columns, and in a matrix all rows must have the same length. In the above example, the matrix cannot be formed if the first row has length 2 and the others have length 3, so deleting the last element of only the first (or any other subset of) sub-array is not possible.
Instead, you have to delete the last element of all the sub-arrays at the same time.
That can be done as
>>> a[:,0:2]
array([[1, 2],
[4, 5],
[7, 8]])
Or,
>>> np.delete(a,2,1)
array([[1, 2],
[4, 5],
[7, 8]])
This also applies to elements at other positions. Any element of the sub-arrays can be deleted, as long as all the sub-arrays end up with the same length.
However, you can modify the last element (or any other) of any sub-array, as long as the shape remains constant.
>>> a[0][-1] = 19
>>> a
array([[ 1, 2, 19],
[ 4, 5, 6],
[ 7, 8, 9]])
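To see both cases side by side, here is a small sketch that reproduces the error from the question and then performs the shape-preserving assignment (error_message is an illustrative name):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Assigning a length-2 row into a length-3 slot changes the shape: rejected.
try:
    a[0] = a[0][:-1]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Overwriting a single element keeps the shape: allowed.
a[0, -1] = 19

print(error_message is not None)  # True
print(a.shape)                    # (3, 3)
```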
If you try to form a matrix with rows of unequal length, you get a 1D array of lists instead, on which NumPy operations such as vectorized arithmetic and column slicing do not work (only the list operations work). Recent NumPy versions require dtype=object to be passed explicitly in this case.
>>> b = np.array([[1,2,3],[1,2,3]])
>>> c = np.array([[1,2],[1,2,3]], dtype=object)
>>> b
array([[1, 2, 3],
[1, 2, 3]])
>>> b.shape
(2, 3)
>>> c
array([list([1, 2]), list([1, 2, 3])], dtype=object)
>>> c.shape
(2,)
>>> print(type(b),type(c))
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
Both are ndarrays, but you can see that the second variable c is a 1D array of lists.
>>> b+b
array([[2, 4, 6],
[2, 4, 6]])
>>> c+c
array([list([1, 2, 1, 2]), list([1, 2, 3, 1, 2, 3])], dtype=object)
The b+b operation performs element-wise addition of b with b, but c+c concatenates each pair of lists.
For further reference, see:
How to make a multidimension numpy array with a varying row size?
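If rows of different lengths are really needed, one option (an assumption about the use case, not something the original answers prescribe) is to keep a plain Python list of 1-D arrays, so each row can be resized independently while still supporting NumPy operations row by row:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Split the matrix into independent 1-D arrays, one per row.
rows = [row.copy() for row in a]

# Drop the last element of the first row only.
rows[0] = rows[0][:-1]

print([len(row) for row in rows])  # [2, 3, 3]
print(rows[0] * 2)                 # [2 4]
```

Operations across rows then need an explicit loop, which is the price of giving up the rectangular shape.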
Here is how to remove the last sub-array (row):
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
a = a[:-1]
print(a)
Output:
[[1 2 3]
[4 5 6]]

Sum each row of a numpy array with all rows of second numpy array (python)

I would like to know if there is a fast way to sum each row of a first array with all rows of a second array. Both arrays have the same number of columns. For instance, if array1.shape = (n,c) and array2.shape = (m,c), the resulting array would have array3.shape = ((n*m), c).
Look at the example below:
array1 = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
array2 = np.array([[0, 1, 2],
[3, 4, 5]])
The result would be:
array3 = np.array([[0, 2, 4],
[3, 5, 7],
[3, 5, 7],
[6, 8, 10],
[6, 8, 10],
[9, 11, 13]])
The only way I can see to do this is to repeat each row of one of the arrays as many times as the other array has rows, for instance with np.repeat(array1, len(array2), axis=0), and then add array2 to the result. However, this is not very practical if the number of rows is large. The other way would be a for loop, but that is too slow.
Is there a better way to do it?
Thanks in advance.
Extend array1 to 3D so that it becomes broadcastable against the 2D array2, perform the broadcasted addition, and then reshape to get the desired output:
In [30]: (array1[:,None,:] + array2).reshape(-1,array1.shape[1])
Out[30]:
array([[ 0, 2, 4],
[ 3, 5, 7],
[ 3, 5, 7],
[ 6, 8, 10],
[ 6, 8, 10],
[ 9, 11, 13]])
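To make the intermediate shapes explicit, here is the same computation split into steps, cross-checked against the repeat-based approach described in the question (expanded, pairwise and reference are illustrative names):

```python
import numpy as np

array1 = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])  # shape (n, c) = (3, 3)
array2 = np.array([[0, 1, 2], [3, 4, 5]])             # shape (m, c) = (2, 3)

expanded = array1[:, None, :]                    # shape (3, 1, 3)
pairwise = expanded + array2                     # broadcasts to (3, 2, 3)
array3 = pairwise.reshape(-1, array1.shape[1])   # shape (6, 3)

# Same result by explicitly repeating rows of array1 and tiling array2.
reference = (np.repeat(array1, len(array2), axis=0)
             + np.tile(array2, (len(array1), 1)))

print(expanded.shape, pairwise.shape, array3.shape)  # (3, 1, 3) (3, 2, 3) (6, 3)
print(np.array_equal(array3, reference))             # True
```

pairwise[i, k] holds array1[i] + array2[k], so the final reshape lists all sums for row 0 of array1 first, then row 1, and so on.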
You could try the following one-line list comprehension if you haven't already. It is simple to read, although it loops over the rows in Python, so the broadcasting approach is likely to be faster for large arrays.
>>> import numpy as np
>>> array1 = np.array([[0, 1, 2],
... [3, 4, 5],
... [6, 7, 8]])
>>>
>>> array2 = np.array([[0, 1, 2],
... [3, 4, 5]])
>>> array3 = np.array([i+j for i in array1 for j in array2])
>>> array3
array([[ 0, 2, 4],
[ 3, 5, 7],
[ 3, 5, 7],
[ 6, 8, 10],
[ 6, 8, 10],
[ 9, 11, 13]])
>>>
If you are looking for a speed-up through threading, you could consider using CUDA or multithreading. This suggestion goes a bit beyond the scope of your question, but it gives you an idea of what can be done to speed up matrix operations.

Numpy add (append) value to each row of 2-d array

I have a numpy array of floats with shape (x,14) and I would like to add one more value to the end of each "row" (a different value for each row), so that the end result has shape (x,15).
We can suppose that I have those values in some list, so that part of the question is also defined.
How can I do this with numpy functions?
Define a 2d array and a list:
In [73]: arr = np.arange(12).reshape(4,3)
In [74]: arr
Out[74]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
In [75]: alist = [10,11,12,13]
Note their shapes:
In [76]: arr.shape
Out[76]: (4, 3)
In [77]: np.array(alist).shape
Out[77]: (4,)
To join alist to arr it needs to have the same number of dimensions, and same number of 'rows'. We can do that by adding a dimension with the None idiom:
In [78]: np.array(alist)[:,None].shape
Out[78]: (4, 1)
Now we can concatenate on the 2nd axis:
In [79]: np.concatenate((arr, np.array(alist)[:,None]),axis=1)
Out[79]:
array([[ 0, 1, 2, 10],
[ 3, 4, 5, 11],
[ 6, 7, 8, 12],
[ 9, 10, 11, 13]])
column_stack does the same thing, taking care that each input is at least 2d (I'd suggest reading its code.) In the long run you should be familiar enough with dimensions and shapes to do this with plain concatenate.
In [81]: np.column_stack((arr, alist))
Out[81]:
array([[ 0, 1, 2, 10],
[ 3, 4, 5, 11],
[ 6, 7, 8, 12],
[ 9, 10, 11, 13]])
np.c_ also does this - but note the use of [] instead of (). It's a clever use of indexing notation, convenient, but potentially confusing.
np.c_[arr, alist]
np.r_['-1,2,0', arr, alist] # for more clever obscurity
You can use the numpy.insert function (https://numpy.org/doc/stable/reference/generated/numpy.insert.html):
a = np.array([[1, 1], [2, 2], [3, 3]])
np.insert(a, 2, 0, axis=1)
Output:
array([[1, 1, 0],
[2, 2, 0],
[3, 3, 0]])
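The example above inserts the same value into every row. For a different value per row, as the question asks, a sequence can be passed as the values argument. A minimal sketch, reusing the arr and alist from the earlier answer (with_insert and with_stack are illustrative names):

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)
alist = [10, 11, 12, 13]

# Insert alist as a new column at position arr.shape[1], i.e. after the last column.
with_insert = np.insert(arr, arr.shape[1], alist, axis=1)

# Cross-check against column_stack.
with_stack = np.column_stack((arr, alist))

print(with_insert.shape)                        # (4, 4)
print(np.array_equal(with_insert, with_stack))  # True
```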

Why does tensordot/reshape not agree with kron?

If I define an array X with shape (2, 2):
X = np.array([[1, 2], [3, 4]])
and take the Kronecker product, then reshape the output using
np.kron(X, X).reshape((2, 2, 2, 2))
I get a resulting matrix:
array([[[[ 1, 2],
[ 2, 4]],
[[ 3, 4],
[ 6, 8]]],
[[[ 3, 6],
[ 4, 8]],
[[ 9, 12],
[12, 16]]]])
However, when I use np.tensordot(X, X, axes=0) the following matrix is output
array([[[[ 1, 2],
[ 3, 4]],
[[ 2, 4],
[ 6, 8]]],
[[[ 3, 6],
[ 9, 12]],
[[ 4, 8],
[12, 16]]]])
which is different from the first output. Why is this the case? I found this while searching for answers, however I don't understand why that solution works or how to generalise to higher dimensions.
My first question is, why do you expect them to be same?
Let's do the kron without reshaping:
In [403]: X = np.array([[1, 2],
...: [3, 4]])
...:
In [404]: np.kron(X,X)
Out[404]:
array([[ 1, 2, 2, 4],
[ 3, 4, 6, 8],
[ 3, 6, 4, 8],
[ 9, 12, 12, 16]])
It's easy to visualize the action.
[X*1, X*2
X*3, X*4]
tensordot normally is thought of as a generalization of np.dot, able to handle more complex situations than the common matrix product (i.e. sum of products on one or more axes). But here there's no summing.
In [405]: np.tensordot(X,X, axes=0)
Out[405]:
array([[[[ 1, 2],
[ 3, 4]],
[[ 2, 4],
[ 6, 8]]],
[[[ 3, 6],
[ 9, 12]],
[[ 4, 8],
[12, 16]]]])
When axes is an integer rather than a tuple, the action is a little tricky to understand. The docs say:
``axes = 0`` : tensor product :math:`a\otimes b`
I tried to explain what happens when axes is a scalar (it's not trivial) in:
How does numpy.tensordot function works step-by-step?
Specifying axes=0 is equivalent to providing this tuple:
np.tensordot(X,X, axes=([],[]))
In any case it's evident from the output that this tensordot is producing the same numbers - but the layout is different from the kron.
I can replicate the kron layout with
In [424]: np.tensordot(X,X,axes=0).transpose(0,2,1,3).reshape(4,4)
Out[424]:
array([[ 1, 2, 2, 4],
[ 3, 4, 6, 8],
[ 3, 6, 4, 8],
[ 9, 12, 12, 16]])
That is, I swap the middle 2 axes.
And omitting the reshape, I get the same (2,2,2,2) you get from kron:
np.tensordot(X,X,axes=0).transpose(0,2,1,3)
I like the explicitness of np.einsum:
np.einsum('ij,kl->ijkl',X,X) # = tensordot(X,X,0)
np.einsum('ij,kl->ikjl',X,X) # = kron(X,X).reshape(2,2,2,2)
Or using broadcasting, the 2 products are:
X[:,:,None,None]*X[None,None,:,:] # tensordot 0
X[:,None,:,None]*X[None,:,None,:] # kron
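The same axis swap appears to carry over to 2D inputs of different shapes. The sketch below checks it for one pair of non-square matrices; A, B and their sizes are arbitrary choices for illustration, and it is a numerical check rather than a proof:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)     # shape (m, n)
B = np.arange(20).reshape(4, 5)    # shape (p, q)
m, n = A.shape
p, q = B.shape

outer = np.tensordot(A, B, axes=0)  # shape (m, n, p, q), element A[i, j] * B[k, l]
kron = np.kron(A, B)                # shape (m*p, n*q)

# tensordot -> kron: swap the middle axes, then merge (m, p) and (n, q).
kron_from_outer = outer.transpose(0, 2, 1, 3).reshape(m * p, n * q)
# kron -> tensordot: split into (m, p, n, q), then swap the middle axes back.
outer_from_kron = kron.reshape(m, p, n, q).transpose(0, 2, 1, 3)

print(np.array_equal(kron_from_outer, kron))    # True
print(np.array_equal(outer_from_kron, outer))   # True
```

The key point is that kron interleaves the row indices of A and B (and likewise the column indices), so its natural 4D shape is (m, p, n, q) rather than (m, n, p, q).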

How can I isolate rows in a 2d numpy matrix that match a specific criteria?

How can I isolate rows in a 2d numpy matrix that match a specific criterion? For example, if I have some data and I only want to look at the rows where the 0 index has a value of 5 or less, how would I retrieve those rows?
I tried this approach:
import numpy as np
data = np.matrix([
[10, 8, 2],
[1, 4, 5],
[6, 5, 7],
[2, 2, 10]])
#My attempt to retrieve all rows where index 0 is less than 5
small_data = (data[:, 0] < 5)
The output is:
matrix([
[False],
[ True],
[False],
[ True]], dtype=bool)
However I'd like the output to be:
[[1, 4, 5],
[2, 2, 10]]
Another approach may be for me to loop through the matrix rows and if the 0 index is smaller than 5 append the row to a list but I am hoping there is a better way than that.
Note: I'm using Python 2.7.
First: Don't use np.matrix, use normal np.arrays.
import numpy as np
data = np.array([[10, 8, 2],
[1, 4, 5],
[6, 5, 7],
[2, 2, 10]])
Then you can always use boolean indexing (based on the boolean array you get when you do comparisons) to get the desired rows:
>>> data[data[:, 0] < 5]
array([[ 1, 4, 5],
[ 2, 2, 10]])
or integer array indexing:
>>> data[np.where(data[:, 0] < 5)]
array([[ 1, 4, 5],
[ 2, 2, 10]])
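The two approaches differ only in the kind of index they build. A small sketch that makes the intermediate index arrays visible (mask and row_ids are illustrative names; np.flatnonzero is used here as a 1-D shortcut for the np.where call above):

```python
import numpy as np

data = np.array([[10, 8, 2],
                 [1, 4, 5],
                 [6, 5, 7],
                 [2, 2, 10]])

mask = data[:, 0] < 5            # 1-D boolean array, one entry per row
row_ids = np.flatnonzero(mask)   # integer positions of the True entries

print(mask.shape)   # (4,)
print(row_ids)      # [1 3]

# Both index styles select the same rows.
print(np.array_equal(data[mask], data[row_ids]))  # True
```

With np.matrix, data[:, 0] stays two-dimensional with shape (4, 1), which is why the mask in the question could not be used directly as a row selector.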
Alternatively, convert the matrix to a regular array first. The comparison then gives a 1-D logical array that you can use to select the desired rows.
>>> data = np.matrix([
... [10, 8, 2],
... [1, 4, 5],
... [6, 5, 7],
... [2, 2, 10]])
>>> data = np.array(data)
>>> data[(data[:, 0] < 5), :]
array([[ 1, 4, 5],
[ 2, 2, 10]])
You can also use np.squeeze to filter the rows.
>>> ind = np.squeeze(np.asarray(data [:,0]))<5
>>> data[ind,:]
array([[ 1, 4, 5],
[ 2, 2, 10]])
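Use the following code, which flattens the (4, 1) boolean mask small_data from the question into a 1-D mask before using it to index the rows:
data[np.asarray(small_data).ravel(), :]
That would work.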
