Unexpected result from Numpy Matrix insert, How does this work? - python

My goal was to insert a column to the right on a numpy matrix. However, I found that the code I was using is putting in two columns rather than just one.
# This one results in a 4x1 matrix, as expected
np.insert(np.matrix([[0],[0]]), 1, np.matrix([[0],[0]]), 0)
>>>matrix([[0],
[0],
[0],
[0]])
# I would expect this line to return a 2x2 matrix, but it returns a 2x3 matrix instead.
np.insert(np.matrix([[0],[0]]), 1, np.matrix([[0],[0]]), 1)
>>>matrix([[0, 0, 0],
[0, 0, 0]])
Why do I get the above, in the second example, instead of [[0,0], [0,0]]?

While new use of np.matrix is discouraged, we get the same result with np.array:
In [41]: np.insert(np.array([[1],[2]]),1, np.array([[10],[20]]), 0)
Out[41]:
array([[ 1],
[10],
[20],
[ 2]])
In [42]: np.insert(np.array([[1],[2]]),1, np.array([[10],[20]]), 1)
Out[42]:
array([[ 1, 10, 20],
[ 2, 10, 20]])
In [44]: np.insert(np.array([[1],[2]]),1, np.array([10,20]), 1)
Out[44]:
array([[ 1, 10],
[ 2, 20]])
Insert as [1]:
In [46]: np.insert(np.array([[1],[2]]),[1], np.array([[10],[20]]), 1)
Out[46]:
array([[ 1, 10],
[ 2, 20]])
In [47]: np.insert(np.array([[1],[2]]),[1], np.array([10,20]), 1)
Out[47]:
array([[ 1, 10, 20],
[ 2, 10, 20]])
np.insert is a complex function written in Python. So we need to look at that code, and see how values are being mapped on the target space.
The docs elaborate on the difference between inserting at 1 and at [1], but offhand I don't see an explanation of how the shape of values matters.
Difference between sequence and scalars (from the np.insert docs; the output below implies a = np.array([[1, 1], [2, 2], [3, 3]])):
>>> np.insert(a, [1], [[1],[2],[3]], axis=1)
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> np.array_equal(np.insert(a, 1, [1, 2, 3], axis=1),
... np.insert(a, [1], [[1],[2],[3]], axis=1))
True
When adding an array at the end of another, I'd use concatenate (or one of its stack variants) rather than insert. None of these operate in-place.
In [48]: np.concatenate([np.array([[1],[2]]), np.array([[10],[20]])], axis=1)
Out[48]:
array([[ 1, 10],
[ 2, 20]])
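To see the combinations discussed above side by side, here is a minimal sketch. The variable names (`scalar_2d`, `list_2d`, and so on) are only illustrative; the calls and shapes mirror the examples already shown in this answer.

```python
import numpy as np

a = np.array([[1], [2]])        # shape (2, 1)
col = np.array([[10], [20]])    # shape (2, 1), the column we want to add

# Scalar index + 2-D values: two columns are inserted (the surprising case).
scalar_2d = np.insert(a, 1, col, axis=1)

# Scalar index + 1-D values: a single column is inserted.
scalar_1d = np.insert(a, 1, col.ravel(), axis=1)

# List index + 2-D values: a single column is inserted.
list_2d = np.insert(a, [1], col, axis=1)

# concatenate takes no index; it just joins the arrays along the axis.
joined = np.concatenate([a, col], axis=1)

for name, result in [("scalar_2d", scalar_2d), ("scalar_1d", scalar_1d),
                     ("list_2d", list_2d), ("joined", joined)]:
    print(name, result.shape)
```

If the goal is simply "add this column on the right", `joined` is the least surprising of the four because the shape of the values is taken at face value.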

Related

Why is there a difference between the output shapes of these two NumPy slice commands?

The two commands below return arrays of different shapes. I would appreciate an explanation of why, and a pointer to a reference if there is one; I searched the internet but did not find a clear explanation.
data.shape
(11,2)
# outputs the values in column-0 in an (1x11) array.
data[:,0]
array([-7.24070e-01, -2.40724e+00, 2.64837e+00, 3.60920e-01,
6.73120e-01, -4.54600e-01, 2.20168e+00, 1.15605e+00,
5.06940e-01, -8.59520e-01, -5.99700e-01])
# outputs the values in column-0 in an (11x1) array
data[:,:-1]
array([[-7.24070e-01],
[-2.40724e+00],
[ 2.64837e+00],
[ 3.60920e-01],
[ 6.73120e-01],
[-4.54600e-01],
[ 2.20168e+00],
[ 1.15605e+00],
[ 5.06940e-01],
[-8.59520e-01],
[-5.99700e-01]])
I'll try to consolidate the comments into an answer.
First look at Python list indexing
In [92]: alist = [1,2,3]
selecting an item:
In [93]: alist[0]
Out[93]: 1
making a copy of the whole list:
In [94]: alist[:]
Out[94]: [1, 2, 3]
or a slice of length 2, or 1 or 0:
In [95]: alist[:2]
Out[95]: [1, 2]
In [96]: alist[:1]
Out[96]: [1]
In [97]: alist[:0]
Out[97]: []
Arrays follow the same basic rules
In [98]: x = np.arange(12).reshape(3,4)
In [99]: x
Out[99]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Select a row:
In [100]: x[0]
Out[100]: array([0, 1, 2, 3])
or a column:
In [101]: x[:,0]
Out[101]: array([0, 4, 8])
x[0,1] selects a single element.
https://numpy.org/doc/stable/user/basics.indexing.html#single-element-indexing
Indexing with a slice returns multiple rows:
In [103]: x[0:2]
Out[103]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
In [104]: x[0:1] # it retains the dimensions, even if only 1 (or even 0)
Out[104]: array([[0, 1, 2, 3]])
Likewise for columns:
In [106]: x[:,0:1]
Out[106]:
array([[0],
[4],
[8]])
subslices on both dimensions:
In [107]: x[0:2,1:3]
Out[107]:
array([[1, 2],
[5, 6]])
https://numpy.org/doc/stable/user/basics.indexing.html
x[[0]] also returns a 2d array, but that gets into "advanced" indexing (which doesn't have a list equivalent).
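As a compact summary of the rules above, the following sketch compares the shapes produced by an integer index, a slice, and a list index on the same column. The names `by_int`, `by_slice`, `by_list` and `restored` are just labels chosen for this example.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

by_int = x[:, 0]       # integer index: the column axis is dropped
by_slice = x[:, 0:1]   # slice: the column axis is kept, with length 1
by_list = x[:, [0]]    # list (advanced) index: axis kept, result is a copy
restored = x[:, 0][:, np.newaxis]   # re-add the dropped axis explicitly

print(by_int.shape, by_slice.shape, by_list.shape, restored.shape)

# A slice is a view on x, while advanced indexing makes a copy.
print(np.shares_memory(x, by_slice), np.shares_memory(x, by_list))
```

Applied to the question, `data[:,0]` drops the axis and `data[:,:-1]` keeps it, which is why the shapes differ.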

How does this NumPy advanced indexing code work?

I am learning the NumPy framework, and I don't understand this piece of code.
import numpy as np
a =np.array([[0,1,2],[3,4,5],[6,7,8],[9,10,11]])
print(a)
row = np.array([[0,0],[3,3]])
col = np.array([[0,2],[0,2]])
b = a[row,col]
print("This is b array:",b)
The b array contains the corner values of a, that is, b equals [[0,2],[9,11]].
When indexing is done using an array or "array-like", to access/modify the elements of an array, then it's called advanced indexing.
In [37]: a
Out[37]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
In [38]: row
Out[38]:
array([[0, 0],
[3, 3]])
In [39]: col
Out[39]:
array([[0, 2],
[0, 2]])
In [40]: a[row, col]
Out[40]:
array([[ 0, 2],
[ 9, 11]])
That's what you got. Below is an explanation:
Indices of a[row, col]:

    a[0, 0]   a[0, 2]
    a[3, 0]   a[3, 2]
      |  |      |  |
      |  +------|--+-- from the column-idx array (col)
      +---------+----- from the row-idx array (row)
You're indexing a using two equally shaped 2d-arrays, hence your output array will also have the same shape as col and row. To better understand how array indexing works you can check the docs, where, as shown, indexing with 1d-arrays over the existing axes of a given array works as follows:
result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M],
..., ind_N[i_1, ..., i_M]]
The same logic applies when indexing with 2d-arrays over each axis; the result then takes the (broadcast) shape of the index arrays.
So going back to your example you are essentially selecting from the rows of a based on row, and from those rows you are selecting some columns col. You might find it more intuitive to translate the row and column indices into (x,y) coordinates:
(0,0), (0,2)
(3,0), (3,2)
Which, by accordingly selecting from a, results in the output array:
print(a[row,col])
array([[ 0, 2],
[ 9, 11]])
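The rule result[i, j] == a[row[i, j], col[i, j]] can be checked by building the result one element at a time. This is only an illustrative sketch; `manual` and `fancy` are names made up for the comparison.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
row = np.array([[0, 0], [3, 3]])
col = np.array([[0, 2], [0, 2]])

# Build the result element by element, following
# result[i, j] == a[row[i, j], col[i, j]].
manual = np.empty(row.shape, dtype=a.dtype)
for i in range(row.shape[0]):
    for j in range(row.shape[1]):
        manual[i, j] = a[row[i, j], col[i, j]]

fancy = a[row, col]   # the same selection done in one step
print(np.array_equal(manual, fancy))
```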
Trying a few more examples makes the pattern easier to see.
If you have one dimensional index:
In [58]: np.arange(10)[np.array([1,3,4,6])]
Out[58]: array([1, 3, 4, 6])
In case of two dimensional index:
In [57]: np.arange(10)[np.array([[1,3],[4,6]])]
Out[57]:
array([[1, 3],
[4, 6]])
If you use 3 dimensional index:
In [59]: np.arange(10)[np.array([[[1],[3]],[[4],[6]]])]
Out[59]:
array([[[1],
[3]],
[[4],
[6]]])
As you can see, whatever nesting you put in the index array shows up in the shape of the output as well.
Proceeding by steps:
import numpy as np
a = np.array([[0,1,2],[3,4,5],[6,7,8],[9,10,11]])
print(a)
gives 2d array a:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Then:
row = np.array([[0,0],[3,3]])
assigns to 2d array row values [0,0] and [3,3]:
array([[0, 0],
[3, 3]])
Then:
col = np.array([[0,2],[0,2]])
assigns to 2d array col values [0,2] and [0,2]:
array([[0, 2],
[0, 2]])
Finally:
b = a[row,col]
assigns to b values given by a[0,0], a[0,2] for the first row, a[3,0], a[3,2] for the second row, that is:
array([[ 0, 2],
[ 9, 11]])
Where does b[0,0] <-- a[0,0] come from? It comes from the combination of row[0,0] which is 0 and col[0,0] which is 0.
What about b[0,1] <-- a[0,2]? It comes from the combination of row[0,1] which is 0 and col[0,1] which is 2.
And so forth.
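Because index arrays are broadcast against each other, the same corners can be selected without writing out both 2d index arrays in full. The sketch below assumes a column-shaped row index and a flat column index; `corners` and `same` are illustrative names.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])

row = np.array([[0], [3]])   # shape (2, 1): one row index per output row
col = np.array([0, 2])       # shape (2,): one column index per output column

corners = a[row, col]               # index arrays broadcast to shape (2, 2)
same = a[np.ix_([0, 3], [0, 2])]    # np.ix_ builds an equivalent pair of index arrays

print(corners)
print(np.array_equal(corners, same))
```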

How to set individual indices in Numpy arrays

I am trying to use arrays to set values in other arrays. Unfortunately instead of setting a value it is somehow overwriting a bunch of values. What is going on, and how can I achieve what I want?
>>> target = np.array( [ [0,1],[1,2],[2,3] ])
>>> target
array([[0, 1],
[1, 2],
[2, 3]])
>>> actions = np.array([0,0,0])
>>> target[actions] #The first row, 3 times
array([[0, 1],
[0, 1],
[0, 1]])
>>> target[:,actions] #The first column, 3 times
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
>>> values = np.array([7,8,9])
>>> target[:,actions] = values #why isnt this working?
>>> target
array([[9, 1],
[9, 2],
[9, 3]])
#Actually want
#array([[7, 1],
# [8, 2],
# [9, 3]])
>>> target = np.array( [ [0,1],[1,2],[2,3] ]) #reset to original value
>>> actions = np.array([0,1,0])
>>> target[:,actions] = values.reshape(3, 1)
>>> target
array([[7, 7],
[8, 8],
[9, 9]])
#Actually want
#array([[7, 1],
# [1, 8],
# [9, 3]])
target[:,actions] selects the same column of target thrice.
When you say target[:,actions] = values, what you are doing is:
Assign 7 to all the values in the column, three times.
Assign 8 to all the values in the column, three times.
Assign 9 to all the values in the column, three times.
So you end up with 9 in all the values in the column.
If you insist on this awkward triple-writing of data, you can fix it by transposing the write:
target[:,actions] = values.reshape(3, 1)
This will write [7,8,9] to the column, three times. Obviously that's wasteful, and you could do this instead:
target[:,actions[-1]] = values
The effect should be the same, and it saves computation.
2 ways to write [7,8,9] to the first column:
basic indexing (with slice):
In [396]: target[:,0] = [7,8,9] # all rows, 1st column
In [397]: target
Out[397]:
array([[7, 1],
[8, 2],
[9, 3]])
Advanced indexing (with 2 lists)
In [398]: target[[0,1,2],[0,0,0]] = [7,8,9] # pair [0,0],[1,0],[2,0]
In [399]: target
Out[399]:
array([[7, 1],
[8, 2],
[9, 3]])
The 2nd method also works for a mix of columns:
In [400]: target = np.array( [ [0,1],[1,2],[2,3] ])
In [401]: target[[0,1,2],[0,1,0]] = [7,8,9]
In [402]: target
Out[402]:
array([[7, 1],
[1, 8],
[9, 3]])
Broadcasting comes into play. In a case like this there are 3 potential arrays to broadcast: the index arrays for the 2 dimensions and the source array.
Advanced indexing like this produces a 1d array. So the source array has to match:
In [403]: target[[0,1,2],[0,1,0]]
Out[403]: array([7, 8, 9])
A (1,3) can broadcast to (3,), but a (3,1) can't:
In [404]: target[[0,1,2],[0,1,0]] = np.array([[7,8,9]])
In [405]: target[[0,1,2],[0,1,0]] = np.array([[7,8,9]]).T
...
ValueError: shape mismatch: value array of shape (3,1) could not be broadcast to indexing result of shape (3,)
This sort of indexing is unusual. Note that the result is (3,3).
In [412]: target[:,[0,0,0]]
Out[412]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
A (3,1) source:
In [413]: np.array([[7,8,9]]).T
Out[413]:
array([[7],
[8],
[9]])
In [414]: target[:,[0,0,0]] = _
In [415]: target
Out[415]:
array([[7, 1],
[8, 2],
[9, 3]])
The (3,1) can broadcast to (3,3). It works, but ends up assigning [7,8,9] 3 times, all to the same 0 column.
Another way of assigning the 1st column:
In [423]: target[np.ix_([0,1,2],[0,0,0])]
Out[423]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
Again a (3,3), which accepts a (3,1):
In [424]: target[np.ix_([0,1,2],[0,0,0])] = np.array([[7,8,9]]).T
In [425]: target
Out[425]:
array([[7, 1],
[8, 2],
[9, 3]])
ix_ makes 2 arrays that can broadcast against each other, in this case a column vector and a row one:
In [426]: np.ix_([0,1,2],[0,0,0])
Out[426]:
(array([[0],
[1],
[2]]), array([[0, 0, 0]]))
I can select all elements of target with:
In [430]: target[np.ix_([0,1,2],[0,1])]
Out[430]:
array([[0, 1],
[1, 2],
[2, 3]])
and in a jumbled order:
In [431]: target[np.ix_([2,0,1],[1,0])]
Out[431]:
array([[3, 2],
[1, 0],
[2, 1]])
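To make the shapes that np.ix_ produces explicit, here is a small sketch that also compares the result with chained indexing (rows first, then columns). The names `row_idx`, `col_idx`, `block` and `chained` are chosen only for this example.

```python
import numpy as np

target = np.array([[0, 1], [1, 2], [2, 3]])
rows, cols = [2, 0, 1], [1, 0]

row_idx, col_idx = np.ix_(rows, cols)
print(row_idx.shape, col_idx.shape)   # a column vector and a row vector

block = target[row_idx, col_idx]      # broadcasts to a (3, 2) selection
chained = target[rows][:, cols]       # pick rows first, then columns

print(np.array_equal(block, chained))
```

The chained form is fine for reading, but it is not suitable for assignment: `target[rows]` is a copy, so writing into `target[rows][:, cols]` would not change `target`.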
I couldn't get it to work using : indexing; however, the following works by using an array of row indices. I'm not sure why the : method does not work. If someone can come up with a way to fix that, I will accept it instead.
>>> target = np.array( [ [0,1],[1,2],[2,3] ])
>>> rows = np.arange(target.shape[0])
>>> actions = np.array([0,1,0])
>>> values = np.array([7,8,9])
>>> target[rows,actions] = values
>>> target
array([[7, 1],
[1, 8],
[9, 3]])
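The difference between the two forms shows up in the shape of the selection: the slice pairs every row with every entry of actions, while the rows array pairs them one to one. The sketch below prints both shapes and, as an alternative not used elsewhere in this thread, does the same per-row write with np.put_along_axis.

```python
import numpy as np

target = np.array([[0, 1], [1, 2], [2, 3]])
actions = np.array([0, 1, 0])
values = np.array([7, 8, 9])
rows = np.arange(target.shape[0])

print(target[:, actions].shape)      # every row crossed with every action
print(target[rows, actions].shape)   # one element per row

# Per-row write: indices and values are given as column vectors so that
# row i receives values[i] at column actions[i].
np.put_along_axis(target, actions[:, np.newaxis], values[:, np.newaxis], axis=1)
print(target)
```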

Inserting a row into a NumPy array

I have :
A = np.array([[0,1,1],[0,3,2],[1,1,1],[1,5,2]])
where the NumPy array is sorted based on first element and then second element and so on.
I want to insert [1,4,10] into A, such that the output would be:
A = array([[0,1,1],[0,3,2],[1,1,1],[1,4,10],[1,5,2]])
How should I do it?
First off, stack the new 1D array as the last row with np.vstack -
B = np.vstack((A,[1,4,10]))
Now, for maintaining the precedence order of considering first and then second and so on elements for each row, assume each row as an indexing tuple and then get the sorted indices. This could be achieved with np.ravel_multi_index(B.T,B.max(0)+1). Then, use these indices for rearranging rows of B and have the desired output. Thus, the final code would be -
out = B[np.ravel_multi_index(B.T,B.max(0)+1).argsort()]
It seems there's an alternative with np.lexsort to get sorted indices that respect that precedence, but it applies the precedence in the opposite sense (the last key is the primary one). So, we need to reverse the order of elements row-wise, use lexsort and then get the sorted indices. These indices could then be used for indexing into B just like in the previous approach and get us the output. So, the alternative final code with np.lexsort would be -
out = B[np.lexsort(B[:,::-1].T)]
Sample run -
In [60]: A
Out[60]:
array([[0, 1, 1],
[0, 3, 2],
[1, 1, 1],
[1, 5, 2]])
In [61]: B = np.vstack((A,[1,4,10]))
In [62]: B
Out[62]:
array([[ 0, 1, 1],
[ 0, 3, 2],
[ 1, 1, 1],
[ 1, 5, 2],
[ 1, 4, 10]]) # <= New row
In [63]: B[np.ravel_multi_index(B.T,B.max(0)+1).argsort()]
Out[63]:
array([[ 0, 1, 1],
[ 0, 3, 2],
[ 1, 1, 1],
[ 1, 4, 10], # <= New row moved here
[ 1, 5, 2]])
In [64]: B[np.lexsort(B[:,::-1].T)]
Out[64]:
array([[ 0, 1, 1],
[ 0, 3, 2],
[ 1, 1, 1],
[ 1, 4, 10], # <= New row moved here
[ 1, 5, 2]])
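Since A is already sorted, another option is to locate the insertion position and insert there instead of re-sorting everything. This is a sketch of a different technique from the two above: it relies on Python lists comparing lexicographically, and uses the standard-library bisect module together with np.insert. It also does not depend on the values being non-negative integers, which np.ravel_multi_index requires.

```python
import bisect
import numpy as np

A = np.array([[0, 1, 1], [0, 3, 2], [1, 1, 1], [1, 5, 2]])
new_row = [1, 4, 10]

# Python lists compare element by element, which matches the ordering
# of the rows of A (first column, then second, and so on).
pos = bisect.bisect_left(A.tolist(), new_row)

# Insert the new row before position pos; A itself is not modified.
out = np.insert(A, pos, new_row, axis=0)
print(pos)
print(out)
```

Converting to a list costs a full pass over A, so for very large arrays the sorting approaches may still be preferable.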

How to add items into a numpy array

I need to accomplish the following task:
from:
a = array([[1,3,4],[1,2,3]...[1,2,1]])
(add one element to each row) to:
a = array([[1,3,4,x],[1,2,3,x]...[1,2,1,x]])
I have tried doing stuff like a[n] = array([1,3,4,x])
but numpy complained of shape mismatch. I tried iterating through a and appending element x to each item, but the changes are not reflected.
Any ideas on how I can accomplish this?
Appending data to an existing array is a natural thing to want to do for anyone with python experience. However, if you find yourself regularly appending to large arrays, you'll quickly discover that NumPy doesn't easily or efficiently do this the way a python list will. You'll find that every "append" action requires re-allocation of the array memory and short-term doubling of memory requirements. So, the more general solution to the problem is to try to allocate arrays to be as large as the final output of your algorithm. Then perform all your operations on sub-sets (slices) of that array. Array creation and destruction should ideally be minimized.
That said, it's often unavoidable, and the functions that do this are:
for 2-D arrays:
np.hstack
np.vstack
np.column_stack
np.row_stack (an alias of np.vstack; deprecated in NumPy 2.0)
for 3-D arrays (the above plus):
np.dstack
for N-D arrays:
np.concatenate
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
b = np.array([10,20,30])
c = np.hstack((a, np.atleast_2d(b).T))
returns c:
array([[ 1, 3, 4, 10],
[ 1, 2, 3, 20],
[ 1, 2, 1, 30]])
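The preallocation advice from the start of this answer can be sketched as follows, assuming the final shape is known up front. The name `out` and the fixed sizes are illustrative only.

```python
import numpy as np

a = np.array([[1, 3, 4], [1, 2, 3], [1, 2, 1]])
b = np.array([10, 20, 30])

# Allocate the final shape once, then fill slices of it in place.
n_rows, n_cols = a.shape[0], a.shape[1] + 1
out = np.empty((n_rows, n_cols), dtype=a.dtype)

out[:, :a.shape[1]] = a   # existing columns
out[:, -1] = b            # the extra column

print(out)
```

The result is the same as the hstack call, but no intermediate array is created when further columns are filled into an already allocated `out`.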
One way to do it (may not be the best) is to create another array with the new elements and do column_stack. i.e.
>>>a = array([[1,3,4],[1,2,3]...[1,2,1]])
[[1 3 4]
[1 2 3]
[1 2 1]]
>>>b = array([1,2,3])
>>>column_stack((a,b))
array([[1, 3, 4, 1],
[1, 2, 3, 2],
[1, 2, 1, 3]])
Appending a single scalar can be done a bit more easily than already shown (and without converting to float) by expanding the scalar into a nested Python list:
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
x = 10
b = np.hstack ((a, [[x]] * len (a) ))
returns b as:
array([[ 1, 3, 4, 10],
[ 1, 2, 3, 10],
[ 1, 2, 1, 10]])
Appending a row could be done by:
c = np.vstack ((a, [x] * len (a[0]) ))
returns c as:
array([[ 1, 3, 4],
[ 1, 2, 3],
[ 1, 2, 1],
[10, 10, 10]])
np.insert can also be used for the purpose
import numpy as np
a = np.array([[1, 3, 4],
[1, 2, 3],
[1, 2, 1]])
x = 5
index = 3 # the position for x to be inserted before
np.insert(a, index, x, axis=1)
array([[1, 3, 4, 5],
[1, 2, 3, 5],
[1, 2, 1, 5]])
index can also be a list/tuple
>>> index = [1, 1, 3] # equivalently (1, 1, 3)
>>> np.insert(a, index, x, axis=1)
array([[1, 5, 5, 3, 4, 5],
[1, 5, 5, 2, 3, 5],
[1, 5, 5, 2, 1, 5]])
or a slice
>>> index = slice(0, 3)
>>> np.insert(a, index, x, axis=1)
array([[5, 1, 5, 3, 5, 4],
[5, 1, 5, 2, 5, 3],
[5, 1, 5, 2, 5, 1]])
If x is just a single scalar value, you could try something like this to ensure the correct shape of the array that is being appended/concatenated to the rightmost column of a:
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
x = 10
b = np.hstack((a,x*np.ones((a.shape[0],1))))
returns b as:
array([[ 1., 3., 4., 10.],
[ 1., 2., 3., 10.],
[ 1., 2., 1., 10.]])
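Note that np.ones produces floats, which is why b above ends up as a float array. If the integer dtype should be kept, one possible variation (a sketch, not part of the original answer) is to build the extra column with np.full and the dtype of a:

```python
import numpy as np

a = np.array([[1, 3, 4], [1, 2, 3], [1, 2, 1]])
x = 10

# Column filled with x, using the same dtype as a to avoid upcasting.
extra = np.full((a.shape[0], 1), x, dtype=a.dtype)
b = np.hstack((a, extra))

print(b)
print(b.dtype == a.dtype)
```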
def append_to_rows(a, x):
    target = []
    for line in a.tolist():
        line.append(x)  # list.append works in place and returns None
        target.append(line)
    return np.array(target)
