Numpy extract submatrix - python

I'm pretty new to NumPy and am having a hard time understanding how to extract a submatrix with specified rows and columns from an np.array:
Y = np.arange(16).reshape(4,4)
If I want to extract columns/rows 0 and 3, I should have:
[[0 3]
[12 15]]
I tried all the reshape functions...but cannot figure out how to do this. Any ideas?

Give np.ix_ a try:
Y[np.ix_([0,3],[0,3])]
This returns your desired result:
In [25]: Y = np.arange(16).reshape(4,4)
In [26]: Y[np.ix_([0,3],[0,3])]
Out[26]:
array([[ 0, 3],
[12, 15]])
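To see why this works, it may help to look at what np.ix_ hands back. The following is only a small sketch; the names rows, cols and sub are made up for illustration:

```python
import numpy as np

Y = np.arange(16).reshape(4, 4)

# np.ix_ turns the two index lists into arrays shaped to broadcast against each other
rows, cols = np.ix_([0, 3], [0, 3])
print(rows.shape, cols.shape)  # (2, 1) (1, 2)

# Indexing with the pair selects every (row, column) combination
sub = Y[rows, cols]
print(sub)
```

So Y[np.ix_(r, c)] is shorthand for building a column-shaped row index and a row-shaped column index yourself.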

One solution is to index the rows/columns by slicing/striding. Here's an example where you extract every third row and column, starting from the first (i.e. the first and fourth rows and columns):
In [1]: import numpy as np
In [2]: Y = np.arange(16).reshape(4, 4)
In [3]: Y[0:4:3, 0:4:3]
Out[3]: array([[ 0, 3],
[12, 15]])
This gives you the output you were looking for.
For more info, check out this page on indexing in NumPy.
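One difference worth knowing about: a strided slice only works when the wanted indices are evenly spaced, and it returns a view into Y rather than a copy. A quick check, as a sketch (the names sliced and fancy are illustrative):

```python
import numpy as np

Y = np.arange(16).reshape(4, 4)

sliced = Y[0:4:3, 0:4:3]            # basic slicing: a view into Y
fancy = Y[np.ix_([0, 3], [0, 3])]   # index arrays: an independent copy

print(np.shares_memory(Y, sliced))  # True
print(np.shares_memory(Y, fancy))   # False
```

Writing into sliced would therefore change Y, while writing into fancy would not.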

print(Y[0:4:3, 0:4:3])
is the shortest and most appropriate fix.

First of all, your Y only has 4 columns and rows, so there is no column 4 or row 4; the highest index is 3.
To get rows 0 and 3: Y[[0,3],:]
To get columns 0 and 3: Y[:,[0,3]]
So to get the array you request: Y[[0,3],:][:,[0,3]]
Note that if you just use Y[[0,3],[0,3]] it is equivalent to [Y[0,0], Y[3,3]] and the result will have two elements: array([ 0, 15])

You can also do this using:
Y[[[0],[3]],[0,3]]
which is equivalent to doing this using indexing arrays:
idx = np.array((0,3)).reshape(2,1)
Y[idx,idx.T]
To make the broadcasting work as desired, you need the non-singleton dimension of your indexing array to be aligned with the axis you're indexing into, e.g. for an n x m 2D subarray:
Y[<n x 1 array>,<1 x m array>]
This doesn't create an intermediate array, unlike CT Zhu's answer, which creates the intermediate array Y[(0,3),:], then indexes into it.
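Here is a small sketch of the n x m case with different numbers of rows and columns (2 rows, 3 columns); row_idx and col_idx are illustrative names only:

```python
import numpy as np

Y = np.arange(16).reshape(4, 4)

row_idx = np.array([0, 3]).reshape(-1, 1)     # shape (2, 1): varies along axis 0
col_idx = np.array([0, 1, 2]).reshape(1, -1)  # shape (1, 3): varies along axis 1

# The two index arrays broadcast to shape (2, 3), one entry per (row, column) pair
sub = Y[row_idx, col_idx]
print(sub.shape)  # (2, 3)
print(sub)
```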

This can also be done by chaining two indexing steps: Y[[0,3],:][:,[0,3]]. More generally, it is possible to select from arrays (or even reorder them) by given sets of indices for rows, columns, pages, et cetera:
r=np.array([0,3])
c=np.array([0,3])
print(Y[r,:][:,c]) #>>[[ 0 3][12 15]]
For reordering, try this:
r=np.array([0,3])
c=np.array([3,0])
print(Y[r,:][:,c])#>>[[ 3 0][15 12]]

Related

iterating a filtered Numpy array whilst maintaining index information

I am attempting to pass filtered values from a Numpy array into a function.
I need to pass only the values above a certain threshold, together with their index positions within the NumPy array.
I am trying to avoid iterating over the entire array in Python by using NumPy's own filtering; the arrays I am dealing with hold about 20k values, of which potentially only very few are relevant.
import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = np.nonzero(somearray > 4)
for i in arrayindex:
somefunction(arrayindex[0], somearray[arrayindex[0]])
This threw up errors of logic not being able to handle multiple values,
this led me to testing it through print statement to see what was going on.
for cell in arrayindex:
print(f"index {cell}")
print(f"data {somearray[cell]}")
I expected an output of
index 4
data 5
index 5
data 6
But instead i get
index [4 5]
data [5 6]
I have looked at different methods for iterating through NumPy arrays, such as nditer, but none seem to let me filter the values outside of the for loop.
Is there a solution to my quandary?
Oh, I am aware that it is generally frowned upon to loop through a NumPy array; however, the function that I am passing these values to is complex, triggering certain events and uploading data to a database depending on the data's location within the array.
Thanks.
import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = [idx for idx, val in enumerate(somearray) if val > 4]
for i in range(0, len(arrayindex)):
somefunction(arrayindex[i], somearray[arrayindex[i]])
for i in range(0, len(arrayindex)):
print("index", arrayindex[i])
print("data", somearray[arrayindex[i]])
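If you would rather let NumPy do the filtering (the list comprehension above still walks the whole array in Python), one possible variant for the 1d case uses np.flatnonzero. This is a sketch only; somefunction here is a hypothetical stand-in for the asker's real function, and hits is an illustrative name:

```python
import numpy as np

def somefunction(index, value):
    # hypothetical placeholder for the real processing function
    print(f"index {index}")
    print(f"data {value}")

somearray = np.array([1, 2, 3, 4, 5, 6])

# flatnonzero returns a plain 1d array of matching positions (no tuple wrapper)
hits = np.flatnonzero(somearray > 4)

# Only the matching entries are visited in the Python loop
for i, value in zip(hits, somearray[hits]):
    somefunction(i, value)
```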
You need to have a clear idea of what nonzero produces, and pay attention to the difference between indexing with a list(s) and with a tuple.
===
In [110]: somearray = np.array([1,2,3,4,5,6])
...: arrayindex = np.nonzero(somearray > 4)
nonzero produces a tuple of arrays, one per dimension (this becomes more obvious with 2d arrays):
In [111]: arrayindex
Out[111]: (array([4, 5]),)
It can be used directly as an index:
In [113]: somearray[arrayindex]
Out[113]: array([5, 6])
In this 1d case you could take the array out of the tuple, and iterate on it:
In [114]: for i in arrayindex[0]:print(i, somearray[i])
4 5
5 6
argwhere does a 'transpose', which could also be used for iteration
In [115]: idxs = np.argwhere(somearray>4)
In [116]: idxs
Out[116]:
array([[4],
[5]])
In [117]: for i in idxs: print(i,somearray[i])
[4] [5]
[5] [6]
idxs has shape (2,1), so i is a (1,)-shape array, resulting in the brackets in the display. Occasionally it's useful, but nonzero is used more (often by its other name, np.where).
2d
argwhere has a 2d example:
In [119]: x=np.arange(6).reshape(2,3)
In [120]: np.argwhere(x>1)
Out[120]:
array([[0, 2],
[1, 0],
[1, 1],
[1, 2]])
In [121]: np.nonzero(x>1)
Out[121]: (array([0, 1, 1, 1]), array([2, 0, 1, 2]))
In [122]: x[np.nonzero(x>1)]
Out[122]: array([2, 3, 4, 5])
While nonzero can be used to index the array, argwhere elements can't.
In [123]: for ij in np.argwhere(x>1):
...: print(ij,x[ij])
...:
...
IndexError: index 2 is out of bounds for axis 0 with size 2
The problem is that ij is an array, which (like a list) is used to index along a single dimension. numpy distinguishes between lists and tuples when indexing. (Earlier versions fudged the difference, but current versions are taking a more rigorous approach.)
So we need to change the list into a tuple. One way is to unpack it:
In [124]: for i,j in np.argwhere(x>1):
...: print(i,j,x[i,j])
...:
...:
0 2 2
1 0 3
1 1 4
1 2 5
I could have used: print(ij,x[tuple(ij)]) in [123].
I should have used unpacking in the [117] iteration:
In [125]: for i, in idxs: print(i,somearray[i])
4 5
5 6
or somearray[tuple(i)]
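Another way to get index tuples for iteration, sketched here with the same 2d x, is to zip the arrays that nonzero returns; found is just an illustrative list for collecting the results:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

found = []
# nonzero gives one index array per dimension; zip pairs them up as (i, j)
for i, j in zip(*np.nonzero(x > 1)):
    found.append((int(i), int(j), int(x[i, j])))

print(found)  # [(0, 2, 2), (1, 0, 3), (1, 1, 4), (1, 2, 5)]
```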

Selective deletion by value in numpy array

EDITED: Refined problem statement
I am still figuring out the fancy options offered by the numpy library. The following problem landed on my desk:
Purpose:
In a multi-dimensional array I select one column. This slicing works fine. But after that, values stored in another list need to be filtered out of the column values.
Current status:
array1 = np.asarray([[0,1,2],[1,0,3],[2,3,0]])
print(array1)
array1woZero = np.nonzero(array1)
print(array1woZero)
toBeRemoved = []
toBeRemoved.append(1)
print(toBeRemoved)
column = array1[:,1]
result = np.delete(column,toBeRemoved)
The above mentioned code does not bring the expected result. In fact, the np.delete() command just removes the value at index 1 - but I would need the value of 1 to be filtered out instead. What I also do not understand is the shape change when applying the nonzero to array1: While array1 is (3,3), the array1woZero turns out into a tuple of 2 dims with 6 values each.
0: Array of int64, shape (6,): 0 0 1 1 2 2
1: Array of int64, shape (6,): 1 2 0 2 0 1
My feeling is that I would require something like slicing with an exclusion operator. Do you have any hints for me to solve that? Is it necessary to use different data structures?
In [18]: arr = np.asarray([[0,1,2],[1,0,3],[2,3,0]])
In [19]: arr
Out[19]:
array([[0, 1, 2],
[1, 0, 3],
[2, 3, 0]])
nonzero gives the indices of all non-zero elements of its argument (arr):
In [20]: idx = np.nonzero(arr)
In [21]: idx
Out[21]: (array([0, 0, 1, 1, 2, 2]), array([1, 2, 0, 2, 0, 1]))
This is a tuple of arrays, one per dimension. That output can be confusing, but it is easily used to return all of those non-zero elements:
In [22]: arr[idx]
Out[22]: array([1, 2, 1, 3, 2, 3])
Indexing like this, with a pair of arrays, produces a 1d array. In your example there is just one 0 per row, but in general that's not the case.
This is the same indexing - with 2 lists of the same length:
In [24]: arr[[0,0,1,1,2,2], [1,2,0,2,0,1]]
Out[24]: array([1, 2, 1, 3, 2, 3])
idx[0] just selects one array of that tuple, the row indices. That probably isn't what you want. And I doubt if you want to apply np.delete to that tuple.
It's hard to tell from the description, and code, what you want. Maybe that's because you don't understand what nonzero is producing.
We can also select the nonzero elements with boolean masking:
In [25]: arr>0
Out[25]:
array([[False, True, True],
[ True, False, True],
[ True, True, False]])
In [26]: arr[ arr>0 ]
Out[26]: array([1, 2, 1, 3, 2, 3])
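If the goal is to drop entries from the selected column by value rather than by position, which is my reading of the question, a boolean mask built with np.isin might do it. A sketch using the question's own names (keep is an added, illustrative name):

```python
import numpy as np

array1 = np.asarray([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
toBeRemoved = [1]

column = array1[:, 1]                 # array([1, 0, 3])

# True where the value is NOT one of the values to be removed
keep = ~np.isin(column, toBeRemoved)
result = column[keep]
print(result)  # [0 3]
```

Unlike np.delete, which interprets its second argument as positions, this compares against the values themselves.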
The hint about boolean masking was very good and helped me develop my own solution. The symbolic names in the following code snippets are different, but the idea should become clear anyway.
At the beginning, I have my overall searchSpace.
searchSpace = relativeDistances[currentNode,:]
Assume that its shape is (5,). My filter is defined on the indexes, i.e. range 0..4. Then I define another numpy array "filter" of same shape with all 1, and the values to be filtered out I set to 0.
filter = np.full(shape=nodeCount,fill_value=1,dtype=np.int32)
filter[0] = 0
filter[3] = 0
searchSpace = searchSpace * filter
minValue = searchSpace[searchSpace > 0].min()
neighborNode = np.where(searchSpace==minValue)
The filter array provides me the flexibility to adjust the filter later on as part of a loop. Using the element-wise multiplication with 0 and subsequent boolean masking, I can create my reduced searchSpace for minimum search. Compared to a separate array or list, I still have the original shape, which is required to get the correct index in the where-statement.
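A possible variation on the same idea, keeping the original shape but using a boolean filter and np.inf instead of multiplying by 0. The distance values below are invented for illustration, and allowed and candidates are hypothetical names:

```python
import numpy as np

# hypothetical distances from the current node to nodes 0..4
searchSpace = np.array([0.0, 2.5, 1.0, 0.7, 3.2])

# boolean filter: False marks the nodes to be excluded
allowed = np.ones(searchSpace.shape, dtype=bool)
allowed[[0, 3]] = False

# excluded or non-positive entries become inf, so they can never be the minimum
candidates = np.where(allowed & (searchSpace > 0), searchSpace, np.inf)
neighborNode = int(np.argmin(candidates))
print(neighborNode, candidates[neighborNode])  # 2 1.0
```

This leaves searchSpace itself untouched, and argmin returns the index directly without a second search for the minimum value.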

Finding the difference between two values in a numpy array-with code

I have a list which I initialize with my_array=[] (as an array it has a shape of (0,)), then I append the wm and hm elements to it like so (r is a cascade result with the format [[300 240 22 22]]):
my_array=[]
for (x, y, w, h) in r:
wm=int(x+ (w/2.))
hm=int(y+ (h/2.))
my_array.append([numpy.float32(wm), numpy.float32(hm)])
return numpy.array(my_array)
That code produces:
wm element the hm element
[[270.01 303.43] [310.17 306.37]] # second to last row
[[269.82 303.38] [310.99 306.86]] # the last row
the shape of the returned array is (2,2) and is dtype:float32
Now the problem is that when I tried to append the 303.43 it theoretically would be [-2][1] but it indexes 303.38. which is fine but I also need to index 303.43 as well.
What I found was that the first [] indexes either the wm[0] or hm[1] element, then the second [] indexes one of the two columns of values inside each element
-for example [0][-1] indexes the wm element[0] and last row [-1] I want to index the second last row as well and tried [0][-2] but it didn't work as intended(it indexed the 269.82).
So I tried [0][1][-2] but it didn't work due to IndexError: invalid index to scalar variable.
All I want to do is find the difference between the last and second-to-last rows for the 2 columns in the wm element (so in the example above it would be 269.82-270.01=-0.19 and 303.38-303.43=-0.05). None of the solutions presented in other questions work ([0][-1], [-1][0]; you can try them yourself to find out). The indexing doesn't work. So is there a way around this problem? Please explain it fully because I am still kind of new to this! Thanks in advance!
Addition:
Taking the last two blocks of data
Indexing the array (in the idle) fetches(I copied the last two blocks of the array):
[[293.51373 323.4329 ]
[247.77493 316.02783]]
[[292.9887 322.23425]
[247.24142 314.2921 ]]
On my program, it shows up as (the same array)
--wm element------------------hm element
[[293.51373 323.4329 ][247.77493 316.02783]] I consider this the second to last row
[[292.9887 322.23425][247.24142 314.2921 ]] and I thought this was the last row
This brought forth a lot of confusion for me, but I ignored the minor difference of the way they are displayed until now. Now, the question is how to index the 323.4329 and the 293.51373 numbers, it would be better if they can be indexed separately?
A sample r:
In [41]: r = np.array([[0,0,8,10],[1,1,6,8],[2,2,10,12]])
In [42]: r
Out[42]:
array([[ 0, 0, 8, 10],
[ 1, 1, 6, 8],
[ 2, 2, 10, 12]])
In [43]: my_array=[]
In [45]: for (ex,ey,ew,eh) in r:
...: wm = int(ex+(ew/2))
...: hm = int(ey+(eh/2))
...: print(wm,hm)
...: my_array.append([wm,hm])
...:
4 5
4 5
7 8
The resulting array:
In [46]: arr = np.array(my_array)
In [47]: arr
Out[47]:
array([[4, 5],
[4, 5],
[7, 8]])
Sample indexing:
In [48]: arr[:,0]
Out[48]: array([4, 4, 7]) # the 3 wm values
In [49]: arr[-1,:] # the last values produced by the last `r` row
Out[49]: array([7, 8])
Or a more symbolic array:
In [52]: arr = np.array([[f'wm{i}',f'hm{i}'] for i in range(3)])
In [53]: arr
Out[53]:
array([['wm0', 'hm0'],
['wm1', 'hm1'],
['wm2', 'hm2']], dtype='<U3')
In [54]: arr[:,0]
Out[54]: array(['wm0', 'wm1', 'wm2'], dtype='<U3')
In [55]: arr[-1,:]
Out[55]: array(['wm2', 'hm2'], dtype='<U3')
===
In [108]: arr = np.array([[313.5536, 330.60587], [368.23245, 332.70932]])
In [109]: arr
Out[109]:
array([[313.5536 , 330.60587], # 2nd to the last row
[368.23245, 332.70932]]) # last row
Last row:
In [110]: arr[-1]
Out[110]: array([368.23245, 332.70932])
In [111]: arr[-1,:]
Out[111]: array([368.23245, 332.70932])
First column
In [112]: arr[:,0]
Out[112]: array([313.5536 , 368.23245])
2nd to the last row:
In [113]: arr[-2,:]
Out[113]: array([313.5536 , 330.60587])
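With rows indexed like that, the difference asked about is a plain subtraction. A sketch, assuming the data is a 2d array with one sample per row (the numbers are the wm values quoted in the question; diff and diff_alt are illustrative names):

```python
import numpy as np

arr = np.array([[270.01, 303.43],    # second-to-last row
                [269.82, 303.38]],   # last row
               dtype=np.float32)

# last row minus second-to-last row, column by column
diff = arr[-1] - arr[-2]
print(diff)  # approximately [-0.19 -0.05], up to float32 rounding

# np.diff along axis 0 gives the same thing for every consecutive pair of rows
diff_alt = np.diff(arr, axis=0)[-1]
```

Individual numbers can be picked out the same way, e.g. arr[-2, 1] for the second column of the second-to-last row.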

How to delete rows of numpy array by multiple row indices?

I have two lists of indices (idx[0] and idx[1]), and I should delete the corresponding rows from numpy array y_test.
y_test
12 11 10
1 2 2
3 2 3
4 1 2
13 1 10
idx[0] = [0,2]
idx[1] = [1,3]
I tried to delete the rows as follows (using ~). But it didn't work:
result = y_test[(~idx[0]+~idx[1]+~idx[2])]
Expected result:
result =
13 1 10
Instead of removing elements, just make a new array with the desired ones. This will keep any future indexing from getting jumbled up and maintain the old array.
import numpy as np
y_test = np.asarray([[12, 11, 10], [1, 2, 2], [3, 2, 3], [4, 1, 2], [13, 1, 10]])
idx = [[0, 2], [1, 3]]
# flatten list of lists
idx_flat = [i for j in idx for i in j]
# assign values that are NOT in your idx list to a new array
result = [row for num, row in enumerate(y_test) if num not in idx_flat]
# cast this however you want it, right now 'result' is a list of np.arrays
print(result)
[array([13, 1, 10])]
For an understanding of the flatten step using list comprehensions check this out
You can use numpy.delete which deletes the subarrays along the axis.
np.delete(y_test, idx, axis=0)
Make sure that idx.dtype is an integer type and use numpy.astype if not.
Your approach did not work because idx is not a boolean index array but holds the indices. So ~, which is bitwise negation, will produce ~[0, 2] = [-1, -3] (where both should be numpy arrays).
I would definitely recommend reading up on the difference between index arrays and boolean index arrays. For boolean index arrays I would suggest using numpy.logical_not and numpy.logical_or.
+ concatenates Python lists but is the standard plus for numpy arrays.
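Putting that together for the question's data, a minimal sketch that first flattens the two index lists into one 1d integer array (rows_to_drop is an illustrative name):

```python
import numpy as np

y_test = np.array([[12, 11, 10],
                   [1, 2, 2],
                   [3, 2, 3],
                   [4, 1, 2],
                   [13, 1, 10]])
idx = [[0, 2], [1, 3]]

# join the index lists into a single 1d array of row numbers
rows_to_drop = np.concatenate(idx)   # array([0, 2, 1, 3])

# np.delete returns a new array; y_test itself is left unchanged
result = np.delete(y_test, rows_to_drop, axis=0)
print(result)  # [[13  1 10]]
```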
Since you are using NumPy I'd suggest masking in this way.
Setup:
import numpy as np
y_test = np.array([[12,11,10],
[1,2,2],
[3,2,3],
[4,1,2],
[13,1,10]])
idx = np.array([[0,2], [1,3]])
Generate the mask:
Generate a mask of ones then set to zero elements at index in idx:
mask = np.ones(len(y_test), dtype = int).reshape(5,1)
mask[idx.flatten()] = 0
Finally apply the mask:
y_test[~np.all(y_test * mask == 0, axis=1)]
#=> [[13 1 10]]
y_test has not been modified.
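A somewhat simpler variant of the masking idea, sketched below, uses a boolean mask over the rows directly (keep is an illustrative name). It also avoids a side effect of the multiply-by-zero approach, which would additionally drop any row that happens to contain only zeros:

```python
import numpy as np

y_test = np.array([[12, 11, 10],
                   [1, 2, 2],
                   [3, 2, 3],
                   [4, 1, 2],
                   [13, 1, 10]])
idx = np.array([[0, 2], [1, 3]])

# one boolean per row: True means "keep this row"
keep = np.ones(len(y_test), dtype=bool)
keep[idx.ravel()] = False

result = y_test[keep]
print(result)  # [[13  1 10]]
```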

create a numpy array from the minimum values in the rows of an array

Starting with a 2D Numpy array I would like to create a 1D array in which each value corresponds to the minimum value of each row in the 2D array.
For example if
dog=[[1,2],[4,3],[6,7]]
then I would like to create an array from
'dog':[1,3,6]
This seems like it should be easy to do, but I'm not getting it so far.
In [54]: dog=[[1,2],[4,3],[6,7]]
In [55]: np.min(dog, axis=1)
Out[55]: array([1, 3, 6])
or, if dog is a NumPy array, you could call its min method:
In [57]: dog = np.array([[1,2],[4,3],[6,7]])
In [58]: dog.min(axis=1)
Out[58]: array([1, 3, 6])
Since dog.shape is (3,2), (for 3 rows, 2 columns), the axis=1 refers to the second dimension in the shape -- the one with 2 elements. Putting axis=1 in the call to dog.min tells NumPy to take the min over the axis=1 direction, thus eliminating the axis of length 2. The result is thus of shape (3,).
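To make the axis argument concrete, here is a short sketch comparing both axes, plus argmin in case the position of each row minimum is also needed:

```python
import numpy as np

dog = np.array([[1, 2], [4, 3], [6, 7]])

print(dog.min(axis=1))     # [1 3 6]  one minimum per row
print(dog.min(axis=0))     # [1 2]    one minimum per column
print(dog.argmin(axis=1))  # [0 1 0]  column position of each row minimum
```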
Without numpy:
dog=[[1,2],[4,3],[6,7]]
mins = [min(x) for x in dog]
