Finding the difference between two values in a numpy array-with code - python

I have a list which I initialise with my_array=[] (so it starts with shape (0,)). I then append the wm and hm elements to it as follows (r is a cascade result of the form [[300 240 22 22]]):
my_array = []
for (x, y, w, h) in r:
    wm = int(x + (w / 2.))
    hm = int(y + (h / 2.))
    my_array.append([numpy.float32(wm), numpy.float32(hm)])
return numpy.array(my_array)
That code produces:
wm element the hm element
[[270.01 303.43] [310.17 306.37]] # second to last row
[[269.82 303.38] [310.99 306.86]] # the last row
The returned array has shape (2,2) and dtype float32.
The problem is that when I tried to access 303.43, which in theory should be [-2][1], I got 303.38 instead. That is fine, but I also need to index 303.43.
What I found is that the first [] selects either the wm element [0] or the hm element [1], and the second [] selects one of the two columns of values inside that element.
For example, [0][-1] selects the wm element [0] and the last row [-1]. I also want the second-to-last row, so I tried [0][-2], but it didn't work as intended (it returned 269.82).
So I tried [0][1][-2], but that failed with IndexError: invalid index to scalar variable.
All I want is the difference between the last and second-to-last rows for the 2 columns in the wm element (in the example above that would be 269.82-270.01=-0.19 and 303.38-303.43=-0.05). None of the solutions presented in other questions work ([0][-1], [-1][0]; you can try them yourself). The indexing doesn't work. Is there a way around this problem? Please explain it fully because I am still fairly new to this! Thanks in advance!
Addition:
Taking the last two blocks of data:
Indexing the array in IDLE gives the following (I copied the last two blocks of the array):
[[293.51373 323.4329 ]
[247.77493 316.02783]]
[[292.9887 322.23425]
[247.24142 314.2921 ]]
On my program, it shows up as (the same array)
--wm element------------------hm element
[[293.51373 323.4329 ][247.77493 316.02783]] I consider this the second to last row
[[292.9887 322.23425][247.24142 314.2921 ]] and I thought this was the last row
This caused me a lot of confusion, but until now I ignored the minor difference in how the two are displayed. The question now is how to index the numbers 323.4329 and 293.51373, preferably separately.

A sample r:
In [41]: r = np.array([[0,0,8,10],[1,1,6,8],[2,2,10,12]])
In [42]: r
Out[42]:
array([[ 0, 0, 8, 10],
[ 1, 1, 6, 8],
[ 2, 2, 10, 12]])
In [43]: my_array=[]
In [45]: for (ex,ey,ew,eh) in r:
...:     wm = int(ex+(ew/2))
...:     hm = int(ey+(eh/2))
...:     print(wm,hm)
...:     my_array.append([wm,hm])
...:
4 5
4 5
7 8
The resulting array:
In [46]: arr = np.array(my_array)
In [47]: arr
Out[47]:
array([[4, 5],
[4, 5],
[7, 8]])
Sample indexing:
In [48]: arr[:,0]
Out[48]: array([4, 4, 7]) # the 3 wm values
In [49]: arr[-1,:] # the last values produced by the last `r` row
Out[49]: array([7, 8])
Or a more symbolic array:
In [52]: arr = np.array([[f'wm{i}',f'hm{i}'] for i in range(3)])
In [53]: arr
Out[53]:
array([['wm0', 'hm0'],
['wm1', 'hm1'],
['wm2', 'hm2']], dtype='<U3')
In [54]: arr[:,0]
Out[54]: array(['wm0', 'wm1', 'wm2'], dtype='<U3')
In [55]: arr[-1,:]
Out[55]: array(['wm2', 'hm2'], dtype='<U3')
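A minimal sketch of why the chained form [0][1][-2] from the question fails, assuming the array is 2d as in the samples above. The values are copied from the question and the variable names are only illustrative. Once two indices have been applied there is just a single number left, so a third index has nothing to act on.

```python
import numpy as np

# Values copied from the question; treated here as a plain 2d array.
arr = np.array([[270.01, 303.43],
                [269.82, 303.38]])

# arr[i][j] indexes twice (row first, then column); arr[i, j] does it in one step.
second_last_hm = arr[-2, 1]
same_value = arr[-2][1]

# A third index fails because arr[0][1] is already a scalar.
try:
    arr[0][1][-2]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(second_last_hm, same_value)
print(error_message)
```

The single-step form arr[i, j] is usually preferred, since it makes the row/column roles explicit.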
===
In [108]: arr = np.array([[313.5536, 330.60587], [368.23245, 332.70932]])
In [109]: arr
Out[109]:
array([[313.5536 , 330.60587], # 2nd to the last row
[368.23245, 332.70932]]) # last row
Last row:
In [110]: arr[-1]
Out[110]: array([368.23245, 332.70932])
In [111]: arr[-1,:]
Out[111]: array([368.23245, 332.70932])
First column
In [112]: arr[:,0]
Out[112]: array([313.5536 , 368.23245])
2nd to the last row:
In [113]: arr[-2,:]
Out[113]: array([313.5536 , 330.60587])
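With the last and second-to-last rows available as arr[-1] and arr[-2], the difference asked for in the question is a plain subtraction. The sketch below reuses the numbers from the question and assumes a 2d float32 array; np.diff is shown as an alternative that gives the differences between all consecutive rows.

```python
import numpy as np

# Numbers taken from the question; one row per detection, columns are (wm, hm).
arr = np.array([[270.01, 303.43],    # second-to-last row
                [269.82, 303.38]],   # last row
               dtype=np.float32)

# Last row minus second-to-last row, for both columns at once.
row_diff = arr[-1] - arr[-2]

# Differences between every pair of consecutive rows (shape is (n_rows - 1, 2)).
step_diffs = np.diff(arr, axis=0)

# A single column, indexed separately.
wm_diff = arr[-1, 0] - arr[-2, 0]

print(row_diff)   # roughly [-0.19 -0.05], up to float32 rounding
print(wm_diff)
```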

Related

How to filter with numpy on 2D array using np.where

I read the numpy docs: np.where with a single argument returns the indices where the condition matches.
numpy.where(condition, [x, y, ]/)
In the context of a multi-dimensional array, I want to find and replace values where the condition matches.
This is doable with the other parameters from the docs: [x, y, ] are the replacement values.
Here is my data structure:
my_2d_array = np.array([[1,2],[3,4]])
Here is how I select a column in Python: my_2d_array[:,1]
Here is how I find/replace with numpy:
indices = np.where( my_2d_array[:,1] == 4, my_2d_array[:,1] , my_2d_array[:,1] )
(when the second column value matches 4, swap the value in column two with the one in column one)
So it's hard for me to understand why the same syntax, my_2d_array[:,1], is used both to select a whole column in Python and to designate a single row of my 2D array where the numpy condition is matched.
Your array:
In [9]: arr = np.array([[1,2],[3,4]])
In [10]: arr
Out[10]:
array([[1, 2],
[3, 4]])
Testing for some value:
In [11]: arr==4
Out[11]:
array([[False, False],
[False, True]])
testing one column:
In [12]: arr[:,1]
Out[12]: array([2, 4])
In [13]: arr[:,1]==4
Out[13]: array([False, True])
As documented, np.where with just one argument is just a call to nonzero, which finds the indices of the True values.
So for the 2d array in [11] we get two arrays:
In [15]: np.nonzero(arr==4)
Out[15]: (array([1], dtype=int64), array([1], dtype=int64))
and for the 1d boolean in [13], one array:
In [16]: np.nonzero(arr[:,1]==4)
Out[16]: (array([1], dtype=int64),)
That array can be used to select a row from arr:
In [17]: arr[_,1]
Out[17]: array([[4]])
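As a small sketch of the one-argument case, the index arrays returned by np.where (or np.nonzero) can be unpacked and reused directly. The names rows, cols and the result variables are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# One-argument where: a tuple with one index array per dimension.
rows, cols = np.where(arr == 4)
same_rows, same_cols = np.nonzero(arr == 4)

# The pair of index arrays picks out the matching elements...
matching_values = arr[rows, cols]

# ...while the row indices alone select whole rows.
matching_rows = arr[rows]

print(rows, cols)
print(matching_values)
print(matching_rows)
```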
If used in the three argument where, it selects elements between the 2nd and 3rd arguments. For example, using arguments that have nothing to do with arr:
In [18]: np.where(arr[:,1]==4, ['a','b'],['c','d'])
Out[18]: array(['c', 'b'], dtype='<U1')
The selection gets more complicated if the arguments differ in shape; then the rules of broadcasting apply.
So the basic point with np.where is that all 3 arguments are first evaluated, and passed (in true python function fashion) to the where function. It then selects elements based on the cond, returning a new array.
That where is functionally the same as this list comprehension (or an equivalent for loop):
In [19]: [i if cond else j for cond,i,j in zip(arr[:,1]==4, ['a','b'],['c','d'])]
Out[19]: ['c', 'b']
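To connect this back to the question, here is one possible way to swap the two columns in the rows where the second column equals 4. It is a sketch under the assumption that this is what "invert the value in column two with column one" means. The row mask is reshaped to (2, 1) so that broadcasting applies it to both columns.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# 1d boolean mask, one entry per row.
row_mask = arr[:, 1] == 4

# Reversed-column copy of arr: the candidate values for matching rows.
flipped = arr[:, ::-1]

# mask[:, None] has shape (2, 1) and broadcasts across both columns.
result = np.where(row_mask[:, None], flipped, arr)

# Equivalent in-place style on a copy, using boolean row selection.
swapped = arr.copy()
swapped[row_mask] = arr[row_mask][:, ::-1]

print(result)
```

All three arguments are evaluated before np.where runs, so arr itself is left unchanged.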

iterating a filtered Numpy array whilst maintaining index information

I am attempting to pass filtered values from a numpy array into a function.
I need to pass only the values above a certain threshold, together with their index positions within the numpy array.
I am trying to avoid iterating over the entire array in Python by using numpy's own filtering; the arrays I am dealing with have 20k values in them, with potentially only very few being relevant.
import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = np.nonzero(somearray > 4)
for i in arrayindex:
    somefunction(arrayindex[0], somearray[arrayindex[0]])
This raised errors about the logic not being able to handle multiple values,
which led me to test it with print statements to see what was going on.
for cell in arrayindex:
    print(f"index {cell}")
    print(f"data {somearray[cell]}")
I expected an output of
index 4
data 5
index 5
data 6
But instead i get
index [4 5]
data [5 6]
I have looked at different ways to iterate through numpy arrays, such as nditer, but none seem to let me do the filtering of values outside of the for loop.
Is there a solution to my quandary?
Oh, I am aware that it is generally frowned upon to loop through a numpy array; however, the function that I am passing these values to is complex, triggering certain events and uploading data to a database depending on the data's location within the array.
Thanks.
import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = [idx for idx, val in enumerate(somearray) if val > 4]
for i in range(0, len(arrayindex)):
    somefunction(arrayindex[i], somearray[arrayindex[i]])
for i in range(0, len(arrayindex)):
    print("index", arrayindex[i])
    print("data", somearray[arrayindex[i]])
You need to have a clear idea of what nonzero produces, and pay attention to the difference between indexing with a list(s) and with a tuple.
===
In [110]: somearray = np.array([1,2,3,4,5,6])
...: arrayindex = np.nonzero(somearray > 4)
nonzero produces a tuple of arrays, one per dimension (this becomes more obvious with 2d arrays):
In [111]: arrayindex
Out[111]: (array([4, 5]),)
It can be used directly as an index:
In [113]: somearray[arrayindex]
Out[113]: array([5, 6])
In this 1d case you could take the array out of the tuple, and iterate on it:
In [114]: for i in arrayindex[0]:print(i, somearray[i])
4 5
5 6
argwhere does a 'transpose', which could also be used for iteration
In [115]: idxs = np.argwhere(somearray>4)
In [116]: idxs
Out[116]:
array([[4],
[5]])
In [117]: for i in idxs: print(i,somearray[i])
[4] [5]
[5] [6]
idxs has shape (2,1), so i is a (1,)-shaped array, which explains the brackets in the display. Occasionally argwhere is useful, but nonzero is used more (often under its other name, np.where).
2d
argwhere has a 2d example:
In [119]: x=np.arange(6).reshape(2,3)
In [120]: np.argwhere(x>1)
Out[120]:
array([[0, 2],
[1, 0],
[1, 1],
[1, 2]])
In [121]: np.nonzero(x>1)
Out[121]: (array([0, 1, 1, 1]), array([2, 0, 1, 2]))
In [122]: x[np.nonzero(x>1)]
Out[122]: array([2, 3, 4, 5])
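The relationship between the two results can be checked directly: argwhere is the transpose of what nonzero returns. The sketch below uses illustrative variable names and also shows that each argwhere row can be used as an index once it is converted to a tuple.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
mask = x > 1

index_tuple = np.nonzero(mask)    # tuple of 2 arrays: (row indices, column indices)
index_rows = np.argwhere(mask)    # one (row, col) pair per matching element

# argwhere is the transposed form of the nonzero result.
is_transpose = np.array_equal(index_rows, np.transpose(index_tuple))

# Each argwhere row indexes a single element after conversion to a tuple.
values = [int(x[tuple(ij)]) for ij in index_rows]

print(is_transpose)
print(values)
```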
While nonzero can be used to index the array, argwhere elements can't.
In [123]: for ij in np.argwhere(x>1):
...:     print(ij,x[ij])
...:
...
IndexError: index 2 is out of bounds for axis 0 with size 2
The problem is that ij is a 1d array, which numpy treats like a list: all of its values are used to index the first dimension only. numpy distinguishes between lists and tuples when indexing. (Earlier versions fudged the difference, but current versions take a more rigorous approach.)
So we need to change the list into a tuple. One way is to unpack it:
In [124]: for i,j in np.argwhere(x>1):
...:     print(i,j,x[i,j])
...:
...:
0 2 2
1 0 3
1 1 4
1 2 5
I could have used: print(ij,x[tuple(ij)]) in [123].
I should have used unpacking in the [117] iteration:
In [125]: for i, in idxs: print(i,somearray[i])
4 5
5 6
or somearray[tuple(i)]
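Putting this together for the original 1d question, one way to hand each index and value to a function is to filter first and then loop only over the matches. In this sketch somefunction is a hypothetical stand-in that merely records its arguments, and np.flatnonzero is used to get the 1d index array without the surrounding tuple.

```python
import numpy as np

calls = []

def somefunction(index, value):
    # Hypothetical stand-in for the asker's function; it only records its arguments.
    calls.append((int(index), int(value)))

somearray = np.array([1, 2, 3, 4, 5, 6])

# Indices of the matching elements, as a flat 1d array.
selected = np.flatnonzero(somearray > 4)

# The Python-level loop runs only over the matches, not the whole array.
for index, value in zip(selected, somearray[selected]):
    somefunction(index, value)

print(calls)  # [(4, 5), (5, 6)]
```

The filtering itself stays inside numpy, so with 20k values and few matches the loop body runs only a handful of times.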

Find index of max element in numpy array excluding few indexes

Say:
p = array([4, 0, 8, 2, 7])
Want to find the index of max value, except few indexes, say:
excptIndx = [2, 3]
Ans: 4, as 7 will be max.
if excptIndx = [1, 3], Ans: 2, as 8 will be max.
In numpy, you can mask all values at excptIndx and run argmax to obtain index of max element:
import numpy as np
p = np.array([4, 0, 8, 2, 7])
excptIndx = [2, 3]
m = np.zeros(p.size, dtype=bool)
m[excptIndx] = True
a = np.ma.array(p, mask=m)
print(np.argmax(a))
# 4
The setup:
In [153]: p = np.array([4,0,8,2,7])
In [154]: exceptions = [2,3]
Original indexes in p:
In [155]: idx = np.arange(p.shape[0])
delete exceptions from both:
In [156]: np.delete(p,exceptions)
Out[156]: array([4, 0, 7])
In [157]: np.delete(idx,exceptions)
Out[157]: array([0, 1, 4])
Find the argmax in the deleted array:
In [158]: np.argmax(np.delete(p,exceptions))
Out[158]: 2
Use that to find the max value (could just as well use np.max(_156)):
In [159]: _156[_158]
Out[159]: 7
Use the same index to find the index in the original p
In [160]: _157[_158]
Out[160]: 4
In [161]: p[_160] # another way to get the max value
Out[161]: 7
For this small example, the pure Python alternatives might well be faster. They often are in small cases. We need test cases with 1000 or more values to really see the advantages of numpy.
Another method
Set the exceptions to a small enough value, and take the argmax:
In [162]: p1 = p.copy(); p1[exceptions] = -1000
In [163]: np.argmax(p1)
Out[163]: 4
Here the small enough is easy to pick; more generally it may require some thought.
Or taking advantage of the np.nan... functions:
In [164]: p1 = p.astype(float); p1[exceptions]=np.nan
In [165]: np.nanargmax(p1)
Out[165]: 4
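A solution is to build a mask of the indexes to keep (note the ~, which inverts np.isin so that the excluded indexes are dropped rather than selected):
mask = ~np.isin(np.arange(len(p)), excptIndx)
subset_idx = np.argmax(p[mask])
parent_idx = np.arange(len(p))[mask][subset_idx]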
See http://seanlaw.github.io/2015/09/10/numpy-argmin-with-a-condition/
p = np.array([4,0,8,2,7]) # given
exceptions = [2,3] # given
idx = list( range(0,len(p)) ) # simple array of index
a1 = np.delete(idx, exceptions) # remove exceptions from idx (i.e., index)
a2 = np.argmax(np.delete(p, exceptions)) # get index of the max value after removing exceptions from actual p array
a1[a2] # as a1 and a2 are in sync, this will give the original index (as asked) of the max value
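The approaches above can be wrapped into a small helper. This is only a sketch: the function name argmax_excluding is made up for illustration, and it assumes a 1d input with at least one index that is not excluded.

```python
import numpy as np

def argmax_excluding(values, excluded):
    """Index (in the original array) of the max value, ignoring `excluded` positions."""
    values = np.asarray(values)
    keep = np.ones(values.shape[0], dtype=bool)
    keep[list(excluded)] = False
    # Original positions of the elements that remain candidates.
    candidates = np.flatnonzero(keep)
    # argmax within the candidates, mapped back to the original index.
    return int(candidates[np.argmax(values[candidates])])

p = np.array([4, 0, 8, 2, 7])
first = argmax_excluding(p, [2, 3])
second = argmax_excluding(p, [1, 3])
print(first, second)  # 4 2
```

Unlike the "small enough value" trick, this does not need a sentinel and leaves the dtype of p untouched.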

Selective deletion by value in numpy array

EDITED: Refined problem statement
I am still figuring out the fancy options offered by the numpy library. The following problem landed on my desk:
Purpose:
In a multi-dimensional array I select one column. This slicing works fine. But after that, values stored in another list need to be filtered out of the column values.
Current status:
array1 = np.asarray([[0,1,2],[1,0,3],[2,3,0]])
print(array1)
array1woZero = np.nonzero(array1)
print(array1woZero)
toBeRemoved = []
toBeRemoved.append(1)
print(toBeRemoved)
column = array1[:,1]
result = np.delete(column,toBeRemoved)
The code above does not produce the expected result. In fact, np.delete() just removes the value at index 1, but I need the value 1 to be filtered out instead. What I also do not understand is the shape change when applying nonzero to array1: while array1 is (3,3), array1woZero turns out to be a tuple of 2 arrays with 6 values each:
0  Array of int64  (6,)  [0 0 1 1 2 2]
1  Array of int64  (6,)  [1 2 0 2 0 1]
My feeling is that I would require something like slicing with an exclusion operator. Do you have any hints for me to solve that? Is it necessary to use different data structures?
In [18]: arr = np.asarray([[0,1,2],[1,0,3],[2,3,0]])
In [19]: arr
Out[19]:
array([[0, 1, 2],
[1, 0, 3],
[2, 3, 0]])
nonzero gives the indices of all non-zero elements of its argument (arr):
In [20]: idx = np.nonzero(arr)
In [21]: idx
Out[21]: (array([0, 0, 1, 1, 2, 2]), array([1, 2, 0, 2, 0, 1]))
This is a tuple of arrays, one per dimension. That output can be confusing, but it is easily used to return all of those non-zero elements:
In [22]: arr[idx]
Out[22]: array([1, 2, 1, 3, 2, 3])
Indexing like this, with a pair of arrays, produces a 1d array. In your example there is just one 0 per row, but in general that's not the case.
This is the same indexing - with 2 lists of the same length:
In [24]: arr[[0,0,1,1,2,2], [1,2,0,2,0,1]]
Out[24]: array([1, 2, 1, 3, 2, 3])
idx[0] just selects one array of that tuple, the row indices. That probably isn't what you want. And I doubt that you want to apply np.delete to that tuple.
It's hard to tell from the description, and code, what you want. Maybe that's because you don't understand what nonzero is producing.
We can also select the nonzero elements with boolean masking:
In [25]: arr>0
Out[25]:
array([[False, True, True],
[ True, False, True],
[ True, True, False]])
In [26]: arr[ arr>0 ]
Out[26]: array([1, 2, 1, 3, 2, 3])
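If the goal is to drop entries by value rather than by position, the same boolean masking idea applies. The following is a sketch assuming that is what the question intends: np.isin marks the entries whose value appears in the list, and ~ inverts the mask. The contrast with np.delete, which works on positions, is shown alongside.

```python
import numpy as np

array1 = np.asarray([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
column = array1[:, 1]          # [1, 0, 3]
to_be_removed = [1]

# Filter by value: keep entries whose value is NOT in to_be_removed.
by_value = column[~np.isin(column, to_be_removed)]

# np.delete treats the list as positions: it drops the entry at index 1.
by_position = np.delete(column, to_be_removed)

print(by_value)     # [0 3]
print(by_position)  # [1 3]
```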
The hint about boolean masking was very good and helped me to develop my own solution. The symbolic names in the following code snippets are different, but the idea should become clear anyway.
At the beginning, I have my overall searchSpace.
searchSpace = relativeDistances[currentNode,:]
Assume that its shape is (5,). My filter is defined on the indexes, i.e. range 0..4. Then I define another numpy array "filter" of same shape with all 1, and the values to be filtered out I set to 0.
filter = np.full(shape=nodeCount,fill_value=1,dtype=np.int32)
filter[0] = 0
filter[3] = 0
searchSpace = searchSpace * filter
minValue = searchSpace[searchSpace > 0].min()
neighborNode = np.where(searchSpace==minValue)
The filter array provides me the flexibility to adjust the filter later on as part of a loop. Using the element-wise multiplication with 0 and subsequent boolean masking, I can create my reduced searchSpace for minimum search. Compared to a separate array or list, I still have the original shape, which is required to get the correct index in the where-statement.
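Multiplying by 0 and then masking with > 0 assumes every valid distance is strictly positive. A variant that avoids that assumption is sketched below: a boolean mask replaces excluded entries with np.inf, so argmin still returns an index into the original array. The distances and names here are made up for illustration.

```python
import numpy as np

# Hypothetical distances from the current node to 5 nodes.
search_space = np.array([0.0, 2.5, 1.5, 0.5, 3.0])

# Boolean filter with the same shape; False marks nodes to skip.
keep = np.ones(search_space.shape, dtype=bool)
keep[[0, 3]] = False

# Excluded entries become +inf, so they can never be the minimum.
masked = np.where(keep, search_space, np.inf)

neighbor_node = int(np.argmin(masked))
min_value = float(masked[neighbor_node])

print(neighbor_node, min_value)  # 2 1.5
```

The keep mask can still be adjusted inside a loop, in the same way as the integer filter array.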

Numpy extract submatrix

I'm pretty new to numpy and I am having a hard time understanding how to extract a submatrix with given rows and columns from an np.array:
Y = np.arange(16).reshape(4,4)
If I want to extract columns/rows 0 and 3, I should have:
[[0 3]
[12 15]]
I tried all the reshape functions...but cannot figure out how to do this. Any ideas?
Give np.ix_ a try:
Y[np.ix_([0,3],[0,3])]
This returns your desired result:
In [25]: Y = np.arange(16).reshape(4,4)
In [26]: Y[np.ix_([0,3],[0,3])]
Out[26]:
array([[ 0, 3],
[12, 15]])
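To see why np.ix_ works, it helps to look at what it returns: a pair of index arrays shaped so that they broadcast against each other into a full row-by-column grid. A small sketch, with illustrative variable names:

```python
import numpy as np

Y = np.arange(16).reshape(4, 4)

# np.ix_ returns one index array per axis, shaped for broadcasting.
rows, cols = np.ix_([0, 3], [0, 3])

print(rows.shape, cols.shape)  # (2, 1) (1, 2)

# Broadcasting (2, 1) against (1, 2) yields every (row, col) combination.
sub = Y[rows, cols]
print(sub)
```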
One solution is to index the rows/columns by slicing/striding. Here's an example where you are extracting every third column/row from the first to last columns (i.e. the first and fourth columns)
In [1]: import numpy as np
In [2]: Y = np.arange(16).reshape(4, 4)
In [3]: Y[0:4:3, 0:4:3]
Out[3]: array([[ 0, 3],
[12, 15]])
This gives you the output you were looking for.
For more info, check out this page on indexing in NumPy.
print(Y[0:4:3, 0:4:3])
is the shortest and most appropriate fix.
First of all, your Y only has 4 columns and rows, so there is no column 4 or row 4; the highest are column 3 and row 3.
To get rows 0 and 3: Y[[0,3],:]
To get columns 0 and 3: Y[:,[0,3]]
So to get the array you request: Y[[0,3],:][:,[0,3]]
Note that if you just use Y[[0,3],[0,3]], it is equivalent to [Y[0,0], Y[3,3]] and the result will have two elements: array([ 0, 15])
You can also do this using:
Y[[[0],[3]],[0,3]]
which is equivalent to doing this using indexing arrays:
idx = np.array((0,3)).reshape(2,1)
Y[idx,idx.T]
To make the broadcasting work as desired, you need the non-singleton dimension of your indexing array to be aligned with the axis you're indexing into, e.g. for an n x m 2D subarray:
Y[<n x 1 array>,<1 x m array>]
This doesn't create an intermediate array, unlike CT Zhu's answer, which creates the intermediate array Y[(0,3),:], then indexes into it.
This can also be done by slicing: Y[[0,3],:][:,[0,3]]. More elegantly, it is possible to slice arrays (or even reorder them) by given sets of indices for rows, columns, pages, et cetera:
r=np.array([0,3])
c=np.array([0,3])
print(Y[r,:][:,c]) #>>[[ 0 3][12 15]]
for reordering try this:
r=np.array([0,3])
c=np.array([3,0])
print(Y[r,:][:,c])#>>[[ 3 0][15 12]]
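One practical difference between the answers is that the strided slice returns a view of Y, while indexing with lists or arrays returns a copy. The sketch below checks that all approaches agree on the values and uses np.shares_memory to show the difference; the variable names are illustrative.

```python
import numpy as np

Y = np.arange(16).reshape(4, 4)

strided = Y[0:4:3, 0:4:3]            # basic slicing: a view into Y
fancy = Y[np.ix_([0, 3], [0, 3])]    # index arrays: a copy
chained = Y[[0, 3], :][:, [0, 3]]    # two index-array steps: also a copy

same_values = np.array_equal(strided, fancy) and np.array_equal(fancy, chained)
strided_is_view = np.shares_memory(Y, strided)
fancy_is_view = np.shares_memory(Y, fancy)

print(same_values, strided_is_view, fancy_is_view)  # True True False
```

Writing into strided therefore modifies Y, whereas writing into fancy or chained does not. The slice form only works when the wanted rows and columns are evenly spaced.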
