Basics of numpy where function, what does it do to the array?


I have seen the post Difference between nonzero(a), where(a) and argwhere(a). When to use which? and I don't really understand the use of the where function from numpy module.
For example I have this code
import numpy as np
Z =np.array(
[[1,0,1,1,0,0],
[0,0,0,1,0,0],
[0,1,0,1,0,0],
[0,0,1,1,0,0],
[0,1,0,0,0,0],
[0,0,0,0,0,0]])
print(Z)
print(np.where(Z))
Which gives:
(array([0, 0, 0, 1, 2, 2, 3, 3, 4], dtype=int64),
array([0, 2, 3, 3, 1, 3, 2, 3, 1], dtype=int64))
The documentation defines the where function as:
"Return elements, either from x or y, depending on condition." But that doesn't make sense to me either.
So what exactly does the output mean?

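Called with a single argument, np.where returns the indices where a given condition is met. In your case, you're asking for the indices where the value in Z is not 0 (i.e. any non-zero value is treated as True). The result is a tuple holding one array of row indices and one array of column indices; paired up, they give for Z: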
(0, 0) # top left
(0, 2) # third element in the first row
(0, 3) # fourth element in the first row
(1, 3) # fourth element in the second row
... # and so on
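To make the pairing explicit, here is a minimal sketch that zips the two index arrays from the question's output into (row, column) tuples. The names rows, cols, coords and pairs are illustrative only; np.argwhere is shown as an alternative that returns the same pairs as a 2-D array.

```python
import numpy as np

Z = np.array([[1, 0, 1, 1, 0, 0],
              [0, 0, 0, 1, 0, 0],
              [0, 1, 0, 1, 0, 0],
              [0, 0, 1, 1, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0]])

rows, cols = np.where(Z)                          # one index array per dimension
coords = list(zip(rows.tolist(), cols.tolist()))  # pair them up as (row, col)
print(coords[:4])                                 # [(0, 0), (0, 2), (0, 3), (1, 3)]

# np.argwhere returns the same pairs as a 2-D array, one row per non-zero element
pairs = np.argwhere(Z)
```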
np.where starts to make sense in the following scenarios:
a = np.arange(10)
np.where(a > 5) # give me all indices where the value of a is bigger than 5
# a > 5 is a boolean mask like [False, False, ..., True, True, True]
# (array([6, 7, 8, 9], dtype=int64),)
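The definition quoted in the question ("Return elements, either from x or y, depending on condition") describes the three-argument form of np.where. A small sketch contrasting the two forms, with clipped as an illustrative name:

```python
import numpy as np

a = np.arange(10)

# One argument: indices where the condition holds
idx = np.where(a > 5)             # (array([6, 7, 8, 9]),)

# Three arguments: take from x where the condition is True, otherwise from y
clipped = np.where(a > 5, a, 0)   # array([0, 0, 0, 0, 0, 0, 6, 7, 8, 9])
```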
Hope that helps.

Related

Delete numpy axis 1 based on condition

I need to remove values from a np axis based on a condition.
For example, I would want to remove [:,2] (the third value along axis 1) if the first value == 0; otherwise I would want to remove [:,3].
Input:
[[0,1,2,3],[0,2,3,4],[1,3,4,5]]
Output:
[[0,1,3],[0,2,4],[1,3,4]]
So now my output has one less value on the 1st axis, depending on if it met the condition or not.
I know I can isolate and manipulate this based on
array[np.where(array[:,0] == 0)] but then I would have to deal with each condition separately, and it's very important for me to preserve the order of this array.
I am dealing with 3D arrays & am hoping to be able to calculate all this simultaneously while preserving the order.
Any help is much appreciated!

A possible solution:
a = np.array([[0,1,2,3],[0,2,3,4],[1,3,4,5]])
b = np.arange(a.shape[1])
np.apply_along_axis(
lambda x: x[np.where(x[0] == 0, np.delete(b,2), np.delete(b,3))], 1, a)
Output:
array([[0, 1, 3],
[0, 2, 4],
[1, 3, 4]])

Since you are starting and ending with a list, a straightforward iteration is a good solution:
In [261]: alist =[[0,1,2,3],[0,2,3,4],[1,3,4,5]]
In [262]: for row in alist:
...: if row[0]==0: row.pop(2)
...: else: row.pop(3)
...:
In [263]: alist
Out[263]: [[0, 1, 3], [0, 2, 4], [1, 3, 4]]
A possible array approach:
In [273]: arr = np.array([[0,1,2,3],[0,2,3,4],[1,3,4,5]])
In [274]: mask = np.ones(arr.shape, bool)
In [275]: mask[np.arange(3),np.where(arr[:,0]==0,2,3)]=False
In [276]: mask
Out[276]:
array([[ True, True, False, True],
[ True, True, False, True],
[ True, True, True, False]])
arr[mask] will be 1d, but since we are deleting the same number of elements each row, we can reshape it:
In [277]: arr[mask].reshape(arr.shape[0],-1)
Out[277]:
array([[0, 1, 3],
[0, 2, 4],
[1, 3, 4]])
I expect the list approach will be faster for small cases, but the array approach should scale better. I don't know where the crossover point is.
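Since the question mentions 3D arrays, the mask idea can probably be extended to any number of leading dimensions. The following is only a sketch: drop_one_per_row is a hypothetical helper, and it assumes the condition is always tested on the first element along the last axis.

```python
import numpy as np

def drop_one_per_row(arr, col_if_zero=2, col_otherwise=3):
    """Hypothetical helper: drop one element along the last axis of each row."""
    # Column to drop in each row, chosen from the row's first value
    drop = np.where(arr[..., 0] == 0, col_if_zero, col_otherwise)
    mask = np.ones(arr.shape, dtype=bool)
    # Clear exactly one mask entry per row along the last axis
    np.put_along_axis(mask, drop[..., None], False, axis=-1)
    # Every row loses one element, so the flat result reshapes cleanly
    return arr[mask].reshape(arr.shape[:-1] + (arr.shape[-1] - 1,))

arr = np.array([[0, 1, 2, 3], [0, 2, 3, 4], [1, 3, 4, 5]])
print(drop_one_per_row(arr))                        # same result as the 2-D example
print(drop_one_per_row(np.stack([arr, arr])).shape) # (2, 3, 3)
```

Boolean indexing walks the array in row-major order, so the original ordering of the remaining elements is preserved.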

How to randomly select one nonzero element per row from a sparse matrix without a for loop in python

I have a large sparse matrix in which each row contains multiple nonzero elements, for example
a = np.array([[1, 1,0,0,0,0], [2,0, 1,0,2,0], [3,0,4,0,0, 3]])
I want to randomly select one nonzero element per row without a for loop. Any good suggestions? As output, I am more interested in the index of each chosen element than in its value.

With a numpy array such as:
arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])
you can evaluate arr != 0, which gives a boolean array marking which values pass the condition, in our case which values are not equal (!=) to 0:
array([ True, True, True, False, True, False, False, True], dtype=bool)
from here, we can 'index' arr with this boolean array by doing arr[arr != 0] which gives us:
array([5, 2, 6, 2, 6])
So now that we have a way of removing the zeros from a numpy array, we can do a simple list comprehension over the rows of your a array. For each row, we remove the zeros and then perform a random.choice on what is left. Like so:
np.array([np.random.choice(r[r!=0]) for r in a])
which gives you back an array of length 3 containing random non-zero items from each row in a. :)
Hope this helps!
Update
If you want the indexes of the random non-zero numbers in the array, you can use .nonzero().
So if we have this array:
arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])
we can do:
arr.nonzero()
which gives a tuple of the indexes of non-zero elements:
(array([0, 1, 2, 4, 7]),)
so as with before, we can use this and np.random.choice() in a list-comprehension to produce random indexes:
a = np.array([[1, 1, 0, 0, 0, 0], [2, 0, 1, 0, 2, 0], [3, 0, 4, 0, 0, 3]])
np.array([np.random.choice(r.nonzero()[0]) for r in a])
which returns an array of the form [x, y, z] where x, y and z are random indexes of non-zero elements from their corresponding rows.
E.g. one result could be:
array([1, 4, 2])
And if you want it to also return the rows, you could just add a numpy.arange() call on the length of a to get an array of row numbers:
([np.arange(len(a))], np.array([np.random.choice(r.nonzero()[0]) for r in a]))
so an example random output could be:
([array([0, 1, 2])], array([1, 2, 5]))
for a as:
array([[1, 1, 0, 0, 0, 0],
[2, 0, 1, 0, 2, 0],
[3, 0, 4, 0, 0, 3]])
Hope this does what you want now :)
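The list comprehensions above still loop over the rows in Python. One loop-free alternative (a different technique from the answer above, sketched under the assumptions that a is a dense array and that every row has at least one non-zero element) is to give every element a random score, disqualify the zeros, and take the argmax per row. The names rng, scores, rows, cols and values are illustrative.

```python
import numpy as np

a = np.array([[1, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 2, 0],
              [3, 0, 4, 0, 0, 3]])

rng = np.random.default_rng()      # pass a seed for reproducible draws
scores = rng.random(a.shape)       # one uniform score in [0, 1) per element
scores[a == 0] = -1.0              # zeros can never win
cols = scores.argmax(axis=1)       # column of the winning non-zero entry per row
rows = np.arange(a.shape[0])
values = a[rows, cols]             # the chosen elements themselves
```

Because the scores are independent and identically distributed, each non-zero entry in a row should be equally likely to be picked.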

Find indices of the elements smaller than x in a numpy array

Assuming that I have a numpy array such as:
import numpy as np
arr = np.array([10,1,2,5,6,2,3,8])
How could I extract an array containing the indices of the elements smaller than 6 so I get the following result:
np.array([1,2,3,5,6])
I would like something that behaves like np.nonzero(), but instead of testing for nonzero values, it tests for values smaller than x.

You can use numpy.flatnonzero on the boolean mask; it returns the indices that are non-zero in the flattened version of its argument:
np.flatnonzero(arr < 6)
# array([1, 2, 3, 5, 6])
Another option for a 1-D array is numpy.where:
np.where(arr < 6)[0]
# array([1, 2, 3, 5, 6])
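The two calls agree on 1-D input but differ on higher-dimensional arrays. A short sketch with a made-up 2-D array m shows the difference, and how np.unravel_index converts between the two results:

```python
import numpy as np

m = np.array([[10, 1, 2],
              [5, 6, 2]])

flat = np.flatnonzero(m < 6)               # indices into m.ravel(): [1, 2, 3, 5]
rows, cols = np.where(m < 6)               # per-axis indices: [0, 0, 1, 1] and [1, 2, 0, 2]
# np.unravel_index converts the flat indices back to (rows, cols)
r2, c2 = np.unravel_index(flat, m.shape)
```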

If you want the values themselves rather than their indices, the simplest way is boolean indexing:
arr[arr<6]

I'd suggest a cleaner and more self-explanatory way to do so:
First, build a boolean mask of where the condition holds:
>>> indices = arr < 6
>>> indices
array([False,  True,  True,  True, False,  True,  True, False])
Then, use the mask for indexing:
>>> arr[indices]
array([1, 2, 5, 2, 3])
or, to find the positions in the original array:
>>> np.where(indices)[0]
array([1, 2, 3, 5, 6])

Select first occurrence of minimum index from numpy array

I am trying to find the index of the minimum value in each row, and I am using the code below.
#code
import numpy as np
C = np.array([[1,2,4],[2,2,5],[4,3,3]])
ind = np.where(C == C.min(axis=1).reshape(len(C),1))
ind
#output
(array([0, 1, 1, 2, 2], dtype=int64), array([0, 0, 1, 1, 2], dtype=int64))
The problem is that it returns the indices of all minimum values in each row, but I want only the first occurrence of the minimum in each row, like:
(array([0, 1, 2], dtype=int64), array([0, 0, 1], dtype=int64))

If you want to compare against the minimum value, use np.min with keepdims set to True so that the result broadcasts against C and gives a boolean array/mask. To select the first occurrence, use argmax along each row of the mask: argmax returns the position of the first True, which is the desired output.
Thus, the implementation to get the corresponding column indices would be -
(C==C.min(1, keepdims=True)).argmax(1)
Sample step-by-step run -
In [114]: C # Input array
Out[114]:
array([[1, 2, 4],
[2, 2, 5],
[4, 3, 3]])
In [115]: C==C.min(1, keepdims=1) # boolean array of min values
Out[115]:
array([[ True, False, False],
[ True, True, False],
[False, True, True]], dtype=bool)
In [116]: (C==C.min(1, keepdims=True)).argmax(1) # argmax to get first occurrences
Out[116]: array([0, 0, 1])
The first output of row indices would simply be a range array -
np.arange(C.shape[0])
To get the same column indices of the first occurrence of the minimum values, a more direct way is to use np.argmin -
C.argmin(axis=1)
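Putting the two pieces together, a minimal sketch that rebuilds the tuple layout asked for in the question (the names rows, cols, ind and mins are illustrative):

```python
import numpy as np

C = np.array([[1, 2, 4], [2, 2, 5], [4, 3, 3]])

cols = C.argmin(axis=1)            # first minimum in each row: [0, 0, 1]
rows = np.arange(C.shape[0])       # [0, 1, 2]
ind = (rows, cols)                 # same layout as the np.where output
mins = C[ind]                      # the row minima: [1, 2, 3]
```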

Find location of pair of elements in two arrays in numpy

I have two numpy arrays x and y
Suppose x = [0, 1, 1, 1, 3, 4, 5, 5, 5] and y = [0, 2, 3, 4, 2, 1, 3, 4, 5]
The length of both arrays is the same and the coordinate pair I am looking for definitely exists in the array.
How can I find the index of (a, b) in these arrays, where a is an element in x and b is the corresponding element in y. For example, the index of (1, 4) would be 3: the elements at index 3 of x and y are 1 and 4 respectively.

You could use numpy.where combined with numpy.logical_and if you want a purely numpy solution:
In [16]: import numpy as np
In [17]: x = np.array([0, 1, 1, 1, 3, 4, 5, 5, 5])
In [18]: y = np.array([0, 2, 3, 4, 2, 1, 3, 4, 5])
In [19]: np.where(np.logical_and(x == 1, y == 4))[0]
Out[19]: array([3], dtype=int64)
numpy.logical_and performs an element-wise logical AND between two numpy arrays. Here it marks the positions where x is 1 and y is 4 at the same index; those positions are True. numpy.where then returns the indices where this condition holds. It always returns a tuple with one index array per dimension, so for these 1-D arrays the tuple has a single element, which is why we immediately index it with [0].
The output is a numpy array of the locations where the condition holds. You can go further and convert the output to a list of indices if that is neater or required (thanks @EddoHintoso):
In [20]: list(np.where(np.logical_and(x == 1, y == 4))[0])
Out[20]: [3]

You could compare the first array with the first value and the second array with the second value, then find where both are True. argmax on that mask gives the index of the first True occurrence:
x = np.array([0, 1, 1, 1, 3, 4, 5, 5, 5])
y = np.array([0, 2, 3, 4, 2, 1, 3, 4, 5])
idx = ((x == 1) & (y == 4)).argmax()
In [35]: idx
Out[35]: 3
In [36]: x == 1
Out[36]: array([False, True, True, True, False, False, False, False, False], dtype=bool)
In [37]: y == 4
Out[37]: array([False, False, False, True, False, False, False, True, False], dtype=bool)
If the pair can occur multiple times, you can use nonzero to get all of the indices:
idx_list = ((x == 1) & (y == 4))
idx = idx_list.nonzero()[0]
In [51]: idx
Out[51]: array([3], dtype=int64)
Or if you need list of indices:
In [57]: idx_list.nonzero()[0].tolist()
Out[57]: [3]
You could do that in one line with:
idx = ((x == 1) & (y == 4)).nonzero()[0]
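One caveat with the argmax approach: on an all-False mask argmax also returns 0, so a missing pair is indistinguishable from a match at index 0. The question guarantees the pair exists, but if that cannot be assumed, a guard along these lines may help. first_pair_index is a hypothetical helper, and returning -1 for "not found" is an arbitrary convention chosen for this sketch.

```python
import numpy as np

x = np.array([0, 1, 1, 1, 3, 4, 5, 5, 5])
y = np.array([0, 2, 3, 4, 2, 1, 3, 4, 5])

def first_pair_index(x, y, a, b):
    """Hypothetical helper: index of the first (a, b) pair, or -1 if absent."""
    mask = (x == a) & (y == b)
    # argmax returns 0 for an all-False mask, so check for a match first
    return int(mask.argmax()) if mask.any() else -1

print(first_pair_index(x, y, 1, 4))   # 3
print(first_pair_index(x, y, 9, 9))   # -1
```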

x = [0, 1, 1, 1, 3, 4, 5, 5, 5]
y = [0, 2, 3, 4, 2, 1, 3, 4, 5]
w = list(zip(x, y))  # zip returns an iterator in Python 3, so build a list first
w.index((1, 4))  # 3
