How to get indices of values of numpy array which are zero - python

Let's say I have a numpy array like this:
array([ 1. , 2. , 0. , 3.4, 0. , 1.1])
Now I want to get the indices of all the elements that are zero, because I then want to set the elements at those same indices to zero in a different array.
To get the indices of the nonzero elements, I know we can use nonzero or argwhere:
np.nonzero(a)
(array([0, 1, 3, 5]),)
Then, say I have an array b; I can use the indices above to set the elements of b at those same indices to zero, like this:
b[np.nonzero(a)]=0
This is fine, but it gives the nonzero indices. How do I get the zero indices?

If you just want to use the result for indexing purposes, it's more efficient to make a boolean array than an array of indices:
a == 0
You can index with that the same way you would have used the array of indices:
b[a == 0] = 0
If you really want an array of indices, you can call np.nonzero on the boolean array:
np.nonzero(a == 0)
np.where also works.
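Putting it together on the example array (a minimal sketch; b here is a hypothetical second array of the same length):

```python
import numpy as np

a = np.array([1.0, 2.0, 0.0, 3.4, 0.0, 1.1])
b = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])  # hypothetical second array

mask = (a == 0)   # boolean mask: True where a is zero
b[mask] = 0       # zero out b at those positions
print(b)          # [10. 20.  0. 40.  0. 60.]

# If you really need the integer indices themselves:
print(np.nonzero(a == 0))  # (array([2, 4]),)
```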

Related

Insert into numpy array with condition from another array

I have a numpy array arr containing 0s and 1s:
arr = np.random.randint(2, size=(800,800))
Then I cast it with astype(np.float32) and inserted various float numbers at various positions. In fact, what I would like to do is insert those float numbers only where the original array had 1 rather than 0; where the original array had 0 I want to keep 0.
My thought was to take a copy of the array (with .copy()) and reinsert from that later. So now I have arr above (1s and 0s), and a same-shaped array arr2 with numerical elements. I want to replace the elements in arr2 with those in arr only where (and everywhere where) the element in arr is 0. How can I do this?
Small example:
arr = np.array([[1, 0],
                [0, 1]])
arr2 = np.array([[2.43, 5.25],
                 [1.54, 2.59]])
Desired output:
arr2 = np.array([[2.43, 0],
                 [0, 2.59]])
N.B. should be as fast as possible on arrays of around 800x800
Simply do:
arr2[arr == 0] = 0
or
arr2 = arr2 * arr
#swag2198 is correct, an alternative is below
NumPy has a function called where that lets you set values based on a condition from another array; this is essentially masking.
The code below achieves what you want: it returns an array with the same dimensions as arr2, except that wherever arr has a zero, the output is zero.
arr = np.array([[1,0],[0,1]])
arr2 = np.array([[2.43, 5.25],
[1.54, 2.59]])
arr_out = np.where(arr, arr2, 0)
The advantage of this approach is that you can pick values from two arrays if you wish, say, if you wanted to mask an image and replace the background.
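A sketch of that two-array case (the names and fill values are purely illustrative):

```python
import numpy as np

mask = np.array([[1, 0], [0, 1]])
foreground = np.array([[2.43, 5.25], [1.54, 2.59]])
background = np.full((2, 2), 9.0)  # hypothetical replacement values

# Where mask is nonzero take foreground, elsewhere take background
out = np.where(mask, foreground, background)
print(out)  # [[2.43 9.  ] [9.   2.59]]
```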

Why is fancy indexing not the same as slicing in numpy?

I have been learning Fancy indexing but when I observed the behavior of the following code I got a couple of questions...
According to my understanding,
Fancy Indexing is:
ndArray[ [0,1,2] ] i.e. passing a list of rows / columns
and
Slicing is:
ndArray[ 0:3 ] i.e. giving a range of rows / columns
Now, the problem
A numpy array,
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
When I try fancy indexing:
arr[ [0,1], [1,2] ]
>>> [2, 6]
And when slice it,
arr[:2, 1:]
>>> [ [2, 3],
[5, 6] ]
Essentially, I expected both of them to return the same two-dimensional array, since the index lists seem equivalent to the slices:
:2 should be equivalent to [0,1] #For rows
1: should be equivalent to [1,2] #For cols
The question:
Why does fancy indexing not return the same result as the slice notation, and how can I achieve that?
Please enlighten me.
Thanks
Fancy indexing and slicing behave differently by definition / by numpy specification.
So, instead of questioning why that is so, it is better to:
Be able to recognize / distinguish / tell them apart (i.e., have a clear understanding of when does the indexing become fancy indexing, and when is it slicing).
Be aware of the differences in their semantics (outcomes).
In your example:
In the case of fancy indexing, the indices generated for the two axes are combined "in tandem", similar to how the zip function combines two input sequences (in the words of the official numpy documentation, the two index arrays are "iterated together"). We pass the list [0, 1] to index the array on axis 0 and the list [1, 2] to index it on axis 1. The index 0 from [0, 1] is combined only with the corresponding index 1 from [1, 2]; likewise, the index 1 from [0, 1] is combined only with the corresponding index 2 from [1, 2]. In other words, the index arrays do not combine with each other in a many-to-many fashion.
In the case of slicing, the slice :2 that is specified for axis 0 conceptually generates indices '0' and '1' for axis 0; and the slice 1: specified for axis 1 conceptually generates indices 1 and 2 for axis 1. But these generated indices combine in a many-to-many fashion, unlike in the case of fancy indexing. So, they produce four combinations rather than just two.
So, the crucial difference in the defined semantics of fancy indexing and slicing is that in the case of fancy indexing, the fancy index arrays are iterated together.
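As for the "how to achieve that" part of the question: if you want the index lists to combine in a many-to-many fashion like slicing does, numpy provides np.ix_, which builds an open mesh from the index arrays. A short sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Plain fancy indexing: indices are paired up, "in tandem"
print(arr[[0, 1], [1, 2]])          # [2 6]

# np.ix_ combines the index lists many-to-many, like the slice arr[:2, 1:]
print(arr[np.ix_([0, 1], [1, 2])])  # [[2 3]
                                    #  [5 6]]
```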

Why is numpy.argsort() shuffling the indices for ties?

I am using python 3. The problem is with numpy.argsort().
I have two arrays (say A and B). I want to order values in array A by values in array B. I use this code.
A_ordered = A[B.argsort()]
In array B, there are good chances that ties occur. Sometimes even, every single value in array B is identical.
When there are ties in B, I don't want the values in A to change order. Hence, when values in B are tied, I expect them to keep their relative indices order using .argsort().
Here is an example of the problem when all values in B are tied. The indices given by np.argsort() look like they have been shuffled.
B = np.empty(23000) #creating empty array
B.fill(0.5) #filling it with equal values of 0.5
print(B.argsort()) #trying to sort
Out[176]: array([ 0, 15338, 15337, ..., 7660, 7680, 22999], dtype=int64)
As all values in B are equal, what I expect is
Out[176]: array([ 0, 1, 2, ..., 22997, 22998, 22999], dtype=int64)
I don't want to use the method below for sorting A based on B, because in case of ties, values of A will be used for sorting.
A = [x for _,x in sorted(zip(B,A))]
Many thanks!
You need to tell argsort to use a stable sorting method.
>>> print(B.argsort(kind='stable')) #trying to sort
[ 0 1 2 ... 22997 22998 22999]
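Applied to the original goal of ordering A by B, a small sketch (the arrays here are illustrative):

```python
import numpy as np

A = np.array([10, 20, 30, 40])
B = np.array([0.5, 0.1, 0.5, 0.1])

order = B.argsort(kind='stable')  # ties keep their original relative order
print(order)    # [1 3 0 2]
print(A[order]) # [20 40 10 30]
```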

Getting a slice of a numpy ndarray (for arbitrary dimensions)

I have a Numpy array of arbitrary dimensions, and an index vector containing one number for each dimension. I would like to get the slice of the array corresponding to the set of indices less than the value in the index array for all dimensions, e.g.
A = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9,10,11,12]])
index = [2,3]
result = [[1,2,3],
[5,6,7]]
The intuitive syntax for this would be something like A[:index], but this doesn't work for obvious reasons.
If the dimension of the array were fixed, I could write A[:index[0],:index[1],...:index[n]]; is there some kind of list comprehension I could use, like A[:i for i in index]?
You can slice multiple dimensions in one go:
result = A[:2,:3]
that slices dimension one up to the index 2 and dimension two up to the index 3.
If you have arbitrary dimensions, you can also create a tuple of slices:
slicer = tuple(slice(0, i, 1) for i in index)
result = A[slicer]
A slice defines the start (0), stop (the index you specified), and step (1); it's basically like a range, but usable for indexing. The i-th entry of the tuple slices the i-th dimension of your array.
If you only specify stop-indices you can use the shorthand:
slicer = tuple(slice(i) for i in index)
I would recommend the first option if you know the number of dimensions and the last one if you don't.
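A quick check of the tuple-of-slices approach on a 3-D array (the shape is chosen purely for illustration):

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
index = [2, 2, 3]

# One slice(i) per dimension, i.e. (slice(2), slice(2), slice(3))
slicer = tuple(slice(i) for i in index)
result = A[slicer]  # same as A[:2, :2, :3]
print(result.shape)  # (2, 2, 3)
```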

Getting index of numpy.ndarray

I have a one-dimensional array of the type numpy.ndarray and I want to know the index of its max entry. After finding the max, I used
peakIndex = numpy.where(myArray==max)
to find the peak's index. But instead of the index, my script spits out
peakIndex = (array([1293]),)
I want my code to spit out just the integer 1293. How can I clean up the output?
Rather than using numpy.where, you can use numpy.argmax.
peakIndex = numpy.argmax(myArray)
numpy.argmax returns a single number, the flattened index of the first occurrence of the maximum value. If myArray is multidimensional you might want to convert the flattened index to an index tuple:
peakIndexTuple = numpy.unravel_index(numpy.argmax(myArray), myArray.shape)
To find the max value of an array, you can use the array.max() method. This will probably be more efficient than the for loop described in another answer, which, in addition to not being pythonic, isn't actually written in python. (If you wanted to take items out of the array one by one to compare, you could use ndenumerate, but you'd be sacrificing some of the performance benefits of arrays.)
The reason that numpy.where() yields results as tuples is that more than one position could be equal to the max... and it's that edge case that would make something simple (like taking array[0]) prone to bugs. Per Is there a Numpy function to return the first index of something in an array?,
"The result is a tuple with first all the row indices, then all the
column indices".
Your example uses a 1-D array, so you'd get the results you want directly from the array provided. It's a tuple with one element (one array of indices), and although you can iterate over ind_1d[0] directly, I converted it to a list solely for readability.
>>> peakIndex_1d
array([ 1. , 1.1, 1.6, 1. , 1.6, 0.8])
>>> ind_1d = numpy.where( peakIndex_1d == peakIndex_1d.max() )
>>> ind_1d
(array([2, 4]),)
>>> list( ind_1d[0] )
[2, 4]
For a 2-D array with 3 values equal to the max, you could use:
>>> peakIndex
array([[ 0. , 1.1, 1.5],
[ 1.1, 1.5, 0.7],
[ 0.2, 1.2, 1.5]])
>>> indices = numpy.where( peakIndex == peakIndex.max() )
>>> ind2d = list(zip(indices[0], indices[1]))
>>> ind2d
[(0, 2), (1, 1), (2, 2)]
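If you want a single plain integer in the 1-D case, as the original question asked, a minimal sketch (the array values are illustrative):

```python
import numpy as np

myArray = np.array([0.2, 1.5, 0.7, 1.5])

# First occurrence of the max, as a plain int
peakIndex = int(np.argmax(myArray))
print(peakIndex)  # 1

# All positions equal to the max, as a list of ints
allPeaks = np.nonzero(myArray == myArray.max())[0].tolist()
print(allPeaks)  # [1, 3]
```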
