Explain numpy.where [duplicate] - python

I'm looking for something similar to list.index(value) that works for numpy arrays. I think that numpy.where might do the trick, but I don't understand how it works, exactly. Could someone please explain
a) what this means
and b) whether or not it works like list.index(value) but with numpy arrays.
This is the article from the documentation:
numpy.where(condition[, x, y])
Return elements, either from x or y, depending on condition.
If only condition is given, return condition.nonzero().
Parameters:
condition : array_like, bool
When True, yield x, otherwise yield y.
x, y : array_like, optional
Values from which to choose. x and y need to have the same shape as condition.
Returns:
out : ndarray or tuple of ndarrays
If both x and y are specified, the output array contains elements of x where condition is True, and elements from y elsewhere. If only condition is given, return the tuple condition.nonzero(), the indices where condition is True.
See also: nonzero, choose
Notes:
If x and y are given and input arrays are 1-D, where is equivalent to:
[xv if c else yv for (c,xv,yv) in zip(condition,x,y)]

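What does it mean?
When called with only a condition, numpy.where returns the indices at which that condition is true. When x and y are also given, it returns an array that takes elements from x where the condition is true and from y elsewhere.
Is it like list.index?
It is close, in that it returns the indices of the array where the condition is met. list.index takes a value as its argument and returns only the first matching index; numpy.where returns every matching index, and you get the same kind of lookup by passing array == value as the condition.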
Example:
Using the array
a = numpy.array([[1,2,3],
[4,5,6],
[7,8,9]])
Calling numpy.where(a == 4) returns (array([1]), array([0])).
Calling numpy.where(a >= 4) returns (array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2])), two arrays of row and column indices (respectively) where the condition is true.
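As a rough illustration of the list.index comparison, here is a minimal sketch. The helper name first_index is hypothetical (it is not a NumPy function), and it assumes a 1-D input array.

```python
import numpy as np

def first_index(arr, value):
    """Return the first index where arr == value, like list.index (hypothetical helper)."""
    matches = np.where(arr == value)[0]  # 1-D input: the tuple holds one index array
    if matches.size == 0:
        raise ValueError(f"{value!r} is not in array")
    return int(matches[0])

a = np.array([5, 3, 7, 3])
idx = first_index(a, 3)

# For a 2-D array, np.where returns one index array per axis.
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows, cols = np.where(b >= 4)
pairs = list(zip(rows.tolist(), cols.tolist()))
```

Zipping the row and column arrays gives (row, column) pairs, which is often easier to read than the two separate index arrays.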

Related

How to extract numpy array stored in tuple?

Let's consider a very simple example:
import numpy as np
a = np.array([0, 1, 2])
print(np.where(a < -1))
(array([], dtype=int64),)
print(np.where(a < 2))
(array([0, 1]),)
I'm wondering if it's possible to extract the length of those arrays, i.e. I want to know that the first array is empty and the second is not. Usually this can easily be done with the len function, but here the numpy array is stored in a tuple. Do you know how it can be done?
Just use this:
import numpy as np
a = np.array([0, 1, 2])
x = np.where(a < 2)[0]
print(len(x))
This outputs 2.
To find the number of values in the array satisfying the predicate, you can skip np.where and use np.count_nonzero instead:
a = np.array([0, 1, 2])
print(np.count_nonzero(a < -1))
>>> 0
print(np.count_nonzero(a < 2))
>>> 2
If you need to know whether there are any values in a that satisfy the predicate, but not how many there are, a cleaner way of doing so is with np.any:
a = np.array([0, 1, 2])
print(np.any(a < -1))
>>> False
print(np.any(a < 2))
>>> True
np.where takes three arguments: condition, x, and y, where the last two are arrays and are optional. When they are provided, the function returns elements from x at the indices where condition is True, and from y otherwise. When only condition is provided, it acts like np.asarray(condition).nonzero() and returns a tuple, as in your case. For more details, see the Notes section of the np.where documentation.
Alternatively, because you only need the number of elements for which the condition is True, you can simply use np.sum(condition):
a = np.array([0, 1, 2])
print(np.sum(a < -1))
>>> 0
print(np.sum(a < 2))
>>> 2
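To put the two calling forms of np.where side by side, here is a small sketch using the same array as the question. The fill value -1 is an arbitrary choice for illustration.

```python
import numpy as np

a = np.array([0, 1, 2])

# One-argument form: a tuple with one index array per dimension.
(indices,) = np.where(a < 2)

# Three-argument form: pick from x where the condition holds, else from y.
chosen = np.where(a < 2, a, -1)

# Counting the matches gives the same number three ways.
counts = (len(indices), int(np.count_nonzero(a < 2)), int(np.sum(a < 2)))
```

Unpacking with (indices,) makes it explicit that the result for a 1-D input is a one-element tuple.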

How to check numpy array is empty? [duplicate]

How can I check whether a numpy array is empty or not?
I used the following code, but this fails if the array contains a zero.
if not self.Definition.all():
Is this the solution?
if self.Definition == array([]):
You can always take a look at the .size attribute. It is defined as an integer, and is zero (0) when there are no elements in the array:
import numpy as np
a = np.array([])
if a.size == 0:
    pass  # Do something when `a` is empty
https://numpy.org/devdocs/user/quickstart.html (2020.04.08)
NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes.
(...) NumPy’s array class is called ndarray. (...) The more important attributes of an ndarray object are:
ndarray.ndim
the number of axes (dimensions) of the array.
ndarray.shape
the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
ndarray.size
the total number of elements of the array. This is equal to the product of the elements of shape.
One caveat, though.
Note that np.array(None).size returns 1!
This is because a.size is equivalent to np.prod(a.shape),
np.array(None).shape is (), and an empty product is 1.
>>> import numpy as np
>>> np.array(None).size
1
>>> np.array(None).shape
()
>>> np.prod(())
1.0
Therefore, I use the following to test if a numpy array has elements:
>>> def elements(array):
...     return array.ndim and array.size
...
>>> elements(np.array(None))
0
>>> elements(np.array([]))
0
>>> elements(np.zeros((2,3,4)))
24
Why would we want to check if an array is empty? Arrays don't grow or shrink in the same way that lists do. Starting with an 'empty' array and growing it with np.append is a frequent novice error.
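A short sketch of the pattern in question and the usual alternative. The squares are just placeholder data; the point is that np.append returns a new array on every call rather than growing one in place.

```python
import numpy as np

# Pattern to avoid: np.append returns a new array each time, copying all data.
grown = np.array([])
for i in range(5):
    grown = np.append(grown, i * i)

# Usual alternative: accumulate in a list, convert once at the end.
values = []
for i in range(5):
    values.append(i * i)
collected = np.array(values)
```

Note that the first version also ends up with a float array, because np.array([]) defaults to a floating-point dtype.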
Using a list in if alist: hinges on its boolean value:
In [102]: bool([])
Out[102]: False
In [103]: bool([1])
Out[103]: True
But trying to do the same with an array produces (in version 1.18):
In [104]: bool(np.array([]))
/usr/local/bin/ipython3:1: DeprecationWarning: The truth value
of an empty array is ambiguous. Returning False, but in
future this will result in an error. Use `array.size > 0` to
check that an array is not empty.
#!/usr/bin/python3
Out[104]: False
In [105]: bool(np.array([1]))
Out[105]: True
and bool(np.array([1,2])) produces the infamous ambiguity error.
edit
The accepted answer suggests size:
In [11]: x = np.array([])
In [12]: x.size
Out[12]: 0
But I (and most others) check the shape more than the size:
In [13]: x.shape
Out[13]: (0,)
Another thing in its favor is that it 'maps' on to an empty list:
In [14]: x.tolist()
Out[14]: []
But there are other arrays with size 0 that aren't 'empty' in that last sense:
In [15]: x = np.array([[]])
In [16]: x.size
Out[16]: 0
In [17]: x.shape
Out[17]: (1, 0)
In [18]: x.tolist()
Out[18]: [[]]
In [19]: bool(x.tolist())
Out[19]: True
np.array([[],[]]) is also size 0, but shape (2,0) and len 2.
While the concept of an empty list is well defined, an empty array is not well defined. One empty list is equal to another. The same can't be said for a size 0 array.
The answer really depends on:
what do you mean by 'empty'?
what are you really testing for?
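To compare the different notions of "empty" discussed above, one could tabulate shape, size and ndim for a few arrays. This is only a sketch; the dictionary labels are made up for illustration.

```python
import numpy as np

candidates = {
    "1-D empty": np.array([]),
    "shape (1, 0)": np.array([[]]),
    "0-D (from None)": np.array(None),
    "zeros (2, 3)": np.zeros((2, 3)),
}

# Collect (shape, size, ndim) so the different notions of "empty" can be compared.
summary = {name: (arr.shape, arr.size, arr.ndim) for name, arr in candidates.items()}
for name, info in summary.items():
    print(name, info)
```

Which of these counts as empty depends on whether you test size, shape, or ndim, which is why it helps to decide first what the check is for.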

numpy apply_along_axis on a 1d array

What happens when numpy.apply_along_axis takes a 1d array as input? When I use it on a 1d array, I see something strange:
y=array([1,2,3,4])
First try:
apply_along_axis(lambda x: x > 2, 0, y)
apply_along_axis(lambda x: x - 2, 0, y)
returns:
array([False, False, True, True], dtype=bool)
array([-1, 0, 1, 2])
However when I try:
apply_along_axis(lambda x: x - 2 if x > 2 else x, 0, y)
I get an error:
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I could of course use list comprehension then convert back to array instead, but that seems convoluted and I feel like I'm missing something about apply_along_axis when applied to a 1d array.
UPDATE: as per Jeff G's answer, my confusion stems from the fact that for a 1d array, which has only one axis, what is passed to the function is the 1d array itself rather than its individual elements.
"numpy.where" is clearly better for my chosen example (and no need for apply_along_axis), but my question is really about the proper idiom for applying a general function (that takes one scalar and returns one scalar) to each element of an array (other than list comprehension), something akin to pandas.Series.apply (or map). I know of 'vectorize' but it seems no less unwieldy than list comprehension.
I'm unclear whether you're asking if y must be 1-D (answer is no, it can be multidimensional) or if you're asking about the function passed into apply_along_axis. To that, the answer is yes: the function you pass must take a 1-D array. (This is stated clearly in the function's documentation).
In your three examples, x is always a 1-D array. Your first two examples work because NumPy applies the > and - operators elementwise across that array.
Your third example fails because if / else is not applied elementwise: it needs a single truth value, and an array with more than one element does not have one. For this to work with apply_along_axis you need to pass a function that takes a 1-D array. numpy.where would work for this:
>>> apply_along_axis(lambda x: numpy.where(x > 2, x - 2, x), 0, y)
array([1, 2, 1, 2])
P.S. In all these examples, apply_along_axis is unnecessary, thanks to broadcasting. You could achieve the same results with these:
>>> y > 2
>>> y - 2
>>> numpy.where(y > 2, y - 2, y)
This answer addresses the updated addendum to your original question:
numpy.vectorize will take an elementwise function and return a new function. The new function can be applied to an entire array. It's like map, but it uses the broadcasting rules of numpy.
f = lambda x: x - 2 if x > 2 else x # your elementwise fn
fv = np.vectorize(f)
fv(np.array([1,2,3,4]))
# Out[5]: array([1, 2, 1, 2])
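For comparison, here is a sketch applying the same scalar function three ways. The function name shift_if_large is hypothetical. The NumPy documentation describes np.vectorize as provided primarily for convenience rather than performance, so the np.where version is usually preferable when the logic can be expressed that way.

```python
import numpy as np

y = np.array([1, 2, 3, 4])

def shift_if_large(x):
    """Scalar-in, scalar-out function (hypothetical example)."""
    return x - 2 if x > 2 else x

via_vectorize = np.vectorize(shift_if_large)(y)          # wraps the scalar function
via_list = np.array([shift_if_large(v) for v in y])      # plain list comprehension
via_where = np.where(y > 2, y - 2, y)                    # fully array-based
```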

assignment followed by boolean operator

I've come across statements like these, where an assignment (in this case involving the second column of a numpy data array) is followed by a boolean operator, such as this, for example:
indices = data[:,1] == 1
How would what happens here be explained in pseudocode, and what type of output is generated from this statement?
In this case it was followed by this statement:
jan_data = data[indices]
data[:,1] == 1 is an expression that will evaluate to a value. This value will be assigned to indices. Using parentheses, you can think of it as indices = (data[:,1] == 1). It is not "an assignment followed by a boolean operator". It is an assignment whose right-hand side is an expression containing a boolean operator. You can assign the result of a == b just like you can assign the result of a + b.
Types get to define what sort of value is returned by such comparisons. In this case, I suspect data is a numpy array, and comparing numpy arrays gives you another numpy array of boolean values, with True where the condition was true and False where it was false. So if data[:,1] was something like [1, 2, 3, 2, 1], the result of data[:,1] == 1 would be [True, False, False, False, True], and this is the value that would be assigned to indices.
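A small sketch of the two statements from the question, run on made-up data. The array contents and the reading of column 1 as a month number are assumptions for illustration only.

```python
import numpy as np

# Hypothetical data: column 0 is a value, column 1 is a month number.
data = np.array([[10, 1],
                 [20, 2],
                 [30, 1]])

indices = data[:, 1] == 1      # boolean array, one entry per row
jan_data = data[indices]       # keeps only the rows where the mask is True
```

Despite its name, indices here holds booleans rather than integer positions; indexing with it selects the rows whose entry is True.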

Numpy multidimensional array slicing

Suppose I have defined a 3x3x3 numpy array with
x = numpy.arange(27).reshape((3, 3, 3))
Now, I can get an array containing the (0,1) element of each 3x3 subarray with x[:, 0, 1], which returns array([ 1, 10, 19]). What if the indices (m,n), for example (0,1), are stored in a tuple and I want to retrieve the (m,n) element of each subarray?
For example, suppose that I have t = (0, 1). I tried x[:, t], but it doesn't have the right behaviour - it returns rows 0 and 1 of each subarray. The simplest solution I have found is
x.transpose()[tuple(reversed(t))].transpose()
but I am sure there must be a better way. Of course, in this case, I could do x[:, t[0], t[1]], but that can't be generalised to the case where I don't know how many dimensions x and t have.
You can create the index tuple first:
index = (numpy.s_[:],)+t
x[index]
HYRY's solution is correct, but I have always found numpy's r_, c_ and s_ index tricks to be a bit strange looking. So here is the equivalent thing using a slice object:
x[(slice(None),) + t]
That single argument to slice is the stop position (i.e. None means "all", in the same way that x[:] is equivalent to x[None:None]).
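The slice-object approach can be wrapped in a small function so that it works for any number of dimensions. This is a sketch; the helper name element_of_each_subarray is hypothetical.

```python
import numpy as np

def element_of_each_subarray(x, t):
    """Index the axes after the first with t, keeping the leading axis whole
    (hypothetical helper)."""
    return x[(slice(None),) + tuple(t)]

x = np.arange(27).reshape((3, 3, 3))
picked = element_of_each_subarray(x, (0, 1))

# The same helper works when x has more axes than len(t) + 1;
# the remaining trailing axes are kept in the result.
x4 = np.arange(81).reshape((3, 3, 3, 3))
picked4 = element_of_each_subarray(x4, (0, 1))
```

Converting t with tuple() lets the helper accept a list as well, since concatenating a tuple and a list directly would raise a TypeError.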
