I've come across statements like this one, where an assignment (here involving column index 1 of a numpy data array, i.e. its second column) is followed by a boolean operator. For example:
indices = data[:,1] == 1
How would what happens here be explained in pseudocode, and what type of output does this statement generate?
In this case it was followed by this statement:
jan_data = data[indices]
data[:,1] == 1 is an expression that will evaluate to a value. This value will be assigned to indices. Using parentheses, you can think of it as indices = (data[:,1] == 1). It is not "an assignment followed by a boolean operator". It is an assignment whose right-hand side is an expression containing a boolean operator. You can assign the result of a == b just like you can assign the result of a + b.
Types get to define what sort of value is returned by such comparisons. In this case, I suspect data is a numpy array, and comparing numpy arrays gives you another numpy array of boolean values, with True where the condition was true and False where it was false. So if data[:,1] was something like [1, 2, 3, 2, 1], the result of data[:,1] == 1 would be [True, False, False, False, True], and this is the value that would be assigned to indices.
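To make the two statements concrete, here is a minimal sketch. The contents of data are made up for illustration (the question does not show them); only the names data, indices and jan_data come from the original code.

```python
import numpy as np

# Hypothetical data: column 0 is a day, column 1 a month number, column 2 a reading.
data = np.array([
    [1, 1, 10.0],
    [2, 2, 11.5],
    [3, 3, 9.0],
    [4, 2, 12.0],
    [5, 1, 8.5],
])

# Element-wise comparison on column index 1 gives a boolean array.
indices = data[:, 1] == 1
print(indices.tolist())  # [True, False, False, False, True]
print(indices.dtype)     # bool

# Indexing with that boolean array keeps only the rows where it is True.
jan_data = data[indices]
print(jan_data[:, 0].tolist())  # [1.0, 5.0]
```

So indices is a boolean mask with one entry per row, and data[indices] is a new array holding only the rows whose mask entry is True.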
I have an array which I want to use boolean indexing on, with multiple index arrays, each producing a different array. Example:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
Should return something along the lines of:
[[2,3], [1]]
I assume that since the number of cells containing True can vary between masks, I cannot expect the result to reside in a 2d numpy array, but I'm still hoping for something more elegant than iterating over the masks and appending the result of indexing w by the i-th b mask to a list.
Am I missing a better option?
Edit: The next step I want to do afterwards is to sum each of the arrays returned by w[b], returning a list of scalars. If that somehow makes the problem easier, I'd love to know as well.
Assuming you want a list of numpy arrays you can simply use a comprehension:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
[w[mask] for mask in b]  # "mask" avoids shadowing the built-in bool
# [array([2, 3]), array([1])]
If your goal is just the sum of the masked values, you can use:
np.sum(w*b) # 6
or
np.sum(w*b, axis=1) # array([5, 1])
# or b @ w
…since False times your number will be 0 and therefore won't affect the sum.
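Put together as a runnable sketch, using the w and b from the question; the result variable names are mine:

```python
import numpy as np

w = np.array([1, 2, 3])
b = np.array([[False, True, True], [True, False, False]])

# One array per mask; the lengths differ, so the result is a plain list.
selected = [w[mask] for mask in b]

# Per-mask sums without building the intermediate arrays.
sums_product = np.sum(w * b, axis=1)  # False acts as 0, True as 1
sums_matmul = b @ w                   # same values via a matrix-vector product

print([arr.tolist() for arr in selected])  # [[2, 3], [1]]
print(sums_product.tolist())               # [5, 1]
print(sums_matmul.tolist())                # [5, 1]
```

The multiplication trick only suits sums. For any other reduction (max, mean, and so on) the list comprehension is the safer starting point.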
Try this:
[w[x] for x in b]
Hope this helps.
I'm looking for something similar to list.index(value) that works for numpy arrays. I think that numpy.where might do the trick, but I don't understand how it works, exactly. Could someone please explain
a) what this means
and b) whether or not it works like list.index(value) but with numpy arrays.
This is the article from the documentation:
numpy.where(condition[, x, y])
Return elements, either from x or y, depending on condition.
If only condition is given, return condition.nonzero().
Parameters:
condition : array_like, bool
    When True, yield x, otherwise yield y.
x, y : array_like, optional
    Values from which to choose. x and y need to have the same shape as condition.
Returns:
out : ndarray or tuple of ndarrays
    If both x and y are specified, the output array contains elements of x where condition is True, and elements from y elsewhere. If only condition is given, return the tuple condition.nonzero(), the indices where condition is True.
See also: nonzero, choose
Notes:
If x and y are given and input arrays are 1-D, where is equivalent to:
    [xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
What does it mean?
When called with only a condition, numpy.where returns the indices where that condition is true.
Is it like list.index?
It is close, in that it returns the indices of the array where the condition is met. list.index takes a value as its argument, whereas numpy.where takes a condition; you get the equivalent lookup by passing array == value as the condition. Note that list.index returns only the first match, while numpy.where returns all of them.
Example:
Using the array
a = numpy.array([[1,2,3],
[4,5,6],
[7,8,9]])
and calling numpy.where(a == 4) returns (array([1]), array([0]))
Calling numpy.where(a >= 4) returns (array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2])), two arrays holding the row and column indices (respectively) where the condition is true.
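For the 1-D case the question asks about, a small side-by-side sketch may help. The sample values and variable names are made up, and using None for "not found" is my own choice rather than anything numpy does:

```python
import numpy as np

values = [10, 20, 30, 20]
arr = np.array(values)

# list.index returns the position of the first match only.
first_in_list = values.index(20)

# np.where with only a condition returns a tuple with one index array per dimension.
matches = np.where(arr == 20)[0]
first_in_array = int(matches[0]) if matches.size else None  # None: my stand-in for "not found"

# With three arguments, np.where picks element-wise from x or y instead.
clipped = np.where(arr > 15, arr, 0)

print(first_in_list)     # 1
print(matches.tolist())  # [1, 3]
print(first_in_array)    # 1
print(clipped.tolist())  # [0, 20, 30, 20]
```

Unlike list.index, which raises ValueError when the value is absent, np.where simply returns an empty index array, so the emptiness check is up to the caller.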
Can someone explain what this code is doing?
a = np.array([[1, 2], [3, 4]])
a[..., [True, False]]
What is the [True, False] doing there?
Ellipsis Notation and Booleans as Integers
From the numpy docs:
Ellipsis expand to the number of : objects needed to make a selection tuple of the same length as x.ndim. There may only be a single ellipsis present
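True and False are just obfuscated 1 and 0, at least in the NumPy version this answer was written for (the release note quoted further down announces that lists of booleans will be treated as boolean masks instead). Taking the example from the docs: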
x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
x[...,0]
# outputs: array([[1, 2, 3],
# [4, 5, 6]])
x[..., False] # same thing
The boolean values are specifying an index, just like the numbers 0 or 1 would.
In response to your question in the comments
It first seems magical that
a = np.array([[1, 2], [3, 4]])
a[..., [True, True]] # = [[2,2],[4,4]]
But when we consider it as
a[..., [1,1]] # = [[2,2],[4,4]]
It seems less impressive.
Similarly:
b = np.array([[1,2,3],[4,5,6]])
b[...,[2,2]] # = [[3,3],[6,6]]
After applying the ellipsis rules, the True and False grab column indices, just like 0, 1, or 17 would have.
Boolean Arrays for Complex Indexing
There are some subtle differences (bools have a different type than ints). A lot of the hairy details can be found here. These do not seem to play any role in your code, but they are interesting for figuring out how numpy indexing works.
In particular, this line is probably what you're looking for:
In the future Boolean array-likes (such as lists of python bools) will
always be treated as Boolean indexes
On this page, they talk about boolean arrays, which are quite complex as an indexing tool
Boolean arrays used as indices are treated in a different manner
entirely than index arrays. Boolean arrays must be of the same shape
as the initial dimensions of the array being indexed
Skipping down a bit
Unlike in the case of integer index arrays, in the boolean case, the
result is a 1-D array containing all the elements in the indexed array
corresponding to all the true elements in the boolean array. The
elements in the indexed array are always iterated and returned in
row-major (C-style) order. The result is also identical to
y[np.nonzero(b)]. As with index arrays, what is returned is a copy of
the data, not a view as one gets with slices.
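The "future" behaviour announced in the release note quoted above is what current NumPy releases implement, as far as I can tell: a list of Python bools is a mask, not a list of 0/1 positions. A short sketch of the difference, with variable names of my own:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# A list of Python bools acts as a mask on the last axis: keep column 0, drop column 1.
masked = a[..., [True, False]]

# A list of integers is an index array: take column 1 twice.
picked = a[..., [1, 1]]

# For a boolean array b with the same shape as y, y[b] matches y[np.nonzero(b)].
y = np.arange(6).reshape(2, 3)
b = y % 2 == 0
flat = y[b]

print(masked.tolist())  # [[1], [3]]
print(picked.tolist())  # [[2, 2], [4, 4]]
print(flat.tolist())    # [0, 2, 4]
print(np.array_equal(flat, y[np.nonzero(b)]))  # True
```

So on a recent install, a[..., [True, True]] returns the whole array rather than [[2,2],[4,4]]; the integer form a[..., [1, 1]] is what produces the repeated column.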
I've been looking for a way to efficiently check for duplicates in a numpy array and stumbled upon a question that contained an answer using this code.
What does this line mean in numpy?
s[s[1:] == s[:-1]]
Would like to understand the code before applying it. Looked in the Numpy doc but had trouble finding this information.
The slices [1:] and [:-1] mean all but the first and all but the last elements of the array:
>>> import numpy as np
>>> s = np.array((1, 2, 2, 3)) # four element array
>>> s[1:]
array([2, 2, 3]) # last three elements
>>> s[:-1]
array([1, 2, 2]) # first three elements
therefore the comparison generates an array of boolean comparisons between each element s[x] and its "neighbour" s[x+1], which will be one shorter than the original array (as the last element has no neighbour):
>>> s[1:] == s[:-1]
array([False, True, False], dtype=bool)
and using that array to index the original array gets you the elements where the comparison is True, i.e. the elements that are the same as their neighbour:
>>> s[s[1:] == s[:-1]]
array([2])
Note that this only identifies adjacent duplicate values.
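One caveat when trying this on a recent NumPy: a boolean mask has to match the length of the axis it indexes, so applying the len(s) - 1 mask directly to s raises an IndexError there. A sketch that indexes a slice of matching length instead, and sorts first to catch non-adjacent duplicates (variable names and the unsorted sample are mine):

```python
import numpy as np

s = np.array([1, 2, 2, 3])

# The comparison has len(s) - 1 entries, one per adjacent pair.
pair_equal = s[1:] == s[:-1]

# Index a slice of matching length so the mask and the data line up.
adjacent_dups = s[:-1][pair_equal]

# For unsorted input, sort first so that equal values become neighbours.
unsorted = np.array([7, 1, 5, 7, 3, 1])
ordered = np.sort(unsorted)
dups = np.unique(ordered[:-1][ordered[1:] == ordered[:-1]])

print(pair_equal.tolist())     # [False, True, False]
print(adjacent_dups.tolist())  # [2]
print(dups.tolist())           # [1, 7]
```

The final np.unique only collapses values that occur three or more times into a single entry; drop it if you want one entry per repeated pair.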
Check this out:
>>> s=numpy.array([1,3,5,6,7,7,8,9])
>>> s[1:] == s[:-1]
array([False, False, False, False, True, False, False], dtype=bool)
>>> s[s[1:] == s[:-1]]
array([7])
So s[1:] gives all numbers but the first, and s[:-1] all but the last.
Now compare these two vectors, i.e. check whether two adjacent elements are the same. Finally, select these elements.
s[1:] == s[:-1] compares s without the first element with s without the last element, i.e. 0th with 1st, 1st with 2nd etc, giving you an array of len(s) - 1 boolean elements. s[boolarray] will select only those elements from s which have True at the corresponding place in boolarray. Thus, the code extracts all elements that are equal to the next element.
It will show duplicates in a sorted array.
Basically, the inner expression s[1:] == s[:-1] compares the array with its shifted version. Imagine this:
1, [2, 3, ... n-1, n ]
- [1, 2, ... n-2, n-1] n
=> [F, F, ... F, F ]
In a sorted array, there will be no True in the resulting array unless there is a repetition. The expression s[array] then keeps only those elements that have True in the index array.
The future warning happens when you do something like this:
>>> numpy.asarray([1,2,3,None]) == None
Which currently returns False, but I understand will return an array containing [False,False,False,True] in a future version of Numpy.
As discussed on the numpy discussion list, the way around this is to test a is None.
What confuses me is this behaviour of the in keyword with a 1D array compared to a list:
>>> None in [1,2,3,None]
True
>>> None in numpy.asarray([1,2,3,None])
__main__:1: FutureWarning: comparison to 'None' will result in an elementwise
object comparison in the future
False
>>> 1 in numpy.asarray([1,2,3,None])
True
EDIT (see comments) - There are really two different questions:
Why does this cause a FutureWarning - what will the future behaviour of None in numpy.asarray(...) be compared to what it is now?
Why the difference in behaviour of in from a list; can I test if my array contains None without converting it to a list or using a for loop?
Numpy version is 1.9.1, Python 3.4.1
The future warning happens when you do something like this:
numpy.asarray([1,2,3,4]) == None
Which currently returns False, but I understand will return an array containing [False,False,False,True] in a future version of Numpy.
As I mentioned in the comments, your example is incorrect. Future versions of numpy would return [False, False, False, False], i.e. False for each element in the array that is not equal to None. This is more consistent with how element-wise comparisons to other scalar values currently work, e.g.:
In [1]: np.array([1, 2, 3, 4]) == 1
Out[1]: array([ True, False, False, False], dtype=bool)
In [2]: np.array(['a', 'b', 'c', 'd']) == 'b'
Out[2]: array([False, True, False, False], dtype=bool)
What confuses me is this behaviour of the in keyword with a 1D array compared to a list
When you test x in y, you are calling y.__contains__(x). When y is a list, __contains__ basically does something along the lines of this:
for item in y:
    if (item is x) or (item == x):
        return True
return False
As far as I can tell, np.ndarray.__contains__(x) performs the equivalent of this:
if any(y == x):
    return True
else:
    return False
That is to say it tests element-wise equality over the whole array first (y == x would be a boolean array the size of y). Since in your case you are testing whether y == None, this will raise the FutureWarning for the reasons given above.
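The two membership strategies described above can be written out as runnable functions. This is only a sketch of the described logic, not numpy's or CPython's actual implementation, and both function names are made up:

```python
import numpy as np

def list_style_contains(y, x):
    """Roughly what list membership does: identity first, then equality."""
    for item in y:
        if (item is x) or (item == x):
            return True
    return False

def array_style_contains(y, x):
    """Roughly what the answer describes for ndarray: element-wise ==, then any()."""
    return bool(np.any(y == x))

arr = np.asarray([1, 2, 3, None])  # object-dtype array

in_list = list_style_contains([1, 2, 3, None], None)
one_in_array = array_style_contains(arr, 1)
# Identity-based check for None that does not rely on == at all.
none_in_array = any(item is None for item in arr)

print(in_list)        # True
print(one_in_array)   # True
print(none_in_array)  # True
```

The identity-based check sidesteps the == None comparison entirely, so it gives the same answer whichever way the element-wise comparison behaves in your numpy version.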
In the comments you also wanted to know why
np.nan in np.array([1, 2, 3, np.nan])
returns False, but
np.nan in [1, 2, 3, np.nan]
returns True. The first part is easily explained by the fact that np.nan != np.nan (see here for the rationale behind this). To understand why the second case returns True, remember that list.__contains__() first checks for identity (is) before checking equality (==). Since np.nan is np.nan, the second case will return True.
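A quick way to see both effects side by side, and to test for NaN explicitly with np.isnan, which avoids the problem altogether (variable names are mine):

```python
import numpy as np

nan_list = [1, 2, 3, np.nan]
nan_array = np.array(nan_list)

nan_equals_itself = np.nan == np.nan            # NaN never compares equal
found_in_list = np.nan in nan_list              # identity check succeeds
found_in_array = np.nan in nan_array            # only == is used
found_by_isnan = bool(np.isnan(nan_array).any())  # explicit NaN test

print(nan_equals_itself)  # False
print(found_in_list)      # True
print(found_in_array)     # False
print(found_by_isnan)     # True
```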