Indexing with Masked Arrays in numpy - python

I have a bit of code that looks up the contents of one array at positions given by a second index array, and that index array may contain indices that are out of range for the first array.
input = np.arange(0, 5)
indices = np.array([0, 1, 2, 99])
What I want to do is this:
print input[indices]
and get
[0 1 2]
But this yields an exception (as expected):
IndexError: index 99 out of bounds 0<=index<5
So I thought I could use masked arrays to hide the out of bounds indices:
indices = np.ma.masked_greater_equal(indices, 5)
But still:
print input[indices]
IndexError: index 99 out of bounds 0<=index<5
Even though:
np.max(indices)
2
So I have to fill the masked array first, which is annoying, since there is no fill value I can use that would select nothing for the out-of-range indices:
print input[np.ma.filled(indices, 0)]
[0 1 2 0]
So my question is: how can you use numpy to efficiently and safely select elements from an array, given an index array that may overstep the input array's bounds?

Without using masked arrays, you could remove the indices greater than or equal to 5 like this:
print input[indices[indices<5]]
Edit: note that if you also wanted to discard negative indices, you could write:
print input[indices[(0 <= indices) & (indices < 5)]]
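Putting that together, here's a minimal, self-contained sketch of the boolean-filtering approach (I use the name data instead of input to avoid shadowing the builtin; the values match the question):
import numpy as np

data = np.arange(0, 5)                # the array being indexed
indices = np.array([0, 1, 2, 99])     # 99 is out of bounds

valid = (0 <= indices) & (indices < data.size)   # True for in-range indices
print(data[indices[valid]])           # [0 1 2]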

It is a VERY BAD idea to index with masked arrays. There was a (very short) time when using MaskedArrays for indexing would have thrown an exception, but it was a bit too harsh...
In your test, you're filtering indices to find the entries matching a condition. What should you do with the missing entries of your MaskedArray? Is the condition False? True? Should you use a default? It's up to you, the user, to decide what to do.
Using indices.filled(0) means that when an item of indices is masked (as in, undefined), you want to take the first index (0) as a default. Probably not what you wanted.
Here, I would have simply used input[indices.compressed()]: the compressed method flattens your MaskedArray, keeping only the unmasked entries.
But as you realized, you probably didn't need MaskedArrays in the first place.
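For reference, a short sketch of the compressed() route described above (again using data in place of input):
import numpy as np

data = np.arange(0, 5)
indices = np.ma.masked_greater_equal(np.array([0, 1, 2, 99]), 5)

# compressed() returns a plain ndarray of only the unmasked entries,
# so the masked value 99 never reaches the indexing operation.
print(data[indices.compressed()])     # [0 1 2]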

Related

Numpy tensor indexing [duplicate]

I'm a little new to Python and NumPy, but I've noticed that when you call operator [] on a numpy array A with a single index (e.g., A[1]), the resulting subarray has one dimension fewer, whereas with a range of indices (e.g., A[1:]) the dimensionality of the subarray is unchanged, even if the range covers only a single index. For example, if A is 2x2, A[1:] effectively selects a single row, yet the resulting shape is not the same as that of A[1].
My question is: is it always true that supplying a range of indices when extracting a subarray leaves the dimensionality unchanged, while a single index always reduces it by 1? Are there edge cases?
That is always the case. When you use a single index value, e.g. A[1], you are effectively saying "give me the subarray A[1]", which, by definition, has one dimension fewer.
When you request a range of indices, e.g. A[1:], you are "cropping" A to get everything but the first slice (A[0]). The range of indices preserves the axis you "lost" in the previous case (A[1]).
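A quick illustration of the two cases, assuming a small 2x2 array:
import numpy as np

A = np.array([[1, 2],
              [3, 4]])                    # shape (2, 2)

print(A[1].shape)                         # (2,)   -> one dimension removed
print(A[1:].shape)                        # (1, 2) -> dimension kept, length 1
print(np.array_equal(A[1], A[1:][0]))     # True: same values, different shape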
The following docs should be helpful to understand numpy arrays (indexing):
Arrays: https://numpy.org/doc/stable/user/absolute_beginners.html#more-information-about-arrays
Indexing: https://numpy.org/doc/stable/user/basics.indexing.html

Creating logical array from numpy array

I have a very large numpy array in Python full of meteorological data. In order to spot flawed data, I would like to look at every value and test whether it is less than -1. Eventually I would like to represent this with a logical array of 0's and 1's, with 1 representing indices where the value is less than -1 and 0 representing all others. I have tried using the numpy.where function as follows
logarr = np.where(metdat < -1)
which returns the original array and the array of zeros for when this condition is true (around 200 times). I have tried using the numpy.where syntax laid out on SciPy.org, where
logarr = np.where(metdat < -1 [1,0])
but my program dislikes the syntax. What am I doing wrong or would anyone recommend a better way of going about this?
Thanks,
jmatt
This works for your case, directly converting the boolean array to int:
(metdat < -1).astype(int)
Or, for np.where, the syntax needs to be:
np.where(metdat < -1, 1, 0)
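A small sketch showing that both forms give the same 0/1 array (metdat here is just made-up sample data):
import numpy as np

metdat = np.array([0.5, -3.0, 2.1, -1.5, -0.2])   # hypothetical sample values

logarr_a = (metdat < -1).astype(int)              # boolean -> 0/1 integers
logarr_b = np.where(metdat < -1, 1, 0)            # same result via np.where

print(logarr_a)                                   # [0 1 0 1 0]
print(np.array_equal(logarr_a, logarr_b))         # True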

Interpreting numpy.where results

I'm confused by what the results of numpy.where mean, and how to use it to index into an array.
Have a look at the code sample below:
import numpy as np
a = np.random.randn(20,20,2)
indices = np.where(a[:,:,0] > 0.5)
I expect the indices array to be 2-dim and contain the indices where the condition is true. We can see that by
indices = np.array(indices)
indices.shape # (2,120)
So it looks like indices is acting on the flattened array of some sort, but I'm not able to figure out exactly how. More confusingly,
a.shape # (20,20,2)
a[indices].shape # (2,120,20,2)
Question:
How does indexing my array with the output of np.where actually grow the size of the array? What is going on here?
You are basing your indexing on a wrong assumption: np.where returns something that can be immediately used for advanced indexing (it's a tuple of np.ndarrays). But you convert it to a numpy array (so it's now an np.ndarray of np.ndarrays).
So
import numpy as np
a = np.random.randn(20,20,2)
indices = np.where(a[:,:,0] > 0.5)
a[:,:,0][indices]
# If you do a[indices] the result would be different, I'm not sure what
# you intended.
gives you the elements that are found by np.where. If you convert indices to an np.ndarray, it triggers another form of indexing (see this section of the numpy docs), and the warning message in the docs becomes very important. That's the reason why it increases the total size of your array.
Some additional information about what np.where means: you get a tuple containing n arrays, where n is the number of dimensions of the input array. So the coordinates of the first element that satisfies the condition are indices[0][0], indices[1][0], ..., indices[n-1][0], not indices[0][0], indices[0][1], ..., indices[0][n-1]. So in your case, the shape (2, 120) means you have 2 dimensions and 120 found points.
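To make the difference concrete, here is a small sketch (shapes in the comments assume the (20,20,2) array from the question; n is the number of matches):
import numpy as np

a = np.random.randn(20, 20, 2)
indices = np.where(a[:, :, 0] > 0.5)     # a tuple: (row_indices, col_indices)

print(type(indices), len(indices))       # <class 'tuple'> 2
print(a[:, :, 0][indices].shape)         # (n,) -- just the selected elements

arr = np.array(indices)                  # a single integer array of shape (2, n)
print(a[arr].shape)                      # (2, n, 20, 2) -- indexes axis 0 only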

Finding indexes for use with np.ravel

I would like to use np.ravel to create a similar return structure as seen in the MATLAB code below:
[xi yi imv1] = find(squeeze(imagee(:,:,1))+0.1);
imv1 = imv1 - 0.1;
[xi yi imv2] = find(squeeze(imagee(:,:,2))+0.1);
imv2 = imv2 - 0.1;
where imagee is a matrix corresponding to values of a picture obtained from imread().
So the (almost) corresponding Python translation is:
imv1 = np.ravel(imagee[:,:,0], order='F')
where the index slicing [:,:,0] is clearly not the same as in MATLAB. How do I specify the index values in Python so that my return values are the same as those from the MATLAB code? I believe the MATLAB code means "access all rows and columns of the specified slice along the third dimension." So how do I specify this third index in Python?
To retrieve indices, I usually use np.where. Here's an example: you have a 2-dimensional array
a = np.asarray([[0,1,2],[3,4,5]])
and want to get the indexes where the values are above a threshold, say 2. You can use np.where with the condition a>2
idxX, idxY = np.where(a>2)
which in turn you can use to address a
print a[idxX, idxY]
>>> [3 4 5]
However, the same effect can be achieved by indexing:
print a[a>2]
>>> [3 4 5]
This works on ravel'ed arrays as well as on three-dimensional ones. Using 3D arrays with the first method, however, requires a third index array, as in the sketch below.
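A quick sketch of that three-dimensional case (the array b is just made-up data):
import numpy as np

b = np.arange(24).reshape(2, 3, 4)       # a small 3-D example array

idxX, idxY, idxZ = np.where(b > 20)      # one index array per dimension
print(b[idxX, idxY, idxZ])               # [21 22 23]
print(b[b > 20])                         # [21 22 23] -- same via boolean indexing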

Numpy nonzero/flatnonzero index order; order of returned elements in boolean indexing

I'm wondering about the order of indices returned by numpy.nonzero / numpy.flatnonzero.
I couldn't find anything in the docs about it. It just says:
A[nonzero(flag)] == A[flag]
While in most cases this is enough, there are some cases where you need a sorted list of indices. Is it guaranteed that the returned indices are sorted in the 1-D case, or do I need to sort them explicitly? (A similar question is the order of elements returned simply by selecting with a boolean array (A[flag]), which must be the same according to the docs.)
Example: finding the "gaps" between True elements in flag:
flag=np.array([True,False,False,True],dtype=bool)
iflag = np.flatnonzero(flag)
gaps = iflag[1:] - iflag[:-1]
Thanks.
Given the specification for advanced (or "fancy") indexing with integers, the guarantee that A[nonzero(flag)] == A[flag] is also a guarantee that the values are sorted low-to-high in the 1-d case. However, in higher dimensions, the result (while "sorted") has a different structure than you might expect.
In short, given a 1-dimensional array of integers ind and a 1-dimensional array x to be indexed, we have the following for all valid i defined for ind:
result[i] = x[ind[i]]
result takes the shape of ind, and contains the values of x at the indices indicated by ind. This means that we can deduce that if x[flag] maintains the original order of x, and if x[nonzero(flag)] is the same as x[flag], then nonzero(flag) must always produce indices in sorted order.
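A quick empirical check of the 1-D case (an illustration, not a proof):
import numpy as np

flag = np.array([True, False, False, True, True])
x = np.array([10, 20, 30, 40, 50])

idx = np.flatnonzero(flag)
print(idx)                                   # [0 3 4] -- already ascending
print(np.array_equal(idx, np.sort(idx)))     # True
print(np.array_equal(x[idx], x[flag]))       # True: A[nonzero(flag)] == A[flag]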
The only catch is that for multidimensional arrays, the indices are stored as distinct arrays for each dimension being indexed. So in other words,
x[array([0, 1, 2]), array([0, 0, 0])]
is equal to
array([x[0, 0], x[1, 0], x[2, 0]])
The values are still sorted, but each dimension is broken out into its own array. (You can do interesting things with broadcasting as a result; but that's beyond the scope of this answer.)
The only problem with this line of reasoning is that -- to my great surprise -- I can't find an explicit statement guaranteeing that boolean indexing preserves the original order of the array. Nonetheless, I'm quite certain from experience that it does. More generally, it would be unbelievably perverse to have x[[True, True, True]] return a reversed version of x.
