I'm confused by what the results of numpy.where mean, and how to use it to index into an array.
Have a look at the code sample below:
import numpy as np
a = np.random.randn(20,20,2)
indices = np.where(a[:,:,0] > 0.5)
I expect the indices array to be 2-dim and contain the indices where the condition is true. We can see that by
indices = np.array(indices)
indices.shape # (2,120)
So it looks like indices is acting on the flattened array of some sort, but I'm not able to figure out exactly how. More confusingly,
a.shape # (20,20,2)
a[indices].shape # (2,120,20,2)
Question:
How does indexing my array with the output of np.where actually grow the size of the array? What is going on here?
Your indexing rests on a wrong assumption. np.where returns something that can be used immediately for advanced indexing: a tuple of np.ndarrays, one per dimension. But you convert it to a numpy array, so it is now a single 2-D np.ndarray rather than a tuple.
So
import numpy as np
a = np.random.randn(20,20,2)
indices = np.where(a[:,:,0] > 0.5)
a[:,:,0][indices]
# If you do a[indices] the result would be different, I'm not sure what
# you intended.
gives you the elements that are found by np.where. If you convert indices to a np.array, it triggers a different form of indexing (see this section of the numpy docs), and the warning message in the docs becomes very important. That is why the total size of your array increases.
Some additional information about what np.where means: you get a tuple containing n arrays, where n is the number of dimensions of the input array. So the first element that satisfies the condition has index [0][0], [1][0], ... [n-1][0] and not [0][0], [0][1], ... [0][n-1]. In your case the shape is (2, 120), meaning you have 2 dimensions and 120 found points.
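A minimal sketch of the difference between indexing with the tuple and indexing with the converted array. It uses a small hand-built array instead of random data so the result is reproducible; the shape (4, 4, 2) and the threshold 21 are made up for illustration.

```python
import numpy as np

# Small deterministic stand-in for the random array in the question.
a = np.arange(32, dtype=float).reshape(4, 4, 2)
plane = a[:, :, 0]                 # shape (4, 4), values 0, 2, 4, ..., 30

# For a 2-D condition, np.where returns a tuple of two index arrays:
# one holding row indices, one holding column indices.
indices = np.where(plane > 21)
rows, cols = indices

# Indexing with the tuple picks one element per (row, col) pair.
picked = plane[indices]

# Converting the tuple to an array changes the meaning: the whole
# (2, k) array is now used as an index into axis 0 only.
as_array = np.array(indices)
grown = a[as_array]

print(type(indices).__name__, len(indices))
print(picked)
print(as_array.shape, grown.shape)
```

The result of a[as_array] has shape as_array.shape + a.shape[1:], which is why the output in the question is larger than the input.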
I find this behaviour utter nonsense. It happens only with numpy arrays; plain Python lists just throw an error.
Let's create two arrays:
randomNumMatrix = np.random.randint(0,20,(3,3,3), dtype=int)
randRow = np.array([0,1,2], dtype=int)
If we pass an array as the index into another array, the original array is returned.
randomNumMatrix[randRow]
The code above returns an equivalent of randomNumMatrix. I find this unintuitive. I would expect it not to work, or at least to return an equivalent of
randomNumMatrix[randRow[0]][randRow[1]][randRow[2]].
Additional observations:
A)
The code below does not work, it throws this error: IndexError: index 3 is out of bounds for axis 0 with size 3
randRow = np.array([0, 1, 3], dtype=int)
B)
To my surprise, the code below works:
randRow = np.array([0, 1, 2, 2,0,1,2], dtype=int)
Can somebody please explain what the advantages of this feature are?
In my opinion it only creates much confusion.
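What is this supposed to be?
randomNumMatrix[randRow[0]][randRow[1]][randRow[2]]
As written, that is chained scalar indexing: it selects a single element, the same one as randomNumMatrix[randRow[0], randRow[1], randRow[2]]. It is not what indexing with the array randRow means.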
In numpy there is a difference between
arr[(x,y,z)] # equivalent to arr[x,y,z]
and
arr[np.array([x,y,z])] # equivalent to arr[np.array([x,y,z]),:,:]
The tuple provides a scalar index for each dimension. The array (or list) provides multiple indices for one dimension.
You may need to study the numpy docs on indexing, especially advanced indexing.
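A short sketch of the tuple-versus-array distinction, assuming a small 3x3x3 array filled with np.arange so the values are predictable. The variable names are illustrative only.

```python
import numpy as np

arr = np.arange(27).reshape(3, 3, 3)
idx = np.array([0, 1, 2])

# A tuple supplies one scalar index per axis: same as arr[0, 1, 2].
single = arr[tuple(idx)]

# An array supplies several indices for axis 0: same as arr[idx, :, :].
# Selecting rows 0, 1 and 2 of a length-3 axis reproduces the whole array.
rows = arr[idx]

# Indices may repeat, so the result can be longer than the source axis.
repeated = arr[np.array([0, 1, 2, 2, 0, 1, 2])]

print(single)
print(rows.shape, np.array_equal(rows, arr))
print(repeated.shape)
```

This also accounts for observations A and B in the question: every entry of the index array must be a valid position on axis 0, but there is no limit on how many entries it has.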
Hello, I have the following question. I create zero arrays of dimension (40,30,80). Now I need 7*7*7 of these zero arrays in an array. How can I do this?
One of my matrices is created like this:
import numpy as np
zeroMatrix = np.zeros((40,30,80))
My first method was to put the zero matrices in a 7*7*7 list. But I want to have it all in a numpy array. I think there is a way to do this with structured arrays, but I don't know how. If I copy my 7*7*7 list with np.copy(), it creates a numpy array with the given shape, but there must be a way to do this directly, isn't there?
EDIT
Maybe I have to make my question clearer. I have a 7*7 list of my zero matrices. In a for loop, all of those arrays are modified. In another step, this temporary list is appended to an empty list, which will have a length of 7 in the end (so I append the 7*7 list 7 times to the empty list). In the end I have a 7*7*7 list of those matrices. But I think it would be better to have a numpy array of these zero matrices from the beginning.
Building an array of same-shaped arrays is not well supported by numpy which prefers to create a maximum depth array of minimum depth elements instead.
It turns out that numpy.frompyfunc is quite useful in circumventing this tendency where it is unwanted.
In your specific case one could do:
result = np.frompyfunc(zeroMatrix.copy, 0, 1)(np.empty((7, 7, 7), object))
Indeed:
>>> result.shape
(7, 7, 7)
>>> result.dtype
dtype('O')
>>> result[0, 0, 0].shape
(40, 30, 80)
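For comparison, here is a sketch of two alternatives that are not part of the answer above: filling an object array with an explicit loop, and using one regular 6-D array instead of nested arrays. The shapes are shrunk from the question's (7, 7, 7) and (40, 30, 80) only to keep the example light, and the variable names are made up.

```python
import numpy as np

# Reduced shapes for illustration; the question uses (7, 7, 7) and (40, 30, 80).
outer_shape = (2, 2, 2)
inner_shape = (4, 3, 8)

# Option 1: object array whose cells each hold an independent zero array.
cells = np.empty(outer_shape, dtype=object)
for idx in np.ndindex(*outer_shape):
    cells[idx] = np.zeros(inner_shape)

# Option 2: a single regular 6-D float array; grid[i, j, k] is a view
# of shape inner_shape, so no object dtype is needed.
grid = np.zeros(outer_shape + inner_shape)

cells[0, 0, 0][0, 0, 0] = 1.0   # changes only that one cell
grid[0, 0, 0][0, 0, 0] = 1.0    # writes through the view into grid

print(cells.shape, cells.dtype, cells[0, 0, 0].shape)
print(grid.shape, grid[0, 0, 0].shape)
```

Whether the 6-D layout fits depends on how the matrices are modified later; it keeps everything in one contiguous numeric array, while the object array behaves more like the nested list.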
I have an array of 2d indices.
indices = [[2,4], [6,77], [102,554]]
Now, I have a different 4-dimensional array, arr, and for each index pair I want to extract the sub-array at that position (it is a sub-array rather than a scalar, since arr is 4-dimensional). It is equivalent to the following code.
for i in range(len(indices)):
    output[i] = arr[indices[i][0], indices[i][1]]
However, I realized that the explicit for-loop is slow. Is there any built-in numpy API that I can use? So far I have tried np.choose, np.put and np.take, but none of them gave me what I wanted. Thank you!
We need to index into the first two axes with the two columns from indices (thinking of it as an array).
Thus, simply convert to array and index, like so -
indices_arr = np.array(indices)
out = arr[indices_arr[:,0], indices_arr[:,1]]
Or we could extract those directly without converting to array and then index -
d0,d1 = [i[0] for i in indices], [i[1] for i in indices]
out = arr[d0,d1]
Another way to extract the elements would be with conversion to tuple, like so -
out = arr[tuple(indices_arr.T)]
If indices is already an array, skip the conversion process and use indices in places where we had indices_arr.
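A small sketch that checks the vectorised forms against the explicit loop from the question. The array shape and the index pairs are hypothetical, chosen only so that every pair is in bounds.

```python
import numpy as np

# Hypothetical small 4-D array; the shape is for illustration only.
arr = np.arange(5 * 6 * 2 * 3).reshape(5, 6, 2, 3)
indices = [[2, 4], [0, 1], [4, 5]]

# Reference result from the explicit loop in the question.
expected = np.empty((len(indices), 2, 3), dtype=arr.dtype)
for i, (r, c) in enumerate(indices):
    expected[i] = arr[r, c]

# Vectorised versions: one index array per leading axis.
indices_arr = np.array(indices)
out = arr[indices_arr[:, 0], indices_arr[:, 1]]
out_tuple = arr[tuple(indices_arr.T)]

print(out.shape)
print(np.array_equal(out, expected), np.array_equal(out_tuple, expected))
```

Both forms pair the k-th row index with the k-th column index, so the result has one sub-array per index pair.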
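You can also use numpy's take function. Note that np.take(arr, indices) without an axis indexes the flattened array, so the index pairs first have to be converted to flat indices over the first two axes. Your code should be something like:
flat = np.ravel_multi_index(np.array(indices).T, arr.shape[:2])
outputarray = np.take(arr.reshape((-1,) + arr.shape[2:]), flat, axis=0)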
I have a 2d numpy array, for instance as:
import numpy as np
a1 = np.zeros( (500,2) )
a1[:,0]=np.arange(0,500)
a1[:,1]=np.arange(0.5,1000,2)
# could be also read from txt
Then I want to select the indexes of the rows that match a criterion, such as all the values of a1[:,1] that lie in the range (l1,l2):
l1=20.0; l2=900.0; #as example
I'd like to do this in a single condensed expression. However, neither:
np.where(a1[:,1]>l1 and a1[:,1]<l2)
(it gives a ValueError and suggests using np.all, which is not clear to me in this case) nor:
np.intersect1d(np.where(a1[:,1]>l1),np.where(a1[:,1]<l2))
works (it gives unhashable type: 'numpy.ndarray').
My idea is then to use these indexes to select rows from another array of size (500,n).
Is there any reasonable way to select indexes like this? Or is it necessary to use some kind of mask in this case?
This should work
np.where((a1[:,1]>l1) & (a1[:,1]<l2))
or
np.where(np.logical_and(a1[:,1]>l1, a1[:,1]<l2))
Does this do what you want?
import numpy as np
a1 = np.zeros((500, 2))
a1[:, 0] = np.arange(0, 500)
a1[:, 1] = np.arange(0.5, 1000, 2)
l1 = 20.0; l2 = 900.0  # example bounds from the question
# Boolean array: True where the item at that position meets both criteria, False otherwise.
c = (a1[:, 1] > l1) * (a1[:, 1] < l2)
print(a1[c])  # prints all the rows of a1 that match the criteria
Afterwards you can select the rows you need from your new array (assuming it has dimensions (500,n)) by doing
print(newarray[c, :])
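To tie the two answers together, here is a sketch showing that the boolean mask and the integer indices from np.where select the same rows of a second array. The array named other is hypothetical and stands in for the (500,n) array mentioned in the question.

```python
import numpy as np

a1 = np.zeros((500, 2))
a1[:, 0] = np.arange(0, 500)
a1[:, 1] = np.arange(0.5, 1000, 2)
l1, l2 = 20.0, 900.0

# Parentheses are required: & binds more tightly than > and <.
mask = (a1[:, 1] > l1) & (a1[:, 1] < l2)

# np.where returns a 1-tuple for a 1-D condition; [0] extracts the indices.
idx = np.where(mask)[0]

# Hypothetical second array with the same number of rows.
other = np.arange(500 * 3).reshape(500, 3)

by_mask = other[mask]
by_index = other[idx]

print(idx[0], idx[-1], by_mask.shape)
print(np.array_equal(by_mask, by_index))
```

The Python keyword and fails here because it asks for the truth value of a whole array, which is ambiguous; & and np.logical_and work element by element.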
I have a bit of code that attempts to find the contents of an array at indices specified by another, that may specify indices that are out of range of the former array.
input = np.arange(0, 5)
indices = np.array([0, 1, 2, 99])
What I want to do is this:
print(input[indices])
and get
[0 1 2]
But this yields an exception (as expected):
IndexError: index 99 out of bounds 0<=index<5
So I thought I could use masked arrays to hide the out of bounds indices:
indices = np.ma.masked_greater_equal(indices, 5)
But still:
>print(input[indices])
IndexError: index 99 out of bounds 0<=index<5
Even though:
>np.max(indices)
2
So I'm having to fill the masked array first, which is annoying, since I don't know what fill value I could use to not select any indices for those that are out of range:
print(input[np.ma.filled(indices, 0)])
[0 1 2 0]
So my question is: how can you use numpy efficiently to select indices safely from an array without overstepping the bounds of the input array?
Without using masked arrays, you could remove the indices greater than or equal to 5 like this:
print(input[indices[indices<5]])
Edit: note that if you also wanted to discard negative indices, you could write:
print(input[indices[(0 <= indices) & (indices < 5)]])
It is a VERY BAD idea to index with masked arrays. There was a (very short) time when using MaskedArrays for indexing would have thrown an exception, but that was a bit too harsh...
In your test, you're filtering indices to find the entries matching a condition. What should you do with the missing entries of your MaskedArray? Is the condition False? True? Should you use a default? It's up to you, the user, to decide what to do.
Using indices.filled(0) means that when an item of indices is masked (as in, undefined), you want to take the first index (0) as default. Probably not what you wanted.
Here, I would have simply used input[indices.compressed()] : the compressed method flattens your MaskedArray, keeping only the unmasked entries.
But as you realized, you probably didn't need MaskedArrays in the first place.
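A minimal sketch of both approaches side by side. It renames the array to data (the question's name input shadows a Python builtin) and adds a negative out-of-range index; both changes are illustrative assumptions.

```python
import numpy as np

data = np.arange(0, 5)
indices = np.array([0, 1, 2, 99, -7])

# Plain boolean filter: keep only the indices in [0, len(data)).
valid = (indices >= 0) & (indices < len(data))
safe = data[indices[valid]]

# Masked-array route: mask the out-of-range entries, then drop them
# with compressed() before indexing.
masked = np.ma.masked_outside(indices, 0, len(data) - 1)
safe_ma = data[masked.compressed()]

print(safe)
print(safe_ma)
```

Both routes drop the invalid entries rather than substituting a fill value, so the result is shorter than indices.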