I have a large 2D numpy array and want to find the indices of the 1D arrays (rows) inside it that meet a condition: e.g., have at least one value greater than a given threshold x.
I can already do it the following way, but is there a shorter, more efficient way to do it?
import numpy

a = numpy.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 20], [1, 2, 2, 4, 5]])
indices = []
i = 0
x = 10
for item in a:
    if any(j > x for j in item):
        indices.append(i)
    i += 1
print(indices)  # gives [1]
You could use numpy's built-in boolean operations:
import numpy as np
a = np.array([[1,2,3,4,5], [1,2,3,4,20], [1,2,2,4,5]])
indices = np.argwhere(np.any(a > 10, axis=1))
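One thing to be aware of (an aside, not part of the original answer): np.argwhere returns a column vector of shape (k, 1) here, so the result is [[1]] rather than [1]. If a flat index array is preferred, np.flatnonzero is a common alternative; a minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 20], [1, 2, 2, 4, 5]])

# np.any(..., axis=1) marks rows containing at least one value > 10;
# np.flatnonzero turns the boolean row mask into a flat index array.
indices = np.flatnonzero(np.any(a > 10, axis=1))
print(indices)  # [1]
```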
I know this is something stupid... but I'm missing it. I'm trying to fill a numpy array with random values (1-10) and then iterate through the array to check those values. If I fill the array with values 1-9... no problem, if one of the values is ever 10... I get an out of bounds error.
When iterating through an array using:

for x in array:
    if array[x] == y: do_something_with_x
Is x not the element in the array? If that's the case, why is my code referencing the value of x as 10 and causing the array to go out of bounds?
import numpy as np
import random
arr_orig = np.zeros(10)
arr_copy = np.copy(arr_orig)
arr_orig = np.random.randint(1,11,10)
lowest_val = arr_orig[0]
for x in arr_orig:
    if arr_orig[x] < lowest_val: lowest_val = arr_orig[x]
arr_copy[0] = lowest_val
print (arr_orig)
print (arr_copy)
As noted in the comment from @John Gordon, it's an indexing problem: inside the loop, x is the element itself, not an index, so arr_orig[x] goes out of bounds whenever x is 10. I think you meant:
for x in arr_orig:
    if x < lowest_val: lowest_val = x
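As an aside (not from the original answer): if the goal is just the lowest value, the loop can be dropped altogether, since NumPy's reductions scan the array in C. A minimal sketch of that alternative:

```python
import numpy as np

arr_orig = np.random.randint(1, 11, 10)  # 10 random values in 1..10

# Vectorized minimum; no Python-level loop or manual indexing needed.
lowest_val = arr_orig.min()
```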
I want to know how to index a numpy array efficiently. At the moment, I access the array elements using repeated additions. For example, I have to make some loops over an array A like this:
import numpy as np
A = np.arange(0,100)
M = 10
for i in range(A.shape[0] - M):
    B = []
    for j in range(M):
        value = A[i + j]
        B.append(value)
Is there a way for me to get the values without repeatedly doing the i+j addition?
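One common way to avoid the repeated i+j additions (a sketch, not an answer from the original thread): slicing extracts the whole window in one step and returns a view rather than a copy. The batch variant assumes NumPy >= 1.20 for sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

A = np.arange(0, 100)
M = 10

for i in range(A.shape[0] - M):
    # One slice replaces the inner loop of repeated A[i + j] lookups.
    B = A[i:i + M]

# All windows at once (NumPy >= 1.20): shape (91, 10), a view into A.
windows = sliding_window_view(A, M)
```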
I have two arrays of length n, namely old_fitness and new_fitness, and two matrices of dimension nxm, namely old_values and new_values.
What is the best way to create an nxm matrix best_fitness that comprises row new_values[i] when new_fitness[i] > old_fitness[i] and old_values[i] otherwise?
Something like:
best_values = np.where(new_fitness > old_fitness, new_values, old_values)
but that works on rows of the last two matrices, instead of individual elements? I'm sure there's an easy answer, but I am a complete newbie to numpy.
Edit: new_values and old_values contain rows that represent possible solutions to a problem, and new_fitness and old_fitness contain a numeric measure of fitness for each possible solution / row in new_values and old_values respectively.
This should work, as long as the comparison array has shape (n, 1) rather than (n,):
import numpy as np
old_fitness = np.asarray([0,1])
new_fitness = np.asarray([1,0])
old_value = np.asarray([[1,2], [3,4]])
new_value = np.asarray([[5,6], [7,8]])
np.where((new_fitness>old_fitness).reshape(old_fitness.shape[0],1), new_value, old_value)
returns
array([[5, 6],
       [3, 4]])
Another possible solution, working on numpy arrays:
best_values = numpy.copy(old_values)
best_values[new_fitness > old_fitness, :] = new_values[new_fitness > old_fitness, :]
Are the arrays of equal length? If so, zip them and use a comprehension to return the desired output.
For example, something like:
bests = [new_val if new_val > old_val else old_val for (old_val, new_val) in zip(old_fitness, new_fitness)]
Edit: this is probably better:
bests = list(map(lambda n, o: n if n > o else o, new_fitness, old_fitness))
Here's another one that works too!
bests = [np.max(pair) for pair in zip(new_fitness, old_fitness)]
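All three snippets above compare the two fitness arrays element-wise; NumPy also has a vectorized form of that comparison. A minimal sketch (note it returns the larger fitness values, not rows of old_values/new_values):

```python
import numpy as np

old_fitness = np.array([0, 1])
new_fitness = np.array([1, 0])

# Element-wise maximum of the two fitness arrays, computed in C.
bests = np.maximum(new_fitness, old_fitness)
print(bests)  # [1 1]
```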
I'm interested in the multi-dimensional case of Increment Numpy array with repeated indices.
I have an N-dimensional array and a set of N index arrays whose values I want to increment. The index arrays might have repeated entries.
Without repeats, the solution is:

import numpy as np

a = np.arange(24).reshape(2, 3, 4)
i = np.array([0, 0, 1])
j = np.array([0, 1, 1])
k = np.array([0, 0, 3])
a[i, j, k] += 1
With repeats (e.g. j = array([0, 0, 2])), I'm unable to make numpy increment the replicates.
How about this:
import numpy as np
a = np.zeros((2,3,4))
i = np.array([0,0,1])
j = np.array([0,0,1])
k = np.array([0,0,3])
ijk = np.vstack((i,j,k)).T
H,edge = np.histogramdd(ijk,bins=a.shape)
a += H
I don't know if there is an easier solution with direct array indexing, but this works:
for x, y, z in zip(i, j, k):
    a[x, y, z] += 1
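On recent NumPy versions there is also an unbuffered ufunc method made for exactly this case: np.add.at applies the increment once per index triple, even when triples repeat. A minimal sketch:

```python
import numpy as np

a = np.zeros((2, 3, 4))
i = np.array([0, 0, 1])
j = np.array([0, 0, 1])
k = np.array([0, 0, 3])

# Unlike a[i, j, k] += 1 (which buffers and drops duplicate indices),
# np.add.at accumulates every occurrence of a repeated index.
np.add.at(a, (i, j, k), 1)
print(a[0, 0, 0])  # 2.0
```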
What's an efficient way, given a NumPy matrix (2D array), to return the minimum/maximum n values (along with their indices) in the array?
Currently I have:
import bisect

def n_max(arr, n):
    res = [(0, (0, 0))] * n
    for y in xrange(len(arr)):
        for x in xrange(len(arr[y])):
            val = float(arr[y, x])
            el = (val, (y, x))
            i = bisect.bisect(res, el)
            if i > 0:
                res.insert(i, el)
                del res[0]
    return res
This takes three times longer than the image template matching algorithm that pyopencv does to generate the array I want to run this on, and I figure that's silly.
Since the time of the other answer, NumPy has added the numpy.partition and numpy.argpartition functions for partial sorting, allowing you to do this in O(arr.size) time, or O(arr.size+n*log(n)) if you need the elements in sorted order.
numpy.partition(arr, n) returns an array the size of arr where the nth element is what it would be if the array were sorted. All smaller elements come before that element and all greater elements come afterward.
numpy.argpartition is to numpy.partition as numpy.argsort is to numpy.sort.
Here's how you would use these functions to find the indices of the minimum n elements of a two-dimensional arr:
flat_indices = numpy.argpartition(arr.ravel(), n-1)[:n]
row_indices, col_indices = numpy.unravel_index(flat_indices, arr.shape)
And if you need the indices in order, so row_indices[0] is the row of the minimum element instead of just one of the n minimum elements:
min_elements = arr[row_indices, col_indices]
min_elements_order = numpy.argsort(min_elements)
row_indices, col_indices = row_indices[min_elements_order], col_indices[min_elements_order]
The 1D case is a lot simpler:
# Unordered:
indices = numpy.argpartition(arr, n-1)[:n]
# Extra code if you need the indices in order:
min_elements = arr[indices]
min_elements_order = numpy.argsort(min_elements)
ordered_indices = indices[min_elements_order]
Since there is no heap implementation in NumPy, probably your best bet is to sort the whole array and take the last n elements:
def n_max(arr, n):
    indices = arr.ravel().argsort()[-n:]
    indices = (numpy.unravel_index(i, arr.shape) for i in indices)
    return [(arr[i], i) for i in indices]
(This will probably return the list in reverse order compared to your implementation - I did not check.)
A more efficient solution that works with newer versions of NumPy is given in this answer.
I just met the exact same problem and solved it.
Here is my solution, wrapping np.argpartition:
- Works along an arbitrary axis.
- High speed when K << array.shape[axis]: O(N).
- Returns both the sorted result and the corresponding indices in the original matrix.
def get_sorted_smallest_K(array, K, axis=-1):
    # Find the smallest K values of array along the given axis.
    # Only efficient when K << array.shape[axis].
    # Returns:
    #   top_sorted_scores: np.array. The smallest K values.
    #   top_sorted_indexs: np.array. Their indices in the original array.
    partition_index = np.take(np.argpartition(array, K, axis), range(0, K), axis)
    top_scores = np.take_along_axis(array, partition_index, axis)
    sorted_index = np.argsort(top_scores, axis=axis)
    top_sorted_scores = np.take_along_axis(top_scores, sorted_index, axis)
    top_sorted_indexs = np.take_along_axis(partition_index, sorted_index, axis)
    return top_sorted_scores, top_sorted_indexs
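A quick usage sketch (repeating the function above so the example is self-contained; np.take_along_axis assumes NumPy >= 1.15):

```python
import numpy as np

def get_sorted_smallest_K(array, K, axis=-1):
    # Smallest K values along axis, plus their indices in the original array.
    partition_index = np.take(np.argpartition(array, K, axis), range(0, K), axis)
    top_scores = np.take_along_axis(array, partition_index, axis)
    sorted_index = np.argsort(top_scores, axis=axis)
    top_sorted_scores = np.take_along_axis(top_scores, sorted_index, axis)
    top_sorted_indexs = np.take_along_axis(partition_index, sorted_index, axis)
    return top_sorted_scores, top_sorted_indexs

arr = np.array([[5, 1, 4, 2],
                [3, 9, 0, 7]])
scores, idx = get_sorted_smallest_K(arr, 2, axis=1)
print(scores)  # [[1 2]
               #  [0 3]]
print(idx)     # [[1 3]
               #  [2 0]]
```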