I have an array with elements such as a = array.array('i',[3,5,7,2,8,9,10,37,99]). Now I need to find the 4th largest element. If this were a list, I could find it this way:
l = [3,5,7,2,8,9,10,37,99]
m = sorted(l)
m[-4]
You could use numpy.argsort, which returns the indices that would sort the array in ascending order. So:
from numpy import argsort
index_to_fourth_largest_element = argsort(a)[-4]
The element itself is then a[index_to_fourth_largest_element]. But if you use this solution (meaning that you use numpy) and plan to do more with the array, you could consider using numpy.array instead of array.array in the first place.
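If numpy is not otherwise needed, a standard-library sketch may be enough. It assumes only that array.array is iterable (so sorted and heapq.nlargest accept it) and that the first occurrence is wanted when looking up the position:

```python
import array
import heapq

a = array.array('i', [3, 5, 7, 2, 8, 9, 10, 37, 99])

# sorted() accepts any iterable, so the list approach works unchanged.
fourth_largest = sorted(a)[-4]

# heapq.nlargest returns the k largest items in descending order;
# the last of them is the k-th largest.
fourth_largest_heap = heapq.nlargest(4, a)[-1]

# Position of that value in the original array (first occurrence).
index_of_fourth = a.index(fourth_largest)

print(fourth_largest, fourth_largest_heap, index_of_fourth)
```

heapq.nlargest avoids sorting the whole array, which may matter when the array is long and k is small.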
Let's say I have a list: l=[7,2,20,9] and I want to find the minimum absolute difference between any two of its elements (in this case it would be 9-7 = 2, or equivalently |7-9|). To do it in O(n log n) time, I need to sort, take the differences between neighbours, and find the minimum:
import numpy as np
sorted_l = sorted(l) # sort list
diff_sorted = abs(np.diff(sorted_l)) # get absolute value differences
min_diff = min(diff_sorted) # get min element
However, after doing this, I need to track which elements were used in the original l list that gave rise to this difference. So for l the minimum difference is 2 and the output I need is 7 and 9 since 9-7 is 2. Is there a way to do this? sorted method ruins the order and it's hard to backtrack. Am I missing something obvious? Thanks.
Use:
index = diff_sorted.tolist().index(min_diff)
sorted_l[index:index+2]
Output
[7, 9]
Whole Script
import numpy as np
l=[12,24,36,35,7]
sorted_l = sorted(l)
diff_sorted = np.diff(sorted_l)
min_diff = min(diff_sorted)
index = diff_sorted.tolist().index(min_diff)
sorted_l[index:index+2]
Output
[35, 36]
Explanation
tolist converts the numpy array into a Python list, and a list has an index method that returns the position of the first element equal to its argument. Using tolist and index together therefore gives the position of the minimum difference in the sorted array. With that position we can pick the two numbers that produced the minimum difference ([index:index+2] selects two adjacent numbers from the sorted list).
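If the positions in the original, unsorted list are needed as well, one possible sketch is to sort through np.argsort and keep the permutation. The variable names order, pair and original_positions are illustrative, not from the answer above:

```python
import numpy as np

l = [7, 2, 20, 9]
values = np.asarray(l)

order = np.argsort(values)       # positions in l, listed in ascending value order
sorted_vals = values[order]
diffs = np.diff(sorted_vals)     # non-negative because the data is sorted
k = int(np.argmin(diffs))        # first adjacent pair with the smallest gap

pair = (int(sorted_vals[k]), int(sorted_vals[k + 1]))
original_positions = (int(order[k]), int(order[k + 1]))
min_diff = int(diffs[k])

print(pair, original_positions, min_diff)
```

Here order[k] and order[k + 1] map the two neighbours in the sorted array back to where they sat in l.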
I have an array of 2d indices.
indices = [[2,4], [6,77], [102,554]]
Now, I have a different 4-dimensional array, arr, and for each pair in indices I want to extract the sub-array (it is an array, since arr is 4-dimensional) stored at that position along the first two axes. It is equivalent to the following code.
for i in range(len(indices)):
    output[i] = arr[indices[i][0], indices[i][1]]
However, I realized that the explicit for-loop is slow. Is there any built-in numpy API that I can use? So far I have tried np.choose, np.put and np.take, but none of them gave me what I wanted. Thank you!
We need to index into the first two axes with the two columns from indices (thinking of it as an array).
Thus, simply convert to array and index, like so -
indices_arr = np.array(indices)
out = arr[indices_arr[:,0], indices_arr[:,1]]
Or we could extract those directly without converting to array and then index -
d0,d1 = [i[0] for i in indices], [i[1] for i in indices]
out = arr[d0,d1]
Another way to extract the elements would be with conversion to tuple, like so -
out = arr[tuple(indices_arr.T)]
If indices is already an array, skip the conversion process and use indices in places where we had indices_arr.
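As a quick check, the following sketch compares the indexed result with the loop from the question. The array shape and the index pairs are made up so that they stay within bounds:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((5, 6, 3, 2))        # hypothetical 4-D array
indices = [[2, 4], [0, 5], [4, 1]]    # hypothetical index pairs, all in bounds

indices_arr = np.array(indices)
# One index array per leading axis; the trailing two axes are kept whole.
out = arr[indices_arr[:, 0], indices_arr[:, 1]]

# Reference result from the explicit loop in the question.
expected = np.empty((len(indices),) + arr.shape[2:])
for i, (r, c) in enumerate(indices):
    expected[i] = arr[r, c]

print(out.shape, np.array_equal(out, expected))
```

The result has one entry per index pair along the first axis, followed by the two trailing axes of arr.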
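Try using the take function of numpy. Note that np.take(arr, indices) without an axis argument indexes the flattened array, so the index pairs first have to be converted to flat indices over the first two axes. Your code should be something like:
flat = np.ravel_multi_index(tuple(np.array(indices).T), arr.shape[:2])
outputarray = np.take(arr.reshape((-1,) + arr.shape[2:]), flat, axis=0)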
Assume I have a sorted array of tuples which is sorted by the first value. I want to find the first index where a condition on the first element of the tuple holds. i.e. How do I replace the following code
test_array = [(1,2),(3,4),(5,6),(7,8),(9,10)]
min_value = 5
index = 0
for c in test_array:
    if c[0] > min_value:
        break
    else:
        index = index + 1
With the equivalent of a matlab find ?
i.e. at the end of this loop I expect to get 3, but I'd like to make this more efficient. I am fine with using numpy for this. I tried using argmax but to no avail.
Thanks
Since the list is sorted, and if you know the maximum possible value of the second element (or if each first value occurs only once), you could apply bisect to the list of tuples. It returns the position at which the given tuple would be inserted to keep the list sorted.
import bisect
test_array = [(1,2),(3,4),(5,6),(7,8),(9,10)]
min_value = 5
print(bisect.bisect_left(test_array,(min_value,10000)))
Hardcoding 10000 is bad, so if you only have integers you can do this instead:
print(bisect.bisect_left(test_array,(min_value+1,)))
result: 3
If you have floats (this also works with positive integers), you could use sys.float_info.epsilon like this:
import sys
print(bisect.bisect_left(test_array,(min_value*(1+sys.float_info.epsilon),)))
It has O(log(n)) complexity so it's much better than a simple for loop when there are a lot of elements.
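Another option, sketched here under the assumption that building a separate list of first elements is acceptable, avoids constructing a sentinel tuple altogether. The name first_elements is illustrative:

```python
import bisect

test_array = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
min_value = 5

# Building the key list is O(n), but it can be reused for many lookups.
first_elements = [t[0] for t in test_array]

# bisect_right returns the position just past every entry <= min_value,
# i.e. the first index whose first element is strictly greater.
index = bisect.bisect_right(first_elements, min_value)
print(index)
```

On Python 3.10 and later, bisect.bisect_right also accepts a key argument, so bisect.bisect_right(test_array, min_value, key=lambda t: t[0]) should give the same result without the extra list.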
In general, numpy's where is used in a fashion similar to MATLAB's find. However, where cannot be told to return only the first match; it always scans the whole array. So, from a computational perspective, it is arguably no more efficient than the loop you already have.
The where equivalent would be
matches = numpy.where(numpy.array([t[0] for t in test_array]) > min_value)[0]
index = matches[0]  # first index whose first element exceeds min_value
You can use numpy to indicate the elements that obey the conditions and then use argmax(), to get the index of the first one
import numpy
test_array = numpy.array([(1,2),(3,4),(5,6),(7,8),(9,10)])
min_value = 5
print((test_array[:,0] > min_value).argmax())
If you would like to find all of the elements that obey the condition, you can replace argmax() by nonzero()[0].
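One caveat with argmax on a boolean mask is that it also returns 0 when no element satisfies the condition. A small guard, sketched here with a hypothetical helper name first_index_greater, makes that case explicit:

```python
import numpy as np

def first_index_greater(values, threshold):
    """Return the first index where values > threshold, or None if there is none."""
    mask = np.asarray(values) > threshold
    if not mask.any():
        # argmax would return 0 here, which is indistinguishable from a real match.
        return None
    return int(mask.argmax())

test_array = np.array([(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)])
print(first_index_greater(test_array[:, 0], 5))
print(first_index_greater(test_array[:, 0], 100))
```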
I want to achieve the same result with least complexity in python as min(Ar(Ar~=0)) in MATLAB where Ar is a 2D numpy array.
For those who are not familiar with MATLAB, ~= means != or not equal to.
Is there a function in python which returns the indexes of the elements:
1. Whose values fulfill a condition (elements which are != 0 in this case)
2. Which can directly be used as list index input for another array? (As (Ar~=0)'s result is being used as an input like this: Ar(Ar~=0))
Here Ar~=0 has been used as list index input like this Ar(Ar~=0) and then min of the array Ar(Ar~=0) is being found out. In other words minimum value of the array is found out excluding the elements whose value is 0.
The python syntax for a numpy array A would be:
A[A!=0].min()
You can also assign to the selected elements, for example to apply a cutoff:
B = A.copy()
B[A==0] = A[A!=0].min()
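To address both parts of the question, the sketch below uses a small made-up 2D array. A boolean mask can be used directly as an index, and np.nonzero gives the equivalent integer indices:

```python
import numpy as np

Ar = np.array([[0, 4, 7],
               [3, 0, 9]])      # made-up example data

mask = Ar != 0                  # boolean array with the same shape as Ar
smallest = Ar[mask].min()       # minimum over the non-zero entries only

rows, cols = np.nonzero(Ar)     # integer indices of the non-zero entries
smallest_via_indices = Ar[rows, cols].min()

print(smallest, smallest_via_indices)
```

If every element is zero, the selection is empty and min() raises a ValueError, so that case would need a separate check.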
I have a dataframe column of n-dimensional arrays (let's call it data) and a list of n-dimensional arrays (let's call it means). For each item in the dataframe, I'm trying to grab the distance between the item in the dataframe and closest item in the means list as well as the index of the minimum in the list. I'm able to get the minimum distances using the following:
distances = [min([spatial.distance.cosine(i.ravel(), j.ravel()) for i in means])
             for j in data['data']]
However I'm struggling to expand this to include the index of the element in the means list as well. I've tried enumerating the means list, but I'm not sure where to put the count iterator. Any ideas here?
Use np.argmin instead of min. Note that spatial.distance.cosine takes a pair of 1-D vectors, so the inner comprehension stays:
idx = [np.argmin([spatial.distance.cosine(i.ravel(), j.ravel()) for i in means])
       for j in data['data']]
You can use map to replace the inner loop and use the apply method to replace the outer loop:
distances = data['data'].apply(lambda j: min(map(lambda i: spatial.distance.cosine(i.ravel(), j.ravel()), means)))
If you want the index of the minimum, wrap the map in list and use np.argmin instead of min, as pointed out in another answer. Or replace min by list to get all the distances, then apply both functions separately.
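When there are many items, computing the whole distance matrix at once may be faster than nested loops. The sketch below uses plain NumPy and a hypothetical helper cosine_distance_matrix. The small arrays stand in for data['data'] and means, and all vectors are assumed to be non-zero. scipy.spatial.distance.cdist with metric='cosine' should produce the same matrix.

```python
import numpy as np

def cosine_distance_matrix(X, M):
    """Pairwise cosine distances between rows of X and rows of M (non-zero rows assumed)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    return 1.0 - Xn @ Mn.T

# Hypothetical stand-ins for data['data'] and means.
data_items = [np.array([[1.0, 0.0], [0.0, 0.0]]),
              np.array([[0.0, 1.0], [0.0, 1.0]])]
means = [np.array([[0.0, 1.0], [0.0, 1.0]]),
         np.array([[2.0, 0.0], [0.0, 0.0]])]

# Flatten every n-dimensional item into one row.
X = np.stack([j.ravel() for j in data_items])
M = np.stack([i.ravel() for i in means])

dist = cosine_distance_matrix(X, M)   # shape (len(data_items), len(means))
nearest_idx = dist.argmin(axis=1)     # index into means for each item
nearest_dist = dist.min(axis=1)       # the corresponding minimum distance

print(nearest_idx, nearest_dist)
```

Both results come from the same matrix, so the index and the distance stay consistent with each other.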