Find the first value that meets criteria, not its index - python

I have an array with these elements:
array= [21558 43101 64638 86173 107701 129232 150775 172355 193864 215457
237071 258586 280130 301687 23255 344790 366285 387838 409365 430856
452367 473893 495456 516955 538543 560110 581641 603188]
In my program, there is a variable n that is randomly chosen. What I'm trying to achieve is very simple, but I just can't get anything to work.
The line below finds the index of the first value that is greater than n:
value_index=np.where(array > n)[0][0]
What I need is the value at that index, not the index itself.
Of course, I could just use value_index to index into the array, but I'm trying to be as efficient as possible.
Can anyone help me find the fastest way to get this value?

Numpy generally isn't very good at getting the first of something without first computing the rest of the values. There is no equivalent to Python's
next(x for x in array if x > n)
Instead, you have to compute the mask of x > n, and get the first index of that. There are better ways to do this than np.where:
ind = np.flatnonzero(array > n)[0]
OR
ind = np.argmax(array > n)
In either case, your best bet to get the value is
array[ind]
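A minimal sketch comparing the approaches above, using the first few values from the question's array. The function names are hypothetical. One caveat worth guarding against: `np.argmax` on an all-False mask returns 0, so it cannot by itself tell "the first element matches" apart from "nothing matches"; `next()` with a default handles that case directly. Which version is faster depends on array size and on how early the first match occurs, so benchmark with your own data.

```python
import numpy as np

def first_greater_numpy(array, n):
    """Return the first value in `array` greater than `n`, or None if there is none."""
    mask = array > n              # boolean mask, computed over the whole array
    ind = np.argmax(mask)         # index of the first True (0 if all are False)
    if not mask[ind]:             # guard: argmax also gives 0 when nothing matches
        return None
    return array[ind]

def first_greater_python(array, n):
    """Pure-Python version: stops scanning at the first match."""
    return next((x for x in array if x > n), None)

array = np.array([21558, 43101, 64638, 86173, 107701])
print(first_greater_numpy(array, 50000))   # 64638
print(first_greater_python(array, 50000))  # 64638
print(first_greater_numpy(array, 10**6))   # None
```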

Related

Printing final list in variable

I'm working my way through Elements of Programming Interviews in Python, and I'm trying to find an alternative solution for the first problem in the Arrays chapter.
The idea is to write a program that takes an array A and an index i into A and rearranges the elements so that all elements less than A[i] (the pivot) appear first, followed by elements equal to the pivot, followed by elements greater than the pivot. The book already provides a solution, but I'm trying to figure out an alternative one. In a nutshell, I'm creating a sub-array for each group: smaller than, equal to, and larger than A[i]. Right now I'm working on storing the values of A that are less than the pivot. I want to iterate over A and store every element less than the pivot in the variable smaller, and eventually return smaller, which will contain all values smaller than the pivot. To check my work I used print to show the value of smaller, and it printed smaller at every iteration of the loop. Ideally, I just want the final value of smaller instead of its value at each iteration. What should my next steps be? Hopefully that makes sense; I don't mind elaborating on any part. Thanks in advance.
def properArray(pivot_index, A):
    pivot = A[pivot_index]
    smaller = []
    for i in range(len(A)):
        if A[i] < pivot:
            smaller.append(A[i])
        print (smaller)
resized_array =properArray
resized_array(3, [1,5,6,9,3,4,6])
I think this is what you are trying to achieve:
def properArray(pivot_index, A):
    pivot = A[pivot_index]
    smaller = []
    for i in range(len(A)):
        if A[i] < pivot:
            smaller.append(A[i])
    print (smaller)
resized_array =properArray
resized_array(3, [1,5,6,9,3,4,6])
Instead of printing the smaller list on every iteration, print it once, after the for loop is complete (or return it at that point if you need the value).
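Carrying the question's own plan one step further, here is a minimal sketch that builds all three sub-lists (smaller, equal, larger) and returns their concatenation once the loop has finished. The name `partition_around` is hypothetical, and unlike the book's in-place solution this uses O(n) extra space for the new lists.

```python
def partition_around(pivot_index, A):
    """Return a new list: elements < pivot, then == pivot, then > pivot."""
    pivot = A[pivot_index]
    smaller, equal, larger = [], [], []
    for x in A:
        if x < pivot:
            smaller.append(x)
        elif x == pivot:
            equal.append(x)
        else:
            larger.append(x)
    # Return once, after the loop, instead of printing inside it
    return smaller + equal + larger

print(partition_around(2, [1, 5, 6, 9, 3, 4, 6]))  # [1, 5, 3, 4, 6, 6, 9]
```

Relative order inside each group is preserved, which the problem does not require but which makes the output easy to check by eye.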

Is there a more efficient and robust way to create a minimum proximity algorithm for a distance matrix?

I am trying to make an algorithm that moves from point to point in a distance matrix, always going to the point at the smallest distance. The code has two conditions: the distance to the next point must be greater than 0, and each point must be visited exactly once before returning to the starting position.
This is my code in its entirety:
def totalDistance(aList):
    path = []
    for j in range(0,len(aList)):
        k=j
        order = []
        for l in range(0,len(aList)):
            order.append(k)
            initval= min(x for x in aList[k] if x > 0 )
            k = aList[k].index(initval)
            for s in range(0,len(aList)):
                for t in range(0,len(aList[s])):
                    aList[s][k] = 0
        path.append(order)
    return path
The code is meant to return the indexes of the points in the closest proximity of the evaluated point.
The distance matrix is aList = [[0,3,4,6],[3,0,7,3],[4,7,0,9],[6,3,9,0]].
When running the code, I get the following error:
initval= min(x for x in aList[k] if x > 0 )
ValueError: min() arg is an empty sequence
I presume that when I set the columns in my distance matrix to zero with the following loop:
for s in range(0,len(aList)):
    for t in range(0,len(aList[s])):
        aList[s][k] = 0
the min() function is unable to find a value that meets the condition. Is there a better way to structure my code so that this does not occur, or a better approach to this problem altogether?
One technique and a pointer on the rest that you say is working...
For preventing re-visiting / backtracking. One of the common design patterns for this is to keep a separate data structure to "mark" the places you've been. Because your points are numerically indexed, you could use a list of booleans, but I think it is much easier to just keep a set of the places you've been. Something like this...
visited = set() # places already seen
# If I decide to visit point/index "3"...
visited.add(3)
It's not really good practice to modify your input data as you are doing, especially while you are looping over it (which you are); that leads to headaches.
So then... your current error occurs because, as you zero out columns, a row eventually has no entries > 0, so the generator you pass to min() is empty and min() raises ValueError. The technique above fixes that: you don't need to zero-ize anything, just mark the points as visited.
Then, the obvious question... how to use the marks? You can make them part of your search. This works well with enumerate, which yields each index together with its value.
Try something like this, which will make a list of "eligible" tuples with the distance and index location.
pts_to_consider = [(dist, idx) for idx, dist in enumerate(aList[k])
                   if dist > 0
                   and idx not in visited]
There are other ways to do this with numpy and other things, but this is a reasonable approach and close to what you have in code now. Comment back if stuck. I don't want to give away the whole farm because this is probably H/W. Perhaps you can use some of the hints here.
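Putting the visited-set hint together, a minimal sketch of the whole greedy walk might look like the following. It is a nearest-neighbour heuristic, so it does not promise the shortest round trip; the function name `nearest_neighbour_tour` is hypothetical, and ties are broken by the lower index because `min()` compares the `(distance, index)` tuples. The input matrix is only read, never modified.

```python
def nearest_neighbour_tour(dist, start=0):
    """Greedy tour: from each point, move to the closest unvisited point,
    then return to `start`. Does not modify `dist`."""
    visited = {start}          # places already seen
    order = [start]
    k = start
    while len(visited) < len(dist):
        # Eligible (distance, index) pairs: positive distance and not yet visited
        pts_to_consider = [(d, idx) for idx, d in enumerate(dist[k])
                           if d > 0 and idx not in visited]
        if not pts_to_consider:
            raise ValueError(f"no reachable unvisited point from {k}")
        _, k = min(pts_to_consider)   # smallest distance; ties go to the lower index
        visited.add(k)
        order.append(k)
    order.append(start)        # close the loop back to the starting position
    return order

aList = [[0, 3, 4, 6], [3, 0, 7, 3], [4, 7, 0, 9], [6, 3, 9, 0]]
print(nearest_neighbour_tour(aList, 0))  # [0, 1, 3, 2, 0]
```

To get one tour per starting point, as the original totalDistance tries to, call it for every index: `[nearest_neighbour_tour(aList, j) for j in range(len(aList))]`.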

Replace a loop in python with the equivalent of a matlab find

Assume I have an array of tuples sorted by the first value. I want to find the first index where a condition on the first element of the tuple holds. That is, how do I replace the following code
test_array = [(1,2),(3,4),(5,6),(7,8),(9,10)]
min_value = 5
index = 0
for c in test_array:
    if c[0] > min_value:
        break
    else:
        index = index + 1
with the equivalent of MATLAB's find?
That is, at the end of this loop I expect to get 3, but I'd like to make this more efficient. I am fine with using numpy for this. I tried using argmax but to no avail.
Thanks
Since the list is sorted, if you know the maximum possible value of the second element (or if there can only be one element with a given first value), you could apply bisect to the list of tuples (it returns the sorted insertion position in the list):
import bisect
test_array = [(1,2),(3,4),(5,6),(7,8),(9,10)]
min_value = 5
print(bisect.bisect_left(test_array,(min_value,10000)))
Hardcoding 10000 is bad, so if you only have integers you can do this instead:
print(bisect.bisect_left(test_array,(min_value+1,)))
result: 3
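If you have floats (this also works with integers), you could use sys.float_info.epsilon like this:
import sys
print(bisect.bisect_left(test_array,(min_value*(1+sys.float_info.epsilon),)))
This nudge only moves the key upward when min_value is positive; for zero or negative values it does not, and on Python 3.9+ math.nextafter(min_value, math.inf) gives the next representable float in all cases.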
It has O(log(n)) complexity so it's much better than a simple for loop when there are a lot of elements.
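A variant that avoids both the sentinel and the epsilon trick (a sketch, not part of the answer above) is to bisect on a list of just the first elements with `bisect_right`, which returns the position after any entries equal to `min_value`, i.e. the index of the first element strictly greater than it. Building that list is O(n), so this pays off when you search the same data many times. On Python 3.10+, the bisect functions also accept a `key` argument, e.g. `bisect.bisect_right(test_array, min_value, key=lambda t: t[0])`, which skips the extra list.

```python
import bisect

test_array = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
min_value = 5

# Extract the (already sorted) first elements once; each search is then O(log n)
firsts = [t[0] for t in test_array]

# Index of the first element strictly greater than min_value;
# returns len(test_array) if there is none
index = bisect.bisect_right(firsts, min_value)
print(index)  # 3
```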
In general, numpy's where is used in a way similar to MATLAB's find. However, from an efficiency standpoint, where cannot be told to return only the first match, so computationally your loop is arguably no less efficient.
The where equivalent would be:
first_elements = numpy.array([t[0] for t in test_array])
index = numpy.where(first_elements > min_value)[0][0]
You can use numpy to mark the elements that satisfy the condition and then use argmax() to get the index of the first one (note that argmax() returns 0 if no element satisfies it):
import numpy
test_array = numpy.array([(1,2),(3,4),(5,6),(7,8),(9,10)])
min_value = 5
print((test_array[:,0] > min_value).argmax())
If you would like to find all of the elements that satisfy the condition, you can replace argmax() with nonzero()[0].
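If the goal is simply a more compact replacement for the loop, a plain-Python idiom (a sketch, not from the answers above) is `next()` over a generator with `enumerate`. Like the original loop, it stops at the first match, and the default value reproduces what the loop returns when nothing matches.

```python
test_array = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
min_value = 5

# next() pulls from the generator only until the first match;
# len(test_array) is the fallback, matching the original loop's result
index = next((i for i, (first, _) in enumerate(test_array) if first > min_value),
             len(test_array))
print(index)  # 3
```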

Replace values in a 2D array with different random numbers

I have a 2D array (image) in which I want to replace values greater than some threshold with a random number in some range. My attempt was to use numpy.random.uniform, like so:
Z[Z > some_value] = uniform(lower_limit,upper_limit)
However I've found that this replaces all values above the threshold with the same random value. I would like to replace all array values above the threshold with a different random value each.
I think this would require iterating over the entire array and generating a random value wherever the condition is met. How would I do this?
You are right that iteration is one way to go. Let's do a list comprehension (note that it builds a new list rather than modifying Z in place).
[uniform(lower_limit, upper_limit) if i > some_value else i
 for i in Z]
Let's step through it. Take an individual value. If it is greater than the threshold, use a randomly generated one, otherwise the original value.
uniform(lower_limit, upper_limit) if i > some_value else i
Repeat this for every element in Z
for i in Z
For a 2D array, nest multiple comprehensions. Imagine that the above solution was to hit everything in one row and then repeat it for every row.
[[uniform(lower_limit, upper_limit) if i > some_value else i
  for i in row]
 for row in Z]
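Check the third argument to uniform, size. Using size=N yields an array of N random values. The number of values must equal the number of elements selected by the mask, so pass the count of True entries, not len() of the mask (which is only the length of its first axis):
Z[Z > some_value] = np.random.uniform(lower_limit, upper_limit, np.count_nonzero(Z > some_value))
will do what you want.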
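A runnable sketch of the size-based approach, assuming a floating-point image (with an integer dtype the random floats would be truncated on assignment). It uses NumPy's `default_rng` Generator (NumPy 1.17+) instead of the legacy `np.random.uniform`; the seed, shape, threshold and limits are made-up illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)           # seeded only so the example is repeatable
Z = rng.uniform(0, 10, size=(4, 5))      # stand-in for the image
some_value, lower_limit, upper_limit = 5.0, 100.0, 200.0

original = Z.copy()
mask = Z > some_value                    # boolean mask of pixels to replace
# One random draw per True entry, so every replaced pixel gets its own value
Z[mask] = rng.uniform(lower_limit, upper_limit, size=np.count_nonzero(mask))
```

An equivalent one-liner is `np.where(mask, rng.uniform(lower_limit, upper_limit, size=Z.shape), Z)`; it draws a value for every pixel and keeps only the masked ones, which is simpler to read but does more work.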

Efficiently find the range of an array in python?

Is there an accepted efficient way to find the range (i.e. max value - min value) of a list of numbers in Python? I have tried using a loop, and I know I can use the min and max functions with subtraction. I am just wondering if there is some kind of built-in that is faster.
If you really need high performance, try Numpy. The function numpy.ptp computes the range of values (i.e. max - min) across an array.
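A minimal illustration, with made-up sample data, showing that `np.ptp` ("peak to peak") gives the same result as the two built-in calls. It is most likely to pay off when the data is already a NumPy array; converting a Python list first has its own cost, so time both on your own data.

```python
import numpy as np

values = [3, 1, 4, 1, 5, 9, 2, 6]
print(max(values) - min(values))   # 8, two passes with the built-ins

arr = np.asarray(values)
print(np.ptp(arr))                 # 8, max - min in a single NumPy call
```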
You're unlikely to find anything faster than the min and max functions.
You could possibly code up a minmax function which does a single pass to calculate the two values rather than two passes, but you should benchmark it to ensure it's faster. It may not be if it's written in Python itself, but a C routine added to Python may be. Something like this (plain Python, shown for the logic):
def minmax(arr):
    # An empty sequence has no min or max
    if not arr:
        return (None, None)
    themin = arr[0]
    themax = arr[0]
    # One pass: compare each value against the running min, then the running max
    for value in arr[1:]:
        if value < themin:
            themin = value
        elif value > themax:
            themax = value
    return (themin, themax)
Another possibility is to interpose your own class around the array (this may not be possible if you want to work on real arrays directly). This would basically perform the following steps:
mark the initial empty array clean.
if adding the first element to an array, set themin and themax to that value.
if adding element to a non-empty array, set themin and themax depending on how the new value compares to them.
if deleting an element that is equal to themin or themax, mark the array dirty.
if requesting min and max from a clean array, return themin and themax.
if requesting min and max from a dirty array, calculate themin and themax using the loop above, then mark the array clean.
What this does is to cache the minimum and maximum values so that, at worst, you only need to do the big calculation infrequently (after deletion of an element which was either the minimum or maximum). All other requests use cached information.
In addition, adding elements keeps themin and themax up to date without a big calculation.
And, possibly even better, you could maintain a dirty flag for each of themin and themax, so that dirtying one would still allow you to use the cached value of the other.
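The caching steps described above can be sketched as a small wrapper class. The class `MinMaxList` and its method names are hypothetical; it uses a single dirty flag for both values (not the per-value refinement) and recomputes with the same one-pass logic as the minmax function.

```python
class MinMaxList:
    """Hypothetical list wrapper that caches its minimum and maximum.

    Adding an element updates the cache cheaply; removing the current
    min or max marks the cache dirty, and the next query recomputes it.
    """

    def __init__(self):
        self._items = []
        self._min = None
        self._max = None
        self._dirty = False              # the initial empty list is clean

    def add(self, value):
        self._items.append(value)
        if self._dirty:
            return                       # cache is rebuilt on the next query anyway
        if len(self._items) == 1:
            self._min = self._max = value
        elif value < self._min:
            self._min = value
        elif value > self._max:
            self._max = value

    def remove(self, value):
        self._items.remove(value)        # raises ValueError if value is absent
        if value == self._min or value == self._max:
            self._dirty = True

    def minmax(self):
        if not self._items:
            return (None, None)
        if self._dirty:
            # Single pass over the data, same logic as the minmax function
            self._min = self._max = self._items[0]
            for value in self._items[1:]:
                if value < self._min:
                    self._min = value
                elif value > self._max:
                    self._max = value
            self._dirty = False
        return (self._min, self._max)

    def value_range(self):
        lo, hi = self.minmax()
        return None if lo is None else hi - lo

m = MinMaxList()
for v in [5, 2, 8, 3]:
    m.add(v)
print(m.minmax())   # (2, 8), served from the cache
m.remove(8)
print(m.minmax())   # (2, 5), recomputed because 8 was the maximum
```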
If you use Numpy and you have a 1-D array (or can create one quickly from a list), then there's the function numpy.ptp():
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ptp.html
