I have a list of 29400 values in Python and I'm trying to check if each element of the list is larger than both the 2000 neighbors to its left and the 2000 neighbors to its right. If the element is larger than its 4000 neighbors, I want to retrieve the index of the element. If there aren't 2000 neighbors to the right I just want to compare it with future elements until I reach the end of the list, and vice versa if there aren't 2000 values to the left.
def find_peaks(t):
    prev = []
    future = []
    peak_index = []
    for i in range(len(t)-2000): # compare element with previous values
        for j in range(1,2001):
            if t[i]<t[i+j]:
                prev.append(False)
                break
            if j==2000:
                if t[i]<t[i+j]:
                    prev.append(False)
                    break
                prev.append(True)
    for i in range(1999,len(t)-1): # compare element with future values
        for j in range(1,2001):
            if t[i]<t[i-j]:
                future.append(False)
                break
            if j==2000:
                if t[i]<t[i-j]:
                    future.append(False)
                    break
                future.append(True)
    future = future[::-1] # reverse list
    for i in range(0,len(prev)-1):
        if prev[i] == True:
            if prev[i] == True:
                peak_index.append(i)
Does anyone know of any better ways to go about this? I was having trouble comparing elements near the end and the beginning of the list- If there aren't 2000 elements left in the list for me to compare with, then the list wraps around to the beginning of the list, which isn't something I want.
You can use a list comprehension, so the actual search becomes a one-liner. I cannot vouch for its speed or elegance, but it takes just a few seconds on my machine.
import random
# create list of random numbers and manually insert two peaks
t = [random.randrange(1, 1000) for r in range(29400)] # found this here: https://stackoverflow.com/questions/16655089/python-random-numbers-into-a-list
t[666] = 2000
t[6666] = 2000
# finds the peak elements; the slice end is exclusive, hence index+2001
peaks = [index for index, value in enumerate(t) if value == max(t[max(index-2000, 0):min(index+2001, len(t))])]
print(peaks) # includes 666 and 6666
A naive, iterative version would be to go through each element and look behind/ahead a certain number of positions to see if any value is larger than it, and if not, consider it a peak:
def find_peaks(data, span=2000):
    peaks = []
    for i, value in enumerate(data):
        peak = True  # assume a peak until a neighbor disproves it (index 0 has nothing behind it)
        for j in range(i - 1, max(-1, i - span - 1), -1):  # look behind, down to index 0 if needed
            peak = value > data[j]  # check if our value is larger than the selected neighbor
            if not peak:  # not a peak, break away
                break
        if peak:  # look behind passed, look ahead:
            for j in range(i + 1, min(i + span + 1, len(data))):
                if value <= data[j]:  # look ahead failed, break away
                    peak = False
                    break
        if peak:  # if look ahead passed...
            peaks.append(i)  # add it to our peaks list
    return peaks
This keeps the look ahead/behind approach fast in practice, as it breaks away immediately when a condition is not met instead of checking every element against all of its neighbors. The worst case is still proportional to len(data) * span.
If you want to allow neighbors of the same value when calculating a peak (so a neighbor equal to the current candidate does not disqualify it), you can use peak = value >= data[j] in the look behind and if value < data[j] in the look ahead portions.
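If the early-exit loops are still too slow, one alternative (not part of the answer above) is a monotonic stack, which finds each element's nearest neighbor of equal or greater value in a single pass per direction. The following is a minimal sketch under that assumption; the name find_peaks_linear is hypothetical, and ties are treated as "not a peak", matching the strict comparison used above.

```python
def find_peaks_linear(data, span=2000):
    """Indices whose value is strictly larger than every neighbor within span."""
    n = len(data)
    prev_ge = [-1] * n  # nearest index to the left holding a value >= data[i], or -1
    next_ge = [n] * n   # nearest index to the right holding a value >= data[i], or n

    stack = []  # indices whose values are still candidates for "nearest >= value"
    for i, value in enumerate(data):
        while stack and data[stack[-1]] < value:
            stack.pop()  # smaller values can never block a later element
        if stack:
            prev_ge[i] = stack[-1]
        stack.append(i)

    stack = []
    for i in range(n - 1, -1, -1):
        while stack and data[stack[-1]] < data[i]:
            stack.pop()
        if stack:
            next_ge[i] = stack[-1]
        stack.append(i)

    # A peak has no value >= itself within span positions on either side.
    return [
        i for i in range(n)
        if (prev_ge[i] == -1 or i - prev_ge[i] > span)
        and (next_ge[i] == n or next_ge[i] - i > span)
    ]


print(find_peaks_linear([1, 3, 2, 5, 4, 5, 0], span=1))
```

Each index is pushed and popped at most once per pass, so the running time grows with len(data) and does not depend on span.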
My solution does not rely on any powerful built-in methods; it is just plain logic to find the target peaks. I iterate over each element and, in an inner loop, check whether its previous and future neighbours exist and are smaller than the current element. I initialise an isPeak variable to True; if no neighbour in either direction is greater than or equal to the current element, it stays True and the current element is a peak. After that, I just record the index of that element.
def find_peaks(t, neighbour_length=2000):
    peak_index = []
    for i in range(len(t)):
        isPeak = True  # initialise to True
        for offset in range(1, neighbour_length + 1):
            # check the previous neighbour at this distance, if present
            if i - offset >= 0 and t[i] <= t[i - offset]:
                isPeak = False
                break
            # check the future neighbour at this distance, if present
            if i + offset < len(t) and t[i] <= t[i + offset]:
                isPeak = False
                break
        if isPeak:
            peak_index.append(i)
    return peak_index
Hope it Helps!
Related
I am attempting to find the indices of the two smallest and the two largest values in python:
I have
import sklearn
euclidean_matrix=sklearn.metrics.pairwise_distances(H10.T,metric='euclidean')
max_index =np.where(euclidean_matrix==np.max(euclidean_matrix[np.nonzero(euclidean_matrix)]))
min_index=np.where(euclidean_matrix==np.min(euclidean_matrix[np.nonzero(euclidean_matrix)]))
min_index
max_index
I get the following output
(array([158, 272]), array([272, 158]))
(array([ 31, 150]), array([150, 31]))
The above code only returns the indices of the absolute smallest and the absolute largest values of the matrix. I would also like the indices of the next smallest and the next largest values. Ideally I would like to return the indices of the two largest values of the matrix and the indices of the two smallest values of the matrix. How can I do this?
I can think of a couple ways of doing this. Some of these depend on how much data you need to search through.
A couple of caveats: you will have to decide what to do when there are only 1, 2, or 3 elements. If all values are the same, do you want min, max, etc. to be identical? And if several items tie for max, min, min2, or max2, which should be selected?
Run min, remove that element, then run min on the rest. Run max, remove that element, then run max on the rest (note that this starts from the original data and not from the copy with the min removed). This is the least efficient method, since it requires searching 4 times and copying twice (actually 8 searches, because we find the min/max value and then search again for its index). Something like the pseudo code below.
PSEUDO CODE:
import numpy as np

largest = np.max(euclidean_matrix[np.nonzero(euclidean_matrix)])
max_index = np.where(euclidean_matrix == largest)
tmp_euclidean_matrix = euclidean_matrix.copy()  # a real copy, so the original stays intact
# arrays have no remove(); zero the largest entries instead, since zeros are already ignored
tmp_euclidean_matrix[max_index] = 0
max_index2 = np.where(tmp_euclidean_matrix == np.max(tmp_euclidean_matrix[np.nonzero(tmp_euclidean_matrix)]))

smallest = np.min(euclidean_matrix[np.nonzero(euclidean_matrix)])
min_index = np.where(euclidean_matrix == smallest)
tmp_euclidean_matrix = euclidean_matrix.copy()  # start again from the original
tmp_euclidean_matrix[min_index] = 0  # zero the smallest entries so the nonzero filter skips them
min_index2 = np.where(tmp_euclidean_matrix == np.min(tmp_euclidean_matrix[np.nonzero(tmp_euclidean_matrix)]))
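Sort the data (if you need it sorted anyway this is a good option), then just grab the two smallest and the two largest. This isn't great unless you needed the data sorted anyway, because sorting costs many copies and comparisons. Note that the positions are then positions in the sorted data, so keep the original indices alongside the values if you need them.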
PSEUDO CODE:
euclidean_matrix.sort()
min_index = 0
min_index2 = 1
max_index = len(euclidean_matrix) - 1
max_index2 = max_index - 1
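For a distance matrix specifically, the sort-based idea can keep track of the original positions by sorting indices instead of values. The sketch below is an illustration only: the name extreme_pairs and the sample matrix dist are hypothetical, and it assumes a square, symmetric matrix with a zero diagonal, so only the strict upper triangle is examined and each pair is reported once.

```python
import numpy as np


def extreme_pairs(dist, k=2):
    """Return (k smallest, k largest) entries as lists of (row, col) pairs."""
    # Strict upper triangle: skips the zero diagonal and the mirrored duplicates.
    rows, cols = np.triu_indices(dist.shape[0], k=1)
    order = np.argsort(dist[rows, cols], kind="stable")  # ascending by distance
    pairs = [(int(rows[p]), int(cols[p])) for p in order]
    return pairs[:k], pairs[-k:][::-1]  # largest first in the second list


dist = np.array([
    [0.0, 5.0, 1.0, 9.0],
    [5.0, 0.0, 2.0, 7.0],
    [1.0, 2.0, 0.0, 3.0],
    [9.0, 7.0, 3.0, 0.0],
])
smallest, largest = extreme_pairs(dist)
print(smallest, largest)
```

Sorting all pairs costs more than a single scan, but it generalises to any k without extra code.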
Best option would be to roll your own search function to run on the data, this would be most efficient because you would go through the data only once to collect them.
This is just a simple iterative approach, other algorithms may be more efficient. You will want to validate this works though.
PSEUDO CODE:
def minmax2(array):
    """ returns the indices (minimum, second minimum, second maximum, maximum)
    """
    if len(array) == 0:
        raise ValueError('Empty List')
    if len(array) == 1:
        # special case: only 1 element, need at least 2 to have different results
        return (0, 0, 0, 0)
    # seed both pairs from the first two elements
    if array[1] < array[0]:
        minimum, minimum2 = 1, 0
    else:
        minimum, minimum2 = 0, 1
    maximum, maximum2 = minimum2, minimum
    for i in range(2, len(array)):
        # the minimum and maximum checks are independent, so they use
        # separate if/elif chains (one element can update both sides)
        if array[i] <= array[minimum]:
            # a new minimum (or tie) will shift the other minimum
            minimum2 = minimum
            minimum = i
        elif array[i] < array[minimum2]:
            minimum2 = i
        if array[i] >= array[maximum]:
            # a new maximum (or tie) will shift the second maximum
            maximum2 = maximum
            maximum = i
        elif array[i] > array[maximum2]:
            maximum2 = i
    return (minimum, minimum2, maximum2, maximum)
edit: Added pseudo code
I have a function match that takes in a list of numbers and a target number and I want to write a function that finds within the array two numbers that add to that target.
Here is my approach:
>>> def match(values, target=3):
...     for i in values:
...         for j in values:
...             if j != i:
...                 if i + j == target:
...                     return print(f'{i} and {j}')
...     return print('no matching pair')
Is this solution valid? Can it be improved?
A sort-based approach results in an O(NlogN) solution.
You sort the list, which costs O(NlogN).
Once the list is sorted you take two indices: the former points to the first element, the latter to the last element. You check whether the sum of the two elements matches your target. If the sum is above the target, you move the upper index down; if the sum is below the target, you move the lower index up. You finish when the upper index is equal to the lower index. This scan is linear and can be done in O(N) time.
All in all, you have O(NlogN) for the sorting and O(N) for the scan, bringing the complexity of the whole solution to O(NlogN).
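The steps above can be sketched as follows. This is a minimal illustration, not the answerer's own code: the name match_sorted is hypothetical, and it returns the pair (or None) instead of printing it.

```python
def match_sorted(values, target):
    """Return a pair (a, b) from values with a + b == target, or None."""
    ordered = sorted(values)       # O(N log N)
    lo, hi = 0, len(ordered) - 1   # lower and upper index
    while lo < hi:                 # O(N): every step moves one index inward
        total = ordered[lo] + ordered[hi]
        if total == target:
            return ordered[lo], ordered[hi]
        if total < target:
            lo += 1                # sum too small: move the lower index up
        else:
            hi -= 1                # sum too large: move the upper index down
    return None


print(match_sorted([4, 1, 2, 7], 3))
```

Because the two indices never point at the same position, an element is never paired with itself, while two equal values at different positions can still match.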
There is room for improvement. Right now, you have a nested loop. Also, there is no need to return the result of print, since print always returns None.
As you iterate over values, you are getting the following:
values = [1, 2, 3]
target = 3
first_value = 1
difference: 3 - 1 = 2
We can see that in order for 1 to add up to 3, a 2 is required. Rather than iterating over the values a second time, we can simply ask whether 2 in values.
def match(values, target):
    values = set(values)
    for value in values:
        summand = target - value
        if summand in values:
            break
    else:
        # the loop finished without a break, so nothing matched
        print('No matching pair')
        return
    print(f'{value} and {summand}')
Edit: Converted values to a set, since a set handles in lookups quicker than a list does. If you require the indices of these pairs, such as in the LeetCode problem, you should not convert it to a set, since you will lose the order. You should also use enumerate in the for-loop to get the indices.
Edit: summand == value edge case
def match(values, target):
    for i, value in enumerate(values):
        summand = target - value
        if summand in values[i + 1:]:
            break
    else:
        print('No matching pair')
        return
    print(f'{value} and {summand}')
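If the indices are what you need, as mentioned in the first edit, a dictionary can play the role of the set while remembering positions. This is only a sketch under that assumption; the name match_indices is hypothetical, and it returns the index pair rather than printing the values.

```python
def match_indices(values, target):
    """Return indices (i, j), i < j, with values[i] + values[j] == target, or None."""
    seen = {}  # value -> index of its first occurrence so far
    for j, value in enumerate(values):
        summand = target - value
        if summand in seen:        # dict lookup is O(1) on average
            return seen[summand], j
        seen.setdefault(value, j)  # keep the earliest index for repeated values
    return None


print(match_indices([2, 7, 11, 15], 9))
```

Only earlier elements are stored when a value is checked, so the summand == value case works without slicing the list.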
Given a list x, I want to sort it with selection sort, and then count the number of swaps made within the sort. So I came out with something like this:
count=0
a=0
n=len(x)
while (n-a)>0:
    #please recommend a better way to swap.
    i = (min(x[a:n]))
    x[i], x[a] = x[a], x[i]
    a += 1
    #the count must still be there
    count+=1
print (x)
Could you help me to find a way to manage this better? It doesn't work that well.
The problem is NOT about repeated elements. Your code doesn't work for lists with all elements distinct, either. Try x = [2,6,4,5].
i = (min(x[a:n]))
min() here returns the value of the minimum element in the slice, and you then use that value as an index, which doesn't make sense.
You are confusing the value of an element, with its location. You must use the index to identify the location.
seq = [2,1,0,0]
beg = 0
n = len(seq)
while (n - beg) > 0:
    jdx = seq[beg:n].index((min(seq[beg:n]))) # use the remaining unsorted right
    seq[jdx + beg], seq[beg] = seq[beg], seq[jdx + beg] # swap the minimum with the first unsorted element.
    beg += 1
    print(seq)
print('-->', seq)
As the sorting progresses, the left of the list [0:beg] is sorted, and the right side [beg:] is being sorted, until completion.
jdx is the location (the index) of the minimum of the remainder of the list (finding the min must happen on the unsorted right part of the list --> [beg:])
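The question also asked for the number of swaps. A minimal sketch of that, assuming a swap only counts when two different positions are actually exchanged (the original code incremented count on every pass); the name selection_sort_count is hypothetical:

```python
def selection_sort_count(seq):
    """Return (sorted copy of seq, number of exchanges performed)."""
    items = list(seq)  # work on a copy so the caller's list is untouched
    swaps = 0
    for beg in range(len(items) - 1):
        # index (not value) of the smallest element in the unsorted right part
        jdx = min(range(beg, len(items)), key=items.__getitem__)
        if jdx != beg:
            items[beg], items[jdx] = items[jdx], items[beg]
            swaps += 1
    return items, swaps


print(selection_sort_count([2, 6, 4, 5]))
```

Taking the minimum over indices with a key avoids the separate min() and index() scans of the slice.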
Hi I've been reading up on finding the minimum of a multidimensional list, but if I have an N x N x 4 list, how do I get the minimum between every single 4th element? All other examples have been for a small example list using real indices. I suppose I'll be needing to define indices in terms of N....
[[[0,1,2,3],[0,1,2,3],...N],[[0,1,2,3],[0,1,2,3],...N].....N]
And then there's retrieving their indices.
I don't know what to try.
If anyone's interested in the actual piece of code:
relative = [[[[100] for k in range(5)] for j in range(N)] for i in range(N)]
What the following does is fill in the 4th element with times satisfying the mathematical equations. The 0th, 1st, 2nd and 3rd elements of relative hold positions and velocities. The 4th spot is for the time taken for the ith and jth particles to collide; redundant values such as i-i or j-i are filled with the value 100, because it's big enough for the min function not to retrieve it. I need the shortest collision time (hence the 4th element comparisons).
def time(relative):
    i = 0
    t = 0
    while i<N:
        j = i+1
        while j<N and i<N:
            rv = relative[i][j][0]*relative[i][j][2]+relative[i][j][1]*relative[i][j][3] #Dot product of r and v
            if rv<0:
                rsquared = (relative[i][j][0])**2+(relative[i][j][1])**2
                vsquared = (relative[i][j][2])**2+(relative[i][j][3])**2
                det = (rv)**2-vsquared*(rsquared-diameter**2)
                if det<0:
                    t = 100 #For negative times, assign an arbitrarily large number to make sure min() wont pick it up.
                elif det == 0:
                    t = -rv/vsquared
                elif det>0:
                    t1 = (-rv+sqrt((rv)**2-vsquared*(rsquared-diameter**2)))/(vsquared)
                    t2 = (-rv-sqrt((rv)**2-vsquared*(rsquared-diameter**2)))/(vsquared)
                    if t1-t2>0:
                        t = t2
                    elif t1-t2<0:
                        t = t1
            elif rv>=0:
                t = 100
            relative[i][j][4]=t #Put the times inside the relative list for element ij.
            j = j+1
        i = i+1
    return relative
I've tried:
t_fin = min(relative[i in range(0,N-1)][j in range(0,N-1)][4])
This runs, but always returns 100 even though I've checked that it isn't the smallest element.
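Your attempt does not loop over the list: i in range(0,N-1) is a membership test that evaluates to True or False, which is then used as index 1 or 0.
If you want the min of the 4th element of an NxNx4 list (use x[4] instead of x[3] if the times sit in the fifth slot, as in the posted code),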
min([x[3] for lev1 in relative for x in lev1])
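To retrieve the indices as well, the same flattening idea can carry i and j along with each value. The sketch below is an assumption-laden illustration: min_with_indices and its slot parameter are hypothetical names, and the small relative list is made-up sample data.

```python
def min_with_indices(relative, slot=3):
    """Return (value, i, j) for the smallest relative[i][j][slot]."""
    # Tuples compare element by element, so the value decides first;
    # ties fall back to the smallest i, then the smallest j.
    return min(
        (cell[slot], i, j)
        for i, row in enumerate(relative)
        for j, cell in enumerate(row)
    )


relative = [
    [[0, 0, 0, 100], [0, 0, 0, 2.5]],
    [[0, 0, 0, 7], [0, 0, 0, 100]],
]
print(min_with_indices(relative))
```

Pass slot=4 when the collision time is stored in the fifth position of each inner list.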
I work with a large amount of data and the execution time of this piece of code is very very important. The results in each iteration are interdependent, so it's hard to make it in parallel. It would be awesome if there is a faster way to implement some parts of this code, like:
finding the max element in the matrix and its indices
changing the values in a row/column with the max from another row/column
removing a specific row and column
Filling the weights matrix is pretty fast.
The code does the following:
it contains a list of lists of words word_list, with count elements in it. At the beginning each word is a separate list.
it contains a two dimensional list (count x count) of float values weights (lower triangular matrix, the values for which i>=j are zeros)
in each iteration it does the following:
it finds the two words with the most similar value (the max element in the matrix and its indices)
it merges their row and column, saving the larger value from the two in each cell
it merges the corresponding word lists in word_list. It saves both lists in the one with the smaller index (max_j) and it removes the one with the larger index (max_i).
it stops if the largest value is less than a given THRESHOLD
I might think of a different algorithm to do this task, but I have no ideas for now and it would be great if there is at least a small performance improvement.
I tried using NumPy but it performed worse.
weights = fill_matrix(count, N, word_list)
while 1:
    # find the max element in the matrix and its indices
    max_element = 0
    for i in range(count):
        max_e = max(weights[i])
        if max_e > max_element:
            max_element = max_e
            max_i = i
            max_j = weights[i].index(max_e)
    if max_element < THRESHOLD:
        break
    # reset the value of the max element
    weights[max_i][max_j] = 0
    # here it is important that always max_j is less than max i (since it's a lower triangular matrix)
    for j in range(count):
        weights[max_j][j] = max(weights[max_i][j], weights[max_j][j])
    for i in range(count):
        weights[i][max_j] = max(weights[i][max_j], weights[i][max_i])
    # compare the symmetrical elements, set the ones above to 0
    for i in range(count):
        for j in range(count):
            if i <= j:
                if weights[i][j] > weights[j][i]:
                    weights[j][i] = weights[i][j]
                weights[i][j] = 0
    # remove the max_i-th column
    for i in range(len(weights)):
        weights[i].pop(max_i)
    # remove the max_i-th row
    weights.pop(max_i)
    new_list = word_list[max_j]
    new_list += word_list[max_i]
    word_list[max_j] = new_list
    # remove the element that was recently merged into a cluster
    word_list.pop(max_i)
    count -= 1
This might help:
def max_ij(A):
    t1 = [max(enumerate(row), key=lambda r: r[1]) for row in A]
    t2 = max(enumerate(t1), key=lambda r: r[1][1])
    i, (j, max_) = t2
    return max_, i, j
It depends on how much work you want to put into it but if you're really concerned about speed you should look into Cython. The quick start tutorial gives a few examples ranging from a 35% speedup to an amazing 150x speedup (with some added effort on your part).