I've been reading up on finding the minimum of a multidimensional list, but if I have an N x N x 4 list, how do I get the minimum over the 4th element of every innermost list? All the examples I found use a small list with literal indices. I suppose I'll need to define the indices in terms of N:
[[[0,1,2,3],[0,1,2,3],...N],[[0,1,2,3],[0,1,2,3],...N].....N]
And then there's retrieving their indices.
I don't know what to try.
If anyone's interested in the actual piece of code:
relative = [[[[100] for k in range(5)] for j in range(N)] for i in range(N)]
The following fills in the 4th element with times satisfying the collision equations. The 0th, 1st, 2nd and 3rd elements of relative hold positions and velocities. The 4th spot is for the time taken for the ith and jth particles to collide. Redundant values such as i-i or j-i are filled with the value 100, because it's big enough for the min function not to retrieve it. I need the shortest collision time, hence the 4th-element comparisons.
def time(relative):
    i = 0
    t = 0
    while i<N:
        j = i+1
        while j<N and i<N:
            rv = relative[i][j][0]*relative[i][j][2]+relative[i][j][1]*relative[i][j][3] #Dot product of r and v
            if rv<0:
                rsquared = (relative[i][j][0])**2+(relative[i][j][1])**2
                vsquared = (relative[i][j][2])**2+(relative[i][j][3])**2
                det = (rv)**2-vsquared*(rsquared-diameter**2)
                if det<0:
                    t = 100 #For negative times, assign an arbitrarily large number to make sure min() wont pick it up.
                elif det == 0:
                    t = -rv/vsquared
                elif det>0:
                    t1 = (-rv+sqrt((rv)**2-vsquared*(rsquared-diameter**2)))/(vsquared)
                    t2 = (-rv-sqrt((rv)**2-vsquared*(rsquared-diameter**2)))/(vsquared)
                    if t1-t2>0:
                        t = t2
                    elif t1-t2<0:
                        t = t1
            elif rv>=0:
                t = 100
            relative[i][j][4]=t #Put the times inside the relative list for element ij.
            j = j+1
        i = i+1
    return relative
I've tried:
t_fin = min(relative[i in range(0,N-1)][j in range(0,N-1)][4])
This runs, but it always returns 100 even though I've checked that it isn't the smallest element.
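If you want the min of the 4th element (index 3) of an NxNx4 list:
min(x[3] for lev1 in relative for x in lev1)
The list in the question has five slots with the time stored last, so there the index is 4 rather than 3.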
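To also recover which pair (i, j) the minimum belongs to, one option is to take the minimum over (time, i, j) tuples. This is a minimal sketch, assuming the layout from the question's code (time stored at index 4, only pairs with j > i filled in). The helper name `earliest_collision` and the sample values are made up for illustration.

```python
def earliest_collision(relative):
    """Return (t, i, j) for the smallest time stored at index 4 among pairs with i < j."""
    n = len(relative)
    # Tuples compare element by element, so min() orders by time first.
    return min(
        (relative[i][j][4], i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )

# Hypothetical 3-particle example: index 4 of each cell holds the collision time,
# with 100 as the "no collision" placeholder used in the question.
N = 3
relative = [[[0.0, 0.0, 0.0, 0.0, 100] for j in range(N)] for i in range(N)]
relative[0][1][4] = 2.5
relative[0][2][4] = 0.75
relative[1][2][4] = 4.0

t_fin, i_min, j_min = earliest_collision(relative)
print(t_fin, i_min, j_min)  # 0.75 0 2
```

The attempt in the question inspects a single cell only: `i in range(0, N-1)` evaluates to True or False, and Python treats a boolean used as a list index as 1 or 0.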
I am attempting to find the indices of the two smallest and the two largest values in python:
I have
import numpy as np
from sklearn.metrics import pairwise_distances
euclidean_matrix = pairwise_distances(H10.T, metric='euclidean')
max_index = np.where(euclidean_matrix == np.max(euclidean_matrix[np.nonzero(euclidean_matrix)]))
min_index = np.where(euclidean_matrix == np.min(euclidean_matrix[np.nonzero(euclidean_matrix)]))
min_index
max_index
I get the following output
(array([158, 272]), array([272, 158]))
(array([ 31, 150]), array([150, 31]))
The above code only returns the indices of the absolute smallest and absolute largest values of the matrix. I would also like the indices of the next smallest and next largest values. Ideally, I would like to return the indices of the two largest values and the indices of the two smallest values of the matrix. How can I do this?
I can think of a couple ways of doing this. Some of these depend on how much data you need to search through.
A couple of caveats: you will have to decide what to do when there are only 1, 2 or 3 elements. If all elements have the same value, do you want min, max, etc. to be identical? And if several items tie for max, min, min2 or max2, which should be selected?
Run min, then remove that element and run min on the rest. Run max, then remove that element and run max on the rest (note that this is on the original and not the one with min removed). This is the least efficient method since it requires searching 4 times and copying twice. (Actually 8 times because we find the min/max then find the index.) Something like the code below, which masks the found entry by setting it to 0 so that the nonzero filter skips it.
PSEUDO CODE:
max_index = np.where(euclidean_matrix == np.max(euclidean_matrix[np.nonzero(euclidean_matrix)]))
tmp_euclidean_matrix = euclidean_matrix.copy()  # a real copy, not another reference
tmp_euclidean_matrix[max_index] = 0  # mask the maximum so the nonzero filter skips it
max_index2 = np.where(tmp_euclidean_matrix == np.max(tmp_euclidean_matrix[np.nonzero(tmp_euclidean_matrix)]))
min_index = np.where(euclidean_matrix == np.min(euclidean_matrix[np.nonzero(euclidean_matrix)]))
tmp_euclidean_matrix = euclidean_matrix.copy()  # start again from the original
tmp_euclidean_matrix[min_index] = 0  # mask the minimum
min_index2 = np.where(tmp_euclidean_matrix == np.min(tmp_euclidean_matrix[np.nonzero(tmp_euclidean_matrix)]))
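Sort the data (if you need it sorted anyway this is a good option), then just grab the two smallest and the two largest. This isn't great unless you needed it sorted anyway, because sorting costs many copies and comparisons. Sorting the matrix in place would lose the positions, so sort the indices instead.
PSEUDO CODE:
# assumes a symmetric matrix whose only zeros are on the diagonal
rows, cols = np.triu_indices_from(euclidean_matrix, k=1)  # each pair (i, j) with i < j, once
order = np.argsort(euclidean_matrix[rows, cols])  # smallest distance first
min_index = (rows[order[0]], cols[order[0]])
min_index2 = (rows[order[1]], cols[order[1]])
max_index = (rows[order[-1]], cols[order[-1]])
max_index2 = (rows[order[-2]], cols[order[-2]])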
The best option would be to roll your own search function. This is the most efficient because you go through the data only once to collect all four values.
This is just a simple iterative approach over a flat (1-D) sequence; other algorithms may be more efficient. You will want to validate that it works, though.
PSEUDO CODE:
def minmax2(array):
    """ returns indices of (minimum, second minimum, second maximum, maximum)
    """
    if len(array) == 0:
        raise Exception('Empty List')
    elif len(array) == 1:
        # special case: only 1 element, need at least 2 to have different values
        minimum = 0
        minimum2 = 0
        maximum2 = 0
        maximum = 0
    else:
        minimum = 0
        minimum2 = 1
        maximum2 = 1
        maximum = 0
        for i in range(1, len(array)):
            if array[i] <= array[minimum]:
                # a new minimum (or tie) will shift the other minimum
                minimum2 = minimum
                minimum = i
            elif array[i] < array[minimum2]:
                minimum2 = i
            # separate "if": one element may update both a minimum and a maximum slot
            if array[i] >= array[maximum]:
                # a new maximum (or tie) will shift the second maximum
                maximum2 = maximum
                maximum = i
            elif array[i] > array[maximum2]:
                maximum2 = i
    return (minimum, minimum2, maximum2, maximum)
edit: Added pseudo code
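For a symmetric distance matrix, the standard library's `heapq.nsmallest` and `heapq.nlargest` offer another way to pick the two extremes at each end. This is a minimal sketch, assuming the matrix is symmetric with zeros only on the diagonal. The function name `extreme_pairs` and the sample matrix are made up for illustration, and the indexing `dist[i][j]` works for nested lists as well as NumPy arrays.

```python
import heapq
from itertools import combinations

def extreme_pairs(dist, k=2):
    """Return the k smallest and k largest off-diagonal entries as (value, i, j) tuples."""
    n = len(dist)
    # Each unordered pair (i, j) with i < j is visited once, so the mirrored
    # duplicates and the zero diagonal never enter the comparison.
    entries = [(dist[i][j], i, j) for i, j in combinations(range(n), 2)]
    return heapq.nsmallest(k, entries), heapq.nlargest(k, entries)

# Made-up symmetric distance matrix for illustration.
dist = [
    [0.0, 1.0, 5.0, 9.0],
    [1.0, 0.0, 2.0, 7.0],
    [5.0, 2.0, 0.0, 3.0],
    [9.0, 7.0, 3.0, 0.0],
]
smallest, largest = extreme_pairs(dist)
print(smallest)  # [(1.0, 0, 1), (2.0, 1, 2)]
print(largest)   # [(9.0, 0, 3), (7.0, 1, 3)]
```

Ties are broken by the row and column index because the tuples are compared element by element.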
I have a list of 29400 values in Python and I'm trying to check if each element of the list is larger than both the 2000 neighbors to its left and the 2000 neighbors to its right. If the element is larger than its 4000 neighbors, I want to retrieve the index of the element. If there aren't 2000 neighbors to the right I just want to compare it with future elements until I reach the end of the list, and vice versa if there aren't 2000 values to the left.
def find_peaks(t):
    prev = []
    future = []
    peak_index = []
    for i in range(len(t)-2000): # compare element with previous values
        for j in range(1,2001):
            if t[i]<t[i+j]:
                prev.append(False)
                break
            if j==2000:
                if t[i]<t[i+j]:
                    prev.append(False)
                    break
                prev.append(True)
    for i in range(1999,len(t)-1): # compare element with future values
        for j in range(1,2001):
            if t[i]<t[i-j]:
                future.append(False)
                break
            if j==2000:
                if t[i]<t[i-j]:
                    future.append(False)
                    break
                future.append(True)
    future = future[::-1] # reverse list
    for i in range(0,len(prev)-1):
        if prev[i] == True:
            if prev[i] == True:
                peak_index.append(i)
Does anyone know of any better ways to go about this? I was having trouble comparing elements near the end and the beginning of the list- If there aren't 2000 elements left in the list for me to compare with, then the list wraps around to the beginning of the list, which isn't something I want.
You can use a list comprehension, so the actual search becomes a one-liner. I can't vouch for its speed or elegance, but it takes just a few seconds on my machine.
import random
# create list of random numbers and manually insert two peaks
t = [random.randrange(1, 1000) for r in range(29400)] # found this here: https://stackoverflow.com/questions/16655089/python-random-numbers-into-a-list
t[666] = 2000
t[6666] = 2000
# finds the peak elements
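# the slice end is exclusive, so index+2001 includes the 2000th neighbor to the right;
# slices are clamped at the end of the list, so no min() is needed there
peaks = [index for index, value in enumerate(t) if value == max(t[max(index-2000, 0):index+2001])]
print(peaks) # includes 666 and 6666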
A naive, iterative version would be to go through each element and look behind/ahead a certain number of positions to see if there are values larger than it, and if not, consider it a peak:
def find_peaks(data, span=2000):
    peaks = []
    for i, value in enumerate(data):
        peak = True  # assume a peak until a neighbor proves otherwise
        for j in range(i - 1, max(-1, i - span - 1), -1):  # look behind, down to index 0
            peak = value > data[j]  # check if our value is larger than the selected neighbor
            if not peak:  # not a peak, break away
                break
        if peak:  # look behind passed, look ahead:
            for j in range(i + 1, min(i + span + 1, len(data))):
                if value <= data[j]:  # look ahead failed, break away
                    peak = False
                    break
        if peak:  # if look ahead passed...
            peaks.append(i)  # add it to our peaks list
    return peaks
This breaks away as soon as a condition is not met instead of checking every element against every neighbor, so it is fast when most elements fail early. The worst case is still about len(data) * 2 * span comparisons.
If you want to count in the neighbors that are of the same value when calculating a peak (so your current peak candidate is the same) you can use peak = value >= data[j] in the look behind and if value < data[j] in the look ahead portions.
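If the early exit is not enough, a different technique, a sliding-window maximum kept in a monotonic deque, touches each element a constant number of times regardless of the span. This is a minimal sketch, assuming a peak must be strictly greater than every other value within `span` positions on either side. The name `find_peaks_deque` is made up for illustration.

```python
from collections import deque

def find_peaks_deque(data, span=2000):
    """Indices whose value is strictly greater than all other values within span positions."""
    n = len(data)
    peaks = []
    window = deque()  # indices in increasing order; their values never increase
    for right in range(n + span):
        if right < n:
            # Strictly smaller values at the back can never be a window maximum again.
            while window and data[window[-1]] < data[right]:
                window.pop()
            window.append(right)
        center = right - span
        if center < 0:
            continue
        # Drop indices that left the window [center - span, center + span].
        while window[0] < center - span:
            window.popleft()
        is_max = window[0] == center  # leftmost maximum of the window is the center
        is_unique = len(window) == 1 or data[window[1]] < data[center]  # no tie to the right
        if is_max and is_unique:
            peaks.append(center)
    return peaks

print(find_peaks_deque([1, 3, 2, 5, 4, 5, 0], span=1))  # [1, 3, 5]
```

Near the ends of the list the window is simply shorter, so nothing wraps around.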
My solution does not rely on any powerful built-in methods; it is just plain logic to find the target peaks. I iterate over each element and, in an inner loop, check whether each previous or future neighbour that exists is smaller than the current element. isPeak is initialized to True and stays True only if no neighbour in either direction is greater than or equal to the current element, which means the current element is a peak. After that I just record the index of that element.
def find_peaks(t, neighbour_length=2000):
    peak_index = []
    for i in range(len(t)):
        isPeak = True  # initialize to True
        for k in range(1, neighbour_length + 1):  # k is the distance from element i
            # Check the previous neighbour, if it is present
            if i - k >= 0 and t[i] <= t[i - k]:
                isPeak = False
                break
            # Check the future neighbour, if it is present
            if i + k < len(t) and t[i] <= t[i + k]:
                isPeak = False
                break
        if isPeak:
            peak_index.append(i)
    return peak_index
Hope it Helps!
I'm trying to implement Radix sort in python.
My current program is not working correctly in that a list like [41,51,2,3,123] will be sorted correctly to [2,3,41,51,123], but something like [52,41,51,42,23] will become [23,41,42,52,51] (52 and 51 are in the wrong place).
I think I know why this is happening, because when I compare the digits in the tens place, I don't compare units as well (same for higher powers of 10).
How do I fix this issue so that my program runs in the fastest way possible? Thanks!
def radixsort(aList):
    BASEMOD = 10
    terminateLoop = False
    temp = 0
    power = 0
    newList = []
    while not terminateLoop:
        terminateLoop = True
        tempnums = [[] for x in range(BASEMOD)]
        for x in aList:
            temp = int(x / (BASEMOD ** power))
            tempnums[temp % BASEMOD].append(x)
            if terminateLoop:
                terminateLoop = False
        for y in tempnums:
            for x in range(len(y)):
                if int(y[x] / (BASEMOD ** (power+1))) == 0:
                    newList.append(y[x])
                    aList.remove(y[x])
        power += 1
    return newList
print(radixsort([1,4,1,5,5,6,12,52,1,5,51,2,21,415,12,51,2,51,2]))
Currently, your sort does nothing to reorder values based on anything but their highest digit. You get 41 and 42 right only by chance (since they are in the correct relative order in the initial list).
You should always build a new list on each cycle of the sort.
def radix_sort(nums, base=10):
    result_list = []
    power = 0
    while nums:
        bins = [[] for _ in range(base)]
        for x in nums:
            bins[x // base**power % base].append(x)
        nums = []
        for bin in bins:
            for x in bin:
                if x < base**(power+1):
                    result_list.append(x)
                else:
                    nums.append(x)
        power += 1
    return result_list
Note that radix sort is not necessarily faster than a comparison-based sort. It only has a lower complexity if the number of items to be sorted is larger than the range of the item's values. Its complexity is O(len(nums) * log(max(nums))) rather than O(len(nums) * log(len(nums))).
Radix sort sorts the elements by grouping them on one digit position at a time, starting with the least significant digit. Take [2,3,41,51,123]: first we group the numbers by their units digit.
[[],[41,51],[2],[3,123],[],[],[],[],[],[]]
Then we read the bins back in order, from bin 0 to bin 9. The new array will be
[41,51,2,3,123]
Next we group by the tens digit. Here 2 and 3 are treated as 02 and 03.
[[2,3],[],[123],[],[41],[51],[],[],[],[]]
Now the new array will be
[2,3,123,41,51]
Lastly we group by the hundreds digit. This time 2, 3, 41 and 51 are treated as 002, 003, 041 and 051.
[[2,3,41,51],[123],[],[],[],[],[],[],[],[]]
Finally we end up with [2,3,41,51,123]
def radixsort(A):
    if not isinstance(A,list):
        raise TypeError('A must be a list')
    n=len(A)
    maxelement=max(A)
    digits=len(str(maxelement)) # how many digits in the maxelement
    bins=[[] for _ in range(10)] # 10 separate bins ([l]*10 would repeat one shared list)
    for i in range(digits):
        for j in range(n): # within this we traverse the unsorted array
            e=(A[j]//pow(10,i))%10 # i-th digit, using integer division
            bins[e].append(A[j]) # adds item to the end
        k=0 # used for the index of the re-sorted array A
        for x in range(10): # we traverse the bins and rebuild the array
            for y in range(len(bins[x])):
                A[k]=bins[x].pop(0) # remove element from the beginning
                k=k+1
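The passes in the walk-through above can be traced with a short generator. This is a minimal sketch, assuming non-negative integers only. The name `radix_passes` is made up for illustration and is not part of either answer's code.

```python
def radix_passes(nums, base=10):
    """Yield (bins, flattened list) after each least-significant-digit pass."""
    nums = list(nums)  # work on a copy so the caller's list is untouched
    place = 1
    while nums and place <= max(nums):
        bins = [[] for _ in range(base)]
        for x in nums:
            # Appending keeps the order of the previous pass (the sort is stable).
            bins[(x // place) % base].append(x)
        nums = [x for b in bins for x in b]
        yield bins, nums
        place *= base

for bins, result in radix_passes([2, 3, 41, 51, 123]):
    print(bins, "->", result)
```

Each printed line shows the ten bins for one digit position followed by the list read back from them; the last line is the sorted list.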
I work with a large amount of data and the execution time of this piece of code is very very important. The results in each iteration are interdependent, so it's hard to make it in parallel. It would be awesome if there is a faster way to implement some parts of this code, like:
finding the max element in the matrix and its indices
changing the values in a row/column with the max from another row/column
removing a specific row and column
Filling the weights matrix is pretty fast.
The code does the following:
it contains a list of lists of words word_list, with count elements in it. At the beginning each word is a separate list.
it contains a two dimensional list (count x count) of float values weights (lower triangular matrix, the values for which i>=j are zeros)
in each iteration it does the following:
it finds the two words with the most similar value (the max element in the matrix and its indices)
it merges their row and column, saving the larger value from the two in each cell
it merges the corresponding word lists in word_list. It saves both lists in the one with the smaller index (max_j) and it removes the one with the larger index (max_i).
it stops if the largest value is less than a given THRESHOLD
I might think of a different algorithm to do this task, but I have no ideas for now and it would be great if there is at least a small performance improvement.
I tried using NumPy but it performed worse.
weights = fill_matrix(count, N, word_list)
while 1:
    # find the max element in the matrix and its indices
    max_element = 0
    for i in range(count):
        max_e = max(weights[i])
        if max_e > max_element:
            max_element = max_e
            max_i = i
            max_j = weights[i].index(max_e)
    if max_element < THRESHOLD:
        break
    # reset the value of the max element
    weights[max_i][max_j] = 0
    # here it is important that always max_j is less than max i (since it's a lower triangular matrix)
    for j in range(count):
        weights[max_j][j] = max(weights[max_i][j], weights[max_j][j])
    for i in range(count):
        weights[i][max_j] = max(weights[i][max_j], weights[i][max_i])
    # compare the symmetrical elements, set the ones above to 0
    for i in range(count):
        for j in range(count):
            if i <= j:
                if weights[i][j] > weights[j][i]:
                    weights[j][i] = weights[i][j]
                weights[i][j] = 0
    # remove the max_i-th column
    for i in range(len(weights)):
        weights[i].pop(max_i)
    # remove the max_i-th row
    weights.pop(max_i)
    new_list = word_list[max_j]
    new_list += word_list[max_i]
    word_list[max_j] = new_list
    # remove the element that was recently merged into a cluster
    word_list.pop(max_i)
    count -= 1
This might help:
def max_ij(A):
    t1 = [max(list(enumerate(row)), key=lambda r: r[1]) for row in A]
    t2 = max(list(enumerate(t1)), key=lambda r:r[1][1])
    i, (j, max_) = t2
    return max_, i, j
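For comparison, the same lookup on a NumPy array is a call to `np.argmax` followed by `np.unravel_index`. This is only a sketch: the asker reports that NumPy was slower for them, and it is likely to pay off only if `weights` is kept as an array for the whole loop, because converting a list of lists on every iteration would add a full copy each time. The name `max_ij_numpy` and the sample matrix are made up for illustration.

```python
import numpy as np

def max_ij_numpy(weights):
    """Return (max value, row, column) of a 2-D array, first occurrence in row-major order."""
    flat_index = np.argmax(weights)  # position in the flattened array
    i, j = np.unravel_index(flat_index, weights.shape)
    return float(weights[i, j]), int(i), int(j)

# Small lower triangular example.
weights = np.array([
    [0.0, 0.0, 0.0],
    [0.4, 0.0, 0.0],
    [0.1, 0.9, 0.0],
])
print(max_ij_numpy(weights))  # (0.9, 2, 1)
```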
It depends on how much work you want to put into it but if you're really concerned about speed you should look into Cython. The quick start tutorial gives a few examples ranging from a 35% speedup to an amazing 150x speedup (with some added effort on your part).
I have a 2d array with a species in each cell. I pick a random element of the array and I want to count how many of each species are in the eight squares immediately adjacent to that element.
But I want the array to wrap at the edges, so if I pick an element on the top row, the bottom row will be counted as "adjacent". How can I do this while iterating through j in range(x-1,x+1), and the same for k and y?
Also, is there a more elegant way of omitting the element I originally picked while looking through the adjacent squares than the if (j!=x or k!=y) line?
numspec = [0] * len(allspec)
for i in range (0,len(allspec)):
    #count up how many of species i there is in the immediate area
    for j in range(x-1,x+1):
        for k in range(y-1,y+1):
            if (j!=x or k!=y):
                numspec[hab[i][j]] = numspec[hab[i][j]]+1
You can wrap with the modulo operator: j % 8 always gives a number from 0 to 7, so for a grid with 8 rows, index 8 wraps to 0 and index -1 wraps to 7.
As for wrapping, I would recommend using relative indexing from -1 to +1 and then computing the real index using the modulo operator (%).
As for making sure you don't count the original element (x, y), you are doing just fine (I would probably use the reversed condition and continue, but it doesn't matter).
I don't quite understand your usage of i, j, k indexes, so I'll just assume that i is index of the species, j, k are indexes into the 2d map called hab which I changed to x_rel, y_rel and x_idx and y_idx to make it more readable. If I'm mistaken, change the code or let me know.
I also took the liberty of doing some minor fixes:
introduced N constant representing number of species
changed range to xrange (in Python 2, xrange is faster and uses less memory; in Python 3, range already behaves this way and xrange no longer exists)
no need to specify 0 in range (or xrange)
instead of X = X + 1 for increasing value, I used += increment operator like this: X += 1
Here is resulting code:
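N = len(allspec)
numspec = [0] * N
rows, cols = len(hab), len(hab[0])  # wrap on the grid size, not on the species count
# one pass over the neighbours is enough; hab already stores the species index
for x_rel in xrange(-1, 2):  # -1, 0, +1 (the stop value is exclusive)
    for y_rel in xrange(-1, 2):
        x_idx = (x + x_rel) % rows
        y_idx = (y + y_rel) % cols
        if x_idx != x or y_idx != y:
            numspec[hab[x_idx][y_idx]] += 1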
You could construct a list of the adjacent elements and go from there. For example if your 2d list is called my_array and you wanted to examine the blocks immediately surrounding my_array[x][y] then you can do something like this:
xmax = len(my_array)
ymax = len(my_array[0]) #assuming every row has the same length
x_vals = [i%xmax for i in [x-1,x,x+1]]
y_vals = [j%ymax for j in [y-1,y,y+1]]
surrounding_blocks = [
    my_array[x_vals[0]][y_vals[0]],
    my_array[x_vals[0]][y_vals[1]],
    my_array[x_vals[0]][y_vals[2]],
    my_array[x_vals[2]][y_vals[0]],
    my_array[x_vals[2]][y_vals[1]],
    my_array[x_vals[2]][y_vals[2]],
    my_array[x_vals[1]][y_vals[0]],
    my_array[x_vals[1]][y_vals[2]],
]
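The wrap-around count can also be written as one small function that returns a tally per species. This is a minimal sketch, assuming the grid is a list of equal-length rows, at least 3 x 3 so that no cell is counted twice. The name `count_neighbours` and the sample grid are made up for illustration.

```python
from collections import Counter

def count_neighbours(grid, x, y):
    """Count the values in the eight cells around grid[x][y], wrapping at the edges."""
    rows, cols = len(grid), len(grid[0])
    counts = Counter()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip the picked cell itself
            # The modulo maps -1 to the last index and rows (or cols) back to 0.
            counts[grid[(x + dx) % rows][(y + dy) % cols]] += 1
    return counts

# Made-up 4 x 4 habitat; each cell holds a species index.
hab = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 0, 1],
    [0, 2, 1, 1],
]
print(count_neighbours(hab, 0, 0))
```

For the corner cell (0, 0) the neighbours come from the last row and last column as well as from row 1 and column 1, giving four cells of species 1 and two each of species 0 and 2.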