I am currently using a nested for loop to iterate through two arrays to find values that match a certain criterion. The problem is that this method is incredibly inefficient and time-consuming. I was told that a better way might be to sort the two arrays based on the data, but that would require me to combine several 1D arrays and one multi-D array, sort based on one column, then separate them again. Is there a more efficient way of doing this? Here is a sample of my code:
x1 = []
x2 = []
velocity = []
plane1Times = np.array([[2293902],[2848853],[482957]])
plane2Times = np.array([[7416504],[2613113],[2326542]])
plane1Local = np.array([[0,0,0],[0,u,0],[0,2*u,0],[u,0,0],[u,u,0],[u,2*u,0],[2*u,0,0],[2*u,u,0],[2*u,2*u,0],[3*u,0,0],[3*u,u,0],[3*u,2*u,0]],dtype='float')
plane2Local = np.array([[0,0,D],[0,u,D],[0,2*u,D],[u,0,D],[u,u,D],[u,2*u,D],[2*u,0,D],[2*u,u,D],[2*u,2*u,D],[3*u,0,D],[3*u,u,D],[3*u,2*u,D]],dtype='float')
for i in range(0, len(plane1Times)):
    tic = time.time()
    for n in range(0, len(plane2Times)):
        if plane2Times[n] - plane1Times[i] <= 10000 and plane2Times[n] - plane1Times[i] > 0:
            x1 = plane1Local[plane1Dets[i]]
            x2 = plane2Local[plane2DetScale[n]]
            distance = np.sqrt((x2[0] - x1[0])**2 + (x2[1] - x1[1])**2 + (x2[2])**2)
            timeSeparation = (plane2Times[n] - plane1Times[i]) * timeScale
            velocity += distance / timeSeparation
            break
To give you an example of the time it is currently taking: each array of times is 10**6 values long, so 100 iterations of the outer loop take about 60 seconds. Can someone please help me?
I can't really test this because the code you provided isn't complete, but here is a possible solution:
for index, value in enumerate(plane1Times):
    vec = plane2Times - value
    row, col = np.where((vec <= 10000) & (vec > 0))
    if len(row) > 0:
        x1 = plane1Local[plane1Dets[index]]
        x2 = plane2Local[plane2DetScale[row[0]]]
        distance = np.sqrt((x2[0] - x1[0]) ** 2 + (x2[1] - x1[1]) ** 2 + (x2[2]) ** 2)
        timeSeparation = (plane2Times[row[0]] - plane1Times[index]) * timeScale
        velocity += distance / timeSeparation
Eliminate the second loop and do the subtraction all at once, then search the resulting array for the indices where it meets your criteria. Since it seems you only want the first match, just take the first index, row[0]. Removing the second for loop should drop the run time considerably.
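For reference, here is a minimal, self-contained sketch of the same idea on small dummy data. plane1Dets, plane2DetScale and timeScale are not shown in the question, so the detector index arrays and the time scale below are hypothetical stand-ins, and the coordinates are random placeholders:

import numpy as np

plane1Times = np.array([2293902, 2848853, 482957])
plane2Times = np.array([2295000, 2613113, 2326542])
plane1Local = np.random.rand(12, 3)              # stand-in detector coordinates
plane2Local = np.random.rand(12, 3)
plane1Dets = np.array([0, 1, 2])                 # hypothetical detector indices
plane2Dets = np.array([3, 4, 5])
timeScale = 1e-9                                 # hypothetical seconds per time unit
velocities = []

for i, t1 in enumerate(plane1Times):
    diff = plane2Times - t1                      # all differences for this event at once
    hits = np.where((diff > 0) & (diff <= 10000))[0]
    if hits.size:
        n = hits[0]                              # first match, mirroring the break above
        x1 = plane1Local[plane1Dets[i]]
        x2 = plane2Local[plane2Dets[n]]
        distance = np.linalg.norm(x2 - x1)
        velocities.append(distance / (diff[n] * timeScale))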
I am trying to write code where I have a list of vectors and I have to find the angle between every vector and the rest of them (I am working with MediaPipe's hand landmarks).
My code so far is this:
vectors = [thumb_cmc_vec, thumb_mcp_vec, thumb_ip_vec, thumb_tip_vec, index_mcp_vec, index_pip_vec,
           index_dip_vec, index_tip_vec, middle_mcp_vec, middle_pip_vec, middle_dip_vec, middle_tip_vec,
           ring_mcp_vec, ring_pip_vec, ring_dip_vec, ring_tip_vec, pinky_mcp_vec, pinky_pip_vec,
           pinky_dip_vec, pinky_tip_vec]
for vector in vectors:
    next_vector = vector + 1
    print(vector)
    for next_vector in vectors:
        print(next_vector)
        M = (np.linalg.norm(vector) * np.linalg.norm(next_vector))
        ES = np.dot(vector, next_vector)
        th = math.acos(ES / M)
        list.append(th)
print(list)
where M is the product of the norms of the current pair of vectors, ES is the scalar product of the vectors, and th is the angle between the vectors.
My problem is that the variable next_vector always starts the inner for loop from the first vector of the list, even though I want it to start from the vector after the outer loop's current one, so that I don't get duplicate results. Also, when both loops are on the 3rd vector (thumb_ip_vec) I get this error:
th = math.acos(ES / M)
ValueError: math domain error
Is there any way to solve this? Thank you!
I think you can iterate through the list indices (using range(len(vectors) - 1)) and access the elements through their indices instead of looping through each element:
for i in range(len(vectors) - 1):
    # i runs from 0 to len(vectors) - 2
    vector = vectors[i]
    for j in range(i + 1, len(vectors)):
        # j runs from i + 1 to len(vectors) - 1
        next_vector = vectors[j]
        M = (np.linalg.norm(vector) * np.linalg.norm(next_vector))
        ES = np.dot(vector, next_vector)
        th = math.acos(ES / M)
        list.append(th)
print(list)
The efficient solution here is to iterate over combinations of vectors:
from itertools import combinations  # at top of file

for vector, next_vector in combinations(vectors, 2):
    M = (np.linalg.norm(vector) * np.linalg.norm(next_vector))
    ES = np.dot(vector, next_vector)
    th = math.acos(ES / M)
    list.append(th)
It's significantly faster than looping over indices and indexing, reduces the level of loop nesting, and makes it more clear what you're trying to do (working with every unique pairing of the input).
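For completeness, here is a self-contained sketch with made-up 3-D vectors standing in for the landmark vectors. The clamp of the cosine into [-1, 1] is my addition, not part of the answer above: it guards against the "math domain error" the question mentions, which floating-point round-off can trigger for (nearly) parallel vectors:

import math
from itertools import combinations

import numpy as np

vectors = [np.array([1.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0]),
           np.array([1.0, 1.0, 0.0])]
angles = []

for vector, next_vector in combinations(vectors, 2):
    M = np.linalg.norm(vector) * np.linalg.norm(next_vector)
    ES = np.dot(vector, next_vector)
    cos_th = max(-1.0, min(1.0, ES / M))   # clamp round-off outside [-1, 1]
    angles.append(math.acos(cos_th))

print(angles)   # three unique pairings -> three angles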
I'm not sure I understand your question, but consider using ranges instead.
Ranges let you iterate by index rather than by value, which means you can manipulate that index to access neighboring values.
for i in range(len(iterables) - 1):
    initial_value = iterables[i]
    for ii in range(i + 1, len(iterables)):
        next_value = iterables[ii]
        # do_rest_of_code
Sort of like the mailman, you can reach someone's neighbor without knowing the neighbor's address.
The structure above generally works, but you will need to tweak it to meet your needs.
This is my code, working with dim=3, but I would like it to work for any dimensionality without having to manually edit code.
I would like to be able to vary the dimensionality between 3 and 20 eventually without manually having to add for-loops.
I was looking at itertools, but don't know how to select the correct values from the tuples created by itertools.product() to square and add up for my if statement.
arrayshape = (width * 2 + 1,) * dim
funcspace = np.zeros(shape=arrayshape, dtype='b')
x1 = list(range(-int(width), int(width + 1)))
x2 = x1
x3 = x1
for i in range(len(x1)):
    for j in range(len(x2)):
        for k in range(len(x3)):
            if round(np.sqrt(x1[i] ** 2 + x2[j] ** 2 + x3[k] ** 2)) in ranges:
                funcspace[i][j][k] = 1
You can use product on enumerate of your vectors, which will yield the index and the value:
for ((i, v1), (j, v2), (k, v3)) in itertools.product(enumerate(x1), enumerate(x2), enumerate(x3)):
    if round(np.sqrt(v1**2 + v2**2 + v3**2)) in ranges:
        funcspace[i][j][k] = 1
As a bonus, you get rid of the unpythonic range(len()) construct.
I've also cooked up a more general version for when you have a list of vectors. It's a little harder to read because the unpacking isn't done in the for loop.
The squared sum is computed with sum over index 1 of each pair (the values), and if the condition matches, we descend through the nested lists until we reach the deepest one and set the value to 1.
for t in itertools.product(*(enumerate(x) for x in x_list)):
    # compute the squared sum of values, then compare its rounded square root as in the question
    sqsum = sum(v[1] ** 2 for v in t)
    if round(np.sqrt(sqsum)) in ranges:
        # traverse the dimensions except the last one
        deeper_list = funcspace
        for i in range(len(t) - 1):
            deeper_list = deeper_list[t[i][0]]
        # set the flag using the last dimension list
        deeper_list[t[-1][0]] = 1
As noted in the comments, since x1 seems to be repeated for every dimension, you can replace the first statement with:
for t in itertools.product(enumerate(x1), repeat=dim):
Another comment notes that since funcspace is a numpy ndarray, we can replace the "set to 1" loop by indexing with the tuple of indices:
funcspace[tuple(x[0] for x in t)] = 1
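Putting it together, here is a self-contained sketch of the generalised version; width, dim and ranges are made-up stand-ins for the values in your code:

import itertools

import numpy as np

width, dim = 2, 4            # made-up values; the question varies dim between 3 and 20
ranges = {1, 2}              # made-up stand-in for the question's ranges
x1 = list(range(-width, width + 1))

funcspace = np.zeros((width * 2 + 1,) * dim, dtype='b')
for t in itertools.product(enumerate(x1), repeat=dim):
    # t is a tuple of (index, value) pairs, one per dimension
    if round(np.sqrt(sum(v ** 2 for _, v in t))) in ranges:
        funcspace[tuple(i for i, _ in t)] = 1

print(funcspace.sum(), "cells set out of", funcspace.size)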
I'm new at programming, so I have trouble even with simple things.
I am trying to get three vectors, g1vec, a1vec and z1vec, out of the triple for loop below. I have an Xvec which holds 120 data points. For every x in Xvec I want to try 90 A values, and at every A there are Zs values up to that A (Zs is a float at first; by comparing the ceil and floor I take the one which gives me the minimum). With these values I then want to find the A, Zs couple that minimizes my function gibbs for each of the 120 Xvec values.
After days of trying, I came up with the code below, but it runs very slowly and I think there should be a more direct method to find these A and Zs arguments; actually, I'm not even sure that with this code I am getting the results I need. Any advice is highly appreciated.
for x in Xvec:
    for A in range(1, 91):
        for Zs in np.arange(1, A + 1, 0.1, dtype=float):
            g = gibbs(x, A, Zs)
            g1 = np.append(g1, g)
        min_pos, minofg = g1.argmin(), g1.min()
        Zpos = 1 + (0.1) * min_pos
        Zceil = np.ceil(Zpos)
        Zfloor = np.floor(Zpos)
        gc = gibbs(x, A, Zceil)
        gf = gibbs(x, A, Zfloor)
        k = min(gc, gf)
        if k == gc: Z = Zceil
        else: Z = Zfloor
        z1 = np.append(z1, Z)
        x1 = np.append(x1, x)
        a1 = np.append(a1, A)

for N in range(0, 10711, 90):
    a = min(g1[N:N + 90])
    g1vec = np.append(g1vec, a)
    b = g1[N:N + 90].argmin()
    a1vec = np.append(a1vec, a1[N + b])
    z1vec = np.append(z1vec, z1[N + b])
So at the end I need to have 3 vectors: one with the minimum possible value of the function gibbs at every x, and the other two with the A and Zs values that give that minimum.
Since I could not find any other way, I calculated all possibilities and then tried to separate them in another for loop, but it takes forever and I have serious doubts that it actually works.
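A hedged sketch of a more direct search: it assumes gibbs() accepts numpy arrays and broadcasts (if it doesn't, np.vectorize(gibbs) is a slower drop-in substitute), and it simplifies the 0.1-step scan plus ceil/floor comparison by checking the integer candidates Z = 1..A directly. The gibbs body below is only a placeholder so the snippet runs:

import numpy as np

def gibbs(x, A, Z):                       # placeholder so the sketch runs; use your real function
    return (Z - 0.4 * A) ** 2 + (A - x) ** 2

Xvec = np.linspace(1, 120, 120)           # stand-in for your 120 data points

# flat arrays listing every admissible (A, Z) candidate pair once
A_all = np.concatenate([np.full(A, A) for A in range(1, 91)])
Z_all = np.concatenate([np.arange(1, A + 1) for A in range(1, 91)])

# evaluate gibbs for every x against every candidate pair in one broadcast call
g = gibbs(Xvec[:, None], A_all[None, :], Z_all[None, :])   # shape (len(Xvec), n_pairs)
best = g.argmin(axis=1)

g1vec = g[np.arange(len(Xvec)), best]     # minimum gibbs value for every x
a1vec = A_all[best]                       # the A giving that minimum
z1vec = Z_all[best]                       # the Z giving that minimum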
I have a block of code that I need to optimize as much as possible since I have to run it several thousand times.
For a random float, it finds the closest float in one sub-list of a given array and stores the corresponding float (i.e., the one with the same index) from another sub-list of that array. It repeats the process until the sum of the stored floats reaches a certain limit.
Here's the MWE to make it clearer:
import numpy as np
# Define array with two sub-lists.
a = [np.random.uniform(0., 100., 10000), np.random.random(10000)]
# Initialize empty final list.
b = []
# Run until the condition is met.
while (sum(b) < 10000):
    # Draw random [0,1) value.
    u = np.random.random()
    # Find closest value in sub-list a[1].
    idx = np.argmin(np.abs(u - a[1]))
    # Store value located in sub-list a[0].
    b.append(a[0][idx])
The code is reasonably simple, but I haven't found a way to speed it up. I tried to adapt the great (and very fast) answer given in a similar question I asked some time ago, to no avail.
OK, here's a slightly left-field suggestion. As I understand it, you are just trying to sample uniformly from the elements in a[0] until you have a list whose sum exceeds some limit.
Although it will be more costly memory-wise, I think you'll probably find it's much faster to generate a large random sample from a[0] first, then take the cumsum and find where it first exceeds your limit.
For example:
import numpy as np
# array of reference float values, equivalent to a[0]
refs = np.random.uniform(0, 100, 10000)
def fast_samp_1(refs, lim=10000, blocksize=10000):
    # sample uniformly from refs
    samp = np.random.choice(refs, size=blocksize, replace=True)
    samp_sum = np.cumsum(samp)
    # find where the cumsum first exceeds your limit
    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]

    # # if it's ok to be just under lim rather than just over then this might
    # # be quicker
    # return samp[samp_sum <= lim]
Of course, if the sum of the sample of blocksize elements is < lim then this will fail to give you a sample whose sum is >= lim. You could check whether this is the case, and append to your sample in a loop if necessary.
def fast_samp_2(refs, lim=10000, blocksize=10000):
    samp = np.random.choice(refs, size=blocksize, replace=True)
    samp_sum = np.cumsum(samp)
    # is the sum of our current block of samples >= lim?
    while samp_sum[-1] < lim:
        # if not, we'll sample another block and try again until it is
        newsamp = np.random.choice(refs, size=blocksize, replace=True)
        samp = np.hstack((samp, newsamp))
        samp_sum = np.hstack((samp_sum, np.cumsum(newsamp) + samp_sum[-1]))
    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]
Note that concatenating arrays is pretty slow, so it would probably be better to make blocksize large enough to be reasonably sure that the sum of a single block will be greater than or equal to your limit, without being excessively large.
Update
I've adapted your original function a little bit so that its syntax more closely resembles mine.
def orig_samp(refs, lim=10000):
    # Initialize empty final list.
    b = []
    a1 = np.random.random(10000)
    # Run until the condition is met.
    while (sum(b) < lim):
        # Draw random [0,1) value.
        u = np.random.random()
        # Find closest value in sub-list a[1].
        idx = np.argmin(np.abs(u - a1))
        # Store value located in sub-list a[0].
        b.append(refs[idx])
    return b
Here's some benchmarking data.
%timeit orig_samp(refs, lim=10000)
# 100 loops, best of 3: 11 ms per loop
%timeit fast_samp_2(refs, lim=10000, blocksize=1000)
# 10000 loops, best of 3: 62.9 µs per loop
That's more than two orders of magnitude faster. You can do a bit better by reducing the blocksize a fraction - you basically want it to be comfortably larger than the length of the arrays you're getting out. In this case, you know that on average the output will be about 200 elements long, since the mean of the uniform values between 0 and 100 is 50, and 10000 / 50 = 200.
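As a rough sanity check on that estimate (this assumes the fast_samp_1 definition above is in scope):

import numpy as np

refs = np.random.uniform(0, 100, 10000)
lengths = [len(fast_samp_1(refs, lim=10000)) for _ in range(1000)]
print(np.mean(lengths))   # should land near the 10000 / 50 = 200 estimate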
Update 2
It's easy to get a weighted sample rather than a uniform sample - you can just pass the p= parameter to np.random.choice:
def weighted_fast_samp(refs, weights=None, lim=10000, blocksize=10000):
    samp = np.random.choice(refs, size=blocksize, replace=True, p=weights)
    samp_sum = np.cumsum(samp)
    # is the sum of our current block of samples >= lim?
    while samp_sum[-1] < lim:
        # if not, we'll sample another block and try again until it is
        newsamp = np.random.choice(refs, size=blocksize, replace=True,
                                   p=weights)
        samp = np.hstack((samp, newsamp))
        samp_sum = np.hstack((samp_sum, np.cumsum(newsamp) + samp_sum[-1]))
    last = np.searchsorted(samp_sum, lim, side='right')
    return samp[:last + 1]
Write it in Cython. That's going to get you a lot more speed for a high-iteration operation.
http://cython.org/
One obvious optimization - don't re-calculate the sum on each iteration; accumulate it instead:
b_sum = 0
while b_sum < 10000:
    # ... draw u and find idx as before
    idx = np.argmin(np.abs(u - a[1]))
    add_val = a[0][idx]
    b.append(add_val)
    b_sum += add_val
EDIT:
I think some minor improvement (check it out if you feel like it) may be achieved by pre-referencing the sublists before the loop:
a_0 = a[0]
a_1 = a[1]
# ...
while ...:
    # ...
    idx = np.argmin(np.abs(u - a_1))
    b.append(a_0[idx])
It may save some on run time - though I don't believe it will matter that much.
Sort your reference array.
That allows O(log n) lookups instead of scanning the whole list (using bisect, for example, to find the closest elements).
For starters, I reverse a[0] and a[1] to simplify the sort:
import bisect

a = [np.random.random(10000), np.random.uniform(0., 100., 10000)]
order = np.argsort(a[0])
a = [a[0][order], a[1][order]]   # sort both sub-lists together, keyed on a[0], so the pairs stay aligned
Now, a is sorted by order of a[0], meaning that if you are looking for the closest value to an arbitrary number, you can start with a bisect:
while (sum(b) < 10000):
    # Draw random [0,1) value.
    u = np.random.random()
    # Find closest value in sub-list a[0].
    idx = bisect.bisect(a[0], u)
    # bisect gives an insertion point: the closest value is at idx or idx - 1
    if idx == len(a[0]) or (idx != 0 and np.abs(a[0][idx] - u) > np.abs(a[0][idx - 1] - u)):
        idx = idx - 1
    # Store value located in sub-list a[1].
    b.append(a[1][idx])
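A sketch of the same idea using numpy only: np.searchsorted plays the role of bisect here, and the running total avoids re-summing b on every pass, as another answer suggests. The variable names are my own stand-ins for the question's a[0] and a[1]:

import numpy as np

keys = np.random.random(10000)                # the values we search in, like a[1]
vals = np.random.uniform(0., 100., 10000)     # the values we store, like a[0]
order = np.argsort(keys)
keys, vals = keys[order], vals[order]         # keep the pairs aligned while sorting

b, total = [], 0.0
while total < 10000:
    u = np.random.random()
    idx = np.searchsorted(keys, u)
    # searchsorted gives an insertion point; pick whichever neighbour is closer
    if idx == len(keys) or (idx > 0 and keys[idx] - u > u - keys[idx - 1]):
        idx -= 1
    b.append(vals[idx])
    total += vals[idx]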
I am profiling some genetic algorithm code with some nested loops and from what I see most of the time is spent in two of my functions which involve slicing and adding up numpy arrays. I tried my best to further optimize them but would like to see if others come up with ideas.
Function 1:
The first function is called 2954684 times, for a total of 19 seconds spent inside the function.
We basically just create views inside the numpy arrays contained in data[0], according to the coordinates contained in data[1]:
def get_signal(data, options):
    # data[0] contains bed, data[1] contains position
    # forward = 0, reverse = 1
    start = data[1][0] - options.halfwinwidth
    end = data[1][0] + options.halfwinwidth
    if data[1][1] == 0:
        normals_forward = data[0]['normals_forward'][start:end]
        normals_reverse = data[0]['normals_reverse'][start:end]
    else:
        normals_forward = data[0]['normals_reverse'][end - 1:start - 1: -1]
        normals_reverse = data[0]['normals_forward'][end - 1:start - 1: -1]
    row = {'normals_forward': normals_forward,
           'normals_reverse': normals_reverse,
           }
    return row
Function 2:
Called 857 times for a total time of 13.674 seconds spent inside the function:
signal is a list of numpy arrays of equal length with dtype float; options just holds assorted options.
The goal of the function is just to add up the list of numpy arrays into a single one, calculate the intersection of the two curves formed by the forward and reverse arrays, and return the result.
def calculate_signal(signal, options):
    profile_normals_forward = np.zeros(options.halfwinwidth * 2, dtype='f')
    profile_normals_reverse = np.zeros(options.halfwinwidth * 2, dtype='f')
    # here I tried np.sum over axis=0; it's significantly slower than the for loop approach
    for b in signal:
        profile_normals_forward += b['normals_forward']
        profile_normals_reverse += b['normals_reverse']
    count = len(signal)
    if options.normalize == 1:
        # print "Normalizing to max counts"
        profile_normals_forward /= max(profile_normals_forward)
        profile_normals_reverse /= max(profile_normals_reverse)
    elif options.normalize == 2:
        # print "Normalizing to number of elements"
        profile_normals_forward /= count
        profile_normals_reverse /= count
    intersection_signal = np.fmin(profile_normals_forward, profile_normals_reverse)
    intersection = np.sum(intersection_signal)
    results = {"intersection": intersection,
               "profile_normals_forward": profile_normals_forward,
               "profile_normals_reverse": profile_normals_reverse,
               }
    return results
As you can see the two are very simple, but account for > 60% of my execution time on a script that can run for hours / days (genetic algorithm optimization), so even minor improvements are welcome :)
One simple thing I would do to increase the speed of the first function is to use a different notation for accessing the array indices, as detailed here.
For example:
foo = numpyArray[1][0]
bar = numpyArray[1,0]
The second line will execute much faster because you don't have to return the entire row at numpyArray[1] and then find the first element of that. Try it out.
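If you want to check the difference yourself, here is a quick timing sketch with timeit (the array size and repeat count are arbitrary):

import timeit

import numpy as np

arr = np.zeros((1000, 1000))

# arr[1][0] first materialises the whole row arr[1], then indexes it;
# arr[1, 0] resolves both indices in a single lookup.
print(timeit.timeit(lambda: arr[1][0], number=100000))
print(timeit.timeit(lambda: arr[1, 0], number=100000))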