Differences between time consumed by Python loops - python

I am running a program in which a 3-layer loop consumes a lot of time, so I reduced the size of the loop and observed something interesting.
When the first layer is 50 iterations, it only consumes 3 seconds. But when I changed it to 100 iterations, the time increased to 43 seconds. Why didn't the time spent simply double when the number of iterations doubled? How is the computational complexity calculated here? I don't understand.
By the way, my originally designed loop was 160x192x160. It took so long that I just stopped it. I think I need to figure out a way to solve this time problem, which is why I tried the reduced version above.
import time

start = time.time()
choice_list = []
result_list = []
mean_list = []
point_info = []
patch_radius = patch_radius
for k in range(0, 50):
    for l in range(0, 100):
        for h in range(0, 10):
            if img[k][l][h] != 0:
                mean = patch_mean(coordinate_x=k, coordinate_y=l, coordinate_z=h, image=img, patch_radius=patch_radius)
                point_info = [k, l, h, mean]
                mean_list.append(point_info)
end = time.time()
print(end - start)
patch_mean is a function that calculates the mean around a point. It is another loop, but I think it should not matter, because it is an independent function. To be clear, patch_radius is a constant.
def patch_mean(coordinate_x, coordinate_y, coordinate_z, image, patch_radius):
    sum = 0
    count = 0
    for k in range(coordinate_x - patch_radius, coordinate_x + patch_radius):
        for l in range(coordinate_y - patch_radius, coordinate_y + patch_radius):
            for h in range(coordinate_z - patch_radius, coordinate_z + patch_radius):
                if 0 < k < 159 and 0 < l < 191 and 0 < h < 159:
                    if image[k][l][h] != 0:
                        sum = sum + image[k][l][h]
                        count = count + 1
    if count == 0:
        mean = 0
    else:
        mean = sum / count
    return mean

The first iterations of your outer loop give you coordinates that are near the boundary of your image. That makes patch_mean faster to calculate, as a big chunk of its area is cut off. When you move towards the middle of the image, the computation becomes slower, since you average over the whole patch volume, not just a part of it.
If you change the range from range(0, 50) to range(0, 100), you'll be covering a lot more of the middle part of the image. Those coordinates are the slow ones, so overall the loop will be a lot slower. If you changed it to range(0, 160), you'd find that the last few iterations speed up again, as you'd start running into the other side of the image. But the interval from 50 to 100 is right in the middle of the image and will be the slowest part.
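If the goal is ultimately to make the full 160x192x160 run feasible, the per-voxel Python loops can be avoided entirely by computing all patch means at once. Here is a rough sketch using scipy.ndimage; it assumes img is a 3D NumPy array, and its zero-padded boundary handling only approximates the explicit bounds checks inside patch_mean:

import numpy as np
from scipy.ndimage import uniform_filter

def all_patch_means(img, patch_radius):
    size = 2 * patch_radius                        # window width used by the original loops
    mask = (img != 0).astype(float)                # 1 where a voxel contributes, 0 elsewhere
    patch_sum = uniform_filter(img.astype(float), size=size, mode='constant') * size**3
    patch_count = uniform_filter(mask, size=size, mode='constant') * size**3
    with np.errstate(invalid='ignore', divide='ignore'):
        # > 0.5 instead of > 0 to be robust against floating-point fuzz in the counts
        means = np.where(patch_count > 0.5, patch_sum / patch_count, 0.0)
    return means  # means[k, l, h] plays the role of patch_mean(k, l, h, ...)

This removes the Python-level loops entirely, so the cost no longer depends on which part of the image a voxel sits in.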

Related

Random numpy array in order

I have a random sampling method as below:
import random
import numpy as np

def costum_random_sample(size):
    randomList = []
    counter = 0
    last_n = -1
    while(size != counter):
        n = random.random()
        if abs(n - last_n) < 0.05:
            continue
        else:
            randomList.append(n)
            counter += 1
            last_n = n
    return np.array(randomList)
The result is something like array([0.50146945, 0.17442673, 0.60011469, 0.13501798]). Now, I want to change it so the result comes out in ascending order. sort() doesn't work in this case, since it reorders the array after it has been generated, which breaks the logic between consecutive numbers. I want to generate the numbers in order in the first place, so that the sequence keeps that logic. How can I do that?
If your arrays are shortish, you can simply generate the whole array, sort it, and reject it and regenerate as long as the constraint is violated.
bad = True
while bad:
    arr = np.sort(np.random.rand(size))
    bad = np.any(np.ediff1d(arr) < 0.05)
If size is too big, the conflicts will be too plentiful, and this will take forever, so only use it if there is a reasonable chance a conformant array will be generated randomly. Note that if size > 20 there is no array that will fit the criteria, turning this into an infinite loop.
Another approach would be to generate and sort the array as above, find the non-conformant element pairs, then nudge the array elements by increasing the distance between the non-conformant pairs and evenly subtracting this difference from other places. This can't get stuck in an infinite loop, but it involves a bit more math and bends the uniform distribution (though I couldn't tell you how much).
EDIT After thinking a bit, there's a much better way. Basically, you need a spaced array, where there's a fixed spacer and a little bit of extra randomness between each element:
random start space
[element1]
0.05 spacer
some more space
[element2]
0.05 spacer
some more space
[element3]
random end space
All the space needs to add up to 1. However, some of that space is fixed ((size - 1) * 0.05); so if we take out the fixed spacers, we have our "space budget" to distribute between our start, end and random space. So we generate random space and then scale it so it sums up to our space budget. Then we add in the fixed spacers, and a cumulative sum gives us the final array (plus an extra 1.0 at the end, which we chop off).
space_budget = 1 - (size - 1) * 0.05
space = np.random.rand(size + 1)
space *= space_budget / np.sum(space)
space[1:-1] += 0.05
arr = np.cumsum(space)[:-1]
For size = 21, you get exactly one solution every time, as space_budget is zero. For larger size, you start bursting out of the 0...1 range, since it's mathematically impossible to fit more than twenty 0.05-wide spacers into that interval.
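For reference, here is the same idea wrapped into a function; the name spaced_random_sample is just illustrative, and the 0.05 gap is exposed as a parameter:

import numpy as np

def spaced_random_sample(size, gap=0.05):
    # Returns `size` ascending values in [0, 1) whose consecutive
    # differences are all at least `gap` (the spacer approach above).
    space_budget = 1 - (size - 1) * gap
    if space_budget < 0:
        raise ValueError("size is too large for the requested gap")
    space = np.random.rand(size + 1)
    space *= space_budget / np.sum(space)
    space[1:-1] += gap
    return np.cumsum(space)[:-1]

print(spaced_random_sample(10))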

How to make this Python for-loop run faster?

for j in range(0, NumberOfFeatures):
    for k in range(j+1, NumberOfFeatures):
        countArray = np.ones((2,2))
        for i in range(0, NumberOfTrainingExamples):
            countArray[XTrain[i,j], XTrain[i,k]] += 1
The innermost for loop takes quite some time for large NumberOfFeatures and NumberOfTrainingExamples.
It's basically O(n^3) (where the three n's are not the same number).
Because the code is not complete, it is very hard to determine what could be done better, but from what you provided, try to reduce it to at most O(n^2); otherwise it will just take a long time.
If you have 10 of each, that's 1,000 iterations; with 1,000 of each it's 1,000,000,000, so with bigger numbers the calculation very quickly becomes expensive.
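In this particular case the innermost loop over training examples is the easiest one to eliminate, since NumPy can accumulate the 2x2 counts in a single call. A sketch (the helper name is illustrative), assuming XTrain is an integer array containing only 0s and 1s, which the 2x2 shape of countArray implies:

import numpy as np

def pair_count_tables(XTrain):
    NumberOfTrainingExamples, NumberOfFeatures = XTrain.shape
    tables = {}
    for j in range(NumberOfFeatures):
        for k in range(j + 1, NumberOfFeatures):
            countArray = np.ones((2, 2))
            # One unbuffered scatter-add replaces the loop over training examples.
            np.add.at(countArray, (XTrain[:, j], XTrain[:, k]), 1)
            tables[(j, k)] = countArray
    return tables

This keeps the two outer loops over feature pairs but removes the O(NumberOfTrainingExamples) Python loop inside them.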

Append Multiples of Squares of integers greater than 2 and sort them

For example:
4, 8, 9, 12, 16, 18, ...
How can we do this for a big array?
The basic version I wrote took a lot of time to execute:
l = []
for i in range(1, 1000):
    for j in range(2, 1000):
        l.append(i * (j * j))
s = set(l)
l1 = list(s)
l1.sort()
print(l1)
The size of the list should be on the order of 10^6.
I rewrote your code in a cleaner way:
my_list = sorted([i*(j**2) for i in range(1, 1000) for j in range(2, 1000)])
However, what you are really asking about is execution time, which for this operation is only a second or two (around 1.5 seconds), while what you are actually seeing is printing time, which differs between editors and interfaces. For example, I ran this script in Sublime Text and it took almost 9 minutes, while it took 1.4 minutes on the command line.
If you check the length of your list:
print(len(sorted([i*(j**2) for i in range(1, 1000) for j in range(2, 1000)])))
you will see that it has 997002 elements; printing a list that long obviously takes a noticeable amount of time on its own.
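To separate the computation from the printing, you can time just the list construction, for example with timeit (a minimal sketch):

import timeit

stmt = "sorted([i*(j**2) for i in range(1, 1000) for j in range(2, 1000)])"
print(timeit.timeit(stmt, number=1))  # time to build and sort the list, without printing it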
Another factor you should take into account is the Big-O complexity of your code, which in this case is O(n^2): the run time grows quadratically as you increase the loop bounds.

Trying to speed up python code by replacing loops with functions

I am trying to come up with a faster way of coding what I want to. Here is the part of my program I am trying to speed up, hopefully using more inbuilt functions:
num = 0
num1 = 0
rand1 = rand_pos[0:10]
time1 = time.clock()
for rand in rand1:
    for gal in gal_pos:
        num1 = dist(gal, rand)
        num = num + num1
time2 = time.clock()
time_elap = time2 - time1
print time_elap
Here, rand_pos and gal_pos are lists of length 900 and 1 million respectively.
Here, dist is a function in which I calculate the distance between two points in Euclidean space.
I used a snippet of the rand_pos to get a time measurement.
My time measurements are coming to be about 125 seconds. This is way too long!
It means that if I run the code over all the rand_pos, it will take about three hours to do!
Is there a faster way I can do this?
Here is the dist function:
def dist(pos1, pos2):
    n = 0
    dist_x = pos1[0] - pos2[0]
    dist_y = pos1[1] - pos2[1]
    dist_z = pos1[2] - pos2[2]
    if dist_x < radius and dist_y < radius and dist_z < radius:
        positions = [pos1, pos2]
        distance = scipy.spatial.distance.pdist(positions, metric='euclidean')
        if distance < radius:
            n = 1
    return n
While most of the optimization probably needs to happen within your dist function, there are some tips here to speed things up:
# Don't manually sum
for rand in rand1:
    num += sum([dist(gal, rand) for gal in gal_pos])

# If you can vectorize something, then do
import numpy as np
new_dist = np.vectorize(dist)
for rand in rand1:
    num += np.sum(new_dist(gal_pos, rand))

# Use already-built code whenever possible (as already suggested)
scipy.spatial.distance.cdist(gal_pos, rand1, metric='euclidean')
There is a function in scipy that does exactly what you want to do here:
scipy.spatial.distance.cdist(gal_pos, rand1, metric='euclidean')
It will probably be faster than anything you write in pure Python, since the heavy lifting (looping over the pairwise combinations between arrays) is implemented in C.
Currently your loop is happening in Python, which means there is more overhead per iteration, and you are making many separate calls to pdist. Even though pdist is very optimized, the overhead of making so many calls to it slows down your code. This type of performance issue was once described to me with a very useful analogy: it's like trying to have a conversation with someone over the phone by saying one word per phone call. Even though each word goes across the line very fast, the conversation takes a long time because you need to hang up and dial again repeatedly.
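Since dist ultimately just counts the pairs whose Euclidean distance is below radius (the per-axis pre-check never rejects a pair that would pass the final test), the whole double loop can collapse into a few cdist calls plus a comparison. A rough sketch, assuming gal_pos and rand_pos convert to (N, 3) arrays and radius is the same global used in dist:

import numpy as np
from scipy.spatial.distance import cdist

gal_arr = np.asarray(gal_pos)    # shape (n_gal, 3)
rand_arr = np.asarray(rand_pos)  # shape (n_rand, 3)

num = 0
for chunk in np.array_split(rand_arr, 100):       # chunking keeps each distance matrix small
    d = cdist(chunk, gal_arr, metric='euclidean')
    num += np.count_nonzero(d < radius)

This moves all of the pairwise work into a handful of C-level calls instead of roughly a billion Python-level dist calls.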

Python: sliding window of variable width

I'm writing a program in Python that's processing some data generated during experiments, and it needs to estimate the slope of the data. I've written a piece of code that does this quite nicely, but it's horribly slow (and I'm not very patient). Let me explain how this code works:
1) It grabs a small piece of data of size dx (starting with 3 datapoints)
2) It evaluates whether the difference (i.e. |y(x+dx)-y(x-dx)| ) is larger than a certain minimum value (40x std. dev. of noise)
3) If the difference is large enough, it will calculate the slope using OLS regression. If the difference is too small, it will increase dx and redo the loop with this new dx
4) This continues for all the datapoints
[See updated code further down]
For a datasize of about 100k measurements, this takes about 40 minutes, whereas the rest of the program (it does more processing than just this bit) takes about 10 seconds. I am certain there is a much more efficient way of doing these operations, could you guys please help me out?
Thanks
EDIT:
Ok, so I've solved the problem by using only binary searches and limiting the number of allowed steps to 200. I thank everyone for their input, and I selected the answer that helped me most.
FINAL UPDATED CODE:
def slope(self, data, time):
    (wave1, wave2) = wt.dwt(data, "db3")
    std = 2*np.std(wave2)
    e = std/0.05
    de = 5*std
    N = len(data)
    slopes = np.ones(shape=(N,))
    data2 = np.concatenate((-data[::-1]+2*data[0], data, -data[::-1]+2*data[N-1]))
    time2 = np.concatenate((-time[::-1]+2*time[0], time, -time[::-1]+2*time[N-1]))
    for n in xrange(N+1, 2*N):
        left = N+1
        right = 2*N
        for i in xrange(200):
            mid = int(0.5*(left+right))
            diff = np.abs(data2[n-mid+N]-data2[n+mid-N])
            if diff >= e:
                if diff < e + de:
                    break
                right = mid - 1
                continue
            left = mid + 1
        leftlim = n - mid + N
        rightlim = n + mid - N
        y = data2[leftlim:rightlim:int(0.05*(rightlim-leftlim)+1)]
        x = time2[leftlim:rightlim:int(0.05*(rightlim-leftlim)+1)]
        xavg = np.average(x)
        yavg = np.average(y)
        xlen = len(x)
        slopes[n-N] = (np.dot(x,y)-xavg*yavg*xlen)/(np.dot(x,x)-xavg*xavg*xlen)
    return np.array(slopes)
Your comments suggest that you need to find a better way to estimate the next i value given the previous one. With no knowledge of the values in data, the naive algorithm would be:
At each iteration for n, start with i at its previous value and check whether abs(data[start] - data[end]) is less than e. If it is, keep that i and find the new value by incrementing it by 1, as you do now. If it is greater than or equal to e, do a binary search on i to find the appropriate value. You could also do the binary search forwards, but finding a good candidate upper limit without knowledge of the data can prove difficult. This algorithm won't perform worse than your current estimation method.
If you know that data is reasonably smooth (no sudden jumps, and hence a smooth plot for all i values) and monotonically increasing, you can replace the binary search with a backwards search that decrements i by 1 instead.
How to optimize this will depend on some properties of your data, but here are some ideas:
Have you tried profiling the code? Using one of the Python profilers can give you some useful information about what's taking the most time. Often, a piece of code you've just written will have one biggest bottleneck, and it's not always obvious which piece it is; profiling lets you figure that out and attack the main bottleneck first.
Do you know what typical values of i are? If you have some idea, you can speed things up by starting with i greater than 0 (as @vhallac noted), or by increasing i in larger steps: if you often see big values for i, increase i by 2 or 3 at a time; if the distribution of i values has a long tail, try doubling it each time; etc.
Do you need all the data when doing the least squares regression? If that function call is the bottleneck, you may be able to speed it up by using only some of the data in the range. Suppose, for instance, that at a particular point, you need i to be 200 to see a large enough (above-noise) change in the data. But you may not need all 400 points to get a good estimate of the slope — just using 10 or 20 points, evenly spaced in the start:end range, may be sufficient, and might speed up the code a lot.
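If you are not sure where the time goes, a profiler will tell you. A minimal sketch; the call inside the string is illustrative, so substitute however you actually invoke slope:

import cProfile

# Prints per-function timings, sorted by cumulative time.
cProfile.run('obj.slope(data, time)', sort='cumulative')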
I work with Python for similar analyses and have a few suggestions to make. I didn't look at the details of your code, just at your problem statement:
1) It grabs a small piece of data of size dx (starting with 3 datapoints)
2) It evaluates whether the difference (i.e. |y(x+dx)-y(x-dx)| ) is larger than a certain minimum value (40x std. dev. of noise)
3) If the difference is large enough, it will calculate the slope using OLS regression. If the difference is too small, it will increase dx and redo the loop with this new dx
4) This continues for all the datapoints
I think the more obvious reason for the slow execution is the LOOPING nature of your code, whereas you could perhaps use the VECTORIZED (array-based operations) nature of Numpy.
For step 1, instead of taking pairs of points, you can compute data[3:] - data[:-3] directly and get all the differences in a single array operation;
For step 2, you can use the result from array-based tests like numpy.argwhere(data > threshold) instead of testing every element inside some loop;
Step 3 sounds conceptually wrong to me. You say that if the difference is too small, it will increase dx. But if the difference is small, the resulting slope will be small because it actually IS small; getting a small value is then the correct result, and artificially increasing dx to get a "better" result might not be what you want. Well, it might actually be what you want, but you should consider this. I would suggest that you calculate the slope for a fixed dx across the whole dataset and then use the resulting array of slopes to select your regions of interest (for example, using data_slope[numpy.argwhere(data_slope > minimum_slope)]).
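Following that suggestion, the fixed-dx slope for the whole series can be computed without any Python loop. A minimal sketch, where the half-width w is an illustrative parameter rather than something from your code:

import numpy as np

def fixed_dx_slopes(data, time, w):
    # Central-difference slope with a fixed half-width w, evaluated at
    # points w .. len(data)-w-1 in a single pair of array operations.
    dy = data[2*w:] - data[:-2*w]
    dt = time[2*w:] - time[:-2*w]
    return dy / dt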
Hope this helps!
