Some python / numpy optimization possible?

Some python / numpy optimization possible? - python

I am profiling some genetic algorithm code with some nested loops and from what I see most of the time is spent in two of my functions which involve slicing and adding up numpy arrays. I tried my best to further optimize them but would like to see if others come up with ideas.
Function 1:
The first function is called 2954684 times for a total time spent inside the function of 19 seconds
We basically just create views inside numpy arrays contained in data[0], according to coordinates contained in data[1]
def get_signal(data, options):
#data[0] contains bed, data[1] contains position
#forward = 0, reverse = 1
start = data[1][0] - options.halfwinwidth
end = data[1][0] + options.halfwinwidth
if data[1][1] == 0:
normals_forward = data[0]['normals_forward'][start:end]
normals_reverse = data[0]['normals_reverse'][start:end]
else:
normals_forward = data[0]['normals_reverse'][end - 1:start - 1: -1]
normals_reverse = data[0]['normals_forward'][end - 1:start - 1: -1]
row = {'normals_forward': normals_forward,
'normals_reverse': normals_reverse,
}
return row
Function 2:
Called 857 times for a total time of 13.674 seconds spent inside the function:
signal is a list of numpy arrays of equal length with dtype float, options is just random options
The goal of the function is just to add up the lists of each numpy arrays to a single one, calculate the intersection of the two curves formed by the forward and reverse arrays and return the result
def calculate_signal(signal, options):
profile_normals_forward = np.zeros(options.halfwinwidth * 2, dtype='f')
profile_normals_reverse = np.zeros(options.halfwinwidth * 2, dtype='f')
#here i tried np.sum over axis = 0, its significantly slower than the for loop approach
for b in signal:
profile_normals_forward += b['normals_forward']
profile_normals_reverse += b['normals_reverse']
count = len(signal)
if options.normalize == 1:
#print "Normalizing to max counts"
profile_normals_forward /= max(profile_normals_forward)
profile_normals_reverse /= max(profile_normals_reverse)
elif options.normalize == 2:
#print "Normalizing to number of elements"
profile_normals_forward /= count
profile_normals_reverse /= count
intersection_signal = np.fmin(profile_normals_forward, profile_normals_reverse)
intersection = np.sum(intersection_signal)
results = {"intersection": intersection,
"profile_normals_forward": profile_normals_forward,
"profile_normals_reverse": profile_normals_reverse,
}
return results
As you can see the two are very simple, but account for > 60% of my execution time on a script that can run for hours / days (genetic algorithm optimization), so even minor improvements are welcome :)

One simple thing I would do to increase the speed of the first function is to use different notation for the accessing of the list indices as detailed here.
For example:
foo = numpyArray[1][0]
bar = numpyArray[1,0]
The second line will execute much faster because you don't have to return the entire element at numpyArray[1] and then find the first element of that. Try it out

Related

Function to get the time complexity graph of any function/program

I am new in the developing community and I was wondering if there is any function or method you use to decide which algorithm has the best performance and therefore use it instead of any other.
For example:
I am using a decorator to know how long the functions are taking to solve problems, but I dont think that is extrapolable, hence, I was thinking maybe there is a general method or function you use to decide which algorithm to use.
Can you help me please?
Example I was using the time library to know how long two independent functions take to count the negative numbers in an array:
import time
def time_it(func):
def wrapper(*args,**kwargs):
start=time.time()
result=func(*args,**kwargs)
end=time.time()
print(func.__name__ +" took " +str((end-start)*1000) + " mil seconds")
return result
return wrapper
array=[
[-4, -3, -1, 1],
[-2, -2, 1, 2],
[-1, 1, 2, 3],
[1, 2, 4, 5]
]
#time_it
def count_negatives(array):
count=0
for i in array:
for j in i:
if j < 0:
count +=1
return count
#time_it
def count_neg(array):
count=0
row=0
column=0
while row<len(array) and column<len(array[0]):
if array[row][column]<0:
count +=1
column +=1
else:
row +=1
column=0
return count
print(count_negatives(array))
print(count_neg(array))

An algorithm runs according to some input given to it and as well the operations its doing on it (and other variables).
With enough sampling, you can plot graphs (I prefer the matplotlib library) and see which one handles the input you're giving it the best.
Keep in mind these will be only samples from your own "computer" - meaning it may run faster or slower for others.
Here we can use the time_it decorator you've written with a little change:
I would prefer using time.perf_counter() as it uses the fastest clock cycles thus can give more accurate results than just time.time()
The decorator will return the actual time it took in milliseconds.
I'll change some names so it will be easier to follow + remove the return value as we don't care about the answer of whether an array contains a negative number.
import time
def time_it(func):
def wrapper(*args,**kwargs):
start=time.perf_counter()
result=func(*args,**kwargs)
end=time.perf_counter()
return (end - start) * 1_000 # return is in milliseconds!
return wrapper
#time_it
def count_negatives_v1(array):
count = 0
for i in array:
for j in i:
if j < 0:
count += 1
#time_it
def count_negatives_v2(array):
count = 0
row = 0
column = 0
while row < len(array) and column < len(array[0]):
if array[row][column] < 0:
count += 1
column += 1
else:
row += 1
column = 0
We can now build some function that generates list of lists containing random integers between any range we choose! I've chosen that it could generate a list that contain 500-1000 lists, and these "inner" lists contain 50 numbers each can be between -1000 and 1000
def generate_arrays(inner_lists_amount=(500, 1000), numbers=(-1_000, 1_000), inner_lists_length=50):
inner_arrays_count = random.choice(range(*inner_lists_amount))
return [list(random.choices(range(*numbers), k=inner_lists_length)) for _ in range(inner_arrays_count)]
This will generate up inner_arrays_quantity inner arrays to inner arrays, each one containing 50 number between -1000 and 1000.
Then we will pass it to each of the function you've written: (e.g. v1, v2) and get the result, we will save the output as our "y" values on the graph, the "x" values will be the sample index, here I've chosen sample amount of 1000 meaning it will call 1000 times the generate_arrays, pass it to v1 and v2 and save the results for each of these methods in a different "y" value lists:
import matplotlib.pyplot as plt
def build_graphs(sample_count=100):
x = range(sample_count)
y_v1 = []
y_v2 = []
for _ in range(sample_count):
print(_)
arrays = generate_arrays()
y_v1.append(count_negatives_v1(arrays))
y_v2.append(count_negatives_v2(arrays))
plt.plot(x, y_v1, 'r')
plt.plot(x, y_v2, 'g')
plt.show()
Using the matplotlib module we coloured the second method (v2) with green and v1 in red.
This will give us results as following:
Now this is not 100% accurate and will never be as it depends on a lot of things such as:
PC memory
CPU clock rate sampling the time
and much more, but can be somewhat be improved if for each call of the generate_arrays we do X more tests and check the average time it takes on each specific array. Because here we tested only once how much time it takes for v1, v2 to run on each array... however because the sample amount is 1000 it gives fairly the same results as expected.
Note: this does not give the actual order of the functions (big-o) - if you want to do it, then you can give it increasing amount of data, plotting it into excel and use a trendline with the highest R value to find the best graph function that has the nearest to 100%.
More info using the openpyxl module

Why running time of better algorithm code is more than primitive algorithm code?

Problem description : Write a function that takes in a non-empty array of integers that are sorted in ascending order and returns a new array of the same length with the squares of the original integers also sorted in ascending order.
I have wrote two Python functions to solve the problem. One is 'sortedSquaredArrayNormal' and another called 'sortedSquaredArrayBetter'. First one has O(nlogn) time complexity and second function has O(n) time complexity I guess. I have also written third function 'test_runtime_compare' that prints each function run time. Below is my code:
import random
import time
def sortedSquaredArrayNormal(array):
square_arr = []
for elem in array:
square_arr.append(elem*elem)
square_arr.sort()
return square_arr
def sortedSquaredArrayBetter(array):
big_index = len(array)-1
small_index = 0
output_arr = [0 for elem in array]
# elements of bigger indices inserted first in output array
for idx in range(len(array)-1, -1, -1):
small_elem = array[small_index]
big_elem = array[big_index]
if(abs(small_elem) > abs(big_elem)):
output_arr[idx] = small_elem * small_elem
small_index += 1 # small index is shifted 1 position to right
else:
output_arr[idx] = big_elem * big_elem
big_index -= 1 # big index is shifted 1 position to left
return output_arr
def test_runtime_compare():
new_arr = [random.randrange(-100, 100) for i in range(100000)]
new_arr.sort()
initial = time.time()
dummy = sortedSquaredArrayNormal(new_arr)
final = time.time()
normal_time = final - initial
print('Normal time: {}'.format(normal_time))
time.sleep(5)
initial = time.time()
new = sortedSquaredArrayBetter(new_arr)
final = time.time()
better_time = final - initial
print('Better time: {}'.format(better_time))
test_runtime_compare()
I got the output:
Normal time: 0.03777050971984863
Better time: 0.11590099334716797
I was expecting 'better time' to be smaller than 'normal time'. But every time I run the code in my machine with larger input array I get 'normal time' less than 'better time'. I can't find the cause. Can anyone help me to understand the cause? Do I have any mistake in complexity analysis?

take in non-empty array of integers that are sorted in ascending order
and returns a new array of the same length with the squares of the
original integers also sorted in ascending order.
That sounds like a very trivial issue, were it not for negative numbers.
For example, for [-3,-2,-1,0,1,2,3], you would have to sort [9,4,1,0,1,4,9].
The order of the squared numbers is very predictable, hence you found an algorithm that does it in O(n).
But maybe the built-in algorithm for sort is also very good at sorting these kinds of very monotone sequences, so it can also do it in O(n) instead of O(n * log(n)), which is for completely random sequences.

Removing nested for loop to find coincidence values

I am currently using a nested for loop to iterate through to arrays to find values that match a certain criterion. The problem is that this method is incredibly inefficient and time consuming. I was told that a better way might be to sort the two arrays based on the data, but this requires me to combine several 1D arrays and one multi-D array, sort based on one column, then separate them again. Is there a more efficient way of doing this? Here is a sample of my code:
x1 = []
x2 = []
velocity = []
plane1Times = np.array([[2293902],[2848853],[482957]])
plane2Times = np.array([[7416504],[2613113],[2326542]])
plane1Local = np.array([[0,0,0],[0,u,0],[0,2*u,0],[u,0,0],[u,u,0],[u,2*u,0],[2*u,0,0],[2*u,u,0],[2*u,2*u,0],[3*u,0,0],[3*u,u,0],[3*u,2*u,0]],dtype='float')
plane2Local = np.array([[0,0,D],[0,u,D],[0,2*u,D],[u,0,D],[u,u,D],[u,2*u,D],[2*u,0,D],[2*u,u,D],[2*u,2*u,D],[3*u,0,D],[3*u,u,D],[3*u,2*u,D]],dtype='float')
for i in range(0,len(plane1Times)):
tic = time.time()
for n in range(0,len(plane2Times)):
if plane2Times[n] - plane1Times[i] <= 10000 and plane2Times[n] - plane1Times[i] > 0:
x1 = plane1Local[plane1Dets[i]]
x2 = plane2Local[plane2DetScale[n]]
distance = np.sqrt((x2[0]-x1[0])**2 + (x2[1]-x1[1])**2 + (x2[2])**2)
timeSeparation = (plane2Times[n]-plane1Times[i])*timeScale
velocity += distance/timeSeparation
break
To give you an example of the time it is currently taking, each array of times is 10**6 values long so 100 loops in i takes about 60 seconds. Can someone please help me?

I cant really test because the code you provided isn't complete, but this is a possible solution
for index,value in enumerate(plane1Times):
vec = plane2Times - value
row,col = np.where((vec<=10000)&(vec>0))
if len(row) > 0:
x1 = plane1Local[plane1Dets[index]]
x2 = plane2Local[plane2DetScale[row[0]]]
distance = np.sqrt((x2[0] - x1[0]) ** 2 + (x2[1] - x1[1]) ** 2 + (x2[2]) ** 2)
timeSeparation = (plane2Times[row[0]] - plane1Times[index]) * timeScale
velocity += distance / timeSeparation
Eliminate the second loop, and just do the subtraction all at once. Then search the new array, where it meats your criteria. Since it seems that you want the first value, just take the first index like row[0] to get the index of the value check. Removing the second for loop should drop the time considerably.

Speed up loop to fill an array with closest values from another array

I have a block of code that I need to optimize as much as possible since I have to run it several thousand times.
What it does is it finds the closest float in a sub-list of a given array for a random float and stores the corresponding float (ie: with the same index) stored in another sub-list of that array. It repeats the process until the sum of floats stored reaches a certain limit.
Here's the MWE to make it clearer:
import numpy as np
# Define array with two sub-lists.
a = [np.random.uniform(0., 100., 10000), np.random.random(10000)]
# Initialize empty final list.
b = []
# Run until the condition is met.
while (sum(b) < 10000):
# Draw random [0,1) value.
u = np.random.random()
# Find closest value in sub-list a[1].
idx = np.argmin(np.abs(u - a[1]))
# Store value located in sub-list a[0].
b.append(a[0][idx])
The code is reasonably simple but I haven't found a way to speed it up. I tried to adapt the great (and very fast) answer given in a similar question I made some time ago, to no avail.

OK, here's a slightly left-field suggestion. As I understand it, you are just trying to sample uniformally from the elements in a[0] until you have a list whose sum exceeds some limit.
Although it will be more costly memory-wise, I think you'll probably find it's much faster to generate a large random sample from a[0] first, then take the cumsum and find where it first exceeds your limit.
For example:
import numpy as np
# array of reference float values, equivalent to a[0]
refs = np.random.uniform(0, 100, 10000)
def fast_samp_1(refs, lim=10000, blocksize=10000):
# sample uniformally from refs
samp = np.random.choice(refs, size=blocksize, replace=True)
samp_sum = np.cumsum(samp)
# find where the cumsum first exceeds your limit
last = np.searchsorted(samp_sum, lim, side='right')
return samp[:last + 1]
# # if it's ok to be just under lim rather than just over then this might
# # be quicker
# return samp[samp_sum <= lim]
Of course, if the sum of the sample of blocksize elements is < lim then this will fail to give you a sample whose sum is >= lim. You could check whether this is the case, and append to your sample in a loop if necessary.
def fast_samp_2(refs, lim=10000, blocksize=10000):
samp = np.random.choice(refs, size=blocksize, replace=True)
samp_sum = np.cumsum(samp)
# is the sum of our current block of samples >= lim?
while samp_sum[-1] < lim:
# if not, we'll sample another block and try again until it is
newsamp = np.random.choice(refs, size=blocksize, replace=True)
samp = np.hstack((samp, newsamp))
samp_sum = np.hstack((samp_sum, np.cumsum(newsamp) + samp_sum[-1]))
last = np.searchsorted(samp_sum, lim, side='right')
return samp[:last + 1]
Note that concatenating arrays is pretty slow, so it would probably be better to make blocksize large enough to be reasonably sure that the sum of a single block will be >= to your limit, without being excessively large.
Update
I've adapted your original function a little bit so that its syntax more closely resembles mine.
def orig_samp(refs, lim=10000):
# Initialize empty final list.
b = []
a1 = np.random.random(10000)
# Run until the condition is met.
while (sum(b) < lim):
# Draw random [0,1) value.
u = np.random.random()
# Find closest value in sub-list a[1].
idx = np.argmin(np.abs(u - a1))
# Store value located in sub-list a[0].
b.append(refs[idx])
return b
Here's some benchmarking data.
%timeit orig_samp(refs, lim=10000)
# 100 loops, best of 3: 11 ms per loop
%timeit fast_samp_2(refs, lim=10000, blocksize=1000)
# 10000 loops, best of 3: 62.9 µs per loop
That's a good 3 orders of magnitude faster. You can do a bit better by reducing the blocksize a fraction - you basically want it to be comfortably larger than the length of the arrays you're getting out. In this case, you know that on average the output will be about 200 elements long, since the mean of all real numbers between 0 and 100 is 50, and 10000 / 50 = 200.
Update 2
It's easy to get a weighted sample rather than a uniform sample - you can just pass the p= parameter to np.random.choice:
def weighted_fast_samp(refs, weights=None, lim=10000, blocksize=10000):
samp = np.random.choice(refs, size=blocksize, replace=True, p=weights)
samp_sum = np.cumsum(samp)
# is the sum of our current block of samples >= lim?
while samp_sum[-1] < lim:
# if not, we'll sample another block and try again until it is
newsamp = np.random.choice(refs, size=blocksize, replace=True,
p=weights)
samp = np.hstack((samp, newsamp))
samp_sum = np.hstack((samp_sum, np.cumsum(newsamp) + samp_sum[-1]))
last = np.searchsorted(samp_sum, lim, side='right')
return samp[:last + 1]

Write it in cython. That's going to get you a lot more for a high iteration operation.
http://cython.org/

One obvious optimization - don't re-calculate sum on each iteration, accumulate it
b_sum = 0
while b_sum<10000:
....
idx = np.argmin(np.abs(u - a[1]))
add_val = a[0][idx]
b.append(add_val)
b_sum += add_val
EDIT:
I think some minor improvement (check it out if you feel like it) may be achieved by pre-referencing sublists before the loop
a_0 = a[0]
a_1 = a[1]
...
while ...:
....
idx = np.argmin(np.abs(u - a_1))
b.append(a_0[idx])
It may save some on run time - though I don't believe it will matter that much.

Sort your reference array.
That allows log(n) lookups instead of needing to browse the whole list. (using bisect for example to find the closest elements)
For starters, I reverse a[0] and a[1] to simplify the sort:
a = np.sort([np.random.random(10000), np.random.uniform(0., 100., 10000)])
Now, a is sorted by order of a[0], meaning if you are looking for the closest value to an arbitrary number, you can start by a bisect:
while (sum(b) < 10000):
# Draw random [0,1) value.
u = np.random.random()
# Find closest value in sub-list a[0].
idx = bisect.bisect(a[0], u)
# now, idx can either be idx or idx-1
if idx is not 0 and np.abs(a[0][idx] - u) > np.abs(a[0][idx - 1] - u):
idx = idx - 1
# Store value located in sub-list a[1].
b.append(a[1][idx])

Find large number of consecutive values fulfilling condition in a numpy array

I have some audio data loaded in a numpy array and I wish to segment the data by finding silent parts, i.e. parts where the audio amplitude is below a certain threshold over a period in time.
An extremely simple way to do this is something like this:
values = ''.join(("1" if (abs(x) < SILENCE_THRESHOLD) else "0" for x in samples))
pattern = re.compile('1{%d,}'%int(MIN_SILENCE))
for match in pattern.finditer(values):
# code goes here
The code above finds parts where there are at least MIN_SILENCE consecutive elements smaller than SILENCE_THRESHOLD.
Now, obviously, the above code is horribly inefficient and a terrible abuse of regular expressions. Is there some other method that is more efficient, but still results in equally simple and short code?

Here's a numpy-based solution.
I think (?) it should be faster than the other options. Hopefully it's fairly clear.
However, it does require a twice as much memory as the various generator-based solutions. As long as you can hold a single temporary copy of your data in memory (for the diff), and a boolean array of the same length as your data (1-bit-per-element), it should be pretty efficient...
import numpy as np
def main():
# Generate some random data
x = np.cumsum(np.random.random(1000) - 0.5)
condition = np.abs(x) < 1
# Print the start and stop indices of each region where the absolute
# values of x are below 1, and the min and max of each of these regions
for start, stop in contiguous_regions(condition):
segment = x[start:stop]
print start, stop
print segment.min(), segment.max()
def contiguous_regions(condition):
"""Finds contiguous True regions of the boolean array "condition". Returns
a 2D array where the first column is the start index of the region and the
second column is the end index."""
# Find the indicies of changes in "condition"
d = np.diff(condition)
idx, = d.nonzero()
# We need to start things after the change in "condition". Therefore,
# we'll shift the index by 1 to the right.
idx += 1
if condition[0]:
# If the start of condition is True prepend a 0
idx = np.r_[0, idx]
if condition[-1]:
# If the end of condition is True, append the length of the array
idx = np.r_[idx, condition.size] # Edit
# Reshape the result into two columns
idx.shape = (-1,2)
return idx
main()

There is a very convenient solution to this using scipy.ndimage. For an array:
a = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0])
which can be the result of a condition applied to another array, finding the contiguous regions is as simple as:
regions = scipy.ndimage.find_objects(scipy.ndimage.label(a)[0])
Then, applying any function to those regions can be done e.g. like:
[np.sum(a[r]) for r in regions]

Slightly sloppy, but simple and fast-ish, if you don't mind using scipy:
from scipy.ndimage import gaussian_filter
sigma = 3
threshold = 1
above_threshold = gaussian_filter(data, sigma=sigma) > threshold
The idea is that quiet portions of the data will smooth down to low amplitude, and loud regions won't. Tune 'sigma' to affect how long a 'quiet' region must be; tune 'threshold' to affect how quiet it must be. This slows down for large sigma, at which point using FFT-based smoothing might be faster.
This has the added benefit that single 'hot pixels' won't disrupt your silence-finding, so you're a little less sensitive to certain types of noise.

I haven't tested this but you it should be close to what you are looking for. Slightly more lines of code but should be more efficient, readable, and it doesn't abuse regular expressions :-)
def find_silent(samples):
num_silent = 0
start = 0
for index in range(0, len(samples)):
if abs(samples[index]) < SILENCE_THRESHOLD:
if num_silent == 0:
start = index
num_silent += 1
else:
if num_silent > MIN_SILENCE:
yield samples[start:index]
num_silent = 0
if num_silent > MIN_SILENCE:
yield samples[start:]
for match in find_silent(samples):
# code goes here

This should return a list of (start,length) pairs:
def silent_segs(samples,threshold,min_dur):
start = -1
silent_segments = []
for idx,x in enumerate(samples):
if start < 0 and abs(x) < threshold:
start = idx
elif start >= 0 and abs(x) >= threshold:
dur = idx-start
if dur >= min_dur:
silent_segments.append((start,dur))
start = -1
return silent_segments
And a simple test:
>>> s = [-1,0,0,0,-1,10,-10,1,2,1,0,0,0,-1,-10]
>>> silent_segs(s,2,2)
[(0, 5), (9, 5)]

another way to do this quickly and concisely:
import pylab as pl
v=[0,0,1,1,0,0,1,1,1,1,1,0,1,0,1,1,0,0,0,0,0,1,0,0]
vd = pl.diff(v)
#vd[i]==1 for 0->1 crossing; vd[i]==-1 for 1->0 crossing
#need to add +1 to indexes as pl.diff shifts to left by 1
i1=pl.array([i for i in xrange(len(vd)) if vd[i]==1])+1
i2=pl.array([i for i in xrange(len(vd)) if vd[i]==-1])+1
#corner cases for the first and the last element
if v[0]==1:
i1=pl.hstack((0,i1))
if v[-1]==1:
i2=pl.hstack((i2,len(v)))
now i1 contains the beginning index and i2 the end index of 1,...,1 areas

#joe-kington I've got about 20%-25% speed improvement over np.diff / np.nonzero solution by using argmax instead (see code below, condition is boolean)
def contiguous_regions(condition):
idx = []
i = 0
while i < len(condition):
x1 = i + condition[i:].argmax()
try:
x2 = x1 + condition[x1:].argmin()
except:
x2 = x1 + 1
if x1 == x2:
if condition[x1] == True:
x2 = len(condition)
else:
break
idx.append( [x1,x2] )
i = x2
return idx
Of course, your mileage may vary depending on your data.
Besides, I'm not entirely sure, but i guess numpy may optimize argmin/argmax over boolean arrays to stop searching on first True/False occurrence. That might explain it.

I know I'm late to the party, but another way to do this is with 1d convolutions:
np.convolve(sig > threshold, np.ones((cons_samples)), 'same') == cons_samples
Where cons_samples is the number of consecutive samples you require above threshold

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Some python / numpy optimization possible? - python

Related

Function to get the time complexity graph of any function/program

Why running time of better algorithm code is more than primitive algorithm code?

Removing nested for loop to find coincidence values

Speed up loop to fill an array with closest values from another array

Find large number of consecutive values fulfilling condition in a numpy array

Categories

Resources