This is perhaps too generic, but I don't know how to ask it without TMI.
I have a Python program that is creating a 6 x 12 numpy matrix with a unique set of 12 strings in the x and y directions. From what I can tell from running my program, it seems to take about 2000 iterations, give or take, to produce a solution. If it fails to create a unique solution, or the solution it produces doesn't meet some qualifications after the initial success, it starts over.
At this point, I was just looping through the program until I had success. I tried using garbage collection, but it still just chewed through memory (I have 26 GB free) and crashed. I then moved to having subprocess.call() and just rerunning the whole program while killing off the old PID.
This has stabilized the memory consumption, but Windows is saying the Python.exe process is 500 MB and I am getting about 340 attempts per minute. I can't tell if this is good or bad. I am 15,000 attempts into my first try. I imagined it would take a ton of attempts, but not sure as to relative speed.
Does this seem slow? Have I messed up the efficiency of the matrices, and am I using too much memory? I don't have any frame of reference for what an optimized attempt rate per minute would be. I have a lot more info if anyone is interested.
Here is the main loop where the program spends the most time:
def uniquecheck(inning, position, checkplayer, checkarr):
    # collect the values already placed in this column and this row so the
    # caller can test the newest entry against them
    global xcheck
    uniquelist = []
    if xcheck < 2000:
        # values already placed above in the same column (rows 0 .. inning-2)
        y = 0
        for row in checkplayer:
            if y <= (inning - 2):
                uniquelist.append(checkplayer[y, position])
            y = y + 1
        xcheck = xcheck + 1
        # values already placed to the left in the same row (column 0 is skipped)
        columns = checkplayer.shape[1]
        for z in range(columns):
            if z != 0:
                if z <= (position - 1):
                    uniquelist.append(checkplayer[(inning - 1), z])
        xcheck = xcheck + 1
    return uniquelist  # the excerpt ends here; presumably the caller uses this list
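For what it's worth, the two loops above just collect the entries already placed in the same column and the same row, and numpy slicing can do that without a Python loop. A minimal sketch (the function name is mine; it assumes checkplayer is a 2-D numpy array and inning/position are used exactly as in the code above):

import numpy as np

def uniquecheck_sliced(inning, position, checkplayer):
    # rows 0 .. inning-2 of this column (what the first loop appends)
    column_part = checkplayer[:inning - 1, position]
    # columns 1 .. position-1 of this row (what the second loop appends)
    row_part = checkplayer[inning - 1, 1:position]
    return list(column_part) + list(row_part)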
I am getting around the problem by testing and failing a solution faster. Instead of filling the full array and then checking it after it's created, I put in some intermediate checks and fail candidates that have no chance of passing early on, which saves all those wasted iterations.
Not sure how to speed it up anymore, but at least it isn't 12 hours slow now. I am sure I could continue to optimize the code, but it doesn't seem like it is worth it now that it is working.
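A minimal sketch of the kind of intermediate check described above, as I read it (the helper name is mine; it assumes the row is filled left to right and, like the loops above, ignores column 0):

import numpy as np

def still_unique(checkplayer, inning, position):
    # the part of the current row and column that has been filled so far
    row = checkplayer[inning - 1, 1:position + 1]
    col = checkplayer[:inning, position]
    # abandon the attempt as soon as either contains a duplicate
    return len(np.unique(row)) == len(row) and len(np.unique(col)) == len(col)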
I'm currently writing a program in Python for calculating the total spectral emissivity (infrared waves) of any given material at different temperatures (200 K - 500 K), based on measurement data obtained by measuring the directional-hemispherical emissivity of the material at many different wavelengths using an IR spectroscope. The calculation is done by integrating the measured intensity over all wavelengths, using Planck's law as a weighting function (none of this really matters for the question itself; I just want to explain the background so that the code is easier to understand). This is my code:
from scipy import integrate
from scipy.interpolate import interp1d
import numpy as np
import math as m

def planck_blackbody(lambda_, T):  # wavelength, temperature
    h = 6.6260755e-34   # Planck constant
    c = 2.99792458e+8   # speed of light
    k = 1.380658e-23    # Boltzmann constant
    try:
        a = 2.0 * h * (c ** 2)
        b = h * c / (lambda_ * k * T)
        intensity = a / ((lambda_ ** 5) * (m.exp(b) - 1.0))
        return float(intensity)
    except OverflowError:  # for lower temperatures
        pass

def spectral_emissivity(emifilename, t, lambda_1, lambda_2):
    results = []
    with open(emifilename, 'r') as emifile:
        emilines = emifile.readlines()
        try:
            w = [float(x.split('\t')[0].strip('\n')) * 1e-6 for x in emilines]
            e = [float(x.split('\t')[1].strip('\n')) for x in emilines]
        except ValueError:
            pass
    w = np.asarray(w)  # wavelength
    e = np.asarray(e)  # measured intensity

    def part_1(lambda_, T):
        E = interp1d(w, e, fill_value='extrapolate')(lambda_)
        return E * planck_blackbody(lambda_, T)

    def E_complete(T):
        E_complete_part_1 = integrate.quad(part_1, lambda_1, lambda_2, args=T, limit=50)
        E_complete_part_2 = integrate.quad(planck_blackbody, lambda_1, lambda_2, args=T, limit=50)
        return E_complete_part_1[0] / E_complete_part_2[0]

    for T in t:
        results.append([T, E_complete(T)])

    with open("{}.plk".format(emifilename[:-4]), 'w') as resultfile:
        for item in results:
            resultfile.write("{}\t{}\n".format(item[0], item[1]))

t = np.arange(200, 501, 1)
spectral_emissivity(r'C:\test.dat', t, 1.4e-6, 35e-6)
The measured intensity is stored in a text file with two columns, the first being the wavelength of the infrared waves and the second being the directional-hemispherical emissivity of the measured material at that wavelength.
When I run this code, it produces the right results, but I still encounter two problems:
I get a warning message from scipy.integrate.quad:
IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
warnings.warn(msg, IntegrationWarning)
Can someone explain to me what exactly this means? I understand that integrate.quad is an adaptive numerical integration method and that my integrands somehow seem to require more than 50 subdivisions, but is there a way around this? I tried increasing the limit, but even with 200 I still get this warning. It's especially weird given that the integrands are pretty straightforward functions.
The second problem is closely connected to the first: this program takes ages (about 5 minutes!) to finish one single file, but I need to process many files every hour. cProfile reveals that 98% of this time is spent inside the integration function. A MathCad program doing the exact same thing and producing the same outputs only takes a few seconds to finish. Even though I spent the last week searching for a solution, I simply can't manage to speed this program up, and no one else on Stack Overflow or elsewhere seems to have comparable timing problems with integrate.quad.
So, finally, my question: is there any obvious way to optimize this code so it runs faster (apart from compiling it to C or anything like that)? I tried reducing all floats to 6 digits (I can't go any lower in accuracy), but that didn't change anything.
Update: looking into it some more, I figured out that most of the time wasn't actually consumed by the integration itself, but by the CubicSpline operation that I used to interpolate my data. I tried out different methods, and CubicSpline seemed to be the only one that worked for some reason (even though my data is monotonically increasing, I got errors from every other method I tried, saying that some values were either above or below the interpolation range). That is, until I found out about extrapolation with scipy.interpolate.interp1d and fill_value='extrapolate'. This did the trick for me, letting me use the far less expensive interp1d method and effectively reducing the runtime of my program from 280 to 49 seconds (I also added list comprehensions for w and e). While this is a big improvement, I still wonder why my program takes nearly a minute to calculate some integrals, and I still get the IntegrationWarning mentioned above. So any advice is highly appreciated!
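One more thing worth checking (an observation about the code above, not something taken from the original post): part_1 rebuilds the interp1d interpolator on every single call that quad makes. Building it once, outside the integrand, avoids that repeated work. A minimal sketch, reusing w, e and planck_blackbody as defined above:

# build the interpolator once, right after w and e are read from the file
E_interp = interp1d(w, e, fill_value='extrapolate')

def part_1(lambda_, T):
    return E_interp(lambda_) * planck_blackbody(lambda_, T)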
(By the way, since I am pretty new to Python, I'm happy about any tips or critique I can get!)
i = 1
d = 1
e = 20199
a = 10242248284272
while True:
    print(i)
    if (i * e) % a == 1:
        d = i
        break
    i = i + 1
The numbers are given only to indicate the lengths of the real values. I have a piece of Python code that works on really huge numbers, and I have to wait a day, maybe two, for it to finish. How can I decrease the time needed for the code to run?
There are a variety of things you can do to speed this up; whether it will be enough is another matter.
Avoid printing things you don't have to.
Avoid multiplication. Instead of computing i*e each iteration, you could have a new variable, say i_e, which holds this value. Each time you increment i, you can add e to it.
The vast majority of your iterations aren't even close to satisfying (i*e)%a==1. If you used a larger increment for i, you could avoid testing most of them. Of course, you have to be careful not to overshoot.
Because you are doing modular arithmetic, you don't need to use the full value of i in your test (though you will still need to track it, for the output)
Combining the previous two points, you can avoid computing the modulus: If you know that a<=x<2*a, then x-a==1 is equivalent to x%a==1.
None of these changes may be enough to reduce the complexity of the problem, but they might reduce the constant factor enough to speed things up for your purposes. A sketch combining several of them follows.
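A sketch of the incremental-addition and subtraction ideas above, with the printing removed. The values of e and a here are small stand-ins chosen so the loop finishes instantly; the question's real numbers are far larger, and the sketch assumes an inverse actually exists:

e = 7               # stand-in for the real multiplier
a = 40              # stand-in for the real modulus
i = 1
v = e % a           # v always equals (i * e) % a
while v != 1:
    i += 1
    v += e          # add e instead of recomputing i * e each iteration
    if v >= a:
        v -= a      # a single subtraction replaces the full modulus
d = i
print(d)            # 23, since 7 * 23 = 161 = 4 * 40 + 1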
First time posting, so I apologize for any confusion.
I have two numpy arrays which are time stamps for a signal.
chan1,chan2 looks like:
911.05, 7.7
1055.6, 455.0
1513.4, 1368.15
4604.6, 3004.4
4970.35, 3344.25
13998.25, 4029.9
15008.7, 6310.15
15757.35, 7309.75
16244.2, 8696.1
16554.65, 9940.0
..., ...
and so on (up to 65,000 elements per channel, per file).
Edit: The lists are already sorted, but the issue is that they are not always equally spaced. Gaps can show up, which misaligns them, so chan1[3] could be closest to chan2[23] instead of, say, chan2[2], chan2[3], or chan2[4] as it would be if the spacing were equal. End edit
For each elements in chan1, I am interested in finding the closest neighbor in chan2, which is done with:
$ np.min(np.abs(chan2-chan1[i]))
and to keep track of positive or neg. difference:
$ index=np.where( np.abs( chan2-chan1[i]) == res[i])[0][0]
$ if chan2[index]-chan1[i] <0.0 : res[i]=res[i]*(-1.0)
Lastly, I create a histogram of all the differences, in a range I am interested in.
My concern is that I do this in a for loop. I usually try to avoid for loops when I can by utilizing the numpy arrays, since each operation can be performed on the entire array at once. However, in this case I am unable to find a vectorized solution or a built-in function (which I understand runs significantly faster than anything I can write myself).
The routine takes about 0.03 seconds per file. There are a few more things happening outside of the function but not a significant number, mostly plotting after everything is done, and a loop to read in files.
I was wondering if anyone has seen a similar problem, or is familiar enough with the Python libraries to suggest a solution (maybe a built-in function?) to obtain the data I am interested in. I have to go over hundreds of thousands of files, and currently my data analysis is about 10 times slower than data acquisition. We are also in the middle of upgrading our instruments, after which we will be able to acquire data 10-100 times faster, so analysis speed is going to become a serious issue.
I would prefer not to use a cluster to brute-force the problem, and I am not too familiar with parallel processing, although I would not mind dabbling in it. It would take me a while to write this in C, and I am not sure I would be able to make it faster.
Thank you in advance for your help.
def gen_hist(chan1, chan2):
    # time_range and interval are globals defined elsewhere in the script
    res = np.arange(1, len(chan1) + 1, 1) * 0.0
    for i in range(len(chan1)):
        res[i] = np.min(np.abs(chan2 - chan1[i]))           # distance to the closest neighbour
        index = np.where(np.abs(chan2 - chan1[i]) == res[i])[0][0]
        if chan2[index] - chan1[i] < 0.0:                    # keep the sign of the difference
            res[i] = res[i] * (-1.0)
    return np.histogram(res, bins=np.arange(time_range[0] - interval,
                                            time_range[-1] + interval,
                                            interval))[0]
After all the files are cycled through I obtain a plot of the data:
Example of the histogram
Your question is a little vague, but I'm assuming that, given two sorted arrays, you're trying to return an array containing the differences between each element of the first array and the closest value in the second array.
Your algorithm has a worst case of O(n^2), since np.where() and np.min() are each O(n) and you call them once per element. I would tackle this with two iterators instead of one: store the previous (r_p) and current (r_c) values from the right array and the current (l_c) value from the left array. For each value of the left array, advance the right index until r_c > l_c, then append min(abs(r_p - l_c), abs(r_c - l_c)) to your result.
In code:
l = [ ... ]   # left array (chan1), sorted
r = [ ... ]   # right array (chan2), sorted
i = 0
j = 0
result = []
r_p = r_c = r[0]
while i < len(l):
    l_c = l[i]
    # advance the right pointer until it passes the current left value
    while r_c < l_c and j < len(r) - 1:
        j += 1
        r_c = r[j]
        r_p = r[j - 1]
    # the closest neighbour is either just below or just above l_c
    result.append(min(abs(r_c - l_c), abs(r_p - l_c)))
    i += 1
This runs in O(n). If you need additional speed out of it, try writing it in C or running it in Cython.
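As an aside, if a fully vectorized version is wanted (the question asks about numpy built-ins), np.searchsorted can do the neighbour lookup without an explicit Python loop. This is a different technique from the two-pointer merge above; a sketch, assuming chan2 is sorted and following the question's sign convention (negative when the nearest chan2 value is smaller):

import numpy as np

def signed_nearest_diff(chan1, chan2):
    chan1 = np.asarray(chan1)
    chan2 = np.asarray(chan2)
    idx = np.clip(np.searchsorted(chan2, chan1), 1, len(chan2) - 1)
    left = chan2[idx - 1]        # nearest chan2 value at or below each chan1 entry
    right = chan2[idx]           # nearest chan2 value above it
    nearest = np.where(chan1 - left < right - chan1, left, right)
    return nearest - chan1       # signed difference, as in the question's res array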
I am trying to come up with a faster way of coding what I want to. Here is the part of my program I am trying to speed up, hopefully using more built-in functions:
num = 0
num1 = 0
rand1 = rand_pos[0:10]
time1 = time.clock()
for rand in rand1:
    for gal in gal_pos:
        num1 = dist(gal, rand)
        num = num + num1
time2 = time.clock()
time_elap = time2 - time1
print time_elap
Here, rand_pos and gal_pos are lists of length 900 and 1 million respectively.
Here dist is a function where I calculate the distance between two points in Euclidean space.
I used a snippet of the rand_pos to get a time measurement.
My time measurements are coming to be about 125 seconds. This is way too long!
It means that if I run the code over all the rand_pos, it will take about three hours to do!
Is there a faster way I can do this?
Here is the dist function:
def dist(pos1, pos2):
    # returns 1 if the two points are within 'radius' of each other, else 0
    n = 0
    dist_x = pos1[0] - pos2[0]
    dist_y = pos1[1] - pos2[1]
    dist_z = pos1[2] - pos2[2]
    if dist_x < radius and dist_y < radius and dist_z < radius:   # cheap pre-filter
        positions = [pos1, pos2]
        distance = scipy.spatial.distance.pdist(positions, metric='euclidean')
        if distance < radius:
            n = 1
    return n
While most of the optimization probably needs to happen within your dist function, there are some tips here to speed things up:
# Don't manually sum
for rand in rand1:
    num += sum([dist(gal, rand) for gal in gal_pos])

# If you can vectorize something, then do
import numpy as np
new_dist = np.vectorize(dist, signature='(n),(n)->()')  # treat each 3-vector as one argument
for rand in rand1:
    num += np.sum(new_dist(gal_pos, rand))

# Use already-built code whenever possible (as already suggested)
scipy.spatial.distance.cdist(gal_pos, rand1, metric='euclidean')
There is a function in scipy that does exactly what you want to do here:
scipy.spatial.distance.cdist(gal_pos, rand1, metric='euclidean')
It will probably be faster than anything you write in pure Python, since the heavy lifting (looping over the pairwise combinations between arrays) is implemented in C.
Currently your loop is happening in Python, which means there is more overhead per iteration, and you are making many calls to pdist. Even though pdist is very optimized, the overhead of making so many calls to it slows down your code. This type of performance issue was once described to me with a very useful analogy: it's like trying to have a conversation with someone over the phone by saying one word per phone call. Even though each word goes across the line very fast, the conversation takes a long time because you need to hang up and dial again repeatedly.
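A rough sketch of the cdist-based counting that both answers point at. It assumes the intent of dist is "count pairs whose Euclidean distance is below radius", and it chunks over the random points so the distance matrix stays small (the chunk size is arbitrary):

import numpy as np
from scipy.spatial import distance

def count_pairs_within(rand_pos, gal_pos, radius, chunk=100):
    gal = np.asarray(gal_pos)
    rand = np.asarray(rand_pos)
    total = 0
    for start in range(0, len(rand), chunk):
        # (chunk x n_gal) matrix of pairwise Euclidean distances
        d = distance.cdist(rand[start:start + chunk], gal, metric='euclidean')
        total += int(np.count_nonzero(d < radius))
    return total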
I'm writing a program in Python that's processing some data generated during experiments, and it needs to estimate the slope of the data. I've written a piece of code that does this quite nicely, but it's horribly slow (and I'm not very patient). Let me explain how this code works:
1) It grabs a small piece of data of size dx (starting with 3 datapoints)
2) It evaluates whether the difference (i.e. |y(x+dx)-y(x-dx)| ) is larger than a certain minimum value (40x std. dev. of noise)
3) If the difference is large enough, it will calculate the slope using OLS regression. If the difference is too small, it will increase dx and redo the loop with this new dx
4) This continues for all the datapoints
[See updated code further down]
For a datasize of about 100k measurements, this takes about 40 minutes, whereas the rest of the program (it does more processing than just this bit) takes about 10 seconds. I am certain there is a much more efficient way of doing these operations, could you guys please help me out?
Thanks
EDIT:
Ok, so I've got the problem solved by using only binary searches and limiting the number of allowed steps to 200. I thank everyone for their input, and I selected the answer that helped me most.
FINAL UPDATED CODE:
def slope(self, data, time):
    # wt is assumed to be the PyWavelets package (import pywt as wt); the
    # wavelet detail coefficients give an estimate of the noise level
    (wave1, wave2) = wt.dwt(data, "db3")
    std = 2 * np.std(wave2)
    e = std / 0.05              # minimum difference that counts as "above noise"
    de = 5 * std                # acceptable overshoot band for the binary search
    N = len(data)
    slopes = np.ones(shape=(N,))
    # mirror the data at both ends so windows near the edges stay in range
    data2 = np.concatenate((-data[::-1] + 2 * data[0], data, -data[::-1] + 2 * data[N - 1]))
    time2 = np.concatenate((-time[::-1] + 2 * time[0], time, -time[::-1] + 2 * time[N - 1]))
    for n in xrange(N + 1, 2 * N):
        # binary search (at most 200 steps) for the smallest window whose
        # end-to-end difference exceeds the noise threshold
        left = N + 1
        right = 2 * N
        for i in xrange(200):
            mid = int(0.5 * (left + right))
            diff = np.abs(data2[n - mid + N] - data2[n + mid - N])
            if diff >= e:
                if diff < e + de:
                    break
                right = mid - 1
                continue
            left = mid + 1
        leftlim = n - mid + N
        rightlim = n + mid - N
        # subsample the window (roughly 20 points) before the least-squares fit
        y = data2[leftlim:rightlim:int(0.05 * (rightlim - leftlim) + 1)]
        x = time2[leftlim:rightlim:int(0.05 * (rightlim - leftlim) + 1)]
        xavg = np.average(x)
        yavg = np.average(y)
        xlen = len(x)
        # closed-form OLS slope over the subsampled window
        slopes[n - N] = (np.dot(x, y) - xavg * yavg * xlen) / (np.dot(x, x) - xavg * xavg * xlen)
    return np.array(slopes)
Your comments suggest that you need to find a better method to estimate i_{k+1} given i_k. With no knowledge of the values in data, the naive algorithm would be:
At each iteration over n, leave i at its previous value and see whether abs(data[start] - data[end]) is less than e. If it is, keep i at its previous value and find the new one by incrementing it by 1, as you do now. If it is greater than or equal to e, do a binary search on i to find the appropriate value. You could also binary-search forwards, but finding a good candidate upper limit without knowledge of the data can prove difficult. This algorithm won't perform worse than your current estimation method.
If you know that data is kind of smooth (no sudden jumps, and hence a smooth plot for all i values) and monotonically increasing, you can replace the binary search with a search backwards by decrementing its value by 1 instead.
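A rough sketch of the carry-over idea, using the simple decrement-by-1 variant from the last paragraph (so it assumes reasonably smooth data). The function name and edge handling are mine; data and the threshold e have the same meaning as in the question:

import numpy as np

def adaptive_half_widths(data, e):
    N = len(data)
    widths = np.empty(N, dtype=int)
    i = 1                                    # carried over from one point to the next
    for n in range(N):
        i = max(1, min(i, n, N - 1 - n))     # keep the window inside the array
        # shrink while the next-smaller window is still above the threshold
        while i > 1 and abs(data[n + i - 1] - data[n - i + 1]) >= e:
            i -= 1
        # grow until the difference clears the threshold (or we hit an edge)
        while n - i - 1 >= 0 and n + i + 1 < N and abs(data[n + i] - data[n - i]) < e:
            i += 1
        widths[n] = i                        # feed this window to the OLS step as before
    return widths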
How to optimize this will depend on some properties of your data, but here are some ideas:
Have you tried profiling the code? Using one of the Python profilers can give you some useful information about what's taking the most time. Often, a piece of code you've just written will have one biggest bottleneck, and it's not always obvious which piece it is; profiling lets you figure that out and attack the main bottleneck first.
Do you know what typical values of i are? If you have some idea, you can speed things up by starting with i greater than 0 (as #vhallac noted), or by increasing i in larger steps: if you often see big values of i, increase it by 2 or 3 at a time; if the distribution of i values has a long tail, try doubling it each time; and so on.
Do you need all the data when doing the least squares regression? If that function call is the bottleneck, you may be able to speed it up by using only some of the data in the range. Suppose, for instance, that at a particular point, you need i to be 200 to see a large enough (above-noise) change in the data. But you may not need all 400 points to get a good estimate of the slope — just using 10 or 20 points, evenly spaced in the start:end range, may be sufficient, and might speed up the code a lot.
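A small sketch of that last point (the function name and the roughly-20-points choice are mine; time, data, start and end have the same meaning as in the question):

import numpy as np

def window_slope(time, data, start, end, npts=20):
    step = max((end - start) // npts, 1)   # keep about npts evenly spaced samples
    x = time[start:end:step]
    y = data[start:end:step]
    return np.polyfit(x, y, 1)[0]          # slope from a degree-1 least-squares fit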
I work with Python for similar analyses, and have a few suggestions to make. I didn't look at the details of your code, just to your problem statement:
1) It grabs a small piece of data of size dx (starting with 3 datapoints)
2) It evaluates whether the difference (i.e. |y(x+dx)-y(x-dx)|) is larger than a certain minimum value (40x std. dev. of noise)
3) If the difference is large enough, it will calculate the slope using OLS regression. If the difference is too small, it will increase dx and redo the loop with this new dx
4) This continues for all the datapoints
I think the most obvious reason for the slow execution is the LOOPING nature of your code, when perhaps you could use the VECTORIZED (array-based operations) nature of Numpy.
For step 1, instead of taking pairs of points, you can compute data[3:] - data[:-3] directly and get all the differences in a single array operation;
For step 2, you can use the result from array-based tests like numpy.argwhere(data > threshold) instead of testing every element inside some loop;
Step 3 sounds conceptually wrong to me. You say that if the difference is too small, you increase dx. But if the difference is small, the resulting slope is small because it actually IS small; getting a small value is then the right result, and artificially increasing dx to get a "better" result might not be what you want (well, it might actually be what you want, but you should consider this). I would suggest calculating the slope for a fixed dx across the whole data set, and then taking the resulting array of slopes to select your regions of interest (for example, using data_slope[numpy.argwhere(data_slope > minimum_slope)]).
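A minimal sketch of steps 1-3 put together. The synthetic time/data arrays and the threshold values are placeholders of my own; only the array operations are the point:

import numpy as np

time = np.linspace(0.0, 10.0, 1000)
data = np.sin(time) + 0.01 * np.random.randn(1000)   # placeholder signal
dx = 3                                               # starting window from the question
threshold = 0.05                                     # stand-in for 40x the noise std. dev.
minimum_slope = 0.5                                  # stand-in for the slope of interest

dy = data[dx:] - data[:-dx]              # step 1: all the differences in one operation
dt = time[dx:] - time[:-dx]
data_slope = dy / dt                     # slope estimate for a fixed dx everywhere
above_noise = np.abs(dy) > threshold     # step 2: array-based test, no Python loop
interesting = data_slope[data_slope > minimum_slope]   # step 3: regions of interest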
Hope this helps!