Good morning, and sorry if this is a vague question. I'll try to be as descriptive as possible.
Basically, I am using a Python script to post-process the results from an air dispersion model to test out different scenarios. I am using Python as it can iterate through the results in a matter of seconds while the dispersion software takes hours. My problem is that the script will still take weeks to run through all my scenarios, and I'm wondering if it's due to poor programming. I won't put the whole code here, as a lot of it is not relevant, but I will go through the steps I'm taking. First, here's an outline of my problem:
I have 17 sources all acting simultaneously
Each source can have four different emission rates, which are independent of the other sources, i.e. source #1 can have emission rate a, b, c, or d, as can sources #2 through #17.
Each source can take one of two states. We will call them working or not working, but in both states there is an emission rate. HOWEVER, only 5 sources can be working simultaneously. This is important.
To summarize, the emission from each source is a function of the four emission rates as well as the two states. So each source has 8 possible emission scenarios, and all 17 sources could be in any of these scenarios at a time. Quite a lot of permutations!
Here's how I'm currently computing the results. I want to know for each combination of states what the maximum result would be. If you're familiar with air dispersion modelling, I have already calculated the results based on a 1 g/s emission rate, so I can scale the results by the emission rates above.
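For a sense of scale: the number of states enumerated by the code below can be counted directly with math.comb (Python 3.8+); this is just a quick back-of-the-envelope check, not part of the processing itself:

import math

n_rate_tuples = math.comb(4 + 17 - 1, 17)  # combinations_with_replacement of 4 rates over 17 slots: 1140
n_working_sets = math.comb(17, 5)          # ways to pick the 5 working sources: 6188
print(n_rate_tuples * n_working_sets)      # 7054320

So the loops visit about 7 million state combinations before any array work is even done.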
THE CODE:
import itertools

sources = ['1','2',...'17']
emission_rates = ['a','b','c','d']
Source_1_results = [list of values of length x] ## NOTE THAT x is VERY LONG. THESE ARE HUGE ARRAYS (400,000 values)
Source_2_results = [list of values of length x]
.
.
Source_17_results = [list of values of length x]
working_sources = list(itertools.combinations(sources, 5))
source_emission_rate = list(itertools.combinations_with_replacement(emission_rates, 17))
for e in source_emission_rate:
    for w in working_sources:
        temp_results = []
        for num, source in enumerate(sources):
            temp_results.append(Source_x_results * e * w)  ## THIS LINE INVOLVES SOME LOOKUP IN MY CODE TO REFERENCE THE ACTUAL RESULTS AND EMISSIONS ETC.
I'm sorry if this isn't enough code. I can post the full code, but again, for the most part it's just assigning variables etc.
My question is: is there a quicker way to iterate through all my possible states? My code currently works, but I have limited Python knowledge and would like to be able to run it more frequently while changing variables etc.
Thank you in advance!
This should go slightly faster (one intermediate list fewer, plus a list comprehension). Note that working_sources must stay materialized as a list, because it is re-iterated for every emission-rate tuple; a bare generator would be exhausted after the first pass of the outer loop:

working_sources = list(itertools.combinations(sources, 5))
source_emission_rate = itertools.combinations_with_replacement(emission_rates, 17)
for e in source_emission_rate:
    for w in working_sources:
        temp_results = [source * e * w for source in sources]
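Beyond the loop structure, the heavy part (scaling and combining 400,000-value arrays) can be vectorized with numpy. Here is a minimal sketch, assuming the 1 g/s results are stacked into a single (17, 400000) array and that a scenario's result is the sum of the scaled per-source results; the rate values and the not-working factor are placeholders, not values from the post:

import itertools
import numpy as np

n_sources, n_values = 17, 400_000
unit_results = np.random.rand(n_sources, n_values)      # stand-in for the 1 g/s results
rate_values = {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 4.0}  # hypothetical emission rates
not_working_factor = 0.1                                # hypothetical "not working" emission fraction

working_sets = list(itertools.combinations(range(n_sources), 5))
for rates in itertools.combinations_with_replacement('abcd', n_sources):
    scale = np.array([rate_values[r] for r in rates])   # one factor per source
    for working in working_sets:
        factors = scale * not_working_factor            # everyone starts in the "not working" state
        factors[list(working)] = scale[list(working)]   # working sources emit at full rate
        totals = factors @ unit_results                 # all 400,000 values in one product
        worst = totals.max()                            # the maximum result for this scenario

The number of scenarios is unchanged, so the gain is purely in replacing the inner per-source work on 400,000-value lists with a single matrix-vector product per scenario.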
Related
I am a newbie in Python. I am trying to implement a genetic algorithm, which I have previously implemented in MATLAB. I need to create several chromosomes (individuals) for my initial population. In MATLAB, I used to do this with the rand function, and it gave me plenty of unique (or at least different enough) initial individuals. But here in Python, I have tried different methods from random, and among 50 individuals I have only 3 to 4 unique chromosomes. This is my Python code in __main__.py:
for i in range(pop_size):
    popSpace.append(Chromosom(G=mG, M=mM))
    sum_Q += popSpace[i].Q
And my Chromosom class:
class Chromosom:
    def __init__(self, G, M):
        self.Q = 0
        self.V = []
        self.M = M
        self.G = G
        self.Chr = [0] * len(self.M)  # note: must come after self.M is assigned
        self.randomChromosom()
        self.updateQ_Qs()

    def randomChromosom(self):
        for m in range(len(self.M)):
            if random.random() < 0.5:
                self.Chr[m] = 1
            else:
                self.Chr[m] = 0
I have also tried getting random bits, but the results were still the same. For example, I used print(str(main.mRand.getrandbits(6))) to see the results in the console and realized that there were too many duplicated numbers. Is there a method to create more unique random numbers? In MATLAB, the same code with the rand function worked well (though admittedly slowly). Having such a homogeneous initial population causes poor results in the next steps (I should also mention that the randomness problem causes similar mutations too). My problem is that there are so many similar chromosomes. For example, I get several 01111001s, which is strange considering the probability of that occurring.
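For reference, here is a quick baseline check of what a healthy RNG should produce for chromosomes this short; the sizes are illustrative, not the actual G/M data:

import random

pop_size, n_genes = 50, 8  # illustrative sizes
population = [tuple(1 if random.random() < 0.5 else 0 for _ in range(n_genes))
              for _ in range(pop_size)]
print(len(set(population)), "unique out of", pop_size)
# With 2**8 = 256 possible chromosomes, about 45 of 50 draws are expected
# to be unique, so a handful of repeats is normal; dozens are not.

Since random.random() typically yields around 45 unique chromosomes out of 50 here, seeing only 3 or 4 unique individuals suggests the problem is not the random generator itself.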
Given transport costs, per single unit of delivery, for a supermarket from three distribution centers to ten separate stores.
Note: please look in the #Data section of my code to see the data that I'm not allowed to post in photo form. Also note that while my costs are a vector with 30 entries, each distribution centre can only access 10 of them: DC1 costs = entries 1-10, DC2 costs = entries 11-20, etc.
I want to minimize the transport cost subject to each of the ten stores demand (in units of delivery).
This can be done by inspection, the minimum cost being $150313. The problem is implementing the solution with Python and Gurobi and producing the same result.
What I've tried is a somewhat sloppy model of the problem in Gurobi so far. I'm not sure how to correctly index and iterate through my sets that are required to produce a result.
This is my main problem: The objective function I define to minimize transport costs is not correct as I produce a non-answer.
The code "runs" though. If I change to maximization I just get an unbounded problem. So I feel like I am definitely not calling the correct data/iterations through sets into play.
My solution so far is quite small, so I feel like I can format it into the question and comment along the way.
from gurobipy import *
#Sets
Distro = ["DC0","DC1","DC2"]
Stores = ["S0", "S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9"]
D = range(len(Distro))
S = range(len(Stores))
Here I define my set of distribution centres and set of stores. I am not sure where or how exactly to define the D and S iteration variables to get a correct answer.
#Data
Demand = [10,16,11,8,8,18,11,20,13,12]
Costs = [1992,2666,977,1761,2933,1387,2307,1814,706,1162,
2471,2023,3096,2103,712,2304,1440,2180,2925,2432,
1642,2058,1533,1102,1970,908,1372,1317,1341,776]
Just a block of my relevant data. I am not sure if my cost data should be 3 separate sets, considering each distribution centre only has access to 10 costs and not 30; nor do I know if there is a way to keep my costs as one set while making sure each centre can only access the costs relevant to itself.
m = Model("WonderMarket")
#Variables
X = {}
for d in D:
    for s in S:
        X[d,s] = m.addVar()
Declaring my objective variable. Again, I'm blindly iterating at this point to produce something that works. I've never programmed before. But I'm learning and putting as much thought into this question as possible.
#set objective
m.setObjective(quicksum(Costs[s] * X[d, s] * Demand[s] for d in D for s in S), GRB.MINIMIZE)
My objective function is attempting to multiply the cost of each delivery from a centre to a store, weighted by the store's demand, and then make that total as small as possible. I do not have a non-zero constraint yet. I will need one eventually?! But right now I have bigger fish to fry.
m.optimize()
I produce a model with 0 rows, 30 columns and 0 nonzero entries, which gives me a solution of 0. I need to set up my program so that I get the value that can easily be calculated by hand. I believe the issue is my general declaring of variables, my low knowledge of iteration, and general "what goes where" issues. A lot of thinking for just a study exercise!
Appreciate anyone who has read all the way through. Thank you for any tips or help in advance.
Your objective is 0 because you have not defined any constraints. By default all variables have a lower bound of 0, and hence minimizing an unconstrained problem puts all variables at this lower bound.
A few comments:
Unless you need the names for the distribution centers and stores, you could define them as follows:
D = 3
S = 10
Distro = range(D)
Stores = range(S)
You could define the costs as a 2-dimensional array, e.g.
Costs = [[1992,2666,977,1761,2933,1387,2307,1814,706,1162],
[2471,2023,3096,2103,712,2304,1440,2180,2925,2432],
[1642,2058,1533,1102,1970,908,1372,1317,1341,776]]
Then the cost of transportation from distribution center d to store s are stored in Costs[d][s].
You can add all variables at once and I assume you want them to be binary:
X = m.addVars(D, S, vtype=GRB.BINARY)
(or use Distro and Stores instead of D and S if you need to use the names).
Your definition of the objective function then becomes:
m.setObjective(quicksum(Costs[d][s] * X[d, s] * Demand[s] for d in Distro for s in Stores), GRB.MINIMIZE)
(This is all assuming that each store can only be delivered from one distribution center, but since your distribution centers do not have a maximal capacity this seems to be a fair assumption.)
You need constraints ensuring that the stores' demands are actually satisfied. For this it suffices to ensure that each store is being delivered from one distribution center, i.e., that for each s one X[d, s] is 1.
m.addConstrs(quicksum(X[d, s] for d in Distro) == 1 for s in Stores)
When I optimize this, I indeed get an optimal solution with value 150313.
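Putting those pieces together, the whole model might look like the following sketch (under the same assumption that each store is served by exactly one centre; the print at the end is just for checking):

from gurobipy import *

D, S = 3, 10
Distro, Stores = range(D), range(S)
Demand = [10,16,11,8,8,18,11,20,13,12]
Costs = [[1992,2666,977,1761,2933,1387,2307,1814,706,1162],
         [2471,2023,3096,2103,712,2304,1440,2180,2925,2432],
         [1642,2058,1533,1102,1970,908,1372,1317,1341,776]]

m = Model("WonderMarket")
X = m.addVars(D, S, vtype=GRB.BINARY)  # X[d, s] = 1 if store s is served by centre d
m.setObjective(quicksum(Costs[d][s] * X[d, s] * Demand[s]
                        for d in Distro for s in Stores), GRB.MINIMIZE)
m.addConstrs(quicksum(X[d, s] for d in Distro) == 1 for s in Stores)  # each store served once
m.optimize()
print(m.objVal)  # 150313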
I'm currently writing a program in Python for calculating the total spectral emissivity (infrared waves) of any given material at different temperatures (200 K - 500 K), based on measurement data obtained by measuring the directional-hemispherical emissivity of the material at many different wavelengths using an IR spectroscope. The calculation is done by integrating the measured intensity over all wavelengths, using Planck's law as a weighting function (none of this really matters for my question itself; I just want to explain the background so that the code is easier to understand). This is my code:
from scipy import integrate
from scipy.interpolate import interp1d
import numpy as np
import math as m
def planck_blackbody(lambda_, T):  # wavelength, temperature
    h = 6.6260755e-34
    c = 2.99792458e+8
    k = 1.380658e-23
    try:
        a = 2.0 * h * (c ** 2)
        b = h * c / (lambda_ * k * T)
        intensity = a / ((lambda_ ** 5) * (m.exp(b) - 1.0))
        return float(intensity)
    except OverflowError:  # at lower temperatures the exponent overflows;
        return 0.0         # the Planck intensity is then effectively zero
def spectral_emissivity(emifilename, t, lambda_1, lambda_2):
    results = []
    with open(emifilename, 'r') as emifile:
        emilines = emifile.readlines()
    try:
        w = [float(x.split('\t')[0].strip('\n')) * 1e-6 for x in emilines]
        e = [float(x.split('\t')[1].strip('\n')) for x in emilines]
    except ValueError:
        pass
    w = np.asarray(w)  # wavelength, converted from um to m
    e = np.asarray(e)  # measured emissivity

    def part_1(lambda_, T):
        E = interp1d(w, e, fill_value='extrapolate')(lambda_)
        return E * planck_blackbody(lambda_, T)

    def E_complete(T):
        E_complete_part_1 = integrate.quad(part_1, lambda_1, lambda_2, args=T, limit=50)
        E_complete_part_2 = integrate.quad(planck_blackbody, lambda_1, lambda_2, args=T, limit=50)
        return E_complete_part_1[0] / E_complete_part_2[0]

    for T in t:
        results.append([T, E_complete(T)])

    with open("{}.plk".format(emifilename[:-4]), 'w') as resultfile:
        for item in results:
            resultfile.write("{}\t{}\n".format(item[0], item[1]))
t = np.arange(200, 501, 1)
spectral_emissivity(r'C:\test.dat', t, 1.4e-6, 35e-6)
The measured intensity is stored in a text file with two columns, the first being the wavelength of the infrared waves and the second being the directional-hemispherical emissivity of the measured material at that wavelength.
When I run this code, it produces the right results, but I still encounter 2 problems:
I get an error message from scipy.integrate.quad:
IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
warnings.warn(msg, IntegrationWarning)
Can someone explain to me what exactly this means? I understand that integrate.quad is a numerical integration method and that my functions somehow seem to require more than 50 subdivisions, but is there a way around this? I tried increasing the limit, but even with 200 I still get this warning... it's especially weird given that the integrands are pretty straightforward functions...
The second problem is closely connected to the first: this program takes ages (about 5 minutes!) to finish one single file, but I need to process many files every hour. cProfile reveals that 98% of this time is spent inside the integration function. A MathCad program doing the exact same thing and producing the same outputs only takes a few seconds to finish. Even though I spent the last week searching for a solution, I simply can't manage to speed this program up, and no one else on Stack Overflow or elsewhere seems to have comparable timing problems with integrate.quad.
So, finally, my question: is there any obvious way to optimize this code so it runs faster (apart from compiling it to C or the like)? I tried reducing all floats to 6 digits (I can't go any lower in accuracy) but that didn't change anything.
Update: looking into it some more, I figured out that most of the time wasn't actually consumed by the integration itself, but by the CubicSpline operation that I used to interpolate my data. I tried out different methods, and CubicSpline seemed to be the only working one for some reason (even though my data is monotonically increasing, I got errors from every other method I tried, saying that some values were either above or below the interpolation range). That is, until I found out about extrapolation with scipy.interpolate.interp1d and fill_value='extrapolate'. This did the trick for me, enabling me to use the far less expensive interp1d method and effectively reducing the runtime of my program from 280 to 49 seconds (I also added list comprehensions for w and e). While this is a big improvement, I still wonder why my program takes nearly 1 minute to calculate some integrals... and I still get the above-mentioned IntegrationWarning. So any advice is highly appreciated!
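One further speedup in the same spirit (a sketch, reusing the names from the code above, not the exact code from the post): as written, part_1 rebuilds the interp1d object on every single function evaluation inside quad; constructing the interpolant once and reusing it removes that cost entirely.

E_interp = interp1d(w, e, fill_value='extrapolate')  # build the interpolant once

def part_1(lambda_, T):
    return E_interp(lambda_) * planck_blackbody(lambda_, T)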
(By the way, since I am pretty new to Python, I'm happy about any tips or critique I can get!)
I am rewriting an analysis code for Molecular Dynamics time series. Due to the huge number of time steps (150,000 for each simulation run) which have to be analysed, it is very important that my code is as fast as possible.
The old code is very slow (it actually needs 300 to 500 times more time than my new one) because it was written for the analysis of a few thousand PDB files and not for a batch of different simulations (around 60), each one having 150,000 time steps. I know that C or Fortran would be the Swiss army knife in this case, but my experience with C is .....
Therefore I am trying to use numpy/scipy routines as much as possible in my Python code. Because I have a license for the accelerated Anaconda distribution with MKL, this gives a really significant speedup.
Now I am facing a problem, and I hope I can explain it in a way that makes clear what I mean.
I have three arrays, each with a shape of (n, 3, 20). The first axis indexes the residues of my peptide, commonly around 23 to 31. The second axis holds the coordinates in xyz order, and the third axis holds some specific time steps.
Now I am calculating the torsion for each residue at each time step. My code for the case of arrays with shape (n, 3, 1) is:
def fast_torsion(d1, d2, d3):
    tt = dot(d1, np.cross(d2, d3))
    tb = dot(d1, d1) * dot(d2, d2)
    torsion = np.zeros([len(d1), 1])
    for i in xrange(len(d1)):
        if tb[i] != 0:
            torsion[i] = tt[i]/tb[i]
    return torsion
Now I tried to use the same code for the arrays with the extended third axis, but the cross product function produces the wrong values compared to the original slow code, which uses a for loop.
I tried this code with my big arrays: it is around 10 to 20 times faster than a for loop solution and around 200 times faster than the old code.
What I want is for np.cross() to compute the cross product only over the second (xyz) axis and iterate over the other two axes. In the case with the short third axis it works fine, but with the big arrays it only works for the first time step. I also tried the axis settings but had no luck.
I can also use Cython or numba if this is the only solution for my problem.
P.S. Sorry for my English; I hope you can understand everything.
np.cross has axisa, axisb and axisc keyword arguments to select where in the input and output arrays the vectors to be cross-multiplied are located. I think you want to use:
np.cross(d2, d3, axisa=1, axisb=1, axisc=1)
If you don't include the axisc=1, the result of the multiplication will be at the end of the output array.
Also, you can avoid explicitly looping over your torsion array by doing:
torsion = np.zeros((len(d1), 1))
idx = (tb != 0)
torsion[idx] = tt[idx] / tb[idx]
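A quick way to convince yourself of the axis handling (a standalone check, not from the original post) is to compare the keyword form against an explicit loop over the time steps:

import numpy as np

n, steps = 5, 20
d2 = np.random.rand(n, 3, steps)
d3 = np.random.rand(n, 3, steps)

c_vec = np.cross(d2, d3, axisa=1, axisb=1, axisc=1)  # vectors live on axis 1

c_loop = np.empty_like(c_vec)
for t in range(steps):                               # explicit loop for comparison
    c_loop[:, :, t] = np.cross(d2[:, :, t], d3[:, :, t])

print(np.allclose(c_vec, c_loop))                    # True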
I'm writing a program in Python that's processing some data generated during experiments, and it needs to estimate the slope of the data. I've written a piece of code that does this quite nicely, but it's horribly slow (and I'm not very patient). Let me explain how this code works:
1) It grabs a small piece of data of size dx (starting with 3 datapoints)
2) It evaluates whether the difference (i.e. |y(x+dx)-y(x-dx)| ) is larger than a certain minimum value (40x std. dev. of noise)
3) If the difference is large enough, it will calculate the slope using OLS regression. If the difference is too small, it will increase dx and redo the loop with this new dx
4) This continues for all the datapoints
[See updated code further down]
For a datasize of about 100k measurements, this takes about 40 minutes, whereas the rest of the program (it does more processing than just this bit) takes about 10 seconds. I am certain there is a much more efficient way of doing these operations, could you guys please help me out?
Thanks
EDIT:
Ok, so I've got the problem solved by using only binary searches, limiting the number of allowed steps to 200. I thank everyone for their input, and I selected the answer that helped me most.
FINAL UPDATED CODE:
def slope(self, data, time):
    (wave1, wave2) = wt.dwt(data, "db3")
    std = 2*np.std(wave2)
    e = std/0.05
    de = 5*std
    N = len(data)
    slopes = np.ones(shape=(N,))
    data2 = np.concatenate((-data[::-1]+2*data[0], data, -data[::-1]+2*data[N-1]))
    time2 = np.concatenate((-time[::-1]+2*time[0], time, -time[::-1]+2*time[N-1]))
    for n in xrange(N+1, 2*N):
        left = N+1
        right = 2*N
        for i in xrange(200):
            mid = int(0.5*(left+right))
            diff = np.abs(data2[n-mid+N]-data2[n+mid-N])
            if diff >= e:
                if diff < e + de:
                    break
                right = mid - 1
                continue
            left = mid + 1
        leftlim = n - mid + N
        rightlim = n + mid - N
        y = data2[leftlim:rightlim:int(0.05*(rightlim-leftlim)+1)]
        x = time2[leftlim:rightlim:int(0.05*(rightlim-leftlim)+1)]
        xavg = np.average(x)
        yavg = np.average(y)
        xlen = len(x)
        slopes[n-N] = (np.dot(x,y)-xavg*yavg*xlen)/(np.dot(x,x)-xavg*xavg*xlen)
    return np.array(slopes)
Your comments suggest that you need to find a better method to estimate i_{k+1} given i_k. With no knowledge of the values in data, the naive algorithm would be:
At each iteration for n, start with i at its previous value and see if abs(data[start] - data[end]) is less than e. If it is, find your new i by incrementing it by 1, as you do now. If it is greater or equal, do a binary search on i to find the appropriate value. You could possibly binary search forwards too, but finding a good candidate upper limit without knowledge of data can prove difficult. This algorithm won't perform worse than your current estimation method.
If you know that data is fairly smooth (no sudden jumps, hence a smooth plot for all i values) and monotonically increasing, you can replace the binary search with a backwards search, decrementing i by 1 instead.
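A rough sketch of that update rule (a hedged illustration, not tested against the poster's data; it assumes the difference grows with the window size i, as the smoothness argument above requires, and that n - i and n + i stay in bounds, which the mirrored data2 array in the post's code takes care of):

def next_window(data, n, prev_i, e):
    # start from the window size found at the previous n
    i = max(prev_i, 1)
    if abs(data[n + i] - data[n - i]) < e:
        # previous window too small: widen one step at a time, as in the original code
        while n + i + 1 < len(data) and n - i - 1 >= 0 and abs(data[n + i] - data[n - i]) < e:
            i += 1
    else:
        # previous window already large enough: binary search down
        # for the smallest i whose difference still exceeds e
        lo, hi = 1, i
        while lo < hi:
            mid = (lo + hi) // 2
            if abs(data[n + mid] - data[n - mid]) < e:
                lo = mid + 1
            else:
                hi = mid
        i = lo
    return i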
How to optimize this will depend on some properties of your data, but here are some ideas:
Have you tried profiling the code? Using one of the Python profilers can give you some useful information about what's taking the most time. Often, a piece of code you've just written will have one biggest bottleneck, and it's not always obvious which piece it is; profiling lets you figure that out and attack the main bottleneck first.
Do you know what typical values of i are? If you have some idea, you can speed things up by starting with i greater than 0 (as #vhallac noted), or by increasing i by larger amounts: if you often see big values for i, increase i by 2 or 3 at a time; if the distribution of i values has a long tail, try doubling it each time; etc.
Do you need all the data when doing the least squares regression? If that function call is the bottleneck, you may be able to speed it up by using only some of the data in the range. Suppose, for instance, that at a particular point, you need i to be 200 to see a large enough (above-noise) change in the data. But you may not need all 400 points to get a good estimate of the slope — just using 10 or 20 points, evenly spaced in the start:end range, may be sufficient, and might speed up the code a lot.
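As a sketch of that subsampling idea (the arrays, window bounds, and sample count here are all illustrative placeholders, and the closed-form slope matches the OLS formula used in the updated code above):

import numpy as np

# hypothetical inputs: data/time arrays and a window [start, end) from step 2
time = np.linspace(0.0, 1.0, 100000)
data = 2.5 * time + 0.001 * np.random.randn(time.size)
start, end = 1000, 1400

idx = np.linspace(start, end - 1, 20).astype(int)  # ~20 evenly spaced indices in the window
x, y = time[idx], data[idx]
n = len(x)
slope = (np.dot(x, y) - n * x.mean() * y.mean()) / (np.dot(x, x) - n * x.mean() ** 2)
print(slope)  # close to 2.5 for this synthetic ramp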
I work with Python for similar analyses and have a few suggestions. I didn't look at the details of your code, just at your problem statement:
1) It grabs a small piece of data of size dx (starting with 3
datapoints)
2) It evaluates whether the difference (i.e. |y(x+dx)-y(x-dx)| ) is
larger than a certain minimum value (40x std. dev. of noise)
3) If the difference is large enough, it will calculate the slope
using OLS regression. If the difference is too small, it will increase
dx and redo the loop with this new dx
4) This continues for all the datapoints
I think the most obvious reason for the slow execution is the LOOPING nature of your code, where you could instead use the VECTORIZED (array-based) operations of Numpy.
For step 1, instead of taking pairs of points, you can compute data[3:] - data[:-3] directly and get all the differences in a single array operation;
For step 2, you can use the result from array-based tests like numpy.argwhere(data > threshold) instead of testing every element inside some loop;
Step 3 sounds conceptually wrong to me. You say that if the difference is too small, it will increase dx. But if the difference is small, the resulting slope would be small because it IS actually small. Then getting a small value is the right result, and artificially increasing dx to get a "better" result might not be what you want. Well, it might actually be what you want, but you should consider this. I would suggest that you calculate the slope for a fixed dx across the whole data, and then take the resulting array of slopes to select your regions of interest (for example, using data_slope[numpy.argwhere(data_slope > minimum_slope)]); see the sketch below.
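A minimal sketch of that fixed-dx, fully array-based approach (the input signal, dx, and the threshold are all illustrative placeholders):

import numpy as np

# illustrative inputs: a noisy signal sampled on a uniform time grid
time = np.linspace(0.0, 1.0, 100000)
data = np.sin(2 * np.pi * time) + 0.001 * np.random.randn(time.size)
threshold = 0.04  # stand-in for the 40x-noise criterion
dx = 3

diffs = data[dx:] - data[:-dx]             # step 1: all the differences in one operation
slopes = diffs / (time[dx:] - time[:-dx])  # slope estimate at fixed dx for every point
mask = np.abs(diffs) > threshold           # step 2: array-based test, no Python loop
interesting = np.flatnonzero(mask)         # indices of the regions of interest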
Hope this helps!