Pyomo constraints and using pandas - python

I am using Pyomo to optimize a cashflow matching problem using bonds.
I also want a detailed constraint that looks at the cashflows I expect to receive from my portfolio versus fixed requirements, and performs a number of calculations on the differences:
Calculate the difference (wanted minus expected to receive, or "in - out" in the picture)
Accumulate these differences to the final end point using accumulation factors (multiply each difference by its accumulation factor; these are stored as model.AccumFactors)
Sum these year-on-year accumulated differences (cumsum(axis=1))
Find the minimum
[Excel description of process][1]
Now pandas commands don't work in this situation. Is there anything I can do to fix this? Are there alternative approaches?

Thanks gmavrom.
I'm trying to think of a different formulation. The code below has Multipliers as the model variable and everything else as parameters.
Unfortunately it doesn't work and just prints out strings of:
54993.219033692505*Multipliers[Bond1] + 63662.18895851663*Multipliers[Bond2] + 64451.10079031628*Multipliers[Bond3] + … etc
def Test1_Constraint(model, TimeIndex):
    SumAccumulatedShortfall = 0
    for TimeCount in range(0, TimeIndex + 1):
        AccumulatedShortfall = (model.Liabilities[TimeCount]
            - sum(model.BondPayment[BondIndex, TimeCount] * model.Multipliers[BondIndex]
                  for BondIndex in model.Bonds)) * model.AccumulationFactor[TimeCount]
        SumAccumulatedShortfall = SumAccumulatedShortfall + AccumulatedShortfall
    print('SumAccum', SumAccumulatedShortfall)
    return (SumAccumulatedShortfall / model.TotalLiabilityValue <= 0.03)
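Incidentally, the printed string is just Pyomo's symbolic form of the expression, which is what printing an unsolved model expression is expected to produce. For reference, a minimal sketch of how a rule like this is normally attached to the model (the set name model.Time is an assumption for whatever TimeIndex ranges over):

import pyomo.environ as pyo

# attach the rule as a constraint indexed over the time set
model.Test1 = pyo.Constraint(model.Time, rule=Test1_Constraint)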

Related

PuLP Conditional Sum based on key of the loop

I am trying to use this conditional sum in PuLP's objective function. For the second lpSum, I am trying to calculate the cost of when we don't have enough chassis to cover the demand and will need pool chassis at a higher cost. Of course, I only want to calculate this when we don't have enough dedicated chassis (dedicated_chassis_needed) to cover the demand (chassis_needed) for each day.
The problem is a cost-minimization one. The last "if" part doesn't seem to work: the lpSum sums up every date's pool cost and ignores the if condition, the decision variable dedicated_chassis_needed just gets set to 0 (its lower bound), and the objective value comes out as a negative number, which should not be allowed.
prob += lpSum(dedicated_chassis_needed * dedicated_rate for date in chassis_needed.keys()) + \
        lpSum((chassis_needed[date] - dedicated_chassis_needed) * pool_rate_day
              for date in chassis_needed.keys()
              if (chassis_needed[date] - dedicated_chassis_needed) >= 0)
In general, in LP, you cannot use a conditional statement that depends on the value of a variable in any of the constraints or in the objective function, because the value of the variable is unknown when the model is built, before solving, so you will have to reformulate.
You haven't given much information about what the variables and constants are, so it isn't possible to give good suggestions. However, a well-designed objective function should be able to handle the extra cost for excess demand without a condition, as the model will select the cheaper items first.
For example, if:
demand[day 5] = 20
and
cheap_units[day 5] = 15 # $100 (availability)
and
reserve_units = 100 # $150 (availability from some pool of reserves)
and you have some constraint to meet demand via both of those sources and an objective function like:
min(cost) s.t. cost[day] = cheap_units[day] * 100 + reserve_units * 150
it should work out fine...
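A hedged PuLP sketch of that idea, using the toy numbers above (the names and data are placeholders for the example, not the asker's real model):

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpInteger

# hypothetical one-day data from the example above
demand = {5: 20}
cheap_avail = {5: 15}
cheap_rate, pool_rate = 100, 150

prob = LpProblem("chassis_example", LpMinimize)
cheap = LpVariable.dicts("cheap", demand.keys(), lowBound=0, cat=LpInteger)
pool = LpVariable.dicts("pool", demand.keys(), lowBound=0, cat=LpInteger)

for d in demand:
    prob += cheap[d] + pool[d] >= demand[d]   # meet demand from both sources
    prob += cheap[d] <= cheap_avail[d]        # limited cheap availability

# no conditional needed: pool units simply cost more, so the solver
# only uses them once the cheap units run out
prob += lpSum(cheap[d] * cheap_rate + pool[d] * pool_rate for d in demand)

prob.solve()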

Implement variational approach for budget closure with 2 constraints in python

I'm new to Python and am quite helpless with a problem I have to solve:
I have two budget equations, say a+b+c+d=Res1 and a+c+e+f=Res2. Every term has a specific standard deviation (a_std, b_std, ...), and I want to distribute the budget residuals Res1 and Res2 onto the individual terms relative to their uncertainty (see the equation below), to get a_new+b_new+c_new+d_new=0 and a_new+c_new+e_new+f_new=0.
With only one budget equation I am able to solve the problem and get the terms a_new, b_new, c_new and d_new. But how can I add the second constraint to also get e_new and f_new?
e.g. I calculate a_new = a - (a_std^2/(a_std^2+b_std^2+c_std^2+d_std^2))*Res1, however this depends only on the first equation, whereas I want a to be modified in a way that also satisfies the second equation.
I appreciate any help/any ideas on how to approach this problem.
Thanks in advance,
Sue
Edit:
What I have so far:
import numpy as np

def var_close(a, a_std, b, b_std, c, c_std, d, d_std, e, e_std, f, f_std, g, g_std):
    x = [a, b, c, d, e]
    Res1 = np.sum(x)
    std_ges1 = a_std*a_std + b_std*b_std + c_std*c_std + d_std*d_std + e_std*e_std
    y = [a, c, f, g]
    Res2 = np.sum(y)
    std_ges2 = a_std*a_std + c_std*c_std + f_std*f_std + g_std*g_std
    a_new = a - ((a_std*a_std)/std_ges1)*Res1
    b_new = b - ((b_std*b_std)/std_ges1)*Res1
    c_new = c - ((c_std*c_std)/std_ges1)*Res1
    d_new = d - ((d_std*d_std)/std_ges1)*Res1
    e_new = e - ((e_std*e_std)/std_ges1)*Res1
    a_new2 = a - ((a_std*a_std)/std_ges2)*Res2
    c_new2 = c - ((c_std*c_std)/std_ges2)*Res2
    f_new = f - ((f_std*f_std)/std_ges2)*Res2
    g_new = g - ((g_std*g_std)/std_ges2)*Res2
    return a_new, b_new, c_new, d_new, e_new, a_new2, c_new2, f_new, g_new
But this way a_new and a_new2 come out slightly different, whereas I want them to be equal, with the other terms modified according to their uncertainty.
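One way to satisfy both equations at once is to treat this as a weighted least-squares adjustment with two linear constraints and solve the resulting Lagrange-multiplier system; with a single constraint it reduces to exactly the per-term formula used above. A sketch (the numeric values are placeholders, and the incidence matrix follows the term lists x and y in the code above):

import numpy as np

def var_close_joint(x, std, A):
    # adjust x so that every budget equation A @ x_new = 0 holds,
    # spreading each residual over the terms in proportion to std**2
    S = np.diag(std ** 2)
    r = A @ x                               # current residuals (Res1, Res2)
    lam = np.linalg.solve(A @ S @ A.T, r)   # Lagrange multipliers
    return x - S @ A.T @ lam                # adjusted terms

# hypothetical values for the terms a..g and their standard deviations
x = np.array([1.0, 2.0, -0.5, 0.3, 1.2, -0.8, 0.4])
std = np.array([0.5, 0.2, 0.3, 0.1, 0.4, 0.3, 0.2])
# row 1: terms in the first budget (a, b, c, d, e)
# row 2: terms in the second budget (a, c, f, g)
A = np.array([[1, 1, 1, 1, 1, 0, 0],
              [1, 0, 1, 0, 0, 1, 1]], dtype=float)

x_new = var_close_joint(x, std, A)
print(A @ x_new)   # both residuals are now ~0

This yields a single set of adjusted terms, so a_new is the same in both equations by construction.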

my code is giving me the wrong output sometimes, how to solve it?

I am trying to solve this problem: 'Your task is to construct a building which will be a pile of n cubes. The cube at the bottom will have a volume of n^3, the cube above will have the volume of (n-1)^3 and so on until the top which will have a volume of 1^3.
You are given the total volume m of the building. Being given m can you find the number n of cubes you will have to build?
The parameter of the function findNb (find_nb, find-nb, findNb) will be an integer m and you have to return the integer n such as n^3 + (n-1)^3 + ... + 1^3 = m if such a n exists or -1 if there is no such n.'
I tried to first create an arithmetic sequence, then transform it into a sigma sum using the nth term of the sequence, to get a formula whose value I can compare with m.
I used this code and it works about 70-80% of the time: most of the calculations it does are correct, but some aren't.
import math

def find_nb(m):
    n = 0
    while n < 100000:
        if math.pow(math.pow(n, 2) + n, 2) == 4 * m:
            return n
            break
        n += 1
    return -1
print(find_nb(4183059834009))
>>> output 2022, which is correct
print(find_nb(24723578342962))
>>> output -1, which is also correct
print(find_nb(4837083252765022010))
>>> real output -1, which is incorrect
>>> expected output 57323
As mentioned, this is a math problem, which is mainly what I am better at :).
Sorry for the inline mathematical formulas, as I cannot do any math formula rendering (on SO).
I do not see any problem with your code and I believe your sample test case is wrong. However, I'll still give optimisation "tricks" below to make your code run quicker.
Firstly, as you know, the sum of the cubes from 1^3 to n^3 is n^2(n+1)^2/4. Therefore we want to find integer solutions of the equation
n^2(n+1)^2/4 = m, i.e. n^4 + 2n^3 + n^2 - 4m = 0
Running a loop for n from 0 up to 100000 is inefficient. Firstly, if m is a large number (1e100+), the complexity of your code is O(m^0.25). Considering Python's runtime, you can run your code in time only if m is less than around 1e32.
To optimise your code, you have two approaches.
1) Use Binary Search. I will not get into the details here, but basically, you can halve the search range for a simple comparison. For the initial bounds you can use lower = 0 & upper = k. A better bound for k will be given below, but let's use k = m for now.
Complexity: O(log(k)) = O(log(m))
Feasible range for m: m < 10^(3e7)
2) Use the almighty Newton-Raphson!! Using the iteration formula x_(n+1) = x_n - f(x_n) / f'(x_n), where f'(x) can be calculated explicitly, and a reasonable initial guess, let's say k = m again, the complexity is (I believe) O(log(k)) + O(1) = O(log(m)).
Complexity: O(log(k)) = O(log(m))
Feasible range for m: m < 10^(3e7)
Finally, I'll give a better initial guess for k in the above methods, also given in Ian's answer to this question. Since n^4 + 2n^3 + n^2 = O(n^4), we can actually take k ~ m^0.25 = (m^0.5)^0.5. To calculate this, we can take k = 2^(log(m)/4) where log is base 2. The log should be O(1), but I'm not sure about big numbers/dynamic-size ints in Python; I'm not a theorist. Using this better guess with Newton-Raphson, since the guess is within a constant range of the result, the algorithm is nearly O(1). Again, check out the links for a better understanding.
Finally
Since your goal is to find whether n exists such that the equation is "exactly satisfied", use Newton-Raphson and iterate until the next guess is less than 0.5 from the current guess. If your implementation is "floppy", you can also do a range +/- 10 from the guess to ensure that you find the solution.
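To make the binary-search suggestion concrete, here is a minimal integer-only sketch; it uses exact integer arithmetic for the sum (avoiding float comparisons entirely) and the ~m^0.25 upper bound discussed above (math.isqrt needs Python 3.8+):

from math import isqrt

def find_nb(m):
    # since (n^2 + n)^2 = 4m at a solution, n <= isqrt(isqrt(4*m))
    lo, hi = 0, isqrt(isqrt(4 * m)) + 1
    while lo <= hi:
        n = (lo + hi) // 2
        s = (n * (n + 1) // 2) ** 2   # exact 1^3 + 2^3 + ... + n^3
        if s == m:
            return n
        if s < m:
            lo = n + 1
        else:
            hi = n - 1
    return -1

print(find_nb(4183059834009))   # 2022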
I think this is a Math question rather than a programming question.
Firstly, I would advise you to start iterating from a function of your input m. Right now you are initialising your n value arbitrarily (though of course it might be a requirement of the question), but I think there are ways to optimise it. Maybe, just maybe, you can iterate downward from the cube root of m, so that if n reaches zero, or if at any point the sum becomes smaller than m, you can safely assume there is no possible building that can be built.
Secondly, the equation you derived from your summation doesn't seem to be correct. I substituted your expected n and input m into the condition in your if clause and they don't match. So either 1) your equation is wrong or 2) the expected output is wrong. I suggest that you re-examine your derivation of the condition. Are you using the sum-of-cubes factorisation? There might be some edge cases that you neglected (maybe odd n), but my Math is rusty so I can't help much.
Of course, as mentioned, the break is unnecessary and will never be executed.

Augmented Dickey Fuller test in Python with statsmodels

I'm performing an ADF test on several (~500) time series to test stationarity, so I need a quantitative way of choosing the correct number of lags for each one. One possible approach is to use, say, 80% of my sample for the test, get the regression parameters from it, compute the ssr (sum of squared residuals), and search for its minimum. However, this may lead to overfitting; to avoid it, the regression can then be applied to the remaining 20% and the ssr of this sub-sample calculated. The number of lags that leads to the minimum value of this second ssr should be the correct one.
The problem is that the statsmodels documentation is less than complete (at least for a newbie like me!). For example, given the line
res = ts.adfuller(dUs, maxlag=max_lag_, autolag=None, regression='ct', store=True, regresults=True)
the regression coefficients are stored in res[3].resols.params, but their order is unknown. I had to ask someone to run the test on one of my time series in R (which shows you the formula used and the corresponding coefficients; see the R output).
Python's parameter order seems to be (for a 'ct' regression): lag 1, lag diff 1, lag diff 2, ..., lag diff N, intercept, time trend. I then reconstruct the fitted series with the following code:
xFit[0:max_lag_ + 1] = dUs[0:max_lag_ + 1]
for i in range(max_lag_ + 1, xFit.size):
    xFit[i] = xFit[i-1] + res[3].resols.params[0] * xFit[i-1] \
              + res[3].resols.params[res[3].resols.params.size - 2] \
              + res[3].resols.params[res[3].resols.params.size - 1] * t[i]
    for j in range(1, max_lag_ + 1):
        xFit[i] = xFit[i] + res[3].resols.params[j] * lag[i-1-j]
Note that the lag variable is constructed from my dUs variable like this
lag = dUs[1:]-dUs[:-1]
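As an aside, the pieces of the adfuller return value used above can be unpacked by name, which makes the indexing easier to follow (a small sketch; it assumes the four-element return shape that the res[3] indexing above implies, i.e. store=True with autolag=None):

adf_stat, pvalue, crit_values, resstore = res
print(resstore.resols.params)         # same array as res[3].resols.params
print(resstore.resols.fittedvalues)   # fitted values of the ADF regression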
The thing is that the xFit series and res[3].resols.fittedvalues are different! I think it might have something to do with my initialization of the first max_lag_ + 1 data points (in fact, note that res[3].resols.fittedvalues is max_lag_ + 1 shorter than the original series): I chose them to be equal to the original series. But I can't figure out what exactly is going on. The difference between xFit and res[3].resols.fittedvalues is HUGE (see the time series comparison plot). Note also that increasing the lag number makes my fit better up to some value, and then the series explodes. This does not happen with fittedvalues!
As a final test, I ran the ADF test on xFit; I understand this should lead to the res[3].resols.params I already got.
Given the line
res2 = ts.adfuller(xFit, maxlag=max_lag_, autolag=None, regression='ct', store=True, regresults=True)
the output of res2[3].resols.params is
[ -1.60231256e+00 4.23814175e-02 -4.15837300e-02 4.99642618e-02
-6.92483339e+02 3.89141878e+00]
while res[3].resols.params is
[ -1.29269094e+00 2.11857016e-02 -5.82679110e-02 -2.09614163e-02
-5.44413351e+02 2.69502722e+00]
I know that many of you would suggest moving to R, but a) I have never used it (although I could learn) and b) getting software installed at work is not easy and could cost me a lot of precious time.
Any ideas? any mistake I'm missing?
Thanks in advance,
C
I solved the issue (although I didn't have time to post it before!).
The thing is that R fits the difference series while Python fits the time series itself. That, in combination with an error (xFit should be replaced with dUs in the reconstruction of the time series!), made everything weird, as explained above.
The right code is
for i in range(max_lag_ + 1, xFit.size):
    xFit[i] = res[3].resols.params[0] * dUs[i-1] + res[3].resols.params[res[3].resols.params.size - 1]
    for j in range(1, max_lag_ + 1):
        xFit[i] = xFit[i] + res[3].resols.params[j] * lag[i-1-j]
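A quick sanity check of the corrected reconstruction (a sketch, using the same variable names as above) is to compare it against the fitted values the regression itself reports, skipping the first max_lag_ + 1 points that were never fitted:

import numpy as np

# res[3].resols.fittedvalues is max_lag_ + 1 shorter than the series,
# so align xFit accordingly before comparing
print(np.allclose(xFit[max_lag_ + 1:], res[3].resols.fittedvalues))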

Devising objective function for integer linear programming

I am working to devise an objective function for an integer linear programming model. The goal is to determine the copy number of two genes, as well as whether a gene conversion event has happened (where one copy is overwritten by the other, which looks like one was deleted even though the net copy number has not changed).
The problem involves two data vectors, P_A and P_B. The vectors contain continuous values larger than zero that correspond to a measure of copy number made at each position. P_{A,i} is not necessarily the same spot across the gene as P_{B,i} is, because the positions are unique to each copy (and can be mapped to an absolute position in the genome).
Given this, my plan was to try and minimize the difference between my decision variables and the measured data across different genome windows, giving me different slices of the two data vectors that correspond to the same region.
Decision variables:
A_w = copy number of A in window in {0,1,2,3,4}
B_w = copy number of B in window in {0,1,2,3,4}
C_w = gene conversion in {-2,-1,0,1,2}
The goal then would be to minimize the difference between the left and right sides of the below equations:
A_w - C_w ~= mean(P_{A,W})
B_w + C_w ~= mean(P_{B,W})
Subject to a handful of constraints such as 2 <= A_w + B_w <= 4
But I am unsure how to formulate this into a function to minimize. I have two equations that are not really a function, and the decision variables have no coefficients.
I am also unsure how to handle the negative values of C_w.
I am also unsure how to bring the results back together; after I solve the LP in each window, I still need to merge the results into one gene-wide call (and ideally identify which window(s) had non-zero values of C_w).
Create the LpProblem instance:
problem = LpProblem("Another LpProblem", LpMinimize)
Objective (per what you've vaguely described above):
problem += (mean(P_{A,W}) - (A_w - C_w)) + (mean(P_{B,W}) - (B_w + C_w))
This is all I could tell from your really rather vague question. You'll need to be much more specific with what you mean by terms like "bring the results back together", or "handle the negative values in C_w". Add in your current code snippets and the errors you're getting for more details.
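If the negative values of C_w are a concern because the two deviations can cancel, the standard trick is to minimize auxiliary variables that bound the absolute deviations, which keeps the model linear. A sketch for a single window (the mean values are placeholders):

from pulp import LpProblem, LpMinimize, LpVariable

# placeholder window means; real values would come from P_A and P_B
mean_PA, mean_PB = 1.8, 1.2

prob = LpProblem("copy_number_window", LpMinimize)
A_w = LpVariable("A_w", 0, 4, cat="Integer")
B_w = LpVariable("B_w", 0, 4, cat="Integer")
C_w = LpVariable("C_w", -2, 2, cat="Integer")
dA = LpVariable("dA", 0)   # bounds |mean(P_A) - (A_w - C_w)|
dB = LpVariable("dB", 0)   # bounds |mean(P_B) - (B_w + C_w)|

prob += dA + dB            # objective: total absolute deviation
prob += dA >= (A_w - C_w) - mean_PA
prob += dA >= mean_PA - (A_w - C_w)
prob += dB >= (B_w + C_w) - mean_PB
prob += dB >= mean_PB - (B_w + C_w)
prob += A_w + B_w >= 2     # the copy-number constraint mentioned above
prob += A_w + B_w <= 4

prob.solve()
print(A_w.value(), B_w.value(), C_w.value())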
