Implement variational approach for budget closure with 2 constraints in python

I'm new to Python and am quite helpless with a problem I have to solve:
I have two budget equations, let's say a+b+c+d=Res1 and a+c+e+f=Res2. Every term has a specific standard deviation a_std, b_std, ..., and I want to distribute the budget residuals Res1 and Res2 onto the individual terms relative to their uncertainty (see equation below), to get a_new+b_new+c_new+d_new=0 and a_new+c_new+e_new+f_new=0.
Regarding only 1 budget equation I'm able to solve the problem and get the terms a_new, b_new, c_new and d_new. But how can I add the second constraint to also get e_new and f_new?
E.g. I calculate a_new = a - (a_std^2/(a_std^2+b_std^2+c_std^2+d_std^2))*Res1; however, this depends only on the first equation, and I want a to be modified so that it also satisfies the second equation.
I appreciate any help/any ideas on how to approach this problem.
Thanks in advance,
Sue
Edit:
What I have so far:
import numpy as np

def var_close(a,a_std,b,b_std,c,c_std,d,d_std,e,e_std,f,f_std,g,g_std):
    # first budget: residual and total variance of its terms
    x=[a,b,c,d,e]
    Res1=np.sum(x)
    std_ges1=a_std*a_std+b_std*b_std+c_std*c_std+d_std*d_std+e_std*e_std
    # second budget: residual and total variance of its terms
    y=[a,c,f,g]
    Res2=np.sum(y)
    std_ges2=a_std*a_std+c_std*c_std+f_std*f_std+g_std*g_std
    # each term takes a share of Res1 proportional to its variance
    a_new=a-((a_std*a_std)/std_ges1)*Res1
    b_new=b-((b_std*b_std)/std_ges1)*Res1
    c_new=c-((c_std*c_std)/std_ges1)*Res1
    d_new=d-((d_std*d_std)/std_ges1)*Res1
    e_new=e-((e_std*e_std)/std_ges1)*Res1
    # same for Res2; a and c are adjusted a second time, independently
    a_new2=a-((a_std*a_std)/std_ges2)*Res2
    c_new2=c-((c_std*c_std)/std_ges2)*Res2
    f_new=f-((f_std*f_std)/std_ges2)*Res2
    g_new=g-((g_std*g_std)/std_ges2)*Res2
    return a_new,b_new,c_new,d_new,e_new,a_new2,c_new2,f_new,g_new
But done like this, a_new and a_new2, for example, come out slightly different. I want them to be equal, with the other terms modified according to their uncertainty.
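One possible way to treat both constraints at once is to pose the closure as a weighted least-squares problem: minimize the sum of ((x_new - x)/x_std)^2 over all terms, subject to both budgets summing to zero. With Lagrange multipliers this has the closed form x_new = x - S A^T (A S A^T)^-1 A x, where S is the diagonal matrix of variances and each row of A marks the terms of one budget. The sketch below assumes the errors of the terms are independent; the function name close_budgets and the sample numbers are made up for illustration, and the budgets follow the two equations in the text (a+b+c+d and a+c+e+f).

```python
import numpy as np

def close_budgets(x, std, A):
    """Adjust x so that A @ x_new = 0, spreading the change by variance."""
    x = np.asarray(x, dtype=float)
    A = np.asarray(A, dtype=float)
    S = np.diag(np.asarray(std, dtype=float) ** 2)  # variances (independent errors assumed)
    residuals = A @ x                               # Res1, Res2
    lam = np.linalg.solve(A @ S @ A.T, residuals)   # Lagrange multipliers
    return x - S @ A.T @ lam

# terms ordered a, b, c, d, e, f (illustrative values only)
A = [[1, 1, 1, 1, 0, 0],   # a + b + c + d
     [1, 0, 1, 0, 1, 1]]   # a + c + e + f
x = [10.0, -4.0, -3.0, -2.0, -5.0, -1.5]
std = [1.0, 0.5, 0.8, 0.3, 0.6, 0.4]

x_new = close_budgets(x, std, A)
print(np.asarray(A) @ x_new)  # both residuals are zero up to rounding
```

With a single row in A this reduces to the one-budget formula above, x_i - (x_std_i^2 / sum of variances) * Res, so a and c get one consistent adjustment instead of two competing ones.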

Related

How to implement a cost minimization objective function correctly in Gurobi?

Given transport costs, per single unit of delivery, for a supermarket from three distribution centers to ten separate stores.
Note: Please look in the #data section of my code to see the data that I'm not allowed to post in photo form. Also note that while my costs are a vector with 30 entries, each distribution centre can only access 10 of them. So DC1 costs = entries 1-10, DC2 costs = entries 11-20, etc.
I want to minimize the transport cost subject to each of the ten stores demand (in units of delivery).
This can be done by inspection, the minimum cost being $150313. The problem is implementing the solution with Python and Gurobi and producing the same result.
What I've tried is a somewhat sloppy model of the problem in Gurobi so far. I'm not sure how to correctly index and iterate through my sets that are required to produce a result.
This is my main problem: The objective function I define to minimize transport costs is not correct as I produce a non-answer.
The code "runs" though. If I change to maximization I just get an unbounded problem. So I feel like I am definitely not calling the correct data/iterations through sets into play.
My solution so far is quite small, so I feel like I can format it into the question and comment along the way.
from gurobipy import *
#Sets
Distro = ["DC0","DC1","DC2"]
Stores = ["S0", "S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9"]
D = range(len(Distro))
S = range(len(Stores))
Here I define my sets of distribution centres and set of stores. I am not sure where or how to exactly define the D and S iteration variables to get a correct answer.
#Data
Demand = [10,16,11,8,8,18,11,20,13,12]
Costs = [1992,2666,977,1761,2933,1387,2307,1814,706,1162,
2471,2023,3096,2103,712,2304,1440,2180,2925,2432,
1642,2058,1533,1102,1970,908,1372,1317,1341,776]
Just a block of my relevant data. I am not sure if my cost data should be 3 separate sets considering each distribution centre only has access to 10 costs and not 30. Or if there is a way to keep my costs as one set but make sure each centre can only access the costs relevant to itself I would not know.
m = Model("WonderMarket")
#Variables
X = {}
for d in D:
    for s in S:
        X[d,s] = m.addVar()
Declaring my objective variable. Again, I'm blindly iterating at this point to produce something that works. I've never programmed before. But I'm learning and putting as much thought into this question as possible.
#set objective
m.setObjective(quicksum(Costs[s] * X[d, s] * Demand[s] for d in D for s in S), GRB.MINIMIZE)
My objective function is attempting to multiply the cost of each delivery from a centre to a store, subject to a stores demand, then make that the smallest value possible. I do not have a non zero constraint yet. I will need one eventually?! But right now I have bigger fish to fry.
m.optimize()
I produce a 0 row, 30 column with 0 nonzero entries model that gives me a solution of 0. I need to set up my program so that I get the value that can be calculated easily by hand. I believe the issue is my general declaring of variables and low knowledge of iteration and general "what goes where" issues. A lot of thinking for just a study exercise!
Appreciate anyone who has read all the way through. Thank you for any tips or help in advance.

Your objective is 0 because you have not defined any constraints. By default all variables have a lower bound of 0, and hence minimizing an unconstrained problem puts all variables at this lower bound.
A few comments:
Unless you need the names for the distribution centers and stores, you could define them as follows:
D = 3
S = 10
Distro = range(D)
Stores = range(S)
You could define the costs as a 2-dimensional array, e.g.
Costs = [[1992,2666,977,1761,2933,1387,2307,1814,706,1162],
[2471,2023,3096,2103,712,2304,1440,2180,2925,2432],
[1642,2058,1533,1102,1970,908,1372,1317,1341,776]]
Then the cost of transportation from distribution center d to store s is stored in Costs[d][s].
You can add all variables at once and I assume you want them to be binary:
X = m.addVars(D, S, vtype=GRB.BINARY)
(or use Distro and Stores instead of D and S if you need to use the names).
Your definition of the objective function then becomes:
m.setObjective(quicksum(Costs[d][s] * X[d, s] * Demand[s] for d in Distro for s in Stores), GRB.MINIMIZE)
(This is all assuming that each store can only be delivered from one distribution center, but since your distribution centers do not have a maximal capacity this seems to be a fair assumption.)
You need constraints ensuring that the stores' demands are actually satisfied. For this it suffices to ensure that each store is being delivered from one distribution center, i.e., that for each s one X[d, s] is 1.
m.addConstrs(quicksum(X[d, s] for d in Distro) == 1 for s in Stores)
When I optimize this, I indeed get an optimal solution with value 150313.
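As a cross-check that needs no solver: since the distribution centers have no capacity limit, each store can simply be served by its cheapest center. A minimal sketch using the cost matrix and demands from the question (the lowercase variable names are only for this example):

```python
# Cost matrix (one row per distribution center) and demands from the question.
costs = [[1992, 2666, 977, 1761, 2933, 1387, 2307, 1814, 706, 1162],
         [2471, 2023, 3096, 2103, 712, 2304, 1440, 2180, 2925, 2432],
         [1642, 2058, 1533, 1102, 1970, 908, 1372, 1317, 1341, 776]]
demand = [10, 16, 11, 8, 8, 18, 11, 20, 13, 12]

total = 0
for s, units in enumerate(demand):
    # pick the distribution center with the lowest unit cost for store s
    cheapest_dc = min(range(len(costs)), key=lambda d: costs[d][s])
    total += costs[cheapest_dc][s] * units

print(total)  # 150313
```

This matches the value found by inspection, so a model that reports anything else is most likely indexing the costs or the constraints incorrectly.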

Optimization of multiple functions

I have 3 functions which consist of 6 variables (p1,p2,p3,p4,p5,p6). The value of each function is equal to x (say):
f1 = sgn(2-p1)*sqrt(abs(2-p1))+sgn(2-p2)*sqrt(abs(2-p2))+sgn(2-p3)*sqrt(abs(2-p3));
f2 = sgn(p4-2)*sqrt(abs(p4-2))+sgn(p5-2)*sqrt(abs(p5-2))+sgn(p6-2)*sqrt(abs(p6-2));
f3 = sgn(p1-p4)*sqrt(abs(p1-p4))+sgn(p2-p5)*sqrt(abs(p2-p5))+sgn(p3-p6)*sqrt(abs(p3-p6));
I want to find the combination of values of p1,p2,p3,p4,p5 and p6 for which x is maximum. Constraints are:
0 <= p1,p2,p3,p4,p5,p6 <= 4
Simply varying every variable from 0 to 4 taking small steps is not a good solution. Can someone tell me an efficient method to optimise the solution (preferably in python).

This is a non-linear optimization problem without an obvious closed-form solution. You may be better off asking this question in another forum.

Find subset with similar mean as full set

I have 50 lists, each one filled with 0s and 1s. I know the overall proportion of 1s when you consider all the 50 lists pooled together. I want to find the 10 lists that pooled together best resemble the overall proportion of 1s.
The function I want to minimise is abs(mean(pooled subset) - mean(pooled full set))
For those who know pandas:
In pandas terms, I have a dataframe as follows
and so on, with a total of 50 labels, each one with a number of values ranging between 100 and 1000.
I want to find the subset of 10 labels that minimises d, where
d = abs(df.loc[df.label.isin(subset), 'Value'].mean() - df.Value.mean())
I tried to apply dynamic programming solutions to the knapsack problem, but the issue is that the contribution of each list (label) to the final sample mean changes depending on which other lists you will include afterwards (because they will increase the sample size in unpredictable ways). It's like having a knapsack problem where every new item you pick changes the value of the items you previously picked. Tricky.
Is there a better algorithm to solve this problem?

There is a way, somewhat cumbersome, to formulate this problem as a MIP (Mixed Integer Programming) problem.
We need the following data:
mu : mean of all data
mu(i) : mean of each subset i
n(i) : number of elements in each subset
N : number of subsets we need to select
And we need some binary decision variables
delta(i) = 1 if subset i is selected and 0 otherwise
A formal statement of the optimization problem can look like:
min | mu - sum(i, mu(i)*n(i)*delta(i)) / sum(i, n(i)*delta(i)) |
subject to
sum(i, delta(i)) = N
delta(i) in {0,1}
Here sum(i, mu(i)*n(i)*delta(i)) is the total value of the selected items and sum(i, n(i)*delta(i)) is the total number of selected items.
The objective is clearly nonlinear (we have an absolute value and a division). This is sometimes called an MINLP problem (MINLP for Mixed Integer Nonlinear Programming). Although MINLP solvers are readily available, we actually can do better. Using some gymnastics we can reformulate this problem into a linear problem (by adding some extra variables and extra inequality constraints). The full details are here. The resulting MIP model can be solved with any MIP solver.
Interestingly we don't need the data values in the model, just n(i),mu(i) for each subset.
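One standard way to carry out such a linearization (not necessarily the one used in the linked write-up) is sketched below. The extra symbols are introduced here for illustration: x is the mean of the selected pool, assumed to lie in [0,1] because the data are 0s and 1s; y(i) stands for the product x*delta(i); z bounds the absolute deviation.

```latex
\begin{align*}
\min\ & z \\
\text{s.t. } & z \ge x - \mu, \qquad z \ge \mu - x \\
& \sum_i n_i\, y_i = \sum_i \mu_i\, n_i\, \delta_i \\
& y_i \le \delta_i, \quad y_i \le x, \quad y_i \ge x - (1 - \delta_i), \quad y_i \ge 0 \\
& \sum_i \delta_i = N \\
& \delta_i \in \{0,1\}, \quad 0 \le x \le 1
\end{align*}
```

The four inequalities on y(i) force y(i) = x when delta(i) = 1 and y(i) = 0 otherwise, so the equality constraint says x times the number of selected elements equals the total value of the selected elements, which is exactly the definition of the pooled mean.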

matching a multi-variable function with 2 bounded unknown variables to a value with graphical representation

My question is about matching this function: N = 0.13*(s^a), where s and a are variables, to a value. I am trying to find all values of s and a that satisfy N = 100 and N = 10,000,000. s is bounded from 0 to 101 and a is bounded from 3 to 8. I would also like to visualize the results, possibly by graphing them on a 2D plot with s and a as the axes. The algorithms I found that were similar to what I need all seemed to want to find the minimum or maximum of a function instead of matching it to a value. I have hit a wall and I don't know if my coding skills are high enough to write my own algorithm. Any help would be greatly appreciated! Thanks in advance!

This can easily be converted to a minimization problem. Simply minimize this function:
abs(0.13 * s ^ a - 100)
Replace the 100 with 10,000,000 for the second part. It will take some modification to find all values of s and a, rather than just one pair. This could be done by fixing an s value and minimizing over a, then repeating for different s values.
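For this particular function a different route from the minimization above is also available: 0.13 * s^a = N can be solved for a directly, a = log(N/0.13) / log(s), so fixing s gives the matching a without any search. A minimal sketch; the function names and the grid step of 0.5 are arbitrary choices for illustration:

```python
import math

def exponent_for(N, s, coeff=0.13):
    """Solve coeff * s**a == N for a (requires s > 0 and s != 1)."""
    return math.log(N / coeff) / math.log(s)

def level_curve(N, s_values, a_min=3.0, a_max=8.0):
    """Return the (s, a) pairs on the curve coeff * s**a == N with a in bounds."""
    points = []
    for s in s_values:
        if s <= 1.0:
            # for s <= 1 and a > 0, 0.13 * s**a <= 0.13, which is below both targets
            continue
        a = exponent_for(N, s)
        if a_min <= a <= a_max:
            points.append((s, a))
    return points

s_grid = [0.5 * k for k in range(1, 203)]      # 0.5, 1.0, ..., 101.0
curve_100 = level_curve(100, s_grid)
curve_1e7 = level_curve(10_000_000, s_grid)
```

Each list holds points of one curve in the (s, a) plane, so plotting the s values against the a values (for example with matplotlib) gives the 2D picture asked for.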

Devising objective function for integer linear programming

I am working to devise an objective function for an integer linear programming model. The goal is to determine the copy number of two genes as well as if a gene conversion event has happened (where one copy is overwritten by the other, which looks like one was deleted but the net copy number has not changed).
The problem involves two data vectors, P_A and P_B. The vectors contain continuous values larger than zero that correspond to a measure of copy number made at each position. P_{A,i} is not necessarily the same spot across the gene as P_{B,i} is, because the positions are unique to each copy (and can be mapped to an absolute position in the genome).
Given this, my plan was to try and minimize the difference between my decision variables and the measured data across different genome windows, giving me different slices of the two data vectors that correspond to the same region.
Decision variables:
A_w = copy number of A in window in {0,1,2,3,4}
B_w = copy number of B in window in {0,1,2,3,4}
C_w = gene conversion in {-2,-1,0,1,2}
The goal then would be to minimize the difference between the left and right sides of the below equations:
A_w - C_w ~= mean(P_{A,W})
B_w + C_w ~= mean(P_{B,W})
Subject to a handful of constraints such as 2 <= A_w + B_w <= 4
But I am unsure how to formulate this into a function to minimize. I have two equations that are not really a function, and the decision variables have no coefficients.
I am also unsure of how to handle the negative values of C_w.
I am also unsure of how to bring the results back together; after I solve the LP in each window, I still need to merge it into one gene-wide call (and ideally identify which window(s) had non-zero values of C_w).

Create the LpProblem instance:
problem = LpProblem("Another LpProblem", LpMinimize)
Objective (per what you've vaguely described above):
problem += (mean(P_{A,W}) - (A_w - C_w)) + (mean(P_{B,W}) - (B_w + C_w))
This is all I could tell from your really rather vague question. You'll need to be much more specific with what you mean by terms like "bring the results back together", or "handle the negative values in C_w". Add in your current code snippets and the errors you're getting for more details.
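One thing to watch: in a signed sum of the two differences the C_w terms cancel, so what is usually meant is the sum of absolute deviations, |mean(P_{A,W}) - (A_w - C_w)| + |mean(P_{B,W}) - (B_w + C_w)|. In an ILP each absolute value is typically replaced by an auxiliary variable u with the two constraints u >= expr and u >= -expr. Because the domains here are tiny (5 x 5 x 5 combinations per window), the same objective can also be checked by plain enumeration. The sketch below does that; it is not an ILP, the function name best_window_call is made up, and it assumes only the one constraint 2 <= A_w + B_w <= 4 mentioned in the question.

```python
from itertools import product
from statistics import mean

def best_window_call(p_a, p_b):
    """Return (cost, A_w, B_w, C_w) minimizing the summed absolute deviations."""
    target_a = mean(p_a)
    target_b = mean(p_b)
    best_key, best = None, None
    for a, b, c in product(range(5), range(5), range(-2, 3)):
        if not 2 <= a + b <= 4:
            continue  # copy-number constraint from the question
        cost = abs(target_a - (a - c)) + abs(target_b - (b + c))
        key = (cost, abs(c))  # tie-break: prefer fewer conversion events (arbitrary choice)
        if best_key is None or key < best_key:
            best_key, best = key, (cost, a, b, c)
    return best

print(best_window_call([1.9, 2.1, 2.0], [2.0, 2.1, 1.9])[1:])  # (2, 2, 0)
```

Shifting one unit from B_w to A_w while raising C_w by one leaves both A_w - C_w and B_w + C_w unchanged, so several assignments can tie on cost. The measurements alone cannot separate these cases in this model, and any real formulation would need an extra term or constraint to decide between them.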
