I am trying to solve an LP problem in SciPy with two variables and two constraints, one an inequality and the other an equality.
To convert the inequality constraint into an equality, I added another variable called A.
Min(z) = 80x + 60y
Constraints:
0.2x + 0.32y <= 0.25
x + y = 1
x, y >= 0
I converted the inequality constraint into an equation by adding an extra variable A:
0.2x + 0.32y + A = 0.25
Min(z) = 80x + 60y + 0A
x + y + 0A = 1
from scipy.optimize import linprog
import numpy as np
z = np.array([80, 60, 0])
C = np.array([
[0.2, 0.32, 1],
[1, 1, 0]
])
b = np.array([0.25, 1])
x1 = (0, None)
x2 = (0, None)
sol = linprog(-z, A_eq = C, b_eq = b, bounds = (x1, x2), method='simplex')
However, I am getting an error message
Invalid input for linprog with method = 'simplex'. Length of bounds
is inconsistent with the length of c
How can I fix this?
The problem is that you do not provide bounds for A: the cost vector has three entries, so linprog expects three bounds. If you run, for example,
linprog(-z, A_eq = C, b_eq = b, bounds = (x1, x2, (0, None)), method='simplex')
you will obtain:
con: array([0., 0.])
fun: -80.0
message: 'Optimization terminated successfully.'
nit: 3
slack: array([], dtype=float64)
status: 0
success: True
x: array([1. , 0. , 0.05])
As you can see, the constraints are met:
0.2 * 1 + 0.32 * 0.0 + 0.05 = 0.25 # (0.2x + 0.32y + A = 0.25)
and also
1 + 0 + 0 * 0.05 = 1 # (x + y + 0A = 1)
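Two side notes, offered as a sketch rather than as part of the original answer. First, passing -z makes linprog maximize 80x + 60y, whereas the problem as stated is a minimization. Second, linprog accepts inequality rows directly through A_ub and b_ub, so the slack variable A is not strictly needed. The version below assumes a SciPy release that provides method="highs" (1.6 or later); the variable names are chosen here for illustration.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([80, 60])              # objective: minimize 80x + 60y
A_ub = np.array([[0.2, 0.32]])      # 0.2x + 0.32y <= 0.25
b_ub = np.array([0.25])
A_eq = np.array([[1, 1]])           # x + y = 1
b_eq = np.array([1])
bounds = [(0, None), (0, None)]     # x >= 0, y >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
print(res.x, res.fun)
```

Substituting y = 1 - x into the inequality gives x >= 7/12, and the cost 60 + 20x grows with x, so the minimum should sit at x = 7/12, y = 5/12.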
Friends, can someone help me formulate an LP problem using scipy in Python, as shown below? Sorry for the naive question, but I am not able to get started at all. I could do this in Excel, but I am finding it difficult in Python (I am new to this library and couldn't solve it).
I would be very thankful if someone could help me out:
This is the data:
This is problem formulated
import pulp as p
import numpy as np
arr = np.array([[0.1167, 2.40, 6.95], [0.1327, 3.44, 15.1], [0.1901, 3.76, 12.7]])
arr = arr.transpose()
# create a problem
Lp_prob = p.LpProblem('Problem', p.LpMinimize)
# create variables
x1 = p.LpVariable("x1", lowBound=0, upBound=np.inf)
x2 = p.LpVariable("x2", lowBound=0, upBound=np.inf)
x3 = p.LpVariable("x3", lowBound=0, upBound=np.inf)
# define problem
Lp_prob += 6.95 * x1 + 15.1 * x2 + 12.7 * x3
# define constraints
Lp_prob += x1 * 0.1167 + x2 * .1327 + x3 * 0.1901 >= 1.95
Lp_prob += x1 * 2.4 + x2 * 3.44 + x3 * 3.76 >= 0
Lp_prob += x1 >= x2
Lp_prob += x1 >= 0
Lp_prob += x2 >= 0
Lp_prob += x3 >= 0
# see the problem created
print(Lp_prob)
status = Lp_prob.solve()
PulpSolverError: Pulp: Error while executing C:\Users\FinanceProfessional\.conda\envs\spyder-env\Lib\site-packages\pulp\apis\..\solverdir\cbc\win\64\cbc.exe
Using scipy
from scipy.optimize import linprog
arr = np.array([[0.1167, 2.40, 6.95], [0.1327, 3.44, 15.1], [0.1901, 3.76, 12.7]])
arr = arr.transpose()
c = arr[-1]
A = [arr[0], arr[1], [1,1,0]]
b = [0.09, 0, 0]
x0_bounds = (0, None)
x1_bounds = (0, None)
x2_bounds = (0, None)
result = linprog(c, A_ub=A, b_ub=b, bounds=[x0_bounds, x1_bounds, x2_bounds], method='revised simplex')
print(result)
con: array([], dtype=float64)
fun: 0.0
message: 'Optimization terminated successfully.'
nit: 0
slack: array([0.09, 0. , 0. ])
status: 0
success: True
x: array([0., 0., 0.])
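A note on the linprog attempt above, as a hedged sketch: A_ub and b_ub always encode "<=" rows, so with a non-negative right-hand side the zero vector is feasible and is returned as the optimum. To express the ">=" constraints from the PuLP formulation, each such row can be multiplied by -1. The snippet assumes the PuLP model above is the intended problem and that method="highs" is available (SciPy 1.6 or later).

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([6.95, 15.1, 12.7])    # cost coefficients from the PuLP objective

# Rows written in ">=" form, exactly as in the PuLP model
A_ge = np.array([
    [0.1167, 0.1327, 0.1901],       # ... >= 1.95
    [2.40,   3.44,   3.76],         # ... >= 0
    [1.0,   -1.0,    0.0],          # x1 - x2 >= 0, i.e. x1 >= x2
])
b_ge = np.array([1.95, 0.0, 0.0])

# linprog only understands "<=", so negate both sides of every row
res = linprog(c, A_ub=-A_ge, b_ub=-b_ge,
              bounds=[(0, None)] * 3, method="highs")
print(res.x, res.fun)
```

Since x1 has the lowest cost per unit of the first constraint (6.95 / 0.1167), the solver should put everything on x1.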
from scipy.optimize import minimize

a1, a2, a3 = 1167, 1327, 1907
b1, b2, b3 = 24000, 34400, 36000
c1, c2, c3 = 69500, 15100, 12700
x = [10000, 10000, 10000]
res = minimize(
    lambda x: c1*x[0] + c2*x[1] + c3*x[2],  # what we want to minimize
    x,
    constraints=(
        {'type': 'eq', 'fun': lambda x: x[0]*a1 - x[1]*a2},  # 1st constraint
        {'type': 'ineq', 'fun': lambda x: a1*x[0] + a2*x[1] + a3*x[2] - 7},  # 2nd constraint
        {'type': 'ineq', 'fun': lambda x: b1*x[0] + b2*x[1] + b3*x[2] - 0},  # 3rd constraint
        {'type': 'eq', 'fun': lambda x: x[0] % 5 + x[1] % 5 + x[2] % 5 - 0},  # x1, x2, x3 are multiples of 5
    ),
    bounds=((0, None), (0, None), (0, None)),
    method='SLSQP', options={'disp': True, 'maxiter': 10000})
print(res)
Here is the output:
> Optimization terminated successfully (Exit mode 0)
> Current function value: 381000000.00006175
> Iterations: 2
> Function evaluations: 9
> Gradient evaluations: 2
> fun: 381000000.00006175
> jac: array([69500., 15100., 12700.])
> message: 'Optimization terminated successfully'
> nfev: 9
> nit: 2
> njev: 2
> status: 0
> success: True
> x: array([ 0., 0., 30000.])
I had to multiply all values by 10000 to avoid exit mode 8, as explained here.
I hope this is what you needed. However, you should try OR-Tools, a CP library that is powerful and easier to use than scipy.
edit: answer to comment
Here is a link to a Google Colab notebook, as the original poster cannot run this code on their side.
I don't know if there is a name for this algorithm, but basically for a given y, I want to find the maximum x such that:
import numpy as np
np_array = np.random.rand(1000, 1)
np.sum(np_array[np_array > x] - x) >= y
Of course, a search algorithm would be to find the top value n_1 and reduce it to the second largest value, n_2. Stop if n_1 - n_2 > y; else reduce both n_1 and n_2 to n_3, stop if (n_1 - n_3) + (n_2 - n_3) > y, and so on.
But I feel there must be an algo to generate a sequence of {xs} that converges to its true value.
Let's use your example from the comments:
a = np.array([0.1, 0.3, 0.2, 0.6, 0.1, 0.4, 0.5, 0.2])
y = 0.5
First let's sort the data in descending order:
s = np.sort(a)[::-1] # 0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1
Let's take a look at how the choice of x affects the possible values of the sum r = np.sum(np_array[np_array > x] - x):
If x ≥ 0.6, then no element exceeds x, the selection is empty, and r = 0.0
If 0.6 > x ≥ 0.5, then r = 0.6 - x ⇒ 0.0 < r ≤ 0.1 (where 0.1 = 0.6 - 0.5 × 1)
If 0.5 > x ≥ 0.4, then r = 0.6 - x + 0.5 - x = 1.1 - 2 * x ⇒ 0.1 < r ≤ 0.3 (where 0.3 = 1.1 - 0.4 × 2)
If 0.4 > x ≥ 0.3, then r = 0.6 - x + 0.5 - x + 0.4 - x = 1.5 - 3 * x ⇒ 0.3 < r ≤ 0.6 (where 0.6 = 1.5 - 0.3 × 3)
If 0.3 > x ≥ 0.2, then r = 0.6 - x + 0.5 - x + 0.4 - x + 0.3 - x = 1.8 - 4 * x ⇒ 0.6 < r ≤ 1.0 (where 1.0 = 1.8 - 0.2 × 4)
If 0.2 > x ≥ 0.1, then r = 0.6 - x + 0.5 - x + 0.4 - x + 0.3 - x + 0.2 - x + 0.2 - x = 2.2 - 6 * x ⇒ 1.0 < r ≤ 1.6 (where 1.6 = 2.2 - 0.1 × 6)
If 0.1 > x, then r = 0.6 - x + 0.5 - x + 0.4 - x + 0.3 - x + 0.2 - x + 0.2 - x + 0.1 - x + 0.1 - x = 2.4 - 8 * x ⇒ 1.6 < r < ∞
The range of r is continuous over r ≥ 0.0; negative values of r cannot be reached, because the sum over an empty selection is 0.0. Duplicate elements affect the range of available r values for each value in a, but otherwise are nothing special. We can remove the duplicates while still accounting for them by using np.unique instead of np.sort:
s, t = np.unique(a, return_counts=True)
s, t = s[::-1], t[::-1]
w = np.cumsum(t)
If your data can reasonably be expected not to contain duplicates, then use the sorted s shown in the beginning, and set t = np.ones(s.size, dtype=int) and therefore w = np.arange(s.size) + 1.
For s[i] > x ≥ s[i + 1], the bounds of r are given by c[i] - w[i] * s[i] < r ≤ c[i] - w[i] * s[i + 1], where
c = np.cumsum(s * t) # You can use just `np.cumsum(s)` if no duplicates
So finding where y ends up is a matter of placing it between the correct bounds. This can be done with a binary search, e.g., np.searchsorted:
# Left bound. Sum is strictly greater than this
bounds = c - w * s
i = np.searchsorted(bounds[1:], y, 'right')
The first element of bounds is always 0.0, and the resulting index i will point to the upper bound. By truncating off the first element, we shift the result to the lower bound, and ignore the zero.
The solution is found by solving for the location of x in the selected bin:
y = c[i] - w[i] * x
So you have:
x = (c[i] - y) / w[i]
You can write a function:
def dm(a, y, duplicates=False):
    if duplicates:
        s, t = np.unique(a, return_counts=True)
        s, t = s[::-1], t[::-1]
        w = np.cumsum(t)
        c = np.cumsum(s * t)
    else:
        s = np.sort(a)[::-1]
        w = np.arange(1, s.size + 1)
        c = np.cumsum(s)
    # Lower bound of r for each bin; the leading 0.0 is dropped before searching
    i = np.searchsorted((c - w * s)[1:], y, 'right')
    x = (c[i] - y) / w[i]
    return x
This does not handle the case where y < 0, but it does allow you to enter many y values simultaneously, since searchsorted is pretty well vectorized.
Here is a usage sample:
>>> dm(a, 0.5, True)
0.3333333333333333
>>> dm(a, 0.6, True)
0.3
>>> dm(a, [0.1, 0.2, 0.3, 0.4, 0.5], True)
array([0.5 , 0.45 , 0.4 , 0.36666667, 0.33333333])
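One way to sanity-check the threshold is to plug x back into the expression from the question and confirm that it reproduces y. The sketch below re-implements the np.unique variant under the hypothetical name threshold, and excess is likewise a helper name introduced here for illustration.

```python
import numpy as np

def threshold(a, y):
    """Sketch of the np.unique-based variant described above."""
    s, t = np.unique(a, return_counts=True)
    s, t = s[::-1], t[::-1]      # distinct values in descending order, with counts
    w = np.cumsum(t)             # number of elements >= s[i]
    c = np.cumsum(s * t)         # sum of all elements >= s[i]
    i = np.searchsorted((c - w * s)[1:], y, 'right')
    return (c[i] - y) / w[i]

def excess(a, x):
    """The quantity from the question: total amount by which elements exceed x."""
    return np.sum(a[a > x] - x)

a = np.array([0.1, 0.3, 0.2, 0.6, 0.1, 0.4, 0.5, 0.2])
for y in (0.1, 0.5, 1.3, 2.0):
    x = threshold(a, y)
    print(y, x, excess(a, x))    # the last column should reproduce y
```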
As for whether this algorithm has a name: I am not aware of any. Since I wrote this, I feel that "discrete madness" is an appropriate name. Slips off the tongue nicely too: "Ah yes, I computed the threshold using discrete madness".
This is an answer to the original question, where we find the maximum x s.t. np.sum(np_array[np_array > x]) >= y:
You can accomplish this with sorting and cumulative sum:
s = np.sort(np_array)[::-1]
c = np.cumsum(s)
i = np.argmax(c > y)
result = s[i]
s holds the candidates for x in descending order. Comparing the cumulative sum c to y tells you exactly where the sum first exceeds y. np.argmax returns the index of the first place that happens. The result is the element of s at that index.
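As a small worked example of the steps above, using the sample array from earlier in this answer and an arbitrarily chosen y = 1.0 (both are illustration values, not data from the question):

```python
import numpy as np

a = np.array([0.1, 0.3, 0.2, 0.6, 0.1, 0.4, 0.5, 0.2])
y = 1.0

s = np.sort(a)[::-1]      # candidates in descending order
c = np.cumsum(s)          # running total of the largest elements
i = np.argmax(c > y)      # first index where the running total exceeds y
result = s[i]
print(i, result)          # 1 0.5, because 0.6 + 0.5 = 1.1 is the first total above 1.0
```

Note that np.argmax also returns 0 when no entry of c exceeds y, so a real implementation may want to check (c > y).any() first.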
This computation in numpy is slower than it needs to be because we can short circuit the sum immediately without computing a separate mask. The complexity is the same, however. You could speed up the following with numba or cython:
s = np.sort(np_array)[::-1]
c = 0
for i in range(len(s)):
    c += s[i]
    if c > y:
        break
result = s[i]
   MatchId  ExpectedGoals_Team1  ExpectedGoals_Team2            Timestamp  Stages           Home        Away
0   698085   0.8585339288573895   1.4819072820614578  2016-08-13 11:30:00       0      [92, 112]        [94]
1   698086    1.097064295289673   1.0923520385902274  2016-09-12 14:00:00       0             []       [164]
2   698087   1.2752442136224664   0.8687263006179976  2016-11-25 14:00:00       1           [90]       [147]
3   698088   1.0571269856980154   1.4323522262211752  2016-02-16 14:00:00       2  [10, 66, 101]   [50, 118]
4   698089   1.2680212913301165    0.918961072480616  2016-05-10 14:00:00       2           [21]  [134, 167]
Here is the function that needs to update the outcomes based on the categorical column 'Stages'.
x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
total_timeslot = 196
m=1
def squared_diff(row):
    ssd = []
    Home = row.Home
    Away = row.Away
    y = np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
    for k in range(total_timeslot):
        if k in Home:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return sum(ssd)
sum(df.apply(squared_diff, axis=1))
For m=1, Out[400]: 7636.305551658377
By assigning an arbitrary value of m for each category in Stages I want to test a cost function. Let m1 = 2, m2 = 3.
Here is what I attempted.
def stages(row):
    Stages = row.Stages
    if Stages == 0:
        return np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
    elif Stages == 1:
        return np.array([1 - (row.ExpectedGoals_Team1*m1 + row.ExpectedGoals_Team2*m1), row.ExpectedGoals_Team1*m1, row.ExpectedGoals_Team2*m1])
    else:
        return np.array([1 - (row.ExpectedGoals_Team1*m2 + row.ExpectedGoals_Team2*m2), row.ExpectedGoals_Team1*m2, row.ExpectedGoals_Team2*m2])
df.apply(squared_diff, Stages, axis=1)
TypeError: apply() got multiple values for argument 'axis'
df.apply(squared_diff, Stages, axis=1) raises the error because the second positional parameter of apply is axis, so Stages is interpreted as axis=Stages, and then axis=1 is passed again as a keyword.
To address the problem, you can first store desired m into a separate column
df['m'] = df.Stages.apply(lambda x: 1 if x == 0 else 2 if x == 1 else 3)
Then replace this line in your squared_diff function
y = np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
with
y = np.array([1 - (row.ExpectedGoals_Team1*row.m + row.ExpectedGoals_Team2*row.m), row.ExpectedGoals_Team1*row.m, row.ExpectedGoals_Team2*row.m])
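The same idea can be sketched end to end on a toy frame. The rows below are made up for illustration (they are not the data from the question), stage_to_m is a hypothetical lookup table, and ssd_no_goal is a simplified stand-in for squared_diff that only measures the distance to x1.

```python
import numpy as np
import pandas as pd

# Made-up toy rows, one per stage (not the asker's data)
df = pd.DataFrame({
    "ExpectedGoals_Team1": [0.2, 0.1, 0.1],
    "ExpectedGoals_Team2": [0.3, 0.2, 0.1],
    "Stages": [0, 1, 2],
})

# Hypothetical lookup: stage -> multiplier (m = 1, m1 = 2, m2 = 3)
stage_to_m = {0: 1, 1: 2, 2: 3}
df["m"] = df["Stages"].map(stage_to_m)

x1 = np.array([1, 0, 0])

def ssd_no_goal(row):
    # The multiplier now comes from the row itself instead of a global m
    g1 = row.ExpectedGoals_Team1 * row.m
    g2 = row.ExpectedGoals_Team2 * row.m
    y = np.array([1 - (g1 + g2), g1, g2])
    return float(np.sum((x1 - y) ** 2))

per_row = df.apply(ssd_no_goal, axis=1)
print(df["m"].tolist(), per_row.tolist())
```

Using Series.map with a dictionary is one alternative to the chained conditional in the lambda; both produce the same m column for stages 0, 1 and 2.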
I have the following code:
import numpy
from scipy import optimize

def constraint(params):
    if abs(params[0] - 15) < 2 and abs(params[1] + 10) < 2:
        return -1
    else:
        return 0

def f(params):
    x, z = params
    if abs(x - 15) < 2 and abs(z + 10) < 2:
        return -9999999
    return (x - 15) ** 2 + (z + 10) ** 2 * numpy.sqrt(numpy.abs(numpy.sin(x)))

# Last: 15.00024144, -9.99939634
result = optimize.minimize(f, (-15, -15),
                           bounds=((-15.01, 15.01,), (-15.01, 15.01,),),
                           method="SLSQP",
                           options={'maxiter': 1024 * 1024},
                           jac=False,
                           constraints={
                               'type': 'ineq',
                               'fun': constraint,
                           })
print(result)
print(f(result.x))
And it gives the following result:
fun: -9999999.0
jac: array([0., 0.])
message: 'Optimization terminated successfully.'
nfev: 12
nit: 7
njev: 3
status: 0
success: True
x: array([ 15.01 , -11.60831378])
-9999999
The given values [ 15.01, -11.60831378] should be rejected by the constraint (and they were: if I add more verbose logging, I see that the constraint function returns -1), but scipy ignores it. Why?
I'm pretty far from data science and maths, so I'm sorry for stupid mistakes if they are there.
To help the algorithm find the right direction, you need to separate your constraints:
def f(params):
    print(params)
    x, z = params
    if abs(x - 15) < 2 and abs(z + 10) < 2:
        return -9999999
    return (x - 15) ** 2 + (z + 10) ** 2 * numpy.sqrt(numpy.abs(numpy.sin(x)))

# Last: 15.00024144, -9.99939634
result = optimize.minimize(f, (-15, -15),
                           bounds=((-15.01, 15.01,), (-15.01, 15.01,),),
                           method="SLSQP",
                           options={'disp': True, 'maxiter': 1024 * 1024},
                           jac=False,
                           constraints=({
                               'type': 'ineq',
                               'fun': lambda params: abs(params[0] - 15) - 2,
                           }, {
                               'type': 'ineq',
                               'fun': lambda params: abs(params[1] + 10) - 2,
                           },)
                           )
print(result)
print(f(result.x))
Gives:
Optimization terminated successfully. (Exit mode 0)
Current function value: 6.5928117149596535
Iterations: 6
Function evaluations: 24
Gradient evaluations: 6
fun: 6.5928117149596535
jac: array([-1.2001152, 2.5928117])
message: 'Optimization terminated successfully.'
nfev: 24
nit: 6
njev: 6
status: 0
success: True
x: array([13., -8.])
[13. -8.]
6.5928117149596535
Bingo!
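Some background that may explain why this helps, as a sketch under stated assumptions: SLSQP is gradient-based, and for an 'ineq' constraint it requires fun(params) >= 0. A constraint that only ever returns -1 or 0 has a zero gradient almost everywhere, so the solver gets no hint about which direction restores feasibility. A constraint whose value changes smoothly with the distance to the forbidden region does provide that hint. The example below swaps in a simplified objective and a circular forbidden region of radius 2 around (15, -10); both are illustrative choices made here, not the original problem.

```python
import numpy as np
from scipy import optimize

def objective(params):
    # Simplified stand-in for f: squared distance to (15, -10)
    x, z = params
    return (x - 15) ** 2 + (z + 10) ** 2

def outside_circle(params):
    # 'ineq' means this must be >= 0: stay at least 2 away from (15, -10)
    x, z = params
    return (x - 15) ** 2 + (z + 10) ** 2 - 2 ** 2

result = optimize.minimize(
    objective, (-15, -15),
    bounds=((-15.01, 15.01), (-15.01, 15.01)),
    method="SLSQP",
    constraints={"type": "ineq", "fun": outside_circle},
)
print(result.x, result.fun, outside_circle(result.x))
```

The solver should stop on the edge of the circle, where the constraint value is approximately zero.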
I'm trying to minimize a dot product of 2 vectors but it doesn't work and I have no idea why. Can someone please help me?
I have a matrix c of this form:
c = [[c11, c12, c13, c14, c15],
[c21, c22, c23, c24, c25]]
I want to get a matrix p of this form:
p = [[p11, p12, p13, p14, p15],
[p21, p22, p23, p24, p25]]
I want to maximize this value :
c11*p11 + c12*p12 +c13*p13 + c14*p14 + c15*p15 + c21*p21 + c22*p22 +c23*p23 + c24*p24 + c25*p25
To do that, I flatten c and p into 1-D vectors and take their dot product, so the function to maximize is:
f(p) = c.dot(p)
The constraints are:
p11 + p12 + p13 + p14 + p15 = 1
p21 + p22 + p23 + p24 + p25 = 1
every element in p must be between 0.01 and 0.99.
I have tried scipy.optimize.linprog and it works:
from scipy.optimize import linprog
c = np.array([0. , 0. , 0. , 0. , 0. , 0. , 20094.21019108, 4624.08079143, 6625.51724138, 3834.81081081])
A_eq = np.array([[1,1,1,1,1,0,0,0,0,0],
[0,0,0,0,0,1,1,1,1,1]])
b_eq = np.array([1, 1])
res = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0.01, 0.99))
res
Out[561]:
fun: -19441.285871873002
message: 'Optimization terminated successfully.'
nit: 13
slack: array([0.03, 0.98, 0.98, 0.98, 0.98, 0.98, 0.03, 0.98, 0.98, 0.98, 0. ,
0. , 0.95, 0. , 0. , 0. , 0. , 0. , 0. , 0. ])
status: 0
success: True
x: array([0.96, 0.01, 0.01, 0.01, 0.01, 0.01, 0.96, 0.01, 0.01, 0.0
But I'm trying to use scipy.optimize.minimize with SLSQP instead, and that's where I get 'Singular matrix C in LSQ subproblem'. Here is what I've done:
from scipy.optimize import minimize
def build_objective(ck, sign = -1.00):
    """
    Builds the objective function for matrix ck
    """
    # Here I turn my c matrix to a 1-D matrix
    ck = np.concatenate(ck)
    def objective(P):
        return sign*(ck.dot(P))
    return objective

def build_constraint_rows(ck):
    """
    Builds the constraint functions that specify that the sum of the proportions for
    each bin equals 1
    """
    ncol = ck.shape[1]
    nrow = ck.shape[0]
    constrain_dict = []
    for i in range(nrow):
        vector = np.zeros((nrow,ncol))
        vector[i, :] = 1
        vector = np.concatenate(vector)
        def row_constrain(P):
            return 1 - vector.dot(P)
        constrain_dict.append({'type': 'eq', 'fun': row_constrain})
    return constrain_dict
# Matrix: Notice that this is not in vector form yet
c = np.array([[0. , 0. , 0. , 0., 0.],
[0. , 20094.21019108, 4624.08079143, 6625.51724138, 3834.81081081]])
# I need an initial p matrix for 'minimize'. In each row, the column with the highest value
# gets a proportion of 0.96 and the rest get 0.01, so that each row sums to 1
P_initial = np.ones(c.shape)*0.01
nrow = c.shape[0]
for i in range(nrow):
    index = np.where(c[i,] == np.max(c[i,]))[0]
    if index.shape[0] > 1:
        # Several columns tie for the maximum: pick one at random
        index = int(np.random.choice(index))
    else:
        index = int(index[0])
    P_initial[i,index] = 0.96
# I turn the P_initial to vector form
P_initial = np.concatenate(P_initial)
# These are the bounds of each p value
b = (0.01,0.99)
bnds = (b,)*c.size
# I then use my previous functions
objective_fun = build_objective(c)
cons = build_constraint_rows(c)
res = minimize(objective_fun,P_initial,method='SLSQP',\
bounds=bnds,constraints=cons)
This is my final result:
res
Out[546]:
fun: -19434.501741138763
jac: array([0. , 0.,0. , 0. ,0. , 0., -20094.21020508, -4624.08056641, -6625.51708984, -3834.81079102])
message: 'Singular matrix C in LSQ subproblem'
nfev: 24
nit: 2
njev: 2
status: 6
success: False
x: array([0.96 , 0.01 , 0.01 , 0.01 , 0.01 ,
0.01020202, 0.95962502, 0.01006926, 0.01001178, 0.01009192])
Please help me understand what I'm doing wrong.
Thank you in advance,
Karol
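One likely culprit, offered as a sketch rather than a definitive diagnosis: row_constrain looks up vector when it is called, not when it is defined. By the time minimize runs, the loop has finished, so every constraint function sees the last vector and both equality constraints end up identical. Duplicate equality constraints have linearly dependent gradients, which may be what SLSQP reports as a singular matrix. Binding the current vector as a default argument gives each function its own copy. The matrix and test vector P below are illustration values.

```python
import numpy as np

def build_constraint_rows(ck):
    nrow, ncol = ck.shape
    constraints = []
    for i in range(nrow):
        vector = np.zeros((nrow, ncol))
        vector[i, :] = 1
        vector = vector.ravel()
        # vector=vector freezes the current row mask for this function
        def row_constrain(P, vector=vector):
            return 1 - vector.dot(P)
        constraints.append({'type': 'eq', 'fun': row_constrain})
    return constraints

c = np.zeros((2, 5))
cons = build_constraint_rows(c)

# First row sums to 1.0, second row sums to 0.5
P = np.concatenate([np.full(5, 0.2), np.full(5, 0.1)])
values = [con['fun'](P) for con in cons]
print(values)   # the two constraints should now give different values
```

With the original late-binding version, both entries would be computed from the second row only.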