Finding equal variables in non-solvable multi-variable linear equations - Python

I am trying to find an algorithm to solve the following problem. I have multiple unknown variables (F1, F2, F3, ... Fx) and (R1, R2, R3, ... Rx) and multiple equations like these:
F1 + R1 = a
F1 + R2 = a
F2 + R1 = b
F3 + R2 = b
F2 + R3 = c
F3 + R4 = c
where a, b and c are known numbers. I am trying to find all equal variables in such equations. For example, in the system above I can see that F2 and F3 are equal and that R3 and R4 are equal.
The first pair of equations tells us that R1 and R2 are equal, the second pair tells us that F2 and F3 are equal, and the third pair tells us that R3 and R4 are equal.
For a more complex scenario, is there any known algorithm that can find all equal (F and R) variables?
(I will edit the question if it is not clear enough)
Thanks

For the general situation, row echelon form is probably the way to go. If every equation has only two variables, then you can consider each variable to be in a partition. Every time two variables appear in an equation together, their partitions are joined. So to begin with, each variable is in its own partition. After the first equation, there is a partition that contains F1 and R1. After the second equation, that partition is replaced by one that contains F1, R1 and R2. You should have the variables in some sort of order, and when two partitions are joined, put all the variables except the first one in terms of the first one (it doesn't really matter how you order the variables, you just need some way of deciding which is "first"). So for instance, after the first equation you have R1 = a - F1. After the second equation you have R1 = a - F1 and R2 = a - F1. Each variable can then be represented by two numbers: some coefficient times the first variable of its partition, plus a constant. At the end, you go through each partition and look for variables that are represented by the same two numbers.
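Here is a rough sketch of that idea in Python (my own code, not from the answer above): a union-find over the variables, with each variable stored as an affine expression coeff * root + const relative to its partition's first variable. It assumes every equation has exactly two variables in the form u + v = k, and the values a=1, b=2, c=3 below are placeholders.

def find_equal_variables(equations):
    rep = {}    # variable -> parent (the partition's "first" variable sits at the root)
    expr = {}   # variable -> (coeff, const), meaning  var = coeff * parent + const

    def resolve(v):
        # Walk up to the partition root, composing the affine expressions on the way.
        coeff, const = expr[v]
        while rep[v] != v:
            v = rep[v]
            c2, k2 = expr[v]
            coeff, const = coeff * c2, coeff * k2 + const
        return v, coeff, const

    for u, v, k in equations:
        for w in (u, v):
            if w not in rep:
                rep[w], expr[w] = w, (1, 0)
        ru, cu, ku = resolve(u)
        rv, cv, kv = resolve(v)
        if ru == rv:
            # Already in the same partition: the equation only pins down (or contradicts)
            # the representative's value, which is not needed for spotting equal variables.
            continue
        # u + v = k  =>  cu*ru + ku + cv*rv + kv = k  =>  rv = (-cu/cv)*ru + (k - ku - kv)/cv
        rep[rv] = ru
        expr[rv] = (-cu / cv, (k - ku - kv) / cv)   # exact fractions would be safer in general

    # Variables with identical (root, coeff, const) triples are equal.
    groups = {}
    for v in rep:
        groups.setdefault(resolve(v), []).append(v)
    return [group for group in groups.values() if len(group) > 1]

# The system from the question, with placeholder values a=1, b=2, c=3:
eqs = [("F1", "R1", 1), ("F1", "R2", 1),
       ("F2", "R1", 2), ("F3", "R2", 2),
       ("F2", "R3", 3), ("F3", "R4", 3)]
print(find_equal_variables(eqs))   # [['R1', 'R2'], ['F2', 'F3'], ['R3', 'R4']]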

Here's a hint: you have defined a system of linear equations with 7 variables and 6 equations. Here's a crude matrix/vector notation:
[1 0 0 1 0 0 0]   [F1]   [a]
[1 0 0 0 1 0 0]   [F2]   [a]
[0 1 0 1 0 0 0]   [F3]   [b]
[0 0 1 0 1 0 0] * [R1] = [b]
[0 1 0 0 0 1 0]   [R2]   [c]
[0 0 1 0 0 0 1]   [R3]   [c]
                  [R4]
If you do the Gaussian elimination manually, you can see that, e.g., subtracting the second row from the first results in
(0 0 0 1 -1 0 0) * (F1 F2 F3 R1 R2 R3 R4)^T = a - a
R1 - R2 = 0
R1 = R2
This implies that R1 and R2 are what you call equal. There are many different methods to solve the system or to interpret the results. Maybe you will find this thread useful: Is there a standard solution for Gauss elimination in Python?
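To mechanize that, here is a small sketch using sympy (my choice of tool, not necessarily what the linked thread uses) that puts the augmented matrix above into reduced row echelon form, with placeholder values a=1, b=2, c=3:

import sympy as sp

a, b, c = 1, 2, 3                      # placeholder values for the known numbers
A = sp.Matrix([                        # columns: F1 F2 F3 R1 R2 R3 R4 | rhs
    [1, 0, 0, 1, 0, 0, 0, a],          # F1 + R1 = a
    [1, 0, 0, 0, 1, 0, 0, a],          # F1 + R2 = a
    [0, 1, 0, 1, 0, 0, 0, b],          # F2 + R1 = b
    [0, 0, 1, 0, 1, 0, 0, b],          # F3 + R2 = b
    [0, 1, 0, 0, 0, 1, 0, c],          # F2 + R3 = c
    [0, 0, 1, 0, 0, 0, 1, c],          # F3 + R4 = c
])
reduced, pivots = A.rref()
sp.pprint(reduced)

In the reduced matrix, equal variables show up as rows that coincide once you ignore their own pivot entries (here the rows for F2 and F3, and for R1 and R2), and a relation like R3 - R4 = 0 appears directly as a row with a zero right-hand side.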

Related

Ternary function for Hamming distance, where '2' is wildcard

Let's say I have the following array of vectors x, where the possible values are 0, 1, 2:
import numpy as np
x = np.random.randint(0,3,(10,5), dtype=np.int8)
I want to do a similarity match for all vectors with Hamming distance zero or one, where the rules for matching are:
1. 0 == 0 and 1 == 1, i.e. Hamming distance is 0
2. 2 matches both 1 and 0, i.e. Hamming distance is 0
3. otherwise Hamming distance is 1
i.e. find some arithmetic operation that will return:
0 x 0 = 0
1 x 1 = 0
0 x 1 = 1
1 x 0 = 1
0 x 2 = 0
1 x 2 = 0
And my output should be the Hamming distance between each vector (row) of x and an arbitrary vector z:
z = np.random.randint(0,2,5)
np.sum(np.add(x,z) == 1, axis=1)
int(x+y == 1)
Is there something in this question I'm missing?
Wouldn't this do the trick?
((x!=y) ^ (x==2) ^ (y==2)).sum() <=1
Or if you want to allow a 2 on either or both sides:
((x!=y) ^ (x==2) | (y==2)).sum() <=1
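For the row-wise distances against z, a small sketch (my own) that expresses the same matching rule with & instead of ^ and sums it per row:

import numpy as np

x = np.random.randint(0, 3, (10, 5), dtype=np.int8)
z = np.random.randint(0, 2, 5)

# An element counts toward the distance only if the values differ and neither side is the wildcard 2.
mismatch = (x != z) & (x != 2) & (z != 2)
dist = mismatch.sum(axis=1)        # wildcard-aware Hamming distance of each row of x to z
print(dist, dist <= 1)             # distances, and which rows match within distance 0 or 1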

Constrain logic in Linear programming

I'm trying to build a linear optimization model for a production unit. I have a binary decision variable X(i)(j), where i is the hour of day j. The constraint I need to introduce is a limit on downtime (the minimum time period the production unit needs to be turned off between two starts).
For example:
Hours: 1 2 3 4 5 6 7 8 9 10 11 12
On/off: 0 1 0 1 1 0 1 1 1 0 0 1
I cannot run in hour 4 or hour 7 because the off period between hours 2 and 4 / 5 and 7 is only one hour. I can run in hour 12 since I have a two-hour gap after hour 9. How do I enforce this constraint in linear programming/optimization?
I think you are asking for a way to model: "at least two consecutive periods of down time". A simple formulation is to forbid the pattern:
t t+1 t+2
1 0 1
This can be written as a linear inequality:
x(t) - x(t+1) + x(t+2) <= 1
One way to convince yourself this is correct is to just enumerate the patterns:
x(t) x(t+1) x(t+2) LHS
0 0 0 0
0 0 1 1
0 1 0 -1
0 1 1 0
1 0 0 1
1 0 1 2 <--- to be excluded
1 1 0 0
1 1 1 1
With x(t) - x(t+1) + x(t+2) <= 1 we exactly exclude the pattern 101 but allow all others.
Similarly, "at least two consecutive periods of up time" can be handled by excluding the pattern
t t+1 t+2
0 1 0
or
-x(t) + x(t+1) - x(t+2) <= 0
Note: one way to derive the second from the first constraint is to observe that forbidding the pattern 010 is the same as saying y(t)=1-x(t) and excluding 101 in terms of y(t). In other words:
(1-x(t)) - (1-x(t+1)) + (1-x(t+2)) <= 1
This is identical to
-x(t) + x(t+1) - x(t+2) <= 0
In the comments it is argued that this method does not work. That is based on a substantial misunderstanding of the method. The pattern 100 (i.e. x(1)=1, x(2)=0, x(3)=0) is not allowed because of
-x(0) + x(1) - x(2) <= 0
where x(0) is the status before we start our planning period. This is historic data. If x(0)=0 we have x(1) - x(2) <= 0, disallowing 10. I.e. this method is correct (if it were not, a lot of my models would fail).
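If you are building the model in Python, a minimal Pyomo sketch of the "forbid 1,0,1" cut over a 12-hour horizon could look like the following (model and variable names are mine, purely illustrative):

from pyomo.environ import ConcreteModel, RangeSet, Var, Constraint, Binary

m = ConcreteModel()
m.T = RangeSet(1, 12)              # hours 1..12
m.x = Var(m.T, domain=Binary)      # on/off decision per hour

def min_downtime_rule(m, t):
    # forbid x(t)=1, x(t+1)=0, x(t+2)=1 wherever the three-hour window fits
    if t + 2 > m.T.last():
        return Constraint.Skip
    return m.x[t] - m.x[t + 1] + m.x[t + 2] <= 1

m.min_downtime = Constraint(m.T, rule=min_downtime_rule)

A full model would add the boundary version of the constraint using the historic x(0), as discussed above.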

Is pyomo equality expression commutative?

Here's a constraint defined by a function:
def my_constraint(model, j):
    a = sum(model.variable_1[i, j] for i in model.i) + sum(model.variable_2[o, j] for o in model.o if o != j)
    b = model.variable_3[j]
    # Apparently, the order matters !?
    return a == b
    # return b == a

model.my_constraint = pe.Constraint(model.j, rule=my_constraint)
I assumed the order of the terms of the equality wouldn't matter but if I switch them, I get different results.
I don't know how to get to the bottom of this.
The generated .nl files differ slightly, but I'm at a dead end as I don't know how to interpret them.
Investigating .nl files
Two three-line sets have a sign difference.
File 1:
[...]
24 1
32 -1
35 1
J78 3
25 1
33 -1
34 1
[...]
File 2:
[...]
24 -1
32 1
35 -1
J78 3
25 -1
33 1
34 -1
[...]
When feeding both files to ipopt, I get "infeasible" with file 1 and a solution with file 2. If I edit file 1 to change the signs in either the first or the second three-line set, I get convergence with the same results as file 2.
So the order of the terms in the equality expression should not matter, but when I change it I get a sign difference in the .nl file that does matter.
Simple example demonstrating how the order of the terms affects the .nl file
from pyomo.environ import ConcreteModel, Set, Var, Constraint, Objective
from pyomo.opt import SolverFactory
model = ConcreteModel()
model.i = Set(initialize=['I1'])
model.j = Set(initialize=['J1'])
model.v1 = Var(model.i, model.j)
model.v2 = Var(model.i, model.j)
model.v3 = Var(initialize=0, bounds=(0, None))
def c1(model, i, j):
    # return model.v2[i, j] == model.v1[i, j]
    return model.v1[i, j] == model.v2[i, j]
model.c1 = Constraint(model.i, model.j, rule=c1)

def objective_rule(model):
    return model.v3
model.objective = Objective(rule=objective_rule)
opt = SolverFactory('ipopt')
opt.solve(model, keepfiles=True)
Depending on the order of the terms in constraint c1, I don't get the same .nl file.
More specifically, both files are identical except for two lines:
g3 1 1 0 # problem unknown
3 1 1 0 1 # vars, constraints, objectives, ranges, eqns
0 0 0 0 0 0 # nonlinear constrs, objs; ccons: lin, nonlin, nd, nzlb
0 0 # network constraints: nonlinear, linear
0 0 0 # nonlinear vars in constraints, objectives, both
0 0 0 1 # linear network variables; functions; arith, flags
0 0 0 0 0 # discrete variables: binary, integer, nonlinear (b,c,o)
2 1 # nonzeros in Jacobian, obj. gradient
0 0 # max name lengths: constraints, variables
0 0 0 0 0 # common exprs: b,c,o,c1,o1
C0
n0
O0 0
n0
x1
2 0
r
4 0.0
b
3
3
2 0
k2
1
2
J0 2
0 -1 # The other file reads 0 1
1 1 # 1 -1
G0 1
2 1
When solving, I get the same results. Probably because the example is rubbish.
A theoretical explanation is that you're seeing alternative optimal solutions. It's entirely possible, depending on the problem formulation, that you've got more than one solution that has the optimal objective value. What order you get these in is going to be sensitive to the order of the constraints. If you're using an LP solver you ought to be able to ask it to give you all of the optimal solutions.

Find the minimal rows with maximum 1s column

I have a numpy 2D array of zeros and ones, and I want those rows that together have at least one 1 in every column. For example:
PROBLEM STATEMENT: Find a minimal set of rows that covers every column with a 1.
INPUT1:
A B C D E
t1 0 0 0 1 1
t2 0 1 1 0 1
t3 0 1 1 0 1
t4 1 0 1 0 1
t5 1 0 1 0 1
t6 1 1 1 1 0
Here, there are multiple answers like (t6, t1), (t6, t2), (t6, t3), (t6, t4), (t6, t5).
INPUT2:
A B C D E
t1 0 0 0 1 1
t2 0 1 1 0 1
t3 0 1 1 0 1
t4 1 0 1 0 1
t5 1 0 1 0 1
t6 1 1 1 1 1
Answer: t6
I don't want to use a brute-force method as my original matrix is very big. Is there a smart way to do this?
Naive solution, worst-case O(2^n)
This iterates over all possible choices of rows, starting with as few rows as possible, making average cases usually low-polynomial time.
from itertools import combinations
import numpy as np
def minimum_rows(arr):
    out_list = []
    rows = arr.shape[0]
    for x in range(1, rows + 1):
        for combo in combinations(range(rows), x):
            # keep this combination if the OR of its rows has a 1 in every column
            if np.logical_or.reduce(arr[list(combo)]).all():
                out_list.append(combo)
        if out_list:
            return out_list
I wrote this entirely on my phone without much testing, so it may or may not work. It employs no tricks, but is fairly fast. Note that it will be slower when the ratio of columns to rows is larger, or when the probability of a given element being True is smaller, as that makes it less likely for fewer rows to meet the required conditions, causing x to increase, which in turn increases the number of combinations iterated through.
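For what it's worth, running the function above on INPUT1 from the question (labels dropped, so rows t1..t6 become indices 0..5) should reproduce the pairs listed there:

import numpy as np

arr = np.array([[0, 0, 0, 1, 1],    # t1
                [0, 1, 1, 0, 1],    # t2
                [0, 1, 1, 0, 1],    # t3
                [1, 0, 1, 0, 1],    # t4
                [1, 0, 1, 0, 1],    # t5
                [1, 1, 1, 1, 0]])   # t6

print(minimum_rows(arr))
# [(0, 5), (1, 5), (2, 5), (3, 5), (4, 5)]  i.e. (t1,t6), (t2,t6), ..., (t5,t6)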

Dynamic number of X values (dependent variables) in glm function in R isn't giving the right output

R glm for logistic regression. I was trying to dynamically pass values into the formula, following another Stack Overflow post.
The function is called from Python using rpy2, and I printed out summary(glm.out).
I ran the test for two different scenarios.
The input x, y values were taken directly from the Python part of the code, converted to the right format, and passed to the logistic_regression function in R. The input values from Python are printed below (2nd code block). glm is run on those values using as.formula, and this gave me one output (4th code block).
The input x, y values are just created in R as given in the code (in this case x = k1, k2 and y = m), and the glm function is run in the traditional way. That gave me a different output (6th code block).
The inputs are numerically identical, but the format is different: a data frame in the first scenario and vectors in the second.
Or maybe my glm call is wrong.
R code.
logistic_regression = function(y,x,colnames){
  print("Y value is ")
  print(y)
  print("X value is ")
  print(x)
  m <- c(1,1,1,0,0,0)
  k1 <- c(4,3,5,1,2,3)
  k2 <- c(6,7,8,5,6,3)
  glm.out = glm(as.formula(paste("y~", paste(colnames, collapse="+"))), family=binomial(logit), data=x)
  # glm.out = glm(m~k1+k2, family=binomial(logit), data=x)
  return(summary(glm.out))
}
INPUT PRINTED
[1] "Y value is "
[1] 1 1 1 0 0 0
[1] "X value is "
X0 X1
0 4 6
1 3 7
2 5 8
3 1 5
4 2 6
5 3 3
When I ran the code
glm.out = glm(as.formula(paste("y~", paste(colnames, collapse="+"))), family=binomial(logit), data=x)
OUTPUT
Call:
glm(formula = as.formula(paste("y~", paste(colnames, collapse = "+"))),
family = binomial(logit), data = x)
Deviance Residuals:
[1] 0 0 0 0 0 0
Coefficients: (3 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.457e+01 1.310e+05 0 1
X02 6.872e-14 1.853e+05 0 1
X03 3.566e-14 1.853e+05 0 1
X04 4.913e+01 1.853e+05 0 1
X05 4.913e+01 1.853e+05 0 1
X15 NA NA NA NA
X16 NA NA NA NA
X17 4.913e+01 1.853e+05 0 1
X18 NA NA NA NA
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8.3178e+00 on 5 degrees of freedom
Residual deviance: 2.5720e-10 on 0 degrees of freedom
AIC: 12
Number of Fisher Scoring iterations: 23
But when I ran
glm.out = glm(m~k1+k2, family=binomial(logit), data=x)
The output was completely different (looked more correct)
Call:
glm(formula = m ~ k1 + k2, family = binomial(logit), data = x)
Deviance Residuals:
0 1 2 3 4 5
1.532e-06 1.390e-05 2.110e-08 -2.110e-08 -1.344e-05 -2.110e-08
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -199.05 1221734.18 0 1
k1 25.30 281753.45 0 1
k2 20.89 288426.19 0 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8.3178e+00 on 5 degrees of freedom
Residual deviance: 3.7636e-10 on 3 degrees of freedom
AIC: 6
Number of Fisher Scoring iterations: 24
In glm, the formula argument is a symbolic description of the model to be fitted and the data argument is an optional data frame containing the variables in the model.
In your logistic_regression function call of glm(), the model variables indicated in formula y~k1+k2 are not contained within data=x (a data frame with two columns named X0 and X1), and thus, are taken from the environment from which glm is called (your logistic_regression function). The 3 hardcoded vectors (m, k1, k2) in that environment are not associated with the inputs (i.e., the x=k1,k2 and y=m step done in your second scenario is not occurring within your function).
To call glm() using your logistic_regression() input, you could create a data frame consisting of the model variables to use as a single input and edit your function accordingly. For example, you could use:
x <- data.frame(y=c(1, 1, 1, 0, 0, 0), k1=c(4,3,5,1,2,3), k2= c(6,7,8,5,6,3))
logistic_regression <- function(x){
  glm.out <- glm(as.formula(paste("y~", paste(colnames(x[,-1]), collapse="+"))), family=binomial(logit), data=x)
  return(summary(glm.out))
}
logistic_regression(x)
