I am new to Z3 and trying the examples found here, implementing the examples in python. When I try the examples in the "Unbounded Objectives" section I get seemingly random integer values (not 'oo'). For the following code:
x, y = Ints('x y')
opt = Optimize()
opt.add(x < 2)
opt.add((y - x) > 1)
opt.maximize(x + y)
print(opt.check())
print(opt.model())
I get the output:
sat
[y = 5, x = 1]
But the problem is unbounded, I expect it to give me y equal to infinity. A more simple example:
x, y, profit = Ints('x y profit')
opt = Optimize()
opt.add(profit == 2*x + y)
opt.maximize(profit)
print(opt.check())
print(opt.model())
This example gives me:
sat
[x = 0, y = 0, profit = 0]
My question is: Why am I not getting infinity here? Am I doing something wrong or is this the output I should expect from Z3 with python when I give it an unbounded optimization problem?
I am using python 3.9 on Pycharm 2021.2.1, Z3 version 4.8.15.
You're not quite using the API correctly. You should use the value function on the optimization goal instead. For instance, your second query is coded as:
from z3 import *
x, y, profit = Ints('x y profit')
opt = Optimize()
opt.add(profit == 2*x + y)
maxProfit = opt.maximize(profit)
print(opt.check())
print("maxProfit =", maxProfit.value())
This prints:
sat
maxProfit = oo
Note that when the optimization result is oo, then the model values for x/y/profit etc., are irrelevant. (They are no longer integers.) If the optimization result is a finite value, then you can look at opt.model() to figure out what assignment to your variables achieved the optimal goal. So, the values of x/y and profit you get in your example, while printed as 0, are meaningless; since the goal is not a proper integer value.
Related
I am trying to fit 3-dimensional data (that is, 2 independent and 1 dependent variable) using multivariate fitting in scipy curve_fit. I wish to do piecewise fitting for the same problem. I have tried to proceed on the basis of this without any success. The problem is defined below:
import numpy as np
from scipy.optimize import curve_fit
#..........................................................................................................
def F0(X, a, b, c, c0, y0):
x, y = X
value = []
for i in range(0, len(x)):
if y[i] < y0:
lnZ = x[i] + c0*y[i]
else:
lnZ = x[i] + c*y[i]
val = a + (b*lnZ)
value.append(val)
return value
#..........................................................................................................
def F1(X, a, b, c):
x, y = X
lnZ = x + c*y
value = a + (b*lnZ)
return value
#..........................................................................................................
x = [-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
0,
0,
0,
0,
0,
0,
0,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093
]
y = [7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04,
7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04,
7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04
]
z = [4.077424497,
4.358253892,
4.610475878,
4.881769469,
5.153063061,
5.323277142,
5.462023074,
4.610475878,
4.840765517,
5.04864602,
5.235070966,
5.351407761,
5.440090728,
5.540693448,
4.960439843,
5.118257381,
5.266539115,
5.370479367,
5.440090728,
5.528296904,
5.5816974,
]
popt, pcov = curve_fit(F0, (x, y), z, method = 'lm')
print(popt)
popt, pcov = curve_fit(F1, (x, y), z, method = 'lm')
print(popt)
The output is:
[1.34957781e+00 1.05456428e-01 1.00000000e+00 4.14879613e+04
1.00000000e+00]
[1.34957771e+00 1.05456434e-01 4.14879603e+04]
You can see that the values of parameters in the piecewise fitting remain as the initial values. I know I am not doing it in the correct way. Please correct me.
The main source of the problem is the insensitivity of this approach to the value of the variable that defines the switch from one function to another (see this response for a similar explanation). Moreover, the choice of starting parameters isn't good.
Since no starting values are provided, curve_fit chooses a value of 1 for all the fitting parameters (see here the default value for p0). Since the fitting algorithm works by making small variations on the parameters, y0 is varied in small steps around 1, which produces no changes in the output of the function (all y values are much smaller than 1). Since y[i] < y0 is always True and only the first branch is ever evaluated, and the output of the function does not depend on the value of c. That explains why y0 and c stay at the initial values.
One might expect that setting y0 initial value to be inside of the range of values that are evaluated (i.e. around 8E-4) might solve the problem. Indeed, since the second branch is evaluated, the value of c is now optimized. Nevertheless, y0 value will stay unchanged. As the fitting algorithm works testing very small changes to the values, the changes are not large enough to move from the interval between two experimental y values to another one. In this particular case, if one chooses 8E-4, the small variations will never be enough to make it go over 8.17E-04 or below 7.85E-4, that are the values encompassing initial y0 choice.
One can usually circumvent this problem making the function depend explicitly on the value of y0. A smart choice would be to redefine the function so the value at y0 is the same no matter which branch is taken (i.e. ensure that the function is continuous). In this case, the function definition does not ensure so. A reasonable change would be:
def F2(X, a, b, c, c0, y0):
x, y = X
value = []
for i in range(0, len(x)):
lnZ = x[i] + c0 * y[i]
if y[i] >= y0:
lnZ += c * (y[i]-y0)
val = a + (b*lnZ)
value.append(val)
return value
which changes the meaning of the parameter c, and limits the results to only continuous functions. In this case, the value of y0 is indeed the function turning point. Nevertheless, it yields the desired results:
popt2, pcov = curve_fit(F2, (x, y), z, p0=(1, 1, 1E4, 1E4, 9.1E-4), method = 'lm')
print(popt2)
results in:
[-1.93417968e-01 1.05456433e-01 -3.65740192e+04 5.97890809e+04
8.64354057e-04]
A better (pythonic) definition for the function avoids the for loop:
def F3(X, a, b, c, c0, y0):
x, y = X
lnZ = x + c0 * y
idx = np.where(y>=y0)
lnZ[idx] += c * (y[idx] - y0)
rv = a + (b * lnZ)
return rv
which will probably be much faster for larger datasets.
I am exploring PyTorch, and I do not understand the output of the following example:
# Initialize x, y and z to values 4, -3 and 5
x = torch.tensor(4., requires_grad = True)
y = torch.tensor(-3., requires_grad = True)
z = torch.tensor(5., requires_grad = True)
# Set q to sum of x and y, set f to product of q with z
q = x + y
f = q * z
# Compute the derivatives
f.backward()
# Print the gradients
print("Gradient of x is: " + str(x.grad))
print("Gradient of y is: " + str(y.grad))
print("Gradient of z is: " + str(z.grad))
Output
Gradient of x is: tensor(5.)
Gradient of y is: tensor(5.)
Gradient of z is: tensor(1.)
I have little doubt that my confusion originates with a minor misunderstanding. Can someone explain in a stepwise manner?
I hope you understand that When you do f.backward(), what you get in x.grad is .
In your case
.
So, simply (with preliminary calculus)
If you put your values for x, y and z, that explains the outputs.
But, this isn't really "Backpropagation" algorithm. This is just partial derivatives (which is all you asked in the question).
Edit:
If you want to know about the Backpropagation machinery behind it, please see #Ivan's answer.
I can provide some insights on the PyTorch aspect of backpropagation.
When manipulating tensors that require gradient computation (requires_grad=True), PyTorch keeps track of operations for backpropagation and constructs a computation graph ad hoc.
Let's look at your example:
q = x + y
f = q * z
Its corresponding computation graph can be represented as:
x -------\
-> x + y = q ------\
y -------/ -> q * z = f
/
z --------------------------/
Where x, y, and z are called leaf tensors. The backward propagation consists of computing the gradients of x, y, and y, which correspond to: dL/dx, dL/dy, and dL/dz respectively. Where L is a scalar value based on the graph output f. Each operation performed needs to have a backward function implemented (which is the case for all mathematically differentiable PyTorch builtins). For each operation, this function is effectively used to compute the gradient of the output w.r.t. the input(s).
The backward pass would look like this:
dL/dx <------\
x -----\ \
\ dq/dx
\ \ <--- dL/dq-----\
-> x + y = q ----\ \
/ / \ df/dq
/ dq/dy \ \ <--- dL/df ---
y -----/ / -> q * z = f
dL/dy <------/ / /
/ df/dz
z -------------------------/ /
dL/dz <--------------------------/
The "d(outputs)/d(inputs)" terms for the first operator are: dq/dx = 1, and dq/dy = 1. For the second operator they are df/dq = z, and df/dz = q.
Backpropagation comes down to applying the chain rule: dL/dx = dL/dq * dq/dx = dL/df * df/dq * dq/dx. Intuitively we decompose dL/dx in the opposite way than what backpropagation actually does, which to navigate bottom up.
Without shape considerations, we start from dL/df = 1. In reality dL/df has the shape of f (see my other answer linked below). This results in dL/dx = 1 * z * 1 = z. Similarly for y and z, we have dL/dy = z and dL/dz = q = x + y. Which are the results you observed.
Some answers I gave to related topics:
Understand PyTorch's graph generation
Meaning of grad_outputs in PyTorch's torch.autograd.grad
Backward function of the normalize operator
Difference between autograd.grad and autograd.backward
Understanding Jacobian tensors in PyTorch
you just got to understand what are the operations and what are the partial derivatives you should use to come at each, for example:
x = torch.tensor(1., requires_grad = True)
q = x*x
q.backward()
print("Gradient of x is: " + str(x.grad))
will give you 2, because the derivative of x*x is 2*x.
if we take your exemple for x, we have:
q = x + y
f = q * z
which can be modified as:
f = (x+y)*z = x*z+y*z
if we take the partial derivative of f in function of x, we endup with just z.
To come at this result you have to consider all other variables a constant and apply the derivative rules you already know.
But keep in mind, the process that pytorch executes to get these results are not symbolic or numeric differentiation, is Automatic differentiation, which is a computational method to efficiently get the gradients.
Take a closer look at:
https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf
I have the following equation: x/0,2 * (0,2+1)+y/0,1*(0,1+1) = 26.34
The initial values of X and Y are set as 4.085 and 0.17 respectively.
I need to find the values of X and Y which satisfy the equation and have the lowest common deviation from initially set values. In other words, sum of |4.085 - x| and |0.17 - y| is minimized.
With Excel Solver Valueof Function this easy to find:
we insert x and y as variables to be changed to reach 26 in the formula result
Here is my python code (I am trying to use sympy for that)
x,y = symbols('x y')
eqn = solve([Eq(x/0.2*(0.2+1)+y/0.1*(0.1+1),26)],x,y)
print(eqn)
I am getting however strange result {x: 4.33333333333333 - 1.83333333333333*y}
Can anyone help me solve this equation?
The answer you are obtaining is not strange, it is just the answer to what you ask. You have an equation on two variables x and y, the solution to this problem is in general not unique (sometimes infinite). Now, you can either add an extra condition (inequality for example) or change the numeric Domain in which solutions are possible (like in Diophantine equations). You can do either of them in Sympy, in the following example I find the solution on x to your problem in the Real domain, using solveset:
from sympy import symbols, Eq, solveset
x,y = symbols('x y')
eqn = solveset(Eq(1.2 * x / 0.2 + 1.1 * y / 0.1, 26), x, Reals)
print(eqn)
Output:
Intersection(FiniteSet(4.33333333333333 - 1.83333333333333*y), Reals)
As you can see the solution on x is a finite set, that is the intersection between a straight line on y and the Reals. Any particular solution can be found by direct evaluation of y.
This is equivalent to say x = 4.33333333333333 - 1.83333333333333 * y if you evaluate this equation in the guess value y = 0.17, you obtain x = 4.0216 (close to your x = 4.085 guess value).
Edit:
After analyzing the new information added to your question, I think I have finally understood it: your problem is a constrained optimization. Now, I don't use Excel frequently, but it would be my bet that under the hood this optimization is carried out there using Lagrange multipliers. In your particular case, the target function represents the deviation of the solution (x, y) from the point (4.085, 0.17). For convenience, I have chosen this function to be the Euclidean distance between them (absolute values as you suggested can be problematic due to discontinuity of the derivatives). The constraint function is simply the equation you provided. To solve this problem with Sympy, one could use something like this:
import sympy as sp
# Define symbols and functions
x, y, lamb = sp.symbols('x, y, lamb', real=True)
func = sp.sqrt((x - 4.085) ** 2 + (y - 0.17) ** 2) # Target function
const = 1.2 * x / 0.2 + 1.1 * y / 0.1 - 26 # Constraint function
# Define Lagrangian
lagrang = func - lamb * const
# Compute gradient of Lagrangian
grad_lagrang = [sp.diff(lagrang, var) for var in [x, y, lamb]]
# Solve the resulting system of equations
spoints = sp.solve(grad_lagrang, [x, y, lamb], dict=True)
# Print stationary points
print(spoints)
Output:
[{x: 4.07047770700637, lamb: -0.0798086884467563, y: 0.143375796178345}]
Since in our case only one stationary point was found, this is the optimal solution (although this is only a necessary condition). The value of the lamb multiplier can be ditched, so x, y = 4.070, 0.1434. Hope this helps.
Definition of the problem
I am trying to calculate the points of intersection of geometrical objects, such as two planes and a sphere, in python.
Let's consider for example these three objects:
This system gives two solutions:
I would like to know if there is a python library that can help develop a solver to calculate these intersections. I am looking for something working as Wolfram alpha, where we can input three equations and it returns all the possible solutions when there's finite number of solutions for simplicity.
What I tried
I tried with SymPy, but it returns []:
from sympy.solvers import solve
from sympy import Symbol
x = Symbol('x')
y = Symbol('y')
z = Symbol('z')
solve(z, x, x**2 + y**2 + z**2 -1)
I then tried with scipy:
from scipy.optimize import fsolve
def f(x):
y = np.zeros(3)
y[2] = x[2]
y[0] = x[0]
y[1] = x[0] ** 2 + x[1] ** 2+ x[2] ** 2 - 1
return y
x0 = np.array([10, 10, 10])
solution = fsolve(f, x0)
print(solution[0],solution[1],solution[2])
but it only returns one of the two solutions:
6.79746218330325e-28 1.0000000000000002 -2.3528179942097343e-35
I also tried with gekko, and stil it only returns one possible solution (which depends on the initial guess):
from gekko import GEKKO
m = GEKKO()
x = m.Var(value = 1)
y = m.Var(value = 1)
z = m.Var(value = 1)
m.Equation(x == 0)
m.Equation(z == 0)
m.Equation(x**2 + y**2+z**2 ==1)
m.solve()
fsolve from scipy, and all other functions that I personally know of that will accept any form of input function, will return one value.
One workaround if you have an idea where the other solution is would be to give an x0 value that is closer to the second solution with a second call to fsolve (see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fsolve.html).
If you alternatively know what range you want to try and find solutions in, the easiest way is to make an array that you then check to see where the value changes sign (this would be doing it from scratch)
I found the solution with sympy. Apparently it's one of the only (if not only) libraries that allow finding analytical solutions, and returns more than just one solution. Also, we don't need to pass guesses as initial variables. In my question, there was an error in the example I posted with sympy. This is how I solved the system:
from sympy.solvers import solve
import sympy as sp
x = Symbol('x')
y = Symbol('y')
z = Symbol('z')
sp.solve([z , x, (x**2 + y**2 + z**2) - 1], x,y,z)
Result: [0,-1,0], [0,1,0]
Complex numbers in theano are not fully implemented yet, as discussed for example in this groups.google post and this question.
However, given that some support does seem to exist, I am trying to understand what can actually be done at the moment.
Consider for example this very simple code:
x = T.dscalar('x')
y = 2j * x
gy = T.grad(y, x)
f = theano.function([x], gy)
f(1.234)
that is, derivative with respect to the (real) scalar x of a * x with a some complex number.
The above code does not produce a result, complaining that
Casting from complex to real is ambiguous: consider real(), imag(), angle() or abs()
How can this code be made to work?
Here is the simple implementation I managed to get to work:
x = T.dscalar('x')
y_R = T.real(2j) * x
y_I = T.imag(2j) * x
gy_R = T.grad(y_R, x)
gy_I = T.grad(y_I, x)
gy = gy_R + 1j * gy_I
f = theano.function([x], gy)
f(1.234)
# array(2j)
basically, separate the complex constant into real and imaginary part, compute the gradient separately and only at the end sum them to get the complex result.
The problem with this method is that it doesn't work if we try a more complex example, like computing the gradient with respect to x of expm(1j * x * H) for some matrix H:
x = T.dscalar('x')
expH = T.slinalg.expm(1j * x * H)
expH_flat = T.flatten(expH)
expH_flat_R = T.real(T.flatten(expH))
expH_flat_I = T.imag(T.flatten(expH))
def fn(i, mat, x):
return T.grad(mat[i], x)
J_R, updates = theano.scan(fn, sequences=T.arange(expH_flat_R.shape[0]), non_sequences=[expH_flat_R, x])
J_I, updates = theano.scan(fn, sequences=T.arange(expH_flat_I.shape[0]), non_sequences=[expH_flat_I, x])
expH_J_R = J_R.reshape(expH.shape)
expH_J_I = J_I.reshape(expH.shape)
expH_J = expH_J_R + 1j * expH_J_I
f = theano.function([x], expH_J)
f(2)
which returns
TypeError: Elemwise{real,no_inplace}.grad illegally returned an integer-valued variable. (Input index 0, dtype complex128)
If this cannot be achieved at all with theano (like for example this question seems to suggest), is it for some particular reason, like fundamental difficulties of sort?
It seems that this is simply not possible natively (and likely won't be in the future, considering the development of Theano stopped in 2017).
One way around it is to remap every complex matrix into the larger real matrix and perform all operations on the bigger matrix (some care will be needed to redefine the various operations).
A possible mapping is A -> [[A_R, -A_I], [A_I, A_R]].