I'm looking to set up a constraint check in Python using PULP. Suppose I had variables X1,..,Xn and a constraint (AffineExpression) A1X1 + ... + AnXn <= B, where A1,..,An and B are all constants.
Given an assignment for X (e.g. X1=1, X2=4,...Xn=2), how can I check whether the constraints are satisfied? I know how to do this with matrices using Numpy, but I'm wondering if it's possible to do it with PULP and let the library handle the work.
My hope here is that I can check specific variable assignments. I do not want to run an optimization algorithm on the problem (e.g. prob.solve()).
Can PULP do this? Is there a different Python library that would be better? I've thought about Google's OR-Tools but have found the documentation is a little bit harder to parse through than PULP's.
It looks like this is possible by doing the following:
1. Define PULP variables and constraints and add them to an LpProblem
2. Make a dictionary of your assignments in the form {'variable name': value}
3. Use LpProblem.assignVarsVals(your_assignment_dict) to assign those values
4. Run LpProblem.valid() to check that your assignment meets all constraints and variable restrictions
Note that this will almost certainly be slower than using numpy and Ax <= b. Formulating the problem might be easier, but performance will suffer due to how PULP runs these checks.
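For comparison, here is a minimal sketch of the plain numpy check of Ax <= b mentioned above. The matrix, the bounds, the helper name satisfies and the tolerance tol are all made up for illustration:

```python
import numpy as np

# Illustrative data (not from the question): two constraints, two variables.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([10.0, 12.0])

def satisfies(A, b, x, tol=1e-9):
    """Return True if every row of A @ x <= b holds (up to a small tolerance)."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(A @ x <= b + tol))

print(satisfies(A, b, [1, 4]))  # first row: 1 + 8 = 9, second row: 3 + 4 = 7
print(satisfies(A, b, [4, 4]))  # first row: 4 + 8 = 12, which exceeds 10
```

The tolerance only guards against floating-point round-off; set it to 0 for a strict comparison.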
You can stay in numpy and accomplish this. Take a single row of A as a vector, multiply it elementwise by x, and sum the products; comparing that sum with the matching entry of B tells you whether that constraint holds. For example:
a = A[0, :]
row_sum = a*x
sum(row_sum) <= B[0]
The last line will return just True or False. Then if you want to change a single index you could update your row_sum array by using
row_sum[3] = a[3]*new_val
and run your analysis again.
I am looking into using Nlopt for solving optimisation problems in Python.
I have a series of simultaneous equations of the form
Ax = b
where A is an NxM matrix, with x the solution. Another way to think about this is that I have M simultaneous equations of the form x_1c_1m + x_2c_2m + .... + x_Nc_Nm = k_m, where x_i are the variables to solve for, c_im is the constant associated with x_i in equation m, and k_m is the constant on the right-hand side of equation m. c_im and k_m are all known.
What confuses me is how to even approach this in Nlopt. Nlopt requires you to have actual callable functions, which I don't have? I suppose I could generalise each of the equations in that matrix equation above to something like:
def fn(x, c_m, k_m):
    val = 0
    for x_i, c_im in zip(x, c_m):
        val += x_i * c_im
    return val - k_m
where c_m and k_m would be already known, with the variables to solve for in x. All the examples I've seen have only looked at single-variable problems, which has thrown me a little. Would I then have to somehow define M copies of this function, and set each copy of fn as an equality constraint in the Nlopt optimisation object? I'm looking to solve for x, which itself has multiple solutions, and I want to try to find the minimum values of x (or at least an approximate solution if an exact solution cannot be found). Would I then have to set multiple objective functions, i.e. obj_fn_i = min(x_i) or something like that? It's all a little confusing to me in terms of what needs to be presented to the solver. I've already got an analytical solution to the above problem, so I can check my results reliably. Any help appreciated.
Cheers!
I have been using NLopt for a couple of problems, and what I have come to understand is that the solver requires an objective function that returns a single float value, so you must express the objective as an MSE-style sum or some other single float to be minimized. It can solve for an array of variables x, on which both the objective function and the constraints must depend. All the equations in the system can go either directly into the objective function or in as constraints.
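As a rough sketch of what such a single-float objective could look like for Ax = b, the function below returns the sum of squared residuals. It follows the f(x, grad) callback shape that NLopt's Python interface uses, where grad is filled in place when it is non-empty. The data are invented, and A is assumed here to have one row per equation and one column per variable; nlopt itself is not imported, so this only shows the function you would hand to the solver:

```python
import numpy as np

# Illustrative system (not from the question): rows are equations, columns are variables.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([5.0, 11.0])

def objective(x, grad):
    """Sum of squared residuals ||A x - b||^2 as one float."""
    residual = A @ x - b
    if grad.size > 0:
        # Gradient of the squared residual norm, written into grad in place.
        grad[:] = 2.0 * A.T @ residual
    return float(residual @ residual)

# Calling it by hand: an exact solution gives a residual of zero.
print(objective(np.array([1.0, 2.0]), np.empty(0)))
```

With a gradient-free algorithm the grad argument arrives empty and is simply ignored.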
Hope this was helpful somehow!
I need to define the following function. Is it possible to do in Theano?
UPDATE:
To clarify, I'm asking about defining a Theano symbolic variable that can take the above form. I understand that I can define 2 separate variables and use either of them based on the value of R. My question is whether it is possible to define a single variable that takes the above form. The reason is that I need to take gradients of this variable as well as use it in other variables, and it would drastically simplify my solution if I could define this within a single symbolic variable.
UPDATE 2:
Proposed solution with lambda doesn't work. This doesn't generate a symbolic variable that can later be used with Theano:
r = T.dscalar('r')
dd = lambda r: r + 1 if r > 0 else r - 1
Without knowing specifics about Theano, I remember that one way to turn an if-else statement into a linear equation is to make your if check into a variable itself, setting it as 0 or 1. Then, you can do something like:
sign = (R_t > 0) ## this is the part I don't know how exactly to do
(topEquation * sign) + (bottomEquation * (sign ^ 1))
This has the nice property that if sign is 1 (or True), the bottomEquation will drop out, being multiplied by 1 ^ 1 or just 0. Similarly, topEquation drops out if sign is 0/False.
One note, though maybe Theano can help with this - it will still evaluate both equations, so this could present an efficiency concern (for every single input, it's running both equations, and then ignoring one of them).
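The masking idea can be tried out in plain numpy before porting it to symbolic code. This is only a sketch: it uses the r + 1 / r - 1 example from UPDATE 2, the function name piecewise is made up, and 1 - sign stands in for sign ^ 1 because the mask is stored as a float. As far as I know, theano.tensor.switch(cond, a, b) is the symbolic counterpart of the same selection.

```python
import numpy as np

def piecewise(r):
    """Return r + 1 where r > 0 and r - 1 elsewhere, using a 0/1 mask."""
    r = np.asarray(r, dtype=float)
    sign = (r > 0).astype(float)      # 1.0 where the condition holds, else 0.0
    top = r + 1                       # branch used when r > 0
    bottom = r - 1                    # branch used otherwise
    return top * sign + bottom * (1 - sign)

print(piecewise([2.0, -2.0, 0.0]))
```

Both branches are evaluated for every input, which is the efficiency concern raised above.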
Say, I have an equation f(x) = x**2 + 1, I need to find the value of f(2).
Easiest way is to create a function, accept a parameter and return the value.
But the problem is, f(x) is created dynamically and so, a function cannot be written beforehand to get the value.
I am using cvxpy for an optimization problem. The equation would look something like below:
x = cvx.Variable()
Si = [cvx.square(prev[i] + cvx.sqrt(200 - cvx.square(x))) for i in range(3)]
prev is an array of numbers. There will be a Si[0] Si[1] Si[2].
How do I find the value of Si[0] for x=20?
Basically, is there any way to substitute a value for the Variable and evaluate the expression when using cvxpy?
Set the value of the variables and then you can obtain the value of the expression, like so:
>>> x.value = 3
>>> Si[0].value
250.281099844341
(although it won't work for x = 20 because then you'd be taking the square root of a negative number).
The general solution to interpreting code on-the-fly in Python is to use the built-in eval(), but eval is dangerous with user-supplied input, which could do all sorts of nasty things to your system.
Fortunately, there are ways to "sandbox" eval using its additional parameters to only give the expression access to known "safe" operations. There is an example of how to limit access of eval to only white-listed operations and specifically deny it access to the built-ins. A quick look at that implementation looks close to correct, but I won't claim it is foolproof.
The sympy.sympify I mentioned in my comment uses eval() inside and carries the same warning.
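A minimal sketch of restricting eval with its globals and locals parameters is shown below. The whitelist SAFE_NAMES and the helper evaluate are made-up names, and this is not a real sandbox: it only hides the built-ins and should not be trusted with hostile input.

```python
import math

# Hypothetical whitelist of names the expression may use.
SAFE_NAMES = {"sqrt": math.sqrt, "abs": abs}

def evaluate(expression, **variables):
    """Evaluate an expression string with no built-ins and only whitelisted names."""
    scope = dict(SAFE_NAMES)
    scope.update(variables)
    # An empty __builtins__ mapping keeps names like open() out of reach.
    return eval(expression, {"__builtins__": {}}, scope)

print(evaluate("x**2 + 1", x=2))  # f(2) for f(x) = x**2 + 1
```

Names outside the whitelist, such as open, raise NameError instead of being resolved.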
In parallel to your cvx versions, you can use lambda to define functions on the fly :
f=[lambda x,i=j : (prev[i] + (200 - x*x)**.5)**2 for j in range(3)] #(*)
Then you can evaluate f[0](20), f[1](20), and so on.
(*) the i=j default argument is needed to bind the current value of j to each function; without it, every lambda would use the last value of j.
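The effect of the default argument can be seen with a small stand-alone example (the lists late and bound are illustrative names, unrelated to prev):

```python
# Without a default argument, each lambda looks up j when it is called,
# by which time the comprehension has finished and j holds its last value.
late = [lambda x: x + j for j in range(3)]

# With i=j, the value of j is captured when each lambda is created.
bound = [lambda x, i=j: x + i for j in range(3)]

print([f(10) for f in late])   # [12, 12, 12]
print([f(10) for f in bound])  # [10, 11, 12]
```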
I have read this blog which shows how an algorithm had a 250x speed-up by using numpy. I have tried to improve the following code by using numpy but I couldn't make it work:
for i in nodes[1:]:
    for lb in range(2, diameter+1):
        not_valid_colors = set()
        valid_colors = set()
        for j in nodes:
            if j == i:
                break
            if distances[i-1, j-1] >= lb:
                not_valid_colors.add(c[j, lb])
            else:
                valid_colors.add(c[j, lb])
        c[i, lb] = choose_color(not_valid_colors, valid_colors)
return c
Explanation
The code above is part of an algorithm used to calculate the self-similar dimension of a graph. It works by constructing dual graphs G', in which two nodes are connected if the distance between them is greater than or equal to a given value (Lb), and then computing a graph coloring on those dual networks.
The algorithm description is the following:
1. Assign a unique id from 1 to N to all network nodes, without assigning any colors yet.
2. For all Lb values, assign a color value 0 to the node with id=1, i.e. C_1l = 0.
3. Set the id value i = 2. Repeat the following until i = N.
   a) Calculate the distance l_ij from i to all the nodes in the network with id j less than i.
   b) Set Lb = 1
   c) Select one of the unused colors C[j][l_ij] from all nodes j < i for which l_ij ≥ Lb. This is the color C[i][Lb] of node i for the given Lb value.
   d) Increase Lb by one and repeat (c) until Lb = Lb_max.
   e) Increase i by 1.
I wrote it in Python, but it takes more than a minute when I try to use it with small networks which have 100 nodes and p=0.9.
As I'm still new to Python and numpy, I have not found a way to improve its efficiency.
Is it possible to remove the loops by using numpy.where to find where the paths are longer than the given Lb? I tried to implement it but it didn't work...
Vectorized operations with numpy arrays are fast because the actual calculations run in compiled code (and, for linear algebra, in underlying libraries such as BLAS and LAPACK) without Python overhead. With loop-intensive operations written in Python, you will not see those benefits.
You usually have to figure out a way to vectorize operations (usually possible with a smart use of array slicing). Some operations are inherently loop-intensive, however, and sometimes it is not easy to vectorize them (which seems to be the case for your code).
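As a partial illustration of array slicing, the innermost loop over j in the question could be replaced by a boolean mask. This is only a sketch under assumptions the question does not confirm: distances is an N x N numpy array indexed from 0, c is a 2-D integer array indexed as c[node_id, lb] with node ids starting at 1, and split_colors is a made-up helper name. The outer loops over i and lb remain, because each color depends on the colors chosen before it.

```python
import numpy as np

def split_colors(distances, c, i, lb):
    """Split the colors of nodes j < i by whether their distance to i is >= lb."""
    d = distances[i - 1, :i - 1]       # distances from node i to nodes 1..i-1
    earlier = c[1:i, lb]               # colors already assigned to nodes 1..i-1
    mask = d >= lb
    not_valid = set(earlier[mask].tolist())
    valid = set(earlier[~mask].tolist())
    return not_valid, valid

# Tiny made-up example: a path graph on 3 nodes.
distances = np.array([[0, 1, 2],
                      [1, 0, 1],
                      [2, 1, 0]])
c = np.zeros((4, 3), dtype=int)        # row 0 unused so ids can start at 1
c[2, 2] = 1
print(split_colors(distances, c, i=3, lb=2))
```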
In those cases, you can first try Numba, which generates optimized machine code from a Python function without any modifications. (You just annotate the function and it will automatically do it for you). I do not have a lot of experience with it, and have not tried using this for complicated functions.
If this does not work, then you can use Cython, which converts Python-like code (with typed variables) into efficient C code automatically and generates a Python extension module that you can import and use in Python. That will usually give you at least an order of magnitude (usually two orders of magnitude) speedup for loop-intensive operations. I generally find Cython easy to use since unlike pure C, one can access your numpy arrays directly in Cython code.
I recommend using Anaconda Python distribution, since you will be able to install these packages easily. I'm sorry I don't have a specific answer for your code.
If you want to move to numpy, you can just change the lists into arrays.
For example, distances[i-1][j-1] becomes distances[i-1, j-1] once you declare distances as a numpy array, and the same goes for c[i][lb]. valid_colors and not_valid_colors need a bit more thought, because you cannot append to numpy arrays: they have a fixed length, so you would have to fix a maximum size beforehand. Another idea is that once everything is in numpy, you can cythonize your code (http://docs.cython.org/src/tutorial/cython_tutorial.html), which can make the loops much faster. In any case, if you don't want Cython and you look at the blog, you will see that distances is declared as an array in main().
I'm programming a scientific application in Python, and the performance of my algorithm so far is terrible. I'm trying to find an efficient way to code what I'm doing. Basically, I have to multiply
def get_thing(self, chi, n):
    return np.sum(self.an[n][j] * pow(chi, -j) for j in xrange(1, self.j))
where self.an[i][j] is a previously generated array. Then I'll have to do this:
pot = np.sum(self.coeffs[n] * self.get_thing(chi, n) for n in xrange(0, self.n))
where chi changes and cannot be cached, as it's a point that is being generated outside this class. Of course, this is extremely slow and not very bright. How can I improve this?
Thanks!
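Within get_thing you could certainly simplify things to something like:
def get_thing(self, chi, n):
    # float exponents avoid numpy's error for integers raised to negative integer powers
    return np.sum(self.an[n, 1:self.j] * np.power(chi, -np.arange(1, self.j, dtype=float)))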
Note, that you don't want to index numpy arrays using [i][j] notation; instead use [i,j].
You may be able to make further improvements using higher level broadcasting as @eat suggested.
Edit:
Made a couple of changes to the above code to try to get the indexing to match the OP and changed a sign error in my code.
Simply, try to do the computations in higher level of abstraction, i.e. try to avoid python level looping.
Study carefully how to do element-wise operations and how broadcasting operates, and last but not least don't forget the power of linear algebra!
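To make the linear algebra point concrete, the double sum over n and j can be written as two dot products. This is a sketch rather than a drop-in replacement: it assumes an is a 2-D numpy array and coeffs a 1-D array, with J playing the role of self.j, and the function name potential is made up.

```python
import numpy as np

def potential(coeffs, an, chi, J):
    """Compute sum_n coeffs[n] * sum_{j=1}^{J-1} an[n, j] * chi**(-j) without Python loops."""
    powers = chi ** -np.arange(1, J, dtype=float)   # chi^-1, chi^-2, ..., chi^-(J-1)
    inner = an[:, 1:J] @ powers                     # one value per n (the get_thing results)
    return coeffs @ inner

# Small made-up check against the explicit loops.
coeffs = np.array([1.0, 2.0])
an = np.array([[0.0, 1.0, 2.0],
               [0.0, 3.0, 4.0]])
looped = sum(coeffs[n] * sum(an[n, j] * 2.0 ** -j for j in range(1, 3)) for n in range(2))
print(potential(coeffs, an, 2.0, 3), looped)
```

If chi changes on every call, the only work repeated per call is the powers vector and the two products.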