python, gurobi: add constraints efficiently

I am trying to optimize a model with 800+ dimensions and 3000+ inequalities in gurobipy. As I couldn't find a method for adding a whole matrix as constraints, I add them with the following code:
for index, inequality in enumerate(inequalities):
    expression = 0
    for index2, variable in enumerate(inequality):
        expression += variable * x[index2]
    m.addConstr(expression >= rhs[index])
with x being the variables. This part of the program needs 70+ seconds, while the problem itself is optimized in a fraction of a second. Can someone point me in a direction on how to add the constraints more efficiently?

I was able to improve the time to below one second, exploiting the fact that almost all of the matrix consists of zeros, by changing the line
for index2,variable in enumerate(inequality):
to
for index2,variable in [(index2,variable) for index2,variable in enumerate(inequality) if variable!=0]:
as only a tiny fraction of the operations needs to be run. I would still be interested in a cleaner way of adding these constraints to my model.
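For what it's worth, a sketch of a cleaner route, assuming gurobipy 9.0+ (which added the matrix-oriented API); the inequalities and rhs values below are hypothetical stand-ins for the asker's data:

import numpy as np
import scipy.sparse as sp
import gurobipy as gp

# hypothetical stand-in data: two inequalities over three variables
inequalities = [[1.0, 0.0, 2.0],
                [0.0, 3.0, 0.0]]
rhs = [4.0, 5.0]

m = gp.Model()
A = sp.csr_matrix(inequalities)    # sparse storage skips the zeros automatically
b = np.array(rhs)
x = m.addMVar(A.shape[1], lb=0.0)  # one matrix variable replaces the x[index2] dict
m.addConstr(A @ x >= b)            # all rows added in a single vectorized call
m.setObjective(np.ones(A.shape[1]) @ x, gp.GRB.MINIMIZE)
m.optimize()

Since the CSR matrix stores only nonzeros, this gets the same sparsity win as the enumerate filter above, without a per-row Python loop.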

Related

Is there a chain constraint workaround for CVXPY?

I keep encountering the same issue while trying to solve an integer programming problem with cvxpy, particularly with the constraints.

Some background on my problem and use case. I am trying to write a program that optimizes cut locations for 3D objects. The goal is to have as few interfaces as possible, but there is a constraint that each section can only have a certain maximum length. To visualize, you could picture a tree: if you cut at the bottom, you only have to make one large cut, but if the tree is longer than the maximum allowed length (if you needed to move it with a trailer of a certain length, for example) you would need to make one or several more cuts along the tree. As you go further up, it is likely that, in addition to the main stem of the tree, you would need to cut some smaller side branches along the same horizontal plane. I have written a program that outputs the number of interfaces (or cuts) needed at many evenly spaced horizontal planes along the height of an object. Now I am trying to pass that data to a new piece of code that performs an integer programming optimization to determine the best location(s) to cut the tree, treating each of the horizontal cutting planes as either active or inactive.
Below is my code:
#Create ideal configuration to solve for
config = cp.Variable(layer_split_number, boolean=True)
#Create objective
objective = sum(config*layer_islands_data[0])
problem = cp.Problem(cp.Minimize(objective),[layer_height_constraint(layer_split_number,layer_height,config) <= ChunkingParameters.max_reach_z])
#solve
problem.solve(solver = cp.GLPK_MI)
The layer_height_constraint function:
def layer_height_constraint(layer_split_number, layer_height, config):
    #create array of the absolute height (relative to ground) of each layer
    layer_height_array = (np.array(range(1, layer_split_number+1))+1)*layer_height
    #set inactive cuts to 0
    active_heights = layer_height_array * config
    #filter out all 0's
    active_heights_trim = active_heights[active_heights != 0]
    #insert top and bottom values
    active_heights = np.append(active_heights, [(layer_split_number+1)*layer_height])
    active_heights_trim = np.insert(active_heights, 0, 0)
    #take the difference between active cuts to find distance
    active_heights_diff = np.diff(active_heights_trim)
    #find the maximum of those differences
    max_height = max(active_heights_diff)
    return max_height
With this setup, I get the following error:
Cannot evaluate the truth value of a constraint or chain constraints, e.g., 1 >= x >= 0.
I know that the two problem spots are the use of Python's built-in max function in the last step, and the middle step where I filter out the 0s in the array (because this introduces another equality of sorts). However, I can't really think of another way to solve this or to set up the constraints differently. Is it possible to have cvxpy just accept a value into the constraint? My function is set up to output a single maximum distance value for a given configuration, so to me it would make sense if I could just feed it the configuration being tried (an array of 0s and 1s representing inactive and active cuts respectively) for the current iteration, and the function would return a result that can then be compared to the maximum allowed distance. However, I'm pretty sure IP solvers are a bit more complex than just running a bunch of iterations, but I don't really know.
Any help or ideas would be greatly appreciated. I have tried an exhaustive search of the solution space, but with 10 or even 50+ potential cuts an exhaustive search is hopelessly inefficient: I would need to try 2^n combinations for n potential cuts.
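No answer is recorded here, but one standard way to keep the model solver-friendly is to avoid computing the max gap altogether: if every window of w consecutive planes (where w planes span the maximum allowed length) must contain at least one active cut, the distance between consecutive cuts is automatically bounded. These window constraints are affine, so cvxpy accepts them. A minimal sketch; all values are hypothetical stand-ins for the asker's data, and GLPK_MI (as in the question) assumes cvxopt/glpk is installed:

import cvxpy as cp
import numpy as np

# hypothetical stand-ins for the asker's data
layer_split_number = 20
layer_height = 0.5
max_reach_z = 3.0
cost = np.arange(1, layer_split_number + 1) % 5 + 1   # interfaces per plane

config = cp.Variable(layer_split_number, boolean=True)
w = int(max_reach_z / layer_height)   # planes that fit in one allowed section

# at least one active cut in every window of w consecutive planes,
# which bounds the gap between consecutive cuts by max_reach_z
constraints = [cp.sum(config[i:i + w]) >= 1
               for i in range(layer_split_number - w + 1)]

problem = cp.Problem(cp.Minimize(cost @ config), constraints)
problem.solve(solver=cp.GLPK_MI)
print(config.value)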

Adding +1 to specific matrix elements

I'm currently coding an algorithm for the 4-parametric RAINFLOW method. The idea of this method is to eliminate load cycles from a cycle history, which is normally given in a load (for example force) - time diagram. This is a very frequently used method in mechanical engineering to determine the life span of a product/element that is exposed to a certain number of load cycles.
However, the result of this method is a so-called FROM-TO table or FROM-TO matrix, where the rows represent the FROM values and the columns the TO values, as shown in the picture below:
[image: example of a FROM-TO table/matrix]
This example is unrealistic, as you normally get a file with millions of measurement points, which means that some cycles won't occur just once (1) or twice (2) as shown in the table; they may occur thousands of times.
Now to the problem:
I coded the algorithm of the method and as a result formed a vector with FROM values and a vector with TO values, like this:
vek_from = []
vek_to = []
d = len(a)/2
for i in range(int(d)):
    vek_from.append(a[2*i])    # FROM
    vek_to.append(a[2*i+1])    # TO
a is the vector with all values, like a=[from, to, from, to,...]
Now I'm trying to form a matrix out of this, like this:
mat_from_to = np.zeros(shape=(int(d), int(d)))
MAT = np.zeros(shape=(int(d), int(d)))
s = int(d-1)
for i in range(s):
    mat_from_to[vek_from[i]-2, vek_to[i]-2] += 1
The problem is that I don't know how to code it so that when a load cycle occurs several times (i.e. it has the same FROM-TO values), +1 is added to that FROM-TO combination each time. With what I've coded, it only replaces the previous value with 1, so I can never exceed 1.
In short: whenever a FROM-TO combination determines the position of an element in the matrix, how do I add +1 there?
Hopefully I didn't make it too complicated and someone will be happy to help me with this.
Regards,
Luka
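No answer is recorded here, but the usual NumPy idiom for accumulating counts at repeated index pairs is np.add.at: a plain fancy-indexed += such as mat[vf, vt] += 1 applies only once per duplicate pair, whereas np.add.at accumulates every occurrence. A minimal sketch with hypothetical data (the -2 index offset from the question is omitted):

import numpy as np

# hypothetical flattened cycle list: [from, to, from, to, ...]
a = [3, 5, 3, 5, 2, 4]
vek_from = np.array(a[0::2])
vek_to = np.array(a[1::2])

n = max(vek_from.max(), vek_to.max()) + 1
mat_from_to = np.zeros((n, n), dtype=int)

# accumulates correctly even when the same (FROM, TO) pair repeats
np.add.at(mat_from_to, (vek_from, vek_to), 1)
print(mat_from_to[3, 5])   # -> 2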

Mandelbrot set in Python using matplotlib + need some advice

This is my first post here, so I'm sorry if I didn't follow the rules.
I recently learned Python; I know the basics and I like writing famous sets and plotting them. I've written code for the Hofstadter sequence and a logistic sequence and succeeded in both.
Now I've tried writing the Mandelbrot sequence without any complex parameters, actually doing it "by hand".
For example, if Z(n) is my complex variable (x+iy) and C my complex parameter (c+ik),
I write the sequence as {x(n) = x(n-1)^2 - y(n-1)^2 + c ; y(n) = 2*x(n-1)*y(n-1) + k}
from math import *
import matplotlib.pyplot as plt

def mandel(p, u):
    # X, Y were missing in the original: they collect the (c, k) points kept
    X = []
    Y = []
    k = 5
    for i in range(p):
        c = 5
        k = k - 10/p
        for n in range(p):
            c = c - 10/p
            x = 0
            y = 0
            for m in range(u):
                # update both coordinates from the previous iterate;
                # the original updated x first, so y used the new x
                x, y = x*x - y*y + c, 2*x*y + k
                if sqrt(x*x + y*y) > 2:
                    break
            if sqrt(x*x + y*y) < 2:
                X = X + [c]
                Y = Y + [k]
        print(round((i/p)*100), "%")
    return plt.plot(X, Y, '.'), plt.show()
p is the width, i.e. the number of complex parameters I want along each axis, and u is the number of iterations.
This is what I get as a result: [image: the resulting plot]
I think it's getting close to what I want.
Now for my questions: how can I make the function faster, and how can I make it better?
Thanks a lot!
A good place to start would be to profile your code.
https://docs.python.org/2/library/profile.html
Using the cProfile module or the command-line profiler, you can find the inefficient parts of your code and try to optimize them. If I had to guess without personally profiling it, your array appending is probably inefficient.
You can either use a numpy array that is premade at an appropriate size, or in pure Python you can make an array with a given size (like 50) and work through that entire array. When it fills up, append that array to your main array. This reduces the number of times the array has to be rebuilt. The same could be done with a numpy array.
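A sketch of that buffering idea in isolation; the 50-element chunk size and the value stream are arbitrary:

import numpy as np

chunks = []
buf = np.empty(50)              # fixed-size buffer
n = 0
for value in range(137):        # hypothetical stream of computed points
    buf[n] = value
    n += 1
    if n == len(buf):           # buffer full: hand the whole chunk off at once
        chunks.append(buf.copy())
        n = 0
chunks.append(buf[:n].copy())   # flush the partial last chunk
points = np.concatenate(chunks)
print(points.size)              # -> 137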
Quick things you could do, though:
if sqrt(x*x+y*y)>2:
should become this
if x*x+y*y>4:
Remove calls to sqrt if you can; it's faster to just square the other side of the comparison. Multiplication is cheaper than finding roots.
Another thing you could do is this.
print (round((i/p)*100),"%")
should become this
# print (round((i/p)*100),"%")
You want faster code? Remove things not related to actually plotting it.
Also, you break out of a for loop after a comparison and then make the same comparison again. Do what you need to after the comparison and then break; no need to compute that twice.
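A sketch of how both suggestions combine, using Python's for-else (the else block runs only when the loop never broke); the parameter point and iteration count are hypothetical:

# hypothetical single parameter point (c, k) and iteration budget
c, k = -0.5, 0.5
u = 100
X, Y = [], []
x = y = 0.0
for m in range(u):
    x, y = x*x - y*y + c, 2*x*y + k
    if x*x + y*y > 4:    # squared comparison, no sqrt
        break
else:
    # runs only if the loop never broke, i.e. the point stayed bounded
    X = X + [c]
    Y = Y + [k]
print(X, Y)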

Are there any cleverly efficient algorithms to perform a calculation over the space of partitionings of a string?

I'm working on a statistical project that involves iterating over every possible way to partition a collection of strings and running a simple calculation on each. Specifically, each possible substring has a probability associated with it, and I'm trying to get the sum across all partitions of the product of the substring probability in the partition.
For example, if the string is 'abc', then there would be probabilities for 'a', 'b', 'c', 'ab', 'bc' and 'abc'. There are four possible partitionings of the string: 'abc', 'ab|c', 'a|bc' and 'a|b|c'. The algorithm needs to find the product of the component probabilities for each partitioning, then sum the four resulting numbers.
Currently, I've written a python iterator that uses binary representations of integers for the partitions (e.g. 00, 01, 10, 11 for the example above) and simply runs through the integers. Unfortunately, this is immensely slow for strings longer than 20 or so characters.
Can anybody think of a clever way to perform this operation without simply running through every partition one at a time? I've been stuck on this for days now.
In response to some comments here is some more information:
The string can be just about anything, e.g. "foobar(foo2)" -- our alphabet is lowercase alphanumerics plus all three types of braces ("(", "[", "{"), hyphens and spaces.
The goal is to get the likelihood of the string given individual 'word' likelihoods. So L(S='abc')=P('abc') + P('ab')P('c') + P('a')P('bc') + P('a')P('b')P('c') (Here "P('abc')" indicates the probability of the 'word' 'abc', while "L(S='abc')" is the statistical likelihood of observing the string 'abc').
A Dynamic Programming solution (if I understood the question right):
def dynProgSolution(text, probs):
    probUpTo = [1]
    for i in range(1, len(text)+1):
        cur = sum(v*probs[text[k:i]] for k, v in enumerate(probUpTo))
        probUpTo.append(cur)
    return probUpTo[-1]

print(dynProgSolution(
    'abc',
    {'a': 0.1, 'b': 0.2, 'c': 0.3,
     'ab': 0.4, 'bc': 0.5, 'abc': 0.6}
))
The complexity is O(N²) so it will easily solve the problem for N=20.
Why does this work:
Everything you would multiply by probs['a']*probs['b'], you would also multiply by probs['ab'].
Thanks to the distributive property of multiplication over addition, you can sum those two together and multiply this single sum by all of its continuations.
For every possible last substring, the loop adds the sum of all splits ending with that substring, by adding its probability multiplied by the sum of the probabilities of all previous paths.
First, profile to find the bottleneck.
If the bottleneck is simply the massive number of possible partitions, I recommend parallelization, possibly via multiprocessing. If that's still not enough, you might look into a Beowulf cluster.
If the bottleneck is just that the calculation is slow, try shelling out to C. It's pretty easy to do via ctypes.
Also, I'm not really sure how you're storing the partitions, but you could probably squash memory consumption a pretty good bit by using one string and a suffix array. If your bottleneck is swapping and/or cache misses, that might be a big win.
Your substrings are going to be reused over and over again by the longer strings, so caching the values using a memoizing technique seems like an obvious thing to try. This is just a time-space trade off. The simplest implementation is to use a dictionary to cache values as you calculate them. Do a dictionary lookup for every string calculation; if it's not in the dictionary, calculate and add it. Subsequent calls will make use of the pre-computed value. If the dictionary lookup is faster than the calculation, you're in luck.
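In Python, functools gives you that dictionary cache for free; a minimal sketch of the memoized recursion, using the example probabilities from the answer above:

from functools import lru_cache

probs = {'a': 0.1, 'b': 0.2, 'c': 0.3,
         'ab': 0.4, 'bc': 0.5, 'abc': 0.6}

@lru_cache(maxsize=None)
def likelihood(s):
    # empty suffix contributes a factor of 1; otherwise sum over
    # every possible first word, recursing on the (cached) tail
    if not s:
        return 1.0
    return sum(probs[s[:i]] * likelihood(s[i:])
               for i in range(1, len(s) + 1))

print(likelihood('abc'))   # 0.6 + 0.4*0.3 + 0.1*0.5 + 0.1*0.2*0.3 = 0.776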
I realise you are using Python, but... as a side note that may be of interest, if you do this in Perl, you don't even have to write any code; the built-in Memoize module will do the caching for you!
You may get a minor reduction of the amount of computation by a small refactoring based on associative properties of arithmetic (and string concatenation) though I'm not sure it will be a life-changer. The core idea would be as follows:
consider a longish string, e.g. 'abcdefghik', 10 characters long, for definiteness w/o loss of generality. In a naive approach you'd be multiplying p(a) by the many partitions of the 9-tail, p(ab) by the many partitions of the 8-tail, etc; in particular, p(a)*p(b) will be multiplying exactly the same partitions of the 8-tail (all of them) as p(ab) will -- 3 multiplications and 2 sums among them. So factor that out:
(p(ab) + p(a) * p(b)) * (partitions of the 8-tail)
and we're down to 2 multiplications and 1 sum for this part, having saved 1 product and 1 sum, to cover all partitions with a split point just right of 'b'. When it comes to partitions with a split just right of 'c',
(p(abc) + p(ab)*p(c) + p(a)*(p(b)*p(c) + p(bc))) * (partitions of the 7-tail)
the savings mount, partly thanks to the internal refactoring -- though of course one must be careful about double-counting.
I'm thinking this approach may be generalized: start with the midpoint and consider all partitions that have a split there, separately (and recursively) for the left and right parts, multiplying and summing; then add all partitions that DON'T have a split there. In the example, with the halves being 'abcde' on the left and 'fghik' on the right, the second part covers all partitions where 'ef' stay together rather than apart -- so "collapse" all those probabilities by treating 'ef' as a new 'superletter' X, leaving a string one character shorter, 'abcdXghik' (the probabilities for the substrings of THAT map directly to the originals, e.g. p(cdXg) in the new string is exactly p(cdefg) in the original).
You should look into the itertools module. It can create a generator for you that is very fast. Given your input string, it will provide you with all possible permutations. Depending on what you need, there is also a combinations() generator. I'm not quite sure if you're looking at 'b|ca' when you're looking at 'abc', but either way, this module may prove useful to you.

How to tractably solve the assignment optimisation task

I'm working on a script that takes the elements from companies and pairs them up with the elements of people. The goal is to optimize the pairings such that the sum of all pair values is maximized (the value of each individual pairing is precomputed and stored in the dictionary ctrPairs).
The pairing is 1:1: each company has exactly one person, each person belongs to exactly one company, and the number of companies equals the number of people. I used a top-down approach with a memoization table (memDict) to avoid recomputing subproblems that have already been solved.
I believe that I could vastly improve the speed of what's going on here, but I'm not really sure how. Areas I'm worried about are marked with #slow?; any advice would be appreciated (the script works for lists with n < 15, but it gets incredibly slow for n > ~15).
def getMaxCTR(companies, people):
    if memDict.has_key((companies, people)):
        return memDict[(companies, people)]  # here's where we return the memoized version if it exists
    if not len(companies) or not len(people):
        return 0
    maxCTR = None
    remainingCompanies = companies[1:len(companies)]  # slow?
    for p in people:
        remainingPeople = list(people)  # slow?
        remainingPeople.remove(p)  # slow?
        ctr = ctrPairs[(companies[0], p)] + getMaxCTR(remainingCompanies, tuple(remainingPeople))  # recurse
        if ctr > maxCTR:
            maxCTR = ctr
    memDict[(companies, people)] = maxCTR
    return maxCTR
To all those who wonder about the use of learning theory, this question is a good illustration. The right question is not about a "fast way to bounce between lists and tuples in python"; the reason for the slowness is something deeper.
What you're trying to solve here is known as the assignment problem: given two lists of n elements each and n×n values (the value of each pair), how to assign them so that the total "value" is maximized (or equivalently, minimized). There are several algorithms for this, such as the Hungarian algorithm (Python implementation), or you could solve it using more general min-cost flow algorithms, or even cast it as a linear program and use an LP solver. Most of these would have a running time of O(n³).
What your algorithm above does is to try each possible way of pairing them. (The memoisation only helps to avoid recomputing answers for pairs of subsets, but you're still looking at all pairs of subsets.) This approach is at least Ω(n²·2²ⁿ). For n=16, n³ is 4096 and n²·2²ⁿ is 1099511627776. There are constant factors in each algorithm of course, but see the difference? :-) (The approach in the question is still better than the naive O(n!), which would be much worse.) Use one of the O(n³) algorithms, and I predict it should run in time for up to n=10000 or so, instead of just up to n=15.
"Premature optimization is the root of all evil", as Knuth said, but so is delayed/overdue optimization: you should first carefully consider an appropriate algorithm before implementing it, not pick a bad one and then wonder what parts of it are slow. :-) Even badly implementing a good algorithm in Python would be orders of magnitude faster than fixing all the "slow?" parts of the code above (e.g., by rewriting in C).
I see two issues here:
Efficiency: you're recreating the same remainingPeople sublists for each company. It would be better to create all the remainingPeople and all the remainingCompanies once and then work through the combinations.
Memoization: you're using tuples instead of lists so you can use them as dict keys for memoization, but tuple equality is order-sensitive; IOW, (1,2) != (2,1). You'd be better off using frozensets for this: frozenset((1,2)) == frozenset((2,1)).
This line:
remainingCompanies = companies[1:len(companies)]
Can be replaced with this line:
remainingCompanies = companies[1:]
For a very slight speed increase. That's the only improvement I see.
If you want to get a copy of a tuple as a list you can do
mylist = list(mytuple)
