Pyomo Model LP file with variable values - python

I've built a Pyomo model, and I am writing the LP file of the model with the following:
# write LP file
filename = os.path.join(os.path.dirname(__file__), 'model.lp')
model.write(filename, io_options={'symbolic_solver_labels': True})
I get the model.lp file in the folder, and it looks like the following:
\* Source Pyomo model name=urbs *\
min
obj:
+1 costs(Environmental)
+1 costs(Fixed)
+1 costs(Fuel)
+1 costs(Invest)
+1 costs(Variable)
s.t.
c_e_res_vertex(1_Mid_Biomass_Stock)_:
+1 e_co_stock(1_Mid_Biomass_Stock)
-1 e_pro_in(1_Mid_Biomass_plant_Biomass)
= 0
c_e_res_vertex(1_Mid_Coal_Stock)_:
+1 e_co_stock(1_Mid_Coal_Stock)
-1 e_pro_in(1_Mid_Coal_plant_Coal)
= 0
My problem is that I would also like to save the variable values of the model.
Is there a way to force the solver to write the values of the model's variables into the LP file, or something else that achieves the same thing in a different manner?

Two ways come to my mind.
Good old search and replace
With your LP file, do a search and replace. For example, the variable x with indices 2 and 'productA' (in Pyomo: model.x[2,'productA']) is written in the LP file as x(2_productA). Knowing this, for each variable and for each of its indices, generate its name in the LP format, search for all occurrences of it in the LP file, and replace them with its value.
If lpFileContent is the string that is contained in your LP file, this will look like:
for v in model.component_objects(Var, active=True):
    varName = str(v)
    varObject = getattr(model, varName)
    for index in varObject:
        indexStr = str(index)
        # Convert your index string to the LP-file naming convention:
        indexStr = indexStr.replace(",", "_")
        indexStr = indexStr.replace(" ", "_")
        indexStr = indexStr.replace("'", "")  # Add more as you need.
        # Replace the variable name by its value (str.replace returns a new string):
        lpFileContent = lpFileContent.replace(
            varName + "(" + indexStr + ")", str(varObject[index].value))
with open("output.txt", "w") as outputFile:
    outputFile.write(lpFileContent)
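For completeness, lpFileContent can be obtained by reading back the LP file written at the top of the question, reusing the same filename variable:
with open(filename, "r") as lpFile:
    lpFileContent = lpFile.read()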
Using expressions
When defining constraints, it is common to do it this way (from the Pyomo docs):
model.A = RangeSet(1,10)
model.a = Param(model.A, within=PositiveReals)
model.ToBuy = Var(model.A)

def bud_rule(model, i):
    return model.a[i]*model.ToBuy[i] <= i

aBudget = Constraint(model.A, rule=bud_rule)
Then, it is always possible to retrieve the expression of this constraint by doing this little trick:
model.A = RangeSet(1,10)
model.a = Param(model.A, within=PositiveReals)
model.ToBuy = Var(model.A)

def bud_rule(model, i):
    print(str(model.a[i]*model.ToBuy[i]) + " <= " + str(i))
    return model.a[i]*model.ToBuy[i] <= i

aBudget = Constraint(model.A, rule=bud_rule)
which will yield something like
1 * model.ToBuy[1] <= 1
2 * model.ToBuy[2] <= 2
3 * model.ToBuy[3] <= 3
4 * model.ToBuy[4] <= 4
... #It goes on
10 * model.ToBuy[10] <= 10
I believe this is also something you can use (with search and replace, or by printing variable values while rebuilding your constraints after solving). It gives you more ability to customize your output and is easy to debug, but it will be extremely slow if the expression is long (like a summation over thousands of elements).
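If embedding the values in the LP file itself is not strictly required, a simpler alternative is to dump the values of the solved model's variables to a separate file. A minimal sketch (the output file name values.txt is just an example):
from pyomo.environ import Var

with open("values.txt", "w") as f:
    for v in model.component_objects(Var, active=True):
        for index in v:
            # v[index].value is None until the model has been solved
            f.write("{}[{}] = {}\n".format(v.name, index, v[index].value))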

Related

Longest common substring with rolling hash

I am implementing in Python 3 an algorithm to find the longest common substring of two strings s and t. Given s and t, I need to return (a, b, l) where l is the length of the longest common substring, a is the position in s where it starts, and b is the position in t where it starts. I have a working version of the algorithm, but it is quite slow and I am not sure why; it is frustrating because I have found other Python implementations using pretty much the same logic that are many times faster. I am self-learning, so any help would be greatly appreciated.
The approach is based on comparing hash values rather than comparing substrings directly, and on using binary search to find the maximal length of a common substring. Here is the code for my hash function (m is a big prime and x is just some constant):
def polynomial_hash(my_string, m, x):
    str_len = len(my_string)
    result = 0
    for i in range(str_len):
        result = (result + ord(my_string[i]) * power_mod_p(x, i, m)) % m
    return result
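(power_mod_p is not listed here; it is assumed to be plain modular exponentiation, equivalent to Python's built-in three-argument pow:)
def power_mod_p(x, i, m):
    # x**i mod m; Python's built-in pow already does this efficiently
    return pow(x, i, m)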
Given two strings s and t, I first find which string is shorter; without loss of generality, let s be the shorter string. First I need to find the hash values of the substrings of a string. I use the following function, implemented as a generator:
def all_length_k_hashes(my_string, k, m, x):
    current_position = len(my_string) - k
    x_to_the_k = power_mod_p(x, k, m)
    hash_value = polynomial_hash(my_string[current_position:], m, x)
    yield (hash_value, current_position)
    while current_position > 0:
        current_position = current_position - 1
        hash_value = ((hash_value * x) + ord(my_string[current_position]) - x_to_the_k*ord(my_string[current_position + k])) % m
        yield (hash_value, current_position)
This function is simple: its first yield is the hash value of the final length-k substring of the string; after that, each iteration yields the hash value of the next length-k substring one position to the left (for example, for k=3 in abcdefghi we go from ghi to fgh, and then from fgh to efg). This should be able to calculate the hash values of all length-k substrings of my_string in O(|my_string|).
Now, to find out whether s and t have a length-k substring in common, I use the following function:
def common_sub_string_length_k(shorter_str, longer_str, k, m, x):
    short_str_dict = dict()
    for hash_and_index in all_length_k_hashes(shorter_str, k, m, x):
        short_str_dict.update({hash_and_index[0]: hash_and_index[1]})
    hash_generator_longer_str = all_length_k_hashes(longer_str, k, m, x)
    for hash_and_index in hash_generator_longer_str:
        if hash_and_index[0] in short_str_dict:
            return (short_str_dict[hash_and_index[0]], hash_and_index[1])
    return False
What happens in this function is: I create an empty Python dictionary and fill it with key:value pairs such that each key is the hash value of a length-k substring of the shorter string and its value is that substring's starting index; I call this short_str_dict.
Then, using all_length_k_hashes, I create a generator of hash values of length-k substrings of the longer string, and I iterate through this generator to check whether any hash value is in short_str_dict. If there is one, then the two strings have a substring of length k in common (assuming no hash collisions). This whole process should take O(|shorter_string| + |longer_string|) time.
Finally, the following function repeatedly uses the previous process to find the maximal k, using a binary search technique:
from random import randint

def longest_common_substring(str_1, str_2):
    m_1 = 309000599
    m_2 = 988017827
    x = randint(1, 10 ** 6)
    len_str_1 = len(str_1)
    len_str_2 = len(str_2)
    if len_str_1 <= len_str_2:
        short_str = str_1
        long_str = str_2
        switched = False
    else:
        short_str = str_2
        long_str = str_1
        switched = True
    len_short_str = len(short_str)
    len_long_str = len(long_str)
    low = 0
    high = len_short_str
    mid = 0
    longest_so_far = 0
    longest_indices = (0,0)
    while low <= high:
        mid = (high + low) // 2
        m1_result = common_sub_string_length_k(short_str, long_str, mid, m_1, x)
        m2_result = common_sub_string_length_k(short_str, long_str, mid, m_2, x)
        if m1_result is False or m2_result is False:
            high = mid - 1
        else:
            longest_so_far = mid
            longest_indices = m1_result
            low = mid + 1
    if switched:
        return (longest_indices[1], longest_indices[0], longest_so_far)
    else:
        return (longest_indices[0], longest_indices[1], longest_so_far)
Two different hashes are used to reduce the probability of a collision. So in total, assuming no collisions, this whole process should take
O(log|shorter_string|) * O(|shorter_string| + |longer_string|).
Have I made any error? Is it slow because of the use of Python dictionaries? I really want to understand my mistake. Any help is greatly appreciated.
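For reference, a quick sanity check of the function (assuming power_mod_p is the modular exponentiation noted above):
# "abcd" is the longest common substring; it starts at index 1 in both strings,
# so, barring hash collisions, this should print (1, 1, 4)
print(longest_common_substring("xabcdy", "zabcdw"))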

Error while writing optimization constraint in cplex python api

My goal is to write the following model using docplex.mp.model in Python, where p_tj is a binary variable taking values in {0,1}:
sum_{j=1}^{t} p_tj = 1,   for t = 1, ..., 8
here is the code I wrote:
N = 8
period_list = [t for t in range(1, no_of_period+1)]
j = period_list
p = Model.binary_var_dict(period_list, name = 'p')
for t in period_list:
    for j in range(1,t+1):
        Model.add_constraints(Model.sum(p[t,j]) == 1)
but I got an error. Could anyone help me with this problem please?
Your code has numerous issues.
First, you need to create one instance of docplex.mp.model.Model to add constraints to: all your calls to Model.<fn> should be rewritten as mdl.<fn>, since they are instance methods.
Second, the variable dict you create has periods as keys, that is, 1, 2, ..., P, so querying p[t,j] is sure to crash with a KeyError. If you need a square matrix of variables indexed by couples of periods, use Model.binary_var_matrix.
Third, Model.add_constraints (with a final s) expects an iterable, but you are passing a single constraint; this is also sure to crash.
Lastly, using ranges starting at 1 is neither the simplest nor the safest choice with Docplex.
Here is some code, freely derived from your sample, which I guess is close to what you need:
pr = range(1, no_of_period+1)
from docplex.mp.model import Model
m = Model()
p = m.binary_var_matrix(pr, pr, name = 'p')
m.add_constraints( (m.sum(p[t,j] for j in pr) == 1) for t in pr)
print(m.lp_string)
and the output is:
Minimize
obj:
Subject To
c1: p_1_1 + p_1_2 + p_1_3 = 1
c2: p_2_1 + p_2_2 + p_2_3 = 1
c3: p_3_1 + p_3_2 + p_3_3 = 1
[..snip..]
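If the chosen assignments are needed afterwards, they can be read back from the variables once the model has been solved; a small sketch, assuming the model above is feasible:
m.solve()
chosen = [(t, j) for t in pr for j in pr if p[t, j].solution_value > 0.5]
print(chosen)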

Longest subset of five-positions five-elements permutations, only one-element-position in common

I am trying to get the longest possible list of tuples of five ordered positions, each position taking a value from 1 to 5, satisfying the condition that no two members of the list share the same value at more than one position (index). I.e., 11111 and 12222 are permitted (only the 1 at index 0 is shared), but 11111 and 11222 are not permitted (same value at index 0 and index 1).
I have tried a brute-force attack, starting with the complete list of permutations, 3125 members, and walking through the list element by element, rejecting the ones that do not match the criteria, in several steps:
step one: testing elements 2 to 3125 against element 1, getting a new shorter list L';
step two: testing elements 3 to N' against element 2 of L', getting a shorter list yet, L'';
and so on.
I get a 17-member solution, which is perfectly valid. The problem is that:
I know there are, at least, two 25-member valid solutions, found by a matter of good luck.
The solution found by this brute-force method depends strongly on the initial order of the 3125-member list; by shuffling the initial list I have found solutions ranging from 12 to 21 members, but I have never hit the 25-member solutions.
Could anyone please shed some light on the problem? Thank you.
This is my approach so far
import csv, random

maxv = 0
soln = 0
for p in range(0,1): # Intended to run multiple times
    z = -1
    while True:
        z = z + 1
        file1 = 'Step' + "%02d" % (z+0) + '.csv'
        file2 = 'Step' + "%02d" % (z+1) + '.csv'
        nextdata = []
        with open(file1, 'r') as csv_file:
            data = list(csv.reader(csv_file))
        #if file1 == 'Step00.csv': # related to p loop
        #    random.shuffle(data)
        i = 0
        while i <= z:
            nextdata.append(data[i])
            i = i + 1
        for j in range(z, len(data)):
            sum = 0
            for k in range(0,5):
                if (data[z][k] == data[j][k]):
                    sum = sum + 1
            if sum < 2:
                nextdata.append(data[j])
        ofile = open(file2, 'wb')
        writer = csv.writer(ofile)
        writer.writerows(nextdata)
        ofile.close()
        if (len(nextdata) < z + 1 + 1):
            if (z+1) >= maxv:
                maxv = z + 1
                print maxv
                ofile = open("Solution" + "%02d" % soln + '.csv', 'wb')
                writer = csv.writer(ofile)
                writer.writerows(nextdata)
                ofile.close()
                soln = soln + 1
            break
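For reference, the pairwise condition itself can be checked with a few lines of Python; this is a hypothetical helper (not part of the code above) that verifies a candidate list of 5-tuples:
from itertools import combinations

def is_valid(rows):
    # True if no two rows share the same value at more than one position
    return all(
        sum(a == b for a, b in zip(r1, r2)) <= 1
        for r1, r2 in combinations(rows, 2)
    )

# Example: the first pair is allowed, the second is not
print(is_valid([(1,1,1,1,1), (1,2,2,2,2)]))  # True
print(is_valid([(1,1,1,1,1), (1,1,2,2,2)]))  # False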
Here is a Picat model for the problem (as I understand it): http://hakank.org/picat/longest_subset_of_five_positions.pi It uses constraint modelling and a SAT solver.
Edit: Here is a MiniZinc model: http://hakank.org/minizinc/longest_subset_of_five_positions.mzn
The model (predicate go/0) checks lengths from 2 to 100. All lengths between 2 and 25 have at least one solution (probably a lot more). So 25 is the longest subsequence. Here is one 25-length solution:
{1,1,1,3,4}
{1,2,5,1,5}
{1,3,4,4,1}
{1,4,2,2,2}
{1,5,3,5,3}
{2,1,3,2,1}
{2,2,4,5,4}
{2,3,2,1,3}
{2,4,1,4,5}
{2,5,5,3,2}
{3,1,2,5,5}
{3,2,3,4,2}
{3,3,5,2,4}
{3,4,4,3,3}
{3,5,1,1,1}
{4,1,4,1,2}
{4,2,1,2,3}
{4,3,3,3,5}
{4,4,5,5,1}
{4,5,2,4,4}
{5,1,5,4,3}
{5,2,2,3,1}
{5,3,1,5,2}
{5,4,3,1,4}
{5,5,4,2,5}
There are a lot of different 25-length solutions (the predicate go2/0 checks that).
Here is the complete model (edited from the file above):
import sat.

main => go.

%
% Test all lengths from 2..100.
% 25 is the longest.
%
go ?=>
  nolog,
  foreach(M in 2..100)
    println(check=M),
    if once(check(M,_X)) then
      println(M=ok)
    else
      println(M=not_ok)
    end,
    nl
  end,
  nl.
go => true.

%
% Check if there is a solution with M numbers
%
check(M, X) =>
  N = 5,
  X = new_array(M,N),
  X :: 1..5,
  foreach(I in 1..M, J in I+1..M)
    % at most 1 same number in the same position
    sum([X[I,K] #= X[J,K] : K in 1..N]) #<= 1,
    % symmetry breaking: sort the sub sequence
    lex_lt(X[I],X[J])
  end,
  solve([ff,split],X),
  foreach(Row in X)
    println(Row)
  end,
  nl.

Python CPLEX API: Conditional iteration on binary variables

I'm working on a graph-theoretical problem. Suppose we want to find a graph G=(V,E) such that there exists a partition X of V containing at most k equivalence classes. A variable p_S takes value 1 exactly when S is a member of the partition X, and zero otherwise. So we have a constraint that the sum of the variables p_S, over all subsets S of V, is at most k.
What I want to do is iterate over all p_S that have value 1 and define more constraints based on the elements I draw out of S. These constraints would ensure that members of an equivalence class share some mutual property.
Is it possible to access the p_S variables that way, and how could I do it?
Alternatively, I know I can do without iterating over my binary variables if I'm allowed to use binary variables as coefficients in my constraints. Is that possible?
Thanks in advance!
The CPLEX Python API is index-based. To iterate over all binary variables with a solution value of 1, we need to query the variable types and the solution values and filter accordingly. Here is a simple example:
import sys
import cplex

def main(modelfile):
    # Read in a model file.
    c = cplex.Cplex()
    c.read(modelfile)
    # Solve the model and print the solution and status.
    c.solve()
    print("Solution value:", c.solution.get_objective_value())
    print("Solution status: {0} ({1})".format(
        c.solution.get_status_string(),
        c.solution.get_status()))
    # Display all binary variables that have a solution value of 1.
    types = c.variables.get_types()
    nvars = c.variables.get_num()
    binvars = [idx for idx, typ
               in zip(range(nvars), c.variables.get_types())
               if typ == c.variables.type.binary]
    inttol = c.parameters.mip.tolerances.integrality.get()
    binvars_at_one = [idx for idx, val
                      in zip(binvars, c.solution.get_values(binvars))
                      if abs(val - 1.0) <= inttol]
    print("Binary variables with a solution value equal to one:")
    for varname in c.variables.get_names(binvars_at_one):
        print("  ", varname)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        raise ValueError("usage: {0} <model>".format(sys.argv[0]))
    main(sys.argv[1])
For more, see the documentation for Cplex.variables and Cplex.solution.
Here is a linearization example for the product of two binary variables (z == x*y): the constraints below force z to equal 1 exactly when both x and y are 1.
from docplex.mp.model import Model

mdl = Model(name='binaryproduct')
x = mdl.binary_var(name='x')
y = mdl.binary_var(name='y')
z = mdl.binary_var(name='z')
# z == x*y
mdl.add_constraint(x + y <= 1 + z, 'ct1')
mdl.add_constraint(z <= x, 'ct2')
mdl.add_constraint(z <= y, 'ct3')
# end of z == x*y
mdl.solve()
for v in mdl.iter_binary_vars():
    print(v, " = ", v.solution_value)

Calculating Incremental Entropy for Data that is not real numbers

I have a set of data which has an ID, a timestamp, and identifiers. I have to go through it, calculate the entropy, and save some other links for the data. At each step, more identifiers are added to the identifiers dictionary and I have to re-compute the entropy and append it. I have a really large amount of data and the program gets stuck due to the growing number of identifiers and their entropy calculation after each step. I read the following solution, but it is about data consisting of numbers:
Incremental entropy computation
I have copied two functions from that page, and the incremental calculation of entropy gives different values than the classical full entropy calculation at every step.
Here is the code I have:
from math import log

# ---------------------------------------------------------------------#
# Functions copied from https://stackoverflow.com/questions/17104673/incremental-entropy-computation

# maps x to -x*log2(x) for x>0, and to 0 otherwise
h = lambda p: -p*log(p, 2) if p > 0 else 0

# entropy of union of two samples with entropies H1 and H2
def update(H1, S1, H2, S2):
    S = S1+S2
    return 1.0*H1*S1/S+h(1.0*S1/S)+1.0*H2*S2/S+h(1.0*S2/S)

# compute entropy using the classic equation
def entropy(L):
    n = 1.0*sum(L)
    return sum([h(x/n) for x in L])
# ---------------------------------------------------------------------#

# Below is the input data (Actually I read it from a csv file)
input_data = [["1","2008-01-06T02:13:38Z","foo,bar"], ["2","2008-01-06T02:12:13Z","bar,blup"], ["3","2008-01-06T02:13:55Z","foo,bar"],
              ["4","2008-01-06T02:12:28Z","foo,xy"], ["5","2008-01-06T02:12:44Z","foo,bar"], ["6","2008-01-06T02:13:00Z","foo,bar"],
              ["7","2008-01-06T02:13:00Z","x,y"]]

total_identifiers = {} # To store the occurrences of identifiers. Values shows the number of occurrences
all_entropies = []     # Classical way of calculating entropy at every step
updated_entropies = [] # Incremental way of calculating entropy at every step

for item in input_data:
    temp = item[2].split(",")
    identifiers_sum = sum(total_identifiers.values()) # Sum of all identifiers
    old_entropy = 0 if all_entropies[-1:] == [] else all_entropies[-1] # Get previous entropy calculation
    for identifier in temp:
        S_new = len(temp) # sum of new samples
        temp_dictionaty = {a:1 for a in temp} # Store current identifiers and their occurrence
        if identifier not in total_identifiers:
            total_identifiers[identifier] = 1
        else:
            total_identifiers[identifier] += 1
    current_entropy = entropy(total_identifiers.values()) # Entropy for current set of identifiers
    updated_entropy = update(old_entropy, identifiers_sum, current_entropy, S_new)
    updated_entropies.append(updated_entropy)
    entropy_value = entropy(total_identifiers.values()) # Classical entropy calculation for comparison. This step becomes too expensive with big data
    all_entropies.append(entropy_value)

print(total_identifiers)
print('Sum of Total Identifiers: ', identifiers_sum) # Gives 12 while the sum is 14 ???
print("All Classical Entropies: ", all_entropies)    # print for comparison
print("All Updated Entropies: ", updated_entropies)
The other issue is that when I print "Sum of Total Identifiers", it gives 12 instead of 14! (Due to the very large amount of data, I read the actual file line by line and write the results directly to disk; nothing is stored in memory apart from the dictionary of identifiers.)
The code above uses Theorem 4; it seems to me that you want to use Theorem 5 instead (from the paper referenced in the next paragraph).
Note, however, that if the number of identifiers really is the problem, then the incremental approach below isn't going to work either: at some point the dictionaries are going to get too large.
Below you can find a proof-of-concept Python implementation that follows the description from Updating Formulas and Algorithms for Computing Entropy and Gini Index from Time-Changing Data Streams.
import collections
import math
import random

def log2(p):
    return math.log(p, 2) if p > 0 else 0

CountChange = collections.namedtuple('CountChange', ('label', 'change'))

class EntropyHolder:
    def __init__(self):
        self.counts_ = collections.defaultdict(int)
        self.entropy_ = 0
        self.sum_ = 0

    def update(self, count_changes):
        r = sum([change for _, change in count_changes])
        residual = self._compute_residual(count_changes)
        self.entropy_ = self.sum_ * (self.entropy_ - log2(self.sum_ / (self.sum_ + r))) / (self.sum_ + r) - residual
        self._update_counts(count_changes)
        return self.entropy_

    def _compute_residual(self, count_changes):
        r = sum([change for _, change in count_changes])
        residual = 0
        for label, change in count_changes:
            p_new = (self.counts_[label] + change) / (self.sum_ + r)
            p_old = self.counts_[label] / (self.sum_ + r)
            residual += p_new * log2(p_new) - p_old * log2(p_old)
        return residual

    def _update_counts(self, count_changes):
        for label, change in count_changes:
            self.sum_ += change
            self.counts_[label] += change

    def entropy(self):
        return self.entropy_

def naive_entropy(counts):
    s = sum(counts)
    return sum([-(r/s) * log2(r/s) for r in counts])

if __name__ == '__main__':
    print(naive_entropy([1, 1]))
    print(naive_entropy([1, 1, 1, 1]))

    entropy = EntropyHolder()
    freq = collections.defaultdict(int)
    for _ in range(100):
        index = random.randint(0, 5)
        entropy.update([CountChange(index, 1)])
        freq[index] += 1

    print(naive_entropy(freq.values()))
    print(entropy.entropy())
Thanks @blazs for providing the EntropyHolder class. That solves the problem. The idea is to import entropy_holder.py from https://gist.github.com/blazs/4fc78807a96976cc455f49fc0fb28738 and use it to store the previous entropy and update it at every step when new identifiers come in.
So the minimum working code would look like this:
import entropy_holder

input_data = [["1","2008-01-06T02:13:38Z","foo,bar"], ["2","2008-01-06T02:12:13Z","bar,blup"], ["3","2008-01-06T02:13:55Z","foo,bar"],
              ["4","2008-01-06T02:12:28Z","foo,xy"], ["5","2008-01-06T02:12:44Z","foo,bar"], ["6","2008-01-06T02:13:00Z","foo,bar"],
              ["7","2008-01-06T02:13:00Z","x,y"]]

entropy = entropy_holder.EntropyHolder() # This class will hold the current entropy and counts of identifiers
for item in input_data:
    for identifier in item[2].split(","):
        entropy.update([entropy_holder.CountChange(identifier, 1)])

print(entropy.entropy())
The entropy computed with blazs's incremental formulas is very close to the entropy calculated the classical way, and it avoids iterating over all the data again and again.
