Translating Python Code from gurobipy to PuLP

I'm new to PuLP and to LP in general. While translating code written for the gurobipy library so that it can be used with PuLP, I got stuck on the following gurobipy code, which creates the variables.
# Create variables.
# x[i, j] is 1 if the edge i->j is on the optimal tour, and 0 otherwise.
x = {}
for i in range(n):
    for j in range(i+1):
        x[i,j] = m.addVar(obj=dist[i][j], vtype=GRB.BINARY,
                          name='x'+str(i)+'_'+str(j))
        x[j,i] = x[i,j]
m.addVar allows the objective coefficient to be defined using the obj parameter. How can the same be done in PuLP? Its docs for pulp.LpVariable do not seem to have a similar parameter...
Also, is there any example code for solving the TSP in Python using PuLP? That would help a lot as a reference!
Here's my code so far, without handling subtours. The values of the decision variables x_ij seem very wrong: x_ij is 1.0 whenever i == j. Is my attempt correct so far?
Result
0_0: 1.0
0_5: 1.0
1_1: 1.0
1_15: 1.0
2_2: 1.0
2_39: 1.0
3_3: 1.0
3_26: 1.0
4_4: 1.0
4_42: 1.0
5_5: 1.0
5_33: 1.0
6_6: 1.0
6_31: 1.0
7_7: 1.0
7_38: 1.0
8_8: 1.0
8_24: 1.0
9_9: 1.0
9_26: 1.0
10_4: 1.0
10_10: 1.0
11_11: 1.0
11_12: 1.0
12_11: 1.0
12_12: 1.0
13_13: 1.0
13_17: 1.0
14_14: 1.0
14_18: 1.0
15_1: 1.0
15_15: 1.0
16_3: 1.0
16_16: 1.0
17_13: 1.0
17_17: 1.0
18_14: 1.0
18_18: 1.0
19_19: 1.0
19_20: 1.0
20_4: 1.0
20_20: 1.0
21_21: 1.0
21_25: 1.0
22_22: 1.0
22_27: 1.0
23_21: 1.0
23_23: 1.0
24_8: 1.0
24_24: 1.0
25_21: 1.0
25_25: 1.0
26_26: 1.0
26_43: 1.0
27_27: 1.0
27_38: 1.0
28_28: 1.0
28_47: 1.0
29_29: 1.0
29_31: 1.0
30_30: 1.0
30_34: 1.0
31_29: 1.0
31_31: 1.0
32_25: 1.0
32_32: 1.0
33_28: 1.0
33_33: 1.0
34_30: 1.0
34_34: 1.0
35_35: 1.0
35_42: 1.0
36_36: 1.0
36_47: 1.0
37_36: 1.0
37_37: 1.0
38_27: 1.0
38_38: 1.0
39_39: 1.0
39_44: 1.0
40_40: 1.0
40_43: 1.0
41_41: 1.0
41_45: 1.0
42_4: 1.0
42_42: 1.0
43_26: 1.0
43_43: 1.0
44_39: 1.0
44_44: 1.0
45_15: 1.0
45_45: 1.0
46_40: 1.0
46_46: 1.0
47_28: 1.0
47_47: 1.0
...
PuLP Code
import csv
import pulp

def get_dist(tsp):
    with open(tsp, newline='') as tspfile:
        r = csv.reader(tspfile, delimiter='\t')
        d = [row for row in r]
    d = d[1:]  # skip header row
    locs = set([r[0] for r in d]) | set([r[1] for r in d])
    loc_map = {l: i for i, l in enumerate(locs)}
    idx_map = {i: l for i, l in enumerate(locs)}
    dist = [(loc_map[r[0]], loc_map[r[1]], r[2]) for r in d]
    return dist, idx_map

def dist_from_coords(dist, n):
    points = []
    for i in range(n):
        points.append([0] * n)
    for i, j, v in dist:
        points[i][j] = points[j][i] = float(v)
    return points

def find_tour():
    tsp_file = '/Users/test/' + 'my-waypoints-dist-dur.tsv'
    coords, idx_map = get_dist(tsp_file)
    n = len(idx_map)
    dist = dist_from_coords(coords, n)

    # Define the problem
    m = pulp.LpProblem('TSP', pulp.LpMinimize)

    # Create variables
    # x[i,j] is 1 if edge i->j is on the optimal tour, and 0 otherwise
    # Also forbid loops
    x = {}
    for i in range(n):
        for j in range(n):
            lowerBound = 0
            upperBound = 1
            # Forbid loops
            if i == j:
                upperBound = 0
                print(i, i)
            x[i,j] = pulp.LpVariable('x' + str(i) + '_' + str(j), lowerBound, upperBound, pulp.LpBinary)
            x[j,i] = x[i,j]

    # Define the objective function to minimize
    m += pulp.lpSum([dist[i][j] * x[i,j] for i in range(n) for j in range(n)])

    # Add degree-2 constraint
    for i in range(n):
        m += pulp.lpSum([x[i,j] for j in range(n)]) == 2

    status = m.solve()
    print(pulp.LpStatus[status])
    for i in range(n):
        for j in range(n):
            if pulp.value(x[i,j]) > 0:
                print(str(i) + '_' + str(j) + ': ' + str(pulp.value(x[i,j])))

find_tour()
my-waypoints-dist-dur.tsv (Full version)
waypoint1 waypoint2 distance_m duration_s
Michigan State Capitol, Lansing, MI 48933 Rhode Island State House, 82 Smith Street, Providence, RI 02903 1242190 41580
Minnesota State Capitol, St Paul, MN 55155 New Mexico State Capitol, Santa Fe, NM 87501 1931932 64455

Two things change in the variable creation:
First, I changed the variable names to use slightly more Pythonic string formatting.
Second, x[j,i] = x[i,j] is incorrect. Assignment in Python binds a name to an object; it does not copy the object. After this statement, x[i,j] and x[j,i] both refer to the same LpVariable, so whatever value the solver gives x[i,j] is also the value of x[j,i].
In terms of the Traveling Salesperson Problem, that means that if you go from A-->B (i.e. x[A,B] == 1), you also travel from B-->A, which is why so many of your path variables come out as 1.0.
The corrected variable definition then becomes:
x[i,j] = pulp.LpVariable('x_%s_%s'%(i,j), lowerBound, upperBound, pulp.LpBinary)
x[j,i] = pulp.LpVariable('x_%s_%s'%(j,i), lowerBound, upperBound, pulp.LpBinary)
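The aliasing described above can be reproduced without a solver. The following is only a minimal sketch: Var is a hypothetical stand-in class invented for this illustration (it is not part of PuLP), and n = 3 is an arbitrary size. It contrasts a dictionary built with x[j,i] = x[i,j] against one that holds a separate object for every ordered pair.

```python
class Var:
    """Hypothetical stand-in for a solver variable (not part of PuLP)."""
    def __init__(self, name):
        self.name = name
        self.value = 0

n = 3  # arbitrary size, for illustration only

# Aliased: x[j, i] is bound to the very same object as x[i, j].
aliased = {}
for i in range(n):
    for j in range(n):
        aliased[i, j] = Var('x_%s_%s' % (i, j))
        aliased[j, i] = aliased[i, j]

# Independent: one object per ordered pair (i, j).
independent = {(i, j): Var('x_%s_%s' % (i, j))
               for i in range(n) for j in range(n)}

# "Use" the edge 0 -> 1 in both dictionaries.
aliased[0, 1].value = 1
independent[0, 1].value = 1

print(aliased[0, 1] is aliased[1, 0])          # True: one object, two keys
print(aliased[1, 0].value)                     # 1: the reverse edge changed too
print(independent[0, 1] is independent[1, 0])  # False
print(independent[1, 0].value)                 # 0
```

With separate objects, the two directions of an edge can take different values, which is what a directed formulation needs.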

Related

String formatting in general format

Exercise 7.3 from Think Python 2nd Edition:
To test the square root algorithm in this chapter, you could compare it with
math.sqrt. Write a function named test_square_root that prints a table like this:
1.0 1.0 1.0 0.0
2.0 1.41421356237 1.41421356237 2.22044604925e-16
3.0 1.73205080757 1.73205080757 0.0
4.0 2.0 2.0 0.0
5.0 2.2360679775 2.2360679775 0.0
6.0 2.44948974278 2.44948974278 0.0
7.0 2.64575131106 2.64575131106 0.0
8.0 2.82842712475 2.82842712475 4.4408920985e-16
9.0 3.0 3.0 0.0
The first column is a number, a; the second column is the square root of a computed with the function from Section 7.5; the third column is the square root computed by math.sqrt; the fourth column is the absolute value of the difference between the two estimates.
It took me a while to get to this point:
import math

def square_root(a):
    x = a / 2
    epsilon = 0.0000001
    while True:
        y = (x + a/x) / 2
        if abs(y-x) < epsilon:
            break
        x = y
    return y

def last_digit(number):
    rounded = '{:.11f}'.format(number)
    dig = str(rounded)[-1]
    return dig

def test_square_root():
    for a in range(1, 10):
        if square_root(a) - int(square_root(a)) < .001:
            f = 1
            s = 13
        elif last_digit(math.sqrt(a)) == '0':
            f = 10
            s = 13
        else:
            f = 11
            s = 13
        print('{0:.1f} {1:<{5}.{4}f} {2:<{5}.{4}f} {3}'.format(a, square_root(a), math.sqrt(a), abs(square_root(a)-math.sqrt(a)), f, s))

test_square_root()
That's my current output:
1.0 1.0 1.0 1.1102230246251565e-15
2.0 1.41421356237 1.41421356237 2.220446049250313e-16
3.0 1.73205080757 1.73205080757 0.0
4.0 2.0 2.0 0.0
5.0 2.2360679775 2.2360679775 0.0
6.0 2.44948974278 2.44948974278 8.881784197001252e-16
7.0 2.64575131106 2.64575131106 0.0
8.0 2.82842712475 2.82842712475 4.440892098500626e-16
9.0 3.0 3.0 0.0
For now I'm focused on getting the right output; I'll clean up the code itself afterwards. Here are my main problems:
Format the last column. (I tried {:.12g}, but then '0.0' was printed as just '0'. What should I do?)
Fix the values of the last column. As you can see, there should be only two numbers greater than 0 (when a = 2 and 8), but there are two more (when a = 6 and 1). I printed them separately to see what was going on and the results were the same. I can't understand it.
Thanks for your help! :)
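One possible way to handle the formatting problem is sketched below; it is not taken from the book. The helper name fmt12 is made up for this example. The idea is to round to 12 significant digits with {:.12g}, convert the result back to a float, and let str() choose the representation, so that zero is printed as 0.0 rather than 0.

```python
import math

def fmt12(value):
    """Round to 12 significant digits, then let str() pick the shortest form."""
    return str(float('{:.12g}'.format(value)))

print(fmt12(0.0))                    # 0.0
print(fmt12(math.sqrt(2)))           # 1.41421356237
print(fmt12(2.220446049250313e-16))  # 2.22044604925e-16
```

This only addresses how the numbers are displayed; it does not change which differences are nonzero.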

Printing outcomes for rolling 4 dice 10,000 times

The program I'm writing simulates rolling 4 dice and adds the results together into a "Total" column. I'm trying to print the outcomes for 10,000 dice rolls, but for some reason the value of each die drops to 0.0 somewhere in the program and stays that way until the end. Could anyone tell me what's going wrong here and how to fix it? Thanks :)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(101)
four_dice = np.zeros([pow(10,4),5]) # 10,000 rows, 5 columns
n = 0
outcomes = [1,2,3,4,5,6]
for i in outcomes:
    for j in outcomes:
        for k in outcomes:
            for l in outcomes:
                four_dice[n,:] = [i,j,k,l,i+j+k+l]
                n +=1
four_dice_df = pd.DataFrame(four_dice,columns=('1','2','3','4','Total'))
print(four_dice_df) #print the table
OUTPUT
1 2 3 4 Total
0 1.0 1.0 1.0 1.0 4.0
1 1.0 1.0 1.0 2.0 5.0
2 1.0 1.0 1.0 3.0 6.0
3 1.0 1.0 1.0 4.0 7.0
4 1.0 1.0 1.0 5.0 8.0
... ... ... ... ... ...
9995 0.0 0.0 0.0 0.0 0.0
9996 0.0 0.0 0.0 0.0 0.0
9997 0.0 0.0 0.0 0.0 0.0
9998 0.0 0.0 0.0 0.0 0.0
9999 0.0 0.0 0.0 0.0 0.0
[10000 rows x 5 columns]
Does this work for what you want?
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1,7,size=(10000,4)),columns = [1,2,3,4])
df['total'] = df.sum(axis=1)
You ran out of dice combinations. You made your table 10^4 rows long, but there are only 6^4 combinations. Any row from 1296 through 9999 will be 0, because that's the initialized value.
To fix this, cut your table at the proper value: pow(6, 4)
Response to OP comment:
Of course you can write a loop. In this case, the controlling factor should be the number of results you want. Then you generate rolls to fulfill your needs. The Pythonic way to do this is to use the itertools module: product will give you the rolls in order; cycle will repeat the sequence until you quit asking.
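A minimal sketch of the itertools approach follows. It uses itertools.product rather than permutations, since permutations would never repeat a face within a roll. The names faces, n_rows and rows are chosen for this example only, and the rows are kept as plain lists instead of a data frame.

```python
from itertools import cycle, islice, product

faces = [1, 2, 3, 4, 5, 6]
n_rows = 10000  # number of rows wanted, as in the question

# product(...) yields the 6**4 = 1296 rolls in counting order;
# cycle(...) restarts the sequence whenever it is exhausted.
rolls = cycle(product(faces, repeat=4))
rows = [list(roll) + [sum(roll)] for roll in islice(rolls, n_rows)]

print(len(rows))    # 10000
print(rows[0])      # [1, 1, 1, 1, 4]
print(rows[1296])   # [1, 1, 1, 1, 4] again: the sequence has wrapped around
```

The resulting list can be passed to pd.DataFrame with the same column names as in the question.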
However, the more obvious way for your current programming is perhaps to simply count in base 6:
digits = [1, 1, 1, 1]  # one entry per die
for i in range(10000):
    # Record your digits in the data frame
    ...
    # Add one for the next iteration; roll over if the die is already 6
    for idx, die in enumerate(digits):
        if die < 6:
            digits[idx] += 1
            break
        else:  # Reset die to 1 and continue to next die
            digits[idx] = 1
This will increment the dice, left to right, until you either have one that doesn't need a reset to 1, or run out of dice.
Another possibility is to copy any of the many base-conversion functions available online. Convert your iteration counter i to base 6, take the lowest 4 digits (the number of dice), and add 1 to each digit.
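The base-6 idea can be sketched with divmod instead of a copied conversion function. The function name roll_from_counter and its n_dice parameter are invented for this example; the lowest digit comes first, matching the left-to-right increment described above.

```python
def roll_from_counter(i, n_dice=4):
    """Return the dice for iteration i: the lowest n_dice base-6 digits of i, plus 1."""
    dice = []
    for _ in range(n_dice):
        i, digit = divmod(i, 6)   # peel off the lowest base-6 digit
        dice.append(digit + 1)    # digits 0..5 become faces 1..6
    return dice

print(roll_from_counter(0))     # [1, 1, 1, 1]
print(roll_from_counter(5))     # [6, 1, 1, 1]
print(roll_from_counter(6))     # [1, 2, 1, 1]
print(roll_from_counter(1296))  # [1, 1, 1, 1]
```

Because only the lowest four digits are used, the sequence wraps around after 6**4 = 1296 iterations.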

Missing values replaced with average of its neighbors (timeseries)

I want to replace every missing value in the dataset with the average of its two nearest neighbors, except for the first and last cells and for cells where a neighbor is 0 (I fix those values manually). I coded this and it works, but the solution is not very smart. Is there a faster way to do it? Is the interpolate method suitable for this? I'm not quite sure how it works.
Input:
0 1 2 3 4 5
0 0.0 1596.0 1578.0 1567.0 1580.0 1649.0
1 1554.0 1506.0 0.0 1466.0 1469.0 1503.0
2 1588.0 1510.0 1495.0 1485.0 1489.0 0.0
3 1592.0 0.0 0.0 1571.0 1647.0 0.0
Output:
0 1 2 3 4 5
0 0.0 1596.0 1578.0 1567.0 1580.0 1649.0
1 1554.0 1506.0 1486.0 1466.0 1469.0 1503.0
2 1588.0 1510.0 1495.0 1485.0 1489.0 1540.5
3 1592.0 0.0 0.0 1571.0 1647.0 0.0
Code:
data_len = len(df)
first_col = str(df.columns[0])
last_col = str(df.columns[len(df.columns) - 1])
d = df.apply(lambda s: pd.to_numeric(s, errors="coerce"))
m = d.eq(0) | d.isna()
s = m.stack()
missing = s[s].index.tolist()  # list of indices of missing values
count = len(missing)
for el in missing:
    if el == ('0', first_col) or el == (str(data_len - 1), last_col):
        continue
    nxt = df.at[str(int(el[0]) + 1), first_col] if el[1] == last_col else df.at[el[0], str(int(el[1]) + 1)]
    prev = df.at[str(int(el[0]) - 1), last_col] if el[1] == first_col else df.at[el[0], str(int(el[1]) - 1)]
    if prev == 0 or nxt == 0:
        continue
    df.at[el[0], el[1]] = (prev + nxt) / 2
JSON of example:
{"0":{"0":0.0,"1":1554.0,"2":1588.0,"3":0.0},"1":{"0":1596.0,"1":1506.0,"2":1510.0,"3":0.0},"2":{"0":1578.0,"1":0.0,"2":1495.0,"3":1561.0},"3":{"0":1567.0,"1":1466.0,"2":1485.0,"3":1571.0},"4":{"0":1580.0,"1":1469.0,"2":1489.0,"3":1647.0},"5":{"0":1649.0,"1":1503.0,"2":0.0,"3":0.0}}
Here's one approach using shift to average the neighbour's values and slice assigning back to the dataframe:
m = df==0
r = (df.shift(axis=1)+df.shift(-1,axis=1))/2
df.iloc[1:-1,1:-1] = df.mask(m,r)
print(df)
0 1 2 3 4 5
0 0.0 1596.0 1578.0 1567.0 1580.0 1649.0
1 1554.0 1506.0 1486.0 1466.0 1469.0 1503.0
2 1588.0 1510.0 1495.0 1485.0 1489.0 0.0
3 0.0 0.0 1561.0 1571.0 1647.0 0.0
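For comparison, the rule stated in the question can be sketched in plain Python, without pandas. The sketch treats the table as one row-major sequence, so the neighbor of a row's last cell is the first cell of the next row. It is an illustration under assumptions: it handles only zeros (not NaN), it reads neighbors from the original values, and the function name fill_with_neighbor_mean is made up for this example. The data is the Input table from the question.

```python
def fill_with_neighbor_mean(rows):
    """Replace each 0 by the mean of its two row-major neighbors, when both are nonzero."""
    n_cols = len(rows[0])
    flat = [value for row in rows for value in row]   # row-major view of the table
    filled = list(flat)
    for k in range(1, len(flat) - 1):                 # first and last cells are skipped
        prev, nxt = flat[k - 1], flat[k + 1]          # neighbors read from the original data
        if flat[k] == 0 and prev != 0 and nxt != 0:
            filled[k] = (prev + nxt) / 2
    return [filled[r:r + n_cols] for r in range(0, len(filled), n_cols)]

table = [
    [0.0, 1596.0, 1578.0, 1567.0, 1580.0, 1649.0],
    [1554.0, 1506.0, 0.0, 1466.0, 1469.0, 1503.0],
    [1588.0, 1510.0, 1495.0, 1485.0, 1489.0, 0.0],
    [1592.0, 0.0, 0.0, 1571.0, 1647.0, 0.0],
]
for row in fill_with_neighbor_mean(table):
    print(row)
```

On this input the two filled cells are 1486.0 (row 1, column 2) and 1540.5 (row 2, column 5); the remaining zeros are left for manual fixing.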

is there an alternative to using mod "%" in python

I am trying to cycle through a list of numbers (mostly decimals), but I want to return both 0.0 and the max number.
for example
maxNum = 3.0
steps = 5
increment = 0
time = 10
while increment < time:
    print(increment * (maxNum / steps) % maxNum)
    increment += 1
I am getting this as an output
0.0
0.6
1.2
1.8
2.4
0.0
but I want 3.0 as the largest number before starting back at 0.0, i.e.
0.0
0.6
1.2
1.8
2.4
3.0
0.0
Note, I have to avoid logical loops for the calculation part.
You could create the numbers that you want then use itertools.cycle to cycle through them:
import itertools
nums = itertools.cycle(0.6*i for i in range(6))
for t in range(10):
    print(next(nums))
Output:
0.0
0.6
1.2
1.7999999999999998
2.4
3.0
0.0
0.6
1.2
1.7999999999999998
Only a small change does the trick:
maxNum = 3.0
steps = 5
i = 0
times = 10
step = maxNum / steps
while (i < times):
    print(step * (i % (steps + 1)))
    i += 1
0.0
0.6
1.2
1.7999999999999998
2.4
3.0
0.0
0.6
1.2
1.7999999999999998
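The 1.7999999999999998 in the output above comes from multiplying the already rounded step 0.6 by 3. A possible variation, sketched here with renamed variables (max_num, values), keeps the integer modulo but multiplies by max_num first and divides last, so each value results from a single division.

```python
max_num = 3.0
steps = 5
times = 10

# Multiply by the exact integer position first and divide last, so each
# value is one correctly rounded division instead of k * 0.6.
values = [(i % (steps + 1)) * max_num / steps for i in range(times)]
print(values)
# [0.0, 0.6, 1.2, 1.8, 2.4, 3.0, 0.0, 0.6, 1.2, 1.8]
```

The cycle length is still steps + 1, which is what lets both 0.0 and the maximum appear.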
You could add an if statement that looks ahead: if the next printed number would be 0.0, print maxNum first.
maxNum = 3.0
steps = 5
increment = 0
time = 10
while increment < time:
    print(round(increment * (maxNum / steps) % maxNum, 2))
    increment += 1
    if (round(increment * (maxNum / steps) % maxNum, 2)) == 0.0:
        print(maxNum)
0.0
0.6
1.2
1.8
2.4
3.0
0.0
0.6
1.2
1.8
2.4
3.0

Why is my program stopping when doing a seemingly infinite loop?

This must be really obvious but I am currently doing a little tutorial that features this code snippet:
n=0
a=1
while a>0:
    n=n+1
    a=(1.0+2.0**(-n))-1.0
    print (n)
And I've tried to run it but it keeps getting stuck at n=53. Why? I just assumed that while would always be true ...
If you change the last line to print(n, a) you can see what's happening more clearly:
n = 0
a = 1
while a > 0:
    n = n + 1
    a = (1.0 + 2.0 ** (-n)) - 1.0
    print(n, a)
Output:
1 0.5
2 0.25
3 0.125
4 0.0625
# ...
50 8.881784197001252e-16
51 4.440892098500626e-16
52 2.220446049250313e-16
53 0.0
As you can see, a is half the size each time through the loop. Eventually, 2.0 ** (-n) is so small that floating point math (which has limited precision) is unable to tell the difference between 1.0 and 1.0 + 2.0 ** (-n):
>>> 1.0 + 2.0 ** -51
1.0000000000000004
>>> 1.0 + 2.0 ** -52
1.0000000000000002
>>> 1.0 + 2.0 ** -53
1.0
… and when that happens, subtracting 1.0 from 1.0 gives you 0.0, and the while loop terminates.
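The same limit can be read off directly from sys.float_info. This is a small sketch that assumes IEEE 754 double-precision floats, which is what CPython uses on common platforms; the function name first_n_that_vanishes is made up for the example.

```python
import sys

# Machine epsilon: the gap between 1.0 and the next larger double.
print(sys.float_info.epsilon == 2.0 ** -52)   # True
print(sys.float_info.mant_dig)                # 53 significant bits

print(1.0 + 2.0 ** -52 > 1.0)    # True: still representable
print(1.0 + 2.0 ** -53 == 1.0)   # True: the sum rounds back to 1.0

def first_n_that_vanishes():
    """Repeat the loop from the question and return the n at which a becomes 0.0."""
    n = 0
    a = 1.0
    while a > 0:
        n += 1
        a = (1.0 + 2.0 ** (-n)) - 1.0
    return n

print(first_n_that_vanishes())   # 53
```

So the loop is not infinite: it stops at the first n for which 2.0 ** (-n) falls below half the spacing of doubles around 1.0.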
