Bin packing in Python with variable bin costs and sizes

I'm working on a variation of the bin packing problem in which the number of bins is finite. I have three bins: the smallest one costs the least to put an object into, the medium bin is slightly more expensive than the small one, and the third bin has theoretically unlimited capacity but is prohibitively expensive to place an item into.
I found a Python script online that solves a similar bin packing problem, but it uses identical bins. How can I rewrite the script to get closer to my original problem?
I've included some lines at the very bottom to show how I would prefer the bins to look. Also, is there a way to set up separate constraints for each bin? Thanks for all the help!
from openopt import *

N = 30  # number of VMs of each size to generate

items = []
for i in range(N):
    small_vm = {
        'name': 'small%d' % i,
        'cpu': 2,
        'mem': 2048,
        'disk': 20,
        'n': 1
    }
    med_vm = {
        'name': 'medium%d' % i,
        'cpu': 4,
        'mem': 4096,
        'disk': 40,
        'n': 1
    }
    large_vm = {
        'name': 'large%d' % i,
        'cpu': 8,
        'mem': 8192,
        'disk': 80,
        'n': 1
    }
    items.append(small_vm)
    items.append(med_vm)
    items.append(large_vm)

bins = {
    'cpu': 48*4,  # 4.0 overcommit with cpu
    'mem': 240000,
    'disk': 2000,
}

p = BPP(items, bins, goal='min')
r = p.solve('glpk', iprint=0)

print(r.xf)
print(r.values)  # values per bin
print("total vms is " + str(len(items)))
print("servers used is " + str(len(r.xf)))
for i, s in enumerate(r.xf):
    print("server " + str(i) + " has " + str(len(s)) + " vms")
##OP Interjection: Ideally my bins would look something like:
bin1 = {
    'size': 10000,
    'cost': 0.01*item_weight,
}
bin2 = {
    'size': 20000,
    'cost': 0.02*item_weight,
}
bin3 = {
    'size': 100000,
    'cost': 0.3*item_weight,
}

The variant of the bin packing problem with variable bin sizes that you are describing is at least NP-hard.
I do not know the openopt package, and its project website seems to be down. Openopt appears to use GLPK to solve the problem as a mixed-integer program. Since BPP() is an abstraction, you do not have direct access to the model formulation, and you may need to modify the openopt package itself to add constraints for individual bins.
It is generally easy to add variable bin sizes as a constraint: extending the standard formulation, you would add an index i to the capacity V, so that each bin has its own capacity V_i.
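For reference, the standard mixed-integer formulation of this variant looks as follows (my notation: binary $y_i = 1$ if bin $i$ is used, binary $x_{ij} = 1$ if item $j$ goes into bin $i$, item weights $w_j$, bin capacities $V_i$, bin costs $c_i$):

$$\min \sum_i c_i y_i \quad \text{s.t.} \quad \sum_i x_{ij} = 1 \;\;\forall j, \qquad \sum_j w_j x_{ij} \le V_i\, y_i \;\;\forall i, \qquad x_{ij},\, y_i \in \{0,1\}$$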
I would recommend looking at some maintained libraries to model and solve this problem: PuLP, CyLP, and SCIP (though I think the latter is not free for commercial use).
Since bin packing is a very common problem, I found an example for the PuLP library. I think it uses the CoinOR solver by default, but you can also plug in commercial solvers.
PuLP should be installable with
easy_install pulp
Once it is installed, you can extend on this example.
I modified the example according to your problem:
from pulp import *
import pulp

items = [("a", 5), ("b", 6), ("c", 7)]
itemCount = len(items)
maxBins = 3
binCapacity = [11, 15, 10]
binCost = [10, 30, 20]

y = pulp.LpVariable.dicts('BinUsed', range(maxBins), lowBound=0, upBound=1, cat=pulp.LpInteger)
possible_ItemInBin = [(itemTuple[0], binNum) for itemTuple in items for binNum in range(maxBins)]
x = pulp.LpVariable.dicts('itemInBin', possible_ItemInBin, lowBound=0, upBound=1, cat=pulp.LpInteger)

# Model formulation
prob = LpProblem("Bin Packing Problem", LpMinimize)

# Objective: pay the cost of every bin that is used
prob += lpSum([binCost[i] * y[i] for i in range(maxBins)])

# Constraint 1: every item must be packed in exactly one bin
for j in items:
    prob += lpSum([x[(j[0], i)] for i in range(maxBins)]) == 1

# Constraint 2: respect each bin's capacity; items can only go into used bins
for i in range(maxBins):
    prob += lpSum([items[j][1] * x[(items[j][0], i)] for j in range(itemCount)]) <= binCapacity[i] * y[i]

prob.solve()

print("Bins used: " + str(sum([y[i].value() for i in range(maxBins)])))
for i in x.keys():
    if x[i].value() == 1:
        print("Item {} is packed in bin {}.".format(*i))
This implementation has the strong advantage that you have complete control over your model formulation and you are not restricted by some layer of abstraction like BPP() in the case of openopt.
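If, as in your ideal bins, the cost is paid per unit of weight placed in a bin rather than per opened bin, you can keep the same variables and replace the objective line above. A minimal sketch, assuming a cost rate per unit weight for each bin (costPerWeight is an invented parameter, not part of the example above):

# Sketch: placement cost proportional to the item weight, per bin.
# costPerWeight[i] is an assumed cost rate for bin i (cf. bin1..bin3 above).
costPerWeight = [0.01, 0.02, 0.3]
weight = dict(items)  # item name -> weight

prob += lpSum(costPerWeight[i] * weight[name] * x[(name, i)]
              for name in weight for i in range(maxBins))

If bins also have a fixed opening cost, add the lpSum of binCost[i] * y[i] back into this expression.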

Related

Uncapacitated facility location covering

I am trying to solve this problem using PuLP:
This is my code. There is a problem, because the result should be to keep only the second location:
# Import PuLP modeler functions
from pulp import *
# Set of locations J
Locations = ["A", "B","C"]
# Set of demands I
Demands = ["1", "2", "3", "4", "5"]
# Set of distances ij
dt = [  # Demands I
    #  1   2   3   4    5
    [ 2, 23, 30, 54,   1],  # A  Locations J
    [ 3,  1,  2,  2,   3],  # B
    [50, 65, 80, 90, 100]   # C  distances are very long
]
# Max value to get covered
s = 5
# These binary values should be generated by code from the dt array ... I write them down directly for simplification.
# Demand I is served by location J if distance is <= 5 (0 = KO, 1 = OK)
covered = [
    [1,0,0,0,1],
    [1,1,1,1,1], # This shows that we only need Location B, not A
    [0,0,0,0,0]  # This shows we can't use Location C, it's too far
]
# Creates the 'prob' variable to contain the problem data
prob = LpProblem("Set covering", LpMinimize)
# # Problem variables
J = LpVariable.dicts("location", Locations, cat='Binary')
# The distance data is made into a dictionary
distances = makeDict([Locations, Demands], covered, 0)
# The objective function
# Minimize J, which is the number of locations
prob += lpSum(J["A"]+J["B"]+J["C"])
# The constraint
# Is it covered or not ?
for w in Locations:
    for b in Demands:
        if distances[w][b] > 0:
            prob += int(distances[w][b]) * J[w] >= 1
# Or possibly this instead:
#for w in Locations:
#    prob += (lpSum([distances[w][b] * J[w] for b in Demands]) >= 1)
# or that :
# prob += 1 * J["A"] >= 1
# prob += 1 * J["A"] >= 1
# prob += 1 * J["B"] >= 1
# prob += 1 * J["B"] >= 1
# prob += 1 * J["B"] >= 1
# prob += 1 * J["B"] >= 1
# prob += 1 * J["B"] >= 1
# The problem data is written to an .lp file
prob.writeLP("SetCovering.lp")
# The problem is solved using PuLP's choice of Solver
prob.solve()
# The status of the solution is printed to the screen
print("Status:", LpStatus[prob.status])
# Each of the variables is printed with its resolved optimum value
for v in prob.variables():
    print(v.name, "=", v.varValue)
# The optimised objective function value is printed to the screen
print("Total Locations = ", value(prob.objective))
# Show constraints
constraints = prob.constraints
print(constraints)
#Status: Optimal
#location_A = 1.0
#location_B = 1.0
#location_C = 0.0
#Total Locations = 2.0
The result should be :
location_A = 0.0
location_B = 1.0
location_C = 0.0
because location B covers all of our needs.
I wonder where the problem is; the math code is there, and I hope I wrote enough.
Thanks, it would be nice if you have a solution. I have also tried lpSum with no luck.
Edit: Modified the code a bit; you can see an 'optimal solution', but it's not the solution I want. Also added a Location C.
EDIT: This is my new code. I added a secondary continuous PuLP dict for arc (link) generation (ser_customer). The solver should only pick Fac-2 in this case, because it's near all of the customers and the other facilities are way too far:
# Lists (sets / Array) of Customers and Facilities
Customer = [1,2,3,4,5]
Facility = ['Fac-1', 'Fac-2', 'Fac-3']
# Dictionary of distances in kms
distance = {'Fac-1': {1: 54, 2: 76, 3: 5, 4: 76, 5: 76},
            'Fac-2': {1: 1, 2: 3, 3: 1, 4: 8, 5: 1},
            'Fac-3': {1: 45, 2: 23, 3: 54, 4: 87, 5: 88}
            }
# Setting the Problem
prob = LpProblem("pb", LpMinimize)
# Defining our Decision Variables
use_facility = LpVariable.dicts("Use Facility", Facility, 0, 1, LpBinary)
ser_customer = LpVariable.dicts("Service", [(i,j) for i in Customer for j in Facility], 0)
# Setting the Objective Function = Minimize amount of facilities and arcs
prob += lpSum(use_facility['Fac-1']+use_facility['Fac-2']+use_facility['Fac-3']) + lpSum(distance[j][i]*ser_customer[(i,j)] for j in Facility for i in Customer)
# Constraints: at least 1 arc must exist between facilities and customers
for i in Customer:
    prob += lpSum(ser_customer[(i,j)] for j in Facility) >= 1
prob.solve()
# Print the solution of Decision Variables
for v in prob.variables():
    print(v.name, "=", v.varValue)
# Print the solution of Binary Decision Variables
Tolerance = 0.0001
for j in Facility:
    if use_facility[j].varValue > Tolerance:
        print("Establish Facility at site = ", j)
The result seems to show good arcs (links), but no facility is selected. Does anybody have an idea? Is there a way to force use_facility[index] to be > 0? Is adding arc decision variables a good idea? I have also tried moving the arcs into a constraint instead of the objective function, with no luck:
Service_(1,_'Fac_1') = 0.0
Service_(1,_'Fac_2') = 1.0
Service_(1,_'Fac_3') = 0.0
Service_(2,_'Fac_1') = 0.0
Service_(2,_'Fac_2') = 1.0
Service_(2,_'Fac_3') = 0.0
Service_(3,_'Fac_1') = 0.0
Service_(3,_'Fac_2') = 1.0
Service_(3,_'Fac_3') = 0.0
Service_(4,_'Fac_1') = 0.0
Service_(4,_'Fac_2') = 1.0
Service_(4,_'Fac_3') = 0.0
Service_(5,_'Fac_1') = 0.0
Service_(5,_'Fac_2') = 1.0
Service_(5,_'Fac_3') = 0.0
Use_Facility_Fac_1 = 0.0
Use_Facility_Fac_2 = 0.0
Use_Facility_Fac_3 = 0.0
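(A common remedy, sketched here as my own suggestion rather than something from the thread: link the service variables to the facility variables, so a customer can only be served from an opened facility. Without such a linking constraint the solver can leave every use_facility variable at zero.)

# Sketch: service implies an opened facility
for i in Customer:
    for j in Facility:
        prob += ser_customer[(i,j)] <= use_facility[j]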
I have also tried the AirSquid solution. I think I may be missing source decision variables that should be minimized, but I don't know how to do that; I guess 'covered' represents the arcs (links). Anyway, it is a good exercise, harder than a simple product mix, hi hi:
prob = LpProblem('source minimzer', LpMinimize)
dist_limit = 5
sources = ['A', 'B','C'] # the source locations
# note this is zero-indexed to work with the list indexes in dist dictionary...
destinations = list(range(5)) # the demand locations 0, 1, 2, 3, 4
dist = {'A': [2, 23, 30, 54, 1],
        'B': [3, 1, 2, 2, 3],
        'C': [24, 54, 12, 56, 76]}
covered = LpVariable.dicts('covered', [(s, d) for s in sources for d in destinations], cat='Binary')
# The objective function
# Minimize the number of sources
prob += lpSum(covered[s, d])
# set up constraint to limit covered if the destination is "reachable"
for s in sources:
    for d in destinations:
        prob += covered[s, d] * dist[s][d] <= dist_limit
# add one more constraint to make sure that every destination is "covered"...
# The problem is solved using PuLP's choice of Solver
prob.solve()
# The status of the solution is printed to the screen
print("Status:", LpStatus[prob.status])
# The optimised objective function value is printed to the screen
print("Location Selection = ", prob.objective)
This is the solution displayed, while it should print "B":
Status: Optimal
Total Locations = covered_('C',_4)
You are on the right track! A couple things will help...
First, you overlooked a key piece of information in your output in that the solver says your formulation is INFEASIBLE!
Status: Infeasible
So whatever came out in the variables is gibberish and you must figure that part out first.
So, why is it infeasible? Take a look at your constraint. You are trying to force the impossible: if your distance value is zero, the following cannot be true:
prob += int(distances[w][b]) * J[w] >= 1
So, you need to reformulate! You are missing a concept here. You actually need 2 constraints for this problem.
You need to constrain the selection of a source-destination if the route is too long
You need to enforce that every destination is covered.
You also need a double-indexed decision variable. Why? Well, let's say that source 'A' covers destinations 1, 2 and 'B' covers 2, 3, 4, 5.... With a single variable you would know that all the destinations are "covered", but you would not know which sources were used, so you need to keep track of both to get the full picture.
Here is a start, along with a couple of edits. I'd suggest the variable names source and destination, as that is kind of standard. You do not have a specific demand in this particular problem, just the need for a connection. You might also want to use dictionaries more than nested lists; I think it is clearer. Below is an example start with the first constraint. Note the trick here in limiting the covered variable: if the distance is less than the limit, s, then this constraint is satisfiable. For instance, if the distance is 3:
3 * 1 <= s
Anyhow, here is a recommended start. The other constraint is not implemented; you will need to sum across all the sources to ensure each destination is "covered". Comment back if you are stuck.
prob = LpProblem('source minimzer', LpMinimize)
dist_limit = 5
sources = ['A', 'B'] # the source locations
# note this is zero-indexed to work with the list indexes in dist dictionary...
destinations = list(range(5)) # the demand locations 0, 1, 2, 3, 4
dist = { 'A': [2, 23, 30, 54, 1],
'B': [3, 1, 2, 2, 3]}
covered = LpVariable.dicts('covered', [(s, d) for s in sources for d in destinations], cat='Binary')
# set up constraint to limit covered if the destination is "reachable"
for s in sources:
    for d in destinations:
        prob += covered[s, d] * dist[s][d] <= dist_limit
# add one more constraint to make sure that every destination is "covered"...
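(A possible completion, as my own sketch under the same names rather than part of the original answer: add a binary use variable per source, link covered to it, require every destination to be covered, and minimize the number of opened sources.)

# Sketch: the missing pieces of the model
use = LpVariable.dicts('use', sources, cat='Binary')

# a link can only be used if its source is opened
for s in sources:
    for d in destinations:
        prob += covered[s, d] <= use[s]

# every destination must be covered by at least one source
for d in destinations:
    prob += lpSum(covered[s, d] for s in sources) >= 1

# objective: minimize the number of opened sources
prob += lpSum(use[s] for s in sources)

prob.solve()
print(LpStatus[prob.status])  # with this data, only source 'B' should be needed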

Absolute value formulation for an optimization problem with PuLP

I shared below a simplified version of the problem I'm trying to solve. There must be something wrong in my formulation, perhaps regarding the decision variables. The model sends all of the flow to Destination1, but I am attempting to build a model that would distribute the flow evenly. When I force Destination2 to receive flow with an additional constraint, the objective value improves, so I'm not sure why the solver does not find that solution instead of the less optimal one.
I appreciate your thoughts and am happy to answer any questions about this model.
Warehouses = ["A","B","C","D"]
origin_supply = {"A": 53, "B": 62, "C": 45, "D": 65}
Destinations = ['Destination1','Destination2']
Routes = [(o,d) for o in origin_supply for d in destinations]
model = LpProblem("Testing-absolute-value-objective", LpMinimize)
supply = [53,62,45,65]
destination_mean = sum(supply) / len(destinations)
# decision variables
route_vars = LpVariable.dicts("Route",(Warehouses,Destinations),cat = "Integer", lowBound = 0)
sum_for_diff = LpVariable.dicts("sum",(Destinations),cat = "Continuous")
sum_for_diff_abs = LpVariable.dicts("sum_abs",(Destinations),cat = "Continuous", lowBound = 0)
# objective function is to minimize the absolute value of the difference supplied to the two destinations
obj_func = lpSum(sum_for_diff_abs)
# constraints
# absolute value constraints for the difference
for d in destinations:
model += sum_for_diff_abs[d] >= sum_for_diff[d]
model += sum_for_diff_abs[d] >= -sum_for_diff[d]
# The supply constraints (in this case all supply must be sent)
for w in Warehouses:
model += lpSum([route_vars[w][d] for d in Destinations]) == origin_supply[w]
# calculate the difference from the average amount sent to each destination
# the reasoning is that in the full model there will be many destinations, so this logic could scale
for d in Destinations:
model += sum_for_diff[d] == lpSum( route_vars[w][d] for w in Warehouses) - destination_mean
model.solve()
print(LpStatus[model.status])
print(pulp.value(obj_func))
for v in model.variables():
print (v.name + " = " + str(v.varValue))
You are not setting the objective function.
This line
obj_func = lpSum(sum_for_diff_abs)
only builds an expression; it is never added to the model. In PuLP, the first expression (as opposed to a constraint) added to the problem with += becomes the objective, so it should be
model += lpSum(sum_for_diff_abs)
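In context, a minimal sketch of the fix (you can keep the obj_func name for printing later, as long as the expression is also added to the model):

# build the expression, then register it as the model's objective
obj_func = lpSum(sum_for_diff_abs)
model += obj_func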

Linear programming solution for minimum number of resources

I am trying to solve this problem with linear programming, using PuLP in Python.
We have mango packs, each containing a different number of mangoes.
We should be able to serve the demand using the minimum number of packets and, if possible, serve whole packets.
# Packet Names and the count of mangoes in each packet.
mangoe_packs = {
    "pack_1": 2,
    "pack_2": 3,
    "pack_3": 3,
    "pack_4": 2
}
For example,
Based on the demand we should get the correct packets. I.e., if the demand is 2, we give the packet with 2 mangoes. If the demand is 5, we serve packets with 2 and 3 mangoes. If the demand is 2 and we don't have any packet with 2 mangoes, we can serve a packet with 3 mangoes. In that case, we will have one remnant mango. Our purpose is to have the least number of remnant mangoes while serving the demand.
# Packet Names and the count of mangoes in each packet.
mangoe_packs = {
    "pack_1": 2,
    "pack_2": 3,
    "pack_3": 3,
    "pack_4": 2
}
Based on the data provided above:
If the demand is 2, the solution is pack_1 (can be pack_4 also).
If the demand is 4, the solution is pack_1 + pack_4.
If the demand is 5, the solution is pack_1 + pack_2.
I am new to Linear programming and stuck at the problem. Tried few solutions and they are not working.
I am unable to come up with the correct objective function and constraints to solve this problem. Need help with that. Thank you.
Here is the code I tried.
from pulp import *

prob = LpProblem("MangoPacks", LpMinimize)

# Number of Mangoes in each packet.
mangoe_packs = {
    "pack_1": 2,
    "pack_2": 3,
    "pack_3": 3,
    "pack_4": 2
}

# Define demand variable.
demand = LpVariable("Demand", lowBound=2, upBound=2, cat="Integer")

# NOTE: 'ingredients' is undefined here -- part of why this attempt fails
pack_count = LpVariable.dicts("Packet Count",
    ((i, j) for i in mangoe_packs.values() for j in ingredients),
    lowBound=0,
    cat='Integer')

prob += (
    lpSum([pack_count[(pack)]
        for pack, mango_count in mangoe_packs.items()])
)

prob += lpSum([j for pack, j in mangoe_packs.items()]) == 350 * 0.05

status = prob.solve()
Thank you.
Here are some considerations:
The variables of the problem are whether or not a pack should be opened. These variables are thus either 0 or 1 (keep closed, or open).
The main objective of the problem is to minimise the number of remnant mangoes. Or otherwise put: to minimise the total number of mangoes that are in the opened packs. This is the sum of the values of the input dictionary, but only of those entries where the corresponding LP variable is 1. Of course, a multiplication (with 0 or 1) can be used here.
In case of a tie, the number of opened packs should be minimised. This is simply the sum of the above mentioned variables. In order to combine this into one single objective, multiply the value of the first objective by the total number of packets and add the value of this second objective to it. That way you get the right order between competing solutions. For example, with 4 packs, opening 5 mangoes in 2 packs scores 5*4 + 2 = 22, which beats opening 5 mangoes in 3 packs (5*4 + 3 = 23).
The only constraint is that the sum of the number of mangoes in the opened packs is at least the number given in the input.
So here is an implementation:
def optimise(mango_packs, mango_count):
    pack_names = list(mango_packs.keys())
    prob = LpProblem("MangoPacks", LpMinimize)
    # variables: names of the mango packs. We can either open them or not (0/1)
    lp_vars = LpVariable.dicts("Open", pack_names, 0, 1, "Integer")
    # objective: minimise total count of mangoes in the selected packs (so to
    # minimise remnants). In case of a tie, minimise the number of opened packs.
    prob += (
        lpSum([mango_packs[name]*lp_vars[name] for name in pack_names]) * len(mango_packs)
        + lpSum([lp_vars[name] for name in pack_names])
    )
    # constraint: the opened packs need to amount to a minimum number of mangoes
    prob += lpSum([mango_packs[name]*lp_vars[name] for name in pack_names]) >= mango_count
    prob.solve()
In order to visualise the result, you could add the following in the above function:
print("Status:", LpStatus[prob.status])
# Each of the variables is printed with its resolved optimum value
for v in prob.variables():
    print("{}? {}".format(v.name, ("no","yes")[int(v.varValue)]))
Call the function like this:
# Packet Names and the count of mangoes in each packet.
mango_packs = {
    "pack_1": 10,
    "pack_2": 2,
    "pack_3": 2,
    "pack_4": 2
}
optimise(mango_packs, 5)
Output (when you added those print statements)
Status: Optimal
Open_pack_1? no
Open_pack_2? yes
Open_pack_3? yes
Open_pack_4? yes
Here is a simple model that minimizes the total number of remnant mangoes. Instead of specifying the exact packages available, the model just specifies the number of packages available per size; a package's size equals its index p, so [0, 5, 0, 15] below means 5 packages of size 1 and 15 of size 3:
from pulp import *
import pulp
# PROBLEM DATA:
demand = [3, 7, 2, 5, 9, 3, 2, 4, 7, 5] # demand per order
packages = [0, 5, 0, 15] # available packages of different sizes
O = range(len(demand))
P = range(len(packages))
# DECLARE PROBLEM OBJECT:
prob = LpProblem('Mango delivery', LpMinimize)
# VARIABLES
assigned = pulp.LpVariable.dicts('assigned',
    ((o, p) for o in O for p in P), 0, max(demand), cat='Integer')  # number of packages of different sizes per order
supply = LpVariable.dicts('supply', O, 0, max(demand), cat='Integer') # supply per order
remnant = LpVariable.dicts('remnant', O, 0, len(packages)-1, cat='Integer') # extra delivery per order
# OBJECTIVE
prob += lpSum(remnant) # minimize the total extra delivery
# CONSTRAINTS
for o in O:
    prob += supply[o] == lpSum([p*assigned[(o, p)] for p in P])
    prob += remnant[o] == supply[o] - demand[o]
for p in P:
    # don't use more packages than available
    prob += packages[p] >= lpSum([assigned[(o, p)] for o in O])
# SOLVE & PRINT RESULTS
prob.solve()
print(LpStatus[prob.status])
print('obj = ' + str(value(prob.objective)))
print('#remnants = ' + str(sum(int(remnant[o].varValue) for o in O)))
print('demand = ' + str(demand))
print('supply = ' + str([int(supply[o].varValue) for o in O]))
print('remnant = ' + str([int(remnant[o].varValue) for o in O]))
If the demand cannot be fulfilled this model will be infeasible. Another option in this case would be to maximize the number of orders fulfilled with a penalty for remnant mangoes. Here is the adapted model:
from pulp import *
import pulp
# PROBLEM DATA:
demand = [3, 7, 2, 5, 9, 3, 2, 4, 7, 5] # demand per order
packages = [0, 5, 0, 5] # available packages of different sizes
O = range(len(demand))
P = range(len(packages))
M = max(demand) # a big enough number
# DECLARE PROBLEM OBJECT:
prob = LpProblem('Mango delivery', LpMaximize)
# VARIABLES
assigned = pulp.LpVariable.dicts('assigned',
    ((o, p) for o in O for p in P), 0, max(demand), cat='Integer')  # number of packages of different sizes per order
supply = LpVariable.dicts('supply', O, 0, max(demand), cat='Integer') # supply per order
remnant = LpVariable.dicts('remnant', O, 0, len(packages)-1, cat='Integer') # extra delivery per order
served = LpVariable.dicts('served', O, cat='Binary') # whether an order is served
diff = LpVariable.dicts('diff', O, -M, len(packages)-1, cat='Integer') # difference between demand and supply
# OBJECTIVE
# primary objective is serve orders, secondary to minimize remnants
prob += 100*lpSum(served) - lpSum(remnant) # maximize served orders with a penalty for remnants
# CONSTRAINTS
for o in O:
    prob += supply[o] == lpSum([p*assigned[(o, p)] for p in P])
    prob += diff[o] == supply[o] - demand[o]
for p in P:
    # don't use more packages than available
    prob += packages[p] >= lpSum([assigned[(o, p)] for o in O])
for o in O:
    # an order is served if supply >= demand
    # formulation adapted from https://cs.stackexchange.com/questions/69531/greater-than-condition-in-integer-linear-program-with-a-binary-variable
    prob += M*served[o] >= diff[o] + 1
    prob += M*(served[o]-1) <= diff[o]
    prob += lpSum([assigned[(o, p)] for p in P]) <= M*served[o]
for o in O:
    # if order is served then remnant is supply - demand
    # otherwise remnant is zero
    prob += remnant[o] >= diff[o]
    prob += remnant[o] <= diff[o] + M*(1-served[o])
# SOLVE & PRINT RESULTS
prob.solve()
print(LpStatus[prob.status])
print('obj = ' + str(value(prob.objective)))
print('#served = ' + str(sum(int(served[o].varValue) for o in O)))
print('#remnants = ' + str(sum(int(remnant[o].varValue) for o in O)))
print('served = ' + str([int(served[o].varValue) for o in O]))
print('demand = ' + str(demand))
print('supply = ' + str([int(supply[o].varValue) for o in O]))
print('remnant = ' + str([int(remnant[o].varValue) for o in O]))

Fill order from smaller packages?

The input is an integer that specifies the amount to be ordered.
There are predefined package sizes that have to be used to create that order.
e.g.
Packs
3 for $5
5 for $9
9 for $16
for an input order 13 the output should be:
2x5 + 1x3
So far I have the following approach:
remaining_order = 13
package_numbers = [9,5,3]
required_packages = []
while remaining_order > 0:
    found = False
    for pack_num in package_numbers:
        if pack_num <= remaining_order:
            required_packages.append(pack_num)
            remaining_order -= pack_num
            found = True
            break
    if not found:
        break
But this will lead to the wrong result:
1x9 + 1x3
remaining: 1
So, you need to fill the order with the packages such that the total price is maximal? This is known as the knapsack problem, and in the Wikipedia article on it you'll find several solutions written in Python.
To be more precise, you need a solution for the unbounded knapsack problem, in contrast to the popular 0/1 knapsack problem (where each item can be packed only once). Here is working code from Rosetta Code:
from itertools import product

NAME, SIZE, VALUE = range(3)
items = (
    # NAME, SIZE, VALUE
    ('A', 3, 5),
    ('B', 5, 9),
    ('C', 9, 16))
capacity = 13

def knapsack_unbounded_enumeration(items, C):
    # find max of any one item
    max1 = [int(C / item[SIZE]) for item in items]
    itemsizes = [item[SIZE] for item in items]
    itemvalues = [item[VALUE] for item in items]
    # def totvalue(itemscount, itemsizes=itemsizes, itemvalues=itemvalues, C=C):
    def totvalue(itemscount):
        # nonlocal itemsizes, itemvalues, C
        totsize = sum(n * size for n, size in zip(itemscount, itemsizes))
        totval = sum(n * val for n, val in zip(itemscount, itemvalues))
        return (totval, -totsize) if totsize <= C else (-1, 0)
    # Try all combinations of bounty items from 0 up to max1
    bagged = max(product(*[range(n + 1) for n in max1]), key=totvalue)
    numbagged = sum(bagged)
    value, size = totvalue(bagged)
    size = -size
    # convert to (item, count) pairs in name order
    bagged = ['%dx%d' % (n, items[i][SIZE]) for i, n in enumerate(bagged) if n]
    return value, size, numbagged, bagged

if __name__ == '__main__':
    value, size, numbagged, bagged = knapsack_unbounded_enumeration(items, capacity)
    print(value)
    print(bagged)
Output is:
23
['1x3', '2x5']
Keep in mind that this is an NP-hard problem, so it will blow up when you enter some large values :)
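(As an aside that goes beyond the original answer: the usual way around that blow-up is dynamic programming over capacities, which is pseudo-polynomial instead of exponential. A minimal sketch using the same items tuples as above:)

# Sketch: unbounded knapsack via dynamic programming over capacities.
# best[v] holds (best value, list of pack names) for capacity v.
def knapsack_unbounded_dp(items, C):
    best = [(0, [])] * (C + 1)
    for v in range(1, C + 1):
        best[v] = best[v - 1]  # leaving capacity unused is allowed
        for name, size, value in items:
            if size <= v and best[v - size][0] + value > best[v][0]:
                best[v] = (best[v - size][0] + value, best[v - size][1] + [name])
    return best[C]

print(knapsack_unbounded_dp(items, capacity))  # -> (23, ['B', 'B', 'A'])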
You can use itertools.product:
import itertools

remaining_order = 13
package_numbers = [9,5,3]
required_packages = []

# upper bound on repeats: enough copies of the smallest package to cover the order
a = min([x for i in range(1, remaining_order // min(package_numbers) + 2)
         for x in itertools.product(package_numbers, repeat=i)],
        key=lambda x: abs(sum(x) - remaining_order))
remaining_order -= sum(a)
print(a)
print(remaining_order)
Output:
(5, 5, 3)
0
This simply does the following steps:
Get the combination whose sum is closest to 13 from the list of all products of the package numbers.
Then subtract its sum from remaining_order.
If you want the output with 'x':
import itertools
from collections import Counter

remaining_order = 13
package_numbers = [9,5,3]
required_packages = []

a = min([x for i in range(1, remaining_order // min(package_numbers) + 2)
         for x in itertools.product(package_numbers, repeat=i)],
        key=lambda x: abs(sum(x) - remaining_order))
remaining_order -= sum(a)
print(' + '.join(['{0}x{1}'.format(v, k) for k, v in Counter(a).items()]))
print(remaining_order)
Output:
2x5 + 1x3
0
For your problem, I tried two implementations depending on what you want. In both of the solutions I assumed you absolutely need the remainder to be 0; otherwise the algorithm will return -1. If you need the other cases, tell me and I can adapt the algorithm.
As the algorithm is implemented via dynamic programming, it handles large inputs well, at least more than 130 packages!
In the first solution, I assumed we fill with the biggest package each time.
In the second solution, I try to minimize the price, while the remaining order must still reach 0.
remaining_order = 13
package_numbers = sorted([9,5,3], reverse=True)  # To make sure the biggest package is the first element
prices = {9: 16, 5: 9, 3: 5}
required_packages = []

# First solution, using the biggest package each time, and making the total order remaining at 0 each time
ans = [[] for _ in range(remaining_order + 1)]
ans[0] = [0, 0, 0]
for i in range(1, remaining_order + 1):
    for index, package_number in enumerate(package_numbers):
        if i-package_number > -1:
            tmp = ans[i-package_number]
            if tmp != -1:
                ans[i] = [tmp[x] if x != index else tmp[x] + 1 for x in range(len(tmp))]
                break
    else:  # Using for else instead of a boolean value `found`
        ans[i] = -1  # -1 is the not found combinations

print(ans[13])  # [0, 2, 1]
print(ans[9])   # [1, 0, 0]

# Second solution, minimizing the price with order at 0
def price(x):
    return 16*x[0]+9*x[1]+5*x[2]

ans = [[] for _ in range(remaining_order + 1)]
ans[0] = ([0, 0, 0], 0)  # combination + price
for i in range(1, remaining_order + 1):
    # The not found packages will be (-1, float('inf'))
    minimal_price = float('inf')
    minimal_combinations = -1
    for index, package_number in enumerate(package_numbers):
        if i-package_number > -1:
            tmp = ans[i-package_number]
            if tmp != (-1, float('inf')):
                tmp_price = price(tmp[0]) + prices[package_number]
                if tmp_price < minimal_price:
                    minimal_price = tmp_price
                    minimal_combinations = [tmp[0][x] if x != index else tmp[0][x] + 1 for x in range(len(tmp[0]))]
    ans[i] = (minimal_combinations, minimal_price)

print(ans[13])  # ([0, 2, 1], 23)
print(ans[9])   # ([0, 0, 3], 15) Because the price of three packages is lower than the price of a package of 9
In case you need a solution for a small number of possible package_numbers but a possibly very big remaining_order, in which case all the other solutions would fail, you can use this to reduce remaining_order:
import numpy as np

remaining_order = 13
package_numbers = [9,5,3]
required_packages = []

sub_max = np.sum([(np.prod(package_numbers)//i - 1)*i for i in package_numbers])
while remaining_order > sub_max:
    remaining_order -= np.prod(package_numbers)
    required_packages.append([max(package_numbers)] * (np.prod(package_numbers)//max(package_numbers)))
Because if any package appeared in required_packages more often than (np.prod(package_numbers)//i - 1)*i times, its sum would equal np.prod(package_numbers). In case max(package_numbers) isn't the package with the smallest price per unit, take the one with the smallest price per unit instead.
Example:
remaining_order = 100
package_numbers = [5,3]
Any part of remaining_order bigger than 5*2 plus 3*4 = 22 can be sorted out by adding 5 three times to the solution and subtracting 5*3 from remaining_order.
So the remaining order that actually needs to be calculated is 10, which can then be solved as 2 times 5. The rest is filled with 6 times 15, which is 18 times 5.
In case the number of possible package_numbers is bigger than just a handful, I recommend building a lookup table (with one of the other answers' code) for all numbers below sub_max, which will make this immensely fast for any input; see the sketch below.
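(A sketch of that lookup-table idea under my own assumptions: solve_small is a hypothetical stand-in for any exact method from the other answers, returning a list of pack sizes for totals it can solve; np, package_numbers and sub_max come from the snippet above.)

# Sketch: precompute exact solutions for all small totals once,
# then reduce arbitrarily large orders onto the table.
lookup = {n: solve_small(n) for n in range(int(sub_max) + 1)}  # solve_small is hypothetical

def fill_order(remaining_order):
    packs = []
    period = int(np.prod(package_numbers))
    biggest = max(package_numbers)
    while remaining_order > sub_max:
        remaining_order -= period
        packs += [biggest] * (period // biggest)
    return packs + lookup[remaining_order]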
Since there is no statement about the objective function, I assume your goal is to maximize the package value within the pack's capacity.
Explanation: the time complexity is fixed. The optimal solution may not be filling with the highest-valued item as many times as possible; you have to search all possible combinations. However, you can reuse the possible optimal solutions you have already searched, to save space. For example, [5,5,3] is derived from adding 3 to a previous [5,5] try, so the intermediate result can be "cached". You may either use an array or a set to store possible solutions. The code below has the same performance as the Rosetta code, but I think it's clearer.
To further optimize, use a priority set for opts.
costs = [3,5,9]
value = [5,9,16]
volume = 130

# solutions
opts = set()
opts.add(tuple([0]))

# calc total value
cost_val = dict(zip(costs, value))
def total_value(opt):
    return sum([cost_val.get(cost, 0) for cost in opt])

def possible_solutions():
    solutions = set()
    for opt in opts:
        for cost in costs:
            if cost + sum(opt) > volume:
                continue
            cnt = (volume - sum(opt)) // cost
            for _ in range(1, cnt + 1):
                sol = tuple(list(opt) + [cost] * _)
                solutions.add(sol)
    return solutions

def optimize_max_return(opts):
    if not opts:
        return tuple([])
    cur = list(opts)[0]
    for sol in opts:
        if total_value(sol) > total_value(cur):
            cur = sol
    return cur

while sum(optimize_max_return(opts)) <= volume - min(costs):
    opts = opts.union(possible_solutions())

print(optimize_max_return(opts))
If your requirement is "just fill the pack", it'll be even simpler: use the volume of each item as its value instead.
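(In code, a minimal sketch of that variant, reusing the snippet above: make each pack's value equal to its own volume, so the search maximizes packed volume instead of price.)

# Sketch: "just fill the pack" -- a pack's value is its own volume
cost_val = dict(zip(costs, costs))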

Best way in Python to determine all possible intersections in a matrix?

I have a matrix (list of lists) where each column represents a unique word, each row represents a distinct document, and every entry is a 1 or 0, indicating whether or not the word for a given column exists in the document for a given row.
What I'd like to know is how to determine all the possible combinations of words and documents where more than one word is in common with more than one document. The result might look something like:
[ [[Docid_3, Docid_5], ['word1', 'word17', 'word23']],
[[Docid_3, Docid_9, Docid_334], ['word2', 'word7', 'word23', 'word68', 'word982']],
...
and so on for each possible combination. Would love a solution that provides the complete set of combinations and one that yields only the combinations that are not a subset of another, so from the example, not [[Docid_3, Docid_5], ['word1', 'word17']] since it's a complete subset of the first example.
I feel like there is an elegant solution that just isn't coming to mind and the beer isn't helping.
Thanks.
Normalize the text. You only want strings made of string.lowercase. Split/strip on everything else.
Make sets out of this.
Use something like this to get all possible groupings of all sizes:
import itertools

def get_all_lengths_combinations_of(elements):
    for no_of_items in range(2, len(elements)+1):
        for items in itertools.combinations(elements, no_of_items):
            yield items
I'm sure the real itertools wizards will come up with something better, possibly involving izip().
Remember you should be able to use the set.intersection() method like this:
set.intersection(*list_of_sets_to_intersect)
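(For instance, tying those pieces together on toy data of my own, not from the question:)

# toy data: document id -> set of normalized words
docs = {'d1': {'w1', 'w2', 'w3'}, 'd2': {'w2', 'w3'}, 'd3': {'w3'}}
for group in get_all_lengths_combinations_of(sorted(docs)):
    common = set.intersection(*(docs[d] for d in group))
    if len(common) >= 2:
        print(group, common)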
First, build a mapping from document ID to set of words -- your matrix of 0 and 1 is quite an unwieldy structure to process directly. If I read you correctly, the "column headings" (words) are the first list in the matrix (minus presumably the first item) and the "row headings" (docids) are the first items of each row (minus presumably the first row). Then (assuming Python 2.6 or better):
def makemap(matrix):
    im = iter(matrix)
    words = next(im)[1:]
    themap = {}
    for row in im:
        mapent = set()
        docid = row[0]
        for w, bit in zip(words, row[1:]):
            try:
                if bit: mapent.add(w)
            except:
                print 'w is %r' % (w,)
                raise
        themap[docid] = mapent
    return themap
Now you need to check all feasible subsets of documents -- the total number of subsets is huge so you really want to prune that search tree as much as you can, and brute-force generation of all subsets (e.g. by looping on itertools.combinations for various lengths) will not perform any pruning of course.
I would start with all 2-combinations (all pairs of docids -- itertools.combinations is fine for this of course) and make the first batch (those pairs which have 2+ words in common) of "feasible 2-length subsets". That can go in another mapping with tuples or frozensets of docids as the keys.
Then, to make the feasible (N+1)-length subsets, I would only try to extend existing feasible N-length subsets by one more docid each (checking the total intersection is still 2+ long of course). This at least does some pruning rather than blindly trying all the 2**N subsets (or even just the 2**N - N - 1 subsets of length at least two;-).
It might perhaps be possible to do even better by recording all docids that proved unable to extend a certain N-length subset -- no point in trying those against any of the (N+1)-length subsets derived from it. This is worth trying as a second level of pruning/optimization.
There may be further tweaks you could do for more pruning, but offhand none immediately comes to mind, so that's where I'd start from. (For readability, I'm not bothering below with micro-optimizations such as iteritems in lieu of items, frozensets in lieu of tuples, etc -- they're probably marginal given those sequences are all O(N) vs the exponential size of the computed structures, although of course worth trying in the tuning/optimizing phase.)
def allfeasiblesubsets(matrix):
    mapping = makemap(matrix)
    docids = sorted(mapping.keys())
    feasible_len2 = {}
    dont_bother = dict((d, set([d])) for d in docids)
    for d1, d2 in itertools.combinations(docids, 2):
        commonw = mapping[d1].intersection(mapping[d2])
        if len(commonw) >= 2:
            feasible_len2[d1, d2] = commonw
        else:
            dont_bother[d1].add(d2)
            dont_bother[d2].add(d1)
    all_feasible = [feasible_len2]
    while all_feasible[-1]:
        feasible_Np1 = {}
        for ds, ws in all_feasible[-1].items():
            md = max(ds)
            for d, w in mapping.items():
                if d <= md or any(d in dont_bother[d1] for d1 in ds):
                    continue
                commonw = w.intersection(ws)
                if len(commonw) >= 2:
                    feasible_Np1[ds+(d,)] = commonw
        all_feasible.append(feasible_Np1)
    return all_feasible[:-1]
You'll notice I've applied only a mild form of my suggested "further pruning" -- dont_bother only records "incompatibilities" (<2 words in common) between one docid and others -- this may help if there are several pairs of such "incompatible docids", and is simple and reasonably unobtrusive, but is not as powerful in pruning as the harder "full" alternative. I'm also trying to keep all keys in the feasible* dicts as sorted tuples of docids (as the itertools.combinations originally provides for the pairs) to avoid duplications and therefore redundant work.
Here's the small example I've tried (in the same file as these functions after, of course, the import for itertools and collections):
mat = [ ['doc']+'tanto va la gatta al lardo che ci lascia lo zampino'.split(),
        ['uno', 0, 0, 0, 1, 0, 1, 0, 0, 0, 1],
        ['due', 1, 0, 0, 0, 0, 1, 0, 1, 0, 1],
        ['tre', 1, 0, 0, 0, 0, 0, 0, 1, 0, 1],
        ['qua', 0, 0, 0, 1, 0, 1, 0, 1, 0, 1]]

mm = makemap(mat)
print mm
afs = allfeasiblesubsets(mat)
print afs
The results, which appear OK, are:
{'qua': set(['gatta', 'lo', 'ci', 'lardo']), 'tre': set(['lo', 'ci', 'tanto']), 'due': set(['lo', 'ci', 'lardo', 'tanto']), 'uno': set(['gatta', 'lo', 'lardo'])}
[{('due', 'tre'): set(['lo', 'ci', 'tanto']), ('due', 'uno'): set(['lo', 'lardo']), ('qua', 'uno'): set(['gatta', 'lo', 'lardo']), ('due', 'qua'): set(['lo', 'ci', 'lardo']), ('qua', 'tre'): set(['lo', 'ci'])}, {('due', 'qua', 'tre'): set(['lo', 'ci']), ('due', 'qua', 'uno'): set(['lo', 'lardo'])}]
but of course there might still be bugs lurking since I haven't tested it thoroughly. BTW, I hope it's clear that the result as supplied here (a list of dicts for various increasing lengths, each dict having the ordered tuple forms of the docids-sets as keys and the sets of their common words as values) can easily be post-processed into any other form you might prefer, such as nested lists.
(Not that it matters, but the text I'm using in the example is an old Italian proverb;-).
Take a look at the SO question what-tried-and-true-algorithms-for-suggesting-related-articles-are-out-there.
For real problem sizes, say > 100 docs and 10000 words, get the nice bitarray module (which says, by the way, that "the same algorithm in Python ... is about 20 times slower than in C").
On "only the combinations that are not a subset of another":
define a hit22 as a 2x2 submatrix with 11 11,
a hit23 as a 2x3 submatrix with 111 111 (2 docs, 3 words in common), and so on.
A given hit22 may be in many hit2n s — 2 docs, n words,
and also in many hitn2 s — n docs, 2 words. Looks fun.
Added Monday 14 Jun: little functions using bitarray.
(An intro to Python modules for real doc classification? Dunno.)
""" docs-words with bitarray, randombits """
# google "document classification" (tutorial | python) ...
# https://stackoverflow.com/questions/1254627/what-tried-and-true-algorithms-for-suggesting-related-articles-are-out-there
from __future__ import division
import random
import sys
from bitarray import bitarray # http://pypi.python.org/pypi/bitarray
__date__ = "14jun 2010 denis"
ndoc = 100
nbits = 1000
exec "\n".join( sys.argv[1:] ) # run this.py ndoc= ...
random.seed(1)
me = __file__.split('/') [-1]
print "%s ndoc=%d nbits=%d" % (me, ndoc, nbits)
# bitarray stuff --
def bitslist( bits ):
""" 011001 -> [1,2,5] """
return [ j for j in range(len(bits)) if bits[j] ]
hex_01 = {
"0": "0000", "1": "0001", "2": "0010", "3": "0011",
"4": "0100", "5": "0101", "6": "0110", "7": "0111",
"8": "1000", "9": "1001", "a": "1010", "b": "1011",
"c": "1100", "d": "1101", "e": "1110", "f": "1111",
}
def to01( x, len_ ):
x = "%x" % x
s = "".join( hex_01[c] for c in x )
return (len_ - len(s)) * "0" + s
def randombits( nbits ):
""" -> bitarray 1/16 1, 15/16 0 """
hibit = 1 << (nbits - 1)
r = (random.randint( 0, hibit - 1 )
& random.randint( 0, hibit - 1 )
& random.randint( 0, hibit - 1 )
& random.randint( 0, hibit - 1 )) # prob 1/16
return bitarray( to01( r, nbits ))
#...............................................................................
doc = [ randombits(nbits) for j in range(ndoc) ] # ndoc x nbits
def mostsimilarpair():
""" -> (sim, j, k) most similar pair of docs """
mostsim = (-1,-1,-1)
for j in range(ndoc):
for k in range(j+1, ndoc):
# allpairs[j,k] -> scipy.cluster.hier ?
sim = (doc[j] & doc[k]).count() # nr bits (words) in common, crude
mostsim = max( mostsim, (sim,j,k) )
return mostsim
sim, jdoc, kdoc = mostsimilarpair()
print "The 2 most similar docs:" ,
print "doc %d has %d words," % ( jdoc, doc[jdoc].count() ) ,
print "doc %d has %d," % ( kdoc, doc[kdoc].count() )
print "%d words in common: %s" % ( sim, bitslist( doc[jdoc] & doc[kdoc] ))
print ""
#...............................................................................
def docslike( jdoc, thresh ):
""" -> (doc index, sim >= thresh) ... """
for j in range(ndoc):
if j == jdoc: continue
sim = (doc[j] & doc[jdoc]).count()
if sim >= thresh:
yield (j, sim)
thresh = sim // 2
print "Docs like %d, with >= %d words in common:" % (jdoc, thresh)
for j, sim in docslike( jdoc, thresh ):
print "%3d %s" % ( j, bitslist( doc[j] & doc[jdoc] ))
"""
The 2 most similar docs: doc 72 has 66 words, doc 84 has 60,
12 words in common: [11, 51, 119, 154, 162, 438, 592, 696, 800, 858, 860, 872]
Docs like 72, with >= 6 words in common:
2 [3, 171, 258, 309, 592, 962]
...
"""
How many documents? How many unique words? How much RAM do you have?
What do you want to produce in the following scenario: document A has words 1, 2, 3; B has 1, 2; C has 2, 3?
