Optimizing product assembly / disassembly - python

I have a store that contains items. Each item is either a component (which is atomic) or a product, which consists of various components (but never of two or more of the same component).
Now, when I want to get a product out of the store, there are various scenarios:
The store contains the necessary number of the product.
The store contains components of which I can assemble the product.
The store contains products that share components with the required product. I can disassemble those and assemble the required item.
Any combination of the above.
Below you can see my code so far (getAssemblyPath). It does find a way to assemble the required item if it is possible, but it does not optimize the assembly path.
I want to optimize the path in two ways:
First, choose the path which takes the least number of assembly/disassembly actions.
Second, if there are several such paths, choose the one which leaves the fewest disassembled components in the store.
Now, here I am at a complete loss of how to get this optimization done (I am not even sure if this is a question for SO or for Maths).
How can I alter getAssemblyPath so that it meets my optimization requirements?
My code so far:
#! /usr/bin/python

class Component:
    def __init__ (self, name): self.__name = name
    def __repr__ (self): return 'Component {}'.format (self.__name)

class Product:
    def __init__ (self, name, components):
        self.__name = name
        self.__components = components

    @property
    def components (self): return self.__components

    def __repr__ (self): return 'Product {}'.format (self.__name)

class Store:
    def __init__ (self): self.__items = {}

    def __iadd__ (self, item):
        item, count = item
        if item not in self.__items: self.__items [item] = 0
        self.__items [item] += count
        return self

    @property
    def items (self): return (item for item in self.__items.items () )

    @property
    def products (self): return ( (item, count) for item, count in self.__items.items () if isinstance (item, Product) )

    @property
    def components (self): return ( (item, count) for item, count in self.__items.items () if isinstance (item, Component) )

    def getAssemblyPath (self, product, count):
        if product in self.__items:
            take = min (count, self.__items [product] )
            print ('Take {} of {}'.format (take, product) )
            count -= take
            if not count: return
        components = dict ( (comp, count) for comp in product.components)
        for comp, count in self.components:
            if comp not in components: continue
            take = min (count, components [comp] )
            print ('Take {} of {}'.format (take, comp) )
            components [comp] -= take
            if not components [comp]: del components [comp]
            if not components: return
        for prod, count in self.products:
            if prod == product: continue
            shared = set (prod.components) & set (components.keys () )
            if not shared: continue  # guard: max () below would fail on an empty set
            dis = min (max (components [comp] for comp in shared), count)
            print ('Disassemble {} of {}.'.format (dis, prod) )
            for comp in shared:
                print ('Take {} of {}.'.format (dis, comp) )
                components [comp] -= dis  # was "take", a stale value from the loop above
                if not components [comp]: del components [comp]
                if not components: return
        print ('Missing components:')
        for comp, count in components.items ():
            print ('{} of {}.'.format (count, comp) )
c1 = Component ('alpha')
c2 = Component ('bravo')
c3 = Component ('charlie')
c4 = Component ('delta')
p1 = Product ('A', [c1, c2] )
p2 = Product ('B', [c1, c2, c3] )
p3 = Product ('C', [c1, c3, c4] )
store = Store ()
store += (c2, 100)
store += (c4, 100)
store += (p1, 100)
store += (p2, 100)
store += (p3, 10)
store.getAssemblyPath (p3, 20)
This outputs:
Take 10 of Product C
Take 10 of Component delta
Disassemble 10 of Product A.
Take 10 of Component alpha.
Disassemble 10 of Product B.
Take 10 of Component charlie.
Which works, but it does unnecessarily disassemble product A, as product B contains both of the required components alpha and charlie.
--
EDIT:
Answering the very sensible questions of Blckknght:
When you say you want "the least number of assembly/disassembly actions", do you mean the smallest number of items, or the smallest number of different products?
An "asm/disasm action" is the action of assembling or disassembling one product, no matter how many components are involved. I am looking for the least number of touched items, no matter whether they are distinct or not.
That is, is it better to disassemble 20 of Product A than to disassemble 10 of Product A and an additional 5 of Product B?
The latter is closer to the optimum.
Further, you say you want to avoid leaving many components behind, but in your current code all disassembled components that are not used by the requested Product are lost. Is that deliberate (that is, do you want to be throwing away the other components), or is it a bug?
The method getAssemblyPath only determines the path of how to get the items. It does not touch the actual store. At no point does it assign to self.__items. Think of it as a function that issues an order to the storekeeper of what he must do in the (immediate) future, in order to get the required amount of the required item out of his store.
--
EDIT 2:
The first obvious (or at least obvious to me) way to tackle this issue is to search first for those products that share the maximum number of components with the required product, since you get more required components out of each disassembly. But unfortunately this doesn't necessarily yield the optimum path. Take for instance:
Product A consisting of components α, β, γ, δ, ε and ζ.
Product B consisting of components α, β, η, δ, ε and θ.
Product C consisting of components α, β, γ, ι, κ and λ.
Product D consisting of components μ, ν, ξ, δ, ε and ζ.
We have in store 0 of A, 100 of B, 100 of C and 100 of D. We require 10 of A. Now if we look first for the products that share the most components with A, we will find B. We disassemble 10 of B, getting 10 each of α, β, δ and ε. But then we need to disassemble 10 of C (to get γ) and 10 of D (to get ζ). These would be 40 actions (30 disassembling and 10 assembling).
But the optimum way would be to disassemble 10 of C and 10 of D (30 actions, 20 disassembling and 10 assembling).
--
EDIT 3:
You don't need to post python code to win the bounty. Just explain the algorithm to me and show that it does indeed yield the optimum path, or one of the optima if several exist.

Here is how I would solve this problem. I wanted to write code for this but I don't think I have time.
You can find an optimal solution recursively. Make a data structure that represents the state of the parts store and the current request. Now, for each part you need, make a series of recursive calls that try the various ways to fill the order. The key is that by trying a way to fill the order, you are getting part of the work done, so the recursive call is now a slightly simpler version of the same problem.
Here's a specific example, based on your example. We need to fill orders for product 3 (p3) which is made of components c1, c3, and c4. Our order is for 20 of p3, and we have 10 p3 in stock so we trivially fill the order for the first 10 of p3. Now our order is for 10 of p3, but we can look at it as an order for 10 of c1, 10 of c3, and 10 of c4. For the first recursive call we disassemble a p1, and fill an order for a single c1 and place an extra c2 in the store; so this recursive call is for 9 of c1, 10 of c3, and 10 of c4, with an updated availability in the store. For the second recursive call we disassemble a p2, and fill an order for a c1 and a c4, and put an extra c2 into the store; so this recursive call is for 9 of c1, 10 of c3, and 9 of c4, with an updated availability in the store.
Since each call reduces the problem, the recursive series of calls will terminate. The recursive calls should return a cost metric, which either signals that the call failed to find a solution or else signals how much the found solution cost; the function chooses the best solution by choosing the solution with the lowest cost.
I'm not sure, but you might be able to speed this up by memoizing the calls. Python has a really nifty built-in that is new in the 3.x series, functools.lru_cache(); since you tagged your question as "Python 3.2" this is available to you.
What is memoization and how can I use it in Python?
The memoization works by recognizing that the function has already been called with the same arguments, and just returning the same solution as before. So it is a cache mapping arguments to answers. If the arguments include non-essential data (like how many of component c2 are in the store) then the memoization is less likely to work. But if we imagine we have products p1 and p9, and p9 contains components c1 and c9, then for our purposes disassembling one of p1 or one of p9 should be equivalent: they have the same disassembly cost, and they both produce a component we need (c1) and one we don't need (c2 or c9). So if we get the recursive call arguments right, the memoization could just return an instant answer when we get around to trying p9, and it could save a lot of time.
Hmm, now that I think about it, we probably can't use functools.lru_cache() but we can just memoize on our own. We can make a cache of solutions: a dictionary mapping tuples to values, and build tuples that just have the arguments we want cached. Then in our function, the first thing we do is check the cache of solutions, and if this call is equivalent to a cached solution, just return it.
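For illustration, a hand-rolled cache along those lines might look like this (the function names and the canonicalization via repr() are my own assumptions, not part of the answer's code): the order and store dicts are turned into hashable tuples, and a previously computed solution is reused when the same logical state comes up again.

```python
# Minimal hand-rolled memoization sketch (illustrative names).
# Assumes the order and store dicts can be canonicalized into
# hashable tuples; repr() stands in for a proper item key here.

_solution_cache = {}

def _cache_key(order, store):
    # Sort by repr so the same logical state always maps to the same key.
    return (
        tuple(sorted((repr(k), v) for k, v in order.items())),
        tuple(sorted((repr(k), v) for k, v in store.items())),
    )

def memoized_solver(order, store, solver):
    """Wrap any (order, store) -> (cost, solution) solver with a cache."""
    key = _cache_key(order, store)
    if key not in _solution_cache:
        _solution_cache[key] = solver(order, store)
    return _solution_cache[key]
```

Equivalent states (like the p1/p9 example above) then hit the cache instead of re-running the whole recursion.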
EDIT: Here's the code I have written so far. I haven't finished debugging it so it probably doesn't produce the correct answer yet (I'm not certain because it takes a long time and I haven't let it finish running). This version is passing in dictionaries, which won't work well with my ideas about memoizing, but I wanted to get a simple version working and then worry about speeding it up.
Also, this code takes apart products and adds them to the store as components, so the final solution will first say something like "Take apart 10 product A" and then it will say "Take 20 component alpha" or whatever. In other words, the component count could be considered high since it doesn't distinguish between components that were already in the store and components that were put there by disassembling products.
I'm out of time for now and won't work on it for a while, sorry.
#!/usr/bin/python3

class Component:
    def __init__ (self, name): self.__name = name
    #def __repr__ (self): return 'Component {}'.format (self.__name)
    def __repr__ (self): return 'C_{}'.format (self.__name)

class Product:
    def __init__ (self, name, components):
        self.__name = name
        self.__components = components

    @property
    def components (self): return self.__components

    #def __repr__ (self): return 'Product {}'.format (self.__name)
    def __repr__ (self): return 'P_{}'.format (self.__name)

class Store:
    def __init__ (self): self.__items = {}

    def __iadd__ (self, item):
        item, count = item
        if item not in self.__items: self.__items [item] = 0
        self.__items [item] += count
        return self

    @property
    def items (self): return (item for item in self.__items.items () )

    @property
    def products (self): return ( (item, count) for item, count in self.__items.items () if isinstance (item, Product) )

    @property
    def components (self): return ( (item, count) for item, count in self.__items.items () if isinstance (item, Component) )

    def get_assembly_path (self, product, count):
        store = self.__items.copy()
        s_trivial = ''
        if product in store:
            take = min (count, store [product] )
            s_trivial = ('Take {} of {}'.format (take, product) )
            count -= take
            if not count:
                print(s_trivial)
                return
            dict_decr(store, product, take)
            assert product not in store  # was a bare expression with no effect
        order = {item: count for item in product.components}
        cost, solution = solver(order, store)
        if cost is None:
            print("No solution.")
            return
        print("Solution:")
        if s_trivial:
            print(s_trivial)
        for item, count in solution.items():
            if isinstance(item, Component):
                print ('Take {} of {}'.format (count, item) )
            else:
                assert isinstance(item, Product)
                print ('Disassemble {} of {}'.format (count, item) )

    def getAssemblyPath (self, product, count):
        if product in self.__items:
            take = min (count, self.__items [product] )
            print ('Take {} of {}'.format (take, product) )
            count -= take
            if not count: return
        components = dict ( (comp, count) for comp in product.components)
        for comp, count in self.components:
            if comp not in components: continue
            take = min (count, components [comp] )
            print ('Take {} of {}'.format (take, comp) )
            components [comp] -= take
            if not components [comp]: del components [comp]
            if not components: return
        for prod, count in self.products:
            if prod == product: continue
            shared = set (prod.components) & set (components.keys () )
            if not shared: continue  # guard: max () below would fail on an empty set
            dis = min (max (components [comp] for comp in shared), count)
            print ('Disassemble {} of {}.'.format (dis, prod) )
            for comp in shared:
                print ('Take {} of {}.'.format (dis, comp) )
                components [comp] -= dis  # was "take", a stale value from the loop above
                if not components [comp]: del components [comp]
                if not components: return
        print ('Missing components:')
        for comp, count in components.items ():
            print ('{} of {}.'.format (count, comp) )

def str_d(d):
    lst = list(d.items())
    lst.sort(key=str)
    return "{" + ", ".join("{}:{}".format(k, v) for (k, v) in lst) + "}"

def dict_incr(d, key, n):
    if key not in d:
        d[key] = n
    else:
        d[key] += n

def dict_decr(d, key, n):
    assert d[key] >= n
    d[key] -= n
    if d[key] == 0:
        del d[key]

def solver(order, store):
    """
    order is a dict mapping component:count
    store is a dict mapping item:count
    returns a tuple: (cost, solution)
    cost is a cost metric estimating the expense of the solution
    solution is a dict that maps item:count (how to fill the order)
    """
    print("DEBUG: solver: {} {}".format(str_d(order), str_d(store)))
    if not order:
        solution = {}
        cost = 0
        return (cost, solution)
    solutions = []
    for item in store:
        if not isinstance(item, Component):
            continue
        print("...considering: {}".format(item))
        if item not in order:
            continue
        o = order.copy()
        s = store.copy()
        dict_decr(o, item, 1)
        dict_decr(s, item, 1)
        if not o:
            # we have found a solution! Return it
            solution = {item: 1}
            cost = 1
            print("BASIS: solver: {} {} / {} {}".format(str_d(order), str_d(store), cost, str_d(solution)))
            return (cost, solution)
        cost, solution = solver(o, s)
        if cost is None:
            continue  # this was a dead end
        dict_incr(solution, item, 1)
        cost += 1
        solutions.append((cost, solution))
    for item in store:
        if not isinstance(item, Product):
            continue
        print("...Product components: {} {}".format(item, item.components))
        if any(c in order for c in item.components):
            print("...disassembling: {}".format(item))
            o = order.copy()
            s = store.copy()
            dict_decr(s, item, 1)
            for c in item.components:
                dict_incr(s, c, 1)
            cost, solution = solver(o, s)
            if cost is None:
                continue  # this was a dead end
            cost += 1  # cost of disassembly
            solutions.append((cost, solution))
        else:
            print("DEBUG: ignoring {}".format(item))
    if not solutions:
        print("DEBUG: *dead end*")
        return (None, None)
    print("DEBUG: finding min of: {}".format(solutions))
    # compare on cost only; comparing the solution dicts raises TypeError on ties
    return min(solutions, key=lambda cs: cs[0])

c1 = Component ('alpha')
c2 = Component ('bravo')
c3 = Component ('charlie')
c4 = Component ('delta')

p1 = Product ('A', [c1, c2] )
p2 = Product ('B', [c1, c2, c3] )
p3 = Product ('C', [c1, c3, c4] )

store = Store ()
store += (c2, 100)
store += (c4, 100)
store += (p1, 100)
store += (p2, 100)
store += (p3, 10)

#store.getAssemblyPath (p3, 20)
store.get_assembly_path (p3, 20)

Optimal path for N products <=> optimal path for single product.
Indeed, if we need to optimally assemble N of product X, then after we optimally (using current stock) assemble one product, the question becomes how to optimally assemble (N-1) of product X using the remaining stock.
=> Therefore, it is sufficient to provide algorithm of optimally assembling ONE product X at a time.
Assume we need components x1,..xn for the product (here we only include components not available as components in stock)
For each component xk, find all products that contain it. We get a list of products for each component: products A1(1),..,A1(i1) contain component x1; products A2(1),..,A2(i2) contain component x2; and so forth (a product can appear in several of the lists A1,A2,..,An).
If any of the lists is empty - there is no solution.
We need a minimal set of products such that a product from that set is contained in each of the lists. The simplest, but not computationally efficient, solution is brute force - try all sets and pick a minimal one:
Take the union of A1,..,An - call it A (include each product only once).
a. Take a single product from A; if it is contained in all of A1,..,An - we need only one disassembly (this product).
b. Try all combinations of two products from A; if any combination (a1,a2) satisfies the condition that either a1 or a2 is contained in each of the lists A1,..,An - it is a solution.
...
For sure, there is a solution at depth n - one product from each of the lists A1,..,An. If we found no solution before that, this is the best solution.
Now, we only need to think about a better strategy than brute-force checking, which I think is possible - I need to think about it - but this brute-force approach certainly finds a strictly optimal solution.
EDIT:
A more efficient variant is to sort the lists by length. Then, when checking whether some set of K products is a solution, only the combinations of one item from each of the first K lists need to be checked; if no solution is found there, there is no minimal set of depth K that solves the problem. That kind of check is also not computationally that bad - perhaps it can work?
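The brute-force search described in this answer is a minimal hitting-set computation. A sketch (the function name and the use of plain strings for products are illustrative assumptions):

```python
from itertools import combinations

def minimal_disassembly_set(component_lists):
    """Brute-force hitting set: find a smallest set of products such that
    every needed component's list contains at least one chosen product.

    component_lists: one set of candidate products per needed component.
    Returns a frozenset of products, or None if some component is unobtainable.
    """
    if any(not lst for lst in component_lists):
        return None  # some component appears in no product
    universe = set().union(*component_lists)
    # Try sets of size 1, 2, ... up to n; the first hit is minimal.
    for k in range(1, len(component_lists) + 1):
        for combo in combinations(universe, k):
            chosen = set(combo)
            if all(chosen & lst for lst in component_lists):
                return frozenset(chosen)
    return None
```

On the EDIT 2 example (component lists α:{B,C}, β:{B,C}, γ:{C}, δ:{B,D}, ε:{B,D}, ζ:{D}) this returns {C, D}, matching the optimum described there rather than the greedy choice of B.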

I think the key here is to establish the potential costs of each purchase case, so that the proper combination of purchase cases optimally minimizes a cost function. (Then it's simply reduced to a knapsack problem.)
What follows is probably not optimal but here is an example of what I mean:
1. Any product that is the end product "costs" its actual cost (in currency).
2. Any component or product that can be assembled into the end product (given other separate products/components), but does not require being disassembled, costs its real price (in currency) plus a small tax (TBD).
3. Any component or product that can facilitate assembly of the end product but requires being disassembled costs its price in currency, plus a small tax for the assembly into the end product, and another small tax for each disassembly needed (maybe the same value as the assembly tax?).
Note: these "taxes" will apply to all sub-products that occupy the same case.
... and so on for other possible cases
Then, find all possible combinations of components and products available at the storefront that are capable of being assembled into the end product. Place these "assembly lists" into a cost-sorted list determined by your chosen cost function. After that, start creating as many of the first (lowest-cost) "assembly list" as you can, by checking that all items in the assembly list are still available at the store - i.e. that you have not already used them for a previous assembly. Once you cannot create any more of this case, pop it from the list. Repeat until all the end products you need are "built".
Note: every time you "assemble" an end product you will need to decrement a global counter for each product in the current "assembly list".
Hope this gets the discussion moving in the right direction. Good luck!

Related

How to set multiple duplicate items in Python resource allocation problem, using binpacking module? [duplicate]

For an application I'm working on I need something like a packing algorithm implemented in Python see here for more details. The basic idea is that I have n objects of varying sizes that I need to fit into n bins, where the number of bins is limited and the size of both objects and bins is fixed. The objects / bins can be either 1d or 2d, interested in seeing both. (I think 3d objects is probably more than I need.)
I know there are a variety of algorithms out there that address this problem, such as Best Fit Decreasing and First Fit Decreasing, but I was hoping there might be an implementation in Python (or PHP/C++/Java, really I'm not that picky). Any ideas?
https://bitbucket.org/kent37/python-tutor-samples/src/f657aeba5328/BinPacking.py
""" Partition a list into sublists whose sums don't exceed a maximum
using a First Fit Decreasing algorithm. See
http://www.ams.org/new-in-math/cover/bins1.html
for a simple description of the method.
"""
class Bin(object):
""" Container for items that keeps a running sum """
def __init__(self):
self.items = []
self.sum = 0
def append(self, item):
self.items.append(item)
self.sum += item
def __str__(self):
""" Printable representation """
return 'Bin(sum=%d, items=%s)' % (self.sum, str(self.items))
def pack(values, maxValue):
values = sorted(values, reverse=True)
bins = []
for item in values:
# Try to fit item into a bin
for bin in bins:
if bin.sum + item <= maxValue:
#print 'Adding', item, 'to', bin
bin.append(item)
break
else:
# item didn't fit into any bin, start a new bin
#print 'Making new bin for', item
bin = Bin()
bin.append(item)
bins.append(bin)
return bins
if __name__ == '__main__':
import random
def packAndShow(aList, maxValue):
""" Pack a list into bins and show the result """
print 'List with sum', sum(aList), 'requires at least', (sum(aList)+maxValue-1)/maxValue, 'bins'
bins = pack(aList, maxValue)
print 'Solution using', len(bins), 'bins:'
for bin in bins:
print bin
print
aList = [10,9,8,7,6,5,4,3,2,1]
packAndShow(aList, 11)
aList = [ random.randint(1, 11) for i in range(100) ]
packAndShow(aList, 11)

Adding a cache array to recursive knapsack solution?

I'm familiar with the naive recursive solution to the knapsack problem. However, this solution simply spits out the max value that can be stored in the knapsack given its weight constraints. What I'd like to do is add some form of metadata cache (namely which items have or haven't been selected, using a "one-hot" array [0,1,1]).
Here's my attempt:
class Solution:
    def __init__(self):
        self.array = []

    def knapSack(self, W, wt, val, n):
        index = n - 1
        if n == 0 or W == 0:
            return 0
        if (wt[index] > W):
            self.array.append(0)
            choice = self.knapSack(W, wt, val, index)
        else:
            option_A = val[index] + self.knapSack(W - wt[index], wt, val, index)
            option_B = self.knapSack(W, wt, val, index)
            if option_A > option_B:
                self.array.append(1)
                choice = option_A
            else:
                self.array.append(0)
                choice = option_B
            print(int(option_A > option_B))  # tells you which path was traveled
        return choice

# To test above function
val = [60, 100, 120]
wt = [10, 20, 30]
W = 50
n = len(val)

# print(knapSack(W, wt, val, n))
s = Solution()
s.knapSack(W, wt, val, n)
>>>
1
1
1
1
1
1
220
s.array
>>>
[1, 1, 1, 1, 1, 1]
As you can see, s.array returns [1,1,1,1,1,1] and this tells me a few things. (1), even though there are only three items in the problem set, the knapSack method has been called twice for each item and (2) this is because every item flows through the else statement in the method, so option_A and option_B are each computed for each item (explaining why the array length is 6 not 3.)
I'm confused as to why 1 has been appended in every recursive loop. The item at index 0 is not selected in the optimal solution. To answer this question, please provide:
(A) Why the current solution is behaving this way
(B) How the code can be restructured such that a one-hot "take or don't take" vector can be captured, representing whether a given item goes in the knapsack or not.
Thank you!
(A) Why the current solution is behaving this way
self.array is an instance attribute that is shared by all recursion paths. On one path or another each item is taken and so a one is appended to the list.
option_A = val[index]... takes an item but doesn't append a one to the list.
option_B = self..... skips an item but doesn't append a zero to the list.
if option_A > option_B: When you make this comparison you have lost the information that made it - the items that were taken/discarded in the branch;
in the suites you just append a one or a zero regardless of how many items made those values.
The ones and zeroes then represent whether branch A (1) or branch B (0) was successful in the current instance of the function.
(B) How the code can be restructured such that a one-hot "take or don't take" vector can be captured, representing whether a given item goes in the knapsack or not.
It would be nice to know what you have taken after running through the analysis; I suspect that is what you are trying to do with self.array. You expressed an interest in OOP: instead of keeping track with lists of numbers, using indices to select numbers from the lists, make objects to represent the items and work with those. Keep the objects in containers and use the functionality of the container to add or remove items/objects from it. Consider how you are going to use a container before choosing one.
Don't put the function in a class.
Change the function's signature to accept
available weight,
a container of items to be considered,
a container holding the items currently in the sack (the current sack).
Use a collections.namedtuple or a class for the items having value and weight attributes.
Item = collections.namedtuple('Item',['wt','val'])
When an item is taken add it to the current sack.
When recursing
if going down the take path add the return value from the call to the current sack
remove the item that was just considered from the list of items to be considered argument.
if taken subtract the item's weight from the available weight argument
When comparing two branches you will need to add up the values of each item the current sack.
return the sack with the highest value
carefully consider the base case
Make the items to be considered like this.
import collections
Item = collections.namedtuple('Item',['wt','val'])
items = [Item(wght,value) for wght,value in zip(wt,val)]
Add up values like this.
value = sum(item.val for item in current_sack)
# or
import operator
val = operator.attrgetter('val')   # attrgetter, not itemgetter: namedtuple fields are attributes
wt = operator.attrgetter('wt')
value = sum(map(val, current_sack))
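Putting those bullet points together, a sketch of the restructured function (the names knapsack and Item are illustrative; this is one way to follow the advice above, not the only one). It takes the available weight and the remaining items, and returns the best (value, sack) pair instead of appending to shared state:

```python
import collections

Item = collections.namedtuple('Item', ['wt', 'val'])

def knapsack(avail_weight, items):
    """Return (best_value, chosen_items) for the remaining items."""
    # Base case: no items left to consider -> empty sack.
    if not items:
        return 0, []
    head, rest = items[0], items[1:]
    # Branch B: skip this item.
    best_val, best_sack = knapsack(avail_weight, rest)
    # Branch A: take this item, if it fits.
    if head.wt <= avail_weight:
        take_val, take_sack = knapsack(avail_weight - head.wt, rest)
        take_val += head.val
        if take_val > best_val:
            best_val, best_sack = take_val, [head] + take_sack
    return best_val, best_sack

# Example with the question's data:
# knapsack(50, [Item(10, 60), Item(20, 100), Item(30, 120)])
# returns (220, [Item(wt=20, val=100), Item(wt=30, val=120)])
```

Because each recursive call returns the sack that achieved its value, the "take or don't take" information is never lost the way it is with the shared self.array.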
Your solution enhanced with debugging prints for the curious.
class Solution:
    def __init__(self):
        self.array = []
        self.other_array = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

    def knapSack(self, W, wt, val, n, j=0):
        index = n - 1
        deep = f'''{' ' * j * 3}'''
        print(f'{deep}level {j}')
        print(f'{deep}{W} available: considering {wt[index]},{val[index]}, {n})')
        # minor change here but it has no effect on the outcome
        #if n == 0 or W == 0:
        if n == 0:
            print(f'{deep}Base case found')
            return 0
        print(f'''{deep}{wt[index]} > {W} --> {wt[index] > W}''')
        if (wt[index] > W):
            print(f'{deep}too heavy')
            self.array.append(0)
            self.other_array[index] = 0
            choice = self.knapSack(W, wt, val, index, j + 1)
        else:
            print(f'{deep}Going down the option A hole')
            option_A = val[index] + self.knapSack(W - wt[index], wt, val, index, j + 1)
            print(f'{deep}Going down the option B hole')
            option_B = self.knapSack(W, wt, val, index, j + 1)
            print(f'{deep}option A:{option_A} option B:{option_B}')
            if option_A > option_B:
                print(f'{deep}option A wins')
                self.array.append(1)
                self.other_array[index] = 1
                choice = option_A
            else:
                print(f'{deep}option B wins')
                self.array.append(0)
                self.other_array[index] = 0
                choice = option_B
        print(f'{deep}level {j} Returning value={choice}')
        print(f'{deep}---------------------------------------------')
        return choice

Sorting from smallest to biggest

I would like to sort several players' scores from smallest to biggest.
I wish to get this result:
Drogba 2 pts
Owen 4 pts
Henry 6 pts
However, my ranking seems to be reversed for now :-(
Henry 6 pts
Owen 4 pts
Drogba 2 pts
I think the problem is in my Bubblesort function?
def Bubblesort(name, goal1, point):
    swap = True
    while swap:
        swap = False
        for i in range(len(name) - 1):
            if goal1[i + 1] > goal1[i]:
                goal1[i], goal1[i + 1] = goal1[i + 1], goal1[i]
                name[i], name[i + 1] = name[i + 1], name[i]
                point[i], point[i + 1] = point[i + 1], point[i]
                swap = True
    return name, goal1, point

def ranking(name, point):
    for i in range(len(name)):
        print(name[i], "\t", point[i], " \t ")

name = ["Henry", "Owen", "Drogba"]
point = [0] * 3
goal1 = [68, 52, 46]
gain = [6, 4, 2]

name, goal1, point = Bubblesort(name, goal1, point)
for i in range(len(name)):
    point[i] += gain[i]

ranking(name, point)
In your code:
if goal1[i+1] > goal1[i]:
that checks if it is greater. You need to swap it if the next one is less, not greater.
Change that to:
if goal1[i+1] < goal1[i]:
A bunch of issues:
def Bubblesort - PEP8 says function names should be lowercase, ie def bubblesort
You are storing your data as a bunch of parallel lists; this makes it harder to work on and think about (and sort!). You should transpose your data so that instead of having a list of names, a list of points, a list of goals you have a list of players, each of whom has a name, points, goals.
def bubblesort(name, goal1, point): - should look like def bubblesort(items) because bubblesort does not need to know that it is getting names and goals and points and sorting on goals (specializing it that way keeps you from reusing the function later to sort other things). All it needs to know is that it is getting a list of items and that it can compare pairs of items using >, ie Item.__gt__ is defined.
Instead of using the default "native" sort order, Python sort functions usually let you pass an optional key function which allows you to tell it what to sort on - that is, sort on key(items[i]) > key(items[j]) instead of items[i] > items[j]. This is often more efficient and/or convenient than reshuffling your data to get the sort order you want.
for i in range(len(name)-1): - you are iterating more than needed. After each pass, the highest value in the remaining list gets pushed to the top (hence "bubble" sort, values rise to the top of the list like bubbles). You don't need to look at those top values again because you already know they are higher than any of the remaining values; after the nth pass, you can ignore the last n values.
actually, the situation is a bit better than that; you will often find runs of values which are already in sorted order. If you keep track of the highest index that actually got swapped, you don't need to go beyond that on your next pass.
So your sort function becomes
def bubblesort(items, *, key=None):
    """
    Return items in sorted order
    """
    # work on a copy of the list (don't destroy the original)
    items = list(items)
    # process key values - cache the result of key(item)
    # so it doesn't have to be called repeatedly
    keys = items if key is None else [key(item) for item in items]
    # initialize the "last item to sort on the next pass" index
    last_swap = len(items) - 1
    # sort!
    while last_swap:
        ls = 0
        for i in range(last_swap):
            j = i + 1
            if keys[i] > keys[j]:
                # have to swap keys and items at the same time,
                # because keys may be an alias for items
                items[i], items[j], keys[i], keys[j] = items[j], items[i], keys[j], keys[i]
                # made a swap - update the last_swap index
                ls = i
        last_swap = ls
    return items
You may not be sure that this is actually correct, so let's test it:
from random import sample

def test_bubblesort(tries = 1000):
    # example key function
    key_fn = lambda item: (item[2], item[0], item[1])
    for i in range(tries):
        # create some sample data to sort
        data = [sample("abcdefghijk", 3) for j in range(10)]
        # no-key sort
        assert bubblesort(data) == sorted(data), "Error: bubblesort({}) gives {}".format(data, bubblesort(data))
        # keyed sort (key is keyword-only, so it must be passed by name)
        assert bubblesort(data, key=key_fn) == sorted(data, key=key_fn), "Error: bubblesort({}, key) gives {}".format(data, bubblesort(data, key=key_fn))

test_bubblesort()
Now the rest of your code becomes
class Player:
    def __init__(self, name, points, goals, gains):
        self.name = name
        self.points = points
        self.goals = goals
        self.gains = gains

players = [
    Player("Henry", 0, 68, 6),
    Player("Owen", 0, 52, 4),
    Player("Drogba", 0, 46, 2)
]

# sort by goals
players = bubblesort(players, key = lambda player: player.goals)

# update points
for player in players:
    player.points += player.gains

# show the result
for player in players:
    print("{player.name:<10s} {player.points:>2d} pts".format(player=player))
which produces
Drogba 2 pts
Owen 4 pts
Henry 6 pts

Sorting a List with Relative Positional Data

This is more of a conceptual programming question, so bear with me:
Say you have a list of scenes in a movie, and each scene may or may not make reference to past/future scenes in the same movie. I'm trying to find the most efficient algorithm of sorting these scenes. There may not be enough information for the scenes to be completely sorted, of course.
Here's some sample code in Python (pretty much pseudocode) to clarify:
class Reference:
    def __init__(self, scene_id, relation):
        self.scene_id = scene_id
        self.relation = relation

class Scene:
    def __init__(self, scene_id, references):
        self.id = scene_id
        self.references = references

    def __repr__(self):
        return self.id

def relative_sort(scenes):
    return scenes  # Algorithm in question

def main():
    s1 = Scene('s1', [
        Reference('s3', 'after')
    ])
    s2 = Scene('s2', [
        Reference('s1', 'before'),
        Reference('s4', 'after')
    ])
    s3 = Scene('s3', [
        Reference('s4', 'after')
    ])
    s4 = Scene('s4', [
        Reference('s2', 'before')
    ])
    print(relative_sort([s1, s2, s3, s4]))

if __name__ == '__main__':
    main()
The goal is to have relative_sort return [s4, s3, s2, s1] in this case.
If it's helpful, I can share my initial attempt at the algorithm; I'm a little embarrassed at how brute-force it is. Also, if you're wondering, I'm trying to decode the plot of the film "Mulholland Drive".
FYI: The Python tag is only here because my pseudocode was written in Python.
The algorithm you are looking for is a topological sort:
In the field of computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks.
You can compute this pretty easily using a graph library, for instance, networkx, which implements topological_sort. First we import the library and list all of the relationships between scenes -- that is, all of the directed edges in the graph
>>> import networkx as nx
>>> relations = [
(3, 1), # 1 after 3
(2, 1), # 2 before 1
(4, 2), # 2 after 4
(4, 3), # 3 after 4
(4, 2) # 4 before 2
]
We then create a directed graph:
>>> g = nx.DiGraph(relations)
Then we run a topological sort:
>>> nx.topological_sort(g)
[4, 3, 2, 1]
I have included your modified code in my answer, which solves the current (small) problem, but without a larger sample problem, I'm not sure how well it would scale. If you provide the actual problem you're trying to solve, I'd love to test and refine this code until it works on that problem, but without test data I won't optimize this solution any further.
For starters, we track references as sets, not lists.
Duplicates don't really help us (if "s1" before "s2", and "s1" before "s2", we've gained no information)
This also lets us add inverse references with abandon (if "s1" comes before "s2", then "s2" comes after "s1").
We compute a min and max position:
Min position based on how many scenes we come after
This could be extended easily: If we come after two scenes with a min_pos of 2, our min_pos is 4 (If one is 2, other must be 3)
Max position based on how many things we come before
This could be extended similarly: If we come before two scenes with a max_pos of 4, our max_pos is 2 (If one is 4, other must be 3)
If you decide to do this, just replace pass in tighten_bounds(self) with code to try to tighten the bounds for a single scene (and set anything_updated to true if it works).
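For what it's worth, here is one sketch of what that replacement for pass might look like. It is written standalone so it can be tried in isolation: Ref and SceneStub are illustrative stand-ins (not the classes below), carrying only the attributes the tightening pass needs.

```python
from collections import namedtuple

# Illustrative stand-ins for the Reference/Scene objects used below.
Ref = namedtuple('Ref', ['scene_id', 'relation'])

class SceneStub:
    def __init__(self, title, refs, num_scenes):
        self.title = title
        self.references = set(refs)
        self.min_pos = 0
        self.max_pos = num_scenes - 1

def tighten_bounds_once(scene_dict):
    """One tightening pass; returns True if any bound moved."""
    changed = False
    for scene in scene_dict.values():
        for ref in scene.references:
            other = scene_dict[ref.scene_id]
            if ref.relation == 'after':
                # we come after `other`, so we sit at least one slot past its minimum
                if other.min_pos + 1 > scene.min_pos:
                    scene.min_pos = other.min_pos + 1
                    changed = True
            elif ref.relation == 'before':
                # we come before `other`, so at most one slot short of its maximum
                if other.max_pos - 1 < scene.max_pos:
                    scene.max_pos = other.max_pos - 1
                    changed = True
    return changed
```

Calling this until it returns False runs the bounds to a fixpoint, which is exactly what the can_tighten loop in Movie.__init__ is set up to do.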
The magic is in get_possible_orders
Generates all valid orderings if you iterate over it
If you only want one valid ordering, it doesn't take the time to create them all
Code:
class Reference:
    def __init__(self, scene_id, relation):
        self.scene_id = scene_id
        self.relation = relation

    def __repr__(self):
        return '"%s %s"' % (self.relation, self.scene_id)

    def __hash__(self):
        return hash(self.scene_id)

    def __eq__(self, other):
        return self.scene_id == other.scene_id and self.relation == other.relation

class Scene:
    def __init__(self, title, references):
        self.title = title
        self.references = references
        self.min_pos = 0
        self.max_pos = None

    def __repr__(self):
        return '%s (%s,%s)' % (self.title, self.min_pos, self.max_pos)

inverse_relation = {'before': 'after', 'after': 'before'}

def inverted_reference(scene, reference):
    return Reference(scene.title, inverse_relation[reference.relation])

def is_valid_addition(scenes_so_far, new_scene, scenes_to_go):
    previous_ids = {s.title for s in scenes_so_far}
    future_ids = {s.title for s in scenes_to_go}
    for ref in new_scene.references:
        if ref.relation == 'before' and ref.scene_id in previous_ids:
            return False
        elif ref.relation == 'after' and ref.scene_id in future_ids:
            return False
    return True

class Movie:
    def __init__(self, scene_list):
        self.num_scenes = len(scene_list)
        self.scene_dict = {scene.title: scene for scene in scene_list}
        self.set_max_positions()
        self.add_inverse_relations()
        self.bound_min_max_pos()
        self.can_tighten = True
        while self.can_tighten:
            self.tighten_bounds()

    def set_max_positions(self):
        for scene in self.scene_dict.values():
            scene.max_pos = self.num_scenes - 1

    def add_inverse_relations(self):
        for scene in self.scene_dict.values():
            for ref in scene.references:
                self.scene_dict[ref.scene_id].references.add(inverted_reference(scene, ref))

    def bound_min_max_pos(self):
        for scene in self.scene_dict.values():
            for ref in scene.references:
                if ref.relation == 'before':
                    scene.max_pos -= 1
                elif ref.relation == 'after':
                    scene.min_pos += 1

    def tighten_bounds(self):
        anything_updated = False
        for scene in self.scene_dict.values():
            pass
            # If bounds for any scene are tightened, set anything_updated back to true
        self.can_tighten = anything_updated

    def get_possible_orders(self, scenes_so_far):
        if len(scenes_so_far) == self.num_scenes:
            yield scenes_so_far
            return  # don't raise StopIteration inside a generator (PEP 479)
        n = len(scenes_so_far)
        scenes_left = set(self.scene_dict.values()) - set(scenes_so_far)
        valid_next_scenes = set(s
                                for s in scenes_left
                                if s.min_pos <= n <= s.max_pos)
        # valid_next_scenes = sorted(valid_next_scenes, key=lambda s: s.min_pos * self.num_scenes + s.max_pos)
        for s in valid_next_scenes:
            if is_valid_addition(scenes_so_far, s, scenes_left - {s}):
                for valid_complete_sequence in self.get_possible_orders(scenes_so_far + (s,)):
                    yield valid_complete_sequence

    def get_possible_order(self):
        return next(self.get_possible_orders(tuple()))

def relative_sort(lst):
    try:
        return [s.title for s in Movie(lst).get_possible_order()]
    except StopIteration:
        return None

def main():
    s1 = Scene('s1', {Reference('s3', 'after')})
    s2 = Scene('s2', {
        Reference('s1', 'before'),
        Reference('s4', 'after')
    })
    s3 = Scene('s3', {
        Reference('s4', 'after')
    })
    s4 = Scene('s4', {
        Reference('s2', 'before')
    })
    print(relative_sort([s1, s2, s3, s4]))

if __name__ == '__main__':
    main()
As others have pointed out, you need a topological sort. A depth-first traversal of the directed graph whose edges are given by the order relation is all you need. Visit in post order; this is the reverse of a topo sort, so to get the topo sort, just reverse the result.
I've encoded your data as a list of pairs showing what's known to go before what. This is just to keep my code short. You can just as easily traverse your list of classes to create the graph.
Note that for topo sort to be meaningful, the set being sorted must satisfy the definition of a partial order. Yours is fine. Order constraints on temporal events naturally satisfy the definition.
Note it's perfectly possible to create a graph with cycles. There's no topo sort of such a graph. This implementation doesn't detect cycles, but it would be easy to modify it to do so.
Of course you can use a library to get the topo sort, but where's the fun in that?
from collections import defaultdict

# Before -> After pairs dictating order. Repeats are okay. Cycles aren't.
# This is OP's data in a friendlier form.
OrderRelation = [('s3','s1'), ('s2','s1'), ('s4','s2'), ('s4','s3'), ('s4','s2')]

class OrderGraph:
    # nodes is an optional list of items for use when some aren't related at all
    def __init__(self, relation, nodes=[]):
        self.succ = defaultdict(set)  # Successor map
        heads = set()
        for tail, head in relation:
            self.succ[tail].add(head)
            heads.add(head)
        # Sources are nodes that have no in-edges (tails - heads)
        self.sources = set(self.succ.keys()) - heads | set(nodes)

    # Recursive helper to traverse the graph and visit in post order
    def __traverse(self, start):
        if start in self.visited: return
        self.visited.add(start)
        for succ in self.succ[start]: self.__traverse(succ)
        self.sorted.append(start)  # Append in post-order

    # Return a reverse post-order visit, which is a topo sort. Not thread safe.
    def topoSort(self):
        self.visited = set()
        self.sorted = []
        for source in self.sources: self.__traverse(source)
        self.sorted.reverse()
        return self.sorted
Then...
>>> print(OrderGraph(OrderRelation).topoSort())
['s4', 's2', 's3', 's1']
>>> print(OrderGraph(OrderRelation, ['s1', 'unordered']).topoSort())
['s4', 's2', 's3', 'unordered', 's1']
The second call shows that you can optionally pass values to be sorted in a separate list. You may, but don't have to, mention values already in the relation pairs. Of course, values not mentioned in any order pair are free to appear anywhere in the output.
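As noted above, cycle detection is a small addition. One hedged sketch, written as a standalone function reusing the same successor-map idea: keep a set of nodes on the current recursion stack, and raise if the traversal re-enters one of them.

```python
from collections import defaultdict

def topo_sort(relation):
    """Reverse post-order DFS, raising ValueError if a cycle is found.
    `relation` is a list of (before, after) pairs as above."""
    succ = defaultdict(set)
    heads = set()
    for tail, head in relation:
        succ[tail].add(head)
        heads.add(head)
    visited, on_stack, post = set(), set(), []

    def visit(node):
        if node in on_stack:
            raise ValueError('cycle detected at %r' % (node,))
        if node in visited:
            return
        visited.add(node)
        on_stack.add(node)
        for nxt in succ[node]:
            visit(nxt)
        on_stack.discard(node)
        post.append(node)  # append in post-order

    # visit every node, not just sources, so a fully cyclic graph is caught too
    for node in set(succ) | heads:
        visit(node)
    post.reverse()
    return post
```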

Calculating items included in branch and bound knapsack

Using a branch and bound algorithm I have evaluated the optimal profit from a given set of items, but now I wish to find out which items are included in this optimal solution. I'm evaluating the profit value of the optimal knapsack as follows (adapted from here):
import Queue

class Node:
    def __init__(self, level, profit, weight):
        self.level = level    # The level within the tree (depth)
        self.profit = profit  # The total profit
        self.weight = weight  # The total weight

def solveKnapsack(weights, profits, knapsackSize):
    numItems = len(weights)
    queue = Queue.Queue()
    root = Node(-1, 0, 0)
    queue.put(root)
    maxProfit = 0
    bound = 0
    while not queue.empty():
        v = queue.get()  # Get the next item on the queue
        uLevel = v.level + 1
        u = Node(uLevel, v.profit + profits[uLevel], v.weight + weights[uLevel])
        bound = getBound(u, numItems, knapsackSize, weights, profits)
        if u.weight <= knapsackSize and u.profit > maxProfit:
            maxProfit = u.profit
        if bound > maxProfit:
            queue.put(u)
        u = Node(uLevel, v.profit, v.weight)
        bound = getBound(u, numItems, knapsackSize, weights, profits)
        if bound > maxProfit:
            queue.put(u)
    return maxProfit
# This is essentially the brute force solution to the fractional knapsack
def getBound(u, numItems, knapsackSize, weights, profits):
    if u.weight >= knapsackSize:
        return 0
    else:
        upperBound = u.profit
        totalWeight = u.weight
        j = u.level + 1
        while j < numItems and totalWeight + weights[j] <= knapsackSize:
            upperBound += profits[j]
            totalWeight += weights[j]
            j += 1
        if j < numItems:
            upperBound += (knapsackSize - totalWeight) * profits[j] / weights[j]
        return upperBound
So, how can I get the items that form the optimal solution, rather than just the profit?
I got this working using your code as the starting point. I defined my Node class as:
class Node:
    def __init__(self, level, profit, weight, bound, contains):
        self.level = level        # current level of our node
        self.profit = profit
        self.weight = weight
        self.bound = bound        # max (optimistic) value our node can take
        self.contains = contains  # list of items our node contains
I then started my knapsack solver similarly, but initalized root = Node(0, 0, 0, 0.0, []). The value root.bound could be a float, which is why I initalized it to 0.0, while the other values (at least in my problem) are all integers. The node contains nothing so far, so I started it off with an empty list. I followed a similar outline to your code, except that I stored the bound in each node (not sure this was necessary), and updated the contains list using:
u.contains = v.contains[:] # copies the items in the list, not the list location
# Initialize u as Node(uLevel, uProfit, uWeight, 0.0, uContains)
u.contains.append(uLevel) # add the current item index to the list
Note that I only updated the contains list in the "taking the item" node. This is the first initialization in your main loop, preceding the first if bound > maxProfit: statement. I updated the contains list in the if: statement right before this, when you update the value of maxProfit:
if u.weight <= knapsackSize and u.profit > maxProfit:
    maxProfit = u.profit
    bestList = u.contains
This stores the indices of the items you are taking in bestList. I also added the condition if v.bound > maxProfit and v.level < numItems - 1 to the main loop, right after v = queue.get(), so that I do not keep branching after reaching the last item and do not descend into branches that are not worth exploring.
Also, if you want to get a binary list output showing which items are selected by index, you could use:
taken = [0] * numItems
for item in bestList:
    taken[item] = 1
print str(taken)
I had some other differences in my code, but this should enable you to get your chosen item list out.
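Putting the pieces of this answer together, here is one possible self-contained version. The names and structure are my own, not the original poster's: it uses a plain deque instead of Queue.Queue, and it assumes the items are already sorted by decreasing profit/weight ratio, which the fractional bound relies on.

```python
from collections import deque, namedtuple

# level = index of the last item decided; contains = indices of items taken
Node = namedtuple('Node', ['level', 'profit', 'weight', 'contains'])

def bound(node, n, capacity, weights, profits):
    """Optimistic bound: fill the remaining capacity greedily, taking a
    fraction of the first item that no longer fits."""
    if node.weight >= capacity:
        return 0
    ub, total, j = node.profit, node.weight, node.level + 1
    while j < n and total + weights[j] <= capacity:
        ub += profits[j]
        total += weights[j]
        j += 1
    if j < n:
        ub += (capacity - total) * profits[j] / weights[j]
    return ub

def solve_knapsack(weights, profits, capacity):
    n = len(weights)
    best_profit, best_items = 0, []
    queue = deque([Node(-1, 0, 0, [])])
    while queue:
        v = queue.popleft()
        if v.level == n - 1:  # nothing left to decide on this branch
            continue
        lvl = v.level + 1
        # branch 1: take item `lvl` (copy the list, then append the index)
        u = Node(lvl, v.profit + profits[lvl], v.weight + weights[lvl],
                 v.contains + [lvl])
        if u.weight <= capacity and u.profit > best_profit:
            best_profit, best_items = u.profit, u.contains
        if bound(u, n, capacity, weights, profits) > best_profit:
            queue.append(u)
        # branch 2: skip item `lvl`
        u = Node(lvl, v.profit, v.weight, v.contains)
        if bound(u, n, capacity, weights, profits) > best_profit:
            queue.append(u)
    return best_profit, best_items
```

The only real difference from the profit-only version is that each node carries its contains list, and best_items is updated in the same place as best_profit.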
I have been thinking about this for some time. You would add a method to your Node class that stores the node's path and appends the current level to it. Call it inside your loop, and assign that path list to your optimal item list at the point where the node's weight is within capacity and its value exceeds the max profit, i.e. where you assign maxProfit. You can find a Java implementation here.
