Adding a cache array to recursive knapsack solution?

Adding a cache array to recursive knapsack solution? - python

I'm familiar with the naive recursive solution to the knapsack problem. However, this solution simply spits out the max value that can be stored in the knapsack given its weight constraints. What I'd like to do is add some form of metadata cache (namely which items have/not been selected, using a "one-hot" array [0,1,1]).
Here's my attempt:
class Solution:
def __init__(self):
self.array = []
def knapSack(self,W, wt, val, n):
index = n-1
if n == 0 or W == 0 :
return 0
if (wt[index] > W):
self.array.append(0)
choice = self.knapSack(W, wt, val, index)
else:
option_A = val[index] + self.knapSack( W-wt[index], wt, val, index)
option_B = self.knapSack(W, wt, val, index)
if option_A > option_B:
self.array.append(1)
choice = option_A
else:
self.array.append(0)
choice = option_B
print(int(option_A > option_B)) #tells you which path was traveled
return choice
# To test above function
val = [60, 100, 120]
wt = [10, 20, 30]
W = 50
n = len(val)
# print(knapSack(W, wt, val, n))
s = Solution()
s.knapSack(W, wt, val, n)
>>>
1
1
1
1
1
1
220
s.array
>>>
[1, 1, 1, 1, 1, 1]
As you can see, s.array returns [1,1,1,1,1,1] and this tells me a few things. (1), even though there are only three items in the problem set, the knapSack method has been called twice for each item and (2) this is because every item flows through the else statement in the method, so option_A and option_B are each computed for each item (explaining why the array length is 6 not 3.)
I'm confused as to why 1 has been appended in every recursive loop. The item at index 0 would is not selected in the optimal solution. To answer this question, please provide:
(A) Why the current solution is behaving this way
(B) How the code can be restructured such that a one-hot "take or don't take" vector can be captured, representing whether a given item goes in the knapsack or not.
Thank you!

(A) Why the current solution is behaving this way
self.array is an instance attribute that is shared by all recursion paths. On one path or another each item is taken and so a one is appended to the list.
option_A = val[index]... takes an item but doesn't append a one to the list.
option_B = self..... skips an item but doesn't append a zero to the list.
if option_A > option_B: When you make this comparison you have lost the information that made it - the items that were taken/discarded in the branch;
in the suites you just append a one or a zero regardless of how many items made those values.
The ones and zeroes then represent whether branch A (1) or branch B (0) was successful in the current instance of the function.
(B) How the code can be restructured such that a one-hot "take or don't take" vector can be captured, representing whether a given item goes in the knapsack or not.
It would be nice to know what you have taken after running through the analysis, I suspect that is what you are trying to do with self.array. You expressed an interest in OOP: instead of keeping track with lists of numbers using indices to select numbers from the lists, make objects to represent the items work with those. Keep the objects in containers and use the functionality of the container to add or remove items/objects from it. Consider how you are going to use a container before choosing one.
Don't put the function in a class.
Change the function's signature to accept
available weight,
a container of items to be considered,
a container holding the items currently in the sack (the current sack).
Use a collections.namedtuple or a class for the items having value and weight attributes.
Item = collections.namedtuple('Item',['wt','val'])
When an item is taken add it to the current sack.
When recursing
if going down the take path add the return value from the call to the current sack
remove the item that was just considered from the list of items to be considered argument.
if taken subtract the item's weight from the available weight argument
When comparing two branches you will need to add up the values of each item the current sack.
return the sack with the highest value
carefully consider the base case
Make the items to be considered like this.
import collections
Item = collections.namedtuple('Item',['wt','val'])
items = [Item(wght,value) for wght,value in zip(wt,val)]
Add up values like this.
value = sum(item.val for item in current_sack)
# or
import operator
val = operator.itemgetter('val')
wt = operator.itemgetter('wt')
value = sum(map(val,current_sack)
Your solution enhanced with debugging prints for the curious.
class Solution:
def __init__(self):
self.array = []
self.other_array = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
def knapSack(self,W, wt, val, n,j=0):
index = n-1
deep = f'''{' '*j*3}'''
print(f'{deep}level {j}')
print(f'{deep}{W} available: considering {wt[index]},{val[index]}, {n})')
# minor change here but has no affect on the outcome0
#if n == 0 or W == 0 :
if n == 0:
print(f'{deep}Base case found')
return 0
print(f'''{deep}{wt[index]} > {W} --> {wt[index] > W}''')
if (wt[index] > W):
print(f'{deep}too heavy')
self.array.append(0)
self.other_array[index] = 0
choice = self.knapSack(W, wt, val, index,j+1)
else:
print(f'{deep}Going down the option A hole')
option_A = val[index] + self.knapSack( W-wt[index], wt, val, index,j+1)
print(f'{deep}Going down the option B hole')
option_B = self.knapSack(W, wt, val, index,j+1)
print(f'{deep}option A:{option_A} option B:{option_B}')
if option_A > option_B:
print(f'{deep}option A wins')
self.array.append(1)
self.other_array[index] = 1
choice = option_A
else:
print(f'{deep}option B wins')
self.array.append(0)
self.other_array[index] = 0
choice = option_B
print(f'{deep}level {j} Returning value={choice}')
print(f'{deep}---------------------------------------------')
return choice

Related

Sorting from smallest to biggest

I would like to sort several points from smallest to biggest however.
I will wish to get this result:
Drogba 2 pts
Owen 4 pts
Henry 6 pts
However, my ranking seems to be reversed for now :-(
Henry 6 pts
Owen 4 pts
Drogba 2 pts
I think my problem is with my function Bubblesort ?
def Bubblesort(name, goal1, point):
swap = True
while swap:
swap = False
for i in range(len(name)-1):
if goal1[i+1] > goal1[i]:
goal1[i], goal1[i+1] = goal1[i+1], goal1[i]
name[i], name[i+1] = name[i+1], name[i]
point[i], point[i + 1] = point[i + 1], point[i]
swap = True
return name, goal1, point
def ranking(name, point):
for i in range(len(name)):
print(name[i], "\t" , point[i], " \t ")
name = ["Henry", "Owen", "Drogba"]
point = [0]*3
goal1 = [68, 52, 46]
gain = [6,4,2]
name, goal1, point = Bubblesort( name, goal1, point )
for i in range(len(name)):
point[i] += gain[i]
ranking (name, point)

In your code:
if goal1[i+1] > goal1[i]:
that checks if it is greater. You need to swap it if the next one is less, not greater.
Change that to:
if goal1[i+1] < goal1[i]:

A bunch of issues:
def Bubblesort - PEP8 says function names should be lowercase, ie def bubblesort
You are storing your data as a bunch of parallel lists; this makes it harder to work on and think about (and sort!). You should transpose your data so that instead of having a list of names, a list of points, a list of goals you have a list of players, each of whom has a name, points, goals.
def bubblesort(name, goal1, point): - should look like def bubblesort(items) because bubblesort does not need to know that it is getting names and goals and points and sorting on goals (specializing it that way keeps you from reusing the function later to sort other things). All it needs to know is that it is getting a list of items and that it can compare pairs of items using >, ie Item.__gt__ is defined.
Instead of using the default "native" sort order, Python sort functions usually let you pass an optional key function which allows you to tell it what to sort on - that is, sort on key(items[i]) > key(items[j]) instead of items[i] > items[j]. This is often more efficient and/or convenient than reshuffling your data to get the sort order you want.
for i in range(len(name)-1): - you are iterating more than needed. After each pass, the highest value in the remaining list gets pushed to the top (hence "bubble" sort, values rise to the top of the list like bubbles). You don't need to look at those top values again because you already know they are higher than any of the remaining values; after the nth pass, you can ignore the last n values.
actually, the situation is a bit better than that; you will often find runs of values which are already in sorted order. If you keep track of the highest index that actually got swapped, you don't need to go beyond that on your next pass.
So your sort function becomes
def bubblesort(items, *, key=None):
"""
Return items in sorted order
"""
# work on a copy of the list (don't destroy the original)
items = list(items)
# process key values - cache the result of key(item)
# so it doesn't have to be called repeatedly
keys = items if key is None else [key(item) for item in items]
# initialize the "last item to sort on the next pass" index
last_swap = len(items) - 1
# sort!
while last_swap:
ls = 0
for i in range(last_swap):
j = i + 1
if keys[i] > keys[j]:
# have to swap keys and items at the same time,
# because keys may be an alias for items
items[i], items[j], keys[i], keys[j] = items[j], items[i], keys[j], keys[i]
# made a swap - update the last_swap index
ls = i
last_swap = ls
return items
You may not be sure that this is actually correct, so let's test it:
from random import sample
def test_bubblesort(tries = 1000):
# example key function
key_fn = lambda item: (item[2], item[0], item[1])
for i in range(tries):
# create some sample data to sort
data = [sample("abcdefghijk", 3) for j in range(10)]
# no-key sort
assert bubblesort(data) == sorted(data), "Error: bubblesort({}) gives {}".format(data, bubblesort(data))
# keyed sort
assert bubblesort(data, key=key_fn) == sorted(data, key=key_fn), "Error: bubblesort({}, key) gives {}".format(data, bubblesort(data, key_fn))
test_bubblesort()
Now the rest of your code becomes
class Player:
def __init__(self, name, points, goals, gains):
self.name = name
self.points = points
self.goals = goals
self.gains = gains
players = [
Player("Henry", 0, 68, 6),
Player("Owen", 0, 52, 4),
Player("Drogba", 0, 46, 2)
]
# sort by goals
players = bubblesort(players, key = lambda player: player.goals)
# update points
for player in players:
player.points += player.gains
# show the result
for player in players:
print("{player.name:<10s} {player.points:>2d} pts".format(player=player))
which produces
Drogba 2 pts
Owen 4 pts
Henry 6 pts

different result from recursive and dynamic programming

Working on below problem,
Problem,
Given a m * n grids, and one is allowed to move up or right, find the different paths between two grid points.
I write a recursive version and a dynamic programming version, but they return different results, and any thoughts what is wrong?
Source code,
from collections import defaultdict
def move_up_right(remaining_right, remaining_up, prefix, result):
if remaining_up == 0 and remaining_right == 0:
result.append(''.join(prefix[:]))
return
if remaining_right > 0:
prefix.append('r')
move_up_right(remaining_right-1, remaining_up, prefix, result)
prefix.pop(-1)
if remaining_up > 0:
prefix.append('u')
move_up_right(remaining_right, remaining_up-1, prefix, result)
prefix.pop(-1)
def move_up_right_v2(remaining_right, remaining_up):
# key is a tuple (given remaining_right, given remaining_up),
# value is solutions in terms of list
dp = defaultdict(list)
dp[(0,1)].append('u')
dp[(1,0)].append('r')
for right in range(1, remaining_right+1):
for up in range(1, remaining_up+1):
for s in dp[(right-1,up)]:
dp[(right,up)].append(s+'r')
for s in dp[(right,up-1)]:
dp[(right,up)].append(s+'u')
return dp[(right, up)]
if __name__ == "__main__":
result = []
move_up_right(2,3,[],result)
print result
print '============'
print move_up_right_v2(2,3)

In version 2 you should be starting your for loops at 0 not at 1. By starting at 1 you are missing possible permutations where you traverse the bottom row or leftmost column first.
Change version 2 to:
def move_up_right_v2(remaining_right, remaining_up):
# key is a tuple (given remaining_right, given remaining_up),
# value is solutions in terms of list
dp = defaultdict(list)
dp[(0,1)].append('u')
dp[(1,0)].append('r')
for right in range(0, remaining_right+1):
for up in range(0, remaining_up+1):
for s in dp[(right-1,up)]:
dp[(right,up)].append(s+'r')
for s in dp[(right,up-1)]:
dp[(right,up)].append(s+'u')
return dp[(right, up)]
And then:
result = []
move_up_right(2,3,[],result)
set(move_up_right_v2(2,3)) == set(result)
True
And just for fun... another way to do it:
from itertools import permutations
list(map(''.join, set(permutations('r'*2+'u'*3, 5))))

The problem with the dynamic programming version is that it doesn't take into account the paths that start from more than one move up ('uu...') or more than one move right ('rr...').
Before executing the main loop you need to fill dp[(x,0)] for every x from 1 to remaining_right+1 and dp[(0,y)] for every y from 1 to remaining_up+1.
In other words, replace this:
dp[(0,1)].append('u')
dp[(1,0)].append('r')
with this:
for right in range(1, remaining_right+1):
dp[(right,0)].append('r'*right)
for up in range(1, remaining_up+1):
dp[(0,up)].append('u'*up)

More on dynamic programming

Two weeks ago I posted THIS question here about dynamic programming. User Andrea Corbellini answered precisely what I wanted, but I wanted to take the problem one more step further.
This is my function
def Opt(n):
if len(n) == 1:
return 0
else:
return sum(n) + min(Opt(n[:i]) + Opt(n[i:])
for i in range(1, len(n)))
Let's say you would call
Opt( [ 1,2,3,4,5 ] )
The previous question solved the problem of computing the optimal value. Now,
instead of the computing the optimum value 33 for the above example, I want to print the way we got to the most optimal solution (path to the optimal solution). So, I want to print the indices where the list got cut/divided to get to the optimal solution in the form of a list. So, the answer to the above example would be :
[ 3,2,1,4 ] ( Cut the pole/list at third marker/index, then after second index, then after first index and lastly at fourth index).
That is the answer should be in the form of a list. The first element of the list will be the index where the first cut/division of the list should happen in the optimal path. The second element will be the second cut/division of the list and so on.
There can also be a different solution:
[ 3,4,2,1 ]
They both would still lead you to the correct output. So, it doesn't matter which one you printed. But, I have no idea how to trace and print the optimal path taken by the Dynamic Programming solution.
By the way, I figured out a non-recursive solution to that problem that was solved in my previous question. But, I still can't figure out to print the path for the optimal solution. Here is the non-recursive code for the previous question, it might be helpful to solve the current problem.
def Opt(numbers):
prefix = [0]
for i in range(1,len(numbers)+1):
prefix.append(prefix[i-1]+numbers[i-1])
results = [[]]
for i in range(0,len(numbers)):
results[0].append(0)
for i in range(1,len(numbers)):
results.append([])
for j in range(0,len(numbers)):
results[i].append([])
for i in range(2,len(numbers)+1): # for all lenghts (of by 1)
for j in range(0,len(numbers)-i+1): # for all beginning
results[i-1][j] = results[0][j]+results[i-2][j+1]+prefix[j+i]-prefix[j]
for k in range(1,i-1): # for all splits
if results[k][j]+results[i-2-k][j+k+1]+prefix[j+i]-prefix[j] < results[i-1][j]:
results[i-1][j] = results[k][j]+results[i-2-k][j+k+1]+prefix[j+i]-prefix[j]
return results[len(numbers)-1][0]

Here is one way of printing the selected :
I used the recursive solution using memoization provided by #Andrea Corbellini in your previous question. This is shown below:
cache = {}
def Opt(n):
# tuple objects are hashable and can be put in the cache.
n = tuple(n)
if n in cache:
return cache[n]
if len(n) == 1:
result = 0
else:
result = sum(n) + min(Opt(n[:i]) + Opt(n[i:])
for i in range(1, len(n)))
cache[n] = result
return result
Now, we have the cache values for all the tuples including the selected ones.
Using this, we can print the selected tuples as shown below:
selectedList = []
def printSelected (n, low):
if len(n) == 1:
# No need to print because it's
# already printed at previous recursion level.
return
minVal = math.Inf
minTupleLeft = ()
minTupleRight = ()
splitI = 0
for i in range(1, len(n)):
tuple1ToI = tuple (n[:i])
tupleiToN = tuple (n[i:])
if (cache[tuple1ToI] + cache[tupleiToN]) < minVal:
minVal = cache[tuple1ToI] + cache[tupleiToN]
minTupleLeft = tuple1ToI
minTupleRight = tupleiToN
splitI = low + i
print minTupleLeft, minTupleRight, minVal
print splitI # OP just wants the split index 'i'.
selectedList.append(splitI) # or add to the list as requested by OP
printSelected (list(minTupleLeft), low)
printSelected (list(minTupleRight), splitI)
You call the above method like shown below:
printSelected (n, 0)

How do I test a sum tree?

I have 2 lists. One contains values, the other contains the levels those values hold in a sum tree. (the lists have same length)
For example:
[40,20,5,15,10,10] and [0,1,2,2,1,1]
Those lists correctly correspond because
- 40
- - 20
- - - 5
- - - 15
- - 10
- - 10
(20+10+10) == 40 and (5+15) == 20
I need to check if a given list of values and a list of its levels corresponds correctly. So far I have managed to put together this function, but for some reason it's not returning True for correct lists array and numbers. Input numbers here would be [40,20,5,15,10,10] and array would be [0,1,2,2,1,1]
def testsum(array, numbers):
k = len(array)
target = [0]*k
subsum = [0]*k
for x in range(0, k):
if target[array[x]]!=subsum[array[x]]:
return False
target[array[x]]=numbers[x]
subsum[array[x]]=0
if array[x]>0:
subsum[array[x]-1]+=numbers[x]
for x in range(0, k):
if(target[x]!=subsum[x]):
print(x, target[x],subsum[x])
return False
return True

I got this running using itertools.takewhile to grab the subtree under each level. Toss that into a recursive function and assert that all recursions pass.
I've slightly improved my initial implementation by grabbing a next_v and next_l and testing early to see if the current node is a parent node and only building subtree if there's something to build. That inequality check is much cheaper than iterating through the whole vs_ls zip.
import itertools
def testtree(values, levels):
if len(values) == 1:
# Last element, always true!
return True
vs_ls = zip(values, levels)
test_v, test_l = next(vs_ls)
next_v, next_l = next(vs_ls)
if next_l > test_l:
subtree = [v for v,l in itertools.takewhile(
lambda v_l: v_l[1] > test_l,
itertools.chain([(next_v, next_l)], vs_ls))
if l == test_l+1]
if sum(subtree) != test_v and subtree:
#TODO test if you can remove the "and subtree" check now!
print("{} != {}".format(subtree, test_v))
return False
return testtree(values[1:], levels[1:])
if __name__ == "__main__":
vs = [40, 20, 15, 5, 10, 10]
ls = [0, 1, 2, 2, 1, 1]
assert testtree(vs, ls) == True
It unfortunately adds a lot of complexity to the code since it pulls out the first value that we need, which necessitates an extra itertools.chain call. That's not ideal. Unless you're expecting to get very large lists for values and levels, it might be worthwhile to do vs_ls = list(zip(values, levels)) and approach this list-wise rather than iterator-wise. e.g...
...
vs_ls = list(zip(values, levels))
test_v, test_l = vs_ls[0]
next_v, next_l = vs_ls[1]
...
subtree = [v for v,l in itertools.takewhile(
lambda v_l: v_l[1] > test_l,
vs_ls[1:]) if l == test_l+1]
I still think the fastest way is probably to iterate once with an approach almost like a state machine and grab all the possible subtrees, then check them all individually. Something like:
from collections import namedtuple
Tree = namedtuple("Tree", ["level_num", "parent", "children"])
# equivalent to
# # class Tree:
# # def __init__(self, level_num: int,
# # parent: int,
# # children: list):
# # self.level_num = level_num
# # self.parent = parent
# # self.children = children
def build_trees(values, levels):
trees = [] # list of Trees
pending_trees = []
vs_ls = zip(values, levels)
last_v, last_l = next(vs_ls)
test_l = last_l + 1
for v, l in zip(values, levels):
if l > last_l:
# we've found a new tree
if l != last_l + 1:
# What do you do if you get levels like [0, 1, 3]??
raise ValueError("Improper leveling: {}".format(levels))
test_l = l
# Stash the old tree and start a new one.
pending_trees.append(cur_tree)
cur_tree = Tree(level_num=last_l, parent=last_v, children=[])
elif l < test_l:
# tree is finished
# Store the finished tree and grab the last one we stashed.
trees.append(cur_tree)
try:
cur_tree = pending_trees.pop()
except IndexError:
# No trees pending?? That's weird....
# I can't think of any case that this should happen, so maybe
# we should be raising ValueError here, but I'm not sure either
cur_tree = Tree(level_num=-1, parent=-1, children=[])
elif l == test_l:
# This is a child value in our current tree
cur_tree.children.append(v)
# Close the pending trees
trees.extend(pending_trees)
return trees
This should give you a list of Tree objects, each of which having the following attributes
level_num := level number of parent (as found in levels)
parent := number representing the expected sum of the tree
children := list containing all the children in that level
After you do that, you should be able to simply check
all([sum(t.children) == t.parent for t in trees])
But note that I haven't been able to test this second approach.

Calculating items included in branch and bound knapsack

Using a branch and bound algorithm I have evaluated the optimal profit from a given set of items, but now I wish to find out which items are included in this optimal solution. I'm evaluating the profit value of the optimal knapsack as follows (adapted from here):
import Queue
class Node:
def __init__(self, level, profit, weight):
self.level = level # The level within the tree (depth)
self.profit = profit # The total profit
self.weight = weight # The total weight
def solveKnapsack(weights, profits, knapsackSize):
numItems = len(weights)
queue = Queue.Queue()
root = Node(-1, 0, 0)
queue.put(root)
maxProfit = 0
bound = 0
while not queue.empty():
v = queue.get() # Get the next item on the queue
uLevel = v.level + 1
u = Node(uLevel, v.profit + e[uLevel][1], v.weight + e[uLevel][0])
bound = getBound(u, numItems, knapsackSize, weights, profits)
if u.weight <= knapsackSize and u.profit > maxProfit:
maxProfit = uProfit
if bound > maxProfit:
queue.put(u)
u = Node(uLevel, v.profit, v.weight)
bound = getBound(u, numItems, knapsackSize, weights, profits)
if (bound > maxProfit):
queue.put(u)
return maxProfit
# This is essentially the brute force solution to the fractional knapsack
def getBound(u, numItems, knapsackSize, weight, profit):
if u.weight >= knapsackSize: return 0
else:
upperBound = u.profit
totalWeight = u.weight
j = u.level + 1
while j < numItems and totalWeight + weight[j] <= C:
upperBound += profit[j]
totalWeight += weights[j]
j += 1
if j < numItems:
result += (C - totalWeight) * profit[j]/weight[j]
return upperBound
So, how can I get the items that form the optimal solution, rather than just the profit?

I got this working using your code as the starting point. I defined my Node class as:
class Node:
def __init__(self, level, profit, weight, bound, contains):
self.level = level # current level of our node
self.profit = profit
self.weight = weight
self.bound = bound # max (optimistic) value our node can take
self.contains = contains # list of items our node contains
I then started my knapsack solver similarly, but initalized root = Node(0, 0, 0, 0.0, []). The value root.bound could be a float, which is why I initalized it to 0.0, while the other values (at least in my problem) are all integers. The node contains nothing so far, so I started it off with an empty list. I followed a similar outline to your code, except that I stored the bound in each node (not sure this was necessary), and updated the contains list using:
u.contains = v.contains[:] # copies the items in the list, not the list location
# Initialize u as Node(uLevel, uProfit, uWeight, 0.0, uContains)
u.contains.append(uLevel) # add the current item index to the list
Note that I only updated the contains list in the "taking the item" node. This is the first initialization in your main loop, preceding the first if bound > maxProfit: statement. I updated the contains list in the if: statement right before this, when you update the value of maxProfit:
if u.weight <= knapsackSize and u.value > maxProfit:
maxProfit = u.profit
bestList = u.contains
This stores the indices of the items you are taking to bestList. I also added the condition if v.bound > maxProfit and v.level < items-1 to the main loop right after v = queue.get() so that I do not keep going after I reach the last item, and I do not loop through branches that are not worth exploring.
Also, if you want to get a binary list output showing which items are selected by index, you could use:
taken = [0]*numItems
for item in bestList:
taken[item] = 1
print str(taken)
I had some other differences in my code, but this should enable you to get your chosen item list out.

I have been thinking about this for some time. Apparently, you have to add some methods inside your Node class that will assign the node_path and add the current level to it. You call your methods inside your loop and assign the path_list to your optimal_item_list when your node_weight is less than the capacity and its value is greater than the max_profit, ie where you assign the maxProfit. You can find the java implementation here

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Adding a cache array to recursive knapsack solution? - python

Related

Sorting from smallest to biggest

different result from recursive and dynamic programming

More on dynamic programming

How do I test a sum tree?

Calculating items included in branch and bound knapsack

Categories

Resources