I have a list named self.items where the elements are:
items = [dict(id=0, w=4, v=12),
dict(id=1, w=6, v=10),
dict(id=2, w=5, v=8),
dict(id=3, w=7, v=11),
dict(id=4, w=3, v=14),
dict(id=5, w=1, v=7),
dict(id=6, w=6, v=9)]
From this I had to build a list of lists containing every possible combination, including the empty one, so the final list of lists looks roughly like this:
[[],[{id:0,w:4,v:12}],....,[{id:0,w:4,v:12}, {id:1,w:6,v:10}]....]
Now I have to write a recursive function that finds which combination of elements has the maximum value within the permitted maximum weight.
def recursive(self, n, max_weight):
""" Recursive Knapsack
:param n: Number of elements
:param max_weight: Maximum weight allowed
:return: max_value
"""
self.iterations += 1
result = 0
    if max_weight > self.max_weight:  # self.max_weight is set in the __init__ method and holds the maximum weight permitted
self.recursive(self, self.items+self.iterations, max_weight)
if max_weight < self.max_weight:
self.recursive(self, self.items+self.iterations, max_weight)
else:
result = self.items['v']+result
return result
I think that my error is in this line:
result = self.items['v']+result
But I cannot find it.
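For reference, the textbook 0/1 knapsack recursion over the last n items looks like this (a standalone sketch outside your class; the capacity of 15 used below is my assumption, since self.max_weight isn't shown, but with these items it does yield the expected 44):

```python
def knapsack(items, n, max_weight):
    # Base case: no items left or no remaining capacity.
    if n == 0 or max_weight == 0:
        return 0
    item = items[n - 1]
    # An item that does not fit can only be skipped.
    if item['w'] > max_weight:
        return knapsack(items, n - 1, max_weight)
    # Otherwise: best of taking the item or skipping it.
    take = item['v'] + knapsack(items, n - 1, max_weight - item['w'])
    skip = knapsack(items, n - 1, max_weight)
    return max(take, skip)

items = [dict(id=0, w=4, v=12), dict(id=1, w=6, v=10), dict(id=2, w=5, v=8),
         dict(id=3, w=7, v=11), dict(id=4, w=3, v=14), dict(id=5, w=1, v=7),
         dict(id=6, w=6, v=9)]
print(knapsack(items, len(items), 15))  # 44
```

Note that the recursion always moves to n - 1 and reduces the remaining weight when an item is taken; it never mutates or re-indexes the item list.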
I have just found the solution to this recursive problem:
(I'm from Spain, so the variable "cantidad" is just Spanish for "quantity".)
def recursive(self, n, max_weight):
""" Recursive Knapsack
:param n: Number of elements
:param max_weight: Maximum weight allowed
    :return: max_value
"""
self.iterations += 1
result = 0
cantidad = 0
quantity = 0
if max_weight == 0 or n == 1:
if max_weight >= self.items[n-1]['w'] :
cantidad= self.items[n-1]['v']
return max(cantidad,quantity)
else:
if max_weight >= self.items[n-1]['w']:
cantidad = self.items[n-1]['v']+self.recursive(n-1,max_weight-self.items[n-1]['w'])
quantity = self.recursive(n-1,max_weight)
result = max(cantidad, quantity)
return result
I plugged this code into a test harness provided by my university and it returns the correct result:
Method: recursive
Iterations:107
Max value:44 expected max_value:44
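As an aside (beyond what the assignment asks for): because the recursion recomputes overlapping subproblems, memoizing on (n, max_weight) with functools.lru_cache reduces the call count considerably. A minimal standalone sketch, not using your class:

```python
from functools import lru_cache

def knapsack_memo(items, capacity):
    # Freeze weights/values into tuples so the inner function can close over them.
    weights = tuple(item['w'] for item in items)
    values = tuple(item['v'] for item in items)

    @lru_cache(maxsize=None)
    def solve(n, max_weight):
        # Base case: no items left or no capacity remaining.
        if n == 0 or max_weight == 0:
            return 0
        if weights[n - 1] > max_weight:
            return solve(n - 1, max_weight)
        take = values[n - 1] + solve(n - 1, max_weight - weights[n - 1])
        skip = solve(n - 1, max_weight)
        return max(take, skip)

    return solve(len(items), capacity)
```

With the seven items above and a capacity of 15 (my assumption, since self.max_weight isn't shown) this also returns 44, while evaluating far fewer states than the plain recursion.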
I have objects that store values as dataframes. I have been able to compare whether values from two dataframes are within 10% of each other. However, I am having difficulty extending this to multiple dataframes. Moreover, how should I approach this problem if the dataframes are not the same size?
def add_well_peak(self, *other):
if len(self.Bell) == len(other.Bell): #if dataframes ARE the same size
for k in range(len(self.Bell)):
for j in range(len(other.Bell)):
if int(self.Size[k]) - int(self.Size[k])*(1/10) <= int(other.Size[j]) <= int(self.Size[k]) + int(self.Size[k])*(1/10):
#average all
For example, in the image below, there are objects that contain dataframes (i.e., self, other1, other2). The colors represent matches (i.e., values that are within 10% of each other). If a match exists, average the values. If a match does not exist, still include the unmatched number. I want to be able to generalize this for any number of objects greater than or equal to 2 (other1, other2, other3, ...). Any help would be appreciated. Please let me know if anything is unclear. This is my first time posting. Thanks again.
(image: matching data)
Results:
Using my solution on the dataframes of your image, I get the following:
Threshold outlier = 0.2:
0
0 1.000000
1 1493.500000
2 5191.333333
3 35785.333333
4 43586.500000
5 78486.000000
6 100000.000000
Threshold outlier = 0.5:
0 1
0 1.000000 NaN
1 1493.500000 NaN
2 5191.333333 NaN
3 43586.500000 35785.333333
4 78486.000000 100000.000000
Explanations:
The rows are the averaged peaks, and the columns contain the different candidate values obtained for each peak. I assumed the average computed from the largest number of elements is the legitimate one, and that the other values within THRESHOLD_OUTLIER of it are the outliers. The columns are sorted by likelihood: the more probable a value is as the legitimate peak, the further left it appears (column 0 is the most probable). For instance, on row 3 of the 0.5 outlier-threshold results, 43586.500000 is an average coming from 3 dataframes, while 35785.333333 comes from only 2, so the first one is the most probable.
Issues:
The solution is quite complicated. I assume a big part of it could be removed, but I can't see how for the moment, and as it works, I'll certainly leave the optimization to you.
Still, I tried commenting my best, and if you have any question, do not hesitate!
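To make the binary-combination idea concrete before diving into the files: a combination string selects the values at the positions of its "1" bits, for example:

```python
def apply_combination(combination, values):
    # "101" on [1, 2, 3] selects the 1st and 3rd values.
    return [v for bit, v in zip(combination, values) if bit == "1"]

print(apply_combination("101", [1, 2, 3]))  # [1, 3]
```

This is the grouping operation that the Combination class below encodes.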
Files:
CombinationLib.py
from __future__ import annotations
from typing import Dict, List
from Errors import *
class Combination():
"""
Support class, to make things easier.
Contains a string `self.combination` which is a binary number stored as a string.
    This allows testing every combination of values (i.e. "101" on the list `[1, 2, 3]`
would signify grouping `1` and `3` together).
There are some methods:
- `__add__` overrides the `+` operator
- `compute_degree` gives how many `1`s are in the combination
    - `overlaps` verifies whether two combinations overlap (use the same value twice)
(i.e. `100` and `011` don't overlap, while `101` and `001` do)
"""
def __init__(self, combination:str) -> Combination:
self.combination:str = combination
self.degree:int = self.compute_degree()
def __add__(self, other: Combination) -> Combination:
        if self.combination is None:
            return other.copy()
        if other.combination is None:
            return self.copy()
if self.overlaps(other):
raise CombinationsOverlapError()
result = ""
for c1, c2 in zip(self.combination, other.combination):
result += "1" if (c1 == "1" or c2 == "1") else "0"
return Combination(result)
def __str__(self) -> str:
return self.combination
def compute_degree(self) -> int:
        if self.combination is None:
            return 0
degree = 0
for bit in self.combination:
if bit == "1":
degree += 1
return degree
def copy(self) -> Combination:
return Combination(self.combination)
def overlaps(self, other:Combination) -> bool:
for c1, c2 in zip(self.combination, other.combination):
if c1 == "1" and c1 == c2:
return True
return False
class CombinationNode():
"""
The main class.
The main idea was to build a tree of possible "combinations of combinations":
100-011 => 111
|---010-001 => 111
|---001-010 => 111
At each node, the combination applied to the current list of values was to be acceptable
    (all within THRESHOLD_AVERAGING).
Also, the shorter a path, the better the solution as it means it found a way to average
a lot of the values, with the minimum amount of outliers possible, maybe by grouping
the outliers together in a way that makes sense, ...
- `populate` fills the tree automatically, with every solution possible
- `path` is used mainly on leaves, to obtain the path taken to arrive there.
"""
def __init__(self, combination:Combination) -> CombinationNode:
self.combination:Combination = combination
self.children:List[CombinationNode] = []
self.parent:CombinationNode = None
self.total_combination:Combination = combination
def __str__(self) -> str:
list_paths = self.recur_paths()
list_paths = [",".join([combi.combination.combination for combi in path]) for path in list_paths]
return "\n".join(list_paths)
def add_child(self, child:CombinationNode) -> None:
if child.combination.degree > self.combination.degree and not self.total_combination.overlaps(child.combination):
raise ChildDegreeExceedParentDegreeError(f"{child.combination} > {self.combination}")
self.children.append(child)
child.parent = self
child.total_combination += self.total_combination
def path(self) -> List[CombinationNode]:
path = []
current = self
while current.parent != None:
path.append(current)
current = current.parent
path.append(current)
return path[::-1]
def populate(self, combination_dict:Dict[int, List[Combination]]) -> None:
missing_degrees = len(self.combination.combination)-self.total_combination.degree
if missing_degrees == 0:
return
for i in range(min(self.combination.degree, missing_degrees), 0, -1):
for combination in combination_dict[i]:
if not self.total_combination.overlaps(combination):
self.add_child(CombinationNode(combination))
for child in self.children:
child.populate(combination_dict)
def recur_paths(self) -> List[List[CombinationNode]]:
if len(self.children) == 0:
return [self.path()]
paths = []
for child in self.children:
for path in child.recur_paths():
paths.append(path)
return paths
Errors.py
class ChildDegreeExceedParentDegreeError(Exception):
pass
class CombinationsOverlapError(Exception):
pass
class ToImplementError(Exception):
pass
class UncompletePathError(Exception):
pass
main.py
from typing import Dict, List, Set, Tuple, Union
import pandas as pd
from CombinationLib import *
best_depth:int = -1
best_path:List[CombinationNode] = []
THRESHOLD_OUTLIER = 0.2
THRESHOLD_AVERAGING = 0.1
def verif_averaging_pct(combination:Combination, values:List[float]) -> bool:
"""
For a given combination of values, we must have all the values within
THRESHOLD_AVERAGING of the average of the combination
"""
avg = 0
for c,v in zip(combination.combination, values):
if c == "1":
avg += v
avg /= combination.degree
for c,v in zip(combination.combination, values):
        if c == "1" and (v > avg*(1+THRESHOLD_AVERAGING) or v < avg*(1-THRESHOLD_AVERAGING)):
return False
return True
def recursive_check(node:CombinationNode, depth:int, values:List[Union[float, int]]) -> None:
"""
    Here is where we preferentially ask for a small number of bigger groups
"""
global best_depth
global best_path
# If there are more groups than the current best way to do, stop
if best_depth != -1 and depth > best_depth:
return
# If all the values of the combination are not within THRESHOLD_AVERAGING, stop
if not verif_averaging_pct(node.combination, values):
return
# If we finished the list of combinations, and this way is the best, keep it, stop
if len(node.children) == 0:
if best_depth == -1 or depth < best_depth:
best_depth = depth
best_path = node.path()
return
# If we are still not finished (not every value has been used), continue
for cnode in node.children:
recursive_check(cnode, depth+1, values)
def groups_from_list(values:List[Union[float, int]]) -> List[List[Union[float, int]]]:
"""
From a list of values, get the smallest list of groups of elements
within THRESHOLD_AVERAGING of each other.
It implies that we will try and recursively find the biggest group possible
    within the unused values (i.e. groups with combination sizes [3, 1] are preferred
    over [2, 2])
"""
global best_depth
global best_path
groups:List[List[float]] = []
# Generate all the combinations (I used binary for this)
combination_dict:Dict[int, List[Combination]] = {}
for i in range(1, 2**len(values)):
combination = format(i, f"0{len(values)}b") # Here is the binary conversion
counter = 0
for c in combination:
if c == "1":
counter += 1
if counter not in combination_dict:
combination_dict[counter] = []
combination_dict[counter].append(Combination(combination))
    # Generate all the combinations of combinations that use every value (without using one twice)
combination_trees:List[List[CombinationNode]] = []
for key in combination_dict:
for combination in combination_dict[key]:
cn = CombinationNode(combination)
cn.populate(combination_dict)
combination_trees.append(cn)
best_depth = -1
best_path = None
for root in combination_trees:
recursive_check(root, 0, values)
# print(",".join([combination.combination.combination for combination in best_path]))
for combination in best_path:
temp = []
for c,v in zip(combination.combination.combination, values):
if c == "1":
temp.append(v)
groups.append(temp)
return groups
def averages_from_groups(gs:List[List[Union[float, int]]]) -> List[float]:
"""Computing the averages of each group"""
avgs:List[float] = []
for group in gs:
avg = 0
for elt in group:
avg += elt
avg /= len(group)
avgs.append(avg)
return avgs
def end_check(ds:List[pd.DataFrame], ids:List[int]) -> bool:
"""Check if we finished consuming all the dataframes"""
for d,i in zip(ds, ids):
if i < len(d[0]):
return False
return True
def search(group:List[Union[float, int]], values_list:List[Union[float, int]]) -> List[int]:
"""Obtain all the indices corresponding to a set of values"""
# We will get all the indices in values_list of the values in group
    # If a value is present in group, all the occurrences of this value will be too,
# so we can use a set and search every occurence for each value.
indices:List[int] = []
group_set = set(group)
for value in group_set:
for i,v in enumerate(values_list):
if value == v:
indices.append(i)
return indices
def threshold_grouper(total_list:List[Union[float, int]]) -> pd.DataFrame:
"""Building a 2D pd.DataFrame with the averages (x) and the outliers (y)"""
result_list:List[List[Union[float, int]]] = [[total_list[0]]]
result_index = 0
total_index = 1
while total_index < len(total_list):
# Only checking if the bigger one is within THRESHOLD_OUTLIER of the little one.
# If it is the case, the opposite is true too.
# If yes, it is an outlier
if result_list[result_index][0]*(1+THRESHOLD_OUTLIER) >= total_list[total_index]:
result_list[result_index].append(total_list[total_index])
# Else it is a new peak
else:
result_list.append([total_list[total_index]])
result_index += 1
total_index += 1
result:pd.DataFrame = pd.DataFrame(result_list)
return result
def dataframes_merger(dataframes:List[pd.DataFrame]) -> pd.DataFrame:
"""Merging the dataframes, with THRESHOLDS"""
    # Store the averages of the cells that are within 10% of each other, in ascending order
result = []
# Keep tabs on where we are regarding each dataframe (needed for when we skip cells)
curr_indices:List[int] = [0 for _ in range(len(dataframes))]
    # Repeat until all the cells in every dataframe have been seen once
while not end_check(dataframes, curr_indices):
# Get the values of the current indices in the dataframes
curr_values = [dataframe[0][i] for dataframe,i in zip(dataframes, curr_indices)]
# Get the largest 10% groups from the current list of values
groups = groups_from_list(curr_values)
# Compute the average of these groups
avgs = averages_from_groups(groups)
# Obtain the minimum average...
avg_min = min(avgs)
# ... and its index
avg_min_index = avgs.index(avg_min)
# Then get the group corresponding to the minimum average
avg_min_group = groups[avg_min_index]
# Get the indices of the values included in this group
indices_to_increment = search(avg_min_group, curr_values)
# Add the average to the result merged list
result.append(avg_min)
# For every element in the average we added, increment the corresponding index
for index in indices_to_increment:
curr_indices[index] += 1
# Re-assemble the dataframe, taking the threshold% around average into account
result = threshold_grouper(result)
print(result)
df1 = pd.DataFrame([1, 1487, 5144, 35293, 78486, 100000])
df2 = pd.DataFrame([1, 1500, 5144, 36278, 45968, 100000])
df3 = pd.DataFrame([1, 5286, 35785, 41205, 100000])
dataframes_merger([df3, df2, df1])
I created my first Python program and I suspect something is wrong. The execution time of the testovanie() method was 2 hours; in Java, the same code took 10 minutes.
The implementation must be split across two classes, and each algorithm must be implemented as written (unless there is a problem with it).
Can you help me fix the execution time?
First Class
class Algoritmy:
"""
    The Algoritmy class creates an array of integers and contains methods for working with that array (sorting).
"""
def __init__(self, velkostPola):
"""
Constructor that initializes attributes.
:param velkostPola: array size.
"""
self.velkostPola = velkostPola
self.poleCisel = []
def nacitajZoSuboru(self, nazov):
"""
        The method reads integer values from a file and stores them in the array.
:param nazov: a string that contains the name of the file from which the values are read into the field.
:type nazov: string
"""
f = open(nazov, 'r+')
self.poleCisel = f.readlines()
self.poleCisel = [int(i) for i in self.poleCisel]
f.close()
def toString(self):
"""
        A method that prints the entire poleCisel array as its text representation.
"""
for x in self.poleCisel:
print(x)
def bubbleSort(self):
"""
        The method sorts the array using bubble sort, with complexity n^2.
        It compares adjacent values and, if the first is greater than the second,
        swaps them. This repeats until the entire array is sorted from smallest to largest.
"""
n = len(self.poleCisel)
for i in range(0, n):
for j in range(0, n - i - 1):
if self.poleCisel[j] > self.poleCisel[j + 1]:
self.vymena(j,j+1)
def insertionSort(self):
"""
        The method sorts the array using insertion sort, with complexity n^2.
        It takes each value in turn, compares it with the values before it, and
        inserts it after the first value that is smaller. This repeats until the
        entire array is sorted from smallest to largest.
"""
n = len(self.poleCisel)
for i in range(n):
pom = self.poleCisel[i]
j = i - 1
while (j >= 0) and (pom < self.poleCisel[j]):
self.poleCisel[j + 1] = self.poleCisel[j]
j -= 1
self.poleCisel[j + 1] = pom
def quickSort(self, najm, najv):
"""
        The method sorts the array using quicksort, with complexity n*log2(n).
        The algorithm chooses a pivot and partitions the elements so that the
        smaller elements end up on the left and the larger elements on the right.
        :param najm: the lowest index
        :type najm: integer
        :param najv: the highest index
        :type najv: integer
"""
i = najm
j = najv
pivot = self.poleCisel[najm + (najv - najm) // 2]
while i <= j:
while self.poleCisel[i] < pivot:
i += 1
while self.poleCisel[j] > pivot:
j -= 1
if i <= j:
self.vymena(i, j)
i += 1
j -= 1
if najm < j:
self.quickSort(najm, j)
if i < najv:
self.quickSort(i, najv)
def vymena(self, i, j):
"""
        A helper procedure that swaps the elements at indices i and j in the array.
        :param i: index of one array element
        :type i: integer
        :param j: index of the other array element
        :type j: integer
"""
pom = self.poleCisel[i]
self.poleCisel[i] = self.poleCisel[j]
self.poleCisel[j] = pom
def selectionSort(self):
"""
        The method sorts the array using selection sort, with complexity n^2.
        The algorithm repeatedly finds the highest value among the unsorted
        elements and swaps it with the last unsorted element, until the whole
        array is sorted.
"""
for i in reversed(range(0, len(self.poleCisel))):
prvy = 0
for j in range(0, i):
if self.poleCisel[j] > self.poleCisel[prvy]:
prvy = j
self.vymena(prvy,i)
def shellSort(self, n):
"""
        The method sorts the array using Shell sort, with complexity n^2.
        Elements a gap apart are compared, starting with gap = n/2, where n is
        the size of the array being sorted. If the left element of a compared
        pair is larger than the right one, they are swapped. The gap is then
        halved and the procedure repeated.
:param n: size of array
:type n: integer
"""
medzera = n // 2
while medzera > 0:
i = medzera
for i in range(0, n):
pom = self.poleCisel[i]
j = i
while (j >= medzera) and (self.poleCisel[j - medzera] > pom):
self.poleCisel[j] = self.poleCisel[j - medzera]
j = j - medzera
self.poleCisel[j] = pom
medzera = medzera // 2
def heapSort(self):
"""
        The method sorts the array using heapsort, with complexity n*log(n).
        It first builds a max-heap, then repeatedly swaps the root with the
        last element of the heap, shrinks the heap by one, and restores the
        heap property. This repeats until the array is sorted.
"""
n = len(self.poleCisel)
for k in reversed(range(1, n // 2)):
self.maxHeapify(k, n)
while True:
self.vymena(0,n-1)
n = n - 1
self.maxHeapify(1, n)
if (n < 1):
break
def maxHeapify(self, otecI, n):
"""
This method serves to preserve the properties of Heap.
        :param otecI: index of the parent node
        :type otecI: integer
        :param n: current size of the heap
        :type n: integer
"""
otec = self.poleCisel[otecI - 1]
while otecI <= n // 2:
lavySyn = otecI + otecI
if (lavySyn < n) and (self.poleCisel[lavySyn - 1] < self.poleCisel[lavySyn]):
lavySyn += 1
if otec >= self.poleCisel[lavySyn - 1]:
break
else:
self.poleCisel[otecI - 1] = self.poleCisel[lavySyn - 1]
otecI = lavySyn
self.poleCisel[otecI - 1] = otec
Second Class
from Algoritmy import Algoritmy
import time
class Praca:
def __init__(self):
self.casB = []
self.casQ = []
self.casS = []
self.casI = []
self.casSh = []
self.casH = []
def vypisPriemer(self):
"""
A method that calculates and prints the averages of the algorithm duration from the time field.
"""
sumB = 0;sumQ = 0;sumS = 0;sumI = 0;sumSh = 0;sumH = 0
for j in range(0, 200):
sumB += self.casB[j]
sumQ += self.casQ[j]
sumS += self.casS[j]
sumI += self.casI[j]
sumSh += self.casSh[j]
sumH += self.casH[j]
priemerB = sumB / 200
priemerQ = sumQ / 200
priemerS = sumS / 200
priemerI = sumI / 200
priemerSh = sumSh / 200
priemerH = sumH / 200
print("Bubble Sort alg. priemer: %10.9f" %priemerB)
print("Quick Sort alg. priemer: %10.9f"%priemerQ)
print("Selection Sort alg. priemer: %10.9f"%priemerS)
print("Insertion Sort alg. priemer: %10.9f"%priemerI)
print("Shell Sort alg. priemer: %10.9f"%priemerSh)
print("Heap Sort alg. priemer: %10.9f"%priemerH)
def replikacie(self,velkost, nazovS):
"""
The method is aimed at performing 200 replications for each single algorithm.
Collects and stores the execution time of the algorithm in the field.
:param velkost: array size
:type velkost: integer
:param nazovS: file name
:type nazovS: string
"""
self.casB.clear()
self.casQ.clear()
self.casS.clear()
self.casI.clear()
self.casSh.clear()
self.casH.clear()
praca = Algoritmy(velkost)
for i in range(0, 200):
praca.nacitajZoSuboru(nazovS)
zaciatok=time.time()
praca.bubbleSort()
self.casB.append(time.time() - zaciatok)
praca.nacitajZoSuboru(nazovS)
zaciatok=time.time()
praca.quickSort(0, praca.velkostPola-1)
self.casQ.append(time.time() - zaciatok)
praca.nacitajZoSuboru(nazovS)
zaciatok=time.time()
praca.selectionSort()
self.casS.append(time.time() - zaciatok)
praca.nacitajZoSuboru(nazovS)
zaciatok=time.time()
praca.insertionSort()
self.casI.append(time.time() - zaciatok)
praca.nacitajZoSuboru(nazovS)
zaciatok=time.time()
praca.shellSort(praca.velkostPola)
self.casSh.append(time.time() - zaciatok)
praca.nacitajZoSuboru(nazovS)
zaciatok=time.time()
praca.heapSort()
self.casH.append(time.time() - zaciatok)
def testovanie(self):
"""
Testing
"""
self.replikacie(10000,"neutr10000.txt")
print("Neutriedene 10000")
self.vypisPriemer()
def main(self):
zaciatok = time.time()
self.testovanie()
print(time.time() - zaciatok)
"""
Run
"""
if __name__ == '__main__':
praca = Praca()
praca.main()
If you have any improvements, don't be shy about telling me; as I said, it's my first Python program. Be nice to me :)
A more condensed MRE would make it easier to comment on the specific statements, but my guess is that your example just illustrates that Python is slow for certain use cases.
This kind of number crunching in pure-Python loops is the nightmare scenario for Python, at least for the most popular CPython implementation.
There are, however, different ways you could speed this up if you diverge a bit from pure CPython:
Use the PyPy JIT to run your program instead of CPython. PyPy usually speeds code up by ~3-5x, but for numeric workloads like yours you can get an even more impressive bump.
Use numeric libraries to vectorize your code and/or offload common operations to optimized routines (written in C, Fortran or even assembly). Numpy is a popular choice.
Rewrite your program, or at least the "hottest" code paths, in Cython cdef functions and classes, see, e.g., https://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html.
You may want to check out Numba, but I have no experience with it.
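Even without leaving CPython, you can see the gap between a hand-written Python loop and a C-implemented builtin. A quick, self-contained timing sketch (absolute numbers will vary by machine):

```python
import random
import timeit

def bubble_sort(arr):
    # Pure-Python bubble sort, the same algorithm as the bubbleSort method above.
    n = len(arr)
    for i in range(n):
        for j in range(n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

data = [random.randrange(10_000) for _ in range(1_000)]
t_loop = timeit.timeit(lambda: bubble_sort(data[:]), number=3)
t_builtin = timeit.timeit(lambda: sorted(data), number=3)
print(t_loop > t_builtin)  # True: the C-implemented sort wins by orders of magnitude
```

This is why the pure-Python inner loops dominate your runtime: every comparison and swap goes through the interpreter, whereas sorted() runs entirely in C.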
I'm familiar with the naive recursive solution to the knapsack problem. However, this solution simply spits out the max value that can be stored in the knapsack given its weight constraints. What I'd like to do is add some form of metadata cache (namely which items have/not been selected, using a "one-hot" array [0,1,1]).
Here's my attempt:
class Solution:
def __init__(self):
self.array = []
def knapSack(self,W, wt, val, n):
index = n-1
if n == 0 or W == 0 :
return 0
if (wt[index] > W):
self.array.append(0)
choice = self.knapSack(W, wt, val, index)
else:
option_A = val[index] + self.knapSack( W-wt[index], wt, val, index)
option_B = self.knapSack(W, wt, val, index)
if option_A > option_B:
self.array.append(1)
choice = option_A
else:
self.array.append(0)
choice = option_B
print(int(option_A > option_B)) #tells you which path was traveled
return choice
# To test above function
val = [60, 100, 120]
wt = [10, 20, 30]
W = 50
n = len(val)
# print(knapSack(W, wt, val, n))
s = Solution()
s.knapSack(W, wt, val, n)
>>>
1
1
1
1
1
1
220
s.array
>>>
[1, 1, 1, 1, 1, 1]
As you can see, s.array returns [1,1,1,1,1,1] and this tells me a few things. (1), even though there are only three items in the problem set, the knapSack method has been called twice for each item and (2) this is because every item flows through the else statement in the method, so option_A and option_B are each computed for each item (explaining why the array length is 6 not 3.)
I'm confused as to why 1 has been appended in every recursive call, since the item at index 0 is not selected in the optimal solution. To answer this question, please provide:
(A) Why the current solution is behaving this way
(B) How the code can be restructured such that a one-hot "take or don't take" vector can be captured, representing whether a given item goes in the knapsack or not.
Thank you!
(A) Why the current solution is behaving this way
self.array is an instance attribute that is shared by all recursion paths. On one path or another each item is taken and so a one is appended to the list.
option_A = val[index]... takes an item but doesn't append a one to the list.
option_B = self..... skips an item but doesn't append a zero to the list.
if option_A > option_B: by the time you make this comparison, you have lost the information that produced it - namely which items were taken or discarded in each branch;
in the if/else suites you just append a single one or zero, regardless of how many items contributed to those values.
The ones and zeroes then represent whether branch A (1) or branch B (0) was successful in the current instance of the function.
(B) How the code can be restructured such that a one-hot "take or don't take" vector can be captured, representing whether a given item goes in the knapsack or not.
It would be nice to know what you have taken after running the analysis; I suspect that is what you are trying to do with self.array. You expressed an interest in OOP: instead of keeping track with lists of numbers and using indices to select from them, make objects that represent the items and work with those. Keep the objects in containers and use the container's functionality to add or remove items/objects. Consider how you are going to use a container before choosing one.
Don't put the function in a class.
Change the function's signature to accept
available weight,
a container of items to be considered,
a container holding the items currently in the sack (the current sack).
Use a collections.namedtuple or a class for the items having value and weight attributes.
Item = collections.namedtuple('Item',['wt','val'])
When an item is taken add it to the current sack.
When recursing:
remove the item that was just considered from the items-to-be-considered argument;
if going down the take path, add the considered item to the current sack passed into the call;
if the item was taken, subtract its weight from the available-weight argument.
When comparing two branches you will need to add up the values of each item in the current sack.
return the sack with the highest value
carefully consider the base case
Make the items to be considered like this.
import collections
Item = collections.namedtuple('Item',['wt','val'])
items = [Item(wght,value) for wght,value in zip(wt,val)]
Add up values like this.
value = sum(item.val for item in current_sack)
# or
import operator
val = operator.attrgetter('val')  # note: itemgetter('val') would fail on a namedtuple
wt = operator.attrgetter('wt')
value = sum(map(val, current_sack))
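Putting the outline above together, a minimal sketch (the names and structure here are my own, shown with the question's test data):

```python
import collections

Item = collections.namedtuple('Item', ['wt', 'val'])

def sack_value(sack):
    # Total value of the items currently in the sack.
    return sum(item.val for item in sack)

def knapsack(available_weight, items, current_sack=()):
    # Base case: no items left to consider; the current sack is final.
    if not items:
        return current_sack
    head, rest = items[0], items[1:]
    # Branch B: skip the item under consideration.
    sack_skip = knapsack(available_weight, rest, current_sack)
    if head.wt > available_weight:
        return sack_skip  # too heavy: skipping is the only option
    # Branch A: take the item, reducing the available weight.
    sack_take = knapsack(available_weight - head.wt, rest, current_sack + (head,))
    # Return whichever candidate sack has the higher total value.
    return sack_take if sack_value(sack_take) > sack_value(sack_skip) else sack_skip

wt = [10, 20, 30]
val = [60, 100, 120]
items = tuple(Item(w, v) for w, v in zip(wt, val))
best = knapsack(50, items)
print(sack_value(best))                 # 220
print([int(i in best) for i in items])  # one-hot vector: [0, 1, 1]
```

Because the function returns the sack itself rather than a bare value, the one-hot vector falls out of a simple membership test at the end instead of being tracked during the recursion.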
Your solution enhanced with debugging prints for the curious.
class Solution:
def __init__(self):
self.array = []
self.other_array = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
def knapSack(self,W, wt, val, n,j=0):
index = n-1
deep = f'''{' '*j*3}'''
print(f'{deep}level {j}')
print(f'{deep}{W} available: considering {wt[index]},{val[index]}, {n})')
        # minor change here, but it has no effect on the outcome
#if n == 0 or W == 0 :
if n == 0:
print(f'{deep}Base case found')
return 0
print(f'''{deep}{wt[index]} > {W} --> {wt[index] > W}''')
if (wt[index] > W):
print(f'{deep}too heavy')
self.array.append(0)
self.other_array[index] = 0
choice = self.knapSack(W, wt, val, index,j+1)
else:
print(f'{deep}Going down the option A hole')
option_A = val[index] + self.knapSack( W-wt[index], wt, val, index,j+1)
print(f'{deep}Going down the option B hole')
option_B = self.knapSack(W, wt, val, index,j+1)
print(f'{deep}option A:{option_A} option B:{option_B}')
if option_A > option_B:
print(f'{deep}option A wins')
self.array.append(1)
self.other_array[index] = 1
choice = option_A
else:
print(f'{deep}option B wins')
self.array.append(0)
self.other_array[index] = 0
choice = option_B
print(f'{deep}level {j} Returning value={choice}')
print(f'{deep}---------------------------------------------')
return choice
I have some calculations on biological data. Each function calculates the total, average, min, max values for one list of objects.
The idea is that I have a lot of different lists each one is for a different object type.
I don't want to repeat my code for every function just changing the "for" line and the call of the object's method!
For example:
Volume function:
def calculate_volume(self):
total = 0
min = sys.maxint
max = -1
compartments_counter = 0
for n in self.nodes:
compartments_counter += 1
current = n.get_compartment_volume()
if min > current:
min = current
if max < current:
max = current
total += current
avg = float(total) / compartments_counter
return total, avg, min, max
Contraction function:
def get_contraction(self):
total = 0
min = sys.maxint
max = -1
branches_count = self.branches.__len__()
for branch in self.branches:
current = branch.get_contraction()
if min > current:
min = current
if max < current:
max = current
total += current
avg = float(total) / branches_count
return total, avg, min, max
Both functions look almost the same, just a little modification!
I know I can use the built-in sum, min, max, etc., but when I apply them to my values they take more time than doing everything in the loop, because they can't all be computed in a single pass.
I just want to know if is it the right way to write a function for every calculation? (i.e. a professional way?) Or maybe I can write one function and pass the list, object type and the method to call.
It's hard to say without seeing the rest of the code, but from the limited view given I'd reckon you shouldn't have these functions as methods at all. I also really don't understand your reasoning for not using the builtins ("they can't be called at once"?). If you're implying that implementing the 4 statistical operations in a single pass in Python is faster than 4 passes through builtin (C) functions, then I'm afraid you have a very wrong assumption.
That said, here's my take on the problem:
def get_stats(l):
s = sum(l)
return (
s,
float(s) / len(l),
min(l),
max(l))
# then create numeric lists from your data and send 'em through:
node_volumes = [n.get_compartment_volume() for n in self.nodes]
branches = [b.get_contraction() for b in self.branches]
# ...
total_1, avg_1, min_1, max_1 = get_stats(node_volumes)
total_2, avg_2, min_2, max_2 = get_stats(branches)
EDIT
Some benchmarks to prove that builtin is win:
MINE.py
import sys
def get_stats(l):
s = sum(l)
return (
s,
float(s) / len(l),
min(l),
max(l)
)
branches = [i for i in xrange(10000000)]
print get_stats(branches)
Versus YOURS.py
import sys
branches = [i for i in xrange(10000000)]
total = 0
min = sys.maxint
max = -1
branches_count = branches.__len__()
for current in branches:
    if min > current:
        min = current
    if max < current:
        max = current
    total += current
avg = float(total) / branches_count
print total, avg, min, max
And finally with some timers:
smassey@hacklabs:/tmp $ time python mine.py
(49999995000000, 4999999.5, 0, 9999999)
real 0m1.225s
user 0m0.996s
sys 0m0.228s
smassey@hacklabs:/tmp $ time python yours.py
49999995000000 4999999.5 0 9999999
real 0m2.369s
user 0m2.180s
sys 0m0.180s
Cheers
First, notice that while it is probably more efficient to call len(self.branches) (don't call __len__ directly), it is more general to increment a counter in the loop like you do with calculate_volume. With that change, you can refactor as follows:
def _stats(self, iterable, get_current):
    total = 0.0
    min_value = None  # Slightly better than sys.maxint
    max_value = -1
    counter = 0
    for n in iterable:
        counter += 1
        current = get_current(n)
        if min_value is None or min_value > current:
            min_value = current
        if max_value < current:
            max_value = current
        total += current
    avg = total / counter
    return total, avg, min_value, max_value
Now, each of the two can be implemented in terms of _stats:
import operator

def calculate_volume(self):
    return self._stats(self.nodes, operator.methodcaller('get_compartment_volume'))

def get_contraction(self):
    return self._stats(self.branches, operator.methodcaller('get_contraction'))
methodcaller('method_name') returns a function f such that f(x) is equivalent to x.method_name(), which allows you to factor out the method call.
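A quick standalone illustration of that behaviour (the strings here are just example data, not tied to the classes above):

```python
import operator

# methodcaller('upper') builds a callable f with f(x) == x.upper()
f = operator.methodcaller('upper')
print(f('hello'))                        # HELLO
print([f(s) for s in ['a', 'bc']])       # ['A', 'BC']
```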
You can use getattr( instance, methodname) to write a function to process lists of arbitrary objects.
def averager(things, methodname):
    count, total, min, max = 0, 0, sys.maxint, -1
    for thing in things:
        current = getattr(thing, methodname)()
        count += 1
        if min > current:
            min = current
        if max < current:
            max = current
        total += current
    avg = float(total) / count
    return total, avg, min, max
Then inside your class definitions you just need
def calculate_volume(self): return averager( self.nodes, 'get_compartment_volume')
def get_contraction(self): return averager( self.branches, 'get_contraction' )
Writing a function that takes another function that knows how to extract values from the list is very common. In fact, min and max both take a key argument to that effect.
eg.
items = [1, 0, -2]
print(max(items, key=abs)) # prints -2
So it's perfectly acceptable to write your own function that does the same. Normally, I would just create a new list of all the values you want to examine and then work with that (eg. [branch.get_contraction() for branch in branches]). But perhaps space is an issue for you, so here is an example using a generator.
def sum_avg_min_max(iterable, key=None):
    if key is not None:
        iter_ = (key(item) for item in iterable)
    else:
        # if there is no key, just use the iterable itself
        iter_ = iter(iterable)
    try:
        # We don't know sensible starting values for total, min or max,
        # so use the first value.
        total = min_ = max_ = next(iter_)
    except StopIteration:
        # can't have a min or max if we have no items in the iterable...
        raise ValueError("empty iterable") from None
    count = 1
    for item in iter_:
        total += item
        min_ = min(min_, item)
        max_ = max(max_, item)
        count += 1
    return total, float(total) / count, min_, max_
Then you might use it like this:
class MyClass(int):
    def square(self):
        return self ** 2

items = [MyClass(i) for i in range(10)]
print(sum_avg_min_max(items, key=MyClass.square))  # prints (285, 28.5, 0, 81)
This works because when you fetch an instance method from the class it gives you the underlying function itself (with self not yet bound). So we can use it as the key. eg.
str.upper("hello world") == "hello world".upper()
With a more concrete example (assuming items in branches are instances of Branch):
def get_contraction(self):
    result = sum_avg_min_max(self.branches, key=Branch.get_contraction)
    return result
Or maybe I can write one function and pass the list, object type and the method to call.
Although you can definitely pass a function to a function, and it's actually a very common way to avoid repeating yourself, in this case you can't, because each object in the list has its own method. So instead, I'm passing the function's name as a string, then using getattr in order to get the actual callable method from the object. Also note that I'm using len() instead of explicitly calling __len__().
def handle_list(items_list, func_to_call):
    total = 0
    min = sys.maxint
    max = -1
    count = len(items_list)
    for item in items_list:
        current = getattr(item, func_to_call)()
        if min > current:
            min = current
        if max < current:
            max = current
        total += current
    avg = float(total) / count
    return total, avg, min, max
Using a branch and bound algorithm I have evaluated the optimal profit from a given set of items, but now I wish to find out which items are included in this optimal solution. I'm evaluating the profit value of the optimal knapsack as follows (adapted from here):
import Queue

class Node:
    def __init__(self, level, profit, weight):
        self.level = level    # The level within the tree (depth)
        self.profit = profit  # The total profit
        self.weight = weight  # The total weight

def solveKnapsack(weights, profits, knapsackSize):
    numItems = len(weights)
    queue = Queue.Queue()
    root = Node(-1, 0, 0)
    queue.put(root)
    maxProfit = 0
    bound = 0
    while not queue.empty():
        v = queue.get()  # Get the next item on the queue
        uLevel = v.level + 1
        u = Node(uLevel, v.profit + profits[uLevel], v.weight + weights[uLevel])
        bound = getBound(u, numItems, knapsackSize, weights, profits)
        if u.weight <= knapsackSize and u.profit > maxProfit:
            maxProfit = u.profit
        if bound > maxProfit:
            queue.put(u)
        u = Node(uLevel, v.profit, v.weight)
        bound = getBound(u, numItems, knapsackSize, weights, profits)
        if bound > maxProfit:
            queue.put(u)
    return maxProfit
# This is essentially the brute force solution to the fractional knapsack
def getBound(u, numItems, knapsackSize, weights, profits):
    if u.weight >= knapsackSize:
        return 0
    else:
        upperBound = u.profit
        totalWeight = u.weight
        j = u.level + 1
        while j < numItems and totalWeight + weights[j] <= knapsackSize:
            upperBound += profits[j]
            totalWeight += weights[j]
            j += 1
        if j < numItems:
            upperBound += (knapsackSize - totalWeight) * profits[j] / weights[j]
        return upperBound
So, how can I get the items that form the optimal solution, rather than just the profit?
I got this working using your code as the starting point. I defined my Node class as:
class Node:
    def __init__(self, level, profit, weight, bound, contains):
        self.level = level        # current level of our node
        self.profit = profit
        self.weight = weight
        self.bound = bound        # max (optimistic) value our node can take
        self.contains = contains  # list of items our node contains
I then started my knapsack solver similarly, but initialized root = Node(0, 0, 0, 0.0, []). The value root.bound could be a float, which is why I initialized it to 0.0, while the other values (at least in my problem) are all integers. The node contains nothing so far, so I started it off with an empty list. I followed a similar outline to your code, except that I stored the bound in each node (not sure this was necessary), and updated the contains list using:
u.contains = v.contains[:]  # copies the items in the list, not the list location
# Initialize u as Node(uLevel, uProfit, uWeight, 0.0, uContains)
u.contains.append(uLevel)   # add the current item index to the list
Note that I only updated the contains list in the "taking the item" node. This is the first initialization in your main loop, preceding the first if bound > maxProfit: statement. I recorded the best item list in the if: statement right before this, where you update the value of maxProfit:
if u.weight <= knapsackSize and u.profit > maxProfit:
    maxProfit = u.profit
    bestList = u.contains
This stores the indices of the items you are taking in bestList. I also added the condition if v.bound > maxProfit and v.level < numItems - 1 to the main loop, right after v = queue.get(), so that I do not keep going after I reach the last item and do not loop through branches that are not worth exploring.
Also, if you want to get a binary list output showing which items are selected by index, you could use:
taken = [0] * numItems
for item in bestList:
    taken[item] = 1
print str(taken)
I had some other differences in my code, but this should enable you to get your chosen item list out.
I have been thinking about this for some time. Apparently, you have to add something to your Node class that stores the node's path and appends the current level to it. You call that inside your loop and assign the path list to your optimal item list wherever the node's weight is less than the capacity and its value is greater than maxProfit, i.e. where you assign maxProfit. You can find a Java implementation here.
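A minimal runnable sketch of that idea, pulled together from the pieces above (names like contains and best_list are mine; I use collections.deque instead of Queue, and the bound assumes items are already sorted by profit/weight ratio):

```python
from collections import deque

class Node(object):
    def __init__(self, level, profit, weight, contains):
        self.level = level        # index of the last item decided on this path
        self.profit = profit
        self.weight = weight
        self.contains = contains  # indices of the items taken along this path

def bound(node, n, capacity, weights, profits):
    # Optimistic bound via the fractional knapsack on the remaining items.
    if node.weight >= capacity:
        return 0
    result = node.profit
    total_weight = node.weight
    j = node.level + 1
    while j < n and total_weight + weights[j] <= capacity:
        result += profits[j]
        total_weight += weights[j]
        j += 1
    if j < n:
        result += (capacity - total_weight) * profits[j] / float(weights[j])
    return result

def solve(weights, profits, capacity):
    n = len(weights)
    best_profit, best_list = 0, []
    q = deque([Node(-1, 0, 0, [])])
    while q:
        v = q.popleft()
        if v.level == n - 1:      # all items decided on this path
            continue
        lvl = v.level + 1
        # Branch 1: take item lvl (copy the path and append the index).
        u = Node(lvl, v.profit + profits[lvl], v.weight + weights[lvl],
                 v.contains + [lvl])
        if u.weight <= capacity and u.profit > best_profit:
            best_profit, best_list = u.profit, u.contains
        if bound(u, n, capacity, weights, profits) > best_profit:
            q.append(u)
        # Branch 2: skip item lvl (path unchanged).
        w = Node(lvl, v.profit, v.weight, v.contains)
        if bound(w, n, capacity, weights, profits) > best_profit:
            q.append(w)
    return best_profit, best_list

print(solve([2, 3, 4], [3, 4, 5], 5))  # (7, [0, 1])
```

The key point is that each node carries its own copy of the path, so when a new best profit is found the winning item indices are already at hand.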