Unifying similar functions in one - python

I have some calculations on biological data. Each function calculates the total, average, min and max values for one list of objects.
I have a lot of different lists, each one for a different object type.
I don't want to repeat my code for every function, changing just the "for" line and the call to the object's method!
For example:
Volume function:
def calculate_volume(self):
    total = 0
    min = sys.maxint
    max = -1
    compartments_counter = 0
    for n in self.nodes:
        compartments_counter += 1
        current = n.get_compartment_volume()
        if min > current:
            min = current
        if max < current:
            max = current
        total += current
    avg = float(total) / compartments_counter
    return total, avg, min, max
Contraction function:
def get_contraction(self):
    total = 0
    min = sys.maxint
    max = -1
    branches_count = self.branches.__len__()
    for branch in self.branches:
        current = branch.get_contraction()
        if min > current:
            min = current
        if max < current:
            max = current
        total += current
    avg = float(total) / branches_count
    return total, avg, min, max
Both functions look almost the same, just a little modification!
I know I can use sum, min, max, etc., but when I apply them to my values they take more time than doing everything in the loop, because they can't be called at once.
I just want to know whether writing a function for every calculation is the right (i.e. professional) way, or whether I can write one function and pass it the list, the object type and the method to call.

It's hard to say without seeing the rest of the code, but from the limited view given I'd reckon you shouldn't have these functions in methods at all. I also really don't understand your reasoning for not using the builtins ("they can't be called at once"?). If you're implying that implementing the 4 statistical methods in a single pass in Python is faster than 4 passes in builtin (C) code, then I'm afraid you have a very wrong assumption.
That said, here's my take on the problem:
def get_stats(l):
    s = sum(l)
    return (
        s,
        float(s) / len(l),
        min(l),
        max(l))
# then create numeric lists from your data and send 'em through:
node_volumes = [n.get_compartment_volume() for n in self.nodes]
branches = [b.get_contraction() for b in self.branches]
# ...
total_1, avg_1, min_1, max_1 = get_stats(node_volumes)
total_2, avg_2, min_2, max_2 = get_stats(branches)
EDIT
Some benchmarks to prove that builtin is win:
MINE.py
import sys
def get_stats(l):
    s = sum(l)
    return (
        s,
        float(s) / len(l),
        min(l),
        max(l)
    )
branches = [i for i in xrange(10000000)]
print get_stats(branches)
Versus YOURS.py
import sys
branches = [i for i in xrange(10000000)]
total = 0
min = sys.maxint
max = -1
branches_count = branches.__len__()
for current in branches:
    if min > current:
        min = current
    if max < current:
        max = current
    total += current
avg = float(total) / branches_count
print total, avg, min, max
And finally with some timers:
smassey#hacklabs:/tmp $ time python mine.py
(49999995000000, 4999999.5, 0, 9999999)
real 0m1.225s
user 0m0.996s
sys 0m0.228s
smassey#hacklabs:/tmp $ time python yours.py
49999995000000 4999999.5 0 9999999
real 0m2.369s
user 0m2.180s
sys 0m0.180s
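The same comparison can be repeated without shell timers. The following is a minimal sketch using the standard timeit module, assuming Python 3 (so range and print() replace xrange and the print statement); the function names stats_builtin and stats_loop are made up for this illustration, and the actual timings will vary by machine and interpreter version.

```python
import timeit

def stats_builtin(values):
    # Four passes over the data, each one running in C inside the interpreter.
    total = sum(values)
    return total, total / len(values), min(values), max(values)

def stats_loop(values):
    # A single pass, but every comparison and addition runs as Python bytecode.
    total = 0
    lowest = highest = values[0]
    for current in values:
        if current < lowest:
            lowest = current
        if current > highest:
            highest = current
        total += current
    return total, total / len(values), lowest, highest

data = list(range(200_000))
for func in (stats_builtin, stats_loop):
    seconds = timeit.timeit(lambda: func(data), number=3)
    print(f"{func.__name__}: {seconds:.3f}s for 3 runs")
```

Both functions return the same tuple, so only the elapsed time differs.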
Cheers

First, notice that while it is probably more efficient to call len(self.branches) (don't call __len__ directly), it is more general to increment a counter in the loop like you do with calculate_volume. With that change, you can refactor as follows:
def _stats(self, iterable, get_current):
    total = 0.0
    min_value = None  # Slightly better
    max_value = -1
    counter = 0
    for n in iterable:
        counter += 1
        current = get_current(n)
        if min_value is None or min_value > current:
            min_value = current
        if max_value < current:
            max_value = current
        total += current
    avg = total / counter
    return total, avg, min_value, max_value
Now, each of the two can be implemented in terms of _stats:
import operator
def calculate_volume(self):
    return self._stats(self.nodes, operator.methodcaller('get_compartment_volume'))
def get_contraction(self):
    return self._stats(self.branches, operator.methodcaller('get_contraction'))
methodcaller('method_name') returns a function f such that f(x) is equivalent to x.method_name(), which allows you to factor out the method call.
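To see operator.methodcaller in isolation, here is a small sketch. The Node class below is a hypothetical stand-in for the question's node objects, not the asker's real class.

```python
import operator

class Node:
    """Hypothetical stand-in for the question's node objects."""
    def __init__(self, volume):
        self._volume = volume

    def get_compartment_volume(self):
        return self._volume

# get_volume(n) is equivalent to n.get_compartment_volume()
get_volume = operator.methodcaller('get_compartment_volume')

nodes = [Node(2.0), Node(3.5)]
volumes = [get_volume(n) for n in nodes]
print(volumes)  # [2.0, 3.5]
```

Because the method is looked up by name on each object, the same callable works for any object type that defines a method with that name.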

You can use getattr( instance, methodname) to write a function to process lists of arbitrary objects.
import sys
def averager( things, methodname):
    count,total,min,max = 0,0,sys.maxint,-1
    for thing in things:
        current = getattr(thing, methodname)()
        count += 1
        if min > current:
            min = current
        if max < current:
            max = current
        total += current
    avg = float(total) / count
    return total, avg, min, max
Then inside your class definitions you just need
def calculate_volume(self): return averager( self.nodes, 'get_compartment_volume')
def get_contraction(self): return averager( self.branches, 'get_contraction' )

Writing a function that takes another function that knows how to extract values from the list is very common. In fact, min and max both take arguments to such an effect.
eg.
items = [1, 0, -2]
print(max(items, key=abs)) # prints -2
So it's perfectly acceptable to write your own function that does the same. Normally, I would just create a new list of all the values you want to examine and then work with that (eg. [branch.get_contraction() for branch in branches]). But perhaps space is an issue for you, so here is an example using a generator.
def sum_avg_min_max(iterable, key=None):
    if key is not None:
        iter_ = (key(item) for item in iterable)
    else:
        # if there is no key, just use the iterable itself
        iter_ = iter(iterable)
    try:
        # We don't know sensible starting values for total, min or max. So use
        # the first value.
        total = min_ = max_ = next(iter_)
    except StopIteration:
        # can't have a min or max if we have no items in the iterable...
        raise ValueError("empty iterable") from None
    count = 1
    for item in iter_:
        total += item
        min_ = min(min_, item)
        max_ = max(max_, item)
        count += 1
    return total, float(total) / count, min_, max_
Then you might use it like this:
class MyClass(int):
    def square(self):
        return self ** 2
items = [MyClass(i) for i in range(10)]
print(sum_avg_min_max(items, key=MyClass.square)) # prints (285, 28.5, 0, 81)
This works because when you fetch an instance method from the class, it gives you the underlying function itself (without self bound). So we can use it as the key. eg.
str.upper("hello world") == "hello world".upper()
With a more concrete example (assuming items in branches are instances of Branch):
def get_contraction(self):
    result = sum_avg_min_max(self.branches, key=Branch.get_contraction)
    return result

Or maybe I can write one function and pass the list, object type and the method to call.
Although you can definitely pass a function to a function, and it's actually a very common way to avoid repeating yourself, in this case you can't because each object in the list has its own method. So instead, I'm passing the function's name as a string, then using getattr to get the actual callable method from the object. Also note that I'm using len() instead of explicitly calling __len__().
def handle_list(items_list, func_to_call):
    total = 0
    min = sys.maxint
    max = -1
    count = len(items_list)
    for item in items_list:
        current = getattr(item, func_to_call)()
        if min > current:
            min = current
        if max < current:
            max = current
        total += current
    avg = float(total) / count
    return total, avg, min, max

Related

Adding a cache array to recursive knapsack solution?

I'm familiar with the naive recursive solution to the knapsack problem. However, this solution simply spits out the max value that can be stored in the knapsack given its weight constraints. What I'd like to do is add some form of metadata cache (namely which items have/not been selected, using a "one-hot" array [0,1,1]).
Here's my attempt:
class Solution:
    def __init__(self):
        self.array = []
    def knapSack(self,W, wt, val, n):
        index = n-1
        if n == 0 or W == 0 :
            return 0
        if (wt[index] > W):
            self.array.append(0)
            choice = self.knapSack(W, wt, val, index)
        else:
            option_A = val[index] + self.knapSack( W-wt[index], wt, val, index)
            option_B = self.knapSack(W, wt, val, index)
            if option_A > option_B:
                self.array.append(1)
                choice = option_A
            else:
                self.array.append(0)
                choice = option_B
            print(int(option_A > option_B)) #tells you which path was traveled
        return choice
# To test above function
val = [60, 100, 120]
wt = [10, 20, 30]
W = 50
n = len(val)
# print(knapSack(W, wt, val, n))
s = Solution()
s.knapSack(W, wt, val, n)
>>>
1
1
1
1
1
1
220
s.array
>>>
[1, 1, 1, 1, 1, 1]
As you can see, s.array returns [1,1,1,1,1,1] and this tells me a few things. (1), even though there are only three items in the problem set, the knapSack method has been called twice for each item and (2) this is because every item flows through the else statement in the method, so option_A and option_B are each computed for each item (explaining why the array length is 6 not 3.)
I'm confused as to why 1 has been appended in every recursive call. The item at index 0 is not selected in the optimal solution. To answer this question, please provide:
(A) Why the current solution is behaving this way
(B) How the code can be restructured such that a one-hot "take or don't take" vector can be captured, representing whether a given item goes in the knapsack or not.
Thank you!
(A) Why the current solution is behaving this way
self.array is an instance attribute that is shared by all recursion paths. On one path or another each item is taken and so a one is appended to the list.
option_A = val[index]... takes an item but doesn't append a one to the list.
option_B = self..... skips an item but doesn't append a zero to the list.
if option_A > option_B: When you make this comparison you have lost the information that made it - the items that were taken/discarded in the branch;
in the suites you just append a one or a zero regardless of how many items made those values.
The ones and zeroes then represent whether branch A (1) or branch B (0) was successful in the current instance of the function.
(B) How the code can be restructured such that a one-hot "take or don't take" vector can be captured, representing whether a given item goes in the knapsack or not.
It would be nice to know what you have taken after running through the analysis; I suspect that is what you are trying to do with self.array. You expressed an interest in OOP: instead of keeping track with lists of numbers and using indices to select numbers from the lists, make objects to represent the items and work with those. Keep the objects in containers and use the functionality of the container to add or remove items/objects from it. Consider how you are going to use a container before choosing one.
Don't put the function in a class.
Change the function's signature to accept
    available weight,
    a container of items to be considered,
    a container holding the items currently in the sack (the current sack).
Use a collections.namedtuple or a class for the items having value and weight attributes.
    Item = collections.namedtuple('Item',['wt','val'])
When an item is taken add it to the current sack.
When recursing
    if going down the take path add the return value from the call to the current sack
    remove the item that was just considered from the list of items to be considered argument.
    if taken subtract the item's weight from the available weight argument
When comparing two branches you will need to add up the values of each item in the current sack.
    return the sack with the highest value
carefully consider the base case
Make the items to be considered like this.
import collections
Item = collections.namedtuple('Item',['wt','val'])
items = [Item(wght,value) for wght,value in zip(wt,val)]
Add up values like this.
value = sum(item.val for item in current_sack)
# or
import operator
val = operator.attrgetter('val')
wt = operator.attrgetter('wt')
value = sum(map(val, current_sack))
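The steps above can be sketched as follows. This is one possible reading of them, not the answerer's own code: the names best_sack and sack_value are invented for this illustration, and the sack is passed as an immutable tuple so that the two branches never share state.

```python
import collections

Item = collections.namedtuple('Item', ['wt', 'val'])

def sack_value(sack):
    return sum(item.val for item in sack)

def best_sack(available, items, sack=()):
    """Return the most valuable tuple of items that fits in `available`."""
    if not items:  # base case: nothing left to consider
        return sack
    first, rest = items[0], items[1:]
    # Branch B: skip the first item.
    best = best_sack(available, rest, sack)
    # Branch A: take the first item, if it fits.
    if first.wt <= available:
        taken = best_sack(available - first.wt, rest, sack + (first,))
        if sack_value(taken) > sack_value(best):
            best = taken
    return best

wt = [10, 20, 30]
val = [60, 100, 120]
items = [Item(w, v) for w, v in zip(wt, val)]
chosen = best_sack(50, items)
# Membership compares by value, so this assumes no two items are identical.
one_hot = [int(item in chosen) for item in items]
print(sack_value(chosen), one_hot)  # 220 [0, 1, 1]
```

Each branch returns its own sack, so the comparison works on the complete set of items behind each value rather than on a shared list.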
Your solution enhanced with debugging prints for the curious.
class Solution:
    def __init__(self):
        self.array = []
        self.other_array = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    def knapSack(self,W, wt, val, n,j=0):
        index = n-1
        deep = f'''{' '*j*3}'''
        print(f'{deep}level {j}')
        print(f'{deep}{W} available: considering {wt[index]},{val[index]}, {n})')
        # minor change here but it has no effect on the outcome
        #if n == 0 or W == 0 :
        if n == 0:
            print(f'{deep}Base case found')
            return 0
        print(f'''{deep}{wt[index]} > {W} --> {wt[index] > W}''')
        if (wt[index] > W):
            print(f'{deep}too heavy')
            self.array.append(0)
            self.other_array[index] = 0
            choice = self.knapSack(W, wt, val, index,j+1)
        else:
            print(f'{deep}Going down the option A hole')
            option_A = val[index] + self.knapSack( W-wt[index], wt, val, index,j+1)
            print(f'{deep}Going down the option B hole')
            option_B = self.knapSack(W, wt, val, index,j+1)
            print(f'{deep}option A:{option_A} option B:{option_B}')
            if option_A > option_B:
                print(f'{deep}option A wins')
                self.array.append(1)
                self.other_array[index] = 1
                choice = option_A
            else:
                print(f'{deep}option B wins')
                self.array.append(0)
                self.other_array[index] = 0
                choice = option_B
        print(f'{deep}level {j} Returning value={choice}')
        print(f'{deep}---------------------------------------------')
        return choice

Pythonic way to assign range of number to bucket

I'm developing an A/B test framework using Django. I want to assign a variant number based on the bucket_id from the request's cookies.
bucket_id is set by the front end as an integer in the range 0-99.
So far, I have created a function named get_bucket_range:
def get_bucket_range(data):
    range_bucket = []
    first_val = 0
    next_val = 0
    for i, v in enumerate(data.split(",")):
        v = int(v)
        if i == 0:
            first_val = v
            range_bucket.append([0, first_val])
        elif i == 1:
            range_bucket.append([first_val, first_val + v])
            next_val = first_val + v
        else:
            range_bucket.append([next_val, next_val + v])
            next_val = next_val + v
    return range_bucket
The input for get_bucket_range is a comma-delimited string of weights, one per variant; e.g. data = "25,25,50" means we have 3 variants, with the first variant's weight being 25, etc.
I then created a function to assign the variant, named assign_variant:
def assign_variant(range_bucket, num):
    for i in range(len(range_bucket)):
        if num in range(range_bucket[i][0], range_bucket[i][1]):
            return i
This function should have 2 parameters, range_bucket -> from get_bucket_range function, and num -> bucket_id from cookies.
With this function I can return the variant id that a given bucket_id belongs to.
For example, we have 25 as bucket_id, with data = "25,25,50". This means our bucket_id should belong to variant id 1. Or in the case that we have 25 as bucket_id, with data = "10,10,10,70", our bucket_id should belong to variant id 2.
However, it feels like neither of my functions is pythonic or optimised. Does anyone here have any suggestions as to how I could improve my code?
Your functions could look like this for example:
def get_bucket_range(data):
    last = 0
    range_bucket = []
    for v in map(int, data.split(',')):
        range_bucket.append([last, last+v])
        last += v
    return range_bucket
def assign_variant(range_bucket, num):
    for i, (low, high) in enumerate(range_bucket):
        if low <= num < high:
            return i
You can greatly reduce the lengths of your functions with the itertools.accumulate and bisect.bisect functions. The first function accumulates all the weights into sums (10,10,10,70 becomes 10,20,30,100), and the second function gives you the index of where that element would belong, which in your case is equivalent to the index of the group it belongs to.
from itertools import accumulate
from bisect import bisect
def get_bucket_range(data):
    return list(accumulate(map(int, data.split(','))))
def assign_variant(range_bucket, num):
    return bisect(range_bucket, num)
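A quick check of the boundary behaviour may help here. The sketch below restates the two functions so that it runs on its own, and assumes, as in the question, that bucket_id is always in 0-99 and that the weights sum to 100; a bucket_id equal to the total would fall past the last variant.

```python
from bisect import bisect
from itertools import accumulate

def get_bucket_range(data):
    # "25,25,50" -> [25, 50, 100]: the exclusive upper bound of each variant.
    return list(accumulate(map(int, data.split(','))))

def assign_variant(range_bucket, num):
    # bisect (an alias of bisect_right) counts the bounds that are <= num,
    # which is the index of the variant whose range contains num.
    return bisect(range_bucket, num)

bounds = get_bucket_range("25,25,50")
print(bounds)                      # [25, 50, 100]
print(assign_variant(bounds, 24))  # 0
print(assign_variant(bounds, 25))  # 1
print(assign_variant(bounds, 99))  # 2
```

The lookup is a binary search, so it stays cheap even with many variants.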

Knapsack recursive function

I have a list named self.items where the elements are:
items = [dict(id=0, w=4, v=12),
dict(id=1, w=6, v=10),
dict(id=2, w=5, v=8),
dict(id=3, w=7, v=11),
dict(id=4, w=3, v=14),
dict(id=5, w=1, v=7),
dict(id=6, w=6, v=9)]
With this I had to build a list of lists, where the elements are all the possible combinations, including the empty case, so finally my list of lists looks more or less like this:
[[],[{id:0,w:4,v:12}],....,[{id:0,w:4,v:12}, {id:1,w:6,v:10}]....]
Now I have to find a recursive function that searches for the combination of elements that has the maximum value without exceeding the maximum weight permitted.
def recursive(self, n, max_weight):
    """ Recursive Knapsack
    :param n: Number of elements
    :param max_weight: Maximum weight allowed
    :return: max_value
    """
    self.iterations += 1
    result = 0
    if max_weight > self.max_weight: #they gave me self.max_weight as a variable of __init__ method which shows me what is the maximum weight permitted
        self.recursive(self, self.items+self.iterations, max_weight)
    if max_weight < self.max_weight:
        self.recursive(self, self.items+self.iterations, max_weight)
    else:
        result = self.items['v']+result
    return result
I think that my error is in this line:
result = self.items['v']+result
But I cannot find it.
I have just found the solution to this recursive problem:
(I'm from Spain so the variable "cantidad" also means "quantity")
def recursive(self, n, max_weight):
    """ Recursive Knapsack
    :param n: Number of elements
    :param max_weight: Maximum weight allowed
    :return: max_value
    """
    self.iterations += 1
    result = 0
    cantidad = 0
    quantity = 0
    if max_weight == 0 or n == 1:
        if max_weight >= self.items[n-1]['w'] :
            cantidad= self.items[n-1]['v']
        return max(cantidad,quantity)
    else:
        if max_weight >= self.items[n-1]['w']:
            cantidad = self.items[n-1]['v']+self.recursive(n-1,max_weight-self.items[n-1]['w'])
        quantity = self.recursive(n-1,max_weight)
        result = max(cantidad, quantity)
    return result
I put this code into a program provided by the university I am studying at and it returns the correct result:
Method: recursive
Iterations:107
Max value:44 expected max_value:44

single iteration sharing the iterator

I have a lot of data, usually in a file. I want to compute some quantities, so I have functions like this:
def mean(iterator):
    n = 0
    sum = 0.
    for i in iterator:
        sum += i
        n += 1
    return sum / float(n)
I have also many other similar functions (var, size, ...)
Now I have an iterator iterating through the data: iter_data. I can compute all the quantities I want: m = mean(iter_data); v = var(iter_data) and so on, but the problem is that I am iterating many times and this is expensive in my case. Actually, the I/O is the most expensive part.
So the question is: can I compute my quantities m, v, ... iterating only one time over iter_data keeping separate the functions mean, var, ... so that it is easy to add new ones?
What I need is something similar to boost::accumulators
For example use objects and callbacks like:
class Counter():
    def __init__(self):
        self.n = 0
    def __call__(self, i):
        self.n += 1
class Summer():
    def __init__(self):
        self.sum = 0
    def __call__(self, i):
        self.sum += i
def process(iterator, callbacks):
    for i in iterator:
        for f in callbacks: f(i)
counter = Counter()
summer = Summer()
callbacks = [counter, summer]
iterator = xrange(10) # testdata
process(iterator, callbacks)
# process results from callbacks
n = counter.n
sum = summer.sum
This is easily extendible and iterates the data only once.
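As an illustration of adding a new quantity in this style, here is a sketch of a variance callback. It uses Welford's online algorithm, which is a technique chosen for this example rather than something from the answer, and it is written for Python 3 (range instead of xrange); the class name Variance is hypothetical.

```python
class Variance:
    """Running mean and population variance via Welford's online algorithm."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def __call__(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else float('nan')

def process(iterator, callbacks):
    # Same driver as in the answer: one pass, every callback sees every item.
    for i in iterator:
        for f in callbacks:
            f(i)

var = Variance()
process(range(10), [var])
print(var.mean, var.variance)  # 4.5 8.25
```

The callback keeps only three numbers of state, so it fits the single-pass constraint without storing the data.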
You can use itertools.tee and generator magic (I say magic because it's not exactly nice and readable):
import itertools
def mean(iterator):
    n = 0
    sum = 0.
    for i in iterator:
        sum += i
        n += 1
        yield
    yield sum / float(n)
def multi_iterate(funcs, iter_data):
    iterators = itertools.tee(iter_data, len(funcs))
    result_iterators = [func(values) for func, values in zip(funcs, iterators)]
    for results in itertools.izip(*result_iterators):
        pass
    return results
mean_result, var_result = multi_iterate([mean, var], iter([10, 20, 30]))
print(mean_result) # 20.0
By the way, you can write mean in a simpler way:
def mean(iterator):
    total = 0.
    for n, item in enumerate(iterator, 1):
        total += item
        yield
    yield total / n
You shouldn't name variables sum because that shadows the built-in function with the same name.
Without classes, you could adapt the following:
def my_mean():
    total = 0.
    length = 0
    while True:
        val = (yield)
        if val is not None:
            total += val
            length += 1
        else:
            yield total / length
def my_len():
    length = 0
    while True:
        val = (yield)
        if val is not None:
            length += 1
        else:
            yield length
def my_sum():
    total = 0.
    while True:
        val = (yield)
        if val is not None:
            total += val
        else:
            yield total
def process(iterable, **funcs):
    fns = {name:func() for name, func in funcs.iteritems()}
    for fn in fns.itervalues():
        fn.send(None)
    for item in iterable:
        for fn in fns.itervalues():
            fn.send(item)
    return {name:next(func) for name, func in fns.iteritems()}
data = [1, 2, 3]
print process(data, items=my_len, some_other_value=my_mean, Total=my_sum)
# {'items': 3, 'some_other_value': 2.0, 'Total': 6.0}
What you want is a main Calc class that iterates over the data once, applying the different calculations (mean, var, etc.), and then returns their values through an interface. You could make it more generic by letting calculations register themselves with this class before the main pass and then making their results available through new accessors in the interface.
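A minimal sketch of that idea might look like the following. Everything here is hypothetical: the answer names only a Calc class, so the register and run methods and the update/result protocol of the calculation objects are assumptions made for illustration.

```python
class Mean:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count

class Maximum:
    def __init__(self):
        self.best = None

    def update(self, value):
        if self.best is None or value > self.best:
            self.best = value

    def result(self):
        return self.best

class Calc:
    def __init__(self):
        self._calculations = {}

    def register(self, name, calculation):
        # Any object with update(value) and result() can be registered.
        self._calculations[name] = calculation

    def run(self, iterable):
        # Single pass: every registered calculation sees each value once.
        for value in iterable:
            for calculation in self._calculations.values():
                calculation.update(value)
        return {name: c.result() for name, c in self._calculations.items()}

calc = Calc()
calc.register('mean', Mean())
calc.register('max', Maximum())
results = calc.run(iter([10, 20, 30]))
print(results)  # {'mean': 20.0, 'max': 30}
```

Adding a new quantity then means writing one small class and one register call, without touching the loop.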

Calculating items included in branch and bound knapsack

Using a branch and bound algorithm I have evaluated the optimal profit from a given set of items, but now I wish to find out which items are included in this optimal solution. I'm evaluating the profit value of the optimal knapsack as follows (adapted from here):
import Queue
class Node:
    def __init__(self, level, profit, weight):
        self.level = level # The level within the tree (depth)
        self.profit = profit # The total profit
        self.weight = weight # The total weight
def solveKnapsack(weights, profits, knapsackSize):
    numItems = len(weights)
    queue = Queue.Queue()
    root = Node(-1, 0, 0)
    queue.put(root)
    maxProfit = 0
    bound = 0
    while not queue.empty():
        v = queue.get() # Get the next item on the queue
        uLevel = v.level + 1
        u = Node(uLevel, v.profit + profits[uLevel], v.weight + weights[uLevel])
        bound = getBound(u, numItems, knapsackSize, weights, profits)
        if u.weight <= knapsackSize and u.profit > maxProfit:
            maxProfit = u.profit
        if bound > maxProfit:
            queue.put(u)
        u = Node(uLevel, v.profit, v.weight)
        bound = getBound(u, numItems, knapsackSize, weights, profits)
        if (bound > maxProfit):
            queue.put(u)
    return maxProfit
# This is essentially the brute force solution to the fractional knapsack
def getBound(u, numItems, knapsackSize, weight, profit):
    if u.weight >= knapsackSize: return 0
    else:
        upperBound = u.profit
        totalWeight = u.weight
        j = u.level + 1
        while j < numItems and totalWeight + weight[j] <= knapsackSize:
            upperBound += profit[j]
            totalWeight += weight[j]
            j += 1
        if j < numItems:
            upperBound += (knapsackSize - totalWeight) * profit[j]/weight[j]
        return upperBound
So, how can I get the items that form the optimal solution, rather than just the profit?
I got this working using your code as the starting point. I defined my Node class as:
class Node:
    def __init__(self, level, profit, weight, bound, contains):
        self.level = level # current level of our node
        self.profit = profit
        self.weight = weight
        self.bound = bound # max (optimistic) value our node can take
        self.contains = contains # list of items our node contains
I then started my knapsack solver similarly, but initialized root = Node(0, 0, 0, 0.0, []). The value root.bound could be a float, which is why I initialized it to 0.0, while the other values (at least in my problem) are all integers. The node contains nothing so far, so I started it off with an empty list. I followed a similar outline to your code, except that I stored the bound in each node (not sure this was necessary), and updated the contains list using:
u.contains = v.contains[:] # copies the items in the list, not the list location
# Initialize u as Node(uLevel, uProfit, uWeight, 0.0, uContains)
u.contains.append(uLevel) # add the current item index to the list
Note that I only updated the contains list in the "taking the item" node. This is the first initialization in your main loop, preceding the first if bound > maxProfit: statement. I updated the contains list in the if: statement right before this, when you update the value of maxProfit:
if u.weight <= knapsackSize and u.profit > maxProfit:
    maxProfit = u.profit
    bestList = u.contains
This stores the indices of the items you are taking to bestList. I also added the condition if v.bound > maxProfit and v.level < items-1 to the main loop right after v = queue.get() so that I do not keep going after I reach the last item, and I do not loop through branches that are not worth exploring.
Also, if you want to get a binary list output showing which items are selected by index, you could use:
taken = [0]*numItems
for item in bestList:
    taken[item] = 1
print str(taken)
I had some other differences in my code, but this should enable you to get your chosen item list out.
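Putting those pieces together, a self-contained sketch could look like this. It is a reconstruction under stated assumptions, not the answerer's actual code: it is written for Python 3, uses collections.deque in place of the Queue module, does not store the bound on each node, uses snake_case names invented for this example, and assumes the items are already sorted by profit/weight ratio in descending order so that the fractional bound is valid.

```python
from collections import deque

class Node:
    def __init__(self, level, profit, weight, contains):
        self.level = level        # index of the last item decided (-1 = none)
        self.profit = profit
        self.weight = weight
        self.contains = contains  # indices of the items taken so far

def get_bound(node, weights, profits, capacity):
    # Optimistic profit: fill greedily, then take a fraction of the next item.
    if node.weight >= capacity:
        return 0
    bound, total_weight = node.profit, node.weight
    j = node.level + 1
    while j < len(weights) and total_weight + weights[j] <= capacity:
        bound += profits[j]
        total_weight += weights[j]
        j += 1
    if j < len(weights):
        bound += (capacity - total_weight) * profits[j] / weights[j]
    return bound

def solve_knapsack(weights, profits, capacity):
    queue = deque([Node(-1, 0, 0, [])])
    max_profit, best_list = 0, []
    while queue:
        v = queue.popleft()
        if v.level == len(weights) - 1:
            continue  # every item has already been decided on this branch
        level = v.level + 1
        # Branch 1: take item `level`; build a new list so siblings don't share it.
        u = Node(level, v.profit + profits[level], v.weight + weights[level],
                 v.contains + [level])
        if u.weight <= capacity and u.profit > max_profit:
            max_profit, best_list = u.profit, u.contains
        if get_bound(u, weights, profits, capacity) > max_profit:
            queue.append(u)
        # Branch 2: skip item `level`; the contents are unchanged.
        u = Node(level, v.profit, v.weight, v.contains)
        if get_bound(u, weights, profits, capacity) > max_profit:
            queue.append(u)
    return max_profit, best_list

profit, chosen = solve_knapsack([10, 20, 30], [60, 100, 120], 50)
taken = [int(i in chosen) for i in range(3)]
print(profit, chosen, taken)  # 220 [1, 2] [0, 1, 1]
```

The contains list is never mutated after a node is created, which is why the skip branch can safely reuse its parent's list.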
I have been thinking about this for some time. Apparently, you have to add some methods inside your Node class that will assign the node_path and add the current level to it. You call your methods inside your loop and assign the path_list to your optimal_item_list when your node_weight is less than the capacity and its value is greater than the max_profit, i.e. where you assign maxProfit. You can find the Java implementation here
