Merge sorted lists in Python

I have a bunch of sorted lists of objects, and a comparison function
class Obj :
    def __init__(p) :
        self.points = p
    def cmp(a, b) :
        return a.points < b.points
a = [Obj(1), Obj(3), Obj(8), ...]
b = [Obj(1), Obj(2), Obj(3), ...]
c = [Obj(100), Obj(300), Obj(800), ...]
result = magic(a, b, c)
assert result == [Obj(1), Obj(1), Obj(2), Obj(3), Obj(3), Obj(8), ...]
what does magic look like? My current implementation is
def magic(*args) :
    r = []
    for a in args : r += a
    return sorted(r, cmp)
but that is quite inefficient. Better answers?

The Python standard library offers a function for this: heapq.merge.
As the documentation says, it is very similar to using itertools (but with more limitations); if you cannot live with those limitations (or if you are not on Python 2.6) you can do something like this:
sorted(itertools.chain(*args), cmp)
However, I think this has the same complexity as your own solution, although working with iterators should still give a decent speed increase.
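If you are on Python 3.5 or newer, heapq.merge also accepts a key argument, so the Obj class does not even need comparison methods; a minimal sketch, assuming the a, b, c lists from the question:
import heapq

# key= is available for heapq.merge since Python 3.5
result = list(heapq.merge(a, b, c, key=lambda o: o.points))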

I like Roberto Liffredo's answer. I didn't know about heapq.merge(). Hmmmph.
Here's what the complete solution looks like using Roberto's lead:
class Obj(object):
    def __init__(self, p):
        self.points = p
    def __cmp__(self, b):
        return cmp(self.points, b.points)
    def __str__(self):
        return "%d" % self.points
a = [Obj(1), Obj(3), Obj(8)]
b = [Obj(1), Obj(2), Obj(3)]
c = [Obj(100), Obj(300), Obj(800)]
import heapq
merged = [item for item in heapq.merge(a, b, c)]
for item in merged:
    print item
Or:
for item in heapq.merge(a, b, c):
    print item

Use the bisect module. From the documentation: "This module provides support for maintaining a list in sorted order without having to sort the list after each insertion."
import bisect

def magic(*args):
    r = []
    for a in args:
        for i in a:
            bisect.insort(r, i)
    return r
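Note that each insort still shifts list elements, so this is O(n²) overall for n items. On Python 3.10+ bisect.insort also accepts a key function, so a sketch of the same idea without giving Obj any comparison methods (assuming the Obj class from the question) would be:
import bisect

def magic(*args):
    r = []
    for a in args:
        for i in a:
            # key= for bisect.insort requires Python 3.10+
            bisect.insort(r, i, key=lambda o: o.points)
    return r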

Instead of using a list, you can use a [heap](http://en.wikipedia.org/wiki/Heap_(data_structure)).
Insertion is O(log n), so merging a, b and c will be O(n log n).
In Python, you can use the heapq module.
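A minimal sketch of that idea, assuming the Obj lists from the question; pushing (points, counter) tuples keeps the heap from ever comparing Obj instances directly:
import heapq

def magic(*args):
    heap = []
    n = 0
    for lst in args:
        for obj in lst:
            # (points, n) decides the ordering; n breaks ties so Obj is never compared
            heapq.heappush(heap, (obj.points, n, obj))
            n += 1
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]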

I don't know whether it would be any quicker, but you could simplify it with:
def GetObjKey(a):
    return a.points

return sorted(a + b + c, key=GetObjKey)
You could also, of course, use cmp rather than key if you prefer.
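For example, the same call with operator.attrgetter instead of a hand-written key helper (an equivalent spelling, not a different algorithm):
from operator import attrgetter

result = sorted(a + b + c, key=attrgetter('points'))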

One line solution using sorted:
def magic(*args):
    return sorted(sum(args, []), key=lambda x: x.points)
IMO this solution is very readable.
Using the heapq module could be more efficient, but I have not tested it. You cannot pass a cmp/key function to heapq, so you have to implement Obj to be implicitly sortable.
import heapq

def magic(*args):
    h = []
    for a in args:
        for item in a:
            heapq.heappush(h, item)
    return [heapq.heappop(h) for _ in range(len(h))]

I asked a similar question and got some excellent answers:
Joining a set of ordered-integer yielding Python iterators
The best solutions from that question are variants of the merge algorithm, which you can read about here:
Wikipedia: Merge Algorithm

Below is an example of a function that runs in O(n) comparisons.
You could make this faster by making a and b iterators and incrementing them.
I have simply called the function twice to merge 3 lists:
def zip_sorted(a, b):
    '''
    zips two iterables, assuming they are already sorted
    '''
    i = 0
    j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    if i < len(a):
        result.extend(a[i:])
    else:
        result.extend(b[j:])
    return result

def genSortedList(num, seed):
    result = []
    for i in range(num):
        result.append(i * seed)
    return result

if __name__ == '__main__':
    a = genSortedList(10000, 2.0)
    b = genSortedList(6666, 3.0)
    c = genSortedList(5000, 4.0)
    d = zip_sorted(zip_sorted(a, b), c)
    print d
However, heapq.merge combines this approach with keeping the current head of every list on a heap, so it should perform much better when merging many lists.

Here you go: a fully functioning merge sort for lists (adapted from my sort here):
def merge(*args):
    import copy
    def merge_lists(left, right):
        result = []
        while left and right:
            which_list = (left if left[0] <= right[0] else right)
            result.append(which_list.pop(0))
        return result + left + right
    lists = list(args)
    while len(lists) > 1:
        left, right = copy.copy(lists.pop(0)), copy.copy(lists.pop(0))
        result = merge_lists(left, right)
        lists.append(result)
    return lists.pop(0)
Call it like this:
merged_list = merge(a, b, c)
for item in merged_list:
    print item
For good measure, I'll throw in a couple of changes to your Obj class:
class Obj(object):
    def __init__(self, p):
        self.points = p
    def __cmp__(self, b):
        return cmp(self.points, b.points)
    def __str__(self):
        return "%d" % self.points
Derive from object
Pass self to __init__()
Make __cmp__ a member function
Add a __str__() member function to present Obj as a string
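If you are on Python 3, __cmp__ is ignored; a sketch of the same class using rich comparisons instead (functools.total_ordering fills in the remaining operators from __eq__ and __lt__):
from functools import total_ordering

@total_ordering
class Obj(object):
    def __init__(self, p):
        self.points = p
    def __eq__(self, other):
        return self.points == other.points
    def __lt__(self, other):
        return self.points < other.points
    def __str__(self):
        return "%d" % self.points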

Related

A memoized function that takes a tuple of strings to return an integer?

Suppose I have arrays of tuples like so:
a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
I am trying to turn these arrays into numerical vectors with each dimension representing a feature.
So the expected output would be something like:
amod = [1, 0, 1] # or [1, 1, 1]
bmod = [1, 1, 2] # or [1, 2, 2]
So the vector that gets created is dependent on what it has seen before (i.e. rectangle is still coded as 1, but the new value 'large' gets coded as the next step up, 2).
I think I could use some combination of yield and a memoize function to help me with this. This is what I've tried so far:
def memoize(f):
    memo = {}
    def helper(x):
        if x not in memo:
            memo[x] = f(x)
        return memo[x]
    return helper
@memoize
def verbal_to_value(tup):
    u = 1
    if tup[0] == 'shape':
        yield u
    u += 1
    if tup[0] == 'fill':
        yield u
    u += 1
    if tup[0] == 'size':
        yield u
    u += 1
But I keep getting this error:
TypeError: 'NoneType' object is not callable
Is there a way I can create this function that has a memory of what it has seen? Bonus points if it could add keys dynamically so I don't have to hardcode things like 'shape' or 'fill'.
First off: this is my preferred implementation of the memoize decorator, mostly because of speed ...
def memoize(f):
    class memodict(dict):
        __slots__ = ()
        def __missing__(self, key):
            self[key] = ret = f(key)
            return ret
    return memodict().__getitem__
Except for a few edge cases it has the same effect as yours:
def memoize(f):
    memo = {}
    def helper(x):
        if x not in memo:
            memo[x] = f(x)
        #else:
        #    pass
        return memo[x]
    return helper
but it is somewhat faster because the if x not in memo: check happens in native code instead of in Python. To understand it, you merely need to know that under normal circumstances, to interpret adict[key] Python calls adict.__getitem__(key); if adict doesn't contain key, __getitem__() calls adict.__missing__(key), so we can leverage the Python magic-method protocol for our gain...
# This is the first idea I had for how I would implement your
# verbal_to_value() using memoization:
from collections import defaultdict

work = defaultdict(set)

@memoize
def verbal_to_value(kv):
    k, v = kv
    aset = work[k]   # work creates a new set, if not already created.
    aset.add(v)      # add value if not already added
    return len(aset)
including the memoize decorator, that's 15 lines of code...
# test suite:
def vectorize(alist):
    return [verbal_to_value(kv) for kv in alist]

a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
print(vectorize(a))  # shows [1,1,1]
print(vectorize(b))  # shows [1,2,2]
defaultdict is a powerful object that has almost the same logic as memoize: a standard dictionary in every way, except that when the lookup fails, it runs the callback function to create the missing value, in our case set().
Unfortunately this problem requires either access to the tuple that is being used as the key, or to the dictionary state itself, with the result that we cannot just write a simple function for .default_factory.
But we can write a new object based on the memoize/defaultdict pattern:
# This is how I would implement your verbal_to_value without
# memoization, though the worker class is so similar to @memoize
# that it's easy to see why memoize is a good pattern to work from:
class sloter(dict):
    __slots__ = ()
    def __missing__(self, key):
        self[key] = ret = len(self) + 1
        # this + 1 bothers me, why can't these vectors be 0 based? ;)
        return ret
from collections import defaultdict

work2 = defaultdict(sloter)

def verbal_to_value2(kv):
    k, v = kv
    return work2[k][v]
# ~10 lines of code?
# test suite 2:
def vectorize2(alist):
    return [verbal_to_value2(kv) for kv in alist]

print(vectorize2(a))  # shows [1,1,1]
print(vectorize2(b))  # shows [1,2,2]
You might have seen something like sloter before, because it's sometimes used for exactly this sort of situation: converting member names to numbers and back. Because of this, we have the advantage of being able to reverse things like this:
def unvectorize2(a_vector, pattern=('shape', 'fill', 'size')):
    reverser = [{v: k2 for k2, v in work2[k].items()} for k in pattern]
    for index, vect in enumerate(a_vector):
        yield pattern[index], reverser[index][vect]

print(list(unvectorize2(vectorize2(a))))
print(list(unvectorize2(vectorize2(b))))
But I saw those yields in your original post, and they got me thinking... what if there was a memoize/defaultdict-like object that could take a generator instead of a function and knew to just advance the generator rather than calling it? Then I realized that yes, generators come with a callable called __next__(), which means that we don't need a new defaultdict implementation, just a careful extraction of the correct member function...
def count(start=0):  # same as: from itertools import count
    while True:
        yield start
        start += 1

# so we could get the exact same behavior as above (except faster)
# by saying:
sloter3 = lambda: defaultdict(count(1).__next__)
# and then
work3 = defaultdict(sloter3)
# or just:
work3 = defaultdict(lambda: defaultdict(count(1).__next__))
# which, yes, is a bit of a mindwarp if you've never needed to do that
# before.
# The outer defaultdict interprets the first item. Every time a new
# first item is received, the lambda is called, which creates a new
# count() generator (starting from 1), and passes its .__next__ method
# to a new inner defaultdict.

def verbal_to_value3(kv):
    k, v = kv
    return work3[k][v]
# you *could* call that 8 lines of code, but we managed to use
# defaultdict twice, and didn't need to define it, so I wouldn't call
# it 'less complex' or anything.
# test suite 3:
def vectorize3(alist):
    return [verbal_to_value3(kv) for kv in alist]

print(vectorize3(a))  # shows [1,1,1]
print(vectorize3(b))  # shows [1,2,2]
# so yes, that can also work.
# And since the internal state in `work3` is stored in the exact same
# format, it can be accessed the same way as `work2` to reconstruct input
# from output.

def unvectorize3(a_vector, pattern=('shape', 'fill', 'size')):
    reverser = [{v: k2 for k2, v in work3[k].items()} for k in pattern]
    for index, vect in enumerate(a_vector):
        yield pattern[index], reverser[index][vect]

print(list(unvectorize3(vectorize3(a))))
print(list(unvectorize3(vectorize3(b))))
Final comments:
Each of these implementations suffers from storing state in a global variable, which I find anti-aesthetic, but depending on what you're planning to do with that vector later, that might be a feature, as I demonstrated.
Edit:
After another day of meditating on this, and the sorts of situations where I might need it, I think I'd encapsulate this feature like this:
from collections import defaultdict
from itertools import count

class slotter4:
    def __init__(self):
        # keep track of what order we expect to see keys
        self.pattern = defaultdict(count(1).__next__)
        # keep track of what values we've seen and what number we've assigned to mean them.
        self.work = defaultdict(lambda: defaultdict(count(1).__next__))
    def slot(self, kv, i=False):
        """used to be named verbal_to_value"""
        k, v = kv
        if i and i != self.pattern[k]:  # keep track of the order we saw initial keys
            raise ValueError("Input fields out of order")
            # in theory we could ignore this error, and just know
            # that we're going to default to the field order we saw
            # first. Or we could just not keep track, which might be
            # required if our code runs too slow, but then we cannot
            # make pattern optional in .unvectorize()
        return self.work[k][v]
    def vectorize(self, alist):
        return [self.slot(kv, i) for i, kv in enumerate(alist, 1)]
        # if we're not keeping track of the field pattern, we could do this instead
        # return [self.work[k][v] for k, v in alist]
    def unvectorize(self, a_vector, pattern=None):
        if pattern is None:
            pattern = [k for k, v in sorted(self.pattern.items(), key=lambda a: a[1])]
        reverser = [{v: k2 for k2, v in self.work[k].items()} for k in pattern]
        return [(pattern[index], reverser[index][vect])
                for index, vect in enumerate(a_vector)]

# test suite 4:
s = slotter4()
if __name__ == '__main__':
    Av = s.vectorize(a)
    Bv = s.vectorize(b)
    print(Av)  # shows [1,1,1]
    print(Bv)  # shows [1,2,2]
    print(s.unvectorize(Av))  # shows a
    print(s.unvectorize(Bv))  # shows b
else:
    # run the test silently, and only complain if something has broken
    assert s.unvectorize(s.vectorize(a)) == a
    assert s.unvectorize(s.vectorize(b)) == b
Good luck out there!
Not the best approach, but may help you to figure out a better solution
class Shape:
    counter = {}
    def to_tuple(self, tuples):
        self.tuples = tuples
        self._add()
        l = []
        for i, v in self.tuples:
            l.append(self.counter[i][v])
        return l
    def _add(self):
        for i, v in self.tuples:
            if i in self.counter.keys():
                if v not in self.counter[i]:
                    self.counter[i][v] = max(self.counter[i].values()) + 1
            else:
                self.counter[i] = {v: 0}
a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
s = Shape()
s.to_tuple(a)
s.to_tuple(b)
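A quick run of that sketch; note that with this counter the first value seen for each key is encoded as 0, not 1:
print(s.to_tuple(a))  # [0, 0, 0]
print(s.to_tuple(b))  # [0, 1, 1]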

one-liner reduce in Python3

In Python3, I am looking for a way to compute in one line a lambda function called on elements two by two. Let’s say I want to compute the LCM of a list of integers, this can be done in one line in Python2:
print reduce(lambda a,b: a * b // gcd(a, b), mylist)
Is it possible to do the same in one line in Python 3 (implied: without functools.reduce)?
In Python 3 I know that reduce is no longer a built-in. I don't feel I need filter and map anymore because they can be written in Python 3 in a shorter and clearer fashion, but I thought I could find a nice replacement for reduce as well, except I haven't found any. I have seen many articles that suggest using functools.reduce or to "write out the accumulation loop explicitly", but I'd like to do it without importing functools and in one line.
If it makes it any easier, I should mention I use functions that are both associative and commutative. For instance with a function f on the list [1,2,3,4], the result will be good if it either computes:
f(1,f(2,f(3,4)))
f(f(1,2),f(3,4))
f(f(3,f(1,4)),2)
or any other order
So I actually did come up with something. I do not guarantee the performance though, but it is a one-liner using exclusively lambda functions - nothing from functools or itertools, not even a single loop.
my_reduce = lambda l, f: (lambda u, a: u(u, a))((lambda v, m: None if len(m) == 0 else (m[0] if len(m) == 1 else v(v, [f(m[0], m[1])] + m[2:]))), l)
This is somewhat unreadable, so here it is expanded:
my_reduce = lambda l, f: (
    lambda u, a: u(u, a))(
        (lambda v, m: None if len(m) == 0
            else (m[0] if len(m) == 1
                else v(v, [f(m[0], m[1])] + m[2:])
            )
        ),
        l
    )
Test:
>>> f = lambda a,b: a+b
>>> my_reduce([1, 2, 3, 4], f)
10
>>> my_reduce(['a', 'b', 'c', 'd'], f)
'abcd'
Please check this other post for a deeper explanation of how this works.
The principle is to emulate a recursive function, by using a lambda function whose first parameter is a function, which will be the lambda itself.
This recursive function is embedded inside of a function that effectively triggers the recursive calling: lambda u, a: u(u, a).
Finally, everything is wrapped in a function whose parameters are a list and a binary function.
Using my_reduce with your code:
my_reduce(mylist, lambda a,b: a * b // gcd(a, b))
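A quick sanity check of that LCM use case, assuming Python 3 where gcd lives in the math module:
from math import gcd

mylist = [4, 6, 10]
print(my_reduce(mylist, lambda a, b: a * b // gcd(a, b)))  # prints 60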
Assuming you have a sequence that is at least one item long, you can simply define reduce recursively like this:
def reduce(func, seq): return seq[0] if len(seq) == 1 else func(reduce(func, seq[:-1]), seq[-1])
The long version would be slightly more readable:
def reduce(func, seq):
    if len(seq) == 1:
        return seq[0]
    else:
        return func(reduce(func, seq[:-1]), seq[-1])
However, that's recursive and Python isn't very good at recursive calls (meaning: it's slow, and the recursion limit prevents processing sequences longer than a few hundred items). A much faster implementation would be:
def reduce(func, seq):
    tmp = seq[0]
    for item in seq[1:]:
        tmp = func(tmp, item)
    return tmp
But because of the loop it can't be put on one line. It could be worked around using side effects:
def reduce(func, seq): d = {}; [d.__setitem__('last', func(d['last'], i)) if 'last' in d else d.__setitem__('last', i) for i in seq]; return d['last']
or:
def reduce(func, seq): d = {'last': seq[0]}; [d.__setitem__('last', func(d['last'], i)) for i in seq[1:]]; return d['last']
Which is the equivalent of:
def reduce(func, seq):
    d = {}
    for item in seq:
        if 'last' in d:
            d['last'] = func(d['last'], item)
        else:
            d['last'] = item
    return d['last']  # or "d.get('last', 0)"
That should be faster but it's not exactly pythonic because the list-comprehension in the one-line implementation is just used because of the side-effects.

Short-circuit evaluation like Python's "and" while storing results of checks

I have multiple expensive functions that return results. I want to return a tuple of the results of all the checks if all the checks succeed. However, if one check fails I don't want to call the later checks, like the short-circuiting behavior of and. I could nest if statements, but that will get out of hand if there are a lot of checks. How can I get the short-circuit behavior of and while also storing the results for later use?
def check_a():
    # do something and return the result,
    # for simplicity, just make it "A"
    return "A"

def check_b():
    # do something and return the result,
    # for simplicity, just make it "B"
    return "B"
...
This doesn't short-circuit:
a = check_a()
b = check_b()
c = check_c()

if a and b and c:
    return a, b, c
This is messy if there are many checks:
if a:
    b = check_b()
    if b:
        c = check_c()
        if c:
            return a, b, c
Is there a shorter way to do this?
Just use a plain old for loop:
results = {}
for function in [check_a, check_b, ...]:
    results[function.__name__] = result = function()
    if not result:
        break
The results will be a mapping of the function name to their return values, and you can do what you want with the values after the loop breaks.
Use an else clause on the for loop if you want special handling for the case where all of the functions have returned truthy results.
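A sketch of that for/else variant, assuming the same check_* functions as in the question:
results = {}
for function in [check_a, check_b, check_c]:
    results[function.__name__] = result = function()
    if not result:
        break
else:
    # only reached if the loop never hit `break`, i.e. every check was truthy
    print(tuple(results.values()))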
Write a function that takes an iterable of functions to run. Call each one and append the result to a list, or return None if the result is False. Either the function will stop calling further checks after one fails, or it will return the results of all the checks.
def all_or_none(checks, *args, **kwargs):
    out = []
    for check in checks:
        rv = check(*args, **kwargs)
        if not rv:
            return None
        out.append(rv)
    return out

rv = all_or_none((check_a, check_b, check_c))
# rv is a list if all checks passed, otherwise None
if rv is not None:
    return rv

def check_a(obj):
    ...
def check_b(obj):
    ...

# pass arguments to each check, useful for writing reusable checks
rv = all_or_none((check_a, check_b), obj=my_object)
In other languages that have assignment as an expression you would be able to use
if (a = check_a()) and (b = check_b()) and (c = check_c()):
but Python is no such language (at least until the walrus operator := arrived in Python 3.8). Still, we can circumvent the restriction and emulate that behaviour:
result = []
def put(value):
    result.append(value)
    return value

if put(check_a()) and put(check_b()) and put(check_c()):
    # if you need them as variables, you could do
    # (a, b, c) = result
    # but you just want
    return tuple(result)
This might loosen the connection between the variables and function calls a bit too much, so if you want to do lots of separate things with the variables, instead of using the result elements in the order they were put in the list, I would rather avoid this approach. Still, it might be quicker and shorter than some loop.
You could use either a list or an OrderedDict, using a for loop would serve the purpose of emulating short circuiting.
from collections import OrderedDict

def check_a():
    return "A"

def check_b():
    return "B"

def check_c():
    return "C"

def check_d():
    return False

def method1(*args):
    results = []
    for i, f in enumerate(args):
        value = f()
        results.append(value)
        if not value:
            return None
    return results

def method2(*args):
    results = OrderedDict()
    for f in args:
        results[f.__name__] = result = f()
        if not result:
            return None
    return results

# Case 1, it should return check_a, check_b, check_c
for m in [method1, method2]:
    print(m(check_a, check_b, check_c))

# Case 2, it should return None
for m in [method1, method2]:
    print(m(check_a, check_b, check_d, check_c))
There are lots of ways to do this! Here's another.
You can use a generator expression to defer the execution of the functions. Then you can use itertools.takewhile to implement the short-circuiting logic by consuming items from the generator until one of them is false.
from itertools import takewhile

functions = (check_a, check_b, check_c)
generator = (f() for f in functions)
results = tuple(takewhile(bool, generator))
if len(results) == len(functions):
    return results
Another way to tackle this is using a generator, since generators use lazy evaluation. First put all checks into a generator:
def checks():
    yield check_a()
    yield check_b()
    yield check_c()
Now you could force evaluation of everything by converting it to a list:
list(checks())
But the standard all function does proper short cut evaluation on the iterator returned from checks(), and returns whether all elements are truthy:
all(checks())
Last, if you want the results of succeeding checks up to the failure you can use itertools.takewhile to take the first run of truthy values only. Since the result of takewhile is lazy itself you'll need to convert it to a list to see the result in a REPL:
from itertools import takewhile
takewhile(lambda x: x, checks())
list(takewhile(lambda x: x, checks()))
main logic:
results = list(takewhile(lambda x: x, map(lambda x: x(), function_list)))
if len(results) == len(function_list):
    return results
You can learn a lot about collection transformations if you look at all the methods of an API like http://www.scala-lang.org/api/2.11.7/#scala.collection.immutable.List and search for or implement the Python equivalents.
logic with setup and alternatives:
import sys
from itertools import takewhile

if sys.version_info.major == 2:
    from itertools import imap
    map = imap

def test(bool):
    def inner():
        print(bool)
        return bool
    return inner

def function_for_return():
    function_list = [test(True), test(True), test(False), test(True)]
    print("results:")
    results = list(takewhile(lambda x: x, map(lambda x: x(), function_list)))
    if len(results) == len(function_list):
        return results
    print(results)

    # personally i prefer another syntax:
    class Iterator(object):
        def __init__(self, iterable):
            self.iterator = iter(iterable)
        def __next__(self):
            return next(self.iterator)
        def __iter__(self):
            return self
        def map(self, f):
            return Iterator(map(f, self.iterator))
        def takewhile(self, f):
            return Iterator(takewhile(f, self.iterator))

    print("results2:")
    results2 = list(
        Iterator(function_list)
        .map(lambda x: x())
        .takewhile(lambda x: x)
    )
    print(results2)

    print("with additional information")
    function_list2 = [(test(True), "a"), (test(True), "b"), (test(False), "c"), (test(True), "d")]
    results3 = list(
        Iterator(function_list2)
        .map(lambda x: (x[0](), x[1]))
        .takewhile(lambda x: x[0])
    )
    print(results3)

function_for_return()
If you don't need to take an arbitrary number of expressions at runtime (possibly wrapped in lambdas), you can expand your code directly into this pattern:
def f():
    try:
        return (<a> or jump(),
                <b> or jump(),
                <c> or jump())
    except NonLocalExit:
        return None
Where those definitions apply:
class NonLocalExit(Exception):
    pass

def jump():
    raise NonLocalExit()
Flexible short circuiting is really best done with Exceptions. For a very simple prototype you could even just assert each check result:
try:
    a = check_a()
    assert a
    b = check_b()
    assert b
    c = check_c()
    assert c
    return a, b, c
except AssertionError as e:
    return None
You should probably raise a custom Exception instead. You could change your check_X functions to raise Exceptions themselves, in an arbitrarily nested way. Or you could wrap or decorate your check_X functions to raise errors on falsy return values.
In short, exception handling is very flexible and exactly what you are looking for, don't be afraid to use it. If you learned somewhere that exception handling is not to be used for your own flow control, this does not apply to python. Liberal use of exception handling is considered pythonic, as in EAFP.
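As a sketch of the decorator idea (the names here are illustrative, not a real library): wrap each check so that a falsy result raises, then collect the results inside a single try/except:
class CheckFailed(Exception):
    pass

def raising(check):
    # turn a falsy return value into an exception
    def wrapper(*args, **kwargs):
        result = check(*args, **kwargs)
        if not result:
            raise CheckFailed(check.__name__)
        return result
    return wrapper

try:
    results = (raising(check_a)(), raising(check_b)(), raising(check_c)())
except CheckFailed:
    results = None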
You mentioned 'short-circuiting' in your question, which can be done with the 'or' operator. The top answer basically does the same thing, but in case someone wants to know more about this behaviour you could do this:
class Container(object):
    def __init__(self):
        self.values = []
    def check_and_cache(self, value, checking_function):
        value_true = checking_function(value)
        if value_true:
            self.values.append(value)
            return True

c = Container()
if not c.check_and_cache(a, check_a) or not c.check_and_cache(b, check_b) or not c.check_and_cache(c, check_c):
    print 'done'
return tuple(c.values)
The 'not .. or' setup of the if statements will result in a 'True' if the check fails, so the overall if statement passes without evaluating the remaining values.
Since I cannot comment on wim's answer as a guest, I'll just add an extra answer.
Since you want a tuple, you should collect the results in a list and then cast to tuple.
def short_eval(*checks):
    result = []
    for check in checks:
        checked = check()
        if not checked:
            break
        result.append(checked)
    return tuple(result)

# Example
wished = short_eval(check_a, check_b, check_c)
You can try the @lazy_function decorator from the lazy_python package. Example of usage:
from lazy import lazy_function, strict

@lazy_function
def check(a, b):
    strict(print('Call: {} {}'.format(a, b)))
    if a + b > a * b:
        return '{}, {}'.format(a, b)

a = check(-1, -2)
b = check(1, 2)
c = check(-1, 2)

print('First condition')
if c and a and b: print('Ok: {}'.format((a, b)))
print('Second condition')
if c and b: print('Ok: {}'.format((c, b)))

# Output:
# First condition
# Call: -1 2
# Call: -1 -2
# Second condition
# Call: 1 2
# Ok: ('-1, 2', '1, 2')
This is similar to Bergi's answer but I think that answer misses the point of wanting separate functions (check_a, check_b, check_c):
list1 = []

def check_a():
    condition = True
    a = 1
    if (condition):
        list1.append(a)
        print ("checking a")
        return True
    else:
        return False

def check_b():
    condition = False
    b = 2
    if (condition):
        list1.append(b)
        print ("checking b")
        return True
    else:
        return False

def check_c():
    condition = True
    c = 3
    if (condition):
        list1.append(c)
        print ("checking c")
        return True
    else:
        return False

if check_a() and check_b() and check_c():
    # won't get here
    pass

tuple1 = tuple(list1)
print (tuple1)

# output is:
# checking a
# (1,)
Or, if you don't want to use the global list, pass a reference of a local list to each of the functions.
If the main objection is
This is messy if there are many checks:
if a:
    b = check_b()
    if b:
        c = check_c()
        if c:
            return a, b, c
A fairly nice pattern is to reverse the condition and return early:
if not a:
    return  # None, or some value, or however you want to handle this

b = check_b()
if not b:
    return

c = check_c()
if not c:
    return

# ok, they were all truthy
return a, b, c

Python Generator "chain" in a for loop

I'm trying to set up a "processing pipeline" for data that I'm reading in from a data source, and applying a sequence of operators (using generators) to each item as it is read.
Some sample code that demonstrates the same issue.
def reader():
    yield 1
    yield 2
    yield 3

def add_1(val):
    return val + 1

def add_5(val):
    return val + 5

def add_10(val):
    return val + 10

operators = [add_1, add_5, add_10]

def main():
    vals = reader()
    for op in operators:
        vals = (op(val) for val in vals)
    return vals
print(list(main()))
Desired : [17, 18, 19]
Actual: [31, 32, 33]
Python seems to not be saving the value of op each time through the for loop, so it instead applies the third function each time. Is there a way to "bind" the actual operator function to the generator expression each time through the for loop?
I could get around this trivially by changing the generator expression in the for loop to a list comprehension, but since the actual data is much larger, I don't want to be storing it all in memory at any one point.
You can force a variable to be bound by creating the generator in a new function. eg.
def map_operator(operator, iterable):
    # closure value of operator is now separate for each generator created
    return (operator(item) for item in iterable)

def main():
    vals = reader()
    for op in operators:
        vals = map_operator(op, vals)
    return vals
However, map_operator is pretty much identical to the map builtin (in python 3.x). So just use that instead.
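For illustration, the same main rewritten with the built-in map, which is lazy in Python 3 and binds op as an argument each time through the loop:
def main():
    vals = reader()
    for op in operators:
        vals = map(op, vals)  # op is bound now, not when the result is consumed
    return vals

print(list(main()))  # [17, 18, 19]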
You can define a little helper which composes the functions but in reverse order:
import functools

def compose(*fns):
    return functools.reduce(lambda f, g: lambda x: g(f(x)), fns)
I.e. you can use compose(f, g, h) to generate a lambda expression equivalent to lambda x: h(g(f(x))). This order is uncommon, but it ensures that your functions are applied left-to-right (which is probably what you expect).
Using this, your main becomes just:
def main():
    vals = reader()
    f = compose(add_1, add_5, add_10)
    return (f(v) for v in vals)
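With the reader and operators from the question this produces the desired output:
print(list(main()))  # [17, 18, 19]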
This may be what you want - create a composite function:
import functools

def compose(functions):
    return functools.reduce(lambda f, g: lambda x: g(f(x)), functions, lambda x: x)

def reader():
    yield 1
    yield 2
    yield 3

def add_1(val):
    return val + 1

def add_5(val):
    return val + 5

def add_10(val):
    return val + 10

operators = [add_1, add_5, add_10]

def main():
    vals = map(compose(operators), reader())
    return vals

print(list(main()))
The reason for this problem is that you are creating a deeply nested generator of generators and evaluate the whole thing after the loop, when op has been bound to the last element in the list -- similar to the quite common "lambda in a loop" problem.
In a sense, your code is roughly equivalent to this:
for op in operators:
    pass

print(list(op(val) for val in (op(val) for val in (op(val) for val in (x for x in [1, 2, 3])))))
One (not very pretty) way to fix this would be to zip the values with another generator, repeating the same operation:
def add(n):
    def add_n(val):
        return val + n
    return add_n

operators = [add(n) for n in [1, 5, 10]]

import itertools

def main():
    vals = (x for x in [1, 2, 3])
    for op in operators:
        vals = (op(val) for (val, op) in zip(vals, itertools.repeat(op)))
    return vals

print(list(main()))

How can I sum (make totals) on multiple object attributes with one loop pass?

I want to sum multiple attributes at a time in a single loop:
class Some(object):
    def __init__(self, acounter, bcounter):
        self.acounter = acounter
        self.bcounter = bcounter

someList = [Some(x, x) for x in range(10)]
Can I do something simpler and faster than this?
atotal = sum([x.acounter for x in someList])
btotal = sum([x.bcounter for x in someList])
First off - sum doesn't need a list - you can use a generator expression instead:
atotal = sum(x.acounter for x in someList)
You could write a helper function that walks the list once and looks up each attribute in turn per item, e.g.:
def multisum(iterable, *attributes, **kwargs):
    sums = dict.fromkeys(attributes, kwargs.get('start', 0))
    for it in iterable:
        for attr in attributes:
            sums[attr] += getattr(it, attr)
    return sums

counts = multisum(someList, 'acounter', 'bcounter')
# {'bcounter': 45, 'acounter': 45}
Another alternative (which may not be faster) is to overload the addition operator for your class:
class Some(object):
    def __init__(self, acounter, bcounter):
        self.acounter = acounter
        self.bcounter = bcounter
    def __add__(self, other):
        if isinstance(other, self.__class__):
            return Some(self.acounter + other.acounter, self.bcounter + other.bcounter)
        elif isinstance(other, int):
            return self
        else:
            raise TypeError("useful message")
    __radd__ = __add__

somelist = [Some(x, x) for x in range(10)]
combined = sum(somelist)
print combined.acounter
print combined.bcounter
This way sum returns a Some object.
I doubt that this is really faster, but you can do it like thus:
First define padd (for "pair add") via:
def padd(p1, p2):
    return (p1[0] + p2[0], p1[1] + p2[1])
For example, padd((1,4), (5,10)) = (6,14)
Then use reduce:
atotal, btotal = reduce(padd, ((x.acounter,x.bcounter) for x in someList))
in Python 3 you need to import reduce from functools but IIRC it can be used directly in Python 2.
On edit: For more than 2 attributes you can replace padd by vadd ("vector add") which can handle tuples of arbitrary dimensions:
def vadd(v1, v2):
    return tuple(x + y for x, y in zip(v1, v2))
For just 2 attributes it is probably more efficient to hard-wire in the dimension since there is less function-call overhead.
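For illustration, the same reduction run through the dimension-agnostic vadd (on Python 3, reduce needs the functools import):
from functools import reduce

atotal, btotal = reduce(vadd, ((x.acounter, x.bcounter) for x in someList))
print(atotal, btotal)  # 45 45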
Use this line to accumulate all of the attributes that you wish to sum.
>>> A = ((s.acounter,s.bcounter) for s in someList)
Then use this trick from https://stackoverflow.com/a/19343/47078 to make separate lists of each attribute by themselves.
>>> [sum(x) for x in zip(*A)]
[45, 45]
You can obviously combine the lines, but I thought breaking it apart would be easier to follow here.
And based on this answer, you can make it much more readable by defining an unzip(iterable) method.
def unzip(iterable):
    return zip(*iterable)

[sum(x) for x in unzip((s.acounter, s.bcounter) for s in someList)]
