Lazy evaluation of variables - python

I want to lexicographically compare two lists, but the values inside the list should be computed when needed. For instance, for these two lists
a = list([1, 3, 3])
b = list([1, 2, 2])
(a < b) == False
(b < a) == True
I'd like the values in the list to be functions and in the case of a and b, the values (i.e. the function) at index=2 would not be evaluated as the values at index=1 (a[1]==3, b[1]==2) are already sufficient to determine that b < a.
One option would be to manually compare the elements, and that's probably what I will do when I don't find a solution that allows me to use the list's comparator, but I found that the manual loop is a tad slower than the list's builtin comparator which is why I want to make use of it.
Update
Here's a way to accomplish what I am trying to do, but I was wondering if there are any built-in functions that would do this faster (and which makes use of this feature of lists).
def lex_comp(a, b):
for func_a, func_b in izip(a, b):
v_a = func_a()
v_b = func_b()
if v_a < v_b: return -1
if v_b > v_a: return +1
return 0
def foo1(): return 1
def foo2(): return 1
def bar1(): return 1
def bar2(): return 2
def func1(): return ...
def func2(): return ...
list_a = [foo1, bar1, func1, ...]
list_b = [foo2, bar2, func2, ...]
# now you can use the comparator for instance to sort a list of these lists
sort([list_a, list_b], cmp=lex_comp)

Try this (the extra parameters to the function are just for illustration purposes):
import itertools
def f(a, x):
print "lazy eval of {}".format(a)
return x
a = [lambda: f('a', 1), lambda: f('b', 3), lambda: f('c', 3)]
b = [lambda: f('d', 1), lambda: f('e', 2), lambda: f('f', 2)]
c = [lambda: f('g', 1), lambda: f('h', 2), lambda: f('i', 2)]
def lazyCmpList(a, b):
l = len(list(itertools.takewhile(lambda (x, y): x() == y(), itertools.izip(a, b))))
if l == len(a):
return 0
else:
return cmp(a[l](), b[l]())
print lazyCmpList(a, b)
print lazyCmpList(b, a)
print lazyCmpList(b, c)
Produces:
lazy eval of a
lazy eval of d
lazy eval of b
lazy eval of e
-1
lazy eval of d
lazy eval of a
lazy eval of e
lazy eval of b
1
lazy eval of d
lazy eval of g
lazy eval of e
lazy eval of h
lazy eval of f
lazy eval of i
0
Note that the code assumes the list of functions are of the same length. It could be enhanced to support non-equal list length, you'd have to define what the logic was i.e. what should cmp([f1, f2, f3], [f1, f2, f3, f1]) produce?
I haven't compared the speed but given your updated code I would imagine any speedup will be marginal (looping done in C code rather than Python). This solution may actually be slower as it is more complex and involved more memory allocation.
Given you are trying to sort a list of functions by evaluating them it follows that the functions will be evaluated i.e. O(nlogn) times and so your best speedup may be to look at using memoization to avoid repeated revaluation of the functions.

Here is an approach that uses lazy evaluation:
>>> def f(x):
... return 2**x
...
>>> def g(x):
... return x*2
...
>>> [f(x) for x in range(1,10)]
[2, 4, 8, 16, 32, 64, 128, 256, 512]
>>> [g(x) for x in range(1,10)]
[2, 4, 6, 8, 10, 12, 14, 16, 18]
>>> zipped = zip((f(i) for i in range(1,10)),(g(i) for i in range(1,10)))
>>> x,y = next(itertools.dropwhile(lambda t: t[0]==t[1],zipped))
>>> x > y
True
>>> x < y
False
>>> x
8
>>> y
6
>>>

I did some testing and found that #juanpa's answer and the version in my update are the fastest versions:
import random
import itertools
import functools
num_rows = 100
data = [[random.randint(0, 2) for i in xrange(10)] for j in xrange(num_rows)]
# turn data values into functions.
def return_func(value):
return value
list_funcs = [[functools.partial(return_func, v) for v in row] for row in data]
def lazy_cmp_FujiApple(a, b):
l = len(list(itertools.takewhile(lambda (x, y): x() == y(), itertools.izip(a, b))))
if l == len(a):
return 0
else:
return cmp(a[l](), b[l]())
sorted1 = sorted(list_funcs, lazy_cmp_FujiApple)
%timeit sorted(list_funcs, lazy_cmp_FujiApple)
# 100 loops, best of 3: 2.77 ms per loop
def lex_comp_mine(a, b):
for func_a, func_b in itertools.izip(a, b):
v_a = func_a()
v_b = func_b()
if v_a < v_b: return -1
if v_a > v_b: return +1
return 0
sorted2 = sorted(list_funcs, cmp=lex_comp_mine)
%timeit sorted(list_funcs, cmp=lex_comp_mine)
# 1000 loops, best of 3: 930 µs per loop
def lazy_comp_juanpa(a, b):
x, y = next(itertools.dropwhile(lambda t: t[0]==t[1], itertools.izip(a, b)))
return cmp(x, y)
sorted3 = sorted(list_funcs, cmp=lazy_comp_juanpa)
%timeit sorted(list_funcs, cmp=lex_comp_mine)
# 1000 loops, best of 3: 949 µs per loop
%timeit sorted(data)
# 10000 loops, best of 3: 45.4 µs per loop
# print sorted(data)
# print [[c() for c in row] for row in sorted1]
# print [[c() for c in row] for row in sorted2]
# print sorted3
I guess the creation of an intermediate list is hurting performance of #FujiApple's version. When running my comparator version on the original data list and comparing the runtime to Python's native list sorting, I note that my version is about 10times slower (501 µs vs 45.4 µs per loop). I guess theres' no easy way to get close to the performance of Python's native implementation...

Related

what is the most efficient way of concat two numbers to one number in python?

what is the most efficient way of concat two numbers to one number in python?
numbers are always in between 0 to 255, i have tested few ways by Concat as string and cast back to int but they are very costly in time vice for my code.
example
a = 152
c = 255
d = concat(a,c)
answer:
d = 152255
If the numbers are bounded, just multiply and add:
>>> a = 152
>>> c = 255
>>> d = a*1000+c
>>> d
152255
>>>
This is pretty fast:
def concat(a, b):
return 10**int(log(b, 10)+1)*a+b
It uses the logarithm to find how many times the first number must be multiplied by 10 for the sum to work as a concatenation
In [1]: from math import log
In [2]: a = 152
In [3]: b = 255
In [4]: def concat(a, b):
...: return 10**int(log(b, 10)+1)*a+b
...:
In [5]: concat(a, b)
Out[5]: 152255
In [6]: %timeit concat(a, b)
1000000 loops, best of 3: 1.18 us per loop
Yeah, there you go:
a = 152
b = 255
def concat(a, b):
n = next(x for x in range(10) if 10**x>a) # concatenates numbers up to 10**10
return a * 10**n + b
print(concat(a, b)) # -> 152255

Python adding/subtracting sequence

Hi so i'm trying to make a function where I subtract the first number with the second and then add the third then subtract the fourth ie. x1-x2+x3-x4+x5-x6...
So far I have this, I can only add two variables, x and y. Was thinking of doing
>>> reduce(lambda x,y: (x-y) +x, [2,5,8,10]
Still not getting it
pretty simple stuff just confused..
In this very case it would be easier to use sums:
a = [2,5,8,10]
sum(a[::2])-sum(a[1::2])
-5
Use a multiplier that you revert to +- after each addition.
result = 0
mult = 1
for i in [2,5,8,10]:
result += i*mult
mult *= -1
You can keep track of the position (and thus whether to do + or -) with enumerate, and you could use the fact that -12n is +1 and -12n+1 is -1. Use this as a factor and sum all the terms.
>>> sum(e * (-1)**i for i, e in enumerate([2,5,8,10]))
-5
If you really want to use reduce, for some reason, you could do something like this:
class plusminus(object):
def __init__(self):
self._plus = True
def __call__(self, a, b):
self._plus ^= True
if self._plus:
return a+b
return a-b
reduce(plusminus(), [2,5,8,10]) # output: -5
Or just using sum and a generator:
In [18]: xs
Out[18]: [1, 2, 3, 4, 5]
In [19]: def plusminus(iterable):
....: for i, x in enumerate(iterable):
....: if i%2 == 0:
....: yield x
....: else:
....: yield -x
....:
In [20]: sum(plusminus(xs))
Out[20]: 3
Which could also be expressed as:
sum(map(lambda x: operator.mul(*x), zip(xs, itertools.cycle([1, -1]))))

Performance considerations when populating lists vs dictionaries

Say I need to collect millions of strings in an iterable that I can later randomly index by position.
I need to populate the iterable one item at a time, sequentially, for millions of entries.
Given the above, which method could in principle be more efficient:
Populating a list:
while <condition>:
if <condition>:
my_list[count] = value
count += 1
Populating a dictionary:
while <condition>:
if <condition>:
my_dict[count] = value
count += 1
(the above is pesudocode, everything would be initialized before running the snippets).
I am specifically interested in the CPython implementation for Python 3.4.
Lists are definitely faster, if you use them in the right way.
In [19]: %%timeit l = []
....: for i in range(1000000): l.append(str(i))
....:
1 loops, best of 3: 182 ms per loop
In [20]: %%timeit d = {}
....: for i in range(1000000): d[i] = str(i)
....:
1 loops, best of 3: 207 ms per loop
In [21]: %timeit [str(i) for i in range(1000000)]
10 loops, best of 3: 158 ms per loop
Pushing the Python loop down to the C level with a comprehension buys you quite a bit of time. It also makes more sense to prefer a list for keys that are a prefix of the integers. Pre-allocating saves even more time:
>>> %%timeit
... l = [None] * 1000000
... for i in xrange(1000000): my_list[i] = str(i)
...
10 loops, best of 3: 147 ms per loop
For completeness, a dict comprehension does not speed things up:
In [22]: %timeit {i: str(i) for i in range(1000000)}
1 loops, best of 3: 213 ms per loop
With larger strings, I see very similar differences in performance (try str(i) * 10). This is CPython 2.7.6 on an x86-64.
I don't understand why you want to create an empty list or dict and then populate it. Why not create a new list or dictionary directly from the generation process?
results = list(a_generator)
# Or if you really want to use a dict for some reason:
results = dict(enumerate(a_generator))
You can get even better times by using the map function:
>>> def test1():
l = []
for i in range(10 ** 6):
l.append(str(i))
>>> def test2():
d = {}
for i in range(10 ** 6):
d[i] = str(i)
>>> def test3():
[str(i) for i in range(10 ** 6)]
>>> def test4():
{i: str(i) for i in range(10 ** 6)}
>>> def test5():
list(map(str, range(10 ** 6)))
>>> def test6():
r = range(10 ** 6)
dict(zip(r, map(str, r)))
>>> timeit.Timer('test1()', 'from __main__ import test1').timeit(100)
30.628035710889932
>>> timeit.Timer('test2()', 'from __main__ import test2').timeit(100)
31.093550469839613
>>> timeit.Timer('test3()', 'from __main__ import test3').timeit(100)
25.778271498509355
>>> timeit.Timer('test4()', 'from __main__ import test4').timeit(100)
30.10892986559668
>>> timeit.Timer('test5()', 'from __main__ import test5').timeit(100)
20.633583353028826
>>> timeit.Timer('test6()', 'from __main__ import test6').timeit(100)
28.660790917067914

Repeat function python

I'm stuck at higher-order functions in python. I need to write a repeat function repeat that applies the function f n times on a given argument x.
For example, repeat(f, 3, x) is f(f(f(x))).
This is what I have:
def repeat(f,n,x):
if n==0:
return f(x)
else:
return repeat(f,n-1,x)
When I try to assert the following line:
plus = lambda x,y: repeat(lambda z:z+1,x,y)
assert plus(2,2) == 4
It gives me an AssertionError. I read about How to repeat a function n times but I need to have it done in this way and I can't figure it out...
You have two problems:
You are recursing the wrong number of times (if n == 1, the function should be called once); and
You aren't calling f on the returned value from the recursive call, so the function is only ever applied once.
Try:
def repeat(f, n, x):
if n == 1: # note 1, not 0
return f(x)
else:
return f(repeat(f, n-1, x)) # call f with returned value
or, alternatively:
def repeat(f, n, x):
if n == 0:
return x # note x, not f(x)
else:
return f(repeat(f, n-1, x)) # call f with returned value
(thanks to #Kevin for the latter, which supports n == 0).
Example:
>>> repeat(lambda z: z + 1, 2, 2)
4
>>> assert repeat(lambda z: z * 2, 4, 3) == 3 * 2 * 2 * 2 * 2
>>>
You've got a very simple error there, in the else block you are just passing x along without doing anything to it. Also you are applying x when n == 0, don't do that.
def repeat(f,n,x):
"""
>>> repeat(lambda x: x+1, 2, 0)
2
"""
return repeat(f, n-1, f(x)) if n > 0 else x

Iterate over a ‘window’ of adjacent elements in Python

This is more a question of elegance and performance rather than “how to do at all”, so I'll just show the code:
def iterate_adjacencies(gen, fill=0, size=2, do_fill_left=True,
do_fill_right=False):
""" Iterates over a 'window' of `size` adjacent elements in the supploed
`gen` generator, using `fill` to fill edge if `do_fill_left` is True
(default), and fill the right edge (i.e. last element and `size-1` of
`fill` elements as the last item) if `do_fill_right` is True. """
fill_size = size - 1
prev = [fill] * fill_size
i = 1
for item in gen: # iterate over the supplied `whatever`.
if not do_fill_left and i < size:
i += 1
else:
yield prev + [item]
prev = prev[1:] + [item]
if do_fill_right:
for i in range(fill_size):
yield prev + [fill]
prev = prev[1:] + [fill]
and then ask: is there already a function for that? And, if not, can you do the same thing in a better (i.e. more neat and/or more fast) way?
Edit:
with ideas from answers of #agf, #FogleBird, #senderle, a resulting somewhat-neat-looking piece of code is:
def window(seq, size=2, fill=0, fill_left=True, fill_right=False):
""" Returns a sliding window (of width n) over data from the iterable:
s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
"""
ssize = size - 1
it = chain(
repeat(fill, ssize * fill_left),
iter(seq),
repeat(fill, ssize * fill_right))
result = tuple(islice(it, size))
if len(result) == size: # `<=` if okay to return seq if len(seq) < size
yield result
for elem in it:
result = result[1:] + (elem,)
yield result
This page shows how to implement a sliding window with itertools. http://docs.python.org/release/2.3.5/lib/itertools-example.html
def window(seq, n=2):
"Returns a sliding window (of width n) over data from the iterable"
" s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... "
it = iter(seq)
result = tuple(islice(it, n))
if len(result) == n:
yield result
for elem in it:
result = result[1:] + (elem,)
yield result
Example output:
>>> list(window(range(10)))
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
You'd need to change it to fill left and right if you need.
This is my version that fills, keeping the signature the same. I have previously seen the itertools recipe, but did not look at it before writing this.
from itertools import chain
from collections import deque
def ia(gen, fill=0, size=2, fill_left=True, fill_right=False):
gen, ssize = iter(gen), size - 1
deq = deque(chain([fill] * ssize * fill_left,
(next(gen) for _ in xrange((not fill_left) * ssize))),
maxlen = size)
for item in chain(gen, [fill] * ssize * fill_right):
deq.append(item)
yield deq
Edit: I also didn't see your comments on your question before posting this.
Edit 2: Fixed. I had tried to do it with one chain but this design needs two.
Edit 3: As #senderle noted, only use it this as a generator, don't wrap it with list or accumulate the output, as it yields the same mutable item repeatedly.
Ok, after coming to my senses, here's a non-ridiculous version of window_iter_fill. My previous version (visible in edits) was terrible because I forgot to use izip. Not sure what I was thinking. Using izip, this works, and, in fact, is the fastest option for small inputs!
def window_iter_fill(gen, size=2, fill=None):
gens = (chain(repeat(fill, size - i - 1), gen, repeat(fill, i))
for i, gen in enumerate(tee(gen, size)))
return izip(*gens)
This one is also fine for tuple-yielding, but not quite as fast.
def window_iter_deque(it, size=2, fill=None, fill_left=False, fill_right=False):
lfill = repeat(fill, size - 1 if fill_left else 0)
rfill = repeat(fill, size - 1 if fill_right else 0)
it = chain(lfill, it, rfill)
d = deque(islice(it, 0, size - 1), maxlen=size)
for item in it:
d.append(item)
yield tuple(d)
HoverHell's newest solution is still the best tuple-yielding solution for high inputs.
Some timings:
Arguments: [xrange(1000), 5, 'x', True, True]
==============================================================================
window HoverHell's frankeniter : 0.2670ms [1.91x]
window_itertools from old itertools docs : 0.2811ms [2.02x]
window_iter_fill extended `pairwise` with izip : 0.1394ms [1.00x]
window_iter_deque deque-based, copying : 0.4910ms [3.52x]
ia_with_copy deque-based, copying v2 : 0.4892ms [3.51x]
ia deque-based, no copy : 0.2224ms [1.60x]
==============================================================================
Scaling behavior:
Arguments: [xrange(10000), 50, 'x', True, True]
==============================================================================
window HoverHell's frankeniter : 9.4897ms [4.61x]
window_itertools from old itertools docs : 9.4406ms [4.59x]
window_iter_fill extended `pairwise` with izip : 11.5223ms [5.60x]
window_iter_deque deque-based, copying : 12.7657ms [6.21x]
ia_with_copy deque-based, copying v2 : 13.0213ms [6.33x]
ia deque-based, no copy : 2.0566ms [1.00x]
==============================================================================
The deque-yielding solution by agf is super fast for large inputs -- seemingly O(n) instead of O(n, m) like the others, where n is the length of the iter and m is the size of the window -- because it doesn't have to iterate over every window. But I still think it makes more sense to yield a tuple in the general case, because the calling function is probably just going to iterate over the deque anyway; it's just a shift of the computational burden. The asymptotic behavior of the larger program should remain the same.
Still, in some special cases, the deque-yielding version will probably be faster.
Some more timings based on HoverHell's test structure.
>>> import testmodule
>>> kwa = dict(gen=xrange(1000), size=4, fill=-1, fill_left=True, fill_right=True)
>>> %timeit -n 1000 [a + b + c + d for a, b, c, d in testmodule.window(**kwa)]
1000 loops, best of 3: 462 us per loop
>>> %timeit -n 1000 [a + b + c + d for a, b, c, d in testmodule.ia(**kwa)]
1000 loops, best of 3: 463 us per loop
>>> %timeit -n 1000 [a + b + c + d for a, b, c, d in testmodule.window_iter_fill(**kwa)]
1000 loops, best of 3: 251 us per loop
>>> %timeit -n 1000 [sum(x) for x in testmodule.window(**kwa)]
1000 loops, best of 3: 525 us per loop
>>> %timeit -n 1000 [sum(x) for x in testmodule.ia(**kwa)]
1000 loops, best of 3: 462 us per loop
>>> %timeit -n 1000 [sum(x) for x in testmodule.window_iter_fill(**kwa)]
1000 loops, best of 3: 333 us per loop
Overall, once you use izip, window_iter_fill is quite fast, as it turns out -- especially for small windows.
Resulting function (from the edit of the question),
frankeniter with ideas from answers of #agf, #FogleBird, #senderle, a resulting somewhat-neat-looking piece of code is:
from itertools import chain, repeat, islice
def window(seq, size=2, fill=0, fill_left=True, fill_right=False):
""" Returns a sliding window (of width n) over data from the iterable:
s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
"""
ssize = size - 1
it = chain(
repeat(fill, ssize * fill_left),
iter(seq),
repeat(fill, ssize * fill_right))
result = tuple(islice(it, size))
if len(result) == size: # `<=` if okay to return seq if len(seq) < size
yield result
for elem in it:
result = result[1:] + (elem,)
yield result
and, for some performance information regarding deque/tuple:
In [32]: kwa = dict(gen=xrange(1000), size=4, fill=-1, fill_left=True, fill_right=True)
In [33]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.ia(**kwa)]
10000 loops, best of 3: 358 us per loop
In [34]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.window(**kwa)]
10000 loops, best of 3: 368 us per loop
In [36]: %timeit -n 10000 [sum(x) for x in tmpf5.ia(**kwa)]
10000 loops, best of 3: 340 us per loop
In [37]: %timeit -n 10000 [sum(x) for x in tmpf5.window(**kwa)]
10000 loops, best of 3: 432 us per loop
but anyway, if it's numbers then numpy is likely preferable.
I'm surprised nobody took a simple coroutine approach.
from collections import deque
def window(n, initial_data=None):
if initial_data:
win = deque(initial_data, n)
else:
win = deque(((yield) for _ in range(n)), n)
while 1:
side, val = (yield win)
if side == 'left':
win.appendleft(val)
else:
win.append(val)
win = window(4)
win.next()
print(win.send(('left', 1)))
print(win.send(('left', 2)))
print(win.send(('left', 3)))
print(win.send(('left', 4)))
print(win.send(('right', 5)))
## -- Results of print statements --
deque([1, None, None, None], maxlen=4)
deque([2, 1, None, None], maxlen=4)
deque([3, 2, 1, None], maxlen=4)
deque([4, 3, 2, 1], maxlen=4)
deque([3, 2, 1, 5], maxlen=4)

Categories

Resources