How to use Python iterators in a pure functional workflow

Something that has been bothering me is that Python iterators do not fit the definition of a pure immutable object, since re-accessing them modifies their behavior.
I understand how this works, but reading code that uses iterators can become confusing and doesn't seem very pythonic.
My question is: is there a nice, pythonic way to approach this?
For example, the use of an iterator here produces a side effect (the input argument is modified), which makes the function impure:
def foo(i):
    return list(i)

b = iter([1,2,3])
print(foo(b))   # outputs [1,2,3]
print(foo(b))   # outputs []
print(list(b))  # outputs []

The issue in your example is that your iterator's state lives in the global scope, which already clashes with the "no side effects" rule. Once it is exhausted (i.e., it has raised a StopIteration exception), it's done and has to be reinitialized.
from copy import copy
def foo(i):
    return list(i)
a = [1,2,3]
b = iter(a)
print(foo(copy(b))) # outputs [1,2,3]
print(foo(copy(b))) # outputs [1,2,3]
print(list(copy(b))) # outputs [1,2,3]
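Note that copy.copy works here because list iterators happen to support copying; generator objects do not, so for a generator you would reach for itertools.tee instead. A minimal sketch, assuming nothing beyond the standard library:

import itertools

def gen():
    yield from [1, 2, 3]

# copy.copy(gen()) would fail with a TypeError, but tee gives independent iterators
g1, g2 = itertools.tee(gen())
print(list(g1))  # [1, 2, 3]
print(list(g2))  # [1, 2, 3]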

Related

Forcing a list out of a generator

>>> def change(x):
...     x.append(len(x))
...     return x
...
>>> a = []
>>> b = (change(a) for i in range(3))
>>> next(b)
[0]
>>> next(b)
[0, 1]
>>> next(b)
[0, 1, 2]
>>> next(b)
Traceback ... StopIteration
>>> a = []
>>> b = (change(a) for i in range(3))
>>> list(b)  # expecting [[0], [0, 1], [0, 1, 2]]
[[0, 1, 2], [0, 1, 2], [0, 1, 2]]
So I was just testing my understanding of generators and messing around with the command prompt and now I'm unsure if I actually understand how generators work.
The problem is that all calls to change(a) return the same object (in this case, the object is the value of a), but this object is mutable and changes its value. An example of the same problem without using generators:
a = []
b = []
for i in range(3):
    a.append(len(a))
    b.append(a)
print b
If you want to avoid it, you need to make a copy of your object (for example, make change return x[:] instead of x).
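For example, a version of change that returns a snapshot copy behaves as originally expected (a sketch, shown with Python 3 print):

def change(x):
    x.append(len(x))
    return x[:]              # return a copy instead of the shared list

a = []
b = (change(a) for i in range(3))
print(list(b))               # [[0], [0, 1], [0, 1, 2]]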

Why does my function call affect my variable sent in the parameter?

So as part of /r/dailyprogrammer's challenge of trying out a few simple tasks in a new programming language, I tried Python after only having dabbled in it very slightly.
There I had to recreate a bubble sort in Python, and this is what I came up with:
def bubble(unsorted):
    length = len(unsorted)
    isSorted = False
    while not isSorted:
        isSorted = True
        for i in range(0, length-1):
            if(unsorted[i] > unsorted[i+1]):
                isSorted = False
                holder = unsorted[i]
                unsorted[i] = unsorted[i+1]
                unsorted[i+1] = holder

myList = [5,6,4,2,10,1]
bubble(myList)
print myList
Now this code works flawlessly as far as I can tell, and that is precisely the problem. I can't figure out why the bubble function would affect the variable myList without me returning anything to it or assigning it anew.
This is really bugging me, but it's probably a Python thing :) That, or I'm a very silly man indeed.
I'm not sure what the source of the confusion is, but if you think that each time you write func(obj) the whole object is copied onto the stack, you're wrong.
Arguments are passed by object reference: the function receives a reference to the very same object, not a copy. So if the object is mutable, like a list, its members or elements can be updated inside the function and the change is visible after the function has executed.
Write a simple prog to confirm that:
>>> a=[1]
>>> def f(x):
... x[0]=2
...
>>> f(a)
>>> print a[0]
2
I hope it'll clarify the picture.
For immutable objects such as numbers you'll get a different result, though, because rebinding the parameter inside the function does not affect the caller:
>>> i = 1
>>> def f(x):
...     x = 2
...
>>> f(i)
>>> print i
1
The answer is that unsorted and myList point to the same object; they are not copies. Hence, when you change one, you change the other.
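If you want bubble to leave the caller's list untouched, a common pattern is to sort a copy and return it. A minimal sketch (bubble_sorted is a hypothetical variant of the function above):

def bubble_sorted(unsorted):
    items = unsorted[:]          # work on a copy so the caller's list is not modified
    is_sorted = False
    while not is_sorted:
        is_sorted = True
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                is_sorted = False
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

my_list = [5, 6, 4, 2, 10, 1]
print(bubble_sorted(my_list))    # [1, 2, 4, 5, 6, 10]
print(my_list)                   # [5, 6, 4, 2, 10, 1]  (unchanged)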

python generator of generators?

I wrote a class that reads a txt file. The file is composed of blocks of non-empty lines (let's call them "sections"), separated by an empty line:
line1.1
line1.2
line1.3

line2.1
line2.2
My first implementation was to read the whole file and return a list of lists, that is a list of sections, where each section is a list of lines.
This was obviously terrible memory-wise.
So I re-implemented it as a generator of lists, that is at every cycle my class reads a whole section in memory as a list and yields it.
This is better, but it's still problematic in case of large sections. So I wonder if I can reimplement it as a generator of generators? The problem is that this class is very generic, and it should be able to satisfy both of these use cases:
read a very big file, containing very big sections, and cycle through it only once. A generator of generators is perfect for this.
read a smallish file into memory to be cycled over multiple times. A generator of lists works fine, because the user can just invoke
list(MyClass(file_handle))
However, a generator of generators would NOT work in case 2, as the inner objects would not be transformed to lists.
Is there anything more elegant than implementing an explicit to_list() method, that would transform the generator of generators into a list of lists?
Python 2:
map(list, generator_of_generators)
Python 3:
list(map(list, generator_of_generators))
or for both:
[list(gen) for gen in generator_of_generators]
Since the generated objects are generator functions, not mere generators, you'd want to do
[list(gen()) for gen in generator_of_generator_functions]
If that doesn't work I have no idea what you're asking. Also, why would it return a generator function and not a generator itself?
Since in the comments you said you wanted to keep list(generator_of_generator_functions) from crashing mysteriously, this depends on what you really want.
It is not possible to override the behaviour of list in this way: either you store the sub-generator elements or you don't.
If you really do get a crash, I recommend exhausting the sub-generator with the main generator loop every time the main generator iterates. This is standard practice and exactly what itertools.groupby, a stdlib generator-of-generators, does.
eg.
def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3
    for i in range(3):
        r = innergen()
        yield r
        for _ in r: pass
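For the original file-splitting use case, itertools.groupby can play the role of the outer generator directly. A rough sketch (sections is a hypothetical name; each inner group must be consumed before advancing the outer loop):

import itertools

def sections(file_handle):
    # group consecutive lines by whether they are blank; the non-blank
    # groups are the "sections" from the question
    for is_blank, group in itertools.groupby(file_handle, key=lambda line: not line.strip()):
        if not is_blank:
            yield group  # an iterator over the lines of one section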
Or use a dark, secret hack method that I'll show in a mo' (I need to write it), but don't do it!
As promised, the hack (for Python 3, this time 'round):
from collections import UserList
from functools import partial

def objectitemcaller(key):
    def inner(*args, **kwargs):
        try:
            return getattr(object, key)(*args, **kwargs)
        except AttributeError:
            return NotImplemented
    return inner

class Listable(UserList):
    def __init__(self, iterator):
        self.iterator = iterator
        self.iterated = False

    def __iter__(self):
        return self

    def __next__(self):
        self.iterated = True
        return next(self.iterator)

    def _to_list_hack(self):
        self.data = list(self)
        del self.iterated
        del self.iterator
        self.__class__ = UserList

for key in UserList.__dict__.keys() - Listable.__dict__.keys():
    if key not in ["__class__", "__dict__", "__module__", "__subclasshook__"]:
        setattr(Listable, key, objectitemcaller(key))
def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3
    for i in range(3):
        r = Listable(innergen())
        yield r
        if not r.iterated:
            r._to_list_hack()
        else:
            for item in r: pass

for item in metagen():
    print(item)
    print(list(item))
#>>> <Listable object at 0x7f46e4a4b850>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b950>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b990>
#>>> [1, 2, 3]
list(metagen())
#>>> [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
It's so bad I don't want to even explain it.
The key is that you have a wrapper that can detect whether it has been iterated, and if not you run a _to_list_hack that, I kid you not, changes the __class__ attribute.
Because of conflicting layouts we have to use the UserList class and shadow all of its methods, which is just another layer of crud.
Basically, please don't use this hack. You can enjoy it as humour, though.
A rather pragmatic way would be to tell the "generator of generators" upon creation whether to generate generators or lists. While this is not as convenient as having list magically know what to do, it still seems to be more comfortable than having a special to_list function.
def gengen(n, listmode=False):
    for i in range(n):
        def gen():
            for k in range(i+1):
                yield k
        yield list(gen()) if listmode else gen()
Depending on the listmode parameter, this can either be used to generate generators or lists.
for gg in gengen(5, False):
    print gg, list(gg)
print list(gengen(5, True))

restartable generator available in Python Standard Library?

I can't imagine that I'm the first to write a class like this:
class RestartableGenerator:
    def __init__(self, g):
        self.g = g
    def __iter__(self):
        return self.g().__iter__()

if __name__ == '__main__':
    def gen():
        print 'Generating'
        for i in range(5):
            yield i

    i = RestartableGenerator(gen)
    print 'Using'
    print list(i)
    print list(i)
The test produces this output:
Using
Generating
[0, 1, 2, 3, 4]
Generating
[0, 1, 2, 3, 4]
But I didn't find it in the Standard Library. I looked in itertools and functools.
Is it really not there? If it is, where?
Was it considered unnecessary, because when you want to evaluate a sequence multiple times, you better store it in a list?
Edit 1:
My use case is that I want it to be transparent to the consumer that the sequence is, for memory consumption reasons, a generator instead of a list.
Edit 2: If there's no such class in the Standard Library, what name do you think is appropriate? ParenthesisRemover? MultipleTimesIterable? Anything else? Why?
You could simplify it a little:
class RestartableGenerator:
    def __init__(self, g):
        self.g = g
    def __iter__(self):
        return self.g()
Calling gen() returns a generator object, which is itself an iterator, exactly the kind of object that __iter__ must return.
There is no need for RestartableGenerator, however, since it does nothing that gen itself cannot do. Instead of holding gen in an attribute, just hold on to gen itself.
def gen():
    print 'Generating'
    for i in range(5):
        yield i

print 'Using'
print list(gen())
print list(gen())
This isn't exactly restartable in the sense of going back to the beginning of the sequence. Each __iter__ call creates a new generator that will rerun the generator code, potentially reexecuting side effects and producing different results. If you want independent iterators over a generated sequence, that's what list or itertools.tee are for. Otherwise, it's clearer to explicitly call the generator function again, so this isn't very useful. You save a pair of parentheses at the cost of less explicit, more bug-prone code.
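For example, itertools.tee lets two consumers share a single run of the generator, whereas RestartableGenerator reruns the generator body (including its print) on every pass. A small sketch in Python 3 syntax:

import itertools

def gen():
    print('Generating')
    for i in range(5):
        yield i

first, second = itertools.tee(gen())
print(list(first))   # prints 'Generating' once, then [0, 1, 2, 3, 4]
print(list(second))  # [0, 1, 2, 3, 4], without rerunning the generator body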
Note that if you want a lazy sequence type, where iterating over it generates elements on the fly, but you can iterate over it repeatedly, you should define its __iter__ method as a generator:
import itertools

class Primes(object):
    def __iter__(self):
        for i in itertools.count():
            if is_prime(i):   # is_prime is assumed to be defined elsewhere
                yield i
This isn't a "restartable generator", but it sounds like what you want.

Is there a map without result in python? [duplicate]

Sometimes, I just want to execute a function for a list of entries -- eg.:
for x in wowList:
    installWow(x, 'installed by me')
Sometimes I need this stuff for module initialization, so I don't want to have a footprint like x in global namespace. One solution would be to just use map together with lambda:
map(lambda x: installWow(x, 'installed by me'), wowList)
But this of course creates a nice list [None, None, ...], so my question is whether there is a similar function without a return list, since I just don't need it.
(Of course I can also use _x and thus not leave a visible footprint -- but the map solution looks so neat...)
You could make your own "each" function:
def each(fn, items):
    for item in items:
        fn(item)

# called thus
each(lambda x: installWow(x, 'installed by me'), wowList)
Basically it's just map, but without the results being returned. By using a function you'll ensure that the "item" variable doesn't leak into the current scope.
You can use the built-in any function to apply a function without a return statement to every item produced by a generator, without creating a list. This can be achieved like this:
any(installWow(x, 'installed by me') for x in wowList)
I found this the most concise idiom for what you want to achieve.
Internally, the installWow function does return None which evaluates to False in logical operations. any basically applies an or reduction operation to all items returned by the generator, which are all None of course, so it has to iterate over all items returned by the generator. In the end it does return False, but that doesn't need to bother you. The good thing is: no list is created as a side-effect.
Note that this only works as long as your function returns something that evaluates to False, e.g., None or 0. If it does return something that evaluates to True at some point, e.g., 1, it will not be applied to any of the remaining elements in your iterator. To be safe, use this idiom mainly for functions without return statement.
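For example, with a hypothetical function that returns its (sometimes truthy) argument, any stops early and the remaining items are never processed:

def noisy(x):
    print('processing', x)
    return x            # truthy for any non-zero x

any(noisy(x) for x in [0, 1, 2])
# prints 'processing 0' and 'processing 1', then short-circuits; 2 is never processed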
How about this?
for x in wowList:
    installWow(x, 'installed by me')
del x
Every expression evaluates to something, so you always get a result, whichever way you do it. And any such returned object (just like your list) will get thrown away afterwards because there's no reference to it anymore.
To clarify: Very few things in python are statements that don't return anything. Even a function call like
doSomething()
still returns a value, even if it gets discarded right away. There is no such thing as Pascal's function / procedure distinction in python.
You might try this:
filter(lambda x: installWow(x, 'installed by me') and False, wowList)
That way, the return result is an empty list no matter what.
Or you could just drop the and False if you can force installWow() to always return False (or 0 or None or another expression that evaluates false).
You could use a filter and a function that doesn't return a True value. You'd get an empty return list since filter only adds the values which evaluates to true, which I suppose would save you some memory. Something like this:
#!/usr/bin/env python
y = 0
def myfunction(x):
    global y
    y += x
input = (1, 2, 3, 4)
print "Filter output: %s" % repr(filter(myfunction, input))
print "Side effect result: %d" % y
Running it produces this output:
Filter output: ()
Side effect result: 10
I cannot resist posting this as a separate answer:
reduce(lambda x,y: x(y, 'installed by me') , wowList, installWow)
The only twist is that installWow should return itself, e.g.:
def installWow(*args):
    print args
    return installWow
If it is OK to destroy wowList:
while wowList: installWow(wowList.pop(), 'installed by me')
If you want to keep wowList intact:
wowListR = wowList[:]
while wowListR: installWow(wowListR.pop(), 'installed by me')
And if order matters:
wowListR = wowList[:]; wowListR.reverse()
while wowListR: installWow(wowListR.pop(), 'installed by me')
Though as a solution to the puzzle I like the first one :)
I tested several different variants, and here are the results I got.
Python 2:
>>> timeit.timeit('for x in xrange(100): L.append(x)', 'L = []')
14.9432640076
>>> timeit.timeit('[x for x in xrange(100) if L.append(x) and False]', 'L = []')
16.7011508942
>>> timeit.timeit('next((x for x in xrange(100) if L.append(x) and False), None)', 'L = []')
15.5235641003
>>> timeit.timeit('any(L.append(x) and False for x in xrange(100))', 'L = []')
20.9048290253
>>> timeit.timeit('filter(lambda x: L.append(x) and False, xrange(100))', 'L = []')
27.8524758816
Python 3:
>>> timeit.timeit('for x in range(100): L.append(x)', 'L = []')
13.719769178002025
>>> timeit.timeit('[x for x in range(100) if L.append(x) and False]', 'L = []')
15.041426660001889
>>> timeit.timeit('next((x for x in range(100) if L.append(x) and False), None)', 'L = []')
15.448063717998593
>>> timeit.timeit('any(L.append(x) and False for x in range(100))', 'L = []')
22.087335471998813
>>> timeit.timeit('next(filter(lambda x: L.append(x) and False, range(100)), None)', 'L = []')
36.72446593800123
Note that the time values are not that precise (for example, the relative performance of the first three options varied from run to run). My conclusion is that you should just use a loop, it's more readable and performs at least as well as the alternatives. If you want to avoid polluting the namespace, just del the variable after using it.
First, rewrite the for loop as a generator expression, which does not build a list:
(installWow(x, 'installed by me') for x in wowList )
But this expression doesn't actually do anything without finding some way to consume it. So we can rewrite this to yield something determinate, rather than rely on the possibly None result of installWow.
( [1, installWow(x, 'installed by me')][0] for x in wowList )
which builds a small throwaway list per item but yields only the constant 1. This can be consumed conveniently with reduce:
reduce(lambda a, b: a + b, ( [1, installWow(x, 'installed by me')][0] for x in wowList ))
Which conveniently returns the number of items in wowList that were affected.
Just make installWow return None or make the last statement be pass like so:
def installWow(item, phrase='installed by me'):
    print phrase
    pass
and use this:
list(x for x in wowList if installWow(x))
x won't be set in the global namespace, and the returned list is simply empty, since installWow returns a falsy value.
If you're worried about the need to control the return value (which you need to do to use filter) and prefer a simpler solution than the reduce example above, then consider using reduce directly. Your function will need to take an additional accumulator parameter, but you can use a lambda to discard it:
reduce(lambda _, x: installWow(x, 'installed by me'), wowList, None)
Let me preface this by saying that it seems the original poster was more concerned about namespace clutter than anything else. In that case, you can wrap your working variables in a separate function's namespace and call it after declaring it, or you can simply remove them from the namespace after you've used them with the del statement. Or, if you have multiple variables to clean up, def a function holding all the temp variables, run it, then del it.
Read on if the main concern is optimization:
Three more ways, potentially faster than the others described here:
For Python >= 2.7, use collections.deque((installWow(x, 'installed by me') for x in wowList), 0). With maxlen=0 the deque saves no entries while iterating the entire generator, though it still produces a final (empty) object and does a per-item length check internally (see the sketch after this list).
If worried about this kind of overhead, install cytoolz. Its count still has the byproduct of incrementing a counter, but it may cost fewer cycles than deque's per-item check; you can use it in place of any() in the next approach.
Replace the generator expression with itertools.imap (when installWow never returns a true value; otherwise consider itertools.ifilter or itertools.ifilterfalse with None as the predicate): any(itertools.imap(installWow, wowList, itertools.repeat('installed by me')))
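A minimal sketch of the deque-based consumer, essentially the "consume" recipe from the itertools documentation (installWow and wowList are the names from the question):

import collections

def consume(iterator):
    # feed the whole iterator into a zero-length deque: every item is
    # produced and immediately discarded, so nothing accumulates in memory
    collections.deque(iterator, maxlen=0)

consume(installWow(x, 'installed by me') for x in wowList)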
But the real problem here is the fact that a function returns something and you do not want it to return anything. So to resolve this, you have two options. One is to refactor your code so installWow takes in the wowList and iterates it internally. Another is rather mindblowing, but you can load the installWow() function into a compiled ast like so:
lines, lineno = inspect.getsourcelines(func)  # func here is installWow without the parens
return ast.parse(''.join(l[4:] for l in lines if l))  # assumes the installWow function is part of a class in a module file. For a module-level function you would not need the l[4:]
You can then do the same for the outer function and traverse the ast to find the for loop. Then, in the body of the for loop, insert the installWow() function ast's function definition body, matching up the variable names. You can then simply call exec on the ast itself and provide a namespace dictionary with the right variables filled in. To make sure your tree modifications are correct, you can check what the final source code would look like by running astunparse.
And if that isn't enough you can go to cython and write a .pyx file which will generate and compile a .c file into a library with python bindings. Then, at least the lost cycles won't be spent converting to and from python objects and type-checking everything repeatedly.
A simple DIY whose sole purpose is to loop through a generator expression:
def do(genexpr):
    for _ in genexpr:
        pass
Then use:
do(installWow(x, 'installed by me') for x in wowList)
In Python 3 there are a few ways to call a function while discarding its return value (just add a semicolon in Jupyter to omit the output of the cell):
[*map(print, MY_LIST)]; # form 1 - unpack the map generator to a list
any(map(print, MY_LIST)); # form 2 - force execution with any
list(map(print, MY_LIST)); # form 3 - collect list from generator
Someone needs to say it:
The more pythonic way here is to not worry about polluting the namespace, and to use __all__ to define the public variables.
myModule/__init__.py:
__all__ = ['func1', 'func2']

for x in range(10):
    print 'init {}'.format(x)

def privateHelper1(x):
    return '{}:{}'.format(x, x)

def func1():
    print privateHelper1('func1')

def func2():
    print privateHelper1('func2')
Then
python -c "import myModule; help(myModule);"
init 0
init 1
init 2
init 3
init 4
init 5
init 6
init 7
init 8
init 9
Help on package myModule:
NAME
myModule
FILE
h:\myModule\__init__.py
PACKAGE CONTENTS
FUNCTIONS
func1()
func2()
DATA
__all__ = ['func1', 'func2']
