Recover defining expression for a python generator

Given a generator
g = ( <expr> for x in <iter> ),
is there any way to recover the expression and iterator used to define g?
E.g., a function that would behave like this:
expr, iter = f( ( x*x for x in range(10) ) )
expr(2) # 4
expr(5) # 25
iter[1] # 1
iter[9] # 9
iter[10] # raises IndexError
The reason I want this functionality is that I've made my own LazyList class. I want it to essentially behave like a generator, except allow access via __getitem__ without having to iterate through the first k-1 elements before it can access the k'th element. Thanks.
Edit: Here's a snapshot of the lazy list class:
class LazyList(object):
    def __init__(self, iter=None, expr=None):
        if expr is None:
            expr = lambda i: i
        if iter is None:
            iter = []
        self._expr = expr
        self._iter = iter

    def __getitem__(self, key):
        if hasattr(self._iter, '__getitem__'):
            return self._expr(self._iter[key])
        else:
            return self._iter_getitem(key)

    def __iter__(self):
        for i in self._iter:
            yield self._expr(i)
I've omitted the method _iter_getitem. All it does is iterate through _iter until it reaches the key'th element (or use itertools.islice if key is a slice). There are also the usual llmap, llreduce, etc. functions, which I've omitted, but you can probably guess how those go.
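For reference, a rough sketch of what such a method could look like, reconstructed from the description above (this is not the asker's actual code); it assumes integer keys walk the mapped iterator and slices go through itertools.islice:

import itertools

def _iter_getitem(self, key):
    # Slices: lazily slice the mapped iterator and materialize the result.
    if isinstance(key, slice):
        return list(itertools.islice(iter(self), key.start, key.stop, key.step))
    # Integer keys: walk the mapped iterator until the key'th element.
    for i, value in enumerate(iter(self)):
        if i == key:
            return value
    raise IndexError(key)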
One of my motivations for wanting to be able to decompose generators is so that I can elegantly initialize this class like
l = LazyList(x*x for x in range(10))
instead of
l = LazyList(range(10), lambda x: x*x)
But the real benefit is that this would be, with some polish, a nice generalization of the generator concept and could be used in place of any generator (with the same memory-saving benefits).
I'm using this with Django a lot because it works well with their querysets. I have a lot of code that is dependent on the list structures being lazy, because it returns multidimensional arrays that, if evaluated, would fetch way more data than I'd need.

The closest I can think of is to disassemble the code object that is inside the generator expression. Something like
>>> import dis
>>> g = ( x*x for x in range(10) )
>>> dis.dis(g.gi_code)
1 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 15 (to 21)
6 STORE_FAST 1 (x)
9 LOAD_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 BINARY_MULTIPLY
16 YIELD_VALUE
17 POP_TOP
18 JUMP_ABSOLUTE 3
>> 21 LOAD_CONST 0 (None)
24 RETURN_VALUE
That gives a little hint about what is happening, but it's not very clear, IMHO.
There is another Stack Overflow question that deals with converting Python byte code into readable Python — maybe you can use that to get something more human readable back.

I think your concept of a LazyList is good, but your thinking about direct access to a generator's n'th value is flawed. Your example, using range(10) as the sequence to iterate over, is a special case, one in which all values are knowable ahead of time. But many generators are computed incrementally, where the n'th value is computed from the n-1'th value. A Fibonacci generator is one such:
def fibonacci(n=1000):
    a, b = 1, 1
    yield a
    while n > 0:
        n -= 1
        yield b
        a, b = b, a + b
This gives the familiar series 1, 1, 2, 3, 5, 8, ... in which the n'th item is the sum of the n-1'th and n-2'th items. So there is no way to jump directly to item 10; you have to get there through items 0-9.
That being said, your LazyList is nice for a couple of reasons:
it allows you to revisit earlier values
it simulates direct access even though under the covers the generator has to go through all the incremental values until it gets to 'n'
it only computes the values actually required, since the generator is evaluated lazily, instead of pre-emptively computing 1000 values only to find that the first 10 are used
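To illustrate the last two points, here is a hypothetical caching wrapper (my own sketch, not the asker's LazyList): it memoizes the values pulled from the generator, so "direct" indexing only ever advances the generator as far as needed, and earlier values can be revisited for free.

class CachedIterable(object):
    """Wrap an iterator and cache the values produced so far."""
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def __getitem__(self, index):
        # Advance the underlying iterator only as far as needed.
        while len(self._cache) <= index:
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                raise IndexError(index)
        return self._cache[index]

fibs = CachedIterable(fibonacci())
print(fibs[10])   # advances the generator through items 0-10
print(fibs[3])    # answered from the cache, no further iteration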

Why is a set object stored as a frozenset and a list object as a tuple?

I saw a blog post where it's mentioned "Use func.__code__.co_consts to check all the constants defined in the function".
def func():
    return 1 in {1,2,3}

func.__code__.co_consts
(None, 1, frozenset({1, 2, 3}))
Why did it return a frozenset?
def func():
    return 1 in [1,2,3]

func.__code__.co_consts
(None, 1, (1, 2, 3))
Why did it return a tuple instead of a list? Every object returned from __code__.co_consts is immutable. Why are the mutable constants made immutable? Why is the first element of the returned tuple always None?
This is a result of the Python peephole optimizer.
Under "Optimizations", it says:
BUILD_LIST + COMPARE_OP(in/not in): convert list to tuple
BUILD_SET + COMPARE_OP(in/not in): convert set to frozenset
See here for more information:
"Python uses peephole optimization of your code by either pre-calculating constant expressions or transforming certain data structures"
especially the part about "Membership Tests":
"What Python for membership tests is to transform mutable data structures to its inmutable version. Lists get transformed into tuples and sets into frozensets."
All objects in co_consts are constants, i.e. they are immutable. You shouldn't be able to, e.g., append to a list appearing as a literal in the source code and thereby modify the behaviour of the function.
The compiler usually represents list literals by listing all individual constants appearing in the list:
>>> def f():
...     a = [1, 2, 3]
...     return 1 in a
...
>>> f.__code__.co_consts
(None, 1, 2, 3)
Looking at the bytecode of this function, we can see that it builds a new list each time the function is executed:
>>> dis.dis(f)
2 0 LOAD_CONST 1 (1)
2 LOAD_CONST 2 (2)
4 LOAD_CONST 3 (3)
6 BUILD_LIST 3
8 STORE_FAST 0 (a)
3 10 LOAD_CONST 1 (1)
12 LOAD_FAST 0 (a)
14 COMPARE_OP 6 (in)
16 RETURN_VALUE
Creating a new list is required in general, because the function may modify or return the list defined by the literal, in which case it needs to operate on a fresh list object every time the function is executed.
In other contexts, creating a new list object is wasteful, though. For this reason, Python's peephole optimizer can replace the list with a tuple, or a set with a frozenset, in certain situations where it is known to be safe. One such situation is when the list or set literal is only used in an expression of the form x [not] in <list_literal>. Another such situation is when a list literal is used as the iterable of a for loop.
The peephole optimizer is very simple. It only looks at one expression at a time. For this reason, it can't detect that this optimization would be safe in my definition of f above, which is functionally equivalent to your example.
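If you want to see whether the optimization fired, you can disassemble a membership test yourself; a small sketch (the exact bytecode and which constants get folded vary by CPython version):

import dis

dis.dis(lambda x: x in [1, 2, 3])   # expect a single LOAD_CONST of (1, 2, 3)
dis.dis(lambda x: x in {1, 2, 3})   # expect a single LOAD_CONST of frozenset({1, 2, 3})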

What is happening when I assign a list with self references to a list copy with the slice syntax `mylist[:] = [mylist, mylist, ...]`?

I was just looking at the implementation of functools.lru_cache, when I stumbled across this snippet:
root = [] # root of the circular doubly linked list
root[:] = [root, root, None, None] # initialize by pointing to self
I am familiar with circular and doubly linked lists. I am also aware that new_list = my_list[:] creates a copy of my_list.
When looking for slice assignments or other implementations of circular doubly linked lists, I could not find any further information on this specific syntax.
Questions:
What is going on in this situation?
Is there a different syntax to achieve the same result?
Is there a different common use case for some_list[:] = some_iterable (without the self reference)?
In
root[:] = [root, root, None, None]
the left-hand slice assignment means that the existing root list object is reused to hold the contents of the right-hand side.
So the root reference never changes, and yes, a list can contain references to itself (but don't try recursive flattening on such a list). The repr shows "Recursion on list" in that case.
>>> root
[<Recursion on list with id=48987464>,
<Recursion on list with id=48987464>,
None,
None]
and printing it shows ellipsis:
>>> print(root)
[[...], [...], None, None]
Note that you don't need slice assignment for this; there are simpler ways to create such a recursive reference:
>>> root = []
>>> root.append(root)
>>> root
[<Recursion on list with id=51459656>]
>>>
Using append doesn't change the reference, as we all know; it just mutates the list, adding a reference to itself. That may be easier to comprehend.
What is going on in this situation?
If l is the list, l[:] = items calls l.__setitem__(slice(None), items). This method clears the list and then assigns the items from the given iterable to it.
Is there a different syntax to achieve the same result?
You could do
l.clear()
l.extend(items)
Is there a different common use case for some_list[:] = some_iterable (without the self reference)?
In theory, you could assign the contents of any iterable into the list this way.
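The typical non-self-referential use is replacing a list's contents in place so that every existing reference to that list sees the change; a small sketch:

config = ["old", "values"]
alias = config                    # second name for the same list object

config[:] = ["new", "values"]     # replace the contents in place

print(alias)                      # ['new', 'values'] -- alias sees the update
print(alias is config)            # True -- still the very same object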
Just look into the disassembled code:
In [1]: def initializer():
   ...:     root = []  # root of the circular doubly linked list
   ...:     root[:] = [root, root, None, None]
   ...:
In [2]:
In [2]: import dis
In [3]: dis.dis(initializer)
2 0 BUILD_LIST 0
2 STORE_FAST 0 (root)
3 4 LOAD_FAST 0 (root)
6 LOAD_FAST 0 (root)
8 LOAD_CONST 0 (None)
10 LOAD_CONST 0 (None)
12 BUILD_LIST 4
14 LOAD_FAST 0 (root)
16 LOAD_CONST 0 (None)
18 LOAD_CONST 0 (None)
20 BUILD_SLICE 2
22 STORE_SUBSCR
24 LOAD_CONST 0 (None)
26 RETURN_VALUE
What you're looking for is the STORE_SUBSCR opcode, which is there to implement the following:
Implements TOS1[TOS] = TOS2
which, according to the documentation, is an in-place operation. And if you wonder what in-place operations are, here's how the docs define them:
In-place operations are like binary operations, in that they remove TOS and TOS1, and push the result back on the stack, but the operation is done in-place when TOS1 supports it, and the resulting TOS may be (but does not have to be) the original TOS1.
This confirms what the inline comment in the source code says:
initialize by pointing to self.
Regarding your other questions:
Is there a different syntax to achieve the same result?
Yes; as mentioned in the other answer, you can clear the list and then add the items with list.extend. Or assign the items one by one.
Is there a different common use case for some_list[:] = some_iterable (without the self reference)?
This is a fairly open question, because that's simply what the syntax does: it assigns items into the existing list, which has the benefit of replacing the contents without recreating references to the list.

Do generators simply make new objects with __iter__ and next functions?

I tried to search for this answer on my own, but there was too much noise.
Are generators in python just a convenience wrapper for the user to make an iterator object?
When you define the generator:
def test():
    x = 0
    while True:
        x += 1
        yield x
is python simply making a new object, adding the __iter__ method, then putting the rest of the code into the next function?
class Test(object):
    def __init__(self):
        self.x = 0

    def __iter__(self):
        return self

    def next(self):
        self.x += 1
        return self.x
Nope. Like so:
>>> def test():
...     x = 0
...     while True:
...         x += 1
...         yield x
...
>>> type(test)
<type 'function'>
So what the def statement creates is an ordinary function object. Details from there get hairy; the short course is that the code object belonging to the function (test.func_code) is marked as a generator by one of the flags in test.func_code.co_flags.
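If you want to check that flag yourself, the inspect module exposes both the flag constant and convenience predicates; a small sketch (using the test.__code__ spelling, which works in Python 3 and later Python 2 versions):

import inspect

def test():
    x = 0
    while True:
        x += 1
        yield x

# The compiler sets CO_GENERATOR on the code object of a generator function.
print(bool(test.__code__.co_flags & inspect.CO_GENERATOR))   # True
print(inspect.isgeneratorfunction(test))                      # True
print(inspect.isgenerator(test()))                            # True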
You can disassemble the bytecode for test to see that it's just like any other function otherwise, apart from the fact that a generator function always contains a YIELD_VALUE opcode:
>>> import dis
>>> dis.dis(test)
2 0 LOAD_CONST 1 (0)
3 STORE_FAST 0 (x)
3 6 SETUP_LOOP 25 (to 34)
>> 9 LOAD_GLOBAL 0 (True)
12 POP_JUMP_IF_FALSE 33
4 15 LOAD_FAST 0 (x)
18 LOAD_CONST 2 (1)
21 INPLACE_ADD
22 STORE_FAST 0 (x)
5 25 LOAD_FAST 0 (x)
28 YIELD_VALUE
29 POP_TOP
30 JUMP_ABSOLUTE 9
>> 33 POP_BLOCK
>> 34 LOAD_CONST 0 (None)
37 RETURN_VALUE
To do it the way you have in mind, the horrors just start ;-) if you think about how to create an object to mimic just this:
def test():
    yield 2
    yield 3
    yield 4
Now your next() method would have to carry additional hidden state just to remember which yield comes next. Wrap that in some nested loops with some conditionals, and "unrolling" it into a single-entry next() becomes a nightmare.
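For comparison, here is one way (a sketch of mine, not a transformation Python actually performs) to hand-write an iterator equivalent to those three yields; the "which yield comes next" bookkeeping has to be carried around explicitly:

class Test234(object):
    """Hand-written iterator equivalent of: yield 2; yield 3; yield 4."""
    def __init__(self):
        self._step = 0            # remembers which "yield" comes next

    def __iter__(self):
        return self

    def __next__(self):
        self._step += 1
        if self._step == 1:
            return 2
        elif self._step == 2:
            return 3
        elif self._step == 3:
            return 4
        raise StopIteration

    next = __next__               # Python 2 spelling of the same method

Even for three plain yields this is clumsy; wrap loops and conditionals around the yields and the explicit state machine gets out of hand quickly.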
No -- Generators also provide other methods (.send, .throw, etc) and can be used for more purposes than simply making iterators (e.g. coroutines).
Indeed, generators are an entirely different beast and a core language feature. It'd be very hard (possibly impossible) to create one in vanilla python if they weren't baked into the language.
With that said, one application of generators is to provide an easy syntax for creating an iterator :-).
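For example, .send() lets the caller push a value back into the paused generator, which is the basis of the coroutine-style usage; a minimal sketch:

def running_total():
    total = 0
    while True:
        value = yield total    # receives whatever the caller .send()s
        total += value

acc = running_total()
next(acc)              # prime the coroutine: run to the first yield
print(acc.send(10))    # 10
print(acc.send(5))     # 15
acc.close()            # raises GeneratorExit inside the generator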
Are generators in python just a convenience wrapper for the user to make an iterator object?
No. A generator is defined with a function, whereas an iterator is normally implemented as a class, so a generator is not simply an instance of some iterator class. But in some sense a generator is a simplified approach to getting iterator-like capability. It means:
All Generators are iterators, but not all iterators are generators.
I strongly suggest you refer to the wiki links below:
Iterator - traverses a collection one at a time
Generator - generates a sequence, one item at a time
An iterator is typically something that has a next method to get the next element from a stream. A generator is an iterator that is tied to a function.
I also suggest you refer to: Difference between Python's Generators and Iterators.

Is looping through a generator in a loop over that same generator safe in Python?

From what I understand, a for x in a_generator: foo(x) loop in Python is roughly equivalent to this:
try:
    while True:
        foo(next(a_generator))
except StopIteration:
    pass
That suggests that something like this:
for outer_item in a_generator:
    if should_inner_loop(outer_item):
        for inner_item in a_generator:
            foo(inner_item)
            if stop_inner_loop(inner_item): break
    else:
        bar(outer_item)
would do two things:
Not raise any exceptions, segfault, or anything like that
Iterate over a_generator until it reaches some item for which should_inner_loop returns a truthy value, then loop over it in the inner for until stop_inner_loop returns true. Then the outer loop resumes where the inner one left off.
From my admittedly not very good tests, it seems to behave as above. However, I couldn't find anything in the spec guaranteeing that this behavior is consistent across interpreters. Is there anywhere that says or implies that I can be sure it will always be like this? Can it cause errors, or behave in some other way (i.e. do something other than what's described above)?
N.B. The code equivalent above is taken from my own experience; I don't know if it's actually accurate. That's why I'm asking.
TL;DR: it is safe with CPython (but I could not find any specification of this), although it may not do what you want to do.
First, let's talk about your first assumption, the equivalence.
A for loop actually calls first iter() on the object, then runs next() on its result, until it gets a StopIteration.
Here is the relevant bytecode (a low level form of Python, used by the interpreter itself):
>>> import dis
>>> def f():
...     for x in y:
...         print(x)
...
>>> dis.dis(f)
2 0 SETUP_LOOP 24 (to 27)
3 LOAD_GLOBAL 0 (y)
6 GET_ITER
>> 7 FOR_ITER 16 (to 26)
10 STORE_FAST 0 (x)
3 13 LOAD_GLOBAL 1 (print)
16 LOAD_FAST 0 (x)
19 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
22 POP_TOP
23 JUMP_ABSOLUTE 7
>> 26 POP_BLOCK
>> 27 LOAD_CONST 0 (None)
30 RETURN_VALUE
GET_ITER calls iter(y) (which itself calls y.__iter__()) and pushes its result on the stack (think of it as a bunch of local unnamed variables), then enters the loop at FOR_ITER, which calls next(<iterator>) (which itself calls <iterator>.__next__()), then executes the code inside the loop, and the JUMP_ABSOLUTE makes execution come back to FOR_ITER.
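In Python terms, that bytecode corresponds roughly to the following sketch (note that iter() is called exactly once, before the loop, which is the detail missing from the expansion in the question):

_iterator = iter(y)            # GET_ITER
while True:
    try:
        x = next(_iterator)    # FOR_ITER
    except StopIteration:
        break
    print(x)                   # the loop body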
Now, for the safety:
Here are the methods of a generator: https://hg.python.org/cpython/file/101404/Objects/genobject.c#l589
As you can see at line 617, the implementation of __iter__() is PyObject_SelfIter, whose implementation you can find here. PyObject_SelfIter simply returns the object (i.e. the generator) itself.
So, when you nest the two loops, both iterate on the same iterator.
And, as you said, they are just calling next() on it, so it's safe.
But be cautious: the inner loop will consume items that will not be consumed by the outer loop.
Even if that is what you want to do, it may not be very readable.
If that is not what you want to do, consider itertools.tee(), which buffers the output of an iterator, allowing you to iterate over its output twice (or more). This is only efficient if the tee iterators stay close to each other in the output stream; if one tee iterator will be fully exhausted before the other is used, it's better to just call list on the iterator to materialize a list out of it.
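A small sketch of the tee() approach (my example, not the asker's code): the two iterators advance independently, so consuming one does not steal items from the other.

import itertools

a_generator = (n * n for n in range(5))
outer, inner = itertools.tee(a_generator)

for outer_item in outer:
    # `inner` is an independent iterator over the same values;
    # advancing it here does not consume items from `outer`.
    inner_item = next(inner, None)
    print(outer_item, inner_item)   # prints 0 0, 1 1, 4 4, 9 9, 16 16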
No, it's not safe (as in, we won't get the outcome that we might have expected).
Consider this:
a = (_ for _ in range(20))
for num in a:
    print(num)
Of course, we will get 0 to 19 printed.
Now let's add a bit of code:
a = (_ for _ in range(20))
for num in a:
    for another_num in a:
        pass
    print(num)
The only thing that will be printed is 0.
By the time that we get to the second iteration of the outer loop, the generator will already be exhausted by the inner loop.
We can also do this:
a = (_ for _ in range(20))
for num in a:
    for another_num in a:
        print(another_num)
If it was safe we would expect to get 0 to 19 printed 20 times, but we actually get it printed only once, for the same reason I mentioned above.
It's not really an answer to your question, but I would recommend not doing this, because the code isn't readable. It took me a while to see that you were iterating over the same generator twice, even though that's the entire point of your question. Don't make a future reader get confused by this. When I see a nested loop, I'm not expecting what you've done, and my brain has trouble seeing it.
I would do it like this:
def generator_with_state(y):
    state = 0
    for x in y:
        if isinstance(x, special_thing):
            state = 1
            continue
        elif state == 1 and isinstance(x, signal):
            state = 0
        yield x, state
for x, state in generator_with_state(y):
    if state == 1:
        foo(x)
    else:
        bar(x)

What's the point of the iter() built-in?

With iter(), I can do this:
>>> listWalker = iter ( [23, 47, 'hike'] )
>>> for x in listWalker: print x,
But I could do this anyway:
>>> listWalker = [23, 47, 'hike']
>>> for x in listWalker: print x,
What value does it add?
In addition to using iter to explicitly get an iterator for an object that implements the __iter__ method, there is the lesser-known two-argument form of iter, which makes an iterator which repeatedly calls a function until it returns a given sentinel value.
for line in iter(f.readline, 'EOF'):
    print line
The preceding code would call f.readline (for, say, an open file handle f) until it reads a line consisting of the string EOF. It's roughly the same as writing
for line in f:
if line == "EOF":
break
print line
Additionally, an iterator may be a distinct object from the object it iterates over. This is true for the list type. That means you can create two iterators, both of which iterate independently over the same object.
itr1 = iter(mylist)
itr2 = iter(mylist)
x = next(itr1) # First item of mylist
y = next(itr1) # Second item of mylist
z = next(itr2) # First item of mylist, not the third
File handles, however, act as their own iterator:
>>> f = open('.bashrc')
>>> id(f)
4454569712
>>> id(iter(f))
4454569712
In general, the object returned by iter depends on the __iter__ method implemented by the object's type.
The point of iter is that it allows you to obtain the iterator from an iterable object and use it yourself, either to implement your own variant of the for loop, or to maintain the state of the iteration across multiple loops. A trivial example:
it = iter(['HEADER', 0, 1, 2, 3]) # coming from CSV or such
title = it.next()
for item in it:
    # process item
    ...
A more advanced usage of iter is provided by this grouping idiom:
def in_groups(iterable, n):
    """Yield elements from iterable grouped in tuples of size n."""
    it = iter(iterable)
    iters = [it] * n
    return zip(*iters)
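A quick usage sketch; the grouping works because all n entries in iters are the same iterator object, so each output tuple pulls n consecutive items from it:

>>> list(in_groups(range(9), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]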
When you do a for loop over a variable, it in fact implicitly calls the __iter__ method of the iterable you passed.
So you're always using iter() in some way when you're looping over lists, tuples... and any other iterable.
I think this extract of bytecode can convince you:
>>> def a():
...     for x in [1,2,3]:
...         print x
...
>>> import dis
>>> dis.dis(a)
2 0 SETUP_LOOP 28 (to 31)
3 LOAD_CONST 1 (1)
6 LOAD_CONST 2 (2)
9 LOAD_CONST 3 (3)
12 BUILD_LIST 3
15 GET_ITER # <--- get iter is important here
>> 16 FOR_ITER 11 (to 30)
19 STORE_FAST 0 (x)
3 22 LOAD_FAST 0 (x)
25 PRINT_ITEM
26 PRINT_NEWLINE
27 JUMP_ABSOLUTE 16
>> 30 POP_BLOCK
>> 31 LOAD_CONST 0 (None)
34 RETURN_VALUE
But iterators also let you do other things in Python, such as using next() to step through an iterable by hand, or dealing with the StopIteration exception yourself. That can be useful if you're working with different object types and want to apply a generic algorithm.
From the docs:
iter(o[, sentinel])
[...] Without a second argument, o must be a collection object which supports the
iteration protocol (the __iter__() method), or it must support the
sequence protocol (the __getitem__() method with integer arguments
starting at 0). If it does not support either of those protocols,
TypeError is raised. [...]
So it constructs an iterator from an object.
As you say, this is done automatically in loops and comprehensions, but sometimes you want to get an iterator and handle it directly. Just keep it in the back of your mind until you need it.
When using the second argument:
If the second argument, sentinel, is given, then o must be a callable object.
The iterator created in this case will call o with no arguments for each call
to its next() method; if the value returned is equal to sentinel,
StopIteration will be raised, otherwise the value will be returned.
This is useful for many things but particularly so for legacy style functions like file.read(bufsize) which has to be called repeatedly until it returns "". That can be converted to an iterator with iter(lambda : file.read(bufsize), ""). Nice and clean!
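A sketch of that pattern (the file name and the process_chunk consumer are placeholders of mine); note that in binary mode the end-of-file sentinel is b'' rather than "":

from functools import partial

bufsize = 4096
with open('data.bin', 'rb') as f:
    # iter() keeps calling the partial until it returns b'' at end of file.
    for chunk in iter(partial(f.read, bufsize), b''):
        process_chunk(chunk)   # placeholder consumer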
