Python iterator and zip

With x = [1,2,3,4], I can get an iterator from i = iter(x).
With this iterator, I can use the zip function to create tuples of two items:
>>> i = iter(x)
>>> zip(i,i)
[(1, 2), (3, 4)]
I can even use this syntax to get the same result:
>>> zip(*[i] * 2)
[(1, 2), (3, 4)]
How does this work? How do zip(i, i) and zip(*[i] * 2) work with a single iterator?

An iterator is like a stream of items. You can only look at the items in the stream one at a time and you only ever have access to the first element. To look at something in the stream, you need to remove it from the stream and once you take something from the top of the stream, it's gone from the stream for good.
When you call zip(i, i), zip first looks at the first stream and takes an item out. Then it looks at the second stream (which happens to be the same stream as the first one) and takes an item out. Then it makes a tuple out of those two items and repeats this over and over until there is nothing left in the stream.
Maybe it's easier to see if I were to write the zip function in pure Python (with only 2 arguments for simplicity). It would look something like this [1]:
def zip(a, b):
    out = []
    try:
        while True:
            item1 = next(a)
            item2 = next(b)
            out.append((item1, item2))
    except StopIteration:
        return out
Now imagine the case that you are talking about where a and b are the same object. In that case, we just call next twice on the iterator (i in your example case) which will just take the first two items from i in sequence and pack them into a tuple.
Once we've understood why zip(i, i) behaves the way it does, zip(*([i] * 2)) isn't too hard. Let's read the expression from the inside out...
[i] * 2
That just creates a new list (of length 2) where both of the elements are references to the iterator i. So it's the same thing as zip(*[i, i]) (it's just more convenient to write when you want to repeat something many more than 2 times). * unpacking is a common idiom in Python and you can find more information in the Python tutorial. The gist of it is that Python takes the iterable and "unpacks" it as if each item of the iterable were a separate positional argument to the function. So:
zip(*[i, i])
does the same thing as:
zip(i, i)
And now Bob's our uncle. We've just come full-circle since zip(i, i) is where this discussion started.
[1] This example code is simplified beyond just the afore-mentioned restriction to 2 arguments. For example, the real zip is going to call iter on its input arguments so that it works for any iterable (not just iterators), but this should be enough to get the point across...

Every time you get an item from an iterator, it stays at that spot rather than "rewinding." So zip(i, i) gets the first item from i, then the second item from i, and returns that as a tuple. It continues to do this for each available pair, until the iterator is exhausted.
zip(*[i]*2) creates the list [i, i] by repeating the one-element list [i] twice, then unpacks it with the * at the far left, which, in effect, sends two arguments i and i to zip, producing the same result as the first snippet.
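To see the idiom in action, here is a short demonstration (note that in Python 3 zip returns a lazy iterator, so it is wrapped in list() below; the question's output shows Python 2, where zip returned a list directly):
x = [1, 2, 3, 4, 5, 6]
i = iter(x)
# Both arguments are the same iterator, so zip consumes items in pairs.
print(list(zip(i, i)))      # [(1, 2), (3, 4), (5, 6)]
# The same idiom generalizes to chunks of any size:
i = iter(x)
print(list(zip(*[i] * 3)))  # [(1, 2, 3), (4, 5, 6)]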

Related

double list in return statement. need explanation in python

So I was trying to complete this kata on Codewars and I ran across an interesting solution. The kata states:
"Given an array of integers, find the one that appears an odd number of times.
There will always be only one integer that appears an odd number of times."
and one of the solutions for it was:
def find_it(seq):
    return [x for x in seq if seq.count(x) % 2][0]
My question is: why is there a [0] at the end of the statement? I tried playing around with it and putting [1] instead, and when testing it passed some tests but not others, with no obvious pattern.
Any explanation will be greatly appreciated.
The first brackets are a list comprehension, the second is indexing the resulting list. It's equivalent to:
def find_it(seq):
    thelist = [x for x in seq if seq.count(x) % 2]
    return thelist[0]
The code is actually pretty inefficient, because it builds the whole list just to get the first value that passed the test. It could be implemented much more efficiently with next + a generator expression (like a listcomp, but lazy, with the values produced exactly once, and only on demand):
def find_it(seq):
    return next(x for x in seq if seq.count(x) % 2)
which would behave the same, with one difference: if no value passed the test, the original code would raise IndexError, while the new code would raise StopIteration. It also operates more efficiently, stopping the search the instant a value passes the test.
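As a side note (my own addition, not part of the answer above), next() also accepts a default as a second argument, which avoids the exception entirely when no value qualifies:
def find_it(seq):
    # Returns None instead of raising if no value passes the test
    return next((x for x in seq if seq.count(x) % 2), None)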
Really, you should just give up on using the .count method and count all the elements in a single pass, which is truly O(n). count-based solutions can't be, because count itself is O(n) and must be called a number of times roughly proportionate to the input size; even if you dedupe, in the worst case all elements appear twice and you have to call count n / 2 times:
from collections import Counter

def find_it(it):
    # Counter(it) counts all items of any iterable, not just sequences,
    # in a single pass, and since 3.6 it's insertion-order preserving,
    # so you can just iterate the items of the result and find the first
    # hit cheaply
    return next(x for x, cnt in Counter(it).items() if cnt % 2)
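As a quick sanity check (the sample list here is my own), any of the versions above returns the lone odd-count value:
print(find_it([20, 1, 1, 2, 2]))  # 20 -- the only value with an odd count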
That list comprehension yields a sequence of values that occur an odd number of times. The first value of that sequence will occur an odd number of times. Therefore, getting the first value of that sequence (via [0]) gets you a value that occurs an odd number of times.
Happy coding!
The code [x for x in seq if seq.count(x) % 2] returns a list holding the value that appears in the input list an odd number of times.
So, to make the output a number rather than a list, the author indexes position 0, returning the first element of that list.
There is a nice other answer here by ShadowRanger, so I won't duplicate it; what follows is partly just another phrasing of the same idea.
The expression [some_content][0] is not a double list. It is a way to get an element out of a list by indexing. The second pair of brackets is the syntax for choosing an element of a list by its index (i.e. its position number in the list, which in Python begins with zero and not, as sometimes intuitively expected, with one). So [0] addresses the first element of the list to its left.
['this', 'is', 'a', 'list'][0] <-- the [0] is the index of 'this' in the list
print( ['this', 'is', 'a', 'list'][0] )
will print
this
to the stdout.
The intention of the function you are showing in your question is to return a single value and not a list.
So to get the single value out of the list built by the list comprehension, the index [0] is used. The indexing takes the result back out of the one-element list [result], since
[result][0] == result.
The same function could also be written using a loop as follows:
def find_it(seq):
    for x in seq:
        if seq.count(x) % 2 != 0:
            return x
but using a list comprehension instead of a loop is usually faster in Python. That is why it sometimes makes sense to use a list comprehension and then unpack the found value(s) out of the list: in most cases it will be faster than an equivalent loop, but ... not in this special case, where it actually slows things down, as already mentioned by ShadowRanger.
Note that the list comprehension keeps one copy of the odd-count value per occurrence: for seq = [1, 1, 1, 2, 2] it produces [1, 1, 1]. That explains why [1] passed some tests but not others: it works whenever the lone odd-count value occurs three or more times, but raises IndexError when it occurs exactly once.
What you saw in the question's function is a failed attempt to make it more efficient by using a list comprehension instead of a loop. An actual improvement can be achieved, but by using a generator expression and another way of counting, as shown in the answer by ShadowRanger:
from collections import Counter

def find_it(it):
    return next(x for x, cnt in Counter(it).items() if cnt % 2)

Why does functools.lru_cache break this function?

Consider the following function, which returns all the unique permutations of a set of elements:
def get_permutations(elements):
    if len(elements) == 0:
        yield ()
    else:
        unique_elements = set(elements)
        for first_element in unique_elements:
            remaining_elements = list(elements)
            remaining_elements.remove(first_element)
            for subpermutation in get_permutations(tuple(remaining_elements)):
                yield (first_element,) + subpermutation

for permutation in get_permutations((1, 1, 2)):
    print(permutation)
This prints
(1, 1, 2)
(1, 2, 1)
(2, 1, 1)
as expected. However, when I add the lru_cache decorator, which memoizes the function:
import functools

@functools.lru_cache(maxsize=None)
def get_permutations(elements):
    if len(elements) == 0:
        yield ()
    else:
        unique_elements = set(elements)
        for first_element in unique_elements:
            remaining_elements = list(elements)
            remaining_elements.remove(first_element)
            for subpermutation in get_permutations(tuple(remaining_elements)):
                yield (first_element,) + subpermutation

for permutation in get_permutations((1, 1, 2)):
    print(permutation)
it prints the following:
(1, 1, 2)
Why is it only printing the first permutation?
lru_cache memoizes the return value of your function. Your function returns a generator. Generators have state and can be exhausted (i.e., you come to the end of them and no more items are yielded). Unlike with the undecorated version of the function, the LRU cache gives you the exact same generator object each time the function is called with a given set of arguments. It had better, because that's what it's for!
But some of the generators you're caching are used more than once and are partially or completely exhausted when they are used the second and subsequent times. (They may even be "in play" more than once simultaneously.)
To explain the result you're getting, consider what happens when the length of elements is 0 and you yield ()... the first time. The next time this generator is used, it is already at the end and doesn't yield anything at all. Thus your subpermutation loop does nothing and nothing further is yielded. As this is the "bottoming out" case in your recursion, it is vital to the program working, and losing it breaks the program's ability to yield the values you expect.
The generator for (1,) is also used twice, and this breaks the third result before it even gets down to ().
To see what's happening, add a print(elements) as the first line in your function (and add some kind of marker to the print call in the main for loop, so you can tell the difference). Then compare the output of the memoized version vs the original.
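A minimal standalone demonstration of the exhaustion effect, separate from the permutation code:
import functools

@functools.lru_cache(maxsize=None)
def numbers():
    yield 1
    yield 2

print(list(numbers()))  # [1, 2] -- the first use runs the generator to its end
print(list(numbers()))  # []     -- the cache hands back the same, now-exhausted object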
It seems like you probably want some way to memoize the result of a generator. What you want to do in that case is write it as a function that returns a list with all the items (rather than yielding them one at a time) and memoize that.
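For instance, here is a sketch of that approach applied to the question's function (my own variant: it returns a tuple rather than a list, so the cached value is immutable and safe to share between callers):
import functools

@functools.lru_cache(maxsize=None)
def get_permutations(elements):
    if len(elements) == 0:
        return ((),)
    result = []
    for first_element in set(elements):
        remaining_elements = list(elements)
        remaining_elements.remove(first_element)
        for subpermutation in get_permutations(tuple(remaining_elements)):
            result.append((first_element,) + subpermutation)
    return tuple(result)

for permutation in get_permutations((1, 1, 2)):
    print(permutation)  # the three unique permutations, order may vary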

python error: generator object P at 0x02DAC198

I'm using Python 3.4.* and I am trying to execute the following code:
def P(n):
    if n == 0:
        yield []
        return
    for p in P(n-1):
        p.append(1)
        yield p
        p.pop()
        if p and (len(p) < 2 or p[-2] > p[-1]):
            p[-1] += 1
            yield p

print(P(5))     # this line doesn't make sense
for i in P(5):  # but this line does make sense thanks to furkle
    print(i)
but I am getting <generator object P at 0x02DAC198> rather than the output.
Can someone explain what in my code needs to be fixed? I don't think Python likes the function name P, but I could be wrong.
Edit: furkle clarified <generator object P at 0x02DAC198>.
By the way, I'm currently trying to write my own modified partition function and I was trying to understand this one corresponding to the classical setting.
I think you're misunderstanding the concept of a generator. A generator object is like a list, but you can iterate through its results lazily, without having to wait for the whole list to be constructed. Calling an operation like print on a generator will not perform that operation on every item the generator yields.
If you wanted to print all the output of P(5), you should write:
for i in P(5):
    print(i)
If you just want to print a list of the content returned by the generator, that largely seems to defeat the purpose of the generator.
Many things are wrong with this code, and your understanding of how generators work and what they are used for.
First, with respect to your print statement, that is exactly what it should print. Generators are never implicitly expanded, because there is no guarantee that a generator will ever terminate. It's perfectly valid, and sometimes very desirable, to construct a generator that produces an endless sequence. To get what you want (which I assume is output similar to a list), you'd do:
print(list(P(5)))
But that brings me to my second point: generators yield values in a sequence (in 99% of uses, unless you're using one as a coroutine). You are trying to use your generator to construct a list. If your goal is to construct a generator that makes a list of 1's of a given length, it should look like this:
def P(n):
    while n > 0:
        yield 1
        n -= 1
This will produce a sequence of 1's of length n. To get the list form, you'd do list(P(n)).
I suggest you have another read over the Generator Documentation and get a better feel for them and see if they're really the right tool for the job.
In reading the function, I try to find what the call will produce. Let's start with the full original code:
def P(n):
    if n == 0:
        yield []
        return
    for p in P(n-1):
        p.append(1)
        yield p
        p.pop()
        if p and (len(p) < 2 or p[-2] > p[-1]):
            p[-1] += 1
            yield p

print(P(5))  # this line doesn't make sense
Okay, so it calls P(5). Since that's not 0, P recurses until we reach P(0), which yields an empty list. That's the first time p receives a value. Then P(1) appends 1 to that list and yields it to P(2), which repeats the process... and so on. It is all the same list, originally created by P(0), eventually yielded out as [1,1,1,1,1] by P(5) - but then the magic happens. Let's call this first list l0.
When you ask the generator for the second item, control returns to P(5) which now removes a value from l0. Depending on a bunch of conditions, it may increment the last value and yield p, which is l0, again. So the first item we received has been changing while we asked for the second. This will eventually terminate, but means there's a difference between these two:
print(list(P(5)))  # Eventually prints a list of l0 which has been emptied!
for item in P(5):
    print(item)    # Prints l0 at each point it was yielded
In [225]: for i in P(5): print(i)
[1, 1, 1, 1, 1]
[2, 1, 1, 1]
[2, 2, 1]
[3, 1, 1]
[3, 2]
[4, 1]
[5]
In [226]: list(P(5))
Out[226]: [[], [], [], [], [], [], []]
This is why I called it post-modifying; the values it returns keep changing after they've been produced (since they are in fact the same object being manipulated).
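If you do want list(P(5)) to behave like the loop, one possible fix (my own addition, not from the original discussion) is to yield a snapshot of the list at each yield point instead of the shared object:
def P(n):
    if n == 0:
        yield []
        return
    for p in P(n - 1):
        p.append(1)
        yield list(p)  # yield a copy, not the shared list
        p.pop()
        if p and (len(p) < 2 or p[-2] > p[-1]):
            p[-1] += 1
            yield list(p)

print(list(P(5)))  # [[1, 1, 1, 1, 1], [2, 1, 1, 1], ...] rather than empty lists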

Explain the use of yields in this Game of Life implementation

In this PyCon talk, Jack Diederich shows this "simple" implementation of Conway's Game of Life. I am not intimately familiar with either GoL or semi-advanced Python, but the code seems quite easy to grasp, if not for two things:
The use of yield. I have seen the use of yield to create generators before, but eight of them in a row is new... Does it return a list of eight generators, or how does this thing work?
set(itertools.chain(*map(neighbors, board))). The star unpacks the resulting list (?) from applying neighbours to board, and ... my mind just blew.
Could someone try to explain these two parts for a programmer that is used to hacking together some python code using map, filter and reduce, but that is not using Python on a daily basis? :-)
import itertools

def neighbors(point):
    x, y = point
    yield x + 1, y
    yield x - 1, y
    yield x, y + 1
    yield x, y - 1
    yield x + 1, y + 1
    yield x + 1, y - 1
    yield x - 1, y + 1
    yield x - 1, y - 1

def advance(board):
    newstate = set()
    recalc = board | set(itertools.chain(*map(neighbors, board)))
    for point in recalc:
        count = sum((neigh in board) for neigh in neighbors(point))
        if count == 3 or (count == 2 and point in board):
            newstate.add(point)
    return newstate

glider = set([(0,0), (1,0), (2, 0), (0,1), (1,2)])
for i in range(1000):
    glider = advance(glider)
    print glider
Generators operate on two principles: they produce a value each time a yield statement is encountered, and, unless they are being iterated over, their code is paused.
It doesn't matter how many yield statements are used in a generator; the code still runs in normal Python order. In this case, there is no loop, just a series of yield statements, so each time the generator is advanced, Python executes the next line, which is another yield statement.
What happens with the neighbors generator is this:
Generators always start paused, so calling neighbors(position) returns a generator that hasn't done anything yet.
When it is advanced (next() is called on it), the code is run until the first yield statement. First x, y = point is executed, then x + 1, y is calculated and yielded. The code pauses again.
When advanced again, the code runs until the next yield statement is encountered. It yields x - 1, y.
etc. until the function completes.
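For example, advancing the generator by hand (using the neighbors function from the question) makes the pausing visible:
gen = neighbors((0, 0))
print(next(gen))  # (1, 0)  -- runs up to the first yield, then pauses
print(next(gen))  # (-1, 0) -- resumes and runs to the second yield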
The set(itertools.chain(*map(neighbors, board))) line does:
map(neighbors, board) produces a generator for each and every position in the board sequence. It simply loops over board, calls neighbors on each value, and returns a new sequence of the results. Each neighbors() call returns a generator.
The *parameters syntax expands the parameter sequence into separate parameters, as if the function were called with each element of the sequence as a separate positional argument. param = [1, 2, 3]; foo(*param) would translate to foo(1, 2, 3).
itertools.chain(*map(..)) takes each and every generator produced by the map, and applies that as a series of positional parameters to itertools.chain(). Looping over the output of chain means that each and every generator for each and every board position is iterated over once, in order.
All the generated positions are added to a set, essentially removing duplicates
You could expand the code to:
positions = set()
for board_position in board:
    for neighbor in neighbors(board_position):
        positions.add(neighbor)
In python 3, that line could be expressed a little more efficiently still by using itertools.chain.from_iterable() instead, because map() in Python 3 is a generator too; .from_iterable() doesn't force the map() to be expanded and will instead loop over the map() results one by one as needed.
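As an illustration, a minimal sketch of the from_iterable variant (with a cut-down neighbors and a tiny board of my own, not the talk's code):
import itertools

def neighbors(point):
    x, y = point
    yield x + 1, y
    yield x - 1, y

board = {(0, 0), (5, 5)}
# from_iterable consumes the lazy map() one generator at a time,
# instead of unpacking all the generators into arguments up front
recalc = board | set(itertools.chain.from_iterable(map(neighbors, board)))
print(sorted(recalc))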
Wow, that's a neat implementation, thanks for posting it !
For the yield, there is nothing to add to Martijn's answer.
As for the star: map returns a list or a lazy iterator (in Python 2 and Python 3 respectively), and each item of that sequence is a generator (from neighbors), so we have a sequence of generators.
chain takes many iterable arguments and chains them, meaning it returns a single iterable that iterates over all of them in turn.
Because we have a list of generators, and chain takes many arguments, we use a star to convert the list of generators to arguments. We could have done the same with chain.from_iterable.
neighbors just yields each of a cell's neighbours, one at a time. If you understand what generators do, it is pretty clear that using them is good practice when working with a big amount of data: you do not need to store it all in memory, you calculate each value only when you need it.

parsing python flat and nested lists/tuples

I'm trying to parse a tuple of the form:
a=(1,2)
or
b=((1,2), (3,4)...)
where for a's case the code would be:
x, y = a
and b would be:
for element in b:
    x, y = element
Is there a fast and clean way to accept both forms? This is in a MIDI receive callback
(x is a pointer to a function to run, and y is intensity data to be passed to a light).
# If your input is in in_seq...
if hasattr(in_seq[0], "__iter__"):
    ...  # b case
else:
    ...  # a case
This basically checks to see if the first element of the input sequence is iterable. If it is, then it's your second case (since a tuple is iterable), if it's not, then it's your first case.
If you know for sure that the inputs will be tuples, then you could use this instead:
if isinstance(in_seq[0], tuple):
    ...  # b case
else:
    ...  # a case
Depending on what you want to do, your handling for the 'a' case could be as simple as bundling the single tuple inside a larger tuple and then calling the same code on it as the 'b' case, e.g...
b_case = (a_case,)
Edit: as pointed out in the comments, a better version might be...
from collections.abc import Iterable

if isinstance(in_seq[0], Iterable):
    # ...
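Putting it together, a sketch of a callback that accepts both shapes (the name handle_midi and its body are illustrative, not from the answer):
from collections.abc import Iterable

def handle_midi(data):
    # Wrap the flat form (x, y) so both shapes look like ((x, y), ...)
    if not isinstance(data[0], Iterable):
        data = (data,)
    for func, intensity in data:
        func(intensity)  # x is the function to run, y the intensity to pass it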
The right way to do that would be:
a = ((1,2),)  # note the difference
b = ((1,2), (3,4), ...)
for pointer, intensity in a:
    pass  # here you do what you want
