I am trying to learn how to use itertools.groupby in Python and I wanted to find the size of each group of characters. At first I tried to see if I could find the length of a single group:
from itertools import groupby
len(list(list( groupby("cccccaaaaatttttsssssss") )[0][1]))
and I would get 0 every time.
I did a little research and found out that other people were doing it this way:
from itertools import groupby
for key,grouper in groupby("cccccaaaaatttttsssssss"):
    print key,len(list(grouper))
Which works great. What I am confused about is why does the latter code work, but the former does not? If I wanted to get only the nth group like I was trying to do in my original code, how would I do that?
The reason that your first approach doesn't work is that the groups get "consumed" when you create that list with
list(groupby("cccccaaaaatttttsssssss"))
To quote from the groupby docs:
The returned group is itself an iterator that shares the underlying
iterable with groupby(). Because the source is shared, when the
groupby() object is advanced, the previous group is no longer
visible.
Let's break it down into stages.
from itertools import groupby
a = list(groupby("cccccaaaaatttttsssssss"))
print(a)
b = a[0][1]
print(b)
print('So far, so good')
print(list(b))
print('What?!')
output
[('c', <itertools._grouper object at 0xb715104c>), ('a', <itertools._grouper object at 0xb715108c>), ('t', <itertools._grouper object at 0xb71510cc>), ('s', <itertools._grouper object at 0xb715110c>)]
<itertools._grouper object at 0xb715104c>
So far, so good
[]
What?!
Our itertools._grouper object at 0xb715104c is empty because it shares its contents with the "parent" iterator returned by groupby, and those items are now gone because that first list call iterated over the parent.
It's really no different from what happens if you try to iterate twice over any iterator, e.g. a simple generator expression.
g = (c for c in 'python')
print(list(g))
print(list(g))
output
['p', 'y', 't', 'h', 'o', 'n']
[]
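One way to make the first approach work, sketched here on the assumption that holding every group in memory is acceptable, is to copy each group into a list while groupby is still positioned on it. The names groups, first_key and first_items are only illustrative.

```python
from itertools import groupby

# Copy each group into a list before groupby advances to the next one.
groups = [(key, list(group)) for key, group in groupby("cccccaaaaatttttsssssss")]

first_key, first_items = groups[0]
print(first_key, len(first_items))  # c 5
```

After this, groups can be indexed freely because it holds plain lists rather than shared iterators.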
BTW, here's another way to get the length of a groupby group if you don't actually need its contents; it's a little cheaper (and uses less RAM) than building a list just to find its length.
from itertools import groupby
for k, g in groupby("cccccaaaaatttttsssssss"):
    print(k, sum(1 for _ in g))
output
c 5
a 5
t 5
s 7
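The question also asks how to get only the nth group. A minimal sketch, assuming 0-based numbering and using a hypothetical helper named nth_group (not part of itertools): skip ahead with islice, then copy the group into a list right away, before the groupby object can advance again.

```python
from itertools import groupby, islice

def nth_group(iterable, n):
    """Return (key, items) for the n-th group (0-based), or None if there are fewer groups."""
    # islice skips the first n groups; the loop body runs at most once.
    for key, group in islice(groupby(iterable), n, None):
        # Materialize the group immediately, while groupby is still positioned on it.
        return key, list(group)
    return None

key, items = nth_group("cccccaaaaatttttsssssss", 3)
print(key, len(items))  # s 7
```

Returning None for a missing group is just one possible design; raising IndexError would mirror list indexing more closely.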
Related
import itertools
x = 'aaaabbbccd'
new = list(itertools.groupby(x))
[print(i) for i in new]
for i in new:
    print(i)
The result of the list comprehension is something like:
('a', <itertools._grouper object at 0x0000014163062EB0>)
('b', <itertools._grouper object at 0x0000014163062FD0>)
('c', <itertools._grouper object at 0x0000014163062F70>)
('d', <itertools._grouper object at 0x0000014162991BB0>)
[None, None, None, None]
Whereas the result of the normal for loop is:
('a', <itertools._grouper object at 0x0000014163062EB0>)
('b', <itertools._grouper object at 0x0000014163062FD0>)
('c', <itertools._grouper object at 0x0000014163062F70>)
('d', <itertools._grouper object at 0x0000014162991BB0>)
Why do I get the extra [None, None, None, None] in case of list comprehension?
A list comprehension is meant for building a list. Here you are not building one: you only print each value and never store the result, so a plain for loop is the better choice.
The reason you get None is that print() returns None. The list comprehension calls print(i) once per item, which produces the printed lines, and collects the return values into a list, giving [None, None, None, None].
In an interactive session the value of that expression is then echoed back, which is why the list of None values appears after the printed output.
So, do not use a list comprehension unless you actually want the list it builds.
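A small sketch of that behaviour, with returned and collected as illustrative names: print() hands back None, and a comprehension built around it simply collects those None values.

```python
# print() writes its argument to stdout and returns None.
returned = print("hello")

# The comprehension stores each of those None return values.
collected = [print(ch) for ch in "ab"]

# A plain loop prints the same lines without building a throwaway list.
for ch in "ab":
    print(ch)
```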
When you evaluate an expression in the interactive interpreter, its result is echoed back. Your list comprehension evaluated to [None, None, None, None], because print() returns None, and therefore that list was displayed.
>>> 1+1
2
>>> 2
2
>>> [1,2,3]
[1, 2, 3]
>>> [None for i in new]
[None, None, None, None]
>>>
The reason is that a list comprehension always builds a list. Since print() gives it nothing to store other than None, the list is filled with None, whereas a normal for loop builds no list at all.
Generally speaking, list comprehensions tend to be more efficient than an equivalent for loop, both computationally and in the amount of code you have to write. Typically, they fit on a single line. Read this article.
As demonstrated in the article, which I'm quoting here:
import timeit
def squares(size):
    result = []
    for number in range(size):
        result.append(number*number)
    return result
def squares_comprehension(size):
    return [number*number for number in range(size)]
print(f""" Timed using for loop: {timeit.timeit("squares(50)", "from __main__ import squares", number = 1_000_000)}""")
print(f""" Timed using for List Comprehension: {timeit.timeit("squares_comprehension(50)", "from __main__ import squares_comprehension", number = 1_000_000)}""")
output:
Timed using for loop: 6.206269975002215
Timed using for List Comprehension: 4.1636438860005
List comprehensions are often not only more readable but also faster than using “for loops.” They can simplify your code, but if you put too much logic inside, they will instead become harder to read and understand.
Even though list comprehensions are popular in Python, they have a specific use case: when you want to perform some operations on a list and return another list. And they have limitations - you can’t break out of a list comprehension or put comments inside. In many cases, “for loops” will be your only choice. read this article
Also, faster is not always the case! If each iteration calls a computationally expensive function, the list comprehension and the for loop may take almost the same time. An awesome read.
Consider this:
>>> res = [list(g) for k,g in itertools.groupby('abbba')]
>>> res
[['a'], ['b', 'b', 'b'], ['a']]
and then this:
>>> res = [g for k,g in itertools.groupby('abbba')]
>>> list(res[0])
[]
I'm baffled by this. Why do they return different results?
This is expected behavior. The documentation is pretty clear that the iterator for the grouper is shared with the groupby iterator:
The returned group is itself an iterator that shares the underlying
iterable with groupby(). Because the source is shared, when the
groupby() object is advanced, the previous group is no longer visible.
So, if that data is needed later, it should be stored as a list...
The reason you are getting empty lists is that the iterator is already consumed by the time you are trying to iterate over it.
import itertools
res = [g for k,g in itertools.groupby('abbba')]
next(res[0])
# Raises StopIteration:
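To watch the sharing happen step by step, here is a rough sketch that advances the groupby object by hand with next(). The variable names are illustrative only.

```python
import itertools

gb = itertools.groupby('abbba')

k1, g1 = next(gb)
first = list(g1)      # read while groupby is still on this group

k2, g2 = next(gb)     # second group, deliberately not read yet
k3, g3 = next(gb)     # advancing again leaves g2 behind
late = list(g2)       # too late: g2 yields nothing now

print(first)  # ['a']
print(late)   # []
```

The list comprehension in the question does the same thing as the second half of this sketch: it advances groupby all the way to the end before any group is read.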
I am trying to pass a list of hex characters into reduce with a lambda function to calculate the total decimal value. I am not sure what I am doing wrong, but the Python interpreter doesn't seem to recognize list(enumerate(reversed(numList))) as a list of tuples.
numList = ['3', '0', 'e', 'f', 'e', '1']
reduce(lambda sum,(up,x):sum+ int(x,16)*16**up,
enumerate(reversed(numList)))
When I print out
list(enumerate(reversed(numList)))
it is a list of tuples:
[(0, '1'), (1, 'e'), (2, 'f'), (3, 'e'), (4, '0'), (5, '3')]
But it spits out an error: can only concatenate tuple (not "int") to tuple
UPDATE:
The code now works with a minor addition: ",0" passed as a third argument to reduce
reduce(lambda sum,(up,x):sum+ int(x,16)*16**up,
list(enumerate(reversed(numList))),0)
I don't understand what that means. Also I am not sure what is the best way to approach this.
That means you make sure that it starts with 0 instead of the first element, in this case (0,'1'), because otherwise the types don't match? – am2
The third argument you added is the initializer. Without it, sum in the first iteration will be (0,'1'), so you were trying to evaluate (0,'1')+int(x,16)*16**up, which is invalid. – ymonad
UPDATE 2:
reduce(lambda sum,(up,x):sum+ int(x,16)*16**up,enumerate(reversed(numList)),0)
is just as good: enumerate() already returns an iterator, so wrapping it in list(...) is redundant.
Marked it as solved.
You don't need to use the generic reduce function when all you really need is to calculate the sum.
This works and is vastly simpler:
sum( int(x,16)*16**up for up,x in enumerate(reversed(numList)) )
Also, I'm going to guess you already know you can do the exact same thing like this:
int(''.join(numList), 16)
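The lambda in the question unpacks a tuple in its parameter list, which Python 3 no longer allows. A rough Python 3 equivalent, with num_list, acc and pair as illustrative names, shows the role of the initializer and checks that all three spellings agree.

```python
from functools import reduce

num_list = ['3', '0', 'e', 'f', 'e', '1']

# pair is (position, hex_digit); acc starts at the initializer 0, so it is always an int.
total = reduce(
    lambda acc, pair: acc + int(pair[1], 16) * 16 ** pair[0],
    enumerate(reversed(num_list)),
    0,
)

# Both simpler spellings from this answer should agree with the reduce version.
assert total == sum(int(x, 16) * 16 ** up for up, x in enumerate(reversed(num_list)))
assert total == int(''.join(num_list), 16)
print(total)
```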
I'm trying to find all combinations of A,B repeated 3 times.
Once I've done this I would like to count how many A's there are in a row, by splitting the string and taking the length of the maximum piece. However, this is going crazy on me. I must have misunderstood len(max(tmp.split("B"))).
Can anyone explain what this really does (len returns the length of the string, and max returns the highest integer of that string, based on my split?) I expect it to return the number of A's in a row. "A,B,A" should return 1 even though there are two A's.
Suggestions and clarifications would be sincerely welcome
import itertools
list = list(itertools.product(["A", "B"], repeat=3))
count = 0;
for i in list:
    count += 1;
    tmp = str(i);
    var = len(max(tmp.split("B")))
    print(count, i, var)
You can use itertools.groupby to find groups of identical elements in an iterable. groupby generates a sequence of (key, group) tuples, where key is the value of the elements in the group, and group is an iterator over that group (which shares the underlying iterable with groupby). To get the length of the group, we need to convert it to a list.
from itertools import product, groupby
for t in product("AB", repeat=3):
    a = max([len(list(g)) for k, g in groupby(t) if k == "A"] or [0])
    print(t, a)
output
('A', 'A', 'A') 3
('A', 'A', 'B') 2
('A', 'B', 'A') 1
('A', 'B', 'B') 1
('B', 'A', 'A') 2
('B', 'A', 'B') 1
('B', 'B', 'A') 1
('B', 'B', 'B') 0
We need to append or [0] to the list comprehension to cover the situation where no "A"s are found, otherwise max complains that we're trying to find the maximum of an empty sequence.
Update
Padraic Cunningham reminded me that the Python 3 version of max accepts a default arg to handle the situation when you pass it an empty iterable. He also shows another way to calculate the length of an iterable that is a bit nicer since it avoids capturing the iterable into a list, so it's a bit faster and consumes less RAM, which can be handy when working with large iterables. So we can rewrite the above code as
from itertools import product, groupby
for t in product("AB", repeat=3):
    a = max((sum(1 for _ in g) for k, g in groupby(t) if k == "A"), default=0)
    print(t, a)
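As for why the split-based attempt in the question misbehaves: as far as I can tell there are two problems. str(i) on a tuple gives its repr, e.g. "('A', 'B', 'A')", quotes and commas included, and max on strings compares them alphabetically rather than by length. A sketch of the same idea with both points addressed follows; longest_run_of_a is a hypothetical helper name.

```python
from itertools import product

def longest_run_of_a(letters):
    # Join the tuple into a plain string such as "ABA" (str(t) would give its repr instead).
    text = "".join(letters)
    # Splitting on "B" leaves the runs of "A"; key=len picks the longest piece,
    # not the alphabetically largest one.
    return len(max(text.split("B"), key=len))

for t in product("AB", repeat=3):
    print(t, longest_run_of_a(t))
```

This only works because there are exactly two symbols; the groupby version above generalizes to any alphabet.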
I would have expected these two pieces of code to produce the same results
from itertools import groupby
for i in list(groupby('aaaabb')):
    print i[0], list(i[1])
for i, j in groupby('aaaabb'):
    print i, list(j)
In one I convert the iterator returned by groupby to a list and iterate over that, and in the other I iterate over the returned iterator directly.
The output of this script is
a []
b ['b']
a ['a', 'a', 'a', 'a']
b ['b', 'b']
Why is this the case?
Edit: for reference, the result of groupby('aabbaa') looks like
('a', <itertools._grouper object at 0x10c1324d0>)
('b', <itertools._grouper object at 0x10c132250>)
This is a quirk of the groupby function, presumably for performance.
From the itertools.groupby documentation:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list:
groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
    groups.append(list(g))      # Store group iterator as a list
    uniquekeys.append(k)
So, you can do this:
for i in [(x, list(y)) for x, y in groupby('aabbaa')]:
    print i[0], i[1]
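If building the whole list up front is undesirable, a lazy variant is possible. This is only a sketch written for Python 3, and materialized_groups is a hypothetical helper name: it copies one group at a time, so each group is read before groupby moves on.

```python
from itertools import groupby

def materialized_groups(iterable):
    # Yield (key, list) pairs one at a time; each group is copied before groupby advances.
    for key, group in groupby(iterable):
        yield key, list(group)

for key, items in materialized_groups('aaaabb'):
    print(key, items)
```

Calling list() on this generator is safe, since every group has already been turned into an independent list by the time it is yielded.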