python groupby and list interaction - python

If we run the following code,
from itertools import groupby
s = '1223'
r = groupby(s)
x = list(r)
a = [list(g) for k, g in r]
print(a)
b =[list(g) for k, g in groupby(s)]
print(b)
then surprisingly the two output lines are DIFFERENT:
[]
[['1'], ['2', '2'], ['3']]
But if we remove the line "x=list(r)", then the two lines are the same, as expected. I don't understand why the list() function will change the groupby result.

The result of groupby, like many objects in the itertools library, is an iterator that can be iterated over only once. This is what allows lazy evaluation. Therefore, once you call something like list(r), r is exhausted.
When you iterate over the exhausted iterator again, of course the resulting list is empty. In your second case, you don't consume the iterator before you iterate over it, so it still yields every group.
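A minimal sketch of the same effect, reusing the string from the question. The names first_pass and second_pass are only illustrative; the point is that the second loop over the same groupby object finds nothing left:

```python
from itertools import groupby

s = '1223'

r = groupby(s)
# Materialize each group while r is being iterated for the first time.
first_pass = [(k, list(g)) for k, g in r]
# r is now exhausted, so a second pass over it yields nothing.
second_pass = [(k, list(g)) for k, g in r]

print(first_pass)   # [('1', ['1']), ('2', ['2', '2']), ('3', ['3'])]
print(second_pass)  # []
```

If the groups are needed more than once, keep the materialized list (first_pass here) or call groupby(s) again.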

Related

Unable to retrieve itertools iterators outside list comprehensions

Consider this:
>>> res = [list(g) for k,g in itertools.groupby('abbba')]
>>> res
[['a'], ['b', 'b', 'b'], ['a']]
and then this:
>>> res = [g for k,g in itertools.groupby('abbba')]
>>> list(res[0])
[]
I'm baffled by this. Why do they return different results?
This is expected behavior. The documentation is pretty clear that the iterator for the grouper is shared with the groupby iterator:
The returned group is itself an iterator that shares the underlying
iterable with groupby(). Because the source is shared, when the
groupby() object is advanced, the previous group is no longer visible.
So, if that data is needed later, it should be stored as a list...
The reason you are getting empty lists is that the iterator has already been consumed by the time you try to iterate over it.
import itertools
res = [g for k,g in itertools.groupby('abbba')]
next(res[0])
# Raises StopIteration:
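As a rough illustration of the difference, the sketch below stores the group iterators in one list and copies of the groups in another. The variable names lazy and eager are made up for this example, and the empty result for the first lazy group is what recent CPython versions produce:

```python
import itertools

# Lazy: only the group iterators are stored, while groupby keeps advancing.
lazy = [g for k, g in itertools.groupby('abbba')]
first_lazy = list(lazy[0])  # groupby moved past this group long ago

# Eager: each group is copied into a list before groupby moves on.
eager = [list(g) for k, g in itertools.groupby('abbba')]

print(first_lazy)  # []
print(eager)       # [['a'], ['b', 'b', 'b'], ['a']]
```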

itertools groupby object not outputting correctly

I was trying to use itertools.groupby to help me group a list of integers by positive or negative property, for example:
input
[1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3]
will return
[[1,2,3],[-1,-2,-3],[1,2,3],[-1,-2,-3]]
However if I:
import itertools
nums = [1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3]
group_list = list(itertools.groupby(nums, key=lambda x: x>=0))
print(group_list)
for k, v in group_list:
    print(list(v))
>>>
[]
[-3]
[]
[]
But if I don't list() the groupby object, it will work fine:
nums = [1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3]
group_list = itertools.groupby(nums, key=lambda x: x>=0)
for k, v in group_list:
    print(list(v))
>>>
[1, 2, 3]
[-1, -2, -3]
[1, 2, 3]
[-1, -2, -3]
What I don't understand is this: a groupby object is an iterator over pairs of a key and a _grouper object, so shouldn't calling list() on the groupby object leave the _grouper objects unconsumed?
And even if it did consume them, how did I get [-3] from the second element?
Per the docs, it is explicitly noted that advancing the groupby object renders the previous group unusable (in practice, empty):
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.
Basically, instead of list-ifying directly with the list constructor, you'd need a listcomp that converts from group iterators to lists before advancing the groupby object, replacing:
group_list = list(itertools.groupby(nums, key=lambda x: x>=0))
with:
group_list = [(k, list(g)) for k, g in itertools.groupby(nums, key=lambda x: x>=0)]
Most itertools types are designed to avoid storing data implicitly, because they are meant to be used with potentially huge inputs. If all the groupers stored copies of all the data from the input (and the groupby object had to be sure to retroactively populate them), it would get ugly, and could blow memory by accident. By forcing you to store the values explicitly, the design keeps you from holding unbounded amounts of data unintentionally, per the Zen of Python:
Explicit is better than implicit.
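For the sign-grouping task in the question, the fix could be wrapped up roughly as follows. The helper name split_by_sign is hypothetical, and it keeps only the groups and discards the keys:

```python
import itertools

def split_by_sign(nums):
    """Return consecutive runs of non-negative / negative numbers as lists."""
    # list(group) copies each run before groupby advances to the next one.
    return [list(group)
            for _, group in itertools.groupby(nums, key=lambda x: x >= 0)]

nums = [1, 2, 3, -1, -2, -3, 1, 2, 3, -1, -2, -3]
runs = split_by_sign(nums)
print(runs)  # [[1, 2, 3], [-1, -2, -3], [1, 2, 3], [-1, -2, -3]]
```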

python: flat list of dict values

I have a list of dicts like so:
a = [ {'list':[1,2,3]}, {'list':[1,4,5]} ]
I am trying to get a flat set of the values under the 'list' key, like {1,2,3,4,5}. What's the quickest way?
You can write a loop like:
result = set()
for row in a:
    result.update(row['list'])
which I think will work reasonably fast.
Or you can simply use set comprehension and that will result in the following one-liner:
result = {x for row in a for x in row['list']}
In case not all elements contain a 'list' key, you can use .get(..) with an empty tuple as the default (a tuple is cheaper to construct than a list):
result = {x for row in a for x in row.get('list',())}
It is not clear what your definition of "quickest" is, but whether it is speed or number of lines I would use a combination of itertools and a generator.
>>> import itertools
>>> a = [ {'list':[1,2,3]}, {'list':[1,4,5]} ]
>>> b = set(itertools.chain.from_iterable(x['list'] for x in a if 'list' in x))
Note that I have added a guard against any elements that may not contain a 'list' key; you can omit that if you know this will always be true.
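To check that the loop, the set comprehension, and the chain-based version agree, here is a small sketch. The third dict without a 'list' key is an assumption added here to exercise the guard; it is not part of the original data:

```python
from itertools import chain

# The last row has no 'list' key (added for illustration only).
a = [{'list': [1, 2, 3]}, {'list': [1, 4, 5]}, {'other': [9]}]

# Explicit loop, skipping rows without the key.
loop_result = set()
for row in a:
    loop_result.update(row.get('list', ()))

# Set comprehension with the same default.
comp_result = {x for row in a for x in row.get('list', ())}

# chain.from_iterable over a guarded generator.
chain_result = set(chain.from_iterable(row['list'] for row in a if 'list' in row))

print(loop_result == comp_result == chain_result)  # True
print(sorted(loop_result))                          # [1, 2, 3, 4, 5]
```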
A flat set can also be built easily with reduce.
All you need is the initializer, the third argument of the reduce function.
reduce(
    lambda _set, _dict, key='list': _set.update(
        _dict.get(key) or set()) or _set,
    a,
    set())
The code above works in both Python 2 and Python 3, but in Python 3 you need to import reduce first with from functools import reduce.

Incorporate string with list entries - alternating

So SO, I am trying to "merge" a string (a) and a list of strings (b):
a = '1234'
b = ['+', '-', '']
to get the desired output (c):
c = '1+2-34'
The characters in the desired output string alternate in terms of origin between string and list. Also, the list will always contain one element less than characters in the string. I was wondering what the fastest way to do this is.
What I have so far is the following:
c = a[0]
for i in range(len(b)):
    c += b[i] + a[1:][i]
print(c) # prints -> 1+2-34
But I kind of feel like there is a better way to do this.
You can use itertools.zip_longest to zip the two sequences and keep iterating after the shorter one runs out. Once the list is exhausted, you get None back in its place, so in that case just emit the remaining character from the string on its own.
>>> from itertools import chain
>>> from itertools import zip_longest
>>> ''.join(i+j if j else i for i,j in zip_longest(a, b))
'1+2-34'
As @deceze suggested in the comments, you can also pass a fillvalue argument to zip_longest, which will insert empty strings. I'd suggest his method since it's a bit more readable.
>>> ''.join(i+j for i,j in zip_longest(a, b, fillvalue=''))
'1+2-34'
A further optimization suggested by @ShadowRanger is to remove the temporary string concatenations (i+j) and replace them with an itertools.chain.from_iterable call instead:
>>> ''.join(chain.from_iterable(zip_longest(a, b, fillvalue='')))
'1+2-34'
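Packaged as a reusable function, the last variant might look like the sketch below. The name interleave and its parameter names are made up for illustration:

```python
from itertools import chain, zip_longest

def interleave(digits, operators):
    """Alternate the characters of `digits` with the strings in `operators`."""
    # fillvalue='' pads the shorter sequence so join() never sees None.
    pairs = zip_longest(digits, operators, fillvalue='')
    return ''.join(chain.from_iterable(pairs))

result = interleave('1234', ['+', '-', ''])
print(result)  # 1+2-34
```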

defaultdict(list) concatenating all the values into one list

What I am trying to do:
Write a method to sort an array of strings so that all the anagrams are
next to each other.
I have the following code:
from collections import defaultdict
res = defaultdict(list)
L = ['foo', 'poo', 'k', 'fo', 'ofo', 'oof']
for w in L:
    res["".join(sorted(w))].append(w)
But now I want to take all the values in res and combine them into one list.
I tried this:
output =[]
for items in res.values():
    output.append(i for i in items)
But that gave me:
>>> output
[<generator object <genexpr> at 0x102a4d1e0>, <generator object <genexpr> at 0x102a95870>, <generator object <genexpr> at 0x102a958c0>, <generator object <genexpr> at 0x102a95910>]
How can I display the items in one list properly?
Desired:
['foo','ofo', 'oof','poo', 'k', 'fo',]
(all the anagrams are together, order doesn't matter as long as they are adjacent in the list.)
When you do -
output.append(i for i in items)
You are actually appending the generator expression itself (i for i in items) to the output list, as you can see from the repr of your output list. append() does not evaluate the generator expression and add its results. What you should have used is output.extend(). Example -
>>> output = []
>>> for items in res.values():
...     output.extend(items)
...
>>> output
['fo', 'k', 'foo', 'ofo', 'oof', 'poo']
extend() iterates over its argument (here the list items, though a generator expression would work too) and appends each element to the list as a separate element.
But if you want to build a single list from all the list values of the res dictionary, a much easier way is to use itertools.chain.from_iterable. Example -
>>> from itertools import chain
>>> output = list(chain.from_iterable(res.values()))
>>> output
['fo', 'k', 'foo', 'ofo', 'oof', 'poo']
EDITED
I believe that replacing
output.append(i for i in items)
with
output.extend(items)
will do what you expect.
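Putting the pieces together, the whole task could be sketched as one function. The name group_anagrams is hypothetical, and the exact output order shown assumes Python 3.7+, where dicts preserve insertion order:

```python
from collections import defaultdict
from itertools import chain

def group_anagrams(words):
    """Return the words reordered so that anagrams sit next to each other."""
    groups = defaultdict(list)
    for w in words:
        # Anagrams share the same sorted-letter signature.
        groups[''.join(sorted(w))].append(w)
    # Flatten the per-signature lists into one list.
    return list(chain.from_iterable(groups.values()))

words = ['foo', 'poo', 'k', 'fo', 'ofo', 'oof']
output = group_anagrams(words)
print(output)  # ['foo', 'ofo', 'oof', 'poo', 'k', 'fo']
```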
