Python: If statement inside list comprehension on a generator

Python 3.6
Consider this code:
from itertools import groupby
result = [list(group) for key, group in groupby(range(5,15), key= lambda x: str(x)[0])]
print(result)
outputs:
[[5], [6], [7], [8], [9], [10, 11, 12, 13, 14]]
Can I filter out the lists with len < 2 inside the list comprehension?
Update:
Given the two excellent answers, I felt it might be worth a benchmark:
import timeit
t1 = timeit.timeit('''
from itertools import groupby
result = [group_list for group_list in (list(group) for key, group in groupby(range(5,15), key= lambda x: str(x)[0])) if len(group_list) >= 2]
''', number=1000000)
print(t1)
t2 = timeit.timeit('''
from itertools import groupby
list(filter(lambda group: len(group) >= 2, map(lambda key_group: list(key_group[1]),groupby(range(5,15), key=lambda x: str(x)[0]))))
''', number=1000000)
print(t2)
Results:
8.74591397369441
9.647086477861325
Looks like the list comprehension has an edge.

Yes
A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it. For example, this listcomp combines the elements of two lists if they are not equal:
>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
and it’s equivalent to:
>>> combs = []
>>> for x in [1,2,3]:
...     for y in [3,1,4]:
...         if x != y:
...             combs.append((x, y))
...
>>> combs
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
Note how the order of the for and if statements is the same in both these snippets.
Since calling list(group) twice doesn't work in your particular example (as it consumes the generator yielded by groupby), you can introduce a temporary variable in your list comprehension by using a generator expression:
>>> [group_list for group_list in (list(group) for key, group in groupby(range(5,15), key= lambda x: str(x)[0])) if len(group_list) >= 2]
[[10, 11, 12, 13, 14]]
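For illustration, here is a minimal interactive sketch of why the temporary variable is needed: each group yielded by groupby shares the underlying iterator, so it can only be consumed once.
>>> from itertools import groupby
>>> groups = groupby(range(5, 15), key=lambda x: str(x)[0])
>>> key, group = next(groups)
>>> list(group)   # the first call consumes the group...
[5]
>>> list(group)   # ...so a second call yields nothing
[]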
Alternately, using filter, map, and list:
>>> list(
...     filter(lambda group: len(group) >= 2,
...            map(lambda key_group: list(key_group[1]),
...                groupby(range(5,15), key=lambda x: str(x)[0])
...            )
...     )
... )
[[10, 11, 12, 13, 14]]

Related

Efficiently slice iterable by another iterable

So I'm trying to slice an iterable to the same length as another iterable. For context, I was answering this question to get the sum of values grouped by key, essentially, and I think I can do this more efficiently:
from itertools import groupby
x = [(5, 65), (2, 12), (5, 18), (3, 35), (4, 49), (4, 10), (1, 27), (1, 1), (4, 71), (2, 41), (2, 17), (1, 25), (2, 62), (5, 65), (4, 5), (1, 51), (1, 13), (5, 92), (2, 62), (5, 81)]
keys, values = map(iter, zip(*sorted(x)))
print([sum(next(values) for _ in g) for _, g in groupby(keys)])
#[117, 194, 35, 135, 321]
I believe the next(values) for _ in g can be done functionally or more concisely. Essentially in pseudocode:
#from this
sum(next(values) for _ in g)
#to this
sum(values[length of g])
I know the above won't work, but all I can think of is using zip, because it only iterates to the end of the smallest iterable. However, when I tried that, it consumed more than the group's length (and it isn't very readable). See below what I tried:
print([sum(next(zip(*zip(values, g)))) for _, g in groupby(keys)])
#[117, 217, 10, 219, 92]
I've tried searching for this with no results unless I'm not searching the right thing.
I've thought of other solutions, such as using islice, but I would need the length of g, and that's another messy solution. Another is that I could just use operator.itemgetter, but if I could figure out how to do what I am doing more concisely then maybe I can use it in other solutions too.
You don't have to separate the keys and values at all. It can be handled by the key functions:
from operator import itemgetter as ig
[sum(map(ig(1), g)) for _, g in groupby(sorted(x), key=ig(0))]
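For reference, a self-contained version of this approach using the data from the question; it reproduces the expected [117, 194, 35, 135, 321]:
from itertools import groupby
from operator import itemgetter as ig

x = [(5, 65), (2, 12), (5, 18), (3, 35), (4, 49), (4, 10), (1, 27), (1, 1), (4, 71), (2, 41),
     (2, 17), (1, 25), (2, 62), (5, 65), (4, 5), (1, 51), (1, 13), (5, 92), (2, 62), (5, 81)]

print([sum(map(ig(1), g)) for _, g in groupby(sorted(x), key=ig(0))])
# [117, 194, 35, 135, 321]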
You could use ilen from more-itertools and then islice:
[sum(islice(values, ilen(g))) for _, g in groupby(keys)]
Or with zip, but with the group first (so nothing extra is consumed from values):
[sum(x for _, x in zip(g, values)) for _, g in groupby(keys)]
Don't know how "efficient" these are for you, as you only showed very small data and I'm not sure how you'd generalize it (in particular, how long your groups are).
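And a runnable sketch of the ilen/islice variant, assuming the third-party more-itertools package is installed and reusing x from the sketch above:
from itertools import groupby, islice
from more_itertools import ilen   # third-party: pip install more-itertools

keys, values = map(iter, zip(*sorted(x)))   # x as in the question
print([sum(islice(values, ilen(g))) for _, g in groupby(keys)])
# [117, 194, 35, 135, 321]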
Maybe what you are asking can be accomplished by the following class. The class groupby_other takes an iterable it1 and an iterable of iterables it2: it2_0, it2_1, ... and yields groups of elements from it1 with lengths equal to it2_0, it2_1 and so on, until one of it1 or it2 is exhausted.
class groupby_other:
    """
    Make an iterator that returns groups from iterable1. Each group has the same
    number of elements as each element in iterable2.

    >>> y = [1,2,3,4,5]
    >>> z = [[1,1], [4,4,4]]
    >>> [list(x) for x in groupby_other(y,z)]
    [[1, 2], [3, 4, 5]]

    Grouping is terminated when one of the iterables is exhausted:

    >>> y = [1,2,3,4,5]
    >>> z = [[1,1], [4,4,4], [4,5]]
    >>> [list(x) for x in groupby_other(y,z)]
    [[1, 2], [3, 4, 5]]
    >>> z = [[1,1], [4,4,4]]
    >>> [list(x) for x in groupby_other(y,z)]
    [[1, 2], [3, 4, 5]]
    >>> z = [[1,1], [4,4]]
    >>> [list(x) for x in groupby_other(y,z)]
    [[1, 2], [3, 4]]
    >>> [list(x) for x in groupby_other([],z)]
    []
    >>> [list(x) for x in groupby_other(y,[])]
    []
    """
    def __init__(self, iterable1, iterable2, key=None):
        self.it1 = iter(iterable1)
        self.it2 = iter(iterable2)
        self.current_it = None
        self.done = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.done:
            raise StopIteration
        current_group = iter(next(self.it2))  # Exit on StopIteration
        current_item = next(self.it1)
        return self._grouper(current_item, current_group)

    def _grouper(self, current_item, current_group):
        try:
            next(current_group)
            yield current_item
            for _ in current_group:
                yield next(self.it1)
        except StopIteration:
            self.done = True
            return
Then you can do:
>>> [sum(x) for x in groupby_other(values, (g for _, g in groupby(keys)))]
[117, 194, 35, 135, 321]

combine 2 lists of lists into a list of tuples

I'm trying to combine two different nested lists into a list of tuples (x,y)
where x comes from the first nested list and y from the second nested list.
nested_list1 = [[1, 2, 3],[3],[0, 3],[1]]
nested_list2 = [[.0833, .0833, .0833], [.2], [.175, .175], [.2]]
When you combine them, the result should be:
result = [(1,.0833), (2,.0833), (3,.0833), (3,.2), (0,.175), (3,.175), (1,.2)]
My approach is that I need to iterate through the lists of lists and join them one at a time.
I know how to iterate through one nested list like so:
for list in nested_list1:
    for number in list:
        print(number)
but I can't iterate through two nested lists at the same time.
for list, list in zip(nested_list1, nested_list2):
    for number, prob in zip(list,list):
        print(tuple(number, prob))  # will not work
any ideas?
You could do a double zip through lists:
lst1 = [[1, 2, 3],[3],[0, 3],[1]]
lst2 = [[.0833, .0833, .0833], [.2], [.175, .175], [.2]]
print([(u, v) for x, y in zip(lst1, lst2) for u, v in zip(x, y)])
Or use itertools.chain.from_iterable to flatten the lists and zip:
from itertools import chain
lst1 = [[1, 2, 3],[3],[0, 3],[1]]
lst2 = [[.0833, .0833, .0833], [.2], [.175, .175], [.2]]
print(list(zip(chain.from_iterable(lst1), chain.from_iterable(lst2))))
Use itertools.chain:
>>> nested_list1 = [[1, 2, 3],[3],[0, 3],[1]]
>>> nested_list2 = [[.0833, .0833, .0833], [.2], [.175, .175], [.2]]
>>> import itertools
>>> res = list(zip(itertools.chain.from_iterable(nested_list1), itertools.chain.from_iterable(nested_list2)))
>>> res
[(1, 0.0833), (2, 0.0833), (3, 0.0833), (3, 0.2), (0, 0.175), (3, 0.175), (1, 0.2)]
Flatten your lists and then pass to zip():
list1 = [item for sublist in nested_list1 for item in sublist]
list2 = [item for sublist in nested_list2 for item in sublist]
final = list(zip(list1, list2))
Yields:
[(1, 0.0833), (2, 0.0833), (3, 0.0833), (3, 0.2), (0, 0.175), (3, 0.175), (1, 0.2)]
There are 2 errors in your code:
You shadow built-in list twice and in a way that you can't differentiate between two variables. Don't do this.
You use tuple(x, y) to create a tuple from 2 variables. This is incorrect, as tuple takes one argument only. To construct a tuple of two variables, just use the syntax (x, y).
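A quick illustration of the second point (note that tuple([1, 2]) would also work, since tuple expects a single iterable):
>>> tuple(1, 2)    # raises TypeError: tuple() takes a single iterable argument
>>> (1, 2)         # tuple literal syntax
(1, 2)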
So this will work:
for L1, L2 in zip(nested_list1, nested_list2):
    for number, prob in zip(L1, L2):
        print((number, prob))
More idiomatic would be to flatten your nested lists; for example, via itertools.chain:
from itertools import chain
res = list(zip(chain.from_iterable(nested_list1),
               chain.from_iterable(nested_list2)))
[(1, 0.0833), (2, 0.0833), (3, 0.0833), (3, 0.2), (0, 0.175), (3, 0.175), (1, 0.2)]
This one-liner will achieve what you want (in Python 3, reduce must first be imported from functools):
from functools import reduce
reduce(lambda x, y: x + y, [[(i, j) for i, j in zip(x, y)] for x, y in zip(nested_list1, nested_list2)])
One way is to convert both the nested lists into full lists and then use zip. Sample code below:
>>> nested_list1 = [[1, 2, 3],[3],[0, 3],[1]]
>>> nested_list2 = [[.0833, .0833, .0833], [.2], [.175, .175], [.2]]
>>> new_list1 = [x for val in nested_list1 for x in val]
>>> new_list2 = [x for val in nested_list2 for x in val]
>>> print new_list1
[1, 2, 3, 3, 0, 3, 1]
>>> print new_list2
[0.0833, 0.0833, 0.0833, 0.2, 0.175, 0.175, 0.2]
>>> new_val = zip(new_list1, new_list2)
>>> print new_val
[(1, 0.0833), (2, 0.0833), (3, 0.0833), (3, 0.2), (0, 0.175), (3, 0.175), (1, 0.2)]
result = []
[result.extend(list(zip(x, y))) for x, y in zip(nested_list1, nested_list2)]
print(result)
Use zip twice and flatten the list:
from functools import reduce
reduce(lambda x, y: x + y, [list(zip(i, j)) for i, j in zip(nested_list1, nested_list2)])
You can flatten using chain as well
from itertools import chain
list(chain(*[(zip(i,j)) for i,j in zip(nested_list1,nested_list2)]))
Output:
[(1, 0.0833), (2, 0.0833), (3, 0.0833), (3, 0.2), (0, 0.175), (3, 0.175), (1, 0.2)]

How can I remove duplicate tuples from a list based on index value of tuple while maintaining the order of tuple? [duplicate]

This question already has answers here:
How do I remove duplicates from a list, while preserving order?
(30 answers)
Closed 4 years ago.
I want to remove those tuples which have the same value at index 0, except for the first occurrence. I looked at other similar questions but did not find the particular answer I am looking for. Can somebody please help me?
Below is what I tried.
from itertools import groupby
import random
Newlist = []
abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
Newlist = [random.choice(tuple(g)) for _, g in groupby(abc, key=lambda x: x[0])]
print Newlist
my expected output : [(1,2,3), (2,3,4), (0,2,0), (5,4,3)]
A simple way is to loop over the list and keep track of which elements you've already found:
abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
found = set()
NewList = []
for a in abc:
    if a[0] not in found:
        NewList.append(a)
        found.add(a[0])
print(NewList)
#[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
found is a set. At each iteration we check whether the first element of the tuple is already in found. If not, we append the whole tuple to NewList and add its first element to found.
A better alternative using OrderedDict:
from collections import OrderedDict
abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5),(5,4,3), (0,4,1)]
d = OrderedDict()
for t in abc:
    d.setdefault(t[0], t)
abc_unique = list(d.values())
print(abc_unique)
Output:
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
Simple although not very efficient:
abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5),(5,4,3), (0,4,1)]
abc_unique = [t for i, t in enumerate(abc) if not any(t[0] == p[0] for p in abc[:i])]
print(abc_unique)
Output:
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
The itertools recipes (the Python 2 recipes are basically no different in this case) contain a recipe for this, which is a bit more general than the implementation by @pault. It also uses a set:
Python 2:
from itertools import ifilterfalse as filterfalse
Python 3:
from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Use it with:
abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
Newlist = list(unique_everseen(abc, key=lambda x: x[0]))
print Newlist
# [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
This should be slightly faster because of the caching of the set.add method (only really relevant if your abc is large) and should also be more general because it makes the key function a parameter.
Apart from that, the same limitation I already mentioned in a comment applies: this only works if the first element of the tuple is actually hashable (which numbers, like in the given example, are, of course).
@PatrickHaugh claims:
"but the question is explicitly about maintaining the order of the tuples. I don't think there's a solution using groupby"
I never miss an opportunity to (ab)use groupby(). Here's my solution sans sorting (once or twice):
from itertools import groupby, chain
abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]
Newlist = list((lambda s: chain.from_iterable(g for f, g in groupby(abc, lambda k: s.get(k[0]) != s.setdefault(k[0], True)) if f))({}))
print(Newlist)
OUTPUT
% python3 test.py
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
%
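In case the key function looks like magic, here is a rough spelled-out sketch of what that one-liner does (explanatory only, not part of the original answer):
from itertools import groupby, chain

abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]
s = {}   # the dict the lambda receives as ({})

def first_time(t):
    # unseen key: s.get(...) is None, setdefault(...) returns True -> None != True -> True
    # seen key:   s.get(...) is True, setdefault(...) returns True -> True != True -> False
    return s.get(t[0]) != s.setdefault(t[0], True)

Newlist = list(chain.from_iterable(g for f, g in groupby(abc, key=first_time) if f))
print(Newlist)
# [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]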
To use groupby correctly, the sequence must be sorted:
>>> [next(g) for k,g in groupby(sorted(abc, key=lambda x:x[0]), key=lambda x:x[0])]
[(0, 2, 0), (1, 2, 3), (2, 3, 4), (5, 4, 3)]
or if you need that very exact order of your example (i.e. maintaining original order):
>>> [t[2:] for t in sorted([next(g) for k,g in groupby(sorted([(t[0], i)+t for i,t in enumerate(abc)]), lambda x:x[0])], key=lambda x:x[1])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
The trick here is to add one field that keeps the original order, so it can be restored after the groupby() step.
Edit: even a bit shorter:
>>> [t[1:] for t in sorted([next(g)[1:] for k,g in groupby(sorted([(t[0], i)+t for i,t in enumerate(abc)]), lambda x:x[0])])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
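Spelled out step by step, the decorate/sort/undecorate idea behind these one-liners looks roughly like this:
from itertools import groupby

abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]

decorated = [(t[0], i) + t for i, t in enumerate(abc)]                      # prepend (key, original position)
firsts = [next(g) for k, g in groupby(sorted(decorated), lambda x: x[0])]   # first tuple of each key
restored = sorted(firsts, key=lambda x: x[1])                               # back to the original order
print([t[2:] for t in restored])                                            # strip the decoration
# [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]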

How to sum up a list of tuples having the same first element?

I have a list of tuples, for example:
(1,3)
(1,2)
(1,7)
(2,4)
(2,10)
(3,8)
I need to be able to sum up the second values based upon what the first value is, getting this result for the example list:
(1,12)
(2,14)
(3,8)
This question is very similar in nature to this one; however, my solution may not use any imports or for loops, and all answers to that question use one or the other. It's supposed to rely on list and set comprehensions.
my_set = {x[0] for x in my_tuples}
my_sums = [(i,sum(x[1] for x in my_tuples if x[0] == i)) for i in my_set]
I guess... those requirements are not very good for this problem (this solution will be slow).
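A quick check against the example data from the question (sorted for a deterministic order, since sets are unordered):
my_tuples = [(1, 3), (1, 2), (1, 7), (2, 4), (2, 10), (3, 8)]
my_set = {x[0] for x in my_tuples}
my_sums = [(i, sum(x[1] for x in my_tuples if x[0] == i)) for i in my_set]
print(sorted(my_sums))
# [(1, 12), (2, 14), (3, 8)]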
If you can use dictionaries, the following should work:
x = [(1,3), (1, 2), (1, 7), (2, 4), (2, 10), (3, 8)]
d = {}
[d.__setitem__(first, d.get(first, 0) + second) for first, second in x]
print(list(d.items()))
If you are using python2, you can use map to behave like izip_longest and get the index of where the groups end:
def sums(l):
    st = set()
    inds = [st.add(a) or ind for ind, (a, b) in enumerate(l) if a not in st]
    return [(l[i][0], sum(sub[1] for sub in l[i:j])) for i, j in map(None, inds, inds[1:])]
Output:
In [10]: print(sums(l))
[(1, 12), (2, 14), (3, 8)]
For Python 2 or 3, you can just use enumerate and check the index:
def sums(l):
    st = set()
    inds = [st.add(a) or ind for ind, (a, b) in enumerate(l) if a not in st]
    return [(l[j][0], sum(sub[1] for sub in (l[j:inds[i]] if i < len(inds) else l[inds[-1]:])))
            for i, j in enumerate(inds, 1)]
same output:
In [12]: print(sums(l))
[(1, 12), (2, 14), (3, 8)]
This is pretty straightforward, but it's definitely O(n**2), so keep your input data small:
data = (
    (1, 3),
    (1, 2),
    (1, 7),
    (2, 4),
    (2, 10),
    (3, 8),
)
d = {k: v for k, v in data}
d2 = [(t1, sum(v for k, v in data if k == t1)) for t1 in d.keys()]
print(d2)
Output is
[(1, 12), (2, 14), (3, 8)]
I'd use a defaultdict
from collections import defaultdict
x = [(1,3), (1, 2), (1, 7), (2, 4), (2, 10), (3, 8)]
d = defaultdict(int)
for k, v in x:
    d[k] += v
print(list(d.items()))
If you need a one-liner (an inline lambda function) using itertools:
from itertools import groupby
myfunc = lambda tu: [(k, sum(v2[1] for v2 in v)) for k, v in groupby(tu, lambda x: x[0])]
print(myfunc(x))
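One caveat: groupby only groups consecutive items, so this works here because the example list already appears in key order; for arbitrary input you would sort first, e.g. with a hypothetical shuffled list:
x = [(2, 4), (1, 3), (1, 2), (3, 8), (2, 10), (1, 7)]
print(myfunc(sorted(x)))
# [(1, 12), (2, 14), (3, 8)]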

compare two lists and return the different indices and elements in python

I want to compare two lists and return the different indices and elements.
So I wrote the following code:
l1 = [1,1,1,1,1]
l2 = [1,2,1,1,3]
ind = []
diff = []
for i in range(len(l1)):
    if l1[i] != l2[i]:
        ind.append(i)
        diff.append([l1[i], l2[i]])
print ind
print diff
# output:
# [1, 4]
# [[1, 2], [1, 3]]
The code works, but are there any better ways to do that?
Update to the question:
I want to ask for other solutions, for example using iterators, or a ternary-style expression like [a,b](expression) (not the straightforward way like what I did; I want to exclude that). Thanks very much for your patience! :)
You could use a list comprehension to output all the information in a single list.
>>> [[idx, (i,j)] for idx, (i,j) in enumerate(zip(l1, l2)) if i != j]
[[1, (1, 2)], [4, (1, 3)]]
This will produce a list where each element is: [index, (first value, second value)] so all the information regarding a single difference is together.
An alternative way is the following
>>> l1 = [1,1,1,1,1]
>>> l2 = [1,2,1,1,3]
>>> z = zip(l1,l2)
>>> ind = [i for i, x in enumerate(z) if x[0] != x[1]]
>>> ind
[1, 4]
>>> diff = [z[i] for i in ind]
>>> diff
[(1, 2), (1, 3)]
In Python3 you have to add a call to list around zip.
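For example, the Python 3 version of the same approach would be:
>>> z = list(zip(l1, l2))
>>> ind = [i for i, x in enumerate(z) if x[0] != x[1]]
>>> ind
[1, 4]
>>> diff = [z[i] for i in ind]
>>> diff
[(1, 2), (1, 3)]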
You can try a functional style (Python 2 syntax; tuple parameter unpacking in lambdas is not available in Python 3):
res = filter(lambda (idx, x): x[0] != x[1], enumerate(zip(l1, l2)))
# [(1, (1, 2)), (4, (1, 3))]
to unzip res you can use:
zip(*res)
# [(1, 4), ((1, 2), (1, 3))]
