Splitting a list of tuples by 2nd element - python

How can I split a list of tuples by the 2nd element?
I can do it with two list comprehensions:
tup = [('x',1),('y',2),('z',1)]
ones = [i for i in tup if i[1] == 1]
twos = [i for i in tup if i[1] == 2]
But is there a way to avoid looping through the list twice? Like this?
ones, twos = [], []
for i in tup:
    if i[1] == 1:
        ones.append(i)
    if i[1] == 2:
        twos.append(i)
Is there any other way?

Using a collections.defaultdict() object:
from collections import defaultdict
numbered = defaultdict(list)
for i in tup:
    numbered[i[1]].append(i)
Now numbered[1] contains all ones, numbered[2] a list of all twos. This solution extends to more values of i[1] naturally without having to define any additional lists or if statements.
Demo:
>>> from collections import defaultdict
>>> tup = [('x',1),('y',2),('z',1)]
>>> numbered = defaultdict(list)
>>> for i in tup:
...     numbered[i[1]].append(i)
...
>>> numbered
defaultdict(<type 'list'>, {1: [('x', 1), ('z', 1)], 2: [('y', 2)]})
>>> numbered[1]
[('x', 1), ('z', 1)]
>>> numbered[2]
[('y', 2)]
A defaultdict is just a dict subclass with additional behaviour; you can do without it too with a little more complexity and a slight loss in speed:
numbered = {}
for i in tup:
    numbered.setdefault(i[1], []).append(i)
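The same single-pass grouping can be wrapped in a small helper. This is only a sketch: the name split_by_key and its index parameter are made up for illustration and are not part of the standard library.

```python
from collections import defaultdict

def split_by_key(items, index=1):
    """Group tuples by the element at `index` in a single pass.

    `split_by_key` is a hypothetical helper name used for illustration.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item[index]].append(item)
    return dict(groups)  # plain dict, so a missing key raises KeyError

tup = [('x', 1), ('y', 2), ('z', 1)]
groups = split_by_key(tup)
ones, twos = groups[1], groups[2]
print(ones)  # [('x', 1), ('z', 1)]
print(twos)  # [('y', 2)]
```

Returning a plain dict is a design choice here: callers that look up a value that never occurred get a KeyError instead of a silently created empty list.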

Related

Matching list elements with element of tuple in list of tuples

I have a list containing strings:
lst = ['a', 'a', 'b']
where each string is, in fact, a category of a corpus, and I need a list of integers that corresponds to the index of that category.
For this purpose, I built a list of tuples where I have each (unique) category and its index, f.ex:
catlist = [(0, 'a'), (1, 'b')]
I now need to iterate over the first list of strings, and if the element matches any of the second elements of the tuple, return the tuple's first element to an array, like this:
[0, 0, 1]
for now I have
catindexes = []
for item in lst:
    for i in catlist:
        if cat == catlist[i][i]:
            catindexes.append(i)
but this clearly doesn't work and I'm failing to get to the solution.
Any tips would be appreciated.
You were close. Inside the inner loop, check whether the item from the outer loop is equal to tup[1] (each tup is one of the tuples in catlist, for example (0, 'a') or (1, 'b')).
If they are equal, append the first element of tup (tup[0]) to the result list.
lst = ['a', 'a', 'b']
catlist = [(0, 'a'), (1, 'b')]
catindexes = []
for item in lst:
    for tup in catlist:
        if item == tup[1]:
            catindexes.append(tup[0])
print(catindexes)
You can also use a list comprehension:
catindexes = [tup[0] for item in lst for tup in catlist if tup[1] == item]
>>> lst = ['a', 'a', 'b']
>>> catlist = [(0, 'a'), (1, 'b')]
>>> catindexes = []
>>> for item in lst:
...     for i in catlist:
...         if i[1] == item:
...             catindexes.append(i[0])
...
>>> catindexes
[0, 0, 1]
During the iteration, i is a direct reference to an element of catlist, not its index. I'm not using i to extract an element from lst, the for ... in ... already takes care of that. As i is a direct reference to a tuple, I can simply extract the relevant fields for matching and appending without the need to mess with the indexing of lst.
I would recommend using a dictionary for your catlist instead. I think it more naturally fits what you are trying to do:
lst = ['a', 'a', 'b']
catdict = {'a': 0, 'b': 1}
res = [catdict[k] for k in lst] # res = [0, 0, 1]
The condition in the if block is not correct.
Try this:
lst = ['a', 'a', 'b']
catlist = [(0, 'a'), (1, 'b')]
catindexes = []
for item in lst:
    for i in catlist:
        if i[1] == item:
            catindexes.append(i[0])
print(catindexes)
You can create a dictionary (we call it d) from catlist and reverse it. Now, for each element i of lst, what you're looking for is d[i]:
d = {v: k for k, v in catlist}
res = [d[i] for i in lst]
Output:
>>> lst = ['a', 'a', 'b']
>>> catlist = [(0, 'a'), (1, 'b')]
>>> d = {v: k for k, v in catlist}
>>> d
{'a': 0, 'b': 1}
>>>
>>> res = [d[i] for i in lst]
>>> res
[0, 0, 1]
An efficient way for big lists:
Step 1: build the lookup dictionary.
d = dict((v, k) for (k, v) in catlist)
Step 2: use it.
[d[k] for k in lst]
This way the execution time grows like len(lst) + len(catlist) instead of len(lst) x len(catlist).
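A rough sketch putting the two approaches side by side, to show that they give the same result on the question's data. The function names indexes_nested and indexes_dict are invented for this example.

```python
def indexes_nested(lst, catlist):
    # Roughly len(lst) * len(catlist) comparisons: scans catlist for every item.
    return [idx for item in lst for idx, cat in catlist if cat == item]

def indexes_dict(lst, catlist):
    # Roughly len(lst) + len(catlist) steps: build the lookup once,
    # then do one dictionary access per item.
    lookup = {cat: idx for idx, cat in catlist}
    return [lookup[item] for item in lst]

lst = ['a', 'a', 'b']
catlist = [(0, 'a'), (1, 'b')]
nested_result = indexes_nested(lst, catlist)
dict_result = indexes_dict(lst, catlist)
print(nested_result)  # [0, 0, 1]
print(dict_result)    # [0, 0, 1]
```

One behavioural difference worth knowing: for an item that has no entry in catlist, the nested version silently skips it, while the dictionary version raises KeyError.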

Python: Possible combinations for items in a list of lists

Suppose I have a nested list that looks like this:
nested_list = [['a','b',1],['c','d',3],['a','f',8],['a','c',5]]
Now I want to get all possible combinations between the second items of the lists, but only if the first items in both lists are equal. So what should be printed is the following:
Combinations for 'a' as first item in the lists: [b,f], [b,c], [f,c]
I came up with this:
comprehension = [x[0] for x in nested_list if x[1] in nested_list]
But this does not work (of course). I'm not sure how to loop over all lists in order to find the combinations.
You can use collections.defaultdict and itertools.combinations:
In [2]: nested_list = [['a','b',1],['c','d',3],['a','f',8],['a','c',5]]
In [3]: from collections import defaultdict
In [4]: from itertools import combinations
In [5]: d = defaultdict(list)
In [6]: for i, j, _ in nested_list:
   ...:     d[i].append(j)
   ...:
In [7]: {k: list(combinations(v, 2)) for k, v in d.items()}
Out[7]: {'a': [('b', 'f'), ('b', 'c'), ('f', 'c')], 'c': []}
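To get closer to the printed form the question asks for, the grouped result can be formatted and first items that occur only once can be skipped. A minimal sketch; the variable names and the exact output wording are assumptions based on the question's example line.

```python
from collections import defaultdict
from itertools import combinations

nested_list = [['a', 'b', 1], ['c', 'd', 3], ['a', 'f', 8], ['a', 'c', 5]]

# Collect the second items under each first item.
second_items = defaultdict(list)
for first, second, _ in nested_list:
    second_items[first].append(second)

lines = []
for first, seconds in second_items.items():
    pairs = list(combinations(seconds, 2))
    if pairs:  # skip first items that occur only once
        formatted = ", ".join("[%s,%s]" % pair for pair in pairs)
        lines.append("Combinations for %r as first item in the lists: %s"
                     % (first, formatted))

print("\n".join(lines))
# Combinations for 'a' as first item in the lists: [b,f], [b,c], [f,c]
```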

Python: map a list of tuples onto the keys of a dictionary

Given the following list of tuples:
l=[((459301.5857207412, 443923.4365563169),
(458772.4179957388, 446370.8372844439))]
And the following dictionary:
mydict={0: (459301.5857207412, 443923.4365563169),
25: (458772.4179957388, 446370.8372844439)}
How could I create a new list where the tuple contains the key in mydict associated with the values of the tuple itself?
The result would be, given the two examples above:
mapped=[(0,25)]
If
l=[((459301.5857207412, 443923.4365563169),
(458772.4179957388, 446370.8372844439))]
mydict={0: (459301.5857207412, 443923.4365563169),
25: (458772.4179957388, 446370.8372844439)}
I suppose I could generalize your simple case with this oneliner:
[tuple(k for k,v in mydict.items() if v in sl) for sl in l]
result:
[(0, 25)]
Note: for better performance, it would be better to pre-process l to create sets inside like this so lookup with in is faster (tuples are immutable/hashable, so let's take advantage of it):
l = [set(x) for x in l]
In the simplest case it can be achieved using a regular for loop:
mapped = [()]
for k in mydict:
    if mydict[k] in l[0]:  # l[0] is the tuple of points to look up
        mapped[0] += (k, )
print(mapped)
The output:
[(0, 25)]
Using a reverse dictionary. Fast and (unlike the other solutions) keeps the order.
>>> reverse = {v: k for k, v in mydict.items()}
>>> [tuple(map(reverse.get, sub)) for sub in l]
[(0, 25)]
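The ordering difference only shows up when the points in l appear in a different order than the keys in mydict. A small sketch with the question's two points swapped (the names a, b, by_reverse and by_scan are introduced here for illustration, and the dictionary ordering assumes Python 3.7+):

```python
a = (459301.5857207412, 443923.4365563169)
b = (458772.4179957388, 446370.8372844439)
mydict = {0: a, 25: b}
l = [(b, a)]  # same points as in the question, but in the opposite order

# Reverse lookup: one dict access per point, result follows each tuple in l.
reverse = {v: k for k, v in mydict.items()}
by_reverse = [tuple(map(reverse.get, sub)) for sub in l]

# Scanning mydict: result follows the order of the keys in mydict.
by_scan = [tuple(k for k, v in mydict.items() if v in sl) for sl in l]

print(by_reverse)  # [(25, 0)]
print(by_scan)     # [(0, 25)]
```

Note that reverse.get returns None for a point that is not in mydict, and that the reverse dictionary keeps only one key if two keys share the same value.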
What about this?
l = [((459301.5857207412, 443923.4365563169),(458772.4179957388, 446370.8372844439))]
d = {0: (459301.5857207412, 443923.4365563169),
25: (458772.4179957388, 446370.8372844439)}
mapped = []
for t in l:
    m = []
    for k, v in d.items():
        if v in t:
            m.append(k)
    mapped.append(tuple(m))
>>> print(mapped)
[(0, 25)]

weighted counting in python

I want to count the instances of X in a list, similar to
How can I count the occurrences of a list item in Python?
but taking into account a weight for each instance.
For example,
L = [(a,4), (a,1), (b,1), (b,1)]
the function weighted_count() should return something like
[(a,5), (b,2)]
Edited to add: my a, b will be integers.
You can still use Counter:
from collections import Counter
c = Counter()
for k, v in L:
    c.update({k: v})
print(c)
The following will give you a dictionary of all the keys in the list and their corresponding weighted counts:
counts = {}
for value in L:
    if value[0] in counts:
        counts[value[0]] += value[1]
    else:
        counts[value[0]] = value[1]
Alternatively, if you're looking for one specific value, you can filter the list for that value, map the remaining items to their weights, and sum them:
def countOf(x, L):
    filteredL = list(filter(lambda value: value[0] == x, L))
    return sum(list(map(lambda value: value[1], filteredL)))
>>> import itertools
>>> L = [ ('a',4), ('a',1), ('b',1), ('b',1) ]
>>> [(k, sum(amt for _,amt in v)) for k,v in itertools.groupby(sorted(L), key=lambda tup: tup[0])]
[('a', 5), ('b', 2)]
defaultdict will do:
from collections import defaultdict
L = [('a',4), ('a',1), ('b',1), ('b',1)]
res = defaultdict(int)
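for k, v in L:
    res[k] += v
print(list(res.items()))
prints (in Python 3.7+, where dicts keep insertion order; earlier versions may order the items differently):
[('a', 5), ('b', 2)]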
Group items by the first element of each tuple using groupby from itertools (this relies on equal keys being adjacent, as they are in this L):
>>> from itertools import groupby
>>> from operator import itemgetter
>>> L = [('a',4), ('a',1), ('b',1), ('b',1)]
>>> L_new = []
>>> for k,v in groupby(L,key=itemgetter(0)):
...     L_new.append((k,sum(map(itemgetter(1), v))))
...
>>> L_new
[('a', 5), ('b', 2)]
>>> L_new = [(k,sum(map(itemgetter(1), v))) for k,v in groupby(L, key=itemgetter(0))]  # for fans of list comprehensions and one-liners
>>> L_new
[('a', 5), ('b', 2)]
Tested in both Python 2 and Python 3.
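One caveat with groupby: it only merges runs of adjacent equal keys, so input whose keys are interleaved needs sorting first. A short sketch with a reordered version of L (the reordering is an assumption made to show the effect):

```python
from itertools import groupby
from operator import itemgetter

L = [('a', 4), ('b', 1), ('a', 1), ('b', 1)]  # equal keys are not adjacent

# Without sorting, every run of a key becomes its own group.
unsorted_result = [(k, sum(map(itemgetter(1), v)))
                   for k, v in groupby(L, key=itemgetter(0))]

# Sorting by the key first brings equal keys together.
sorted_result = [(k, sum(map(itemgetter(1), v)))
                 for k, v in groupby(sorted(L, key=itemgetter(0)),
                                     key=itemgetter(0))]

print(unsorted_result)  # [('a', 4), ('b', 1), ('a', 1), ('b', 1)]
print(sorted_result)    # [('a', 5), ('b', 2)]
```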
Use the dictionary's get method.
>>> d = {}
>>> for item in L:
...     d[item[0]] = d.get(item[0], 0) + item[1]
...
>>> d
{'a': 5, 'b': 2}
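Any of these can be wrapped into the weighted_count() function the question asks for. A minimal sketch using Counter with integer keys, as the question's edit mentions; the parameter name pairs and the list-of-tuples return shape are assumptions taken from the question's example, and the ordering relies on Python 3.7+ dicts keeping insertion order.

```python
from collections import Counter

def weighted_count(pairs):
    """Sum the weights per key and return (key, total) tuples in first-seen order."""
    totals = Counter()
    for key, weight in pairs:
        totals[key] += weight  # missing keys start at 0 in a Counter
    return list(totals.items())

result = weighted_count([(1, 4), (1, 1), (2, 1), (2, 1)])
print(result)  # [(1, 5), (2, 2)]
```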

Convert a list of tuples with repeated keys to a dictionary of lists

I have an association list with repeated keys:
l = [(1, 2), (2, 3), (1, 3), (2, 4)]
and I want a dict with list values:
d = {1: [2, 3], 2: [3, 4]}
Can I do better than:
for (x,y) in l:
    try:
        z = d[x]
    except KeyError:
        z = d[x] = list()
    z.append(y)
You can use the dict.setdefault() method to provide a default empty list for missing keys:
for x, y in l:
    d.setdefault(x, []).append(y)
or you could use a defaultdict() object to create empty lists for missing keys:
from collections import defaultdict
d = defaultdict(list)
for x, y in l:
    d[x].append(y)
but to switch off the auto-vivification behaviour afterwards you'd have to set the default_factory attribute to None:
d.default_factory = None # switch off creating new lists
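A short sketch of what switching off default_factory changes in practice; the key 99 is an arbitrary value chosen because it is absent from l.

```python
from collections import defaultdict

l = [(1, 2), (2, 3), (1, 3), (2, 4)]
d = defaultdict(list)
for x, y in l:
    d[x].append(y)

d.default_factory = None  # from here on, missing keys raise KeyError

try:
    d[99]
    raised = False
except KeyError:
    raised = True

print(dict(d))  # {1: [2, 3], 2: [3, 4]}
print(raised)   # True
print(99 in d)  # False: the failed lookup did not create an entry
```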
You can use collections.defaultdict:
import collections
d = collections.defaultdict(list)
for k, v in l:
    d[k].append(v)
