Apologies if this has been asked before, but I couldn't find it. If I have something like:
lst = [(('a', 'b'), 1, 2), (('a', 'b'), 3, 4), (('b', 'c'), 5, 6)]
and I want to obtain a shorter list:
new = [(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), 5, 6)]
so that it groups together the other elements in a tuple by first matching element, what is the fastest way to go about it?
You are grouping based on a key. If equal keys are always consecutive in your input, you can use itertools.groupby(); otherwise use a dictionary to group the elements. If order matters, use a dictionary that preserves insertion order (a plain dict in Python 3.7+, where this is guaranteed, or in CPython 3.6, where it is an implementation detail; otherwise collections.OrderedDict).
Using groupby():
from itertools import groupby
from operator import itemgetter
new = [(k, *zip(*(t[1:] for t in g))) for k, g in groupby(lst, key=itemgetter(0))]
The above uses Python 3 syntax (3.5 or newer) to interpolate tuple elements from an iterable: (..., *iterable).
Using a dictionary:
groups = {}
for key, *values in lst:
    groups.setdefault(key, []).append(values)
new = [(k, *zip(*v)) for k, v in groups.items()]
In Python 3.6 or newer, that'll preserve the input order of the groups.
Demo:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> lst = [(('a', 'b'), 1, 2), (('a', 'b'), 3, 4), (('b', 'c'), 5, 6)]
>>> [(k, *zip(*(t[1:] for t in g))) for k, g in groupby(lst, key=itemgetter(0))]
[(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), (5,), (6,))]
>>> groups = {}
>>> for key, *values in lst:
...     groups.setdefault(key, []).append(values)
...
>>> [(k, *zip(*v)) for k, v in groups.items()]
[(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), (5,), (6,))]
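To make the "consecutive groups" caveat concrete, here is a minimal sketch that runs both approaches on input where the matching keys are not adjacent. The `rows` list and the names `by_runs` and `by_dict` are made up for illustration; Python 3.5+ is assumed for the `*` unpacking.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: the two ('a', 'b') rows are NOT next to each other.
rows = [(('a', 'b'), 1, 2), (('b', 'c'), 5, 6), (('a', 'b'), 3, 4)]

# groupby() only merges runs of adjacent equal keys,
# so ('a', 'b') shows up as two separate groups.
by_runs = [(k, *zip(*(t[1:] for t in g)))
           for k, g in groupby(rows, key=itemgetter(0))]

# A dictionary collects every row for a key, wherever it appears.
groups = {}
for key, *values in rows:
    groups.setdefault(key, []).append(values)
by_dict = [(k, *zip(*v)) for k, v in groups.items()]

print(by_runs)
print(by_dict)
```

If the input can be sorted by key first, groupby() gives the same result as the dictionary, at the cost of the extra sort.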
If you are using Python 2, you'd have to use:
new = [(k,) + tuple(zip(*(t[1:] for t in g))) for k, g in groupby(lst, key=itemgetter(0))]
or
from collections import OrderedDict
groups = OrderedDict()
for entry in lst:
    groups.setdefault(entry[0], []).append(entry[1:])
new = [(k,) + tuple(zip(*v)) for k, v in groups.items()]
You could also use a collections.defaultdict to group your tuple keys:
from collections import defaultdict
lst = [(('a', 'b'), 1, 2), (('a', 'b'), 3, 4), (('b', 'c'), 5, 6)]
d = defaultdict(tuple)
for tup, fst, snd in lst:
    d[tup] += fst, snd
# defaultdict(<class 'tuple'>, {('a', 'b'): (1, 2, 3, 4), ('b', 'c'): (5, 6)})
for key, value in d.items():
    d[key] = value[0::2], value[1::2]
# defaultdict(<class 'tuple'>, {('a', 'b'): ((1, 3), (2, 4)), ('b', 'c'): ((5,), (6,))})
result = [(k, v1, v2) for k, (v1, v2) in d.items()]
Which outputs:
[(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), (5,), (6,))]
The logic of the above code:
Group the tuples into a defaultdict of tuples.
Split the values into firsts and seconds with slicing [0::2] and [1::2].
Wrap this updated dictionary into the correct tuple structure with a list comprehension.
Depending on your use case, you might find using a dictionary or defaultdict more useful. It will scale better too.
from collections import defaultdict
listmaker = lambda: ([],[]) # makes a tuple of 2 lists for the values.
my_data = defaultdict(listmaker)
for letter_tuple, v1, v2 in lst:
    my_data[letter_tuple][0].append(v1)
    my_data[letter_tuple][1].append(v2)
Then you'll get a new tuple of lists for each unique (x, y) key. Python handles checking whether the key already exists, and it's fast. If you absolutely need the result as a list, you can always convert it:
new = [(k, tuple(v1s), tuple(v2s)) for k, (v1s, v2s) in my_data.items()]
This list comprehension is a bit opaque, but it unpacks your dictionary into the form specified: [(('a', 'b'), (1, 3), (2, 4)), ...].
I have a list of tuples:
lst=[(6, 'C'), (6, 'H'), (2, 'C'), (2, 'H')]
And a dictionary:
dct={'6C': (6, 'C'), '6H': (6, 'H'), '9D': (9, 'D'), '10D': (10, 'D'), '11S': (11, 'S'), '2C': (2, 'C'), '2H': (2, 'H')}
How can I remove the elements from the dictionary that are in the list? In this example my desired output would be:
dct2={'9D': (9, 'D'), '10D': (10, 'D'), '11S': (11, 'S')}
I would use a dictionary comprehension to map the keys with the values that aren't found within a list:
new_dict = {k: v for k, v in old_dict.items() if v not in the_list} # filter from the list
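If you're on Python 2 try this (items() returns a list there, so deleting while looping is safe; on Python 3 the same loop raises a RuntimeError because the dictionary changes size during iteration):
for key, value in dct.items():
    if value in lst:
        del dct[key]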
EDIT:
A solution that works in both Python 2 and 3:
dict((key, value) for key, value in dct.items() if value not in lst)
Using the valfilter function from toolz:
from toolz import valfilter
valfilter(lambda v: v not in lst, dct)  # keep only the values that are NOT in lst
I would convert lst to a set before filtering, since a set is a data structure that lets you test whether an element is present more efficiently.
purge_set = set(lst)
dict((k, v) for k, v in dct.items() if v not in purge_set)  # works on Python 2 and 3
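As a rough end-to-end sketch using the data from the question, the set-based filter can either build a new dictionary or delete in place. The names `dct2` and `in_place` are only illustrative, and the in-place variant loops over a snapshot of the keys so that it is safe on Python 3.

```python
lst = [(6, 'C'), (6, 'H'), (2, 'C'), (2, 'H')]
dct = {'6C': (6, 'C'), '6H': (6, 'H'), '9D': (9, 'D'), '10D': (10, 'D'),
       '11S': (11, 'S'), '2C': (2, 'C'), '2H': (2, 'H')}

# Tuples are hashable, so the list converts to a set for fast membership tests.
purge_set = set(lst)

# Option 1: build a new dictionary and leave dct untouched.
dct2 = {key: value for key, value in dct.items() if value not in purge_set}

# Option 2: delete in place, iterating over a copy of the keys
# so the dictionary is never resized while it is being iterated.
in_place = dict(dct)
for key in list(in_place):
    if in_place[key] in purge_set:
        del in_place[key]

print(dct2)
```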
map={"a":5, "b":2, "c":7, "d":5, "e":5}
output should be:
['c', 'a', 'd', 'e', 'b']
So, the code should first sort the dictionary in descending order by value, and then, where values are equal, sort by key in ascending order. So far I have...
newmap=map
newmap=sorted(newmap.iteritems(), key=operator.itemgetter(1,0),reverse=True)
print newmap
This gives me the output [('c', 7), ('e', 5), ('d', 5), ('a', 5), ('b', 2)]. So, I need to get the e, d, a in ascending order... without messing up the sorts of the numbers. How do I do this?
In my answer, I replaced map with dct to not mask the built-in function.
Sorted keys by inverse value, then by key in ascending order:
sorted(dct, key=lambda k: (-dct[k], k))
By turning the value into a negative number, this sorts on value in reverse, while keys are sorted in ascending order.
Demo:
>>> dct = {'a': 5, 'c': 7, 'b': 2, 'e': 5, 'd': 5}
>>> sorted(dct, key=lambda k: (-dct[k], k))
['c', 'a', 'd', 'e', 'b']
Timing comparisons:
>>> import timeit
>>> timeit.timeit("sorted(dct, key=lambda k: (-dct[k], k))", 'from __main__ import dct')
4.741436004638672
>>> timeit.timeit("map(operator.itemgetter(0), sorted(dct.items(), key=lambda i: (-i[1], i[0])))", 'from __main__ import dct; import operator')
7.489126920700073
>>> timeit.timeit("map(operator.itemgetter(0), sorted(sorted(dct.iteritems()), key=operator.itemgetter(1), reverse=True))", 'from __main__ import dct; import operator')
10.01669192314148
Sorting is guaranteed to be stable in Python, so all you have to do is sort twice: first on the key, then on the value.
sorted_pairs = sorted(sorted(map.iteritems()), key=operator.itemgetter(1), reverse=True)
To get just the keys from this output you can use a list comprehension:
[k for k,v in sorted_pairs]
P.S. don't name your variables the same as Python types or you're going to be very surprised some day.
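For readers on Python 3, where iteritems() no longer exists, the two-pass stable sort might look like the sketch below. The dictionary is renamed to `dct` (an illustrative choice) so that it does not shadow the built-in map.

```python
from operator import itemgetter

dct = {"a": 5, "b": 2, "c": 7, "d": 5, "e": 5}

# Pass 1: sort the (key, value) pairs by key, ascending.
by_key = sorted(dct.items())

# Pass 2: sort by value, descending. The sort is stable even with
# reverse=True, so pairs with equal values keep their key order.
sorted_pairs = sorted(by_key, key=itemgetter(1), reverse=True)

keys = [k for k, _ in sorted_pairs]
print(keys)
```

Unlike negating the value, this also works when the values cannot be negated, such as strings.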
Let's say you have a list like:
l = [('a', 1), ('c', 1), ('b', 1), ('aa', 2), ('aa', 3), ('aa', 1)]
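Case 1: Expected: lowest values first, then, for equal values, keys in ascending order.
Idea:
Combine the key and the value so that the number takes priority, then the key. Note that this string concatenation only sorts correctly while every value is a single digit, because strings compare character by character ('10' sorts before '2').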
>>> sorted(l, key=lambda y: str(y[1])+y[0])
[('a', 1), ('aa', 1), ('b', 1), ('c', 1), ('aa', 2), ('aa', 3)]
Case 2: Expected: highest values first. If keys share the same value, sort them in ascending alphabetical order.
>>> sorted(l, key=lambda y: str(10-y[1]) + y[0])
[('aa', 3), ('aa', 2), ('a', 1), ('aa', 1), ('b', 1), ('c', 1)]
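A tuple key avoids building strings and, as far as I can tell, handles both cases for values of any size. This is a sketch of that alternative rather than the approach shown above. The extra ('z', 10) entry is added here only to show a multi-digit value.

```python
# Same pairs as above, plus one hypothetical multi-digit value.
pairs = [('a', 1), ('c', 1), ('b', 1), ('aa', 2), ('aa', 3), ('aa', 1), ('z', 10)]

# Case 1: lowest value first, ties broken by key in ascending order.
ascending = sorted(pairs, key=lambda kv: (kv[1], kv[0]))

# Case 2: highest value first (negated number), ties still by ascending key.
descending = sorted(pairs, key=lambda kv: (-kv[1], kv[0]))

print(ascending)
print(descending)
```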
I have a dictionary like that:
d = {11: {'a': 2.1, 'b': 2.2, 'c': 3.0},
     12: {'b': 4.5, 'g': 1.2},
     4: {'g': 5.6, 'a': 4.5, 'f': 0.5, 'r': 1.3}
     }
What I want to get is:
[(4,'g'),(4,'a'),(12,'b'),(11,'c'),(11,'b'),(11,'a'),(4,'r'),(12,'g'),(4,'f')]
So what I want is to sort the values in descending order and get the (outer key, inner key) pairs that produce this order.
I would like to use something like key = lambda x, y: d[x][y], but I don't know how to return the list I want.
This should do it:
sorted(((k1, k2) for k1 in d for k2 in d[k1]), key=lambda t: d[t[0]][t[1]], reverse=True)
The generator expression lists all key 'paths' to the values first, then sorts those on the value, reversed.
Demo:
>>> sorted(((k1, k2) for k1 in d for k2 in d[k1]), key=lambda t: d[t[0]][t[1]], reverse=True)
[(4, 'g'), (4, 'a'), (12, 'b'), (11, 'c'), (11, 'b'), (11, 'a'), (4, 'r'), (12, 'g'), (4, 'f')]
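Two values in this data are equal (4.5 for both (4, 'a') and (12, 'b')), so their relative order depends on the dictionary's iteration order, which differs between Python versions. One possible way to make the result deterministic is to add the keys as tie-breakers. The `flat` and `paths` names below are illustrative.

```python
d = {11: {'a': 2.1, 'b': 2.2, 'c': 3.0},
     12: {'b': 4.5, 'g': 1.2},
     4: {'g': 5.6, 'a': 4.5, 'f': 0.5, 'r': 1.3}}

# Flatten the nested dict into (outer key, inner key, value) triples.
flat = [(outer, inner, value)
        for outer, inner_map in d.items()
        for inner, value in inner_map.items()]

# Sort by value descending (negated), then by outer and inner key ascending
# so that equal values always come out in the same order.
flat.sort(key=lambda t: (-t[2], t[0], t[1]))

paths = [(outer, inner) for outer, inner, _ in flat]
print(paths)
```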
What's the Pythonic way to sort a zipped list?
Code:
names = list('datx')
vals = reversed(list(xrange(len(names))))
zipped = zip(names, vals)
print zipped
The code above prints [('d', 3), ('a', 2), ('t', 1), ('x', 0)]
I want to sort zipped by the values. So ideally it would end up looking like this [('x', 0), ('t', 1), ('a', 2), ('d', 3)].
Quite simple:
sorted(zipped, key=lambda x: x[1])
sorted(zipped, key = lambda t: t[1])
import operator
sorted(zipped, key=operator.itemgetter(1))
If you want it to be a little faster, do ig = operator.itemgetter(1) and use ig as the key function.
In your case you don't need to sort at all because you just want an enumerated reversed list of your names:
>>> list(enumerate(names[::-1])) # reverse by slicing
[(0, 'x'), (1, 't'), (2, 'a'), (3, 'd')]
>>> list(enumerate(reversed(names))) # but reversed is also possible
[(0, 'x'), (1, 't'), (2, 'a'), (3, 'd')]
But if you need to sort it then you should use sorted (as provided by @utdemir or @Ulrich Dangel) because it works on Python 2 (zip and itertools.izip) and Python 3 (zip) and won't fail with an AttributeError like .sort(...) (which only works on Python 2's zip because zip returns a list there):
>>> # Fails with Python 3's zip:
>>> zipped = zip(names, vals)
>>> zipped.sort(lambda x: x[1])
AttributeError: 'zip' object has no attribute 'sort'
>>> # Fails with Python 2's itertools izip:
>>> from itertools import izip
>>> zipped = izip(names, vals)
>>> zipped.sort(lambda x: x[1])
AttributeError: 'itertools.izip' object has no attribute 'sort'
But sorted does work in each case:
>>> zipped = izip(names, vals)
>>> sorted(zipped, key=lambda x: x[1])
[('x', 0), ('t', 1), ('a', 2), ('d', 3)]
>>> zipped = zip(names, vals) # python 3
>>> sorted(zipped, key=lambda x: x[1])
[('x', 0), ('t', 1), ('a', 2), ('d', 3)]
It's simpler and more efficient to zip them in order in the first place (if you can). Given your example it's pretty easy:
>>> names = 'datx'
>>> zip(reversed(names), xrange(len(names)))
[('x', 0), ('t', 1), ('a', 2), ('d', 3)]
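On Python 3, xrange is gone and zip returns a lazy iterator, so the same idea would need range and an explicit list() to display the pairs. A small sketch under those assumptions:

```python
names = 'datx'

# Pair the reversed names with ascending indices.
# list() materialises the lazy zip object so it can be printed or indexed.
zipped = list(zip(reversed(names), range(len(names))))

print(zipped)
```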
Sort feature importance in a classifier (dtc=decision_tree):
for name, importance in sorted(zip(X_train.columns, dtc.feature_importances_),
                               key=lambda x: x[1]):
    print(name, importance)
Sorting the contents of a dictionary by value has been thoroughly described already, so it can be achieved by something like this:
d={'d':1,'b':2,'c':2,'a':3}
sorted_res_1= sorted(d.items(), key=lambda x: x[1])
# or
from operator import itemgetter
sorted_res_2 = sorted(d.items(), key=itemgetter(1))
My question is, what would be the best way to achieve the following output:
[('d', 1), ('b', 2), ('c', 2), ('a', 3)] instead of [('d', 1), ('c', 2), ('b', 2), ('a', 3)]
so that the tuples are sorted by value and then by the key, if the value was equal.
Secondly, would the same be possible in reverse:
[('a', 3), ('b', 2), ('c', 2), ('d', 1)] instead of [('a', 3), ('c', 2), ('b', 2), ('d', 1)]?
The sorted key parameter can return a tuple. In that case, the first item in the tuple is used to sort the items, and the second is used to break ties, and the third for those still tied, and so on...
In [1]: import operator
In [2]: d={'d':1,'b':2,'c':2,'a':3}
In [3]: sorted(d.items(),key=operator.itemgetter(1,0))
Out[3]: [('d', 1), ('b', 2), ('c', 2), ('a', 3)]
operator.itemgetter(1,0) returns a tuple formed from the second, and then the first item. That is, if f=operator.itemgetter(1,0) then f(x) returns (x[1],x[0]).
You just want standard tuple comparison, but with each tuple reversed:
>>> sorted(d.items(), key=lambda x: x[::-1])
[('d', 1), ('b', 2), ('c', 2), ('a', 3)]
An alternative approach, very close to your own example:
sorted(d.items(), key=lambda x: (x[1], x[0]))
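For the second part of the question (values descending, keys still ascending within ties), one option is to negate the value inside the key tuple. This is a sketch that assumes the values are numeric. The names `ascending` and `descending` are illustrative.

```python
d = {'d': 1, 'b': 2, 'c': 2, 'a': 3}

# Value ascending, then key ascending.
ascending = sorted(d.items(), key=lambda kv: (kv[1], kv[0]))

# Value descending (negated), then key ascending.
# Using reverse=True instead would also flip the key order within ties.
descending = sorted(d.items(), key=lambda kv: (-kv[1], kv[0]))

print(ascending)
print(descending)
```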