How to sort Counter by value? - python

Other than wrapping one list comprehension in another to swap and re-swap the pairs, is there a Pythonic way to sort a Counter by value? If so, is it faster than this:
>>> from collections import Counter
>>> x = Counter({'a':5, 'b':3, 'c':7})
>>> sorted(x)
['a', 'b', 'c']
>>> sorted(x.items())
[('a', 5), ('b', 3), ('c', 7)]
>>> [(l,k) for k,l in sorted([(j,i) for i,j in x.items()])]
[('b', 3), ('a', 5), ('c', 7)]
>>> [(l,k) for k,l in sorted([(j,i) for i,j in x.items()], reverse=True)]
[('c', 7), ('a', 5), ('b', 3)]

Use the Counter.most_common() method; it'll sort the items for you:
>>> from collections import Counter
>>> x = Counter({'a':5, 'b':3, 'c':7})
>>> x.most_common()
[('c', 7), ('a', 5), ('b', 3)]
It'll do so efficiently: if you ask for the top N instead of all values, heapq is used instead of a full sort:
>>> x.most_common(1)
[('c', 7)]
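As a rough sketch of what the top-N path amounts to (an approximation of the idea, not a copy of the library's source), heapq.nlargest with the count as the key picks the N largest items without sorting everything. The name top_two is only illustrative.

```python
import heapq
from collections import Counter
from operator import itemgetter

x = Counter({'a': 5, 'b': 3, 'c': 7})

# Select the 2 items with the largest counts; the key picks the count
# out of each (element, count) pair.
top_two = heapq.nlargest(2, x.items(), key=itemgetter(1))
print(top_two)  # [('c', 7), ('a', 5)]
```

For small counters the difference from a full sort is negligible; it matters mainly when N is much smaller than the number of distinct elements.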
Outside of counters, sorting can always be adjusted with a key function; .sort() and sorted() both take a callable that lets you specify the value on which to sort the input sequence. For example, sorted(x, key=x.get, reverse=True) gives you the same ordering as x.most_common(), but returns only the keys:
>>> sorted(x, key=x.get, reverse=True)
['c', 'a', 'b']
or you can sort on the value alone, given (key, value) pairs:
>>> sorted(x.items(), key=lambda pair: pair[1], reverse=True)
[('c', 7), ('a', 5), ('b', 3)]
See the Python sorting howto for more information.
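A key function can also return a tuple, which is handy when several elements share the same count. The sketch below uses a made-up counter with a tie between 'a' and 'd'; it sorts by count descending and breaks ties alphabetically.

```python
from collections import Counter

# Hypothetical data with a tie: 'a' and 'd' both have count 5
counts = Counter({'a': 5, 'b': 3, 'c': 7, 'd': 5})

# Sort by count descending (negated), then by key ascending to break ties
ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
print(ordered)  # [('c', 7), ('a', 5), ('d', 5), ('b', 3)]
```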

A rather nice addition to @MartijnPieters' answer is to get back a dictionary sorted by occurrence, since Counter.most_common() only returns a list of tuples. I often couple this with JSON output for handy log files:
from collections import Counter, OrderedDict
x = Counter({'a':5, 'b':3, 'c':7})
y = OrderedDict(x.most_common())
With the output (the repr of y, then its JSON dump):
OrderedDict([('c', 7), ('a', 5), ('b', 3)])
{
    "c": 7,
    "a": 5,
    "b": 3
}
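On Python 3.7 and later a plain dict preserves insertion order, so the same idea can be sketched without OrderedDict. The names by_count and as_json are illustrative only.

```python
import json
from collections import Counter

x = Counter({'a': 5, 'b': 3, 'c': 7})

# dict() keeps the order in which most_common() yields the pairs
by_count = dict(x.most_common())
as_json = json.dumps(by_count)
print(as_json)  # {"c": 7, "a": 5, "b": 3}
```

Passing indent=4 to json.dumps gives the multi-line layout that is more convenient in log files.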

Yes:
>>> from collections import Counter
>>> x = Counter({'a':5, 'b':3, 'c':7})
Using sorted() with the key keyword and a lambda function:
>>> sorted(x.items(), key=lambda i: i[1])
[('b', 3), ('a', 5), ('c', 7)]
>>> sorted(x.items(), key=lambda i: i[1], reverse=True)
[('c', 7), ('a', 5), ('b', 3)]
This works for all dictionaries. However, Counter has a special method that already gives you the sorted items (from most frequent to least frequent). It's called most_common():
>>> x.most_common()
[('c', 7), ('a', 5), ('b', 3)]
>>> list(reversed(x.most_common())) # in order of least to most
[('b', 3), ('a', 5), ('c', 7)]
You can also specify how many items you want to see:
>>> x.most_common(2) # specify number you want
[('c', 7), ('a', 5)]
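For the N least common items, a common idiom is to slice the full ranking from the end. A small sketch, where n and least_common are illustrative names:

```python
from collections import Counter

x = Counter({'a': 5, 'b': 3, 'c': 7})
n = 2

# Slice the full ranking from the end, stepping backwards
least_common = x.most_common()[:-n - 1:-1]
print(least_common)  # [('b', 3), ('a', 5)]
```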

A more general approach is sorted(), where the key keyword defines the sort key; negating a numeric value gives descending order:
>>> x = Counter({'a':5, 'b':3, 'c':7})
>>> sorted(x.items(), key=lambda k: -k[1]) # descending
[('c', 7), ('a', 5), ('b', 3)]

Related

How to find mean of an array which has two elements in Python?

I need to find the mean of an array like: [('a', 5), ('b', 2), ('a', 4), ('b', 6)]
The result should look like: [('a', 4.5), ('b', 4)]
You can put all your tuples in a defaultdict, using the first value to group them into a list and then calculate the mean:
from collections import defaultdict
d = defaultdict(list)
for key, value in [('a', 5), ('b', 2), ('a', 4), ('b', 6)]:
    d[key].append(value)
mean = []
for k, values in d.items():
    # mean.append((k, sum(values) / float(len(values))))  # Python 2
    mean.append((k, sum(values) / len(values)))
print(mean)  # [('a', 4.5), ('b', 4.0)]
A raw solution without additional libraries could look like this:
def mean(l):
    result = {}
    for key, value in l:
        if key not in result:
            result[key] = []
        result[key].append(value)
    return [(k, sum(v)/len(v)) for k, v in result.items()]
lst = [('a', 5), ('b', 2), ('a', 4), ('b', 6)]
m = mean(lst)
print(m)
# [('a', 4.5), ('b', 4.0)]
We can use pandas for this:
import pandas as pd
data = [('a', 5), ('b', 2), ('a', 4), ('b', 6)]
pd.DataFrame(data).groupby(0)[1].mean().to_dict()
this will give us:
>>> pd.DataFrame(data).groupby(0)[1].mean().to_dict()
{'a': 4.5, 'b': 4.0}
or we can convert this to a list of 2-tuples with:
list(pd.DataFrame(data).groupby(0)[1].mean().to_dict().items())
which gives:
>>> list(pd.DataFrame(data).groupby(0)[1].mean().to_dict().items())
[('a', 4.5), ('b', 4.0)]
The above is thus a more "declarative" approach: we specify what we want, not so much how to compute it.
You can collect the numbers with a collections.defaultdict(), then apply statistics.mean() on each group of numbers:
from statistics import mean
from collections import defaultdict
lst = [('a', 5), ('b', 2), ('a', 4), ('b', 6)]
d = defaultdict(list)
for k, v in lst:
    d[k].append(v)
means = [(k, mean(v)) for k, v in d.items()]
print(means)
# [('a', 4.5), ('b', 4)]
You can also use itertools.groupby() to group the tuples:
from statistics import mean
from itertools import groupby
from operator import itemgetter
lst = [("a", 5), ("b", 2), ("a", 4), ("b", 6)]
means = [
    (k, mean(map(itemgetter(1), g)))
    for k, g in groupby(sorted(lst, key=itemgetter(0)), key=itemgetter(0))
]
print(means)
# [('a', 4.5), ('b', 4)]
If you wish, you can also try the below reusable code (without using any external libraries).
>>> def get_mean(l):
...     d = {}
...     for k, v in l:
...         if k in d:
...             d[k].append(v)
...         else:
...             d[k] = [v]
...     result = [(k, sum(d[k])/len(d[k])) for k in d]
...     return result
...
>>> l = [('a', 5), ('b', 2), ('a', 4), ('b', 6)]
>>> new_l = get_mean(l)
>>> new_l
[('a', 4.5), ('b', 4.0)]
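If the groups are large, the values do not need to be stored at all: a running sum and count per key is enough. A minimal sketch, assuming Python 3 division; running_means is a hypothetical helper name, not taken from any of the answers.

```python
def running_means(pairs):
    """Return (key, mean) tuples, in first-seen key order."""
    totals = {}  # key -> [sum, count]
    for key, value in pairs:
        entry = totals.setdefault(key, [0, 0])
        entry[0] += value
        entry[1] += 1
    return [(key, total / count) for key, (total, count) in totals.items()]


data = [('a', 5), ('b', 2), ('a', 4), ('b', 6)]
print(running_means(data))  # [('a', 4.5), ('b', 4.0)]
```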

adding lists of tuples and sorting them

I have the following lists of tuples:
mylist=[('a', 3), ('b', 2), ('c', 8)]
mylist2=[('a', 3), ('b', 5), ('c', 20), ('d', 5)]
Is there a way I can sum all values that share the same name and sort them in Python? Something like:
[('c', 28), ('b', 7), ('a', 6), ('d', 5)]
If I were you, I would do it like this:
>>> mylist=[('a', 3), ('b', 2), ('c', 8)]
>>> mylist2=[('a', 3), ('b', 5), ('c', 20), ('d', 5)]
# Step 1: Convert the list of tuples to `dict`
>>> dict_1, dict_2 = dict(mylist), dict(mylist2)
# Step 2: Get the set of all keys (set union works on both Python 2 and 3)
>>> all_keys = set(dict_1) | set(dict_2)
# Step 3: Get `sum` of value for each key
>>> sum_list = [(k, dict_1.get(k, 0) + dict_2.get(k, 0)) for k in all_keys]
And then sort the list as:
>>> from operator import itemgetter
# Step 4: Sort in descending order based on value at index 1
>>> sorted(sum_list, key=itemgetter(1), reverse=True)
[('c', 28), ('b', 7), ('a', 6), ('d', 5)]
Note: This assumes that the keys (the values at index 0 of the tuples) are unique within each list.
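An alternative sketch that does not rely on names being unique within a list: accumulate everything into a collections.Counter and let most_common() do the sorting. The variable names totals and result are illustrative.

```python
from collections import Counter
from itertools import chain

mylist = [('a', 3), ('b', 2), ('c', 8)]
mylist2 = [('a', 3), ('b', 5), ('c', 20), ('d', 5)]

totals = Counter()
for name, value in chain(mylist, mylist2):
    totals[name] += value  # missing keys start at 0

result = totals.most_common()
print(result)  # [('c', 28), ('b', 7), ('a', 6), ('d', 5)]
```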

How to return a list as a dictionary, with the placement of the lists' variables as the dictionary's values?

For example, in a race, I have a list of runners and their names in a list ordered from their places, such as ['Bob', 'Charlie', 'Sarah', 'Alex', 'Bob']
I want to create a dictionary with this list such as
{'Bob': [0, 4], 'Charlie': [1], 'Sarah': [2], 'Alex': [3]}
If you only need to create a dictionary with the list variables as the dictionary keys and the positions of the lists' variables as the dictionary values, how would you do so?
[A, B, C, A] -> {A: [0, 3], B: [1], C: [2]}
(I'm having trouble figuring this out.)
Thank you. Sorry for the changed output. Thank you very much!
You can use enumerate(). This will iterate through the list, providing you with both the current element and that element's index.
my_list = ['Bob', 'Charlie', 'Sarah']
my_dict = {}
for index, name in enumerate(my_list):
    my_dict[name] = index
EDIT: Since the question has changed.
To get exactly what you requested, you could use a defaultdict. It creates a dict for which you specify the default value. If you access a key that does not yet exist, an empty list is automatically added as its value. This way you can do the following:
from collections import defaultdict
my_list = ['Bob', 'Charlie', 'Sarah', 'Bob']
my_dict = defaultdict(list)
for index, name in enumerate(my_list):
    my_dict[name].append(index)
You can use enumerate() and itertools.groupby():
>>> from itertools import groupby
>>> from operator import itemgetter
>>> your_list=['A','B','C','C','A','A','B','D']
>>> l=[(j,i) for i,j in enumerate(your_list,1)]
>>> l
[('A', 1), ('B', 2), ('C', 3), ('C', 4), ('A', 5), ('A', 6), ('B', 7), ('D', 8)]
>>> g=[list(g) for k, g in groupby(sorted(l),itemgetter(0))]
>>> g
[[('A', 1), ('A', 5), ('A', 6)], [('B', 2), ('B', 7)], [('C', 3), ('C', 4)], [('D', 8)]]
>>> z=[list(zip(*i)) for i in g] # list() is needed on Python 3, where zip is lazy
>>> z
[[('A', 'A', 'A'), (1, 5, 6)], [('B', 'B'), (2, 7)], [('C', 'C'), (3, 4)], [('D',), (8,)]]
>>> {i[0]:j for i,j in z}
{'A': (1, 5, 6), 'B': (2, 7), 'C': (3, 4), 'D': (8,)}
How about a simple loop to get the desired result:
x = ['Bob', 'Charlie', 'Sarah', 'Alex', 'Bob']
y = {}
for i, name in enumerate(x):
    if name in y:
        y[name].append(i)
    else:
        y[name] = [i]
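The same loop can be wrapped in a small reusable function; dict.setdefault removes the need for the explicit membership test. This is only a sketch, and positions is a hypothetical name.

```python
def positions(names):
    """Map each name to the list of indices at which it appears."""
    result = {}
    for index, name in enumerate(names):
        # setdefault returns the existing list, or inserts and returns a new one
        result.setdefault(name, []).append(index)
    return result


runners = ['Bob', 'Charlie', 'Sarah', 'Alex', 'Bob']
print(positions(runners))
# {'Bob': [0, 4], 'Charlie': [1], 'Sarah': [2], 'Alex': [3]}
```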

Python list sort by size of group

I have a group of items that are labeled like item_labels = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
I want to sort them by the size of group. e.g., label 3 has size 3 and label 2 has size 2 in the above example.
I tried using a combination of groupby and sorted, but it didn't work.
In [162]: sil = sorted(item_labels, key=op.itemgetter(1))
In [163]: sil
Out[163]: [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
In [164]: g = itt.groupby(sil, key=op.itemgetter(1))
In [165]: for k, v in g:
   .....:     print k, list(v)
   .....:
   .....:
1 [('c', 1)]
2 [('b', 2), ('e', 2)]
3 [('a', 3), ('d', 3), ('f', 3)]
In [166]: sg = sorted(g, key=lambda x: len(list(x[1])))
In [167]: sg
Out[167]: [] # not sure why I got an empty list here
I can always write some tedious for-loop to do this, but I would rather find something more elegant. Any suggestions? If there are useful libraries, I would be happy to use them, e.g. pandas or scipy.
In Python 2.7 and above, use Counter:
from collections import Counter
c = Counter(y for _, y in item_labels)
item_labels.sort(key=lambda t : c[t[1]])
In Python 2.6, for our purpose, this Counter constructor can be implemented using defaultdict (as suggested by @perreal) this way:
from collections import defaultdict
def Counter(x):
    d = defaultdict(int)
    for v in x:
        d[v] += 1
    return d
Since we are working with numbers only, and assuming the numbers are as low as those in your example, we can actually use a list (which will be compatible with even older versions of Python):
def Counter(x):
    lst = list(x)
    d = [0] * (max(lst)+1)
    for v in lst:
        d[v] += 1
    return d
Without Counter, you can simply do this:
item_labels.sort(key=lambda t : len([x[1] for x in item_labels if x[1]==t[1] ]))
It is slower, but reasonable over short lists.
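Put together, the Counter-based sort can be sketched end to end as follows. It uses sorted() rather than list.sort() so the input is left untouched; sizes and by_group_size are illustrative names.

```python
from collections import Counter

item_labels = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]

# Count how many items carry each label
sizes = Counter(label for _, label in item_labels)

# sorted() is stable, so items with equal group size keep their input order
by_group_size = sorted(item_labels, key=lambda item: sizes[item[1]])
print(by_group_size)
# [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
```

If two different labels can have the same group size, a key such as (sizes[item[1]], item[1]) keeps each group contiguous instead of interleaving them.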
The reason you got an empty list is that g is an iterator. You can only iterate over it once.
from collections import defaultdict
import operator
l = [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
d = defaultdict(int)
for p in l:
    d[p[1]] += 1
# d maps label -> group size; sort the (label, size) pairs by size,
# then emit the items whose label matches i[0]
print([p for i in sorted(d.items(), key=operator.itemgetter(1))
       for p in l if p[1] == i[0]])
itertools.groupby returns an iterator, so this for loop: for k, v in g: actually consumed that iterator.
>>> it = iter([1,2,3])
>>> for x in it:pass
>>> list(it) #iterator already consumed by the for-loop
[]
code:
>>> lis = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
>>> from operator import itemgetter
>>> from itertools import groupby
>>> lis.sort(key = itemgetter(1) )
>>> new_lis = [list(v) for k,v in groupby(lis, key = itemgetter(1) )]
>>> new_lis.sort(key = len)
>>> new_lis
[[('c', 1)], [('b', 2), ('e', 2)], [('a', 3), ('d', 3), ('f', 3)]]
To get a flattened list use itertools.chain:
>>> from itertools import chain
>>> list( chain.from_iterable(new_lis))
[('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
Same as @perreal's and @Elazar's answers, but with better names:
from collections import defaultdict
size = defaultdict(int)
for _, group_id in item_labels:
    size[group_id] += 1
# Tuple-unpacking lambdas such as lambda (_, group_id): ... are Python 2 only
item_labels.sort(key=lambda item: size[item[1]])
print(item_labels)
# -> [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
Here is another way:
example = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
out = {}
for t in example:
    out.setdefault(t[1], []).append(t)
print(sorted(out.values(), key=len))
Prints:
[[('c', 1)], [('b', 2), ('e', 2)], [('a', 3), ('d', 3), ('f', 3)]]
If you want a flat list:
print([l for s in sorted(out.values(), key=len) for l in s])
[('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
