Find least frequent value in dictionary - python

I'm working on a problem that asks me to return the least frequent value in a dictionary. The only approach I can think of uses a separate counter for each value, but the dictionaries provided in the checks don't have a fixed number of distinct values.
For example, suppose the dictionary contains mappings from students' names (strings) to their ages (integers). Your method would return the least frequently occurring age. Consider a dictionary variable d containing the following key/value pairs:
{'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22}
Three people are age 20 (Jeff, Kasey, and Kim), two people are age 22 (Alyssa and Stef), and four people are age 25 (Char, Dan, Mogran, and Ryan). So rarest(d) returns 22 because only two people are that age.
Would anyone mind pointing me in the right direction please? Thanks!

Counting the members of a collection is the job of collections.Counter:
d={'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22}
import collections
print(collections.Counter(d.values()).most_common()[-1][0])
22

You can create an empty dict for the counters, then loop through the dict you've got and add 1 to the corresponding value in the second dict, then return the key of the element with the minimum value in the second dict.
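The approach described above can be sketched roughly as follows. The function name rarest comes from the question; the variable name counts is only illustrative, and a tie is resolved in favour of whichever value was counted first. An empty dictionary would raise ValueError in this sketch.

```python
def rarest(d):
    """Return the least frequently occurring value in dictionary d."""
    counts = {}  # maps each value of d to the number of times it occurs
    for value in d.values():
        counts[value] = counts.get(value, 0) + 1
    # min over the keys of counts, compared by their stored count
    return min(counts, key=counts.get)


d = {'Alyssa': 22, 'Char': 25, 'Dan': 25, 'Jeff': 20, 'Kasey': 20,
     'Kim': 20, 'Mogran': 25, 'Ryan': 25, 'Stef': 22}
print(rarest(d))  # 22
```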

from collections import Counter
min(Counter(my_dict_of_ages.values()).items(), key=lambda x: x[1])[0]
should do it, I think: min picks the (age, count) pair with the smallest count, and [0] extracts the age.

You can use collections.Counter
d={'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22}
import collections
print(collections.Counter(d.values()).most_common()[-1][0])
Or write your own function:
def rarest(d):
    values = list(d.values())
    least_frequent = None
    lowest_count = len(values) + 1  # larger than any possible count
    for x in set(values):
        if values.count(x) < lowest_count:
            lowest_count = values.count(x)
            least_frequent = x
    return {least_frequent: lowest_count}
>>> rarest({'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22})
{22: 2}

You could create a second dictionary that uses the values in the first (ages) as keys in the second, with the values of the second as counts. Then sort the values of the second and do a reverse look-up to get the associated keys (there are a few ways to do this efficiently by treating the list of keys and the list of values as numpy arrays).
import numpy
d = {'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22}
def rarest(d):
    s = {}
    # First, map ages to counts.
    for key in d:
        if d[key] not in s:
            s[d[key]] = 1
        else:
            s[d[key]] += 1 # Could use a defaultdict for this.
    # Second, sort on the counts to find the rarest.
    keys = numpy.array(list(s.keys()))
    values = numpy.array(list(s.values()))
    ordering = numpy.argsort(values)
    return keys[ordering][0]
There's probably a more efficient way to do this, but that seems to work.

my_dict = {'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22}
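values = list(my_dict.values())
most_frequent = None
highest_count = 0
for x in set(values):
    # Compare counts with counts; use < and a large starting count for the least frequent.
    if values.count(x) > highest_count:
        highest_count = values.count(x)
        most_frequent = x
print(most_frequent)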
This code uses the built-in set(), which returns a set containing only the unique elements, e.g.:
>>> set([1, 2, 3, 4, 2, 1])
{1, 2, 3, 4}
To extract all the values from the dict, you can use dict.values(). Likewise, you have dict.keys() and dict.items() (wrapped in list() here to show the contents):
>>> list(my_dict.keys())
['Alyssa', 'Char', 'Dan', 'Jeff', 'Kasey', 'Kim', 'Mogran', 'Ryan', 'Stef']
>>> list(my_dict.values())
[22, 25, 25, 20, 20, 20, 25, 25, 22]
>>> list(my_dict.items())
[('Alyssa', 22),
 ('Char', 25),
 ('Dan', 25),
 ('Jeff', 20),
 ('Kasey', 20),
 ('Kim', 20),
 ('Mogran', 25),
 ('Ryan', 25),
 ('Stef', 22)]

In case anyone else prefers to remember as few function/property names and packages as possible, JadedTuna's answer is good. Here's my go-to:
val_count = {}
for v in d.values(): # Count the values, not the keys
    if v in val_count:
        val_count[v] += 1
    else:
        val_count[v] = 1
val_count = list(val_count.items()) # Convert dict to [(v1, n1), (v2, n2), ...]
val_count.sort(key=lambda tup: tup[1]) # Sorts by count. Add reverse=True if you'd like mode instead
val_count[0] # (least frequent value, its count)

Related

Make a new list depending on group number and add scores up as well

If I have a list within another list that looks like this...
[['Harry',9,1],['Harry',17,1],['Jake',4,1], ['Dave',9,2],['Sam',17,2],['Sam',4,2]]
How can I add the middle elements together so that, for 'Harry' for example, it shows up as ['Harry', 26], and also have Python look at the group number (3rd element) and output only the winner (the one with the highest total score, the middle element)? For each group, there needs to be one winner. So the final output shows:
[['Harry', 26],['Sam',21]]
THIS QUESTION IS NOT A DUPLICATE: It has a third element as well which I am stuck about
The similar question gave me an answer of:
grouped_scores = {}
for name, score, group_number in players_info:
    if name not in grouped_scores:
        grouped_scores[name] = score
        grouped_scores[group_number] = group_number
    else:
        grouped_scores[name] += score
But that only adds the scores up, it doesn't take out the winner from each group. Please help.
I had thought doing something like this, but I'm not sure exactly what to do...
grouped_scores = {}
for name, score, group_number in players_info:
    if name not in grouped_scores:
        grouped_scores[name] = score
    else:
        grouped_scores[name] += score
    for group in group_number:
        if grouped_scores[group_number] = group_number:
            [don't know what to do here]
Solution:
Use itertools.groupby, and collections.defaultdict:
l=[['Harry',9,1],['Harry',17,1],['Jake',4,1], ['Dave',9,2],['Sam',17,2],['Sam',4,2]]
from itertools import groupby
from collections import defaultdict
l2=[list(y) for x,y in groupby(l,key=lambda x: x[-1])]
l3=[]
for x in l2:
    d=defaultdict(int)
    for x,y,z in x:
        d[x]+=y
    l3.append(max(list(map(list,dict(d).items())),key=lambda x: x[-1]))
Now:
print(l3)
Is:
[['Harry', 26], ['Sam', 21]]
Explanation:
The two import lines bring in groupby and defaultdict. The next line uses groupby to split the list into two groups based on the last element of each sub-list (groupby only groups consecutive items, so the list must already be ordered by group number, as it is here). An empty list is then created for the results. The loop iterates through the groups: for each one it creates a defaultdict, the inner loop adds each player's scores into it, and the last line converts the dictionary items into lists and appends the one with the highest total.
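One caveat worth illustrating: itertools.groupby only groups consecutive items, so the approach above relies on the input already being ordered by group number. A minimal sketch that sorts first, using the hypothetical helper name winners_by_group and deliberately shuffled sample rows:

```python
from collections import defaultdict
from itertools import groupby


def winners_by_group(rows):
    """Return [name, total] for the top scorer of each group."""
    winners = []
    # groupby needs the rows sorted by the grouping key (the group number)
    by_group = sorted(rows, key=lambda row: row[2])
    for _, members in groupby(by_group, key=lambda row: row[2]):
        totals = defaultdict(int)
        for name, score, _group in members:
            totals[name] += score
        best = max(totals, key=totals.get)
        winners.append([best, totals[best]])
    return winners


rows = [['Sam', 17, 2], ['Harry', 9, 1], ['Jake', 4, 1],
        ['Dave', 9, 2], ['Harry', 17, 1], ['Sam', 4, 2]]
print(winners_by_group(rows))  # [['Harry', 26], ['Sam', 21]]
```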
I would aggregate the data first with a defaultdict.
>>> from collections import defaultdict
>>>
>>> combined = defaultdict(lambda: defaultdict(int))
>>> data = [['Harry',9,1],['Harry',17,1],['Jake',4,1], ['Dave',9,2],['Sam',17,2],['Sam',4,2]]
>>>
>>> for name, score, group in data:
...     combined[group][name] += score
...
>>> combined
defaultdict(<function __main__.<lambda>()>,
            {1: defaultdict(int, {'Harry': 26, 'Jake': 4}),
             2: defaultdict(int, {'Dave': 9, 'Sam': 21})})
Then apply max to each value in that dict.
>>> from operator import itemgetter
>>> [list(max(v.items(), key=itemgetter(1))) for v in combined.values()]
[['Harry', 26], ['Sam', 21]]
Use itertools.groupby, sum the middle value within each grouped element, and append the result to a list based on the maximum condition. Note that this groups by name and compares each player's highest single score, so it reproduces the expected output for this data but does not use the group number:
import itertools
l=[['Harry',9,1],['Harry',17,1],['Jake',4,1], ['Dave',9,2],['Sam',17,2],['Sam',4,2]]
maxlist=[]
maxmiddleindexvalue=0
for key,value in itertools.groupby(l,key=lambda x:x[0]):
    s=0
    m=0
    for element in value:
        s+=element[1]
        m=max(m,element[1])
    if(m==maxmiddleindexvalue):
        maxlist.append((key,s))
    if(m>maxmiddleindexvalue):
        maxlist=[(key,s)]
        maxmiddleindexvalue=m
print(maxlist)
OUTPUT
[('Harry', 26), ('Sam', 21)]

How do I access values from dictionary randomly in python3? [duplicate]

How can I get a random pair from a dict? I'm making a game where you need to guess a capital of a country and I need questions to appear randomly.
The dict looks like {'VENEZUELA':'CARACAS'}
How can I do this?
One way would be:
import random
d = {'VENEZUELA':'CARACAS', 'CANADA':'OTTAWA'}
random.choice(list(d.values()))
EDIT: The question was changed a couple years after the original post, and now asks for a pair, rather than a single item. The final line should now be:
country, capital = random.choice(list(d.items()))
I wrote this trying to solve the same problem:
https://github.com/robtandy/randomdict
It has O(1) random access to keys, values, and items.
If you don't want to use the random module, you can also try popitem():
>>> d = {'a': 1, 'b': 5, 'c': 7}
>>> d.popitem()
('a', 1)
>>> d
{'c': 7, 'b': 5}
>>> d.popitem()
('c', 7)
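In older Python versions dicts did not preserve order, so popitem returned items in an arbitrary (but not strictly random) order, as in the session above. Since Python 3.7, popitem removes and returns the most recently inserted pair, so it no longer gives an arbitrary item.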
Also keep in mind that popitem removes the key-value pair from dictionary, as stated in the docs.
popitem() is useful to destructively iterate over a dictionary
>>> import random
>>> d = dict(Venezuela = 1, Spain = 2, USA = 3, Italy = 4)
>>> random.choice(list(d.keys()))
'Venezuela'
>>> random.choice(list(d.keys()))
'USA'
By calling random.choice on the keys of the dictionary (the countries).
Try this:
import random
a = dict(....) # a is some dictionary
random_key = random.sample(list(a), 1)[0]
This works in Python 3 as long as the keys are first converted to a list, because random.sample needs a sequence.
This works in Python 2 and Python 3:
A random key:
random.choice(list(d.keys()))
A random value
random.choice(list(d.values()))
A random key and value
random.choice(list(d.items()))
Since the original post wanted the pair:
import random
d = {'VENEZUELA':'CARACAS', 'CANADA':'OTTAWA'}
country, capital = random.choice(list(d.items()))
(python 3 style)
If you don't want to use random.choice() you can try this way:
>>> import random
>>> myDictionary = {'VENEZUELA':'CARACAS', 'IRAN' : 'TEHRAN'}
>>> i = random.randint(0, len(myDictionary) - 1)
>>> list(myDictionary)[i]
'IRAN'
>>> myDictionary[list(myDictionary)[i]]
'TEHRAN'
When they ask for a random pair here they mean a key and value.
For such a dict where the key:values are country:city,
use random.choice().
Pass the dictionary keys to this function as follows:
import random
keys = list(my_dict)
country = random.choice(keys)
You may wish to track the keys that were already called in a round and when getting a fresh country, loop until the random selection is not in the list of those already "drawn"... as long as the drawn list is shorter than the keys list.
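The idea of not repeating a country within a round could also be sketched with random.sample, which draws without replacement, instead of looping until an unseen key turns up. The helper name quiz_round and the sample data are assumptions for illustration only:

```python
import random


def quiz_round(capitals, rounds):
    """Return `rounds` distinct (country, capital) pairs in random order."""
    # random.sample needs a sequence, so the keys are copied into a list first
    countries = random.sample(list(capitals), k=rounds)
    return [(country, capitals[country]) for country in countries]


capitals = {'VENEZUELA': 'CARACAS', 'CANADA': 'OTTAWA', 'IRAN': 'TEHRAN'}
for country, capital in quiz_round(capitals, 2):
    print(country, '->', capital)
```

Asking for more rounds than there are countries raises ValueError, which mirrors the "as long as the drawn list is shorter than the keys list" condition above.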
Since this is homework:
Check out random.choice(), which selects and returns a random element from a list (random.sample() returns several at once). You can get a list of dictionary keys with list(dict.keys()) and a list of dictionary values with list(dict.values()).
I am assuming that you are making a quiz kind of application. For this kind of application I have written a function which is as follows:
import random
def shuffle(q):
    """
    The input of the function will
    be the dictionary of the question
    and answers. The output will
    be a random question with answer
    """
    selected_keys = []
    i = 0
    while i < len(q):
        current_selection = random.choice(list(q.keys()))
        if current_selection not in selected_keys:
            selected_keys.append(current_selection)
            i = i+1
            print(current_selection+'? '+str(q[current_selection]))
If I give the input questions = {'VENEZUELA':'CARACAS', 'CANADA':'OTTAWA'} and call shuffle(questions), the output will be, for example (the order is random):
VENEZUELA? CARACAS
CANADA? OTTAWA
You can extend this further more by shuffling the options also
With modern versions of Python (since 3), the objects returned by dict.keys(), dict.values() and dict.items() are view objects*. They can be iterated but not indexed, so passing them directly to random.choice is not possible, as they are no longer lists.
One option is to use list comprehension to do the job with random.choice:
import random
colors = {
    'purple': '#7A4198',
    'turquoise':'#9ACBC9',
    'orange': '#EF5C35',
    'blue': '#19457D',
    'green': '#5AF9B5',
    'red': '#E04160',
    'yellow': '#F9F985'
}
color = random.choice([hex_color for hex_color in colors.values()])
print(f'The new color is: {color}')
References:
*Python 3.8: Standard Library Documentation - Built-in types: Dictionary view objects
Python 3.8: Data Structures - List Comprehensions:
I just stumbled across a similar problem and designed the following solution (relevant function is pick_random_item_from_dict; other functions are just for completeness).
import random
def pick_random_key_from_dict(d: dict):
    """Grab a random key from a dictionary."""
    keys = list(d.keys())
    random_key = random.choice(keys)
    return random_key
def pick_random_item_from_dict(d: dict):
    """Grab a random item from a dictionary."""
    random_key = pick_random_key_from_dict(d)
    random_item = random_key, d[random_key]
    return random_item
def pick_random_value_from_dict(d: dict):
    """Grab a random value from a dictionary."""
    _, random_value = pick_random_item_from_dict(d)
    return random_value
# Usage
d = {...}
random_item = pick_random_item_from_dict(d)
The main difference from previous answers is in the way we handle the dictionary copy with list(d.items()). We can partially circumvent that by only making a copy of d.keys() and using the random key to pick its associated value and create our random item.
Try this (using random.choice from items)
import random
a={ "str" : "sda" , "number" : 123, 55 : "num"}
random.choice(list(a.items()))
# ('str', 'sda')
random.choice(list(a.items()))[1] # getting a value
# 'num'
To select 50 random keys from a dictionary dict_data (random.sample needs a sequence, and sets are no longer accepted as of Python 3.11):
sample = random.sample(list(dict_data.keys()), 50)
I needed to iterate through ranges of keys in a dict without sorting it each time and found the Sorted Containers library. I discovered that this library enables random access to dictionary items by index which solves this problem intuitively and without iterating through the entire dict each time:
>>> import sortedcontainers
>>> import random
>>> d = sortedcontainers.SortedDict({1: 'a', 2: 'b', 3: 'c'})
>>> random.choice(d.items())
(1, 'a')
>>> random.sample(d.keys(), k=2)
[1, 3]
I found this post by looking for a rather comparable solution. For picking multiple elements out of a dict, this can be used:
import numpy as np
idx_picks = np.random.choice(len(d), num_of_picks, replace=False) # (Don't pick the same element twice)
result = dict()
c_keys = list(d.keys()) # not so efficient - unfortunately .keys() returns a non-indexable view object
for i in idx_picks:
    result[c_keys[i]] = d[c_keys[i]]
Here is a little Python code for a dictionary class that can return random keys in O(1) time. (I included MyPy types in this code for readability):
from typing import TypeVar, Generic, Dict, List
import random
K = TypeVar('K')
V = TypeVar('V')
class IndexableDict(Generic[K, V]):
    def __init__(self) -> None:
        self.keys: List[K] = []
        self.vals: List[V] = []
        self.dict: Dict[K, int] = {}
    def __getitem__(self, key: K) -> V:
        return self.vals[self.dict[key]]
    def __setitem__(self, key: K, val: V) -> None:
        if key in self.dict:
            index = self.dict[key]
            self.vals[index] = val
        else:
            self.dict[key] = len(self.keys)
            self.keys.append(key)
            self.vals.append(val)
    def __contains__(self, key: K) -> bool:
        return key in self.dict
    def __len__(self) -> int:
        return len(self.keys)
    def random_key(self) -> K:
        return self.keys[random.randrange(len(self.keys))]
>>> import random
>>> b = { 'video':0, 'music':23,"picture":12 }
>>> random.choice(tuple(b.items()))
('music', 23)
>>> random.choice(tuple(b.items()))
('music', 23)
>>> random.choice(tuple(b.items()))
('picture', 12)
>>> random.choice(tuple(b.items()))
('video', 0)

Removing items of a certain index from a dictionary?

If I've got a dictionary and it's sorted, and I want to remove the first three items (in order of value) from it by index (no matter what the contents of the initial dictionary was), what do I do? How would I go about doing so?
I was hoping it would let me just slice (such as one does with lists), but I've been made aware that that's impossible.
EDIT: By index I mean indices. So for example, were I to remove the items from 1 to 3 of the sorted dictionary below, after it was sorted by value, then I would only be left with "eggs".
EDIT 2: How do I find the keys in those places then (in indices 0, 1, 2)?
EDIT 3: I'm not allowed to import or print in this.
For example:
>>>food = {"ham":12, "cookie":5, "eggs":16, "steak":2}
>>>remove_3(food)
{"eggs":16}
Get key value pairs (.items()), sort them by value (item[1]), and take the first 3 ([:3]):
for key, value in sorted(food.items(), key=lambda item: item[1])[:3]:
    del food[key]
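Since the question rules out imports and printing, the same idea might be wrapped up as the remove_3 function from the question's example. This is only a sketch; it assumes the function should return a new dictionary rather than modify its argument:

```python
def remove_3(food):
    """Return a copy of food without its three lowest-valued items."""
    ranked = sorted(food.items(), key=lambda item: item[1])
    return dict(ranked[3:])  # keep everything after the first three


food = {"ham": 12, "cookie": 5, "eggs": 16, "steak": 2}
remaining = remove_3(food)  # {'eggs': 16}
```

With three or fewer items, the slice is empty and the sketch returns an empty dictionary.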
Try the following:
import operator
from collections import OrderedDict
food = {"ham": 12, "cookie": 5, "eggs": 16, "steak": 2}
ordered_dict = OrderedDict(sorted(food.items(), key=operator.itemgetter(1)))
for key in list(ordered_dict)[:3]:
    del ordered_dict[key]
Output:
>>> ordered_dict
OrderedDict([('eggs', 16)])
Firstly, regarding your statement:
If I've got a dictionary and it's sorted
Before Python 3.7, dicts were not guaranteed to keep any order, so you could not rely on a dict staying sorted (since 3.7 they preserve insertion order). If you want a dict with an explicit sorted order, use collections.OrderedDict(). For example:
>>> from collections import OrderedDict
>>> from operator import itemgetter
>>> food = {"ham":12, "cookie":5, "eggs":16, "steak":2}
>>> my_ordered_dict = OrderedDict(sorted(food.items(), key=itemgetter(1)))
The value held by my_ordered_dict will be:
>>> my_ordered_dict
OrderedDict([('steak', 2), ('cookie', 5), ('ham', 12), ('eggs', 16)])
which is equivalent to dict preserving the order as:
{
'steak': 2,
'cookie': 5,
'ham': 12,
'eggs': 16
}
To get a dict without the first three items (the three lowest values), slice the items. In Python 3, dict.items() returns a view, so convert it to a list of (key, value) tuples first:
>>> dict(list(my_ordered_dict.items())[3:]) # OR, OrderedDict(...) for maintaining the order
{'eggs': 16}

Merging values from 2 dictionaries (Python)

(I'm new to Python!)
Trying to figure out this homework question:
The function will take as input two dictionaries, each mapping strings to integers. The function will return a dictionary that maps strings from the two input dictionaries to the sum of the integers in the two input dictionaries.
my idea was this:
def add(dicA,dicB):
    dicA = {}
    dicB = {}
    newdictionary = dicA.update(dicB)
however, that brings back None.
In the professor's example:
print(add({'alice':10, 'Bob':3, 'Carlie':1}, {'alice':5, 'Bob':100, 'Carlie':1}))
the output is:
{'alice':15, 'Bob':103, 'Carlie':2}
My issue really is that I don't understand how to add up the values from each dictionaries. I know that the '+' is not supported with dictionaries. I'm not looking for anyone to do my homework for me, but any suggestions would be very much appreciated!
From the documentation:
update([other])
Update the dictionary with the key/value pairs from other, overwriting existing keys. Return None.
You don't want to replace key/value pairs, you want to add the values for similar keys. Go through each dictionary and add each value to the relevant key:
def add(dicA,dicB):
    result = {}
    for d in dicA, dicB:
        for key in d:
            result[key] = result.get(key, 0) + d[key]
    return result
result.get(key, 0) will retrieve the value of an existing key or produce 0 if key is not yet present.
First of all, a.update(b) updates a in place, and returns None.
Secondly, a.update(b) wouldn't help you to sum the values; it would just leave a holding all the key, value pairs from b, with b's values overwriting a's for shared keys:
>>> a = {'alice':10, 'Bob':3, 'Carlie':1}
>>> b = {'alice':5, 'Bob':100, 'Carlie':1}
>>> a.update(b)
>>> a
{'alice': 5, 'Carlie': 1, 'Bob': 100}
It'd be easiest to use collections.Counter to achieve the desired result. As a plus, it does support addition with +:
from collections import Counter
def add(dicA, dicB):
return dict(Counter(dicA) + Counter(dicB))
This produces the intended result:
>>> print(add({'alice':10, 'Bob':3, 'Carlie':1}, {'alice':5, 'Bob':100, 'Carlie':1}))
{'alice': 15, 'Carlie': 2, 'Bob': 103}
The following is not meant to be the most elegant solution, but to get a feeling on how to deal with dicts.
dictA = {'Alice':10, 'Bob':3, 'Carlie':1}
dictB = {'Alice':5, 'Bob':100, 'Carlie':1}
# how to iterate through a dictionary
for k,v in dictA.items():
    print(k,v)
# make a new dict to keep tally
newdict={}
for d in [dictA,dictB]: # go through a list that has your dictionaries
    print(d)
    for k,v in d.items(): # go through each dictionary item
        if k not in newdict:
            newdict[k]=v
        else:
            newdict[k]+=v
print(newdict)
Output:
Alice 10
Bob 3
Carlie 1
{'Alice': 10, 'Bob': 3, 'Carlie': 1}
{'Alice': 5, 'Bob': 100, 'Carlie': 1}
{'Alice': 15, 'Bob': 103, 'Carlie': 2}
def add(dicA,dicB):
You define a function that takes two arguments, dicA and dicB.
dicA = {}
dicB = {}
Then you assign an empty dictionary to both those variables, overwriting the dictionaries you passed to the function.
newdictionary = dicA.update(dicB)
Then you update dicA with the values from dicB, and assign the result to newdictionary. dict.update always returns None though.
And finally, you don’t return anything from the function, so it does not give you any results.
In order to combine those dictionaries, you actually need to use the values that were passed to it. Since dict.update mutates the dictionary it is called on, this would change one of those passed dictionaries, which we generally do not want to do. So instead, we use an empty dictionary, and then copy the values from both dictionaries into it:
def add (dicA, dicB):
    newDictionary = {}
    newDictionary.update(dicA)
    newDictionary.update(dicB)
    return newDictionary
If you want the values to sum up automatically, then use a Counter instead of a normal dictionary:
from collections import Counter
def add (dicA, dicB):
    newDictionary = Counter()
    newDictionary.update(dicA)
    newDictionary.update(dicB)
    return newDictionary
I suspect your professor wants to achieve this using more simple methods. But you can achieve this very easily using collections.Counter.
from collections import Counter
def add(a, b):
return dict(Counter(a) + Counter(b))
Your professor probably wants something like this:
def add(a, b):
    new_dict = copy of a
    for each key/value pair in b
        if key in new_dict
            add value to value already present in new_dict
        else
            insert key/value pair into new_dict
    return new_dict
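A runnable rendering of that outline might look like this. It is a sketch that follows the pseudocode step by step and leaves both input dictionaries unmodified:

```python
def add(a, b):
    new_dict = dict(a)  # shallow copy of a, so a itself is not changed
    for key, value in b.items():
        if key in new_dict:
            new_dict[key] += value  # key seen in a: add the two values
        else:
            new_dict[key] = value   # key only in b: insert it as-is
    return new_dict


print(add({'alice': 10, 'Bob': 3, 'Carlie': 1},
          {'alice': 5, 'Bob': 100, 'Carlie': 1}))
# {'alice': 15, 'Bob': 103, 'Carlie': 2}
```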
You can try this, assuming both dictionaries have exactly the same keys (a key missing from dict2 raises KeyError):
def add(dict1, dict2):
    return dict([(key,dict1[key]+dict2[key]) for key in dict1.keys()])
I personally like using a dictionary's get method for this kind of merge:
def add(a, b):
    result = {}
    for dictionary in (a, b):
        for key, value in dictionary.items():
            result[key] = result.get(key, 0) + value
    return result

Inverting a dictionary when some of the original values are identical

Say I have a dictionary called word_counter_dictionary that counts how many words are in the document in the form {'word' : number}. For example, the word "secondly" appears one time, so the key/value pair would be {'secondly' : 1}. I want to make an inverted list so that the numbers will become keys and the words will become the values for those keys so I can then graph the top 25 most used words. I saw somewhere where the setdefault() function might come in handy, but regardless I cannot use it because so far in the class I am in we have only covered get().
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    inverted_dictionary[new_key] = word_counter_dictionary.get(new_key, '') + str(key)
inverted_dictionary
So far, using this method above, it works fine until it reaches another word with the same value. For example, the word "saves" also appears once in the document, so Python will add the new key/value pair just fine. BUT it erases the {1 : 'secondly'} with the new pair so that only {1 : 'saves'} is in the dictionary.
So, bottom line, my goal is to get ALL of the words and their respective number of repetitions in this new dictionary called inverted_dictionary.
A defaultdict is perfect for this
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
from collections import defaultdict
d = defaultdict(list)
for key, value in word_counter_dictionary.items():
    d[value].append(key)
print(d)
Output:
defaultdict(<class 'list'>, {1: ['first'], 2: ['second', 'fourth'], 3: ['third']})
What you can do is make each value a list of the words that share the same count:
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(str(key))
    else:
        inverted_dictionary[new_key] = [str(key)]
print(inverted_dictionary)
{1: ['first'], 2: ['second', 'fourth'], 3: ['third']}
Python dicts do NOT allow repeated keys, so you can't use a simple dictionary to store multiple elements with the same key (1 in your case). For your example, I'd rather have a list as the value of your inverted dictionary, and store in that list the words that share the number of appearances, like:
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(key)
    else:
        inverted_dictionary[new_key] = [key]
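If only get() has been covered in class, as the question mentions, the same inversion can be sketched without the if/else. The sample data here is made up for illustration:

```python
word_counter_dictionary = {'secondly': 1, 'saves': 1, 'the': 3}

inverted_dictionary = {}
for word, count in word_counter_dictionary.items():
    # get() returns the list built so far, or an empty list for a new count
    inverted_dictionary[count] = inverted_dictionary.get(count, []) + [word]

print(inverted_dictionary)  # {1: ['secondly', 'saves'], 3: ['the']}
```

The key point is that get() is called on the inverted dictionary, not on the original one, so words with the same count accumulate instead of overwriting each other.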
In order to get the 25 most repeated words, you should iterate through the (sorted) keys in the inverted_dictionary and store the words:
common_words = []
for key in sorted(inverted_dictionary.keys(), reverse=True):
    if len(common_words) < 25:
        common_words.extend(inverted_dictionary[key])
    else:
        break
common_words = common_words[:25] # In case there are more than 25 words
Here's a version that doesn't "invert" the dictionary:
>>> import operator
>>> A = {'a':10, 'b':843, 'c': 39, 'd': 10}
>>> B = sorted(A.items(), key=operator.itemgetter(1), reverse=True)
>>> B
[('b', 843), ('c', 39), ('a', 10), ('d', 10)]
Instead, it creates a list that is sorted, highest to lowest, by value.
To get the top 25, you simply slice it: B[:25].
And here's one way to get the keys and values separated (after putting them into a list of tuples):
>>> [x[0] for x in B]
['b', 'c', 'a', 'd']
>>> [x[1] for x in B]
[843, 39, 10, 10]
or
>>> C, D = zip(*B)
>>> C
('b', 'c', 'a', 'd')
>>> D
(843, 39, 10, 10)
Note that if you only want to extract the keys or the values (and not both), you should do so earlier. These are just examples of how to handle the list of tuples.
For getting the largest elements of some dataset an inverted dictionary might not be the best data structure.
Either put the items in a sorted list (this example assumes you want to get the two most frequent words):
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
counter_word_list = sorted((count, word) for word, count in word_counter_dictionary.items())
Result:
>>> print(counter_word_list[-2:])
[(2, 'second'), (3, 'third')]
Or use Python's included batteries (heapq.nlargest in this case):
import heapq, operator
print(heapq.nlargest(2, word_counter_dictionary.items(), key=operator.itemgetter(1)))
Result:
[('third', 3), ('second', 2)]
