How to get a set of keys with largest values? - python

I am working on a function
def common_words(dictionary, N):
    if len(dictionary) > N:
        max(dictionary, key=dictionary.get)
Description of the function is:
The first parameter is the dictionary of word counts and the second is
a positive integer N. This function should update the dictionary so
that it includes the most common (highest frequency words). At most N
words should be included in the dictionary. If including all words
with some word count would result in a dictionary with more than N
words, then none of the words with that word count should be included.
(i.e., in the case of a tie for the N+1st most common word, omit all
of the words in the tie.)
So I know that I need to get the N items with the highest values, but I am not sure how to do that. I also know that once I have the N items, I need to pop out any that have duplicate values.
For example, given
k = {'a':5, 'b':4, 'c':4, 'd':1}
then
common_words(k, 2)
should modify k so that it becomes {'a':5}.

Here's my algorithm for this problem.
Extract the data from the dictionary into a list and sort it in descending order on the dictionary values.
Clear the original dictionary.
Group the sorted data into groups that have the same value.
Re-populate the dictionary with all the (key, value) pairs from each group in the sorted list, as long as that keeps the total dictionary size <= N. If adding a group would make the total dictionary size > N, then return.
The grouping operation can be easily done using the standard itertools.groupby function.
To perform the sorting and grouping we need an appropriate key function, as described in the groupby, list and sorted docs. Since we need the second item of each tuple we could use
def keyfunc(t):
    return t[1]
or
keyfunc = lambda t: t[1]
but it's more efficient to use operator.itemgetter.
from operator import itemgetter
from itertools import groupby
def common_words(d, n):
    keyfunc = itemgetter(1)
    lst = sorted(d.items(), key=keyfunc, reverse=True)
    d.clear()
    for _, g in groupby(lst, key=keyfunc):
        g = list(g)
        if len(d) + len(g) <= n:
            d.update(g)
        else:
            break
# test
data = {'a':5, 'b':4, 'c':4, 'd':1}
common_words(data, 4)
print(data)
common_words(data, 2)
print(data)
output
{'c': 4, 'd': 1, 'b': 4, 'a': 5}
{'a': 5}
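To make the grouping step easier to picture, here is a small sketch that lists the groups `groupby` yields for the example dictionary. The helper name `groups_by_count` is made up for illustration and is not part of the answer above; it assumes Python 3.7+, where dictionaries keep insertion order.

```python
from itertools import groupby
from operator import itemgetter

def groups_by_count(d):
    """Return a list of (count, [words]) groups, highest count first.

    Hypothetical helper for illustration only.
    """
    by_count = itemgetter(1)
    # groupby only merges adjacent items, so the pairs must be sorted first.
    pairs = sorted(d.items(), key=by_count, reverse=True)
    return [(count, [word for word, _ in group])
            for count, group in groupby(pairs, key=by_count)]

print(groups_by_count({'a': 5, 'b': 4, 'c': 4, 'd': 1}))
# [(5, ['a']), (4, ['b', 'c']), (1, ['d'])]
```

With N = 2, the first group fits, the second group would bring the size to 3, and so the loop stops with only 'a' in the dictionary.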

My algorithm is as follows:
First, build a list of (key, value) tuples from the dictionary, sorted by value from largest to smallest.
Then check whether item[N-1] has the same value as item[N]; while it does, drop item[N-1] (indexes start from 0, hence the -1), so that every word tied with the first excluded word is removed.
Finally, convert the slice of the tuple list up to N elements back to a dict. You may want to use OrderedDict here if you need to retain the item order.
If the dictionary has N items or fewer, it is returned as it is.
def common_words(dictionary, N):
    if len(dictionary) > N:
        tmp = [(k,dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
        # shrink N while the last kept item ties with the first dropped one
        while N > 0 and tmp[N-1][1] == tmp[N][1]:
            N -= 1
        return dict(tmp[:N])
        # return [i[0] for i in tmp[:N]] # comment out the line above and uncomment this line to get keys only, as your title asks how to get keys
    else:
        return dictionary
        # return dictionary.keys() # comment out the line above and uncomment this line to get keys only, as your title asks how to get keys
>>> common_words({'a':5, 'b':4, 'c':4, 'd':1}, 2)
{'a': 5}
The OP wants to modify the input dictionary inside the function and return None, so it can be changed as below:
def common_words(dictionary, N):
    if len(dictionary) > N:
        tmp = [(k,dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
        # shrink N while the last kept item ties with the first dropped one
        while N > 0 and tmp[N-1][1] == tmp[N][1]:
            N -= 1
        # return dict(tmp[:N])
        for i in tmp[N:]:
            dictionary.pop(i[0])
>>> k = {'a':5, 'b':4, 'c':4, 'd':1}
>>> common_words(k, 2)
>>> k
{'a': 5}
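A tie can involve more than two words, so that case is worth testing as well. As a hedged sketch (the function name `keep_most_common` is hypothetical, and this is a variation rather than the answer's own code), one can look up the count of the first word that cannot fit and keep only the words strictly above it. It assumes N is a positive integer, as the problem statement says.

```python
def keep_most_common(counts, n):
    """Trim counts in place to at most n words, dropping any tie at the cutoff.

    Hypothetical variation for illustration; assumes n is a positive integer.
    """
    if len(counts) <= n:
        return
    ranked = sorted(counts.values(), reverse=True)
    cutoff = ranked[n]  # count of the first word that cannot fit
    # Words tied with the cutoff are dropped along with everything below it.
    for word in [w for w, c in counts.items() if c <= cutoff]:
        del counts[word]

k = {'a': 5, 'b': 4, 'c': 4, 'd': 4, 'e': 1}
keep_most_common(k, 3)
print(k)  # {'a': 5}
```

The list of words to delete is built before any deletion, so the dictionary is never modified while it is being iterated.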

Related

Add values of same key in a dictionary

My input is a list, and I want to convert it into a dictionary, adding together all values that share the same key. In the example below, k has two values, 1 and 3, so the value of k in the dictionary should be 4, giving {'k':4,'D':2}. I then want to sort it in alphabetical order.
Input: ['k:1','D:2','k:3']
Output {'k':4,'D':2}
dlist = ['k:1','D:2','k:3']
dick ={}
for x in dlist:
    key,value = x.split(':')
    dick[key] = int(value)
print(dick)
I have the above code but I don't know how to add two values for k?
You need to actually add the value to the pre-existing value. In your code now you just overwrite the old value with the new one.
dick = {}
for x in dlist:
    key, value = x.split(':')
    dick[key] = dick.get(key, 0) + int(value)
dict.get(key, 0) gets the value of the key in the dictionary, with a default of zero
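As a minimal illustration of that default (the name `totals` is just a stand-in for the dictionary above), the first lookup of a missing key yields 0, so the running sum can start from nothing:

```python
totals = {}
first_lookup = totals.get('k', 0)  # 'k' is not in the dictionary yet
print(first_lookup)  # 0

totals['k'] = totals.get('k', 0) + 1  # 0 + 1
totals['k'] = totals.get('k', 0) + 3  # 1 + 3
print(totals)  # {'k': 4}
```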
This is possible by making dick a defaultdict. So, now you can just += the value.
from collections import defaultdict
dlist = ['k:1','D:2','k:3']
dick = defaultdict(int)
for x in dlist:
    key, value = x.split(':')
    dick[key] += int(value)
print(dick)
You can use groupby from itertools to group same values and then sum.
import itertools
a=['k:1','D:2','k:3']
print({k:sum([int(j[1]) for j in v]) for k,v in itertools.groupby([i.split(":") for i in sorted(a)], lambda x: x[0])})
Output
{'D': 2, 'k': 4}
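The question also asks for alphabetical order. A possible sketch, assuming Python 3.7+ (where a plain dict keeps insertion order) and assuming a case-insensitive sort is wanted; the names `totals` and `ordered` are illustrative:

```python
from collections import defaultdict

dlist = ['k:1', 'D:2', 'k:3']
totals = defaultdict(int)
for entry in dlist:
    key, value = entry.split(':')
    totals[key] += int(value)

# Plain sorted() puts uppercase letters before lowercase ones;
# key=... with str.lower makes the comparison ignore case instead.
ordered = dict(sorted(totals.items(), key=lambda kv: kv[0].lower()))
print(ordered)  # {'D': 2, 'k': 4}
```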

Python: List of pairs. Making every pair single and sum the values of the same keys

I have a list of pairs. The list contains items of the form [x, y]. I would like to build a list or dictionary with the left item as the key and the right item as the value. The list may contain the same key multiple times. I want to sum the values and keep each key only once.
E.g.
pairs[0]=['3106124650', 2.86]
pairs[1]=['3106124650', 8.86]
pairs[2]=['5216154610', 23.77]
I want to keep '3106124650' once and sum its values, so my new list or dictionary will contain this key once with the value 11.72.
'3106124650',11.72
Here's a way. For large datasets, numpy will probably be faster though.
import collections
result = collections.defaultdict(lambda : 0)
for k,v in pairs:
    result[k]+=v
sumdict = dict()
for i, v in pairs:
    sumdict[i] = v + sumdict.get(i, 0)
li=[['a',1],['a',2],['b',3],['c',4]]
d={}
for w in li:
    d[w[0]]=w[1]+d.get(w[0],0)
Output:{'a': 3, 'b': 3, 'c': 4}
You can try this:
d={}
for entry in pairs:
    if entry[0] in d:
        d[entry[0]]+=entry[1]
    else:
        d[entry[0]]=entry[1]
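Because the values in the question are floats, the stored sum may differ from 11.72 in its last digits. A small sketch of the same accumulation on the question's data, rounding only when displaying (the variable name `totals` is illustrative):

```python
import math
from collections import defaultdict

pairs = [['3106124650', 2.86], ['3106124650', 8.86], ['5216154610', 23.77]]

totals = defaultdict(float)
for key, value in pairs:
    totals[key] += value

# Round only for display; compare floats with a tolerance, not with ==.
for key, total in totals.items():
    print(key, round(total, 2))
```

If exact decimal arithmetic matters (for example with currency), the standard library's `decimal.Decimal` is one option to consider instead of floats.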

How do I iterate over entire dictionary keys?

Diction is a dictionary containing keys and values. I want to iterate over dictionary keys in which I will return an array of keys with values less than or equal to 20. But, I am only iterating over one key. How do I iterate over the entire dictionary keys?
def total(diction):
    for key in diction:
        if diction[key] <= 20:
            return [key]
Your function finishes execution when it hits the first return statement.
You can adjust your function like this.
def total(diction):
    result = []
    for key, value in diction.items():
        if value <= 20:
            result.append(key)
    return result
This function appends the keys satisfying your criterion to a list and only returns that list once it has looked at all (key, value) pairs in the dict.
Alternatively, you can write a generator function:
def total_gen(diction):
    for key, value in diction.items():
        if value <= 20:
            yield key
You might consider giving the functions a better name than total and having them take an additional parameter (for example named limit) in order to avoid hardcoding the value 20.
Demo:
>>> d = {'a': 5, 'b': 100, 'c': 23, 'd': -2}
>>> total(d)
['d', 'a']
>>> list(total_gen(d))
['d', 'a']
Of course, you could also use succinct list or generator expressions:
>>> [key for key, value in d.items() if value <= 20]
['d', 'a']
>>>
>>> for k in (key for key, value in d.items() if value <= 20):
...     print(k)
...
d
a
The generator-function and the generator expression are especially useful in cases where you don't need all the keys in memory at once - for example if you just want to iterate over them.
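The suggestion above of a clearer name and a limit parameter could look something like the following sketch. The name `keys_at_most` and the default of 20 are illustrative choices, not something the answer prescribes.

```python
def keys_at_most(mapping, limit=20):
    """Return the keys whose values are less than or equal to limit."""
    return [key for key, value in mapping.items() if value <= limit]

d = {'a': 5, 'b': 100, 'c': 23, 'd': -2}
print(keys_at_most(d))      # ['a', 'd'] on Python 3.7+, where dicts keep insertion order
print(keys_at_most(d, 50))  # ['a', 'c', 'd'] on Python 3.7+
```

On older Python versions the keys may come back in a different order, since dictionary ordering was not guaranteed there.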
You can iterate over the dict items with a list comprehension like this:
def total(diction):
    return [key for key, value in diction.items() if value <= 20]
def total(diction):
    result = []
    for key in diction:
        if diction[key] <= 20:
            result.append(key)
    return result
Or using a list comprehension
def total(diction):
    return [key for key in diction if diction[key] <= 20]
You need to save the keys to a list, then only return after you have gone through all the keys.
def total(diction):
    key_list = []
    for key in diction:
        if diction[key] <= 20:
            key_list.append(key)
    return key_list
This could also be done with list comprehension:
def total(diction):
    return [k for k in diction if diction[k]<=20]
I am not totally sure I understood your question correctly, but you can get the keys of a dictionary object with its .keys() method. E.g. if you have a dictionary object named "diction", then diction.keys() gives you its keys (a list in Python 2, a view object in Python 3). You can then iterate through the keys like this:
for dict_key in diction.keys():
    diction[dict_key] = ......
Hope it helps
What do you mean by 'iterating over one key'? You are indeed iterating over all the keys when you write this, one at a time -
for key in diction:
If you want to do that in a single line, you could use filter with a lambda function to shorten the code -
list(filter(lambda x: diction[x]<=20, diction.keys()))
Even here, though, the keys are checked one at a time. That's iteration!

How to quickly get a list of keys from dict

I construct a dictionary from an excel sheet and end up with something like:
d = {('a','b','c'): val1, ('a','d'): val2}
The tuples I use as keys contain a handful of values, the goal is to get a list of these values which occur more than a certain number of times.
I've tried two solutions, both of which take entirely too long.
Attempt 1, simple list comprehension filter:
keyList = []
for k in d.keys():
    keyList.extend(list(k))
# The script makes it to here before hanging
commonkeylist = [key for key in keyList if keyList.count(key) > 5]
This takes forever since list.count() traverses the list on each iteration of the comprehension.
Attempt 2, create a count dictionary
keyList = []
keydict = {}
for k in d.keys():
    keyList.extend(list(k))
# The script makes it to here before hanging
for k in keyList:
    if k in keydict.keys():
        keydict[k] += 1
    else:
        keydict[k] = 1
commonkeylist = [k for k in keyList if keydict[k] > 50]
I thought this would be faster since we only traverse all of keyList a handful of times, but it still hangs the script.
What other steps can I take to improve the efficiency of this operation?
Use collections.Counter() and a generator expression:
from collections import Counter
counts = Counter(item for key in d for item in key)
commonkeylist = [item for item, count in counts.most_common() if count > 50]
where iterating over the dictionary directly yields the keys without creating an intermediary list object.
Demo with a lower count filter:
>>> from collections import Counter
>>> d = {('a','b','c'): 'val1', ('a','d'): 'val2'}
>>> counts = Counter(item for key in d for item in key)
>>> counts
Counter({'a': 2, 'c': 1, 'b': 1, 'd': 1})
>>> [item for item, count in counts.most_common() if count > 1]
['a']
I thought this would be faster since we only traverse all of keyList a
handful of times, but it still hangs the script.
That's because you're still doing an O(n) search. Replace this:
for k in keyList:
    if k in keydict.keys():
with this:
for k in keyList:
    if k in keydict:
and see if that helps your 2nd attempt perform better.
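Putting that together, the second attempt could be sketched as below. The helper name `common_items` and its `threshold` parameter are hypothetical (the threshold stands in for the question's 5 or 50). It also iterates over the count dictionary at the end, so each item is reported once rather than once per occurrence.

```python
def common_items(d, threshold):
    """Return each tuple element that occurs more than threshold times
    across all keys of d.

    Hypothetical helper for illustration only.
    """
    counts = {}
    for key in d:
        for item in key:
            # Lookups on the dict itself are hash-based, not a linear scan.
            counts[item] = counts.get(item, 0) + 1
    return [item for item, count in counts.items() if count > threshold]

d = {('a', 'b', 'c'): 'val1', ('a', 'd'): 'val2'}
print(common_items(d, 1))  # ['a']
```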

Remove the smallest element(s) from a dictionary

I have a function that takes a dictionary as its parameter, where each value is an integer. I'm trying to remove the minimum element(s) and return a set of the remaining keys.
I am programming in Python. I can't seem to remove all of the key-value pairs that share the same minimum value. My code does not work for the 2nd and 3rd examples.
This is how it would work:
remaining({A: 1, B: 2, C: 2})
{B, C}
remaining({B: 2, C : 2})
{}
remaining({A: 1, B: 1, C: 1, D: 4})
{D}
This is what I have:
def remaining(d : {str:int}) -> {str}:
    Remaining = set(d)
    Remaining.remove(min(d, key=d.get))
    return Remaining
One approach is to take the minimum value, collect the keys whose values equal it, and subtract those keys from dict.viewkeys(), which has set-like behaviour.
d = {'A': 1, 'B': 1, 'C': 1, 'D': 4}
# Use .values() and .keys() and .items() for Python 3.x
min_val = min(d.itervalues())
remaining = d.viewkeys() - (k for k, v in d.iteritems() if v == min_val)
# set(['D'])
On a side note, I find it odd that {B: 2, C : 2} should be {} as there's not actually anything greater for those to be the minimum as it were.
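For Python 3, a rough equivalent of the snippet above uses `d.keys()`, which is already a set-like view there. This is a sketch under that assumption, not the answer's original code:

```python
d = {'A': 1, 'B': 1, 'C': 1, 'D': 4}

min_val = min(d.values())
# In Python 3, d.keys() is a set-like view, so it supports set difference.
remaining = d.keys() - {k for k, v in d.items() if v == min_val}
print(remaining)  # {'D'}
```

The result of the subtraction is an ordinary set, independent of later changes to the dictionary.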
That's because you're trying to map values to keys, and a map allows different keys to have the same value but not the other way around. You should implement a map "reversal" as described here, remove the minimum entry, and then reverse the map back to its original form.
# your example
l = {'A': 1, 'B': 1, 'C': 1, 'D': 4}
# reverse the dict: each value maps to the list of keys that hold it
d1 = {}
for k, v in l.items():
    d1[v] = d1.get(v, []) + [k]
# remove the min element (the keys of d1 are the original values)
del d1[min(d1)]
# recover the rest to the original dict minus the min
res = {}
for k, v in d1.items():
    for e in v:
        res[e] = k
print(res)
Comment:
@Jon Clements's solution is more elegant and should be accepted as the answer
Take the minimum value and construct a set with all the keys which are not associated to that value:
def remaining(d):
    m = min(d.values())
    return {k for k,v in d.items() if v != m}
If you don't like set comprehensions that's the same as:
def remaining(d):
    m = min(d.values())
    s = set()
    for k,v in d.items():
        if v != m:
            s.add(k)
    return s
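As a quick check, the three examples from the question can be run against the set-comprehension version. The guard for an empty dictionary is an added assumption, since `min()` raises `ValueError` on an empty sequence and the question does not say what should happen then.

```python
def remaining(d):
    """Return the keys whose value is not the minimum value in d."""
    if not d:
        return set()  # assumption: an empty input gives an empty set
    m = min(d.values())
    return {k for k, v in d.items() if v != m}

print(remaining({'A': 1, 'B': 2, 'C': 2}))          # {'B', 'C'} (set order may vary)
print(remaining({'B': 2, 'C': 2}))                  # set()
print(remaining({'A': 1, 'B': 1, 'C': 1, 'D': 4}))  # {'D'}
```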
This removes all the items with the minimum value.
def remaining(dic):
    minimum = min(dic.values())
    # iterate over a copy of the items so entries can be removed inside the loop
    for k, v in list(dic.items()):
        if v == minimum:
            dic.pop(k)
    return set(dic.keys())
An easier way would be to use pd.Series.idxmin() or pd.Series.min(). These functions let you find the index of the minimum value, or the minimum value itself, in a series, and pandas lets you create a named index. The example below uses the max counterparts, idxmax() and max(); the min versions work the same way.
import pandas as pd
import numpy as np
A = pd.Series(np.full(shape=5, fill_value=0))  # create a series of 0
A = A.reindex(['a','b','c','d','e'])  # switch to named labels, similar to dictionary keys (reindex fills the new labels with NaN)
A['a'] = 2
print(A.max())
# output 2.0
print(A.idxmax())  # you can also pop by index without changing other indices
# output a
