I have a dictionary of names and the number of times the names appear in the phone book:
names_dict = {
'Adam': 100,
'Anne': 400,
'Britney': 321,
'George': 645,
'Joe': 200,
'John': 1010,
'Mike': 500,
'Paul': 325,
'Sarah': 150
}
Preferably without using sorted(), I want to iterate through the dictionary and create a new dictionary that has the top five names only:
def sort_top_list():
    # create dict of any 5 names first
    new_dict = {}
    for i in names_dict.keys()[:5]:
        new_dict[i] = names_dict[i]
    # Find smallest current value in new_dict
    # and compare to others in names_dict
    # to find bigger ones; replace smaller name in new_dict with bigger name
    for k,v in names_dict.iteritems():
        current_smallest = min(new_dict.itervalues())
        if v > current_smallest:
            # Found a bigger value; replace smaller key/ value in new_dict with larger key/ value
            new_dict[k] = v
            # ?? delete old key/ value pair from new_dict somehow
I seem to be able to create a new dictionary that gets a new key/ value pair whenever we iterate through names_dict and find a name/ count that is higher than what we have in new_dict. I can't figure out, though, how to remove the smaller ones from new_dict after we add the bigger ones from names_dict.
Is there a better way - without having to import special libraries or use sorted() - to iterate through a dict and create a new dict of the top N keys with the highest values?
You should use the heapq.nlargest() function to achieve this:
import heapq
from operator import itemgetter
top_names = dict(heapq.nlargest(5, names_dict.items(), key=itemgetter(1)))
This uses a more efficient algorithm (O(NlogK) for a dict of size N, and K top items) to extract the top 5 items as (key, value) tuples, which are then passed to dict() to create a new dictionary.
Demo:
>>> import heapq
>>> from operator import itemgetter
>>> names_dict = {'Adam': 100, 'Anne': 400, 'Britney': 321, 'George': 645, 'Joe': 200, 'John': 1010, 'Mike': 500, 'Paul': 325, 'Sarah': 150}
>>> dict(heapq.nlargest(5, names_dict.items(), key=itemgetter(1)))
{'John': 1010, 'George': 645, 'Mike': 500, 'Anne': 400, 'Paul': 325}
You probably want to use the collections.Counter() class instead. The Counter.most_common() method would have made your use-case trivial to solve. The implementation for that method uses heapq.nlargest() under the hood.
These are not special libraries; they are part of the Python standard library. Otherwise you would have to implement a binary heap yourself to achieve this. Unless you are specifically studying this algorithm, there is little point in re-implementing your own: the Python implementation is highly optimised, with an extension written in C for some critical functions.
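As a rough sketch of the Counter suggestion above, assuming the counts are already in a plain dict as in the question (the names name_counts and top_five are only illustrative):

```python
from collections import Counter

# Same data as in the question.
names_dict = {
    'Adam': 100, 'Anne': 400, 'Britney': 321, 'George': 645, 'Joe': 200,
    'John': 1010, 'Mike': 500, 'Paul': 325, 'Sarah': 150,
}

# A Counter can be built directly from a mapping of item -> count.
name_counts = Counter(names_dict)

# most_common(5) returns the five (name, count) pairs with the highest
# counts, ordered from most to least common.
top_five = dict(name_counts.most_common(5))
print(top_five)
```

If the counts were being tallied from raw phone book entries in the first place, building the Counter directly from those entries would avoid the intermediate dict entirely.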
I do not know why you don't want to use sort. This solution is not perfect and doesn't even match your problem exactly, but I hope it can inspire you to find your own implementation. I assume this was only a short example of the real problem you have.
But as you have seen in the other answer: normally it is better to use code that has already been written than to do everything yourself.
names_dict = {'Joe' : 200, 'Anne': 400, 'Mike': 500, 'John': 1010, 'Sarah': 150, 'Paul': 325, 'George' : 645, 'Adam' : 100, 'Britney': 321}

def extract_top_n(dictionary, count):
    # First step: find the topmost values, keeping the list in descending order
    highest_values = []
    for k, v in dictionary.items():
        highest_values.append(v)
        l = len(highest_values)
        # Move the new value towards the front until it is in its sorted position
        for i in range(l - 1):
            if highest_values[l-i-1] > highest_values[l-i-2]:
                temp = highest_values[l-i-2]
                highest_values[l-i-2] = highest_values[l-i-1]
                highest_values[l-i-1] = temp
        # Keep only the `count` biggest values seen so far
        highest_values = highest_values[:count]
    # Fill the dictionary with all entries at least as big as the smallest of the biggest.
    # But pay attention: if several entries share that smallest value, there will be more than N entries in the dictionary
    last_interesting = highest_values[-1]
    return_dictionary = {}
    for k, v in dictionary.items():
        if v >= last_interesting:
            return_dictionary[k] = v
    return return_dictionary

print(extract_top_n(names_dict, 3))
I have two parallel lists of data like:
genres = ["classic", "pop", "classic", "classic", "pop"]
plays = [500, 600, 150, 800, 2500]
I want to get this result:
album = {"classic":{0:500, 2:150, 3:800}, "pop":{1:600, 4:2500}} # want to make
So I tried this code:
album = dict.fromkeys(genres,dict())
# album = {'classic': {}, 'pop': {}}
for i in range(len(genres)):
    for key,value in album.items():
        if genres[i] == key:
            album[key].update({i:plays[i]})
The result for album is wrong. It looks like
{'classic': {0: 500, 1: 600, 2: 150, 3: 800, 4: 2500},
'pop': {0: 500, 1: 600, 2: 150, 3: 800, 4: 2500}}
That is, every plays value was added for both of the genres, instead of being added only to the genre that corresponds to the number.
Why does this occur? How can I fix the problem?
Try replacing album = dict.fromkeys(genres,dict()) with
album = {genre: {} for genre in genres}
The reason your dict.fromkeys call does not work is explained in the documentation:
fromkeys() is a class method that returns a new dictionary. value defaults to None. All of the values refer to just a single instance, so it generally doesn’t make sense for value to be a mutable object such as an empty list. To get distinct values, use a dict comprehension instead.
That is, when you write album = dict.fromkeys(genres,dict()), album['classic'] and album['pop'] both are the same object. As you add new items to either one of them, it is applied to the other (because they are the same object).
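A small sketch that makes the sharing visible, using the genres list from the question (the variable names shared and separate are only illustrative):

```python
genres = ["classic", "pop", "classic", "classic", "pop"]

# dict() is evaluated once, so every key gets the very same dict object.
shared = dict.fromkeys(genres, dict())
same_object = shared["classic"] is shared["pop"]

# The comprehension evaluates {} once per key, so each value is a new dict.
separate = {genre: {} for genre in genres}
distinct_objects = separate["classic"] is not separate["pop"]

# Writing through one key shows up under the other only in the shared case.
shared["classic"][0] = 500
separate["classic"][0] = 500

print(same_object, shared)
print(distinct_objects, separate)
```

The `is` checks compare object identity rather than equality, which is what distinguishes the two constructions here.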
Alternatively, you can use defaultdict and zip:
from collections import defaultdict
genres = ["classic", "pop", "classic", "classic", "pop"]
plays = [500, 600, 150, 800, 2500]
album = defaultdict(dict)
for i, (genre, play) in enumerate(zip(genres, plays)):
    album[genre][i] = play
print(dict(album))
# {'classic': {0: 500, 2: 150, 3: 800}, 'pop': {1: 600, 4: 2500}}
The dict(album) is redundant in most cases; you can use album like a dict.
Use:
In [1059]: d = {}
In [1060]: for c,i in enumerate(genres):
      ...:     if i in d:
      ...:         d[i].update({c:plays[c]})
      ...:     else:
      ...:         d[i] = {c:plays[c]}
      ...:
In [1061]: d
Out[1061]: {'classic': {0: 500, 2: 150, 3: 800}, 'pop': {1: 600, 4: 2500}}
There are two issues here: first off, the for key,value in album.items(): loop is redundant, although this will not cause a problem because dictionaries have unique keys - you will store every key-value pair twice, but the second time will just replace the first.
The important problem is that after album = dict.fromkeys(genres,dict()), the two values in album will be the same dictionary. dict() happens before the call to dict.fromkeys, and the resulting object is passed in. dict.fromkeys() uses that same object as the value for each key - it does not make a copy.
To solve this, use a dict comprehension to create the dictionary instead:
album = {g: {} for g in genres}
This is an analogous problem to List of lists changes reflected across sublists unexpectedly, except that instead of a list-of-lists it is a dict-with-dict values, and instead of creating the problematic data by multiplication we create it with a method. The underlying logic is the same, however, and the natural solution works in the same way as well.
Another approach is to create the key-value pairs in album only when they are first needed, by checking for their presence first.
Yet another approach is to use a tool that automates that on-demand creation - for example, defaultdict from the standard library collections module. That way looks like:
from collections import defaultdict
# other code until we get to:
album = defaultdict(dict)
# whenever we try `album[k].update(v)`, if there is not already an
# `album[k]`, it will automatically create `album[k] = dict()` first
# - with a new dictionary, created just then.
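The "create the entry only when it is first needed" idea can also be sketched with the built-in dict.setdefault method, which needs no import. This is just one possible spelling, reusing the genres and plays lists from the question:

```python
genres = ["classic", "pop", "classic", "classic", "pop"]
plays = [500, 600, 150, 800, 2500]

album = {}
for i, (genre, play) in enumerate(zip(genres, plays)):
    # setdefault returns album[genre] if it exists; otherwise it stores
    # the given new empty dict under that key and returns it.
    album.setdefault(genre, {})[i] = play

print(album)
```

Unlike dict.fromkeys(genres, dict()), the {} here is evaluated afresh on every iteration, so no two genres ever end up sharing the same inner dictionary.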
@j1-lee answered it correctly, but in case you want to avoid defaultdict and use a plain dictionary, here is the code.
genres = ["classic", "pop", "classic", "classic", "pop"]
plays = [500, 600, 150, 800, 2500]
all_genres_plays = zip(genres, plays)
album = {}
for index, single_genre_play in enumerate(all_genres_plays):
    genre, play = single_genre_play
    if genre not in album:
        album[genre] = {}
    album[genre][index] = play
print(album)
output:
{'classic': {0: 500, 2: 150, 3: 800}, 'pop': {1: 600, 4: 2500}}
I would like to take a nested list such as:
list = [[Name, Height, Weight],[Dan, 175, 75],[Mark, 165, 64], [Sam, 183, 83]]
and convert it into a dictionary like:
dict = {Name: [Dan,Mark, Sam], Height: [175, 165, 183], Weight: [75, 64, 83]}
my current code is unfortunately not really giving me the dictionary format I'm looking for.
i = 1
z = 0
for items in list[0]:
    dict[items] = [list[i][z]]
    i += 1
    z += 1
can someone please assist me and find where I'm going wrong?
Separate the keys and the rest first, then construct the dictionary with zip:
keys, *rest = list_of_lists
out = dict(zip(keys, zip(*rest)))
where list_of_lists is what you called list (but please avoid that name, as it shadows the builtin list). The first * collects every list from the second one onward into rest. The second *, in zip(*rest), unpacks those lists as separate arguments to zip, which effectively transposes the rows into columns
to get
>>> out
{"Name": ("Dan", "Mark", "Sam"),
"Height": (175, 165, 183),
"Weight": (75, 64, 83)}
This gives tuples as the values; to get lists instead, you can map list over them:
out = dict(zip(keys, map(list, zip(*rest))))
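To see what each step produces, here is a self-contained sketch with the question's data written as proper string literals (the intermediate name columns is only there for illustration):

```python
list_of_lists = [
    ["Name", "Height", "Weight"],
    ["Dan", 175, 75],
    ["Mark", 165, 64],
    ["Sam", 183, 83],
]

# The first row becomes the keys; the starred name collects the remaining rows.
keys, *rest = list_of_lists

# zip(*rest) pairs up the first items, the second items, and so on,
# turning the three rows into three columns.
columns = list(zip(*rest))

# Pair each key with its column, converting each tuple to a list.
out = dict(zip(keys, map(list, columns)))

print(columns)
print(out)
```

Note that zip stops at the shortest row, so a record with a missing field would silently drop that field for every record.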
Welcome to Stack Overflow :)
In Python we seldom maintain a loop counter by hand with i += 1 when we can simply write for i in ..., even if we don't know how many lists there are.
Your original data is a list of lists: the first list holds the dictionary keys, and every other list is one record.
We often use zip(*your_list) when the inner lists all have equal length. The zip function rearranges your_list; the * in front of your_list passes each inner list to zip as a separate argument.
Then put it in a for loop, like for rec in zip(*your_list):.
So you can write your code like:
your_dict = {}
for rec in zip(*your_list):
    k = rec[0] #dict key
    v = list(rec[1:]) #dict value, convert to list if needed
    your_dict[k] = v # set key and value
e.g.
that's it!
@Mustafa's answer is the most concise, but if you are a beginner you might find it easier to break it down into steps.
data =[
['Name', 'Height', 'Weight'],
['Dan', 175, 75], ['Mark', 165, 64], ['Sam', 183, 83]
]
keys = data[0]
values = [list(items) for items in zip(*data[1:])]
results = dict(zip(keys, values))
The goal of the function is to make a grade adjustment based off of a dictionary and list. For instance
def adjust_grades(roster, grade_adjustment)
adjust_grades({'ann': 75, 'bob': 80}, [5, -5])
will return
{'ann': 80, 'bob': 75}
I just need a nudge in the right direction. I'm new to Python, so I thought of using a nested for loop over each num in grade_adjustment, but that's not the right way.
Assuming Python 3.7 (ordered dicts) and the length of the adjustments match the length of the items in the dictionary, you can zip them together as follows:
for name, adjustment_amount in zip(roster, grade_adjustment):
    roster[name] += adjustment_amount
>>> roster
{'ann': 80, 'bob': 75}
This is making several assumptions:
the dictionary and the list have the same length (your final code should make sure they do)
you are using a version of python in which the order of the dictionary keys is preserved (if not, you can make grade_adjustment a dictionary as well, as mentioned by other comments)
result = roster.copy()
for index, key in enumerate(roster):
    result[key] += grade_adjustment[index]
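Both assumptions listed above can be made explicit in code. The sketch below is only one way to do it; the function names adjust_grades_checked and adjust_grades_by_name are hypothetical, and treating a missing name as an adjustment of 0 is an assumption rather than something the question specifies:

```python
def adjust_grades_checked(roster, grade_adjustment):
    """Positional version: refuse to run if the lengths differ."""
    if len(roster) != len(grade_adjustment):
        raise ValueError("roster and grade_adjustment must have the same length")
    return {
        name: grade + delta
        for (name, grade), delta in zip(roster.items(), grade_adjustment)
    }


def adjust_grades_by_name(roster, adjustments_by_name):
    """Name-keyed version: does not rely on dictionary order at all."""
    # Assumption: a student without an entry gets an adjustment of 0.
    return {
        name: grade + adjustments_by_name.get(name, 0)
        for name, grade in roster.items()
    }


print(adjust_grades_checked({'ann': 75, 'bob': 80}, [5, -5]))
print(adjust_grades_by_name({'ann': 75, 'bob': 80}, {'bob': -5, 'ann': 5}))
```

Both versions return a new dictionary and leave the original roster untouched.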
You can use
def adjust_grades(roster, grade_adjustment):
    for k, v in enumerate(grade_adjustment):
        roster[list(roster.keys())[k]] = roster[list(roster.keys())[k]] + v
    return roster
This gives output as you said {'ann': 80, 'bob': 75}
assuming 3.7 or ordered dict and equal length:
def adjust_grades(roster, grade_adjustment):
return {key:value + adjustment for (key, value), adjustment in
zip(roster.items(), grade_adjustment)}
print(adjust_grades({'ann': 75, 'bob': 80}, [5, -5]))
I have a dictionary with an int as value for each key. I also have total stored in a variable. I want to obtain a percentage that each value represent for the variable and return the percentage to the dictionary as another value for the same key.
I tried to extract the values in a list, then do the operation and append the results to another list. But I don't know how to append that list to the dictionary.
total = 1000
d = {"key_1":150, "key_2":350, "key_3":500}
lst = list(d.values())
percentages = [100 * (i/total) for i in lst]
# Desired dictionary
d
{"key_1": [15%, 150],
"key_2": [35%, 350],
"key_3": [50%, 500]
}
You're better off avoiding the intermediate list and just updating each key as you go:
total = 1000
d = {"key_1":150, "key_2":350, "key_3":500}
for k, v in d.items():
    d[k] = [100 * (v / total), v]
While it's technically possible to zip the dict's keys with the values of the list, as long as the keys aren't changed and the list order is kept in line with the values extracted from the dict, the resulting code would reek of code smell, and it's just easier to avoid that list entirely anyway.
Note that this won't put a % sign in the representation, because there is no such thing as a percentage type. The only simple way to shove one in there would be to store it as a string, not a float, e.g. replacing the final line with:
d[k] = [f'{100 * (v / total)}%', v]
to format the calculation as a string, and shove a % on the end.
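If the string form is acceptable, Python's format mini-language also has a % presentation type that multiplies by 100 and appends the sign for you. A minimal sketch, assuming whole-number percentages are wanted (the name with_percent is only illustrative):

```python
total = 1000
d = {"key_1": 150, "key_2": 350, "key_3": 500}

# ".0%" multiplies the ratio by 100, rounds to 0 decimal places,
# and appends a percent sign.
with_percent = {k: [f"{v / total:.0%}", v] for k, v in d.items()}

print(with_percent)
```

Changing .0% to .1% would keep one decimal place instead.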
Here
total = 1000
d = {"key_1": 150, "key_2": 350, "key_3": 500}
d1 = {k: ['{}%'.format(v * 100 / total), v] for k, v in d.items()}
print(d1)
output
{'key_1': ['15.0%', 150], 'key_2': ['35.0%', 350], 'key_3': ['50.0%', 500]}
I have a dictionary of "documents" in python with document ID numbers as keys and dictionaries (again) as values. These internal dictionaries each have a 'weight' key that holds a floating-point value of interest. In other words:
documents[some_id]['weight'] = ...
What I want to do is obtain a list of my document IDs sorted in descending order of the 'weight' value. I know that dictionaries are inherently unordered (and there seem to be a lot of ways to do things in Python), so what is the most painless way to go? It feels like kind of a messy situation...
I would convert the dictionary to a list of tuples and sort it based on weight (in reverse order for descending), then just remove the objects to get a list of the keys
# sorted() works on the items view directly (in Python 3, items() has no .sort() method)
l = sorted(documents.items(), key=lambda x: x[1]['weight'], reverse=True)
result = [d[0] for d in l]
I took the approach that you might want the keys as well as the rest of the object:
# Build a random dictionary
from random import randint
ds = {} # A |D|ata |S|tructure
for i in range(20,1,-1):
    ds[i]={'weight':randint(0,100)}
# reverse=True gives descending order of weight, as the question asks
sortedDS = sorted(ds.keys(),key=lambda x:ds[x]['weight'], reverse=True)
for i in sortedDS :
    print(i,ds[i]['weight'])
sorted() is a Python built-in that takes an iterable and returns a new sorted list; it can also take a key function that it uses to determine the rank of each object. In the above case it uses the 'weight' value as the key to sort on.
The advantage of this over Ameer's answer is that it returns the order of the keys rather than the items. It's an extra step, but it means you can refer back into the original data structure.
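For a deterministic illustration of the same idea, here is a small sketch with made-up weights (the IDs and values are invented for the example); iterating a dict yields its keys, so it can be passed to sorted() directly:

```python
# Hypothetical documents; only the 'weight' field matters here.
documents = {
    14: {'weight': 90.0},
    12: {'weight': 100.0},
    13: {'weight': 101.0},
    15: {'weight': 5.0},
}

# Sort the document IDs by their weight, highest first.
ids_by_weight = sorted(
    documents,
    key=lambda doc_id: documents[doc_id]['weight'],
    reverse=True,
)

print(ids_by_weight)  # [13, 12, 14, 15]
```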
This seems to work for me. The inspiration for it came from OrderedDict and question #9001509
from collections import OrderedDict
d = {
14: {'weight': 90},
12: {'weight': 100},
13: {'weight': 101},
15: {'weight': 5}
}
sorted_dict = OrderedDict(sorted(d.items(), key=lambda rec: rec[1].get('weight')))
print(sorted_dict)