Random Python dictionary key, weighted by values - python

I have a dictionary where each key has a list of variable length, eg:
d = {
    'a': [1, 3, 2],
    'b': [6],
    'c': [0, 0]
}
Is there a clean way to get a random dictionary key, weighted by the length of its value?
random.choice(d.keys()) will weight the keys equally, but in the case above I want 'a' to be returned roughly half the time.

This would work:
random.choice([k for k in d for x in d[k]])

Do you always know the total number of values in the dictionary? If so, this might be easy to do with the following algorithm, which can be used whenever you want to make a probabilistic selection of some items from an ordered list:
Iterate over your list of keys.
Generate a uniformly distributed random value between 0 and 1 (aka "roll the dice").
Assuming that this key has N_VALS values associated with it and there are TOTAL_VALS total values in the entire dictionary, accept this key with probability N_VALS / N_REMAINING, where N_REMAINING is the number of values not yet considered (TOTAL_VALS minus the values of the keys already rejected).
This algorithm has the advantage of not having to generate any new lists, which is important if your dictionary is large. Your program only pays for the loop over K keys to calculate the total, another loop over the keys which will on average end halfway through, and whatever it costs to generate a random number between 0 and 1. Generating such a random number is a very common operation in programming, so most languages have a fast implementation of such a function. In Python the random number generator is a C implementation of the Mersenne Twister algorithm, which should be very fast. Additionally, the documentation claims that this implementation is thread-safe.
Here's the code. I'm sure that you can clean it up if you'd like to use more Pythonic features:
#!/usr/bin/python
import random
def select_weighted( d ):
    # calculate total
    total = 0
    for key in d:
        total = total + len(d[key])
    accept_prob = float( 1.0 / total )
    # pick a weighted value from d
    n_seen = 0
    for key in d:
        current_key = key
        for val in d[key]:
            dice_roll = random.random()
            accept_prob = float( 1.0 / ( total - n_seen ) )
            n_seen = n_seen + 1
            if dice_roll <= accept_prob:
                return current_key
dict = {
    'a': [1, 3, 2],
    'b': [6],
    'c': [0, 0]
}
counts = {}
for key in dict:
    counts[key] = 0
for s in range(1,100000):
    k = select_weighted(dict)
    counts[k] = counts[k] + 1
print counts
After running this 99,999 times (the range above starts at 1), the keys were selected this number of times:
{'a': 49801, 'c': 33548, 'b': 16650}
Those are fairly close to your expected values of:
{'a': 0.5, 'c': 0.33333333333333331, 'b': 0.16666666666666666}
Edit: Miles pointed out a serious error in my original implementation, which has since been corrected. Sorry about that!
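As a sanity check on the code above, here is a short derivation of why accepting each value with probability 1 / (total - n_seen) gives every value the same overall chance. The symbols T (the total number of values) and j (the zero-based position of a value in iteration order) are introduced here purely for illustration. Value j is only reached if all earlier values were rejected, so:

```latex
\begin{aligned}
P(\text{value } j \text{ is returned})
  &= \left[\prod_{i=0}^{j-1}\left(1 - \frac{1}{T-i}\right)\right]\cdot\frac{1}{T-j} \\
  &= \left[\prod_{i=0}^{j-1}\frac{T-i-1}{T-i}\right]\cdot\frac{1}{T-j} \\
  &= \frac{T-j}{T}\cdot\frac{1}{T-j} = \frac{1}{T}
\end{aligned}
```

A key with N_VALS values is therefore returned with probability N_VALS / T, which is 3/6, 1/6 and 2/6 for 'a', 'b' and 'c' in the example dictionary. The last value is accepted with probability 1, so the function always returns a key.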

Without constructing a new, possibly big list with repeated values:
import random
def select_weighted(d):
    # weight each key by the length of its value list
    offset = random.randint(0, sum(len(v) for v in d.values()) - 1)
    for k, v in d.items():
        if offset < len(v):
            return k
        offset -= len(v)

Given that your dict fits in memory, the random.choice method should be reasonable. But assuming otherwise, the next technique is to use a list of increasing weights, and use bisect to find a randomly chosen weight.
>>> import random, bisect
>>> items, total = [], 0
>>> for key, value in d.items():
...     total += len(value)
...     items.append((total, key))
...
>>> items[bisect.bisect_left(items, (random.randint(1, total),))][1]
'a'
>>> items[bisect.bisect_left(items, (random.randint(1, total),))][1]
'c'

Make a list in which each key is repeated a number of times equal to the length of its value. In your example: ['a', 'a', 'a', 'b', 'c', 'c']. Then use random.choice().
Edit: or, less elegantly but more efficiently, try this: take the sum of the lengths of all values in the dictionary, S (you can cache and invalidate this value, or keep it up to date as you edit the dictionary, depending on the exact usage pattern you anticipate). Generate a random integer from 0 up to, but not including, S, and do a linear search through the dictionary keys to find the range into which your random number falls.
I think that's the best you can do without changing or adding to your data representation.
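The "keep S up to date as you edit the dictionary" idea could look roughly like the sketch below. The class name WeightedKeys and its methods set, remove and pick are hypothetical names made up for this illustration, and the sketch assumes all edits go through those methods.

```python
import random


class WeightedKeys:
    """Hypothetical wrapper that keeps the total number of values (S) current."""

    def __init__(self, data=None):
        self._data = {}
        self._total = 0  # S: sum of the lengths of all values
        for key, values in (data or {}).items():
            self.set(key, values)

    def set(self, key, values):
        # Adjust S by the change in length for this key.
        self._total += len(values) - len(self._data.get(key, ()))
        self._data[key] = list(values)

    def remove(self, key):
        self._total -= len(self._data.pop(key))

    def pick(self):
        if self._total == 0:
            raise ValueError("no values to choose from")
        r = random.randrange(self._total)  # 0 <= r < S
        # Linear search for the key whose range contains r.
        for key, values in self._data.items():
            if r < len(values):
                return key
            r -= len(values)


w = WeightedKeys({'a': [1, 3, 2], 'b': [6], 'c': [0, 0]})
print(w.pick())
```

Each pick is still a linear scan over the keys, but the sum no longer has to be recomputed on every call.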

Here is some code that is based on a previous answer I gave for probability distribution in python but is using the length to set the weight. It uses an iterative Markov chain so that it does not need to know the total of all of the weights. Currently it calculates the max length, but if that is too slow just change
self._maxw = 1
to
self._maxw = <max length>
and remove
for k in self._odata:
    if len(self._odata[k])> self._maxw:
        self._maxw=len(self._odata[k])
Here is the code.
import random
class RandomDict:
    """
    The weight is the length of each object in the dict.
    """
    def __init__(self,odict,n=0):
        self._odata = odict
        self._keys = list(odict.keys())
        self._maxw = 1 # to increase speed set me to max length
        self._len=len(odict)
        if n==0:
            self._n=self._len
        else:
            self._n=n
        # to increase speed set above max value and comment out next 3 lines
        for k in self._odata:
            if len(self._odata[k])> self._maxw:
                self._maxw=len(self._odata[k])
    def __iter__(self):
        return self.next()
    def next(self):
        while (self._len > 0) and (self._n>0):
            self._n -= 1
            for i in range(100):
                k=random.choice(self._keys)
                rx=random.uniform(0,self._maxw)
                if rx <= len(self._odata[k]): # test to see if that is the value we want
                    break
            # if you do not find one after 100 tries then just get a random one
            yield k
    def GetRdnKey(self):
        for i in range(100):
            k=random.choice(self._keys)
            rx=random.uniform(0,self._maxw)
            if rx <= len(self._odata[k]): # test to see if that is the value we want
                break
        # if you do not find one after 100 tries then just get a random one
        return k
#test code
d = {
    'a': [1, 3, 2],
    'b': [6],
    'c': [0, 0]
}
rd=RandomDict(d)
dc = {
    'a': 0,
    'b': 0,
    'c': 0
}
for i in range(100000):
    k=rd.GetRdnKey()
    dc[k]+=1
print("Key count=",dc)
#iterate over the objects
dc = {
    'a': 0,
    'b': 0,
    'c': 0
}
for k in RandomDict(d,100000):
    dc[k]+=1
print("Key count=",dc)
Test results
Key count= {'a': 50181, 'c': 33363, 'b': 16456}
Key count= {'a': 50080, 'c': 33411, 'b': 16509}

I'd say this:
random.choice("".join([k * len(d[k]) for k in d]))
This makes it clear that each k in d gets as many chances as the length of its value. Of course, it is relying on dictionary keys of length 1 that are characters....
Much later:
table = "".join([key * len(value) for key, value in d.iteritems()])
random.choice(table)

I modified some of the other answers to come up with this. It's a bit more configurable. It takes 2 arguments: a list and a function (for example a lambda) that returns the weight of each item.
import random
def select_weighted(lst, weight):
    """ Usage: select_weighted([0,1,10], weight=lambda x: x) """
    thesum = sum([weight(x) for x in lst])
    if thesum == 0:
        return random.choice(lst)
    offset = random.randint(0, thesum - 1)
    for k in lst:
        v = weight(k)
        if offset < v:
            return k
        offset -= v
Thanks to sth for the base code for this.

import numpy as np
my_dict = {
    "one": 5,
    "two": 1,
    "three": 25,
    "four": 14
}
keys = list(my_dict.keys())
elements = [my_dict[x] for x in keys]
total = sum(elements)
probs = [x / total for x in elements]
r = np.random.choice(len(keys), p=probs)
# print the selected key (the question asks for a key, not its weight)
print(keys[r])
# e.g. three

Need to mention random.choices for Python 3.6+:
import random
raffle_dict = {"Person 1": [1,2], "Person 2": [1]}
random.choices(list(raffle_dict.keys()), [len(w[1]) for w in raffle_dict.items()], k=1)[0]
random.choices returns a list of samples, so k=1 if you only need one and we'll take the first item in the list. If your dictionary already has the weights, just get rid of the len or better yet:
raffle_dict = {"Person 1": 1, "Person 2": 10}
random.choices(list(raffle_dict.keys()), raffle_dict.values(), k=1)[0]
See also this question and this tutorial,
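To check that random.choices weights the keys as intended for the dictionary in the question, one option is to draw many samples in a single call and tally them. This is only a quick sketch; the sample size of 60000 is an arbitrary choice and the exact counts differ from run to run.

```python
import random
from collections import Counter

d = {'a': [1, 3, 2], 'b': [6], 'c': [0, 0]}

keys = list(d)
weights = [len(d[k]) for k in keys]  # 3, 1 and 2 for 'a', 'b' and 'c'

# Draw many keys at once and tally them to eyeball the distribution.
samples = random.choices(keys, weights=weights, k=60000)
counts = Counter(samples)
print(counts)
```

With these weights, 'a' should come up roughly half the time, 'c' about a third and 'b' about a sixth.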

Related

how can I get the first key of the second biggest value from a dictionary?

I want to get the first key from the second biggest value given. With this program I can get the first key with the biggest value given, but I don't know how to tell it that I want the second. On the other hand, I have to be able to work with None values and it doesn't pass them, what can I do?
def f(x):
    """Return the first ascending KEY ordering of a dictionary
    based on the second biggest value that is given in a dictionary"""
    dict_face = x
    d = max(dict_face, key=dict_face.get)
    print(d)
####
asserts
####
f({'a': 1, 'b': 2, 'c': 2, 'd': 500}) == 'b'
f({'a': 0, 'b': 0, 'c': 2, 'd': 500}) == 'c'
f({'a': None, 'b': None, 'c': None, 'd': 500}) == None
###
output:
d
d
File "/Users/[...]/dict_biggest_value.py", line 7, in f
d = max(dict_face, key=dict_face.get)
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
Thanks!
I don't think 'max' will give you the functionality you want. Additionally I'd think that you have to think about what happens if you only have, say, 1 or 0 values. You can also think about how you want to handle None. I'd personally use it as a "negative infinity", that is, the smallest possible value.
I'd think that you can write something like this:
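def second_to_max(input_dict):
    # in here, you track both the max value and the next largest value,
    # together with the first key that produced each of them
    max_val = None
    s_max = None
    max_key = None
    ret_val = None
    for key, val in input_dict.items():
        if val is None:
            # you can't actually compare None to an integer, so skip it
            continue
        if max_val is None or val > max_val:
            # val is the new max: the old max becomes the second largest
            s_max, ret_val = max_val, max_key
            max_val, max_key = val, key
        elif val != max_val and (s_max is None or val > s_max):
            # val sits strictly between s_max and max
            s_max, ret_val = val, key
    return ret_val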
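The "None as negative infinity" idea mentioned above can also be sketched by mapping None to float('-inf') before comparing. The function name second_largest_key and the helper as_number are made up for this illustration, and the sketch assumes Python 3.7+ so that "first key" means first in insertion order.

```python
def second_largest_key(d):
    """Return the first key holding the second-largest distinct value, or None."""
    # Treat None as negative infinity so it can be compared with numbers.
    def as_number(value):
        return float('-inf') if value is None else value

    distinct = sorted({as_number(v) for v in d.values()}, reverse=True)
    if len(distinct) < 2 or distinct[1] == float('-inf'):
        # No real second-largest value exists.
        return None
    for key, value in d.items():
        if as_number(value) == distinct[1]:
            return key


print(second_largest_key({'a': 1, 'b': 2, 'c': 2, 'd': 500}))
```

Sorting the distinct values keeps the tie handling simple, at the cost of an extra pass over the dictionary.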
List all integer values (removing None).
Then sort values and find nth maximum (ie n = 1 is max value, n = 2
is second largest, etc).
Extract first key value corresponding to target max.
def f(input_dict, nth_largest_value):
    """Return the first key whose value is the nth largest distinct integer value"""
    # get only unique integer values
    dict_values = list(set([x for x in input_dict.values() if type(x) == int]))
    dict_values = sorted(dict_values) # ascending order
    if len(dict_values) >= nth_largest_value:
        for key in input_dict:
            if input_dict[key] == dict_values[-nth_largest_value]:
                return key
Input: f({'a': None, 'b': None, 'c': 400, 'd': 500}, 2)
Output: 'c'
What if the max value is repeated?
That being said, one approach could be:
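def second_largest_key(dict_face):
    # note: assumes no None values and that the max value is not repeated
    values_list = list(dict_face.values())
    largest = max(values_list)
    values_list.remove(largest)
    second_largest = max(values_list)
    for k,v in dict_face.items():
        if v == second_largest:
            return k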

Is there a way to randomly shuffle keys and values in a Python Dictionary, but the result can't have any of the original key value pairs?

I would like to shuffle the key value pairs in this dictionary so that the outcome has no original key value pairs. Starting dictionary:
my_dict = {'A':'a',
'K':'k',
'P':'p',
'Z':'z'}
Example of unwanted outcome:
my_dict_shuffled = {'Z':'a',
'K':'k', <-- Original key value pair
'A':'p',
'P':'z'}
Example of wanted outcome:
my_dict_shuffled = {'Z':'a',
'A':'k',
'K':'p',
'P':'z'}
I have tried while loops and for loops with no luck. Please help! Thanks in advance.
Here's a fool-proof algorithm I learned from a Numberphile video :)
import itertools
import random
my_dict = {'A': 'a',
'K': 'k',
'P': 'p',
'Z': 'z'}
# Shuffle the keys and values.
my_dict_items = list(my_dict.items())
random.shuffle(my_dict_items)
shuffled_keys, shuffled_values = zip(*my_dict_items)
# Offset the shuffled values by one.
shuffled_values = itertools.cycle(shuffled_values)
next(shuffled_values, None) # Offset the values by one.
# Guaranteed to have each value paired to a random different key!
my_random_dict = dict(zip(shuffled_keys, shuffled_values))
Disclaimer (thanks for mentioning, @jf328): this will not generate all possible permutations! It will only generate permutations with exactly one "cycle". Put simply, the algorithm will never give you the following outcome:
{'A': 'k',
'K': 'a',
'P': 'z',
'Z': 'p'}
However, I imagine you can extend this solution by building a random list of sub-cycles:
(2, 2, 3) => concat(zip(*items[0:2]), zip(*items[2:4]), zip(*items[4:7]))
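One possible reading of the sub-cycle idea above is sketched below: shuffle the items, cut them into random chunks of at least two items each, and rotate the values by one inside each chunk. The function name shuffle_by_cycles is made up for this illustration. The sketch assumes all values are distinct, and it makes no claim that every derangement is equally likely.

```python
import random


def shuffle_by_cycles(d):
    """Pair every key with another key's value by rotating random sub-cycles."""
    items = list(d.items())
    if len(items) < 2:
        raise ValueError("need at least two items")
    random.shuffle(items)

    result = {}
    start = 0
    n = len(items)
    while start < n:
        remaining = n - start
        # A chunk needs at least 2 items and must not leave exactly 1 behind.
        sizes = [s for s in range(2, remaining + 1) if remaining - s != 1]
        size = random.choice(sizes)
        chunk = items[start:start + size]
        keys = [k for k, _ in chunk]
        values = [v for _, v in chunk]
        # Rotate the values by one position within this chunk.
        for k, v in zip(keys, values[1:] + values[:1]):
            result[k] = v
        start += size
    return result


my_dict = {'A': 'a', 'K': 'k', 'P': 'p', 'Z': 'z'}
print(shuffle_by_cycles(my_dict))
```

Because every chunk has at least two items, each key receives the value of a different key, and two chunks of size two reproduce the outcome that the single-cycle version can never give.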
A shuffle which doesn't leave any element in the same place is called a derangement. Essentially, there are two parts to this problem: first to generate a derangement of the keys, and then to build the new dictionary.
We can randomly generate a derangement by shuffling until we get one; on average it should only take 2-3 tries even for large dictionaries, but this is a Las Vegas algorithm in the sense that there's a tiny probability it could take a much longer time to run than expected. The upside is that this trivially guarantees that all derangements are equally likely.
from random import shuffle
def derangement(keys):
    if len(keys) == 1:
        raise ValueError('No derangement is possible')
    new_keys = list(keys)
    while any(x == y for x, y in zip(keys, new_keys)):
        shuffle(new_keys)
    return new_keys
def shuffle_dict(d):
    return { x: d[y] for x, y in zip(d, derangement(d)) }
Usage:
>>> shuffle_dict({ 'a': 1, 'b': 2, 'c': 3 })
{'a': 2, 'b': 3, 'c': 1}
theonewhocodes, does this work? If it doesn't give the right answer, can you update your question with a second use case?
import random
my_dict = {'A':'a',
           'K':'k',
           'P':'p',
           'Z':'z'}
while True:
    new_dict = dict(zip(list(my_dict.keys()), random.sample(list(my_dict.values()),len(my_dict))))
    # retry if any original key value pair survived the shuffle
    if new_dict.items() & my_dict.items():
        continue
    else:
        break
print(my_dict)
print(new_dict)

Match list's index based off its value

I am new to Python and working on a problem where I have to match a list of indices to a list of value with 2 conditions:
If there is a repeated index, then the values should be summed
If there is no index in the list, then value should be 0
For example, below are my 2 lists: 'List of Inds' and 'List of Vals'. So at index 0, my value is 5; at index 1, my value is 4; at index 2, my value is 3 (2+1); at index 3, my value is 0 (since no value is associated with the index) and so on.
Input:
'List of Inds' = [0,1,4,2,2]
'List Vals' = [5,4,3,2,1]
Output = [5,4,3,0,3]
I have been struggling with it for few days and can't find anything online that can point me in the right direction. Thank you.
List_of_Inds = [0,1,4,2,2]
List_Vals = [5,4,3,2,1]
dic ={}
i = 0
for key in List_of_Inds:
    if key not in dic:
        dic[key] = 0
    dic[key] = List_Vals[i]+dic[key]
    i = i+1
output = []
# go up to the largest index so that every missing index becomes 0
for key in range(0, max(dic)+1):
    if key in dic:
        output.append(dic[key])
    else:
        output.append(0)
print(dic)
print(output)
output:
{0: 5, 1: 4, 4: 3, 2: 3}
[5, 4, 3, 0, 3]
The following code works as desired. In computer science it is called "Sparse Matrix" where the data is kept only for said indices, but the "virtual size" of the data structure seems large from the outside.
import logging
class SparseVector:
    def __init__(self, indices, values):
        self.d = {}
        for c, indx in enumerate(indices):
            logging.info(c)
            logging.info(indx)
            if indx not in self.d:
                self.d[indx] = 0
            self.d[indx] += values[c]
    def getItem(self, key):
        if key in self.d:
            return self.d[key]
        else:
            return 0
p1 = SparseVector([0,1,4,2,2], [5,4,3,2,1])
print(p1.getItem(0))
print(p1.getItem(1))
print(p1.getItem(2))
print(p1.getItem(3))
print(p1.getItem(4))
print(p1.getItem(5))
print(p1.getItem(6))
Answer code is
def ans(list1,list2):
    dic={}
    ans=[]
    if not(len(list1)==len(list2)):
        return "Not Possible"
    for i in range(0,len(list1)):
        ind=list1[i]
        val=list2[i]
        if not(ind in dic.keys()):
            dic[ind]=val
        else:
            dic[ind]+=val
    # the output length depends on the largest index, not on the input length
    val=max(dic)+1 if dic else 0
    for i in range(0,val):
        if not(i in dic.keys()):
            ans.append(0)
        else:
            ans.append(dic[i])
    return ans
To test:
print(ans([0,1,4,2,2], [5,4,3,2,1]))
output:
[5, 4, 3, 0, 3]
Hope it helps.
Comment if you don't understand any step.
What you can do is sort the indexes and values in ascending order, and then sum them up. Here is some example code:
import numpy as np
ind = [0,1,4,2,2]
vals = [5,4,3,2,1]
points = zip(ind,vals)
sorted_points = sorted(points)
new_ind = [point[0] for point in sorted_points]
new_val = [point[1] for point in sorted_points]
# size the output by the largest index, not by the number of entries
output = np.zeros(max(new_ind) + 1)
for i in range(len(new_ind)):
    output[new_ind[i]] += new_val[i]
In this code, the index values are sorted to be in ascending order and then the value array is rearranged according to the sorted index array. Then, using a simple for loop, you can sum the values of each existing index and calculate the output.
This is a grouping problem. You can use collections.defaultdict to build a dictionary mapping, incrementing values in each iteration. Then use a list comprehension:
indices = [0,1,4,2,2]
values = [5,4,3,2,1]
from collections import defaultdict
dd = defaultdict(int)
for idx, val in zip(indices, values):
    dd[idx] += val
res = [dd[idx] for idx in range(max(dd) + 1)]
## functional alternative (dd.get would return None for missing indices):
# res = list(map(dd.__getitem__, range(max(dd) + 1)))
print(res)
# [5, 4, 3, 0, 3]

How to get a set of keys with largest values?

I am working on a function
def common_words(dictionary, N):
    if len(dictionary) > N:
        max(dictionary, key=dictionary.get)
Description of the function is:
The first parameter is the dictionary of word counts and the second is
a positive integer N. This function should update the dictionary so
that it includes the most common (highest frequency words). At most N
words should be included in the dictionary. If including all words
with some word count would result in a dictionary with more than N
words, then none of the words with that word count should be included.
(i.e., in the case of a tie for the N+1st most common word, omit all
of the words in the tie.)
So I know that I need to get the N items with the highest values but I am not sure how to do that. I also know that once I get N items that if there are any duplicate values that I need to pop them out.
For example, given
k = {'a':5, 'b':4, 'c':4, 'd':1}
then
common_words(k, 2)
should modify k so that it becomes {'a':5}.
Here's my algorithm for this problem.
Extract the data from the dictionary into a list and sort it in descending order on the dictionary values.
Clear the original dictionary.
Group the sorted data into groups that have the same value.
Re-populate the dictionary with all the (key, value) pairs from each group in the sorted list if that will keep the total dictionary size <= N. If adding a group would make the total dictionary size > N, then return.
The grouping operation can be easily done using the standard itertools.groupby function.
To perform the sorting and grouping we need an appropriate key function, as described in the groupby, list and sorted docs. Since we need the second item of each tuple we could use
def keyfunc(t):
    return t[1]
or
keyfunc = lambda t: t[1]
but it's more efficient to use operator.itemgetter.
from operator import itemgetter
from itertools import groupby
def common_words(d, n):
    keyfunc = itemgetter(1)
    lst = sorted(d.items(), key=keyfunc, reverse=True)
    d.clear()
    for _, g in groupby(lst, key=keyfunc):
        g = list(g)
        if len(d) + len(g) <= n:
            d.update(g)
        else:
            break
# test
data = {'a':5, 'b':4, 'c':4, 'd':1}
common_words(data, 4)
print(data)
common_words(data, 2)
print(data)
output
{'c': 4, 'd': 1, 'b': 4, 'a': 5}
{'a': 5}
My algorithm is as follows:
First, build a list of tuples from the dictionary, sorted by value from largest to smallest.
Check whether the value of item[N-1] matches the value of item[N]; if so, drop item[N-1] (indices start from 0, hence the -1).
Finally, convert the slice of the tuple list up to N elements back to a dict. You may use an OrderedDict here if you want to retain the item order.
It will just return the dictionary as it is if the dictionary has N items or fewer.
def common_words(dictionary, N):
    if len(dictionary) > N:
        tmp = [(k,dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
        # keep dropping while the last kept item ties with the first omitted one
        while N > 0 and tmp[N-1][1] == tmp[N][1]:
            N -= 1
        return dict(tmp[:N])
        # return [i[0] for i in tmp[:N]] # comment line above and uncomment this line to get keys only as your title mention how to get keys
    else:
        return dictionary
        # return dictionary.keys() # comment line above and uncomment this line to get keys only as your title mention how to get keys
>>> common_words({'a':5, 'b':4, 'c':4, 'd':1}, 2)
{'a': 5}
If the OP wants to modify the input dictionary within the function and return None, it can be changed as below:
def common_words(dictionary, N):
    if len(dictionary) > N:
        tmp = [(k,dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
        # keep dropping while the last kept item ties with the first omitted one
        while N > 0 and tmp[N-1][1] == tmp[N][1]:
            N -= 1
        # return dict(tmp[:N])
        for i in tmp[N:]:
            dictionary.pop(i[0])
>>> k = {'a':5, 'b':4, 'c':4, 'd':1}
>>> common_words(k, 2)
>>> k
{'a': 5}

Remove the smallest element(s) from a dictionary

I have a function such that there is a dictionary as parameters, with the value associated to be an integer. I'm trying to remove the minimum element(s) and return a set of the remaining keys.
I am programming in Python. I can't seem to remove key value pairs that share the same value. My code does not work for the 2nd and 3rd examples.
This is how it would work:
remaining({A: 1, B: 2, C: 2})
{B, C}
remaining({B: 2, C : 2})
{}
remaining({A: 1, B: 1, C: 1, D: 4})
{D}
This is what I have:
def remaining(d : {str:int}) -> {str}:
    Remaining = set(d)
    Remaining.remove(min(d, key=d.get))
    return Remaining
One approach is to take the minimum value, then build a list of keys that are equal to it and utilise dict.viewkeys() which has set-like behaviour and remove the keys matching the minimum value from it.
d = {'A': 1, 'B': 1, 'C': 1, 'D': 4}
# Use .values() and .keys() and .items() for Python 3.x
min_val = min(d.itervalues())
remaining = d.viewkeys() - (k for k, v in d.iteritems() if v == min_val)
# set(['D'])
On a side note, I find it odd that {B: 2, C : 2} should be {} as there's not actually anything greater for those to be the minimum as it were.
That's because you're trying to map values to keys and map allows different keys to have the same values but not the other way! you should implement a map "reversal" as described here, remove the minimum key, and then reverse the map back to its original form.
# your example
l = {'A': 1, 'B': 1, 'C': 1, 'D': 4}
# reverse the dict
d1 = {}
for k, v in l.items():
    d1[v] = d1.get(v, []) + [k]
# remove the min element (the keys of d1 are the original values)
del d1[min(d1)]
#recover the rest to the original dict minus the min
res = {}
for k, v in d1.items():
    for e in v:
        res[e] = k
print(res)
Comment:
@Jon Clements's solution is more elegant and should be accepted as the answer
Take the minimum value and construct a set with all the keys which are not associated to that value:
def remaining(d):
    m = min(d.values())
    return {k for k,v in d.items() if v != m}
If you don't like set comprehensions that's the same as:
def remaining(d):
    m = min(d.values())
    s = set()
    for k,v in d.items():
        if v != m:
            s.add(k)
    return s
This removes all the items with the minimum value.
def remaining(dic):
    minimum = min(dic.values())
    # iterate over a copy of the items so that popping is safe
    for k, v in list(dic.items()):
        if v == minimum:
            dic.pop(k)
    return set(dic.keys())
An easier way would be to use pd.Series.idxmin() or pd.Series.min(). These functions allow you to find the index of the minimum value or the minimum value in a series, plus pandas allows you to create a named index.
import pandas as pd
import numpy as np
A = pd.Series(np.full(shape=5,fill_value=0))#create series of 0
A = A.reindex(['a','b','c','d','e'])#set index, similar to dictionary names
A['a'] = 2
print(A.max())
#output 2.0
print(A.idxmax())#you can also pop by index without changing other indices
#output a
