I have a (potentially) huge dict in Python 3.10 and want to randomly sample a few values. Alas, random.sample(my_dict, k) says:
TypeError: Population must be a sequence. For dicts or sets, use sorted(d).
and random.sample(my_dict.keys(), k) gives
DeprecationWarning: Sampling from a set deprecated
since Python 3.9 and will be removed in a subsequent version.
I don't want to pay the cost of converting my dictionary keys to a list, and I don't need them sorted.
There's an old question in a similar vein, but that's from before stuff got deprecated in Python and the person asking that question didn't mind converting to a list first.
I also tried running random.choice multiple times to simulate random.sample. But that's even worse: it just throws an exception when you use it on a dict. (Instead of giving you a reasonable error message.)
You need to use sample(list(dct)). (The example code selects 2 random items from the original dict.)
from random import sample
dct = {'a':1, 'b':2, 'c':3, 'd':4}
rnd_keys = sample(list(dct), 2)
# rnd_keys -> ['c', 'b']
rnd_dct = dict(sample(list(dct.items()), 2))
print(rnd_dct)
{'c': 3, 'b': 2}
Update: an approach that does not convert the huge dict to a list (that conversion uses O(n) extra space, which the question wants to avoid). Generate k random indices in range(len(dct)), then iterate with enumerate and keep only the (k, v) pairs whose index is in that set. Count down as you collect items and break out of the for-loop once the counter reaches zero, so you don't necessarily walk the whole dict.
from random import sample
dct = {'a':1, 'b':2, 'c':3, 'd':4}
# idx -^0^----^1^----^2^----^3^---
number_rnd = 2
rnd_idx = set(sample(range(len(dct)), number_rnd))
print(rnd_idx)
# {0, 3}
res = {}
for idx, (k,v) in enumerate(dct.items()):
    if idx in rnd_idx:
        res[k] = v
        number_rnd -= 1
        if number_rnd == 0:
            break
print(res)
# {'a': 1, 'd': 4}
Third approach (thanks to Tomerikoo): use a coin-flip idea. On each iteration over items(), generate a random 0 or 1, and if it is 1, save the item in the result dict. (We may walk through every item and still end up with fewer items than requested, because many of the flips may come up 0.)
import random
dct = {'a':1, 'b':2, 'c':3, 'd':4}
number_rnd = 2
res = {}
for k,v in dct.items():
    rnd_ch = random.getrandbits(1)
    if rnd_ch:
        res[k] = v
        number_rnd -= 1
        if number_rnd == 0:
            break
print(res)
If you’re ok with getting the keys into a list, you can just pick the keys using a random set of integers off of the list. If you don’t want to store them into a list, you can generate your random integers based on the dict size, sort them, then iterate through the dict and sample as you get to indices matching your random integer picks.
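The "sort the random indices, then walk the dict" idea can be sketched roughly like this. The function name sample_items is made up for illustration, and this is only one way to do it: it still takes O(n) time in the worst case, but only O(k) extra space, and the picked items come back in dict order rather than in random order.

```python
import random
from itertools import islice

def sample_items(d, k):
    """Return k randomly chosen (key, value) pairs without building a key list."""
    # Pick k distinct positions, then visit them in ascending order.
    wanted = sorted(random.sample(range(len(d)), k))
    picked = []
    it = iter(d.items())
    pos = 0  # position of the next item the iterator will yield
    for idx in wanted:
        # Skip ahead to the wanted position and take that item.
        picked.append(next(islice(it, idx - pos, None)))
        pos = idx + 1
    return picked

dct = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
print(dict(sample_items(dct, 2)))
```

The loop stops as soon as the last wanted index has been reached, so items after it are never visited.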
Forgive the convoluted title, but I couldn't find a more elegant way to express the problem. The closest question I could locate can be found here, but it doesn't quite get me there.
Assume a number of dictionaries of varying lengths:
dict_1 = {'A':4, 'C':5}
dict_2 = {'A':1, 'B':2, 'C':3}
dict_3 = {'B':6}
The common denominator for these dictionaries is that they all share the keys on this list:
my_keys= ['A','B','C']
but some are missing one or more of the keys.
The intent is to create a list of lists, where each list element is a list of all the existing values in each dictionary, or 'None' where a specific key isn't present. The keys themselves, being identical across all dictionaries, can be disregarded.
So in this case, the expected output is:
final_list =
[[4,"None",5],
[1,2,3],
["None",6,"None"]]
I'm not sure exactly how to approach it. You could start with each element in my_keys, and check its presence against each key in each dictionary; if the relevant dict has the key, the value of that key is appended to a temporary list; otherwise, 'None' is appended. Once all of my_keys have been iterated over, the temp list is appended to a final list and the cycle starts again.
To me at least, it's easier said than done. I tried quite a few things (which I won't bother to post because they didn't get even close). So I was wondering if there is an elegant approach to the problem.
dict.get can return a default value (for example, None). If we take your examples:
dict_1 = {'A':4, 'C':5}
dict_2 = {'A':1, 'B':2, 'C':3}
dict_3 = {'B':6}
my_keys= ['A','B','C']
Then dict_1.get('B', None) is the way to make sure we get a default None value. We can loop across all keys the same way:
def dict_to_list(d, keys):
    return [d.get(key, None) for key in keys]
Example:
>>> dict_to_list(dict_1, my_keys)
[4, None, 5]
>>> dict_to_list(dict_2, my_keys)
[1, 2, 3]
>>> dict_to_list(dict_3, my_keys)
[None, 6, None]
EDIT: None is the default argument even if it's not explicitly specified, so dict_1.get('B') would work just as well as dict_1.get('B', None)
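To get the full list of lists the question asks for, the same comprehension can be nested over all the dictionaries. A minimal sketch, where dicts_to_rows is a hypothetical helper name and the default parameter is my own addition (pass default="None" if you really want the string):

```python
dict_1 = {'A': 4, 'C': 5}
dict_2 = {'A': 1, 'B': 2, 'C': 3}
dict_3 = {'B': 6}
my_keys = ['A', 'B', 'C']

def dicts_to_rows(dicts, keys, default=None):
    # One row per dictionary, one column per key, default where the key is absent.
    return [[d.get(key, default) for key in keys] for d in dicts]

final_list = dicts_to_rows([dict_1, dict_2, dict_3], my_keys)
print(final_list)
# [[4, None, 5], [1, 2, 3], [None, 6, None]]
```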
I'm taking first steps with python and trying to iterate over multidimensional dictionary, while checking if key exists and not None.
Just to make it clear, the code works! But I feel that there should be a better way to implement it:
for key in sites[website_name]:
    if 'revenue' in sites[website_name][key]:
        if sites[website_name][key]['revenue'] is not None:
            totalSiteIncome += sites[website_name][key]['revenue']
        else:
            sites[website_name][key]['revenue'] = 0
    if 'spent' in sites[website_name][key]:
        if sites[website_name][key]['spent'] is not None:
            totalSiteSpent += sites[website_name][key]['spent']
        else:
            sites[website_name][key]['spent'] = 0
Any idea if and how can I improve the loop?
Keep in mind, looking for best practice here, thx!
Posting a sample of the sites[website_name] dictionary would really be helpful but if I understand you correctly, this is how I would do it:
totalSiteIncome = sum(x.get('revenue', 0.0) for x in sites[website_name].values())
totalSiteSpent = sum(x.get('spent', 0.0) for x in sites[website_name].values())
As mentioned in the comments, .get() means you don't have to care whether the key is there, and it takes a default argument to return when it isn't (in this case 0). Other than that, it is just a generator expression inside the sum() function.
In English, the first line would read:
"get me all the revenues if they exist from every website in my site dictionary and sum them. If the revenue is not logged, assume 0"
As a side note, in your code totalSiteIncome and totalSiteSpent have to be initialised too, otherwise it would not run. In my version they don't have to be, and if they are, their values will be overwritten.
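One caveat: the default passed to .get() only applies when the key is missing, not when the key is present with a value of None, which the question's loop also checks for. A small sketch of one way to cover both cases. The sample data is invented, since the real structure of sites[website_name] was not posted, and unlike the original loop this does not write 0 back into the dictionary:

```python
# Hypothetical sample data: a dict of per-entry dicts, some values missing or None.
sites = {
    'example.com': {
        'campaign_1': {'revenue': 10.0, 'spent': 4.0},
        'campaign_2': {'revenue': None, 'spent': 2.5},
        'campaign_3': {'spent': None},
    }
}
website_name = 'example.com'

entries = sites[website_name].values()
# "x.get(key) or 0" yields 0 both for a missing key and for an explicit None.
total_site_income = sum(entry.get('revenue') or 0 for entry in entries)
total_site_spent = sum(entry.get('spent') or 0 for entry in entries)
print(total_site_income, total_site_spent)
# 10.0 6.5
```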
The following approach could be useful if you need a solution that is agnostic to the nesting level of the target fields (revenue and spent). Also could be useful if you want to add more and more fields, as with this solution you don't need to repeat the code for each new field.
That said, my suggestion has some downsides compared with your solution: it uses recursion, which is less readable, and also a flag (return_totals), which feels hacky. Just adding my 5 cents to the brainstorm.
def _update(input_dict, target_fields, totals=None, return_totals=True):
    # Create a fresh totals dict per top-level call (avoids a shared mutable default).
    if totals is None:
        totals = {}
    result = {}
    for k, v in input_dict.items():
        if isinstance(v, dict):
            # Recurse into nested dicts, accumulating into the same totals.
            r = _update(input_dict[k], target_fields, totals, return_totals=False)
            result[k] = r
        else:
            if k in target_fields:
                # Replace None with 0 and add the value to the running total.
                result[k] = input_dict[k] or 0
                if k not in totals:
                    totals[k] = 0
                totals[k] += result[k]
            else:
                result[k] = input_dict[k]
    if return_totals:
        return {
            'updated_dictionary': result,
            'totals': totals,
        }
    return result

new_sites = _update(input_dict=sites, target_fields=['revenue', 'spent'])
print('updated_dictionary:')
print(new_sites['updated_dictionary'])
print('totals:')
print(new_sites['totals'])
Dictionaries should be iterated using their iteration methods, for example dict.keys(), dict.values() and dict.items().
dict.keys():
d = {'a': '1', 'b': '2'}
for key in d.keys():
    print(key)
Output:
a
b
dict.values():
d = {'a': '1', 'b': '2'}
for value in d.values():
    print(value)
Output:
1
2
dict.items():
d = {'a': '1', 'b': '2'}
for key, value in d.items():
    print(key + " -> " + value)
Output:
a -> 1
b -> 2
NOTE:
These methods work in both Python 2 and Python 3, but they are only lazy (improving efficiency) in Python 3, where they return view objects; in Python 2 they build full lists. The lazy iterators in Python 2 are called dict.iterkeys(), dict.itervalues() and dict.iteritems() respectively.
In someone else's code I read the following two lines:
x = defaultdict(lambda: 0)
y = defaultdict(lambda: defaultdict(lambda: 0))
As the argument of defaultdict is a default factory, I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed. Am I correct?
And what about y? It seems that the default factory will create a defaultdict with default 0. But what does that mean concretely? I tried to play around with it in Python shell, but couldn't figure out what it is exactly.
I think the first line means that when I call x[k] for a nonexistent key k (such as a statement like v=x[k]), the key-value pair (k,0) will be automatically added to the dictionary, as if the statement x[k]=0 is first executed.
That's right. This is more idiomatically written
x = defaultdict(int)
In the case of y, when you do y["ham"]["spam"], the key "ham" is inserted in y if it does not exist. The value associated with it becomes a defaultdict in which "spam" is automatically inserted with a value of 0.
I.e., y is a kind of "two-tiered" defaultdict. If "ham" not in y, then evaluating y["ham"]["spam"] is like doing
y["ham"] = {}
y["ham"]["spam"] = 0
in terms of ordinary dict.
You are correct for what the first one does. As for y, it will create a defaultdict with default 0 when a key doesn't exist in y, so you can think of this as a nested dictionary. Consider the following example:
from collections import defaultdict
y = defaultdict(lambda: defaultdict(lambda: 0))
print(y['k1']['k2'])   # 0
print(dict(y['k1']))   # {'k2': 0}
To create an equivalent nested dictionary structure without defaultdict you would need to create an inner dict for y['k1'] and then set y['k1']['k2'] to 0, but defaultdict does all of this behind the scenes when it encounters keys it hasn't seen:
y = {}
y['k1'] = {}
y['k1']['k2'] = 0
The following function may help for playing around with this on an interpreter to better your understanding:
def to_dict(d):
    if isinstance(d, defaultdict):
        return dict((k, to_dict(v)) for k, v in d.items())
    return d
This will return the dict equivalent of a nested defaultdict, which is a lot easier to read, for example:
>>> y = defaultdict(lambda: defaultdict(lambda: 0))
>>> y['a']['b'] = 5
>>> y
defaultdict(<function <lambda> at 0xb7ea93e4>, {'a': defaultdict(<function <lambda> at 0xb7ea9374>, {'b': 5})})
>>> to_dict(y)
{'a': {'b': 5}}
defaultdict takes a zero-argument callable to its constructor, which is called when the key is not found, as you correctly explained.
lambda: 0 will of course always return zero, but the preferred method to do that is defaultdict(int), which will do the same thing.
As for the second part, the author would like to create a new defaultdict(int), or a nested dictionary, whenever a key is not found in the top-level dictionary.
The existing answers are all good; I am adding this one to give a bit more information:
"defaultdict requires an argument that is callable. The return value of that callable object is the default value that the dictionary returns when you try to access the dictionary with a key that does not exist."
Here's an example
SAMPLE= {'Age':28, 'Salary':2000}
SAMPLE = defaultdict(lambda:0,SAMPLE)
>>> SAMPLE
defaultdict(<function <lambda> at 0x0000000002BF7C88>, {'Salary': 2000, 'Age': 28})
>>> SAMPLE['Age']
28
>>> SAMPLE['Phone']  # a key that does not exist in SAMPLE gives the default
0
y = defaultdict(lambda:defaultdict(lambda:0))
will be helpful if you try this y['a']['b'] += 1
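For instance, a nested defaultdict makes it easy to count pairs without initialising anything first. A small sketch with made-up data (the names counts and pairs are mine); it also shows that merely reading a missing key inserts it:

```python
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))
pairs = [('a', 'b'), ('a', 'b'), ('a', 'c'), ('x', 'y')]
for outer, inner in pairs:
    # Both levels are created on first access, so += 1 never raises KeyError.
    counts[outer][inner] += 1

plain = {k: dict(v) for k, v in counts.items()}
print(plain)
# {'a': {'b': 2, 'c': 1}, 'x': {'y': 1}}

# Reading a missing key is enough to insert it at both levels.
_ = counts['missing']['also_missing']
print('missing' in counts)
# True
```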
The following two expressions seem equivalent to me. Which one is preferable?
data = [('a', 1), ('b', 1), ('b', 2)]
d1 = {}
d2 = {}
for key, val in data:
    # variant 1)
    d1[key] = d1.get(key, []) + [val]
    # variant 2)
    d2.setdefault(key, []).append(val)
The results are the same but which version is better or rather more pythonic?
Personally I find version 2 harder to understand, as to me setdefault is very tricky to grasp. If I understand correctly, it looks for the value of "key" in the dictionary, if not available, enters "[]" into the dict, returns a reference to either the value or "[]" and appends "val" to that reference. While certainly smooth it is not intuitive in the least (at least to me).
To my mind, version 1 is easier to understand (if available, get the value for "key", if not, get "[]", then join with a list made up from [val] and place the result in "key"). But while more intuitive to understand, I fear this version is less performant, with all this list creating. Another disadvantage is that "d1" occurs twice in the expression which is rather error-prone. Probably there is a better implementation using get, but presently it eludes me.
My guess is that version 2, although more difficult to grasp for the inexperienced, is faster and therefore preferable. Opinions?
Your two examples do the same thing, but that doesn't mean get and setdefault do.
The difference between the two is basically manually setting d[key] to point to the list every time, versus setdefault automatically setting d[key] to the list only when it's unset.
Making the two methods as similar as possible, I ran
from timeit import timeit
print(timeit("c = d.get(0, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000))
print(timeit("c = d.get(1, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000))
print(timeit("d.setdefault(0, []).extend([1])", "d = {1: []}", number = 1000000))
print(timeit("d.setdefault(1, []).extend([1])", "d = {1: []}", number = 1000000))
and got
0.794723378711
0.811882272256
0.724429205999
0.722129751973
So setdefault is around 10% faster than get for this purpose.
The get method allows you to do less than you can with setdefault. You can use it to avoid getting a KeyError when the key doesn't exist (if that's something that's going to happen frequently) even if you don't want to set the key.
See Use cases for the 'setdefault' dict method and dict.get() method returns a pointer for some more info about the two methods.
The thread about setdefault concludes that most of the time, you want to use a defaultdict. The thread about get concludes that it is slow, and often you're better off (speed wise) doing a double lookup, using a defaultdict, or handling the error (depending on the size of the dictionary and your use case).
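For reference, the three alternatives mentioned there (a double lookup, handling the error, and a defaultdict) could look roughly like this on the question's data. This is only a sketch to show the shapes of the code, not a benchmark, and the variable names are mine:

```python
from collections import defaultdict

data = [('a', 1), ('b', 1), ('b', 2)]

# 1. Double lookup: test for the key, then index again.
by_check = {}
for key, val in data:
    if key in by_check:
        by_check[key].append(val)
    else:
        by_check[key] = [val]

# 2. Handle the error: try the common case and catch KeyError on first sight.
by_except = {}
for key, val in data:
    try:
        by_except[key].append(val)
    except KeyError:
        by_except[key] = [val]

# 3. defaultdict: the missing list is created automatically.
by_default = defaultdict(list)
for key, val in data:
    by_default[key].append(val)

print(by_check)
# {'a': [1], 'b': [1, 2]}
```

All three end up with the same contents; which one is fastest depends on how often keys are missing.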
The accepted answer from agf isn't comparing like with like. After:
print(timeit("d[0] = d.get(0, []) + [1]", "d = {1: []}", number = 10000))
d[0] contains a list with 10,000 items whereas after:
print(timeit("d.setdefault(0, []) + [1]", "d = {1: []}", number = 10000))
d[0] is simply []. i.e. the d.setdefault version never modifies the list stored in d. The code should actually be:
print(timeit("d.setdefault(0, []).append(1)", "d = {1: []}", number = 10000))
and in fact is faster than the faulty setdefault example.
The difference here is really because, when you append using concatenation, the whole list is copied every time (and once you have 10,000 elements that starts to become measurable). With append, the list updates are amortised O(1), i.e. effectively constant time.
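A quick way to see the copying is to check object identity: append mutates the list already stored in the dict, whereas + builds a brand-new list on every call. A small illustrative sketch (the variable names are mine):

```python
# setdefault(...).append(...) keeps working on the same list object.
d = {}
original = d.setdefault('k', [])
d.setdefault('k', []).append(1)
print(d['k'] is original)
# True

# get(...) + [...] creates a new list each time and rebinds the key to it.
e = {'k': []}
before = e['k']
e['k'] = e.get('k', []) + [1]
print(e['k'] is before)
# False
print(before)
# []
```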
Finally, there are two other options not considered in the original question: defaultdict or simply testing the dictionary to see whether it already contains the key.
So, assuming d3, d4 = defaultdict(list), {}
# variant 1 (0.39)
d1[key] = d1.get(key, []) + [val]
# variant 2 (0.003)
d2.setdefault(key, []).append(val)
# variant 3 (0.0017)
d3[key].append(val)
# variant 4 (0.002)
if key in d4:
    d4[key].append(val)
else:
    d4[key] = [val]
variant 1 is by far the slowest because it copies the list every time, variant 2 is the second slowest, variant 3 is the fastest but won't work if you need Python older than 2.5, and variant 4 is just slightly slower than variant 3.
I would say use variant 3 if you can, with variant 4 as an option for those occasional places where defaultdict isn't an exact fit. Avoid both of your original variants.
For those who are still struggling to understand these two methods, here is the basic difference between get() and setdefault():
Scenario-1
root = {}
root.setdefault('A', [])
print(root)
Scenario-2
root = {}
root.get('A', [])
print(root)
In Scenario-1 output will be {'A': []} while in Scenario-2 {}
So setdefault() sets absent keys in the dict while get() only provides you default value but it does not modify the dictionary.
Now let's look at where this is useful.
Suppose you are looking up a key in a dict whose values are lists: if the key is found you want to modify its list, otherwise you want to create the key with a new list.
using setdefault()
def fn1(dic, key, lst):
    dic.setdefault(key, []).extend(lst)
using get()
def fn2(dic, key, lst):
    dic[key] = dic.get(key, []) + lst  # explicit assignment happens here
Now let's examine the timings:
dic = {}
%%timeit -n 10000 -r 4
fn1(dic, 'A', [1,2,3])
Took 288 ns
dic = {}
%%timeit -n 10000 -r 4
fn2(dic, 'A', [1,2,3])
Took 128 µs
So there is a very large timing difference between these two approaches.
You might want to look at defaultdict in the collections module. The following is equivalent to your examples.
from collections import defaultdict
data = [('a', 1), ('b', 1), ('b', 2)]
d = defaultdict(list)
for k, v in data:
    d[k].append(v)
There's more here.
1. Explained with a good example here:
http://code.activestate.com/recipes/66516-add-an-entry-to-a-dictionary-unless-the-entry-is-a/
dict.setdefault typical usage
somedict.setdefault(somekey,[]).append(somevalue)
dict.get typical usage
theIndex[word] = 1 + theIndex.get(word,0)
2. More explanation : http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
dict.setdefault() is equivalent to get or set & get. Or set if necessary then get. It's especially efficient if your dictionary key is expensive to compute or long to type.
The only problem with dict.setdefault() is that the default value is always evaluated, whether needed or not. That only matters if the default value is expensive to compute. In that case, use defaultdict.
3. Finally the official docs with difference highlighted http://docs.python.org/2/library/stdtypes.html
get(key[, default])
Return the value for key if key is in the dictionary, else default. If
default is not given, it defaults to None, so that this method never
raises a KeyError.
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
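The point in item 2 about the default always being evaluated can be seen with a default that records when it is built. In this sketch expensive_default is a hypothetical stand-in for a costly computation:

```python
from collections import defaultdict

calls = []

def expensive_default():
    # Hypothetical stand-in for a costly computation; records each call.
    calls.append('called')
    return []

d = {'present': [1]}
# The argument is evaluated before setdefault runs, even though the key exists.
d.setdefault('present', expensive_default())
print(len(calls))
# 1

dd = defaultdict(expensive_default, {'present': [1]})
dd['present']   # key exists: the factory is not called
print(len(calls))
# 1
dd['absent']    # key missing: the factory is called once
print(len(calls))
# 2
```

With defaultdict the factory runs only on a miss, which is why it is the usual suggestion when the default is expensive to build.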
The logic of dict.get is:
if key in a_dict:
    value = a_dict[key]
else:
    value = default_value
Take an example:
In [72]: a_dict = {'mapping':['dict', 'OrderedDict'], 'array':['list', 'tuple']}
In [73]: a_dict.get('string', ['str', 'bytes'])
Out[73]: ['str', 'bytes']
In [74]: a_dict.get('array', ['str', 'bytes'])
Out[74]: ['list', 'tuple']
The mechanism of setdefault is:
levels = ['master', 'manager', 'salesman', 'accountant', 'assistant']
#group them by the leading letter
group_by_leading_letter = {}
# the logic expressed by obvious if condition
for level in levels:
    leading_letter = level[0]
    if leading_letter not in group_by_leading_letter:
        group_by_leading_letter[leading_letter] = [level]
    else:
        group_by_leading_letter[leading_letter].append(level)
In [80]: group_by_leading_letter
Out[80]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}
The setdefault dict method is for precisely this purpose. The preceding for loop can be rewritten as:
In [87]: for level in levels:
    ...:     leading = level[0]
    ...:     group_by_leading_letter.setdefault(leading,[]).append(level)
Out[80]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}
It's very simple: setdefault returns the list stored under the key, whether that is an existing non-empty list or a newly inserted empty one, and the element is appended to it.
defaultdict makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dict:
from collections import defaultdict
group_by_leading_letter = defaultdict(list)
for level in levels:
    group_by_leading_letter[level[0]].append(level)
There is no strict answer to this question. They both accomplish the same purpose. They can both be used to deal with missing values on keys. The only difference that I have found is that with setdefault(), the key that you invoke (if not previously in the dictionary) gets automatically inserted while it does not happen with get(). Here is an example:
Setdefault()
>>> myDict = {'A': 'GOD', 'B':'Is', 'C':'GOOD'} #(1)
>>> myDict.setdefault('C') #(2)
'GOOD'
>>> myDict.setdefault('C','GREAT') #(3)
'GOOD'
>>> myDict.setdefault('D','AWESOME') #(4)
'AWESOME'
>>> myDict #(5)
{'A': 'GOD', 'B': 'Is', 'C': 'GOOD', 'D': 'AWESOME'}
>>> myDict.setdefault('E')
>>>
Get()
>>> myDict = {'a': 1, 'b': 2, 'c': 3} #(1)
>>> myDict.get('a',0) #(2)
1
>>> myDict.get('d',0) #(3)
0
>>> myDict #(4)
{'a': 1, 'b': 2, 'c': 3}
Here is my conclusion: there is no single answer as to which one is best when it comes to supplying default values. The only difference is that setdefault() automatically adds any new key with a default value to the dictionary, while get() does not. For more information, please go here!
In [1]: person_dict = {}
In [2]: person_dict['liqi'] = 'LiQi'
In [3]: person_dict.setdefault('liqi', 'Liqi')
Out[3]: 'LiQi'
In [4]: person_dict.setdefault('Kim', 'kim')
Out[4]: 'kim'
In [5]: person_dict
Out[5]: {'Kim': 'kim', 'liqi': 'LiQi'}
In [8]: person_dict.get('Dim', '')
Out[8]: ''
In [5]: person_dict
Out[5]: {'Kim': 'kim', 'liqi': 'LiQi'}