Python update a key in dict if it doesn't exist - python

I want to insert a key-value pair into dict if key not in dict.keys().
Basically I could do it with:
if key not in d.keys():
d[key] = value
But is there a better way? Or what's the pythonic solution to this problem?

You do not need to call d.keys(), so
if key not in d:
d[key] = value
is enough. There is no clearer, more readable method.
You could update again with dict.get(), which would return an existing value if the key is already present:
d[key] = d.get(key, value)
but I strongly recommend against this; this is code golfing, hindering maintenance and readability.

Use dict.setdefault():
>>> d = {'key1': 'one'}
>>> d.setdefault('key1', 'some-unused-value')
'one'
>>> d # d has not changed because the key already existed
{'key1': 'one'}
>>> d.setdefault('key2', 'two')
'two'
>>> d
{'key1': 'one', 'key2': 'two'}

Since Python 3.9 you can use the merge operator | to merge two dictionaries. The dict on the right takes precedence:
new_dict = old_dict | { key: val }
For example:
new_dict = { 'a': 1, 'b': 2 } | { 'b': 42 }
print(new_dict) # {'a': 1, 'b': 42}
Note: this creates a new dictionary with the updated values.

With the following you can insert multiple values and also have default values but you're creating a new dictionary.
d = {**{ key: value }, **default_values}
I've tested it with the most voted answer and on average this is faster as it can be seen in the following example, .
Speed test comparing a for loop based method with a dict comprehension with unpack operator method.
if no copy (d = default_vals.copy()) is made on the first case then the most voted answer would be faster once we reach orders of magnitude of 10**5 and greater. Memory footprint of both methods are the same.

You can also use this solution in only one line of code:
dict[dict_key] = dict.get(dict_key,value)
The second argument of dict.get is the value you want to assign to the key in case the key does not exist. Since this evaluates before the assignment to dict[dict_key] = , we can be sure that they key will exist when we try to access it.

Related

Match two dict and change keys in first if keys exists in second

I have two dict:
a={'a':'A','b':'B'}
b={'a':123,'b':123}
I need check if keys 'a' and 'b' (two elements in example, in real code, it will be more) in dict b, exist in dict a. If so, I should change the keys in dict b using values from dict a:
Expected result:
b={'A':123, 'B': 123}
How I can do it?
{a[k] if k in a else k: v for k, v in b.items()}
This is how it's done:
a={'a':'A','b':'B'}
b={'a':123,'b':123}
c = {}
for key in a.keys():
if key in b.keys():
c.update({a[key]:b[key]})
The other answers so far ignore the question which wants the code to:
change keys in dict in b for values from dict a
I infer that any data in b, for which there isn't a replacement key in a, should be left alone. So walking the keys of a creating a new dictionary c won't work. We need to modify b directly. A fun way to do this is via the pop() method which we normally associate with lists but also works on dictionaries:
a = {'a': 'A', 'b': 'B'}
b = {'a': 123, 'b': 124, 'C': 125}
for key in list(b): # need a *copy* of old keys in b
if key in a:
b[a[key]] = b.pop(key) # copy data to new key, remove old key
print(b)
OUTPUT
> python3 test.py
{'C': 125, 'A': 123, 'B': 124}
>

Updating a dictionary

I have created three dictionaries-dict1, dict2, and dict2. I want to update dict1 with dict2 first, and resulting dictionary with dict3. I am not sure why they are not adding up.
def wordcount_directory(directory):
dict = {}
filelist=[os.path.join(directory,f) for f in os.listdir(directory)]
dicts=[wordcount_file(file) for file in filelist]
dict1=dicts[0]
dict2=dicts[1]
dict3=dicts[2]
for k,v in dict1.iteritems():
if k in dict2.keys():
dict1[k]+=1
else:
dict1[k]=v
for k1,v1 in dict1.iteritems():
if k1 in dict3.keys():
dict1[k1]+=1
else:
dict1[k1]=v1
return dict1
print wordcount_directory("C:\\Users\\Phil2040\\Desktop\\Word_count")
Maybe I am not understanding you question right, but are you trying to add all the values from each of the dictionaries together into one final dictionary? If so:
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'b': 5, 'c': 1, 'd': 9}
dict3 = {'d': 1, 'e': 7}
def add_dict(to_dict, from_dict):
for key, value in from_dict.iteritems():
to_dict[key] = to_dict.get(key, 0) + value
result = dict(dict1)
add_dict(result, dict2)
add_dict(result, dict3)
print result
This yields: {'a': 1, 'c': 4, 'b': 7, 'e': 7, 'd': 10}
It would be really helpful to post what the expected outcome should be for your question.
EDIT:
For an arbitrary amount of dictionaries:
result = dict(dicts[0])
for dict_sum in dicts[1:]:
add_dict(result, dict_sum)
print(result)
If you really want to fix the code from your original question in the format it is in:
You are using dict1[k]+=1 when you should be performing dict1[k]+=dict2.get(k, 0).
The introduction of get removes the need to check for its existence with an if statement.
You need to iterate though dict2 and dict3 to introduce new keys from them into dict1
(not really a problem, but worth mentioning) In the if statement to check if the key is in the dictionary, it is recommended to simply the operation to if k in dict2: (see this post for more details)
With the amazing built-in library found by #DisplacedAussie, the answer can be simplified even further:
from collections import Counter
print(Counter(dict1) + Counter(dict2) + Counter(dict3))
The result yields: Counter({'d': 10, 'b': 7, 'e': 7, 'c': 4, 'a': 1})
The Counter object is a sub-class of dict, so it can be used in the same way as a standard dict.
Hmmm, here a simple function that might help:
def dictsum(dict1, dict2):
'''Modify dict1 to accumulate new sums from dict2
'''
k1 = set(dict1.keys())
k2 = set(dict2.keys())
for i in k1 & k2:
dict1[i] += dict2[i]
for i in k2 - k1:
dict1[i] = dict2[i]
return None
... for the intersection update each by adding the second value to the existing one; then for the difference add those key/value pairs.
With that defined you'd simple call:
dictsum(dict1, dict2)
dictsum(dict1, dict3)
... and be happy.
(I will note that functions modify the contents of dictionaries in this fashion are not all that common. I'm returning None explicitly to follow the convention established by the list.sort() method ... functions which modify the contents of a container, in Python, do not normally return copies of the container).
If I understand your question correctly, you are iterating on the wrong dictionary. You want to iterate over dict2 and update dict1 with matching keys or add non-matching keys to dict1.
If so, here's how you need to update the for loops:
for k,v in dict2.iteritems(): # Iterate over dict2
if k in dict1.keys():
dict1[k]+=1 # Update dict1 for matching keys
else:
dict1[k]=v # Add non-matching keys to dict1
for k1,v1 in dict3.iteritems(): # Iterate over dict3
if k1 in dict1.keys():
dict1[k1]+=1 # Update dict1 for matching keys
else:
dict1[k1]=v1 # Add non-matching keys to dict1
I assume that wordcount_file(file) returns a dict of the words found in file, with each key being a word and the associated value being the count for that word. If so, your updating algorithm is wrong. You should do something like this:
keys1 = dict1.keys()
for k,v in dict2.iteritems():
if k in keys1:
dict1[k] += v
else:
dict1[k] = v
If there's a lot of data in these dicts you can make the key lookup faster by storing the keys in a set:
keys1 = set(dict1.keys())
You should probably put that code into a function, so you don't need to duplicate the code when you want to update dict1 with the data in dict3.
You should take a look at collections.Counter, a subclass of dict that supports counting; using Counters would simplify this task considerably. But if this is an assignment (or you're using Python 2.6 or older) you may not be able to use Counters.

Python dictionary replace values

I have a dictionary with 20 000 plus entries with at the moment simply the unique word and the number of times the word was used in the source text (Dante's Divine Comedy in Italian).
I would like to work through all entries replacing the value with an actual definition as I find them. Is there a simple way to iterate through the keywords that have as a value a number in order to replace (as I research the meaning)?
The dictionary starts:
{'corse': 378, 'cielo,': 209, 'mute;': 16, 'torre,': 11, 'corsa': 53, 'assessin': 21, 'corso': 417, 'Tolomea': 21} # etc.
Sort of an application that will suggest a keyword to research and define.
via dict.update() function
In case you need a declarative solution, you can use dict.update() to change values in a dict.
Either like this:
my_dict.update({'key1': 'value1', 'key2': 'value2'})
or like this:
my_dict.update(key1='value1', key2='value2')
via dictionary unpacking
Since Python 3.5 you can also use dictionary unpacking for this:
my_dict = { **my_dict, 'key1': 'value1', 'key2': 'value2'}
Note: This creates a new dictionary.
via merge operator or update operator
Since Python 3.9 you can also use the merge operator on dictionaries:
my_dict = my_dict | {'key1': 'value1', 'key2': 'value2'}
Note: This creates a new dictionary.
Or you can use the update operator:
my_dict |= {'key1': 'value1', 'key2': 'value2'}
You cannot select on specific values (or types of values). You'd either make a reverse index (map numbers back to (lists of) keys) or you have to loop through all values every time.
If you are processing numbers in arbitrary order anyway, you may as well loop through all items:
for key, value in inputdict.items():
# do something with value
inputdict[key] = newvalue
otherwise I'd go with the reverse index:
from collections import defaultdict
reverse = defaultdict(list)
for key, value in inputdict.items():
reverse[value].append(key)
Now you can look up keys by value:
for key in reverse[value]:
inputdict[key] = newvalue
If you iterate over a dictionary you get the keys, so assuming your dictionary is in a variable called data and you have some function find_definition() which gets the definition, you can do something like the following:
for word in data:
data[word] = find_definition(word)
I think this may help you solve your issue.
Imagine you have a dictionary like this:
dic0 = {0:"CL1", 1:"CL2", 2:"CL3"}
And you want to change values by this one:
dic0to1 = {"CL1":"Unknown1", "CL2":"Unknown2", "CL3":"Unknown3"}
You can use code bellow to change values of dic0 properly respected to dic0to1 without worrying yourself about indexes in dictionary:
for x, y in dic0.items():
dic0[x] = dic0to1[y]
Now you have:
>>> dic0
{0: 'Unknown1', 1: 'Unknown2', 2: 'Unknown3'}
Just had to do something similar. My approach for sanitizing data for python based on Sadra Sabouri's answer:
def sanitize(value):
if str(value) == 'false':
return False
elif str(value) == 'true':
return True
elif str(value) == 'null':
return None
return value
for k,v in some_dict.items():
some_dict[k] = sanitize(v)
data = {key1: value1, key2: value2, key3: value3}
for key in data:
if key == key1:
data[key1] = change
print(data)
this will replace key1: value1 to key1: change

python dict: get vs setdefault

The following two expressions seem equivalent to me. Which one is preferable?
data = [('a', 1), ('b', 1), ('b', 2)]
d1 = {}
d2 = {}
for key, val in data:
# variant 1)
d1[key] = d1.get(key, []) + [val]
# variant 2)
d2.setdefault(key, []).append(val)
The results are the same but which version is better or rather more pythonic?
Personally I find version 2 harder to understand, as to me setdefault is very tricky to grasp. If I understand correctly, it looks for the value of "key" in the dictionary, if not available, enters "[]" into the dict, returns a reference to either the value or "[]" and appends "val" to that reference. While certainly smooth it is not intuitive in the least (at least to me).
To my mind, version 1 is easier to understand (if available, get the value for "key", if not, get "[]", then join with a list made up from [val] and place the result in "key"). But while more intuitive to understand, I fear this version is less performant, with all this list creating. Another disadvantage is that "d1" occurs twice in the expression which is rather error-prone. Probably there is a better implementation using get, but presently it eludes me.
My guess is that version 2, although more difficult to grasp for the inexperienced, is faster and therefore preferable. Opinions?
Your two examples do the same thing, but that doesn't mean get and setdefault do.
The difference between the two is basically manually setting d[key] to point to the list every time, versus setdefault automatically setting d[key] to the list only when it's unset.
Making the two methods as similar as possible, I ran
from timeit import timeit
print timeit("c = d.get(0, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("c = d.get(1, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(0, []).extend([1])", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(1, []).extend([1])", "d = {1: []}", number = 1000000)
and got
0.794723378711
0.811882272256
0.724429205999
0.722129751973
So setdefault is around 10% faster than get for this purpose.
The get method allows you to do less than you can with setdefault. You can use it to avoid getting a KeyError when the key doesn't exist (if that's something that's going to happen frequently) even if you don't want to set the key.
See Use cases for the 'setdefault' dict method and dict.get() method returns a pointer for some more info about the two methods.
The thread about setdefault concludes that most of the time, you want to use a defaultdict. The thread about get concludes that it is slow, and often you're better off (speed wise) doing a double lookup, using a defaultdict, or handling the error (depending on the size of the dictionary and your use case).
The accepted answer from agf isn't comparing like with like. After:
print timeit("d[0] = d.get(0, []) + [1]", "d = {1: []}", number = 10000)
d[0] contains a list with 10,000 items whereas after:
print timeit("d.setdefault(0, []) + [1]", "d = {1: []}", number = 10000)
d[0] is simply []. i.e. the d.setdefault version never modifies the list stored in d. The code should actually be:
print timeit("d.setdefault(0, []).append(1)", "d = {1: []}", number = 10000)
and in fact is faster than the faulty setdefault example.
The difference here really is because of when you append using concatenation the whole list is copied every time (and once you have 10,000 elements that is beginning to become measurable. Using append the list updates are amortised O(1), i.e. effectively constant time.
Finally, there are two other options not considered in the original question: defaultdict or simply testing the dictionary to see whether it already contains the key.
So, assuming d3, d4 = defaultdict(list), {}
# variant 1 (0.39)
d1[key] = d1.get(key, []) + [val]
# variant 2 (0.003)
d2.setdefault(key, []).append(val)
# variant 3 (0.0017)
d3[key].append(val)
# variant 4 (0.002)
if key in d4:
d4[key].append(val)
else:
d4[key] = [val]
variant 1 is by far the slowest because it copies the list every time, variant 2 is the second slowest, variant 3 is the fastest but won't work if you need Python older than 2.5, and variant 4 is just slightly slower than variant 3.
I would say use variant 3 if you can, with variant 4 as an option for those occasional places where defaultdict isn't an exact fit. Avoid both of your original variants.
For those who are still struggling in understanding these two term, let me tell you basic difference between get() and setdefault() method -
Scenario-1
root = {}
root.setdefault('A', [])
print(root)
Scenario-2
root = {}
root.get('A', [])
print(root)
In Scenario-1 output will be {'A': []} while in Scenario-2 {}
So setdefault() sets absent keys in the dict while get() only provides you default value but it does not modify the dictionary.
Now let come where this will be useful-
Suppose you are searching an element in a dict whose value is a list and you want to modify that list if found otherwise create a new key with that list.
using setdefault()
def fn1(dic, key, lst):
dic.setdefault(key, []).extend(lst)
using get()
def fn2(dic, key, lst):
dic[key] = dic.get(key, []) + (lst) #Explicit assigning happening here
Now lets examine timings -
dic = {}
%%timeit -n 10000 -r 4
fn1(dic, 'A', [1,2,3])
Took 288 ns
dic = {}
%%timeit -n 10000 -r 4
fn2(dic, 'A', [1,2,3])
Took 128 s
So there is a very large timing difference between these two approaches.
You might want to look at defaultdict in the collections module. The following is equivalent to your examples.
from collections import defaultdict
data = [('a', 1), ('b', 1), ('b', 2)]
d = defaultdict(list)
for k, v in data:
d[k].append(v)
There's more here.
1. Explained with a good example here:
http://code.activestate.com/recipes/66516-add-an-entry-to-a-dictionary-unless-the-entry-is-a/
dict.setdefault typical usage
somedict.setdefault(somekey,[]).append(somevalue)
dict.get typical usage
theIndex[word] = 1 + theIndex.get(word,0)
2. More explanation : http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
dict.setdefault() is equivalent to get or set & get. Or set if necessary then get. It's especially efficient if your dictionary key is expensive to compute or long to type.
The only problem with dict.setdefault() is that the default value is always evaluated, whether needed or not. That only matters if the default value is expensive to compute. In that case, use defaultdict.
3. Finally the official docs with difference highlighted http://docs.python.org/2/library/stdtypes.html
get(key[, default])
Return the value for key if key is in the dictionary, else default. If
default is not given, it defaults to None, so that this method never
raises a KeyError.
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
The logic of dict.get is:
if key in a_dict:
value = a_dict[key]
else:
value = default_value
Take an example:
In [72]: a_dict = {'mapping':['dict', 'OrderedDict'], 'array':['list', 'tuple']}
In [73]: a_dict.get('string', ['str', 'bytes'])
Out[73]: ['str', 'bytes']
In [74]: a_dict.get('array', ['str', 'byets'])
Out[74]: ['list', 'tuple']
The mechamism of setdefault is:
levels = ['master', 'manager', 'salesman', 'accountant', 'assistant']
#group them by the leading letter
group_by_leading_letter = {}
# the logic expressed by obvious if condition
for level in levels:
leading_letter = level[0]
if leading_letter not in group_by_leading_letter:
group_by_leading_letter[leading_letter] = [level]
else:
group_by_leading_letter[leading_letter].append(word)
In [80]: group_by_leading_letter
Out[80]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}
The setdefault dict method is for precisely this purpose. The preceding for loop can be rewritten as:
In [87]: for level in levels:
...: leading = level[0]
...: group_by_leading_letter.setdefault(leading,[]).append(level)
Out[80]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}
It's very simple, means that either a non-null list append an element or a null list append an element.
The defaultdict, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dict:
from collections import defualtdict
group_by_leading_letter = defaultdict(list)
for level in levels:
group_by_leading_letter[level[0]].append(level)
There is no strict answer to this question. They both accomplish the same purpose. They can both be used to deal with missing values on keys. The only difference that I have found is that with setdefault(), the key that you invoke (if not previously in the dictionary) gets automatically inserted while it does not happen with get(). Here is an example:
Setdefault()
>>> myDict = {'A': 'GOD', 'B':'Is', 'C':'GOOD'} #(1)
>>> myDict.setdefault('C') #(2)
'GOOD'
>>> myDict.setdefault('C','GREAT') #(3)
'GOOD'
>>> myDict.setdefault('D','AWESOME') #(4)
'AWESOME'
>>> myDict #(5)
{'A': 'GOD', 'B': 'Is', 'C': 'GOOD', 'D': 'AWSOME'}
>>> myDict.setdefault('E')
>>>
Get()
>>> myDict = {'a': 1, 'b': 2, 'c': 3} #(1)
>>> myDict.get('a',0) #(2)
1
>>> myDict.get('d',0) #(3)
0
>>> myDict #(4)
{'a': 1, 'b': 2, 'c': 3}
Here is my conclusion: there is no specific answer to which one is best specifically when it comes to default values imputation. The only difference is that setdefault() automatically adds any new key with a default value in the dictionary while get() does not. For more information, please go here !
In [1]: person_dict = {}
In [2]: person_dict['liqi'] = 'LiQi'
In [3]: person_dict.setdefault('liqi', 'Liqi')
Out[3]: 'LiQi'
In [4]: person_dict.setdefault('Kim', 'kim')
Out[4]: 'kim'
In [5]: person_dict
Out[5]: {'Kim': 'kim', 'liqi': 'LiQi'}
In [8]: person_dict.get('Dim', '')
Out[8]: ''
In [5]: person_dict
Out[5]: {'Kim': 'kim', 'liqi': 'LiQi'}

Remove an item from a dictionary when its key is unknown

What is the best way to remove an item from a dictionary by value, i.e. when the item's key is unknown? Here's a simple approach:
for key, item in some_dict.items():
if item is item_to_remove:
del some_dict[key]
Are there better ways? Is there anything wrong with mutating (deleting items) from the dictionary while iterating it?
The dict.pop(key[, default]) method allows you to remove items when you know the key. It returns the value at the key if it removes the item otherwise it returns what is passed as default. See the docs.'
Example:
>>> dic = {'a':1, 'b':2}
>>> dic
{'a': 1, 'b': 2}
>>> dic.pop('c', 0)
0
>>> dic.pop('a', 0)
1
>>> dic
{'b': 2}
Be aware that you're currently testing for object identity (is only returns True if both operands are represented by the same object in memory - this is not always the case with two object that compare equal with ==). If you are doing this on purpose, then you could rewrite your code as
some_dict = {key: value for key, value in some_dict.items()
if value is not value_to_remove}
But this may not do what you want:
>>> some_dict = {1: "Hello", 2: "Goodbye", 3: "You say yes", 4: "I say no"}
>>> value_to_remove = "You say yes"
>>> some_dict = {key: value for key, value in some_dict.items() if value is not value_to_remove}
>>> some_dict
{1: 'Hello', 2: 'Goodbye', 3: 'You say yes', 4: 'I say no'}
>>> some_dict = {key: value for key, value in some_dict.items() if value != value_to_remove}
>>> some_dict
{1: 'Hello', 2: 'Goodbye', 4: 'I say no'}
So you probably want != instead of is not.
a = {'name': 'your_name','class': 4}
if 'name' in a: del a['name']
A simple comparison between del and pop():
import timeit
code = """
results = {'A': 1, 'B': 2, 'C': 3}
del results['A']
del results['B']
"""
print timeit.timeit(code, number=100000)
code = """
results = {'A': 1, 'B': 2, 'C': 3}
results.pop('A')
results.pop('B')
"""
print timeit.timeit(code, number=100000)
result:
0.0329667857143
0.0451040902256
So, del is faster than pop().
I'd build a list of keys that need removing, then remove them. It's simple, efficient and avoids any problem about simultaneously iterating over and mutating the dict.
keys_to_remove = [key for key, value in some_dict.iteritems()
if value == value_to_remove]
for key in keys_to_remove:
del some_dict[key]
items() returns a list, and it is that list you are iterating, so mutating the dict in the loop doesn't matter here. If you were using iteritems() instead, mutating the dict in the loop would be problematic, and likewise for viewitems() in Python 2.7.
I can't think of a better way to remove items from a dict by value.
c is the new dictionary, and a is your original dictionary, {'z','w'}
are the keys you want to remove from a
c = {key:a[key] for key in a.keys() - {'z', 'w'}}
Also check: https://www.safaribooksonline.com/library/view/python-cookbook-3rd/9781449357337/ch01.html
y={'username':'admin','machine':['a','b','c']}
if 'c' in y['machine'] : del y['machine'][y['machine'].index('c')]
There is nothing wrong with deleting items from the dictionary while iterating, as you've proposed. Be careful about multiple threads using the same dictionary at the same time, which may result in a KeyError or other problems.
Of course, see the docs at http://docs.python.org/library/stdtypes.html#typesmapping
This is how I would do it.
for key in some_dict.keys():
if some_dict[key] == item_to_remove:
some_dict.pop(key)
break

Categories

Resources