Say I want to write a function that returns an arbitrary value from a nested dict, such as mydict['foo']['bar']['baz'], or an empty string if that value doesn't exist. However, I don't know whether mydict['foo'] will necessarily exist, let alone mydict['foo']['bar']['baz'].
I'd like to do something like:
def safe_nested(d, element):
    try:
        return d[element]
    except KeyError:
        return ''
But I don't know how to approach writing code that will accept the lookup path in the function. I started going down the route of accepting a period-separated string (like foo.bar.baz) so this function could recursively try to get the next sub-dict, but this didn't feel very Pythonic. I'm wondering if there's a way to pass in both the dict (mydict) and the sub-structure I'm interested in (['foo']['bar']['baz']), and have the function try to access this or return an empty string if it encounters a KeyError.
Am I going about this in the right way?
You should use the standard defaultdict: https://docs.python.org/2/library/collections.html#collections.defaultdict
For how to nest them, see: defaultdict of defaultdict, nested or Multiple levels of 'collection.defaultdict' in Python
I think this does what you want:
from collections import defaultdict
mydict = defaultdict(lambda: defaultdict(lambda: defaultdict(str)))
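One thing worth knowing before relying on this: a lookup on a defaultdict creates the missing entries as a side effect. A minimal sketch of that behaviour, using the same three-level structure as above:

```python
from collections import defaultdict

mydict = defaultdict(lambda: defaultdict(lambda: defaultdict(str)))

# Reading a missing path returns the innermost default, an empty string...
value = mydict['foo']['bar']['baz']
print(repr(value))

# ...but the read itself has inserted 'foo', 'bar' and 'baz' along the way.
print('foo' in mydict)
print('baz' in mydict['foo']['bar'])
```

If the dict must stay unchanged when a key is missing, a plain dict with explicit error handling may be a better fit.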
You might also want to check out addict.
>>> from addict import Dict
>>> addicted = Dict()
>>> addicted.a = 2
>>> addicted.b.c.d.e
{}
>>> addicted
{'a': 2, 'b': {'c': {'d': {'e': {}}}}}
It returns an empty Dict, not an empty string, but apart from that it looks like it does what you ask for in the question.
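For the original question of passing in the lookup path, one option (a sketch, not the only way) is to accept the keys as separate positional arguments and walk down one level at a time. The name safe_nested comes from the question; the *keys and default parameters are assumptions added for illustration.

```python
def safe_nested(mapping, *keys, default=''):
    """Follow keys one level at a time; return default if any step fails."""
    current = mapping
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key; IndexError: list index out of range;
            # TypeError: the current value cannot be indexed with this key.
            return default
    return current


mydict = {'foo': {'bar': {'baz': 42}}}
print(safe_nested(mydict, 'foo', 'bar', 'baz'))         # found
print(repr(safe_nested(mydict, 'foo', 'nope', 'baz')))  # missing
```

Unlike the defaultdict approach, this leaves mydict untouched when a key is missing.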
I am setting up some values in a nested JSON. In the JSON, it is not necessary that the keys would always be present.
My sample code looks like below.
if 'key' not in data:
    data['key'] = {}
if 'nested_key' not in data['key']:
    data['key']['nested_key'] = some_value
Is there a more elegant way to achieve this? Simply assigning the value without the ifs, as in data['key']['nested_key'] = some_value, can sometimes throw a KeyError.
I looked at several similar questions about "getting nested JSON" on StackOverflow, but none fulfilled my requirement, so I have added a new question. If this is a duplicate, I'll remove it once pointed towards the right question.
Thanks
Please note that for a plain top-level insertion you need not check for the key; you can assign it directly. For nested keys, defaultdict can be used. It is particularly helpful in case of values like lists.
from collections import defaultdict
data = defaultdict(dict)
data['key']['nested_key'] = some_value
defaultdict will ensure that you never get a KeyError at the level it covers (here, the first level). If the key doesn't exist, it creates, stores and returns an empty object of the type with which you have initialized it.
List based example:
from collections import defaultdict
data = defaultdict(list)
data['key'].append(1)
Without defaultdict, this would have to be done like below:
data = {}
if 'key' not in data:
    data['key'] = [1]
else:
    data['key'].append(1)
Example based on existing dict:
from collections import defaultdict
data = {'key1': 'sample'}
data_new = defaultdict(dict,data)
data_new['key']['something'] = 'nothing'
print(data_new)
Output:
defaultdict(<class 'dict'>, {'key1': 'sample', 'key': {'something': 'nothing'}})
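Note that defaultdict(dict) only covers one level of nesting. If the depth is not known in advance, a recursive factory is one possible approach. The helper name tree below is hypothetical, and this is only a sketch:

```python
from collections import defaultdict


def tree():
    # Hypothetical helper: every missing key yields another tree().
    return defaultdict(tree)


data = tree()
data['key']['nested_key']['deeper'] = 'some_value'
print(data['key']['nested_key']['deeper'])
```

The same caveat applies as with any defaultdict: merely reading a missing key inserts it.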
You can write in one statement:
data.setdefault('key', {})['nested_key'] = some_value
but I am not sure it looks more elegant.
PS: if you prefer to use defaultdict as proposed by Jay, you can initialize the new dict with the original one returned by json.loads(), then pass it to json.dumps():
data2 = defaultdict(dict, data)
data2['key']['nested_key'] = some_value
json.dumps(data2)  # serializes like a plain dict
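A self-contained round trip might look like the sketch below. The input string raw and the value 'some_value' are made up for illustration; json.dumps accepts a defaultdict because it is a dict subclass.

```python
import json
from collections import defaultdict

raw = '{"key1": "sample"}'  # hypothetical input document

# Wrap the parsed dict so that missing top-level keys become empty dicts.
data = defaultdict(dict, json.loads(raw))
data['key']['nested_key'] = 'some_value'

encoded = json.dumps(data)
print(encoded)
```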
I often deal with heterogeneous datasets and I acquire them as dictionaries in my python routines. I usually face the problem that the key of the next entry I am going to add to the dictionary already exists.
I was wondering if there exists a more "pythonic" way to do the following task: check whether the key exists and create/update the corresponding pair key-item of my dictionary
myDict = dict()
for line in myDatasetFile:
    if int(line[-1]) in myDict.keys():
        myDict[int(line[-1])].append([line[2],float(line[3])])
    else:
        myDict[int(line[-1])] = [[line[2],float(line[3])]]
Use a defaultdict.
from collections import defaultdict
d = defaultdict(list)
# Every time you try to access the value of a key that isn't in the dict yet,
# d will call list with no arguments (producing an empty list),
# store the result as the new value, and give you that.
for line in myDatasetFile:
    d[int(line[-1])].append([line[2],float(line[3])])
Also, never use thing in d.keys(). In Python 2, that will create a list of keys and iterate through it one item at a time to find the key instead of using a hash-based lookup. In Python 3, it's not quite as horrible, but it's still redundant and still slower than the right way, which is thing in d.
That's what dict.setdefault is for.
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
example :
>>> d={}
>>> d.setdefault('a',[]).append([1,2])
>>> d
{'a': [[1, 2]]}
Python follows the idea that it's easier to ask for forgiveness than permission (EAFP), so a Pythonic way would be:
try:
    myDict[int(line[-1])].append([line[2],float(line[3])])
except KeyError:
    myDict[int(line[-1])] = [[line[2],float(line[3])]]
for reference:
https://docs.python.org/2/glossary.html#term-eafp
https://stackoverflow.com/questions/6092992/why-is-it-easier-to-ask-forgiveness-than-permission-in-python-but-not-in-java
Try to catch the Exception when you get a KeyError
myDict = dict()
for line in myDatasetFile:
    try:
        myDict[int(line[-1])].append([line[2],float(line[3])])
    except KeyError:
        myDict[int(line[-1])] = [[line[2],float(line[3])]]
Or use:
myDict = dict()
for line in myDatasetFile:
    myDict.setdefault(int(line[-1]),[]).append([line[2],float(line[3])])
I am stuck on something really simple, but I cannot find the right way. I have a dictionary like this one:
mydic= {'a': {'mylist':[..,..,..]}, 'b': {'mylist':[..,..,..]}}
I am trying to iterate through mylist and create a new subdictionary with the results of a function.
for i in mydic:
    for j in mydic[i]['mylist']:
        mydic[i]['thenewkey'][j] = myfunction(j)
The final dictionary would be like this:
mydic= {'a': {'mylist':[..,..,..], 'thenewkey':{'..':'..', '..':'..'}}, 'b': {'mylist':[..,..,..],'thenewkey':{'..':'..', '..':'..'}}}
But when I run the code, I have a key error on thenewkey. Any idea?
You first have to create mydic[i]['thenewkey'] before assigning to mydic[i]['thenewkey'][j].
As already said, you have to create the thenewkey entry before you can add items to it, i.e. you'd have to add mydic[i]['thenewkey'] = {} to your outer loop. You could use a defaultdict(dict) to make Python automatically insert missing entries, but since you have both list and dict entries, this does not seem like a good idea.
That said, using a dictionary comprehension makes it a bit shorter and IMHO much more readable:
for i in mydic:
    mydic[i]['thenewkey'] = {j: myfunction(j) for j in mydic[i]['mylist']}
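Here is a runnable version of that idea, as a sketch. The body of myfunction and the list contents are placeholders, since the question elides both:

```python
def myfunction(item):
    # Hypothetical stand-in for the real function from the question.
    return item * 2


mydic = {'a': {'mylist': [1, 2, 3]}, 'b': {'mylist': [4, 5]}}

# Build each sub-dictionary in one step, so 'thenewkey' never has to
# exist before it is assigned.
for sub in mydic.values():
    sub['thenewkey'] = {j: myfunction(j) for j in sub['mylist']}

print(mydic['a']['thenewkey'])
print(mydic['b']['thenewkey'])
```

Note that the list items become dict keys, so they need to be hashable.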
What is the most pythonic way to set a value in a dict if the value is not already set?
At the moment my code uses if statements:
if "timeout" not in connection_settings:
connection_settings["timeout"] = compute_default_timeout(connection_settings)
dict.get(key, default) is appropriate for code consuming a dict, not for code that is preparing a dict to be passed to another function. You can use it to set something, but it's no prettier IMO:
connection_settings["timeout"] = connection_settings.get(
    "timeout", compute_default_timeout(connection_settings))
This would evaluate the compute function even if the dict already contained the key, which is a bug.
defaultdict is for when the default value is the same for every key.
Of course there are many times you set primitive values that don't need computing as defaults, and those can of course use dict.setdefault. But how about the more complex cases?
dict.setdefault will precisely "set a value in a dict only if the value is not already set".
You still need to compute the value to pass it in as the parameter:
connection_settings.setdefault("timeout", compute_default_timeout(connection_settings))
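To make the cost of that eager evaluation concrete, here is a small sketch. The body of compute_default_timeout is a made-up stand-in that just records each call:

```python
calls = []


def compute_default_timeout(settings):
    # Hypothetical stand-in: records that it was called, returns a fixed value.
    calls.append(1)
    return 30


connection_settings = {"timeout": 5}

# setdefault: the argument is evaluated before the method runs, so the
# function is called even though "timeout" is already present.
connection_settings.setdefault("timeout", compute_default_timeout(connection_settings))
calls_after_setdefault = len(calls)

# if statement: the function is only called when the key is missing.
if "timeout" not in connection_settings:
    connection_settings["timeout"] = compute_default_timeout(connection_settings)
calls_after_if = len(calls)

print(calls_after_setdefault, calls_after_if, connection_settings["timeout"])
```

Both variants leave the existing value alone; only the if statement avoids the unnecessary call.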
This is a bit of a non-answer, but I would say the most pythonic is the if statement as you have it. You resisted the urge to one-liner it with __setitem__ or other methods. You've avoided possible bugs in the logic due to existing-but-falsey values which might happen when trying to be clever with short-circuiting and/or hacks. It's immediately obvious that the compute function isn't used when it wasn't necessary.
It's clear, concise, and readable - pythonic.
One way to do this is:
if key not in d:
    d[key] = value
Since Python 3.9 you can use the merge operator | to merge two dictionaries. The dict on the right takes precedence:
d = { key: value } | d
Note: this creates a new dictionary with the updated values.
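A short sketch of how this behaves (requires Python 3.9 or later; the keys and values are made up for illustration):

```python
original = {'port': 8080}

# Defaults go on the left; existing values on the right take precedence.
merged = {'timeout': 120, 'port': 8888} | original

print(merged)
print(merged is original)  # a new dict is created
print(original)            # the original is left untouched
```

Since a new dict is created, any other references to the original dict will not see the defaults.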
You probably need dict.setdefault:
Create a new dictionary and set a value:
>>> d = {}
>>> d.setdefault('timeout', 120)
120
>>> d
{'timeout': 120}
If a value is already set, dict.setdefault won't override it:
>>> d['port']=8080
>>> d.setdefault('port', 8888)
8080
>>> d
{'timeout': 120, 'port': 8080}
I'm using the following to modify kwargs to non-default values and pass to another function:
def f(**non_default_kwargs):
    kwargs = {
        'a': 1,
        'b': 2,
    }
    kwargs.update(non_default_kwargs)
    f2(**kwargs)
This has the merits that you don't have to type the keys twice and that everything is done in a single function.
The answer by @Rotareti makes me wonder if, for versions of Python older than 3.9, we can do:
>>> dict_a = {'a': 1 }
>>> dict_a = {'a': 3, 'b': 2, **dict_a}
>>> dict_a
{'a': 1, 'b': 2}
(Well, it works for sure on Python3.7, but is this Pythonesque enough?)
I found it convenient and obvious to exploit the return of the dict .get() method being None (Falsy), along with or to put off evaluation of an expensive network request if the key was not present.
d = dict()
def fetch_and_set(d, key):
    d[key] = ("expensive operation to fetch key")
    if not d[key]:
        raise Exception("could not get value")
    return d[key]
...
value = d.get(key) or fetch_and_set(d, key)
In my case specifically, I was building a new dictionary from a cache, then later updating the cache after executing the fn() call.
Here's a simplified view of my use
j = load(database) # dict
d = dict()
# see if desired keys are in the cache, else fetch
for key in keys:
    d[key] = j.get(key) or fetch(key, network_token)
fn(d) # use d for something useful
j.update(d) # update database with new values (if any)
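One caveat with the get(...) or ... pattern: it treats any falsy cached value (0, '', [], None) as missing and fetches again. A sketch of the difference, where fetch is a hypothetical stand-in for the expensive call and the cache contents are made up:

```python
fetched = []


def fetch(key):
    # Hypothetical stand-in for an expensive request; records each call.
    fetched.append(key)
    return 'fresh'


cache = {'retries': 0, 'name': 'alice'}
keys = ['retries', 'name', 'missing']

# `or` treats the cached 0 as missing and fetches 'retries' again.
via_or = {key: cache.get(key) or fetch(key) for key in keys}

# An explicit membership test keeps falsy cached values.
via_in = {key: cache[key] if key in cache else fetch(key) for key in keys}

print(via_or)
print(via_in)
```

If falsy values can never legitimately appear in the cache, the shorter or form is fine.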
Here's a common situation when compiling data in dictionaries from different sources:
Say you have a dictionary that stores lists of things, such as things I like:
likes = {
'colors': ['blue','red','purple'],
'foods': ['apples', 'oranges']
}
and a second dictionary with some related values in it:
favorites = {
'colors':'yellow',
'desserts':'ice cream'
}
You then want to iterate over the "favorites" object and either append the items in that object to the list with the appropriate key in the "likes" dictionary or add a new key to it with the value being a list containing the value in "favorites".
There are several ways to do this:
for key in favorites:
    if key in likes:
        likes[key].append(favorites[key])
    else:
        # [value], not list(value): list('ice cream') would split the string into characters
        likes[key] = [favorites[key]]
or
for key in favorites:
    try:
        likes[key].append(favorites[key])
    except KeyError:
        likes[key] = [favorites[key]]
And many more as well...
I generally use the first syntax because it feels more pythonic, but if there are other, better ways, I'd love to know what they are. Thanks!
Use collections.defaultdict, where the default value is a new list instance.
>>> import collections
>>> mydict = collections.defaultdict(list)
In this way calling .append(...) will always succeed, because in case of a non-existing key append will be called on a fresh empty list.
You can instantiate the defaultdict with a previously generated dict, in case you get the dict likes from another source, like so:
>>> mydict = collections.defaultdict(list, likes)
Note that using list as the default_factory attribute of a defaultdict is also discussed as an example in the documentation.
Use collections.defaultdict:
import collections
likes = collections.defaultdict(list)
for key, value in favorites.items():
    likes[key].append(value)
defaultdict takes a single argument, a factory for creating values for unknown keys on demand. list is such a function; it creates empty lists.
And iterating over .items() will save you from using the key to get the value.
Besides defaultdict, the regular dict offers one possibility (that might look a bit strange): dict.setdefault(k[, d]):
for key, val in favorites.items():
    likes.setdefault(key, []).append(val)
Thank you for the +20 in rep -- I went from 1989 to 2009 in 30 seconds. Let's remember it is 20 years since the Wall fell in Europe..
>>> from collections import defaultdict
>>> d = defaultdict(list, likes)
>>> d
defaultdict(<class 'list'>, {'colors': ['blue', 'red', 'purple'], 'foods': ['apples', 'oranges']})
>>> for i, j in favorites.items():
...     d[i].append(j)
...
>>> d
defaultdict(<class 'list'>, {'colors': ['blue', 'red', 'purple', 'yellow'], 'foods': ['apples', 'oranges'], 'desserts': ['ice cream']})
All of the answers are defaultdict, but I'm not sure that's the best way to go about it. Giving out defaultdict to code that expects a dict can be bad. (See: How do I make a defaultdict safe for unexpecting clients? ) I'm personally torn on the matter. (I actually found this question looking for an answer to "which is better, dict.get() or defaultdict") Someone in the other thread said that you don't want a defaultdict if you don't want this behavior all the time, and that might be true. Maybe using defaultdict for the convenience is the wrong way to go about it. I think there are two needs being conflated here:
"I want a dict whose default values are empty lists." to which defaultdict(list) is the correct solution.
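and
"I want to append to the list at this key if it exists and create a list if it does not exist." to which my_dict.setdefault('foo', []) with append() is the answer. (Note that my_dict.get('foo', []) would not do: the fallback list it returns is never stored in the dict.)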
What do you guys think?
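For what it's worth, here is a quick check of how dict.get and dict.setdefault differ when used with append() on a plain dict. The key name 'foo' follows the post above; the rest is made up for illustration:

```python
my_dict = {}

# get() returns the fallback list but never stores it, so the append is lost.
my_dict.get('foo', []).append(1)
after_get = dict(my_dict)
print(after_get)

# setdefault() stores the new list under the key first, then returns it.
my_dict.setdefault('foo', []).append(1)
print(my_dict)
```

So for "append, creating the list if needed" on a regular dict, setdefault is the method that actually keeps the result.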