Is it reasonable to use None as a dictionary key in Python? - python

None seems to work as a dictionary key, but I am wondering if that will just lead to trouble later. For example, this works:
>>> x={'a':1, 'b':2, None:3}
>>> x
{'a': 1, None: 3, 'b': 2}
>>> x[None]
3
The actual data I am working with is educational standards. Every standard is associated with a content area. Some standards are also associated with content subareas. I would like to make a nested dictionary of the form {contentArea:{contentSubArea:[standards]}}. Some of those contentSubArea keys would be None.
In particular, I am wondering if this will lead to confusion if I look for a key that does not exist at some point, or something unanticipated like that.

Any hashable value is a valid Python dictionary key, so None is a perfectly valid candidate. There's no confusion when looking for non-existent keys: the presence of None as a key does not affect the ability to check whether another key is present. For example:
>>> d = {1: 'a', 2: 'b', None: 'c'}
>>> 1 in d
True
>>> 5 in d
False
>>> None in d
True
There's no conflict, and you can test for it just like normal. It shouldn't cause you a problem. The standard 1-to-1 Key-Value association still exists, so you can't have multiple things in the None key, but using None as a key shouldn't pose a problem by itself.
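For the nested layout described in the question, a minimal sketch might look like the following. The content areas and standard codes are made-up sample data, not anything from the asker's dataset.

```python
# Hypothetical sample data: (content_area, content_subarea, standard).
# A subarea of None means the standard has no subarea.
records = [
    ("Math", "Algebra", "M.A.1"),
    ("Math", None, "M.1"),
    ("Math", None, "M.2"),
    ("Science", None, "S.1"),
]

standards = {}
for area, subarea, standard in records:
    # setdefault creates the inner dict / list the first time a key is seen
    standards.setdefault(area, {}).setdefault(subarea, []).append(standard)

print(standards["Math"][None])            # ['M.1', 'M.2']
print(None in standards["Science"])       # True
print("Biology" in standards["Science"])  # False
```

Lookups for missing subareas behave exactly as they would without the None key.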

You want trouble? Here we go:
>>> import json
>>> json.loads(json.dumps({None:None}))
{u'null': None}
So yeah, better stay away from json if you do use None as a key: the key comes back as the string 'null', not as None. You can patch this with a custom (de)serializer, but I would advise against using None as a key in the first place.
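A short Python 3 sketch of the same round trip, with an extra string key added for illustration, shows that the None key is lost rather than restored:

```python
import json

original = {None: None, "a": 1}

# json.dumps turns the None key into the string "null" ...
encoded = json.dumps(original)
# ... and json.loads has no way to know it used to be None.
decoded = json.loads(encoded)

print(encoded)            # {"null": null, "a": 1}
print(None in decoded)    # False
print("null" in decoded)  # True
```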

None is not special in any particular way, it's just another python value. Its only distinction is that it happens to be the return value of a function that doesn't specify any other return value, and it also happens to be a common default value (the default arg of dict.get(), for instance).
You won't cause any run-time conflicts using such a key, but you should ask yourself if that's really a meaningful value to use for a key. It's often more helpful, from the point of view of reading code and understanding what it does, to use a designated instance for special values. Something like:
NoSubContent = SubContentArea(name=None)
{"contentArea":
    {NoSubContent: [standards],
     SubContentArea(name="Fruits"): ['apples', 'bananas']}}

jsonify does not support a dictionary with a None key.
from flask import jsonify
def json_():
    d = {None: 'None'}
    return jsonify(d)
This will throw an error:
TypeError: '<' not supported between instances of 'NoneType' and 'str'
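The message suggests the failure comes from sorting keys rather than from None itself. The same kind of TypeError can be reproduced with the standard json module when sort_keys=True and the dictionary mixes None with string keys. That key sorting is also what trips up jsonify is my assumption, and the extra "a" key below is added only so there is something to compare against.

```python
import json

d = {None: "None", "a": 1}

# Without sorting, the None key is simply serialized as "null".
unsorted_output = json.dumps(d)
print(unsorted_output)  # {"null": "None", "a": 1}

# With sorting, None has to be compared with "a", which Python 3 refuses.
error = None
try:
    json.dumps(d, sort_keys=True)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```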

It seems to me, the larger, later problem is this. If your process is creating pairs and some pairs have a "None" key, then it will overwrite all the previous None pairs. Your dictionary will silently throw out values because you had duplicate None keys. No?
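That is true of any repeated key, None included: a later assignment replaces the earlier value without any warning. A small sketch with made-up pairs, plus one way to keep every value by collecting them in lists:

```python
from collections import defaultdict

# Hypothetical (key, value) pairs; None appears twice as a key.
pairs = [(None, "first"), ("a", "x"), (None, "second")]

overwritten = {}
for key, value in pairs:
    overwritten[key] = value  # the second None silently replaces the first

grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)  # every value is kept

print(overwritten[None])  # second
print(grouped[None])      # ['first', 'second']
```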

Funny though, even this works :
d = {None: 'None'}
In [10]: None in d
Out[10]: True

Related

Updating Python dict at the time of initialization

I was just playing around with Python's dictionaries and lists. And I found these weird things -
At the time of initialization of any Python dict, list or some other data types, you can't perform operations of that data type.
# Case 1
d1 = {}
d1.update({'a': 'A'})
print(d1)
# Case 2
d2 = {}.update({'p': 'P'})
print(d2)
Output:
{'a': 'A'}
None
The weird thing is, d2 was neither initialized nor did the line throw any error.
What do I think about this?
Well, "Python's interpreter reads code line by line". So, when it reads the line d1 = {} it saves d1 and its type (dict) in memory.
But this is not happening with d2 = {}.update({'p': 'P'})
Any dict operation can be performed on a dict object, which in the second case was never initialized, i.e. the dictionary object was never created.
What do you think about this?
Please post your answers and correct me if I am wrong, which I guess I may be.
dict.update() is an operation that returns no value.
Well, .update() updates the dictionary in place and returns nothing (that is, None). So when you create an empty dictionary and call update on it in the same expression, the dictionary is created and updated, but the result of the call, None, is what gets assigned to d2. Instead, write it like this:
d = {}
d.update({'p': 'P'})
print(d)
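A short sketch of the same point, plus one way to build and fill a dictionary in a single expression. The name base is only illustrative.

```python
# In-place mutators such as dict.update return None by convention.
result = {}.update({"p": "P"})
print(result)  # None

# To build and fill a dict in one expression, unpack the pieces
# into a new dict literal instead of calling update.
base = {"a": "A"}
d2 = {**base, "p": "P"}
print(d2)  # {'a': 'A', 'p': 'P'}
```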

A "pythonic" strategy to check whether a key already exists in a dictionary

I often deal with heterogeneous datasets and I acquire them as dictionaries in my python routines. I usually face the problem that the key of the next entry I am going to add to the dictionary already exists.
I was wondering if there exists a more "pythonic" way to do the following task: check whether the key exists and create/update the corresponding pair key-item of my dictionary
myDict = dict()
for line in myDatasetFile:
    if int(line[-1]) in myDict.keys():
        myDict[int(line[-1])].append([line[2],float(line[3])])
    else:
        myDict[int(line[-1])] = [[line[2],float(line[3])]]
Use a defaultdict.
from collections import defaultdict
d = defaultdict(list)
# Every time you try to access the value of a key that isn't in the dict yet,
# d will call list with no arguments (producing an empty list),
# store the result as the new value, and give you that.
for line in myDatasetFile:
    d[int(line[-1])].append([line[2],float(line[3])])
Also, never use thing in d.keys(). In Python 2, that will create a list of keys and iterate through it one item at a time to find the key instead of using a hash-based lookup. In Python 3, it's not quite as horrible, but it's still redundant and still slower than the right way, which is thing in d.
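A runnable sketch of the defaultdict approach, using made-up rows shaped like the question's lines (the real file format is not shown, so the field layout here is an assumption). It also shows one thing to watch for: indexing a defaultdict creates the entry, while a membership test does not.

```python
from collections import defaultdict

# Hypothetical rows: index 2 is a label, index 3 a number,
# and the last field is the group id used as the key.
rows = [
    ["x", "y", "alpha", "1.5", "7"],
    ["x", "y", "beta", "2.0", "7"],
    ["x", "y", "gamma", "0.5", "9"],
]

d = defaultdict(list)
for line in rows:
    d[int(line[-1])].append([line[2], float(line[3])])

print(dict(d))
# {7: [['alpha', 1.5], ['beta', 2.0]], 9: [['gamma', 0.5]]}

# Membership tests do not create entries, but indexing does.
missing_before = 42 in d
d[42]  # creates an empty list under key 42
missing_after = 42 in d
print(missing_before, missing_after)  # False True
```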
That's what dict.setdefault is for.
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
example :
>>> d={}
>>> d.setdefault('a',[]).append([1,2])
>>> d
{'a': [[1, 2]]}
Python follows the idea that it's easier to ask for forgiveness than permission.
so the true Pythonic way would be:
try:
    myDict[int(line[-1])].append([line[2],float(line[3])])
except KeyError:
    myDict[int(line[-1])] = [[line[2],float(line[3])]]
for reference:
https://docs.python.org/2/glossary.html#term-eafp
https://stackoverflow.com/questions/6092992/why-is-it-easier-to-ask-forgiveness-than-permission-in-python-but-not-in-java
Try to catch the Exception when you get a KeyError
myDict = dict()
for line in myDatasetFile:
    try:
        myDict[int(line[-1])].append([line[2],float(line[3])])
    except KeyError:
        myDict[int(line[-1])] = [[line[2],float(line[3])]]
Or use:
myDict = dict()
for line in myDatasetFile:
    myDict.setdefault(int(line[-1]),[]).append([line[2],float(line[3])])

python dictionary check if any key other than given keys exist

Lets say I have a dictionary that specifies some properties for a package:
d = {'from': 'Bob', 'to': 'Joe', 'item': 'book', 'weight': '3.5lbs'}
To check the validity of a package dictionary, it needs to have a 'from' and 'to' key, and any number of properties, but there must be at least one property. So a dictionary can have either 'item' or 'weight', both, but can't have neither. The property keys could be anything, not limited to 'item' or 'weight'.
How would I check dictionaries to make sure they're valid, as in having the 'to', 'from', and at least one other key?
The only method I can think of is by obtaining d.keys(), removing the 'from' and 'to' keys, and checking if its empty.
Is there a better way to go about doing this?
must = {"from", "to"}
print len(d) > len(must) and all(key in d for key in must)
# True
This solution makes sure that your dictionary has more elements than the must set and that every element of must is present in the dictionary.
The advantage of this solution is that it is easily extensible. If you want to make sure that one more key exists in the dictionary, just add it to the must set and it will work fine. You don't have to alter the logic.
Edit
Apart from that, if you are using Python 2.7, you can do this more succinctly like this
print d.viewkeys() > {"from", "to"}
If you are using Python 3.x, you can simply write that as
print(d.keys() > {"from", "to"})
This works because d.viewkeys() (Python 2.7) and d.keys() (Python 3) return set-like objects, so we can use set comparison operators. > checks whether the left-hand set is a strict superset of the right-hand set. So, to satisfy the condition, the left-hand set-like object must contain both from and to, plus at least one other key.
Quoting from the set.issuperset docs,
set > other
Test whether the set is a proper superset of other, that is, set >= other and set != other.
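Wrapped in a function, the Python 3 version could look like this sketch. The function name and the required parameter are illustrative choices, not part of the original answer.

```python
def is_valid_package(d, required=frozenset({"from", "to"})):
    """Return True if d has every required key plus at least one more."""
    # dict.keys() is set-like in Python 3, so > means "proper superset".
    return d.keys() > required

print(is_valid_package({"from": "Bob", "to": "Joe", "item": "book"}))  # True
print(is_valid_package({"from": "Bob", "to": "Joe"}))                  # False
print(is_valid_package({"from": "Bob", "item": "book", "w": 1}))       # False
```

The last case has three keys but is missing 'to', so a length check alone would not be enough.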
If d has at least 3 keys, and 'from' and 'to' are among them, you're golden.
My knowledge of Python isn't the greatest, but I imagine it goes something like: if len(d) > 2 and 'from' in d and 'to' in d
Use the following code:
def exists(var, dict):
    # True if the key is present, regardless of its value
    try:
        x = dict[var]
        return True
    except KeyError:
        return False
def check(dict):
    if not exists('from', dict):
        return False
    if not exists('to', dict):
        return False
    if not exists('item', dict) and not exists('weight', dict):
        return False
    return True
def main():
    d = {'from': 'Bob', 'to': 'Joe', 'item': 'book', 'weight': '3.5lbs'}
    mybool = check(d)
    print(mybool)
if __name__ == '__main__':
    main()
This doesn't address the problem OP has, but provides what I think to be a better practice solution. I realize there's already been an answer but I just spent a few minutes reading on best practices and thought I would share
Problems with using a dictionary:
Dictionaries are meant to work on a key-value basis. You inherently have 2 different kinds of keys, given that to and from are mandatory while item and weight are optional.
Dictionaries are meant to be logic-less. By setting certain requirements, you violate the principle of a dictionary, which is just meant to hold data. To make an instance you need to build some sort of constructor logic for the dictionary.
So why not just use a class? Proposed alternative:
class D(dict):  # inherits from dict
    def __init__(self, t, f, **attributes):  # "from" is a keyword, hence t and f
        self['to'] = t
        self['from'] = f
        if len(attributes) > 0:
            self.update(attributes)
        else:
            raise Exception("Require attribute")
d = D('jim', 'bob', item='book')
print(d)          # {'to': 'jim', 'from': 'bob', 'item': 'book'}
print(d['to'])    # jim
print(d['item'])  # book
print(d['from'])  # bob
d = D('jim', 'bob')  # raises Exception
Obviously this falls apart if to and from are set asynchronously but I think the base idea still holds. Creating a class also gives you the verbosity to prevent to and from from being overwritten/deleted as well as limiting the minimum/maximum of attributes set.

How do I pythonically set a value in a dictionary if it is None?

This code is not bad, but I want to know how good programmers would write it
if count.get('a') is None:
    count['a'] = 0
You can use dict.setdefault :
count.setdefault('a', 0)
help on dict.setdefault:
>>> print dict.setdefault.__doc__
D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
setdefault is the best answer, but for the record, the Pythonic way to check for a key in a dict is using the in keyword:
if 'a' not in count:
    count['a'] = 0
Looking at the selected answer, I believe the question is somewhat incorrectly phrased.
set a value in a dictionary if it is None?
If the title is correct in asking about setting a value when it is None, setdefault does not do that: it leaves the existing None in place and returns it.
a_dict = {'key': None}
assert a_dict.setdefault('key', True) is None
I don't think it's a very common situation to want to update the dictionary when a key has a value of None or when the key is not there at all (as opposed to only when the key is missing, in which case setdefault is the way to go). In that case the following should work and seems the most pythonic to me. Note that it also replaces other falsy values such as 0 or ''; test a_dict.get('key') is None if only None should be replaced.
if not a_dict.get('key'):
    a_dict['key'] = 'value'
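The checks discussed in this thread behave differently for a missing key, a key holding None, and a key holding a falsy value such as 0. A sketch to compare them side by side (the function names are made up for the comparison):

```python
def fill_with_setdefault(d):
    d.setdefault("key", "value")  # only acts when the key is missing
    return d

def fill_if_none(d):
    if d.get("key") is None:      # missing key or explicit None
        d["key"] = "value"
    return d

def fill_if_falsy(d):
    if not d.get("key"):          # missing, None, 0, '', [], ...
        d["key"] = "value"
    return d

starts = ({}, {"key": None}, {"key": 0})
outcomes = {}
for fill in (fill_with_setdefault, fill_if_none, fill_if_falsy):
    # dict(start) copies each starting dict so the cases stay independent
    outcomes[fill.__name__] = [fill(dict(start)) for start in starts]
    print(fill.__name__, outcomes[fill.__name__])
```

Only the first column (the empty dict) gets the same result from all three; which one is right depends on whether None and 0 count as "not set" in your data.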

Set a value in a dict only if the value is not already set

What is the most pythonic way to set a value in a dict if the value is not already set?
At the moment my code uses if statements:
if "timeout" not in connection_settings:
    connection_settings["timeout"] = compute_default_timeout(connection_settings)
dict.get(key, default) is appropriate for code consuming a dict, not for code that is preparing a dict to be passed to another function. You can use it to set something, but it's no prettier IMO:
connection_settings["timeout"] = connection_settings.get(
    "timeout", compute_default_timeout(connection_settings))
This would evaluate the compute function even if the dict already contained the key, which is a bug.
defaultdict is for cases where the default is produced the same way for every key.
Of course there are many times you set primitive values that don't need computing as defaults, and those can of course use dict.setdefault. But how about the more complex cases?
dict.setdefault will precisely "set a value in a dict only if the value is not already set".
You still need to compute the value to pass it in as the parameter:
connection_settings.setdefault("timeout", compute_default_timeout(connection_settings))
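To make the eager evaluation visible, here is a sketch with a stand-in for compute_default_timeout that records each call. The body of that function and the value 30 are assumptions for illustration.

```python
calls = []

def compute_default_timeout(settings):
    # Stand-in for the question's function; records that it ran.
    calls.append("computed")
    return 30

connection_settings = {"timeout": 5}

# The argument is evaluated before setdefault runs, even though
# the key exists and the computed result is thrown away.
connection_settings.setdefault("timeout", compute_default_timeout(connection_settings))
calls_after_setdefault = list(calls)
print(connection_settings, calls_after_setdefault)  # {'timeout': 5} ['computed']

# The plain membership test skips the call entirely.
calls.clear()
if "timeout" not in connection_settings:
    connection_settings["timeout"] = compute_default_timeout(connection_settings)
print(calls)  # []
```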
This is a bit of a non-answer, but I would say the most pythonic is the if statement as you have it. You resisted the urge to one-liner it with __setitem__ or other methods. You've avoided possible bugs in the logic due to existing-but-falsey values which might happen when trying to be clever with short-circuiting and/or hacks. It's immediately obvious that the compute function isn't used when it wasn't necessary.
It's clear, concise, and readable - pythonic.
One way to do this is:
if key not in dict:
    dict[key] = value
Since Python 3.9 you can use the merge operator | to merge two dictionaries. The dict on the right takes precedence:
d = { key: value } | d
Note: this creates a new dictionary with the updated values.
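A small sketch of that behavior with made-up settings (requires Python 3.9 or later):

```python
defaults = {"timeout": 120, "port": 8888}
d = {"port": 8080}

# The right-hand operand wins on shared keys, so values already in d are kept
# and only the missing "timeout" comes from defaults.
merged = defaults | d

print(merged)  # {'timeout': 120, 'port': 8080}
print(d)       # {'port': 8080}  (the original dict is unchanged)
```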
You probably need dict.setdefault:
Create a new dictionary and set a value:
>>> d = {}
>>> d.setdefault('timeout', 120)
120
>>> d
{'timeout': 120}
If a value is already set, dict.setdefault won't override it:
>>> d['port']=8080
>>> d.setdefault('port', 8888)
8080
>>> d
{'port': 8080, 'timeout': 120}
I'm using the following to modify kwargs to non-default values and pass to another function:
def f( **non_default_kwargs ):
    kwargs = {
        'a':1,
        'b':2,
    }
    kwargs.update( non_default_kwargs )
    f2( **kwargs )
This has the merits that
you don't have to type the keys twice
all is done in a single function
The answer by @Rotareti makes me wonder whether, for Python versions older than 3.9, we can do:
>>> dict_a = {'a': 1 }
>>> dict_a = {'a': 3, 'b': 2, **dict_a}
>>> dict_a
{'a': 1, 'b': 2}
(Well, it works for sure on Python3.7, but is this Pythonesque enough?)
I found it convenient and obvious to exploit the return of the dict .get() method being None (Falsy), along with or to put off evaluation of an expensive network request if the key was not present.
d = dict()
def fetch_and_set(d, key):
    d[key] = ("expensive operation to fetch key")
    if not d[key]:
        raise Exception("could not get value")
    return d[key]
...
value = d.get(key) or fetch_and_set(d, key)
In my case specifically, I was building a new dictionary from a cache then later updating the cache after expediting the fn() call.
Here's a simplified view of my use
j = load(database) # dict
d = dict()
# see if desired keys are in the cache, else fetch
for key in keys:
    d[key] = j.get(key) or fetch(key, network_token)
fn(d) # use d for something useful
j.update(d) # update database with new values (if any)
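One caveat with the or idiom: it falls through for any falsy cached value, not only for a missing key. A sketch with a hypothetical fetch function and made-up cache contents:

```python
fetch_calls = []

def fetch(key):
    # Hypothetical stand-in for an expensive network request.
    fetch_calls.append(key)
    return "fetched-" + key

cache = {"a": "cached", "zero": 0}
keys = ("a", "zero", "b")

# `or` treats the cached 0 as absent and fetches it again.
with_or = {key: cache.get(key) or fetch(key) for key in keys}
print(with_or)  # {'a': 'cached', 'zero': 'fetched-zero', 'b': 'fetched-b'}

# An explicit membership test keeps the cached 0 and fetches only "b".
fetch_calls.clear()
with_in = {key: cache[key] if key in cache else fetch(key) for key in keys}
print(with_in)      # {'a': 'cached', 'zero': 0, 'b': 'fetched-b'}
print(fetch_calls)  # ['b']
```

If every legitimate cached value is truthy, as with non-empty strings, the shorter or form is fine.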
