Today, I came across the dict method get which, given a key in the dictionary, returns the associated value.
For what purpose is this function useful? If I wanted to find a value associated with a key in a dictionary, I can just do dict[key], and it returns the same thing:
dictionary = {"Name": "Harry", "Age": 17}
dictionary["Name"]
dictionary.get("Name")
It allows you to provide a default value if the key is missing:
dictionary.get("bogus", default_value)
returns default_value (whatever you choose it to be), whereas
dictionary["bogus"]
would raise a KeyError.
If omitted, default_value is None, such that
dictionary.get("bogus") # <-- No default specified -- defaults to None
returns None just like
dictionary.get("bogus", None)
would.
What is the dict.get() method?
As already mentioned, the get method takes an additional parameter that specifies the value to return when the key is missing. From the documentation:
get(key[, default])
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
An example can be
>>> d = {1:2,2:3}
>>> d[1]
2
>>> d.get(1)
2
>>> d.get(3)
>>> repr(d.get(3))
'None'
>>> d.get(3,1)
1
Are there speed improvements anywhere?
As mentioned here,
It seems that all three approaches now exhibit similar performance (within about 10% of each other), more or less independent of the properties of the list of words.
Earlier, get was considerably slower. Now the speed is almost comparable, with the additional advantage of returning a default value. To settle the question, we can run a test (note that it looks up only valid keys):
def getway(d):
    for i in range(100):
        s = d.get(i)

def lookup(d):
    for i in range(100):
        s = d[i]
Now timing these two functions using timeit
>>> import timeit
>>> print(timeit.timeit("getway({i:i for i in range(100)})","from __main__ import getway"))
20.2124660015
>>> print(timeit.timeit("lookup({i:i for i in range(100)})","from __main__ import lookup"))
16.16223979
As we can see, the bracket lookup is faster than get because it needs no attribute lookup and no function call. This can be seen through dis:
>>> import dis
>>> def lookup(d,val):
...     return d[val]
...
>>> def getway(d,val):
...     return d.get(val)
...
>>> dis.dis(getway)
  2           0 LOAD_FAST                0 (d)
              3 LOAD_ATTR                0 (get)
              6 LOAD_FAST                1 (val)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE
>>> dis.dis(lookup)
  2           0 LOAD_FAST                0 (d)
              3 LOAD_FAST                1 (val)
              6 BINARY_SUBSCR
              7 RETURN_VALUE
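The timing comparison above can be re-run on a current interpreter with a sketch like the following. It is not the original benchmark: the function names lookup_all and get_all are made up for this example, and the dictionary is built once outside the timed statement so that only the lookups are measured. Absolute numbers depend on the machine and the Python version, so none are shown.

```python
import timeit

# Hypothetical names for this sketch; the dictionary mirrors the one timed above.
d = {i: i for i in range(100)}

def lookup_all(d):
    # Bracket lookup of every valid key.
    for i in range(100):
        d[i]

def get_all(d):
    # The same lookups through the get method.
    for i in range(100):
        d.get(i)

# timeit accepts a zero-argument callable; each call performs 100 lookups.
bracket_seconds = timeit.timeit(lambda: lookup_all(d), number=10_000)
get_seconds = timeit.timeit(lambda: get_all(d), number=10_000)

print(f"d[i]:     {bracket_seconds:.4f} s")
print(f"d.get(i): {get_seconds:.4f} s")
```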
Where will it be useful?
It is useful whenever you want to provide a default value when looking up a key in a dictionary. This reduces
if key in dic:
    val = dic[key]
else:
    val = def_val
to a single line: val = dic.get(key, def_val)
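One difference between the two forms is worth keeping in mind: the default passed to get is an ordinary argument, so it is evaluated before the call even when the key is present. The sketch below illustrates this with a made-up function, expensive_default, that records each time it runs.

```python
calls = []

def expensive_default():
    # Hypothetical stand-in for a costly computation.
    calls.append("called")
    return 0

dic = {"a": 1}

# The default expression is evaluated before get() runs, even though "a" exists.
val = dic.get("a", expensive_default())
print(val, calls)   # 1 ['called']

# The conditional form only evaluates the default when the key is missing.
calls.clear()
val = dic["a"] if "a" in dic else expensive_default()
print(val, calls)   # 1 []
```

For cheap constants such as 0 or None this makes no practical difference.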
Where will it be NOT useful?
Whenever you want a KeyError to be raised to signal that a particular key is not available. Returning a default value also carries the risk that the default may itself be a legitimate stored value, so a missing key cannot be told apart from a present one!
Is it possible to have a get-like feature in dict['key']?
Yes! We need to implement the __missing__ method in a dict subclass.
A sample program can be
class MyDict(dict):
    def __missing__(self, key):
        return None
A small demonstration can be
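When the default could collide with a real value, one common workaround is a private sentinel object. The following is a minimal sketch; _MISSING and describe are names invented for the example.

```python
# Hypothetical sentinel: a fresh object that no stored value can be identical to.
_MISSING = object()

def describe(d, key):
    value = d.get(key, _MISSING)
    if value is _MISSING:
        return f"{key!r} is absent"
    return f"{key!r} is present with value {value!r}"

settings = {"timeout": None}
print(describe(settings, "timeout"))  # 'timeout' is present with value None
print(describe(settings, "retries"))  # 'retries' is absent
```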
>>> my_d = MyDict({1:2,2:3})
>>> my_d[1]
2
>>> my_d[3]
>>> repr(my_d[3])
'None'
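Note that __missing__ only affects bracket lookups; get does not call it. The sketch below shows this for the MyDict class above and, for comparison, for collections.defaultdict, which builds on the same hook.

```python
from collections import defaultdict

class MyDict(dict):
    def __missing__(self, key):
        return None

my_d = MyDict({1: 2, 2: 3})
print(my_d[3])          # None, supplied by __missing__
print(my_d.get(3, -1))  # -1: get() does not call __missing__
print(3 in my_d)        # False: nothing was stored

dd = defaultdict(int)
print(dd[3])            # 0, and the key is now stored
print(3 in dd)          # True
print(dd.get(4, -1))    # -1: get() bypasses the default factory as well
print(4 in dd)          # False
```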
get takes a second optional value. If the specified key does not exist in your dictionary, then this value will be returned.
dictionary = {"Name": "Harry", "Age": 17}
dictionary.get('Year', 'No available data')
>> 'No available data'
If you do not give the second parameter, None will be returned.
If you use indexing as in dictionary['Year'], nonexistent keys will raise KeyError.
A gotcha to be aware of when using .get():
If the dictionary contains the key used in the call to .get() and its value is None, the .get() method will return None even if a default value is supplied.
For example, the following returns None, not 'alt_value' as may be expected:
d = {'key': None}
assert None is d.get('key', 'alt_value')
.get()'s second value is only returned if the key supplied is NOT in the dictionary, not if the return value of that call is None.
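If a stored None should also be replaced by the fallback, that has to be spelled out. Here is a minimal sketch with a made-up helper, get_not_none; it also shows why the shorter "or" idiom is not equivalent, since it replaces every falsy value.

```python
d = {"key": None, "empty": ""}

def get_not_none(mapping, key, default):
    # Hypothetical helper: fall back when the key is missing OR its value is None.
    value = mapping.get(key)
    return default if value is None else value

print(d.get("key", "alt_value"))                    # None
print(get_not_none(d, "key", "alt_value"))          # alt_value
print(repr(get_not_none(d, "empty", "alt_value")))  # '' (kept, it is not None)
print(d.get("empty") or "alt_value")                # alt_value ('' is falsy)
```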
Here is a practical example from scraping web data with Python. A lot of the time you will get records where a key is missing. In those cases dictionary['key'] raises an error, whereas dictionary.get('key', 'return_otherwise') has no problems.
Similarly, I would use ''.join(list) as opposed to list[0] when trying to capture a single value from a list.
hope it helps.
[Edit] Here is a practical example:
Say you are calling an API which returns a JSON file you need to parse. The first JSON looks like the following:
{"bids":{"id":16210506,"submitdate":"2011-10-16 15:53:25","submitdate_f":"10\/16\/2011 at 21:53 CEST","submitdate_f2":"p\u0159ed 2 lety","submitdate_ts":1318794805,"users_id":"2674360","project_id":"1250499"}}
The second JSON is like this:
{"bids":{"id":16210506,"submitdate":"2011-10-16 15:53:25","submitdate_f":"10\/16\/2011 at 21:53 CEST","submitdate_f2":"p\u0159ed 2 lety","users_id":"2674360","project_id":"1250499"}}
Note that the second JSON is missing the "submitdate_ts" key, which is pretty normal in any data structure.
So when you try to access the value of that key in a loop, can you call it with the following?
for item in API_call:
    submitdate_ts = item["bids"]["submitdate_ts"]
You could, but it will give you a traceback error for the second JSON line, because the key simply doesn't exist.
An appropriate way of coding this could be the following:
for item in API_call:
    submitdate_ts = item.get("bids", {'x': None}).get("submitdate_ts")
{'x': None} is there to keep the second-level get from raising an error when "bids" itself is missing. Of course, you can build more fault tolerance into the code if you are doing scraping, for example by first checking with an if condition.
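The chained lookup can be tried end to end with a sketch like this one. The JSON strings are abbreviated versions of the two responses above, the third response is hypothetical, and an empty dict is used as the first-level fallback, which works the same way as {'x': None}.

```python
import json

# Abbreviated versions of the two responses shown above, plus a
# hypothetical third response that has no "bids" key at all.
responses = [
    '{"bids": {"id": 16210506, "submitdate_ts": 1318794805}}',
    '{"bids": {"id": 16210506}}',
    '{}',
]

timestamps = []
for raw in responses:
    item = json.loads(raw)
    # The first get supplies an empty dict so the second get has something to call.
    timestamps.append(item.get("bids", {}).get("submitdate_ts"))

print(timestamps)  # [1318794805, None, None]
```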
The purpose is that you can give a default value if the key is not found, which is very useful
dictionary.get("Name",'harry')
For what purpose is this function useful?
One particular usage is counting with a dictionary. Let's assume you want to count the number of occurrences of each element in a given list. The common way to do so is to make a dictionary where keys are elements and values are the number of occurrences.
fruits = ['apple', 'banana', 'peach', 'apple', 'pear']
d = {}
for fruit in fruits:
    if fruit not in d:
        d[fruit] = 0
    d[fruit] += 1
Using the .get() method, you can make this code more compact and clear:
for fruit in fruits:
    d[fruit] = d.get(fruit, 0) + 1
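For this particular counting task the standard library also offers collections.Counter. The sketch below runs the get-based loop and checks that Counter produces the same tallies; it is an alternative, not something the answer above relies on.

```python
from collections import Counter

fruits = ['apple', 'banana', 'peach', 'apple', 'pear']

# The get-based loop from above.
d = {}
for fruit in fruits:
    d[fruit] = d.get(fruit, 0) + 1

# Counter builds the same mapping in one call.
counts = Counter(fruits)

print(d)                  # {'apple': 2, 'banana': 1, 'peach': 1, 'pear': 1}
print(dict(counts) == d)  # True
```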
Other answers have clearly explained the difference between dict bracket keying and .get and mentioned a fairly innocuous pitfall when None or the default value is also a valid value.
Given this information, it may be tempting to conclude that .get is somehow safer and better than bracket indexing and should always be used instead of bracket lookups, as argued in Stop Using Square Bracket Notation to Get a Dictionary's Value in Python, even in the common case when you expect the lookup to succeed (i.e. never raise a KeyError).
The author of the blog post argues that .get "safeguards your code":
Notice how trying to reference a term that doesn't exist causes a KeyError. This can cause major headaches, especially when dealing with unpredictable business data.
While we could wrap our statement in a try/except or if statement, this much care for a dictionary term will quickly pile up.
It's true that in the uncommon case for null (None)-coalescing or otherwise filling in a missing value to handle unpredictable dynamic data, a judiciously-deployed .get is a useful and Pythonic shorthand tool for ungainly if key in dct: and try/except blocks that only exist to set default values when the key might be missing as part of the behavioral specification for the program.
However, replacing all bracket dict lookups, including those that you assert must succeed, with .get is a different matter. This practice effectively downgrades a class of runtime errors that help reveal bugs into silent illegal state scenarios that tend to be harder to identify and debug.
A common mistake among programmers is to think exceptions cause headaches and attempt to suppress them, using techniques like wrapping code in try ... except: pass blocks. They later realize the real headache is never seeing the breach of application logic at the point of failure and deploying a broken application. Better programming practice is to embrace assertions for all program invariants such as keys that must be in a dictionary.
The hierarchy of error safety is, broadly:
Error category | Relative ease of debugging
Compile-time error | Easy; go to the line and fix the problem
Runtime exception | Medium; control needs to flow to the error and it may be due to unanticipated edge cases or hard-to-reproduce state like a race condition between threads, but at least we get a clear error message and stack trace when it does happen.
Silent logical error | Difficult; we may not even know it exists, and when we do, tracking down state that caused it can be very challenging due to lack of locality and potential for multiple assertion breaches.
When programming language designers talk about program safety, a major goal is to surface, not suppress, genuine errors by promoting runtime errors to compile-time errors and promoting silent logical errors to either runtime exceptions or (ideally) compile-time errors.
Python, by design as an interpreted language, relies heavily on runtime exceptions instead of compiler errors. Missing methods or properties, illegal type operations like 1 + "a" and out of bounds or missing indices or keys raise by default.
Some languages like JS, Java, Rust and Go use the fallback behavior for their maps by default (and in many cases, don't provide a throw/raise alternative), but Python throws by default, along with other languages like C#. Perl/PHP issue an uninitialized value warning.
Indiscriminate application of .get to all dict accesses, even those that aren't expected to fail and have no fallback for dealing with None (or whatever default is used) running amok through the code, pretty much tosses away Python's runtime exception safety net for this class of errors, silencing or adding indirection to potential bugs.
Other supporting reasons to prefer bracket lookups (with the occasional, well-placed .get where a default is expected):
Prefer writing standard, idiomatic code using the tools provided by the language. Python programmers usually (correctly) prefer brackets for the exception safety reasons given above and because it's the default behavior for Python dicts.
Always using .get forfeits intent by making cases when you expect to provide a default None value indistinguishable from a lookup you assert must succeed.
Testing increases in complexity in proportion to the new "legal" program paths permitted by .get. Effectively, each lookup is now a branch that can succeed or fail -- both cases must be tested to establish coverage, even if the default path is effectively unreachable by specification (ironically leading to additional if val is not None: or try for all future uses of the retrieved value; unnecessary and confusing for something that should never be None in the first place).
.get is a bit slower.
.get is harder to type and uglier to read (compare Java's tacked-on-feel ArrayList syntax to native-feel C# Lists or C++ vector code). Minor.
Some languages like C++ and Ruby offer alternate methods (at and fetch, respectively) to opt-in to throwing an error on a bad access, while C# offers opt-in fallback value TryGetValue similar to Python's get.
Since JS, Java, Ruby, Go and Rust bake the fallback approach of .get into all hash lookups by default, it can't be that bad, one might think. It's true that this isn't the largest issue facing language designers and there are plenty of use cases for the no-throw access version, so it's unsurprising that there's no consensus across languages.
But as I've argued, Python (along with C#) has done better than these languages by making the assert option the default. It's a loss of safety and expressivity to opt-out of using it to report contract violations at the point of failure by indiscriminately using .get across the board.
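The difference between failing at the point of the mistake and failing somewhere downstream can be seen in a small sketch. The config dictionary, the misspelled key and both function names are invented for the illustration.

```python
config = {"timeout_seconds": 30}  # hypothetical configuration

def read_timeout_strict(cfg):
    # Bug: the key is misspelled. The bracket lookup reports it immediately.
    return cfg["timeout_secs"]

def read_timeout_lenient(cfg):
    # Same bug, but .get() hides it by returning None.
    return cfg.get("timeout_secs")

try:
    read_timeout_strict(config)
except KeyError as exc:
    strict_error = f"KeyError: {exc}"

timeout = read_timeout_lenient(config)  # no error raised here
try:
    deadline = 100 + timeout            # fails later, away from the cause
except TypeError as exc:
    lenient_error = f"TypeError: {exc}"

print(strict_error)   # KeyError: 'timeout_secs'
print(lenient_error)  # a TypeError about adding int and NoneType
```

The KeyError names the offending key at the line that contains the bug, while the TypeError only says that a None turned up in an addition.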
Why dict.get(key) instead of dict[key]?
0. Summary
Compared to dict[key], dict.get provides a fallback value when looking up a key.
1. Definition
get(key[, default]) (from 4. Built-in Types, Python 3.6.4rc1 documentation)
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
d = {"Name": "Harry", "Age": 17}
In [4]: d['gender']
KeyError: 'gender'
In [5]: d.get('gender', 'Not specified, please add it')
Out[5]: 'Not specified, please add it'
2. Problem it solves.
Without a default value, you have to write cumbersome code to handle such an exception.
def get_harry_info(key):
    try:
        return "{}".format(d[key])
    except KeyError:
        return 'Not specified, please add it'
In [9]: get_harry_info('Name')
Out[9]: 'Harry'
In [10]: get_harry_info('Gender')
Out[10]: 'Not specified, please add it'
As a convenient solution, dict.get introduces an optional default value, avoiding the unwieldy code above.
3. Conclusion
dict.get has an additional default value option to deal with the case where the key is absent from the dictionary.
One difference, that can be an advantage, is that if we are looking for a key that doesn't exist we will get None, not like when we use the brackets notation, in which case we will get an error thrown:
print(dictionary.get("address")) # None
print(dictionary["address"]) # throws KeyError: 'address'
The last cool thing about the get method is that it accepts an additional optional argument for a default value. For example, if we try to get the score of a student who doesn't have a score key, we can get 0 instead.
So instead of doing this (or something similar):
score = None
try:
    score = dictionary["score"]
except KeyError:
    score = 0
We can do this:
score = dictionary.get("score", 0)
# score = 0
One other use-case that I do not see mentioned is as the key argument for functions like sorted, max and min. The get method allows for keys to be returned based on their values.
>>> ages = {"Harry": 17, "Lucy": 16, "Charlie": 18}
>>> print(sorted(ages, key=ages.get))
['Lucy', 'Harry', 'Charlie']
>>> print(max(ages, key=ages.get))
Charlie
>>> print(min(ages, key=ages.get))
Lucy
Thanks to this answer to a different question for providing this use-case!
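This works because ages.get is a bound method, so it can be passed wherever a one-argument function is expected. A short sketch showing that it behaves like an explicit lambda here, and that it combines with the other sorted options:

```python
ages = {"Harry": 17, "Lucy": 16, "Charlie": 18}

# ages.get maps each name to its age, just like the lambda below.
by_age = sorted(ages, key=ages.get)
by_age_lambda = sorted(ages, key=lambda name: ages[name])
oldest_first = sorted(ages, key=ages.get, reverse=True)

print(by_age)         # ['Lucy', 'Harry', 'Charlie']
print(by_age_lambda)  # ['Lucy', 'Harry', 'Charlie']
print(oldest_first)   # ['Charlie', 'Harry', 'Lucy']
```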
Short answer
The square brackets are used for conditional lookups, which can fail with a KeyError when the key is missing.
The get() method is used for unconditional lookups that never fail because a default value has been supplied.
Base method and helper method
The square brackets call the __getitem__ method which is fundamental for mappings like dicts.
The get() method is a helper layered on top of that functionality. It is a short-cut for the common coding pattern:
try:
    v = d[k]
except KeyError:
    v = default_value
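The layering can be observed with collections.abc.Mapping, whose get mixin is written in terms of __getitem__ in this try/except style. The Squares class below is a made-up read-only mapping that defines only the three required methods; the built-in dict implements get in C, so this is an illustration of the relationship rather than of dict's internals.

```python
from collections.abc import Mapping

class Squares(Mapping):
    """Hypothetical read-only mapping from 0..n-1 to their squares."""

    def __init__(self, n):
        self._n = n

    def __getitem__(self, key):
        if isinstance(key, int) and 0 <= key < self._n:
            return key * key
        raise KeyError(key)

    def __iter__(self):
        return iter(range(self._n))

    def __len__(self):
        return self._n

sq = Squares(4)
print(sq[3])           # 9
print(sq.get(3))       # 9, get() is inherited from Mapping
print(sq.get(10))      # None
print(sq.get(10, -1))  # -1
```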
It allows you to provide a default value instead of getting an error when the key is not found. Conceptually it works like this:
class dictionary(dict):
    def get(self, key, default=None):
        # Return the fallback when the key is absent, otherwise the stored value.
        if key not in self:
            return default
        else:
            return self[key]
With Python 3.8 and after, the dictionary get() method can be used with the walrus operator := in an assignment expression to further reduce code:
if (name := dictionary.get("Name")) is not None:
    return name
Using [] instead of get() would require wrapping the code in a try/except block and catching KeyError (not shown). And without the walrus operator, you would need another line of code:
name = dictionary.get("Name")
if name is not None:
    return name
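Since return is only valid inside a function, a runnable version of the walrus form needs a surrounding function. A minimal sketch, with find_name and its "unknown" fallback invented for the example (requires Python 3.8 or later):

```python
def find_name(record):
    # Hypothetical wrapper so that `return` is valid.
    if (name := record.get("Name")) is not None:
        return name
    return "unknown"

print(find_name({"Name": "Harry", "Age": 17}))  # Harry
print(find_name({"Age": 17}))                   # unknown
```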
I have the following question: why is it that only with the str and bool types do variables holding the same value refer to one memory location?
a = 'something'
b = 'something'
if a is b: print('True') # True
But we did not write a = b anywhere. Hence the interpreter saw that the strings are equal to each other and made both names refer to one memory location.
Of course, if we assign a new value to either of these two variables, there will be no conflict: that variable will now refer to another memory location.
b = 'something more'
if a is b: print('True') # False
With the bool type, the same thing happens:
a = True
b = True
if a is b: print('True') # True
I first thought that this happens with all immutable types. But no. There is one more immutable type, tuple, and it behaves differently: when we assign the same value to two variables, they refer to different memory locations. Why does this happen only with tuple among the immutable types?
a = (1,9,8)
b = (1,9,8)
if a is b: print('True') # False
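In Python, == checks for value equality, while is checks whether both names refer to the very same object, roughly like so: id(a) == id(b)
Python caches some objects and reuses them. In CPython this includes small integers (-5 to 256) and strings that get interned, such as identifier-like literals. This is an implementation detail, not something the language guarantees.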
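A small sketch of the difference, using lists because they are never cached or shared behind your back. The variable names are arbitrary.

```python
a = [1, 9, 8]
b = [1, 9, 8]   # equal contents, but a separate object
c = a           # a second name for the same object

print(a == b)          # True: same value
print(a is b)          # False: two distinct objects
print(a is c)          # True: one object, two names
print(id(a) == id(c))  # True: "is" compares identities
```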
So, if you dig deeper into your statement
a = 'something'
b = 'something'
id(a)
# 139702804094704
id(b)
# 139702804094704
a is b
# True
But if you change it a bit:
a = 'something else'
b = 'something else'
id(a)
# 139702804150640
id(b)
# 139702804159152
a is b
# False
We're getting False because Python uses different memory location for a and b this time, unlike before.
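Whether two equal string literals end up as one object depends on the interpreter and on how the code is compiled, so the outputs above may differ on your machine. If you genuinely need a shared object, sys.intern asks for it explicitly. A minimal sketch:

```python
import sys

# sys.intern returns the canonical copy of a string, so two interned
# strings with the same value are guaranteed to be the same object.
a = sys.intern("something else")
b = sys.intern("something else")

print(a == b)  # True
print(a is b)  # True
```

This is rarely needed in everyday code; it mostly shows that sharing is a caching decision rather than a property of the str type.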
My guess is with tuples (and someone correct me if I'm mistaken) Python allocates different memory every time you create one.
Why do some types cache values? Because you shouldn't be able to notice the difference!
is is a very specialized operator. Nearly always you should use == instead, which will do exactly what you want.
The cases where you want to use is instead of == basically are when you're dealing with objects that have overloaded the behavior of == to not mean what you want it to mean, or where you're worried that you might be dealing with such objects.
If you're not sure whether you're dealing with such objects or not, you're probably not, which means that == is always right and you don't have to ever use is.
It can be a matter of "style points" to use is with known singleton objects, like None, but there's nothing wrong with using == there (again, in the absence of a pathological implementation of ==).
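As an illustration of such a pathological ==, here is a deliberately broken class (AlwaysEqual is made up for this sketch). Comparing it to None with == gives the wrong answer, while is cannot be fooled.

```python
class AlwaysEqual:
    # Deliberately broken: claims to be equal to everything.
    def __eq__(self, other):
        return True


obj = AlwaysEqual()

print(obj == None)  # True, because __eq__ lies
print(obj is None)  # False, identity cannot be overloaded
```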
If you're dealing with potentially untrustworthy objects, then you should never do anything that may invoke a method that they control.... and that's a good place to use is. But almost nobody is doing that, and those who do should be aware of the zillion other ways a malicious object could cause problems.
If an object implements == incorrectly then you can get all kinds of weird problems. In the course of debugging those problems, of course you can and should use is! But that shouldn't be your normal way of comparing objects in code you write.
The one other case where you might want to use is rather than == is as a performance optimization, if the object you're dealing with implements == in a particularly expensive way. This is not going to be the case very often at all, and most of the time there are better ways to reduce the number of times you have to compare two objects (e.g. by comparing hash codes instead) which will ultimately have a much better effect on performance without bringing correctness into question.
If you use == wherever you semantically want an equality comparison, then you will never even notice when some types sneakily reuse instances on you.
I understand that the following is valid in Python: foo = {'a': 0, 1: 2, some_obj: 'c'}
However, I wonder how it works internally. Does it treat everything (object, string, number, etc.) as an object? Does it type check to determine how to compute the hash code for a given key?
Types aren't used the same way in Python as in statically typed languages. A hashable object is simply one with a valid __hash__ method. The interpreter simply calls that method, with no type checking or anything. From there on, standard hash map principles apply: for an object to fulfill the contract, it must implement both __hash__ and __eq__.
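For example, a user-defined class becomes usable as a key once it defines both methods consistently. This is only a sketch; the Point class is invented for the illustration.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Two points are equal when their coordinates match.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must produce equal hashes, so hash the same fields.
        return hash((self.x, self.y))


foo = {"a": 0, 1: 2, Point(1, 2): "c"}

# A different but equal Point instance finds the same entry.
print(foo[Point(1, 2)])  # c
```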
You can answer this by opening a Python interactive prompt and trying several of these keys:
>>> hash('a')
12416037344
>>> hash(2)
2
>>> hash(object())
8736272801544
Does it treat everything (object, string, number, etc.) as object?
You are simply using the hash function to represent each dictionary key as an integer. This integer is then used to index into the underlying dictionary array. Assuming a dictionary starts off with a pre-allocated size of 8, we use the modulus operator (the remainder) to fit it into an appropriate location:
>>> hash('a')
12416037344
>>> hash(object()) % 8
2
So in this particular case, the hashed object is placed at index 2 of the underlying array. Of course there can be collisions, and how they are resolved depends on the implementation: the underlying array may be an array of arrays (chaining), whereas CPython probes for another free slot in the same array (open addressing).
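Here is a toy version of that indexing step. It is a simplified model, not CPython's actual probing scheme; slot and TABLE_SIZE are names made up for the sketch. Small integers are used as keys because they hash to themselves in CPython, which keeps the output reproducible (string hashes are randomized per process).

```python
TABLE_SIZE = 8  # assumed initial size, as in the explanation above


def slot(key, size=TABLE_SIZE):
    # Map a key's hash to an index in a table with `size` slots.
    return hash(key) % size


print(slot(2))   # 2
print(slot(10))  # 2, so keys 2 and 10 collide in a table of 8 slots
print(slot(5))   # 5
```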
Note that items that aren't hashable cannot be used as dictionary keys:
>>> hash({})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
Proof:
>>> d = {}
>>> d[{}] = 5
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
Everything in Python is an object, and every object which has a __hash__ method can be used as a dictionary key. How exactly the method (tries to) return a unique integer for each possible key is thus specific to each class, and somewhat private (to use the term carelessly for humorous effect). See https://wiki.python.org/moin/DictionaryKeys for details.
(There are a couple of other methods your class needs to support before it is completely hashable. Again, see the linked exposition.)
I think it works as long as the object supports the __hash__ method and __hash__(self) returns the same value for any two objects for which __eq__ returns True. Hashable objects are explained here.
E.g., try the following:
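The built-in numbers show this contract in action: 1, 1.0 and True compare equal and hash equal, so a dict treats them as one key. A short sketch:

```python
print(1 == 1.0 == True)                    # True
print(hash(1) == hash(1.0) == hash(True))  # True

d = {1: "int"}
d[1.0] = "float"  # same key as 1, so the value is overwritten
d[True] = "bool"  # same key again

print(d)  # {1: 'bool'}
```

The original key object (the int 1) is kept and only the value is replaced.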
a = []
a.__hash__ == None # True
aa = 'xyz'
aa.__hash__ == None # False
a = (1,1,) # A Tuple, hashable
a.__hash__ == None # False
Hope that helps
It might be easier to ignore dictionaries for a moment, and just think of sets. You can probably see how a set s might consist of {'a', 1, some_object}, right? And that if you tried to add 1 to this set, it wouldn't do anything (since 1 is already there)?
Now, suppose you try to add another_object to s. another_object is not 1 or 'a', so to see if it can be added to s, Python will see if another_object is equal to some_object. Understanding object equality is a whole other subject, but suffice it to say, there is a sensible way to go about it. If another_object == some_object is true, then s will remain unchanged. Otherwise, s will consist of {'a', 1, some_object, another_object}.
Hopefully, this makes sense to you. If it does, then just think of dictionaries as special sets. The keys of the dictionary are the entries of the set, and the dictionary just holds a single value for each key. Saying d['and now'] = 'completely different' is just the same thing as deleting 'and now' from the set, and then adding it back again with 'completely different' as its associated value. (This isn't technically how a dict handles __setitem__, but it can be helpful to think of a dict as working this way. In reality, sets are more like crippled dicts than dicts are like extra powerful sets.)
Of course, with dicts you should bear in mind that only hashable objects are allowed in a dict. But that is itself a different subject, and not really the crux of your question, AFAICT
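The analogy can be tried out directly. This is just a sketch with throwaway values.

```python
s = {"a", 1}
s.add(1)       # 1 is already present, so nothing changes
print(len(s))  # 2

d = {"and now": "something"}
d["and now"] = "completely different"  # same key, new value
print(len(d))                          # 1
print(d["and now"])                    # completely different

# The keys of a dict behave like a set.
print(d.keys() == {"and now"})  # True
```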