Python safe dict navigation, The Right Way - python

TLDR summary
I wrote a function navigateDict that does a safe navigation on a dict, similar to dict.get() but nested. It replaces code like
if 1 in data and 'i' in data[1] and 'a' in data[1]['i']:
    print data[1]['i']['a']
else:
    print "Not found"
with the roughly equivalent
found = navigateDict(data, 1, 'i', 'a')
if found is not None:
    print found
else:
    print "Not found"
Is anything similar to this already part of the standard library?
Is there a more idiomatic way to do the same thing?
Any response that requires typing any path component key more than once is probably a non-answer.
Additional details
The implementation is as follows:
# Allow fallback value other than None
def navigateDictEx(d, keys, fallback=None):
    for key in keys:
        if key in d:
            d = d[key]
        else:
            return fallback
    return d

def navigateDict(d, *keys):
    return navigateDictEx(d, keys)
See the summary for example usage.
Pythonic or not, this function reduces repetition in a place where redundancy is a bad idea. For example, changing one path component in the example requires up to three distinct values to be modified as one in the original example, but only one in the modified example. Given my regular tendency to err, this is a big win.
Ultimately I'm asking this: Is there something in the standard library that does this, or am I going to need to find a place for it in my project's library?
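One limitation of the function above is that a stored None cannot be told apart from a miss, and a non-dict value in the middle of the path can make the `in` test raise. The following is a minimal sketch of a variant that addresses both, written for Python 3; the names navigate_dict and MISSING are hypothetical and not part of the original code.

```python
from collections.abc import Mapping

# Hypothetical sentinel: a unique object that can never be a stored value.
MISSING = object()

def navigate_dict(d, *keys, fallback=None):
    """Follow keys through nested mappings; return fallback on the first miss."""
    for key in keys:
        # Only descend into real mappings, so a string or int in the
        # middle of the path counts as a miss instead of raising.
        if isinstance(d, Mapping) and key in d:
            d = d[key]
        else:
            return fallback
    return d

data = {1: {'i': {'a': None}}}

found = navigate_dict(data, 1, 'i', 'a', fallback=MISSING)
print("Not found" if found is MISSING else found)

found = navigate_dict(data, 1, 'j', 'a', fallback=MISSING)
print("Not found" if found is MISSING else found)
```

Passing the sentinel as the fallback is optional; with the default of None the function behaves like the original navigateDict.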
If hits are expected to dominate misses
brionius correctly points out that catching KeyError will work:
try:
    print data[1]['i']['a']
except KeyError:
    print "Not found"
This might be the way I go; it's pretty terse and cuts the repetition. However, it does reflect an assumption that there will be more hits than misses. If there's a better way of assuming the opposite I'd like to know that, also.

One way to do this is as follows:
try:
    print data[1]['i']['a']
except KeyError:
    print "Not found!"
It's in line with the spirit of duck-typing. It may or may not be as fast, as I believe handling exceptions carries a certain amount of overhead, but it's certainly "safe".

A solution like this is cool:
https://twitter.com/raymondh/status/343823801278140417
>>> from collections import defaultdict
>>> infinite_defaultdict = lambda: defaultdict(infinite_defaultdict)
>>> d = infinite_defaultdict()
>>> d['x']['y']['z'] = 10
>>> if d['x']['y']['z']: print d['x']['y']['z'] #better reflects that misses are common
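One caveat with this approach is worth spelling out: merely reading a missing key from a defaultdict inserts it, so a "safe" lookup leaves empty entries behind. A small sketch, written for Python 3, that demonstrates the side effect (the key names 'q' and 'r' are arbitrary):

```python
from collections import defaultdict

def infinite_defaultdict():
    # Each missing key produces another auto-nesting defaultdict.
    return defaultdict(infinite_defaultdict)

d = infinite_defaultdict()
d['x']['y']['z'] = 10

before = 'q' in d          # the key has never been touched
if d['q']['r']:            # a read-only test of a missing path...
    print(d['q']['r'])
after = 'q' in d           # ...has created the key as a side effect

print(before, after)
```

If the structure is later serialized or iterated, those empty branches show up in the output, which may or may not matter for a given use.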

Years late to the game, but for anyone stumbling upon this, there still does not seem to be a native, fluent way to safely navigate a Python dict.
Enter RestResponse:
"RestResponse aims to be a fluent python object for interfacing with RESTful JSON APIs"
This library includes a NoneProp object that allows for safely navigating (and building) JSON data structures.
>>> import RestResponse
>>> data = RestResponse.parse({})
>>> data.property.is_none
None
>>> bool(data.property.is_none)
False
>>> isinstance(data.property.is_none, RestResponse.NoneProp)
True
>>> data.property.is_none = None
>>> isinstance(data.property.is_none, RestResponse.NoneProp)
False
>>> print data.pretty_print()
{
    "property": {
        "is_none": null
    }
}

Related

`isinstance` with non-conventional classinfo

Disclaimer:
Unlike 99.9% of people out there, I didn't pick up Python until very late in the progression of languages I write in. I won't harp on some of the odd behaviors of the import model, but I do have trouble understanding why type checking (i.e. "what kind of thing are you, random object some user has given me?") is all over the place.
Really this is just checking what class of data a thing is, but in Python it has never struck me as straightforward, and in my research on the interwebz, well, let's just say there are opinions, and the only thing anyone agrees on is using the term pythonic. My question boils down to type(x) == y vs isinstance(x, y) when the type isn't one of the more straightforward list, tuple, float, int, and so on.
Current Conundrum:
I need the ability to determine if an object that is being passed(either directly, or dynamically within a recursive routine) is not just an iterable, but more specifically an object created by scandir. Please don't get lost in the singular issue, i'll show i have many ways to get to this, but the bigger question is:
A) Is the method I'm using to coerce the output of type() going to bite me in the backside given a case I am not thinking of?
B) Am I missing a simpler way of accessing the 'class|type' of an object that is language-specific type of thing?
C) TBD
I'll start by showing maybe where the root of my disconnect comes from, and have a little fun with the people I know will take the time to answer this question properly by a first example in R.
I'm going to cast my own class attribute just to show what i'm talking about:
> a <- 1:3
> class(a)
[1] "integer"
> attr(a, "class")
[1] "integer"
Ok so, like in python, we can ask if this is an int(eger) etc. Now I can re-class as I see fit, which is getting to the point of where i'm going with the python issue:
> class(a) <- "i.can.reclass.how.i.want"
> class(a)
[1] "i.can.reclass.how.i.want"
> attr(a, "class")
[1] "i.can.reclass.how.i.want"
So now in python, let's say I have a data.frame, or as you all put it DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({"a":[1,2,3]})
>>> type(df)
pandas.core.frame.DataFrame
Ok, so if i want to determine if my object is a DataFrame:
>>> df = pd.DataFrame({"a":[1,2,3]})
# Get the mro of type(df)? and remove 'object' as an item in the mro tuple
>>> isinstance(df, type(df).__mro__[:-1])
True
# hmmmm
>>> isinstance(df, (pandas.core.frame.DataFrame))
NameError: name 'pandas' is not defined
# hmmm.. aight let's try..
>>> isinstance(df, (pd.core.frame.DataFrame))
True
# Lulz... alright then, I guess i get that, but why did __mro__ pass with pandas vs pd? Not the point...
For when you can't do that
# yes..i know.. 3.5+ os.scandir... focus on bigger picture of this question/issue
>>> import scandir
>>> a = scandir.scandir("/home")
>>> type(a)
posix.ScandirIterator
>>> str(type(scandir.scandir("/home")))
"<class 'scandir.ScandirIterator'>"
>>> isinstance(scandir.scandir("/home"), (scandir,scandir.ScandirIterator))
AttributeError: module 'scandir' has no attribute 'ScandirIterator'
# Okay fair enough.. kinda thought it could work like pandas, maybe can but I can't find it?
Question:
Does that mean that my only way of knowing the instance/type of certain objects like the scandir example are essentially the below type hacks?
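import re

def isinstance_from_type(x, class_info):
    # Pull the quoted name out of e.g. "<class 'scandir.ScandirIterator'>"
    _chunk = re.search(r"(?<=\s['|\"]).*?(?=['|\"])", str(type(x)), re.DOTALL)
    try:
        return _chunk.group(0) == str(class_info)
    except AttributeError:
        # re.search returned None: no quoted name found
        return False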
>>> a = scandir.scandir("/home")
>>> type(a) == "scandir.ScandirIterator"
False
>>> isinstance_from_type(a, "scandir.ScandirIterator")
True
Okay I get why i don't get a string back from calling type etc, but please let me know if there's a better, more universal and consistent method i simply don't know, or the hot and dangerous things that are coming using a regex; trust me.. i get it.
Thanks for reading and any/all feedback about the mechanics of this specific to python are welcomed.
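As a hedged sketch of two alternatives to parsing str(type(x)): the class name and module are available directly as attributes of the type, and a class that is not exported under a public name can still be captured once from a sample object and then used with isinstance. This assumes Python 3.6+ and uses os.scandir rather than the scandir backport; the helper names qualified_name and is_scandir_iterator are hypothetical.

```python
import os

def qualified_name(obj):
    """Return 'module.ClassName' for obj without parsing str(type(obj))."""
    cls = type(obj)
    return "%s.%s" % (cls.__module__, cls.__qualname__)

# Capture the iterator class once from a throwaway call, since it is
# not assumed to be importable by name.
with os.scandir(".") as probe:
    ScandirIterator = type(probe)

def is_scandir_iterator(obj):
    return isinstance(obj, ScandirIterator)

with os.scandir(".") as entries:
    print(is_scandir_iterator(entries))
    print(qualified_name(entries))   # module part is platform dependent

print(is_scandir_iterator(iter([])))
```

Comparing qualified_name(x) against a string is still a string comparison, but it avoids the regex; the isinstance route also respects subclasses.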

Avoid extra line for attribute check?

I am developing this Python project where I encounter a situation many times and I wondered if there is a better way.
There is a list of class instances. Some entries of the list are empty (filled with None).
Here is an example list.
ins_list = [ins_1, ins_2, None, ins_3, None]
I have to do some checks throughout the program flow. There are points where I need to check an attribute of these instances. But only an index is given for choosing an instance from the list, and it may point to one of the empty elements, which gives an error when the attribute is accessed. Here is an example program flow.
ind = 2
if ins_list[ind].some_attribute == "thing":
    # This would give error when empty element is selected.
I deal with this by using,
if ins_list[ind]:
    if ins_list[ind].some_attribute == "thing":
        # This works
I am okay with using this. However, the program is a long one and I apply this hundreds of times, which means I am producing redundant code and increasing the indentation level for no reason. Is there an easier, better way of doing this?
Use the boolean operator and:
if ins_list[ind] and ins_list[ind].some_attribute == "thing":
    # Code
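Another option that avoids naming the list element twice is the three-argument form of the built-in getattr, which returns a default instead of raising when the attribute is absent (and None has no such attribute). A minimal sketch for Python 3; the Instance class and the attribute_is helper are hypothetical stand-ins for the asker's objects.

```python
class Instance:
    """Hypothetical stand-in for the instances stored in the list."""
    def __init__(self, some_attribute):
        self.some_attribute = some_attribute

ins_list = [Instance("thing"), Instance("other"), None]

def attribute_is(instances, index, expected):
    # getattr falls back to None for empty (None) slots instead of raising.
    return getattr(instances[index], "some_attribute", None) == expected

for ind in range(len(ins_list)):
    if attribute_is(ins_list, ind, "thing"):
        print("match at index", ind)
```

The default should be a value that can never equal the expected one; with None as the default, this sketch would treat an empty slot as a match if the expected value were itself None.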
As coder proposed, you can remove None from your list, or use dictionaries instead, to avoid having to create an entry for each index.
I want to propose another way: you can create a dummy class and replace None with an instance of it. This way there will be no error if you set an attribute:
class dummy:
    def __nonzero__(self):
        return False
    def __setattr__(self, k, v):
        return

mydummy = dummy()
mylist = [ins_1, ins_2, mydummy, ins_3, mydummy]
Nothing will be stored on the dummy instances when setting an attribute.
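Note that the dummy class above only swallows attribute assignment; reading an attribute such as some_attribute from it would still raise AttributeError, and __nonzero__ is only consulted by Python 2. The following is a sketch of a null-object variant that also answers attribute reads, assuming that returning None for every unknown attribute is acceptable. The class name NullInstance is hypothetical.

```python
class NullInstance(object):
    """Placeholder for empty slots: falsy, ignores writes, reads give None."""

    def __bool__(self):          # truth test on Python 3
        return False
    __nonzero__ = __bool__       # same hook under its Python 2 name

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for every data attribute.
        return None

    def __setattr__(self, name, value):
        # Silently discard assignments so nothing is stored.
        pass

null = NullInstance()
null.some_attribute = "thing"    # ignored

if null.some_attribute == "thing":
    print("unexpected match")
else:
    print("empty slot, no error")
```

With such an object in place of None, the single-line check from the question works unchanged for empty slots.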
edit:
If the content of the original list cannot be chosen, then this class could help:
class PickyList(list):
    def __init__(self, iterable, dummyval):
        self.dummy = dummyval
        return super(PickyList, self).__init__(iterable)
    def __getitem__(self, k):
        v = super(PickyList, self).__getitem__(k)
        return (self.dummy if v is None else v)

mylist = PickyList(ins_list, mydummy)
There are these two options:
Using a dictionary:
Another way would be to use a dictionary instead. You could create the dictionary once the list is filled with elements. The dictionary's keys would be the values of your list, and as values you could use the attributes of the elements that are not None and "No_attr" for those that are None. (Note: keep in mind that Python dictionaries don't support duplicate keys, which is why I propose below to use your list indexes as keys; otherwise you will have to find a way to make the keys distinct.)
For example for a list like:
l = [item1,item2,None,item4]
You could create a dictionary:
d = {item1:"thing1", item2:"thing2", None:"No_attr", item4:"thing4"}
So in this way every time you would need to make a check, you wouldn't have to check two conditions, but you could check only the value, such as:
if d.values()[your_index]=="thing":
The only drawback of this method is that standard Python dictionaries are inherently unordered, which makes accessing dictionary values by index a bit dangerous: you have to be careful not to change the arrangement of the dictionary.
Now, if you want to make sure that the index stays stable, then you would have to store it some way, for example select as keys of your dictionary the indexes, as you will have already stored the attributes of the items - But that is something that you will have to decide and depends strongly on the architecture of your project.
Using a list:
When using a list, I don't think there is a way to avoid your if statement, and it is not actually bad. You could use the and operator, as already mentioned in another answer, but I don't think that makes any difference anyway.
Also, if you want to use your first approach:
if ins_list[ind].some_attribute == "thing":
You could try using an exception handler like this:
try:
    if ins_list[ind].some_attribute == "thing":
        # do something
        pass
except:
    # an error occurred
    pass
In this case I would use a try-except statement because of EAFP (easier to ask for forgiveness than permission). It won't shorten your code, but it's a more Pythonic way to code when checking for valid attributes. This way you won't violate DRY (Don't Repeat Yourself) either.
try:
    if ins_list[ind].some_attribute == "thing":
        # do_something()
        pass
except AttributeError:
    # do_something_else()
    pass

Converting a String into a variable

For my examine command, I don't want to do this:
def examine(Decision):
    if Decision == "examine sword":
        print sword.text
    elif Decision == "examine gold":
        print gold.text
    elif Decision == "examine cake":
        print cake.text
    ...
for every item in my game.
So I wanted to convert the second word of the Decision string into a variable so that I could use something like secondwordvar.text.
I tried to use eval(), but I always get an error when I make a spelling mistake in a single-word command.
The error:
IndexError: list index out of range
It works otherwise, though.
Right now my code is this:
def exam(Decision):
    try:
        examlist = shlex.split(Decision)
        useditem = eval(examlist[1])
        print useditem.text
    except NameError:
        print "This doesn't exist"
Does anyone have an idea for another option, how I could write that function in an easy way?
I should probably also include the full game. You can find it here:
http://pastebin.com/VVDSxQ0g
Somewhere in your program, create a dictionary mapping the name of the object to a variable that it represents. For example:
objects = {'sword': sword, 'gold': gold, 'cake': cake}
Then you can change your examine() function to something like the following:
def examine(Decision):
    tokens = shlex.split(Decision)
    if len(tokens) != 2 or tokens[0] != 'examine' or tokens[1] not in objects:
        print "This doesn't exist"
    else:
        print objects[tokens[1]].text
What you could do (with my somewhat limited knowledge of programming, this is the most advanced way I can see) is to use dictionaries. I'll try to explain in English, because my knowledge of code in this field is suspect and I don't want to mislead you.
Dictionaries are very array-like, allowing you to associate a decision with a value.
You would be able to associate "Examine sword" with an action code, say 4.
This would (in a hack-y way) allow you to convert your string to a variable, more by direct and consistent referencing of key/value pairs.
Good luck with this approach; Read up some on Dictionaries and you may very well find them easier to handle than it sounds!
Finally, as a form of good coding practice, never use eval() unless you are sure of what you are doing. eval() executes the code inside the (), so if, god forbid, some malicious process manages to run that code with a malicious line injected inside it:
eval(###DELETE EVERYTHING RAWR###)
You'll have a bad time. Sincerely.
Also, for the sake of evaluating code, I've heard that it is a very slow command, and that there are better alternatives, performance-wise.
Happy coding!
These two print the same text:
Using a dictionary:
texts = dict(sword = "wathever",
             gold = "eachever",
             cake = "whomever")

def examine_dict(decision):
    decision = decision.split()[1]
    print texts[decision]

examine_dict("examine sword")
Using object attributes (a class):
class Texts():
    sword = "wathever"
    gold = "eachever"
    cake = "whomever"

def examine_attribute(decision):
    decision = decision.split()[1]
    text = getattr(Texts, decision)
    print text

examine_attribute("examine sword")
Depending on what you want, one method can be more appropriate than the other. The dictionary-based method, however, is in general, the easier and the faster one.
Your variables are stored in a dictionary somewhere. If they are global variables, globals() returns this dictionary. You can use this to look up the variable by name:
globals()['sword'].text
If the variables are stored in a class as attributes, you can use getattr:
getattr(object, 'sword').text
You'll want to catch possible exceptions for bad names.
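As a sketch of what handling bad names could look like without a try block: getattr accepts a default as its third argument, and checking the number of words first avoids the IndexError on single-word commands. Written for Python 3; the Item and Items classes and the describe function are hypothetical and not taken from the linked game.

```python
class Item:
    """Hypothetical game item carrying a description."""
    def __init__(self, text):
        self.text = text

class Items:
    """Hypothetical namespace holding every item as a class attribute."""
    sword = Item("A sharp blade.")
    gold = Item("A pile of coins.")

def describe(decision):
    words = decision.split()
    # Guard against single-word commands before indexing.
    if len(words) != 2 or words[0] != "examine":
        return "This doesn't exist"
    # The third argument to getattr is returned when the name is unknown.
    item = getattr(Items, words[1], None)
    if not isinstance(item, Item):
        return "This doesn't exist"
    return item.text

print(describe("examine sword"))
print(describe("examine swrod"))
print(describe("examine"))
```

The isinstance check also rejects names such as __doc__ that exist on every class but are not items.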

Python FAQ: “How fast are exceptions?”

I was just looking at the Python FAQ because it was mentioned in another question. Having never really looked at it in detail before, I came across this question: “How fast are exceptions?”:
A try/except block is extremely efficient. Actually catching an exception is expensive. In versions of Python prior to 2.0 it was common to use this idiom:
try:
    value = mydict[key]
except KeyError:
    mydict[key] = getvalue(key)
    value = mydict[key]
I was a little bit surprised about the “catching an exception is expensive” part. Is this referring only to those except cases where you actually save the exception in a variable, or generally all excepts (including the one in the example above)?
I’ve always thought that using such idioms as shown would be very pythonic, especially as in Python “it is Easier to Ask Forgiveness than it is to get Permission”. Also many answers on SO generally follow this idea.
Is the performance for catching Exceptions really that bad? Should one rather follow LBYL (“Look before you leap”) in such cases?
(Note that I’m not directly talking about the example from the FAQ; there are many other examples where you just look out for an exception instead of checking the types before.)
Catching exceptions is expensive, but exceptions should be exceptional (read: not happen very often). If exceptions are rare, try/except is faster than LBYL.
The following example times a dictionary key lookup using exceptions and LBYL when the key exists and when it doesn't exist:
import timeit

s = []
s.append('''\
try:
    x = D['key']
except KeyError:
    x = None
''')
s.append('''\
x = D['key'] if 'key' in D else None
''')
s.append('''\
try:
    x = D['xxx']
except KeyError:
    x = None
''')
s.append('''\
x = D['xxx'] if 'xxx' in D else None
''')

for i,c in enumerate(s,1):
    t = timeit.Timer(c,"D={'key':'value'}")
    print('Run',i,'=',min(t.repeat()))
Output
Run 1 = 0.05600167960596991 # try/except, key exists
Run 2 = 0.08530091918578364 # LBYL, key exists (slower)
Run 3 = 0.3486251291120652 # try/except, key doesn't exist (MUCH slower)
Run 4 = 0.050621117060586585 # LBYL, key doesn't exist
When the usual case is no exception, try/except is "extremely efficient" when compared to LBYL.
The cost depends on implementation, obviously, but I wouldn't worry about it. It's unlikely going to matter, anyway. Standard protocols raise exceptions in strangest of places (think StopIteration), so you're surrounded with raising and catching whether you like it or not.
When choosing between LBYL and EAFP, worry about readability of the code, instead of focusing on micro-optimisations. I'd avoid type-checking if possible, as it might reduce the generality of the code.
If the case where the key is not found is more than exceptional, I would suggest using the 'get' method, which provides a constant speed in all cases:
s.append('''\
x = D.get('key', None)
''')
s.append('''\
x = D.get('xxx', None)
''')
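To compare all three approaches on a given machine, the snippets can be gathered into one self-contained script. This is a sketch for Python 3 with a reduced iteration count so it finishes quickly; the labels, the time_snippets helper, and the number=100000 setting are choices made here, and no particular timings are implied.

```python
import timeit

SETUP = "D = {'key': 'value'}"

# Statement text for each approach, for an existing and a missing key.
SNIPPETS = {
    "try/except, hit":  "try:\n    x = D['key']\nexcept KeyError:\n    x = None",
    "LBYL, hit":        "x = D['key'] if 'key' in D else None",
    "get, hit":         "x = D.get('key', None)",
    "try/except, miss": "try:\n    x = D['xxx']\nexcept KeyError:\n    x = None",
    "LBYL, miss":       "x = D['xxx'] if 'xxx' in D else None",
    "get, miss":        "x = D.get('xxx', None)",
}

def time_snippets(number=100000, repeat=3):
    """Return the best-of-repeat time in seconds for each snippet."""
    return {
        label: min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))
        for label, stmt in SNIPPETS.items()
    }

results = time_snippets()
for label, seconds in results.items():
    print("%-18s %.4f s" % (label, seconds))
```

Absolute numbers vary with interpreter version and hardware, so only the relative ordering within one run is meaningful.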

Basic Python: Exception raising and local variable scope / binding

I have a basic "best practices" Python question. I see that there are already StackOverflow answers tangentially related to this question but they're mired in complicated examples or involve multiple factors.
Given this code:
#!/usr/bin/python
def test_function():
    try:
        a = str(5)
        raise
        b = str(6)
    except:
        print b

test_function()
what is the best way to avoid the inevitable "UnboundLocalError: local variable 'b' referenced before assignment" that I'm going to get in the exception handler?
Does python have an elegant way to handle this? If not, what about an inelegant way? In a complicated function I'd prefer to avoid testing the existence of every local variable before I, for example, printed debug information about them.
Does python have an elegant way to handle this?
To avoid exceptions from printing unbound names, the most elegant way is not to print them; the second most elegant is to ensure the names do get bound, e.g. by binding them at the start of the function (the placeholder None is popular for this purpose).
If not, what about an inelegant way?
try: print 'b is', b
except NameError: print 'b is not bound'
In a complicated function I'd prefer to avoid testing the existence of every local variable before I, for example, printed debug information about them
Keeping your functions simple (i.e., not complicated) is highly recommended, too. As Hoare wrote 30 years ago (in his Turing acceptance lecture "The Emperor's old clothes", reprinted e.g. in this PDF):
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.
Achieving and maintaining simplicity is indeed difficult: given that you have to implement a certain total functionality X, it's the most natural temptation in the world to do so via complicated accretion into a few complicated classes and functions of sundry bits and pieces, "clever" hacks, copy-and-paste-and-edit-a-bit episodes of "drive-by coding", etc, etc.
However, it's a worthwhile effort to strive instead to keep your functions "so simple that there are obviously no deficiencies". If a function's hard to completely unit-test, it's too complicated: break it up (i.e., refactor it) into its natural components, even though it will take work to unearth them. (That's actually one of the ways in which a strong focus on unit testing helps code quality: by spurring you relentlessly to keep all the code perfectly testable, it's at the same time spurring you to make it simple in its structure.)
You can initialize your variables outside of the try block
a = None
b = None
try:
    a = str(5)
    raise
    b = str(6)
except:
    print b
You could check to see if the variable is defined in local scope using the built-in method locals()
http://docs.python.org/library/functions.html#locals
#!/usr/bin/python
def test_function():
    try:
        a = str(5)
        raise
        b = str(6)
    except:
        if 'b' in locals(): print b

test_function()
def test_function():
    try:
        a = str(5)
        raise
        b = str(6)
    except:
        print b
b = str(6) is never run; the program exits the try block right after raise. If you want to print some variables in the except block, evaluate them before raising the exception and put them into the exception you raise.
class MyException(Exception):
    def __init__(self, var):
        self.var = var

def test_function():
    try:
        a = str(5)
        b = str(6)
        raise MyException(b)
    except MyException as e:  # "as" form works on Python 2.6+ and 3
        print e.var
