I am trying to figure out the tradeoffs between different ways of determining whether you can perform the action do_stuff() on an object obj. As I understand it, there are three ways to check this:
# Way 1
if isinstance(obj, Foo):
    obj.do_stuff()
# Way 2
if hasattr(obj, 'do_stuff'):
    obj.do_stuff()
# Way 3
try:
    obj.do_stuff()
except:
    print('Do something else')
Which is the preferred method (and why)?
I believe that the last method is generally preferred by Python coders because of a motto taught in the Python community: "Easier to ask for forgiveness than permission" (EAFP).
In a nutshell, the motto means to avoid checking if you can do something before you do it. Instead, just run the operation. If it fails, handle it appropriately.
Also, the third method has the added advantage of making it clear that the operation should work.
With that said, you really should avoid using a bare except like that. Doing so will capture any/all exceptions, even the unrelated ones. Instead, it is best to capture exceptions specifically.
Here, you will want to catch AttributeError:
try:
    obj.do_stuff()  # Try to invoke do_stuff
except AttributeError:
    print('Do something else')  # If unsuccessful, do something else
Checking with isinstance runs counter to the Python convention of using duck typing.
hasattr works fine, but it is Look Before You Leap (LBYL) instead of the more Pythonic EAFP.
Your implementation of way 3 is dangerous, since it catches any and all errors, including those raised by the do_stuff method. You could go with the more precise:
try:
    _ds = obj.do_stuff
except AttributeError:
    print('Do something else')
else:
    _ds()
But in this case, I'd prefer way 2 despite the slight overhead - it's just way more readable.
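To illustrate why wrapping the whole call is risky even with a specific except clause, here is a minimal sketch. The class Broken and the two helper functions are hypothetical names made up for this example; the point is only that an AttributeError raised inside do_stuff is indistinguishable from a missing do_stuff when the try block covers the call itself.

```python
class Broken(object):
    """Hypothetical class whose do_stuff exists but fails internally."""

    def do_stuff(self):
        # Bug inside the method: 'missing' is not an attribute of self.
        return self.missing


def call_naive(obj):
    """try/except AttributeError wrapped around the whole call."""
    try:
        return obj.do_stuff()
    except AttributeError:
        return 'fallback'


def call_precise(obj):
    """Only the attribute lookup is guarded; the call itself is not."""
    try:
        method = obj.do_stuff
    except AttributeError:
        return 'fallback'
    else:
        return method()
```

With these definitions, call_naive(Broken()) quietly returns 'fallback' and hides the bug, while call_precise(Broken()) lets the internal AttributeError propagate. Both return 'fallback' for an object that has no do_stuff at all.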
The correct answer is 'neither'.
hasattr delivers the functionality; however, it is possibly the worst of all the options.
We use the object-oriented nature of Python because it works. OO analysis is never accurate and often confuses; however, we use class hierarchies because we know they help people do better work faster. People grasp objects, and a good object model helps coders change things more quickly and with fewer errors. The right code ends up clustered in the right places. The objects:
Can just be used without considering which implementation is present
Make it clear what needs to be changed and where
Isolate changes to some functionality from changes to some other functionality – you can fix X without fearing you will break Y
hasattr vs isinstance
Having to use isinstance or hasattr at all indicates the object model is broken or we are using it incorrectly. The right thing to do is to fix the object model or change how we are using it.
These two constructs have the same effect, and in the imperative ‘I need the code to do this’ sense they are equivalent. Structurally, there is a huge difference. To someone meeting this method for the first time (or after some months of doing other things), isinstance conveys a wealth more information about what is actually going on and what else is possible. hasattr does not ‘tell’ you anything.
A long history of development led us away from FORTRAN and code with loads of ‘who am I’ switches. We choose to use objects because we know they help make the code easier to work with. By choosing hasattr we deliver the functionality, but nothing is fixed; the code is more broken than it was before we started. When adding or changing this functionality in the future, we will have to deal with code that is unequally grouped and has at least two organising principles: some of it is where it ‘should be’ and the rest is randomly scattered in other places. There is nothing to make it cohere. This is not one bug but a minefield of potential mistakes scattered over any execution path that passes through your hasattr.
So if there is any choice, the order is:
Use the object model or fix it or at least work out what is wrong with it and how to fix it
Use isinstance
Don’t use hasattr
Whenever I chain conditions in Python (or any other language tbh) I stumble upon asking myself this, kicking me out of the productive "Zone".
When I chain conditions, I can order them so that an earlier condition guards a later one that would otherwise raise an error.
As an example, let's assume the following snippet:
if "attr" in some_dictionary and some_value in some_dictionary["attr"]:
    print("whooohooo")
If the first condition were not checked first, or were absent altogether, the second condition might raise a KeyError.
I do this pretty often simply to save space in the code, but I have always wondered whether this is good style, whether it comes with a risk, or whether it's simply "pythonic".
A more Pythonic way is to "ask for forgiveness rather than permission". In other words, use a try-except block:
try:
    if some_value in some_dictionary["attr"]:
        print("Woohoo")
except KeyError:
    pass
Python is a late-binding language, which is reflected in these kinds of checks. The behavior is called short-circuiting. One thing I often do is:
def do(condition_check=None):
    if condition_check is not None and condition_check():
        pass  # do stuff
Now, many people will argue that try: except: is more appropriate. This really depends on the use case!
if statements are faster when the check is likely to fail, so use them when you know what is happening.
try blocks are faster when the check is likely to succeed, so use them to safeguard against exceptional circumstances.
if is explicit, so you know precisely what you are checking. Use it if you know what is happening, i.e. strongly typed situations.
try is implicit, so you only have to care about the outcome of a call. Use it when you don't care about the details, i.e. in weakly typed situations.
if works in a well-defined scope - namely right where you are performing the check. Use it for nested relations, where you want to check the top-most one.
try works on the entire contained call stack - an exception may be thrown several function calls deeper. Use it for flat or well-defined calls.
Basically, if is a precision tool, while try is a hammer - sometimes you need precision, and sometimes you just have nails.
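As a rough sketch of the scope difference described above: the if check only guards the one lookup it names, while the try block also catches a KeyError raised further down the call stack. The helper lookup_label and the labels dictionary are hypothetical, added only to have a "deeper" call that can fail on its own.

```python
def lookup_label(value, labels):
    # Hypothetical helper with its own dictionary access, which can
    # raise KeyError for reasons unrelated to the caller's check.
    return labels[value]


def check_with_if(some_dictionary, some_value, labels):
    # The 'if' guards exactly one thing: the presence of "attr".
    if "attr" in some_dictionary and some_value in some_dictionary["attr"]:
        return lookup_label(some_value, labels)
    return None


def check_with_try(some_dictionary, some_value, labels):
    # The 'try' also catches a KeyError raised inside lookup_label.
    try:
        if some_value in some_dictionary["attr"]:
            return lookup_label(some_value, labels)
    except KeyError:
        pass
    return None
```

If labels is missing the value, check_with_if lets that KeyError surface, whereas check_with_try returns None as if "attr" had been absent. Whether that is desirable depends on the use case, which is the tradeoff the list above is getting at.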
I have a class with a few methods. Each one sets some internal state and usually requires some other method to be called first to prepare the stage.
Typical invocation goes like this:
c = MyMysteryClass()
c.connectToServer()
c.downloadData()
c.computeResults()
In some cases only connectToServer() and downloadData() will be called (or even just connectToServer() alone).
The question is: how should those methods behave when they are called in the wrong order (or, in other words, when the internal state is not yet ready for their task)?
I see two solutions:
They should throw an exception
They should call correct previous method internally
Currently I'm using the second approach, as it allows me to write less code (I can just write c.computeResults() and know that the two other methods will be called if necessary). Plus, when I call them multiple times, I don't have to keep track of what was already called, and so I avoid reconnecting or downloading multiple times.
On the other hand, the first approach seems more predictable from the caller's perspective, and possibly less error-prone.
And of course, there is the possibility of a hybrid solution: throw an exception, and add another layer of methods with internal state detection and proper calling of the previous ones. But that seems to be a bit of an overkill.
Your suggestions?
They should throw an exception. As said in the Zen of Python: Explicit is better than implicit. And, for that matter, Errors should never pass silently. Unless explicitly silenced. If the methods are called out of order that's a programmer's mistake, and you shouldn't try to fix that by guessing what they mean. You might accidentally cover up an oversight in a way that looks like it works, but is not actually an accurate reflection of the programmer's intent. (That programmer may be future you.)
If these methods are usually called immediately one after another, you could consider collating them by adding a new method that simply calls them all in a row. That way you can use that method and not have to worry about getting it wrong.
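A minimal sketch of both suggestions, assuming the state can be tracked with two private attributes. The attribute names, the placeholder data, the choice of RuntimeError, and the run() method are all assumptions made for illustration; the real class would do actual connecting and downloading.

```python
class MyMysteryClass(object):
    def __init__(self):
        self._connected = False
        self._data = None

    def connectToServer(self):
        # Placeholder for the real connection logic.
        self._connected = True

    def downloadData(self):
        if not self._connected:
            raise RuntimeError("call connectToServer() before downloadData()")
        # Placeholder data standing in for a real download.
        self._data = [1, 2, 3]

    def computeResults(self):
        if self._data is None:
            raise RuntimeError("call downloadData() before computeResults()")
        return sum(self._data)

    def run(self):
        """Hypothetical convenience method that calls the steps in order."""
        self.connectToServer()
        self.downloadData()
        return self.computeResults()
```

Callers who want the whole pipeline use run(); callers who want individual steps get a clear error if they skip one.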
Note that classes that handle internal state in this way are sometimes called for but are often not, in fact, necessary. Depending on your use case and the needs of the rest of your application, you may be better off doing this with functions and actually passing connection objects, etc. from one method to another, rather than using a class to store internal state. See for instance Stop Writing Classes. This is just something to consider and not an imperative; plenty of reasonable people disagree with the theory behind Stop Writing Classes.
You should write exceptions. It is good programming practice to write Exceptions to make your code easier to understand for the following reasons:
What you describe fits the literal description of an "exception": it is an exception to normal proceedings.
If you build in some kind of workaround, you will likely end up with "spaghetti code", which is bad.
When you, or someone else, goes back and reads this code later, it will be difficult to understand if you do not provide the hint that it is an exception to have these methods executed out of order.
Here's a good source:
http://jeffknupp.com/blog/2013/02/06/write-cleaner-python-use-exceptions/
As my CS professor always said "Good programmers can write code that computers can read, but great programmers write code that humans and computers can read".
I hope this helps.
If it's possible, you should make the dependencies explicit.
For your example:
c = MyMysteryClass()
connection = c.connectToServer()
data = c.downloadData(connection)
results = c.computeResults(data)
This way, even if you don't know how the library works, there's only one order the methods could be called in.
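The same idea can be sketched with plain functions, where each step takes the previous step's result as a required argument. The function names, the dictionary standing in for a connection, and the placeholder values are hypothetical; this is only meant to show that a skipped step fails immediately with a TypeError instead of depending on hidden state.

```python
def connect_to_server(address):
    # Stand-in for a real connection object.
    return {"address": address}


def download_data(connection):
    # Placeholder payload; a real version would use the connection.
    return {"source": connection["address"], "values": [1, 2, 3]}


def compute_results(data):
    return sum(data["values"])


connection = connect_to_server("example.invalid")
data = download_data(connection)
results = compute_results(data)
```

Calling download_data() with no argument raises a TypeError right away, so the required order is enforced by the signatures themselves.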
setattr and getattr have kind of worked their way into my style of programming (mainly scientific stuff; my knowledge of Python is self-taught).
Considering that exec and eval carry an inherent danger, since in some cases they can lead to security issues, I was wondering whether the same argument is considered valid for setattr. (About getattr I found this question which contains some info, although the argumentation is not very convincing.)
From what I know, setattr can be used without worrying too much, but to be honest I don't trust my Python knowledge enough to be sure, and if I'm wrong I'd like to try to get rid of the habit of using setattr.
Any input is very much appreciated!
First, it could definitely make it easier to exploit an existing security hole.
For example, let's say you have code that does exec, eval, SQL queries or URLs built via string formatting, etc. And let's say you're passing, say, locals() or a filtered __dict__ to the formatting command or as the eval context or whatever. Using setattr clearly widens the security hole, making it much easier for me to find ways to attack your code, because you can no longer be sure what you're going to be passing to those functions.
But what if you don't do anything else unsafe? Is setattr safe then?
Not as bad, but it's still not safe. If I can influence the names of the attributes you're setting, I can, e.g., replace any method I want on your objects.
You can try to protect against this by, e.g., first checking that the old value was not callable, or not a method-type descriptor, or whatever. In the same way you can try to protect against people calling functions in eval or putting quotes and semicolons in SQL parameters and so on. This is effectively the same as any of those other cases. And it's a lot harder to try to close all illegitimate paths to an open door, than to just not open the door in the first place.
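To make the method-replacement risk concrete, here is a small sketch. The Account class and the apply_settings helper are hypothetical; the pattern to notice is that the attribute names come from data the caller does not control.

```python
class Account(object):
    """Hypothetical class used only for illustration."""

    def __init__(self):
        self.balance = 0

    def is_authorized(self):
        return False


def apply_settings(obj, settings):
    # Unsafe pattern: attribute names come straight from outside input.
    for name, value in settings.items():
        setattr(obj, name, value)


account = Account()
# An attacker-controlled mapping that overwrites a method on the instance.
apply_settings(account, {"is_authorized": lambda: True})
```

After this runs, account.is_authorized() returns True, because the instance attribute set by setattr shadows the method defined on the class.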
What if the name never comes from anything that can be influenced by the user?
Well, in that case, why are you using setattr? There is no good reason to call setattr with a literal.
Anyway, when Lattyware said that there are often better ways to solve the problem at hand, he was almost certainly talking about readability, maintainability, and idiomaticness. But the side effect of using those better ways is that you also often avoid any security implications.
90% of the time, the solution is to use a dict instead of an object. Unlike in JavaScript, they're not the same thing in Python, and they're not meant to be used the same way. A dict doesn't have inheritance or built-in special names that its keys could collide with, so you don't have to worry about any of that. It also has a more convenient syntax, where you can say d['foo'] = value instead of setattr(o, 'foo', value). And it's probably more efficient. And so on. But ultimately, the reason to use a dict is the conceptual reason: a dict is a named collection of values; a class instance is a representation of a model-space object, and those are not the same thing.
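A short sketch of the dict approach, with a hypothetical collect_settings helper. The key "keys" is chosen deliberately to show that dict keys live in a separate namespace from the dict's own methods, so an unlucky name cannot clobber anything.

```python
def collect_settings(pairs):
    """Store dynamically named values in a dict, not on an object."""
    settings = {}
    for name, value in pairs:
        settings[name] = value
    return settings


settings = collect_settings([("foo", 1), ("keys", 2)])
# The key "keys" does not interfere with the dict method of the same name.
names = sorted(settings.keys())
```

Here settings["keys"] is just a stored value, and settings.keys() still works as usual.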
So, why does setattr even exist?
It's there for the same basic reasons as other low-level features, like being able to access im_func or func_closure, or having modules like traceback and imp, or treating special methods just like any other methods, or for that matter exec and eval.
First, you can build higher-level things out of these low-level tools. For example, to build collections.namedtuple, you'd need either exec or setattr.
Second, you occasionally need to monkey-patch code at runtime because you can't modify it (or maybe even see it) at compile time, and tools like setattr can be essential to doing that.
The setattr feature, much like eval, is often misused by people coming from JavaScript, Tcl, or a few other languages. But as long as it can be used for good, you don't want to take it out of the language. (TOOWTDI shouldn't be taken so literally that only one program can ever be written.)
But that doesn't mean you should go around using this stuff whenever possible. You wouldn't write mylist.__getitem__(slice(1, 10, 2)) instead of mylist[1:10:2]. Sometimes, being able to call __getitem__ directly or build slice objects explicitly is a foundation for something that lets the rest of your code be more pythonic, or a way to localize a workaround to avoid infecting the rest of your code. Otherwise, there are clearer and simpler ways to do it.
I have been trying to use properties instead of specific setters and getters in my app. They seem more pythonic and generally make my code more readable.
More readable except for one issue: typos.
Consider the following simple example (note that my properties actually do some processing, even though the examples here just set or return a simple variable):
class GotNoClass(object):
    def __init__(self):
        object.__init__(self)
        self.__a = None
    def __set_a(self, a):
        self.__a = a
    def __get_a(self):
        return self.__a
    paramName = property(__get_a, __set_a)
if __name__ == "__main__":
    classy = GotNoClass()
    classy.paramName = 100
    print(classy.paramName)
    classy.paranName = 200
    print(classy.paramName)
    #oops! Typo above! as seen by this line:
    print(classy.paranName)
The output, as anyone who reads a little closely will see, is:
100
100
200
Oops. The second value should have been 200, except that I made a typo: I wrote paranName (two n's) instead of paramName.
This is easy to debug in this simple example, but it has been hurting me in my larger project. Since python happily creates a new variable when I accidentally meant to use a property, I get subtle errors in my code. Errors that I am finding hard to track down at times. Even worse, I once used the same typo twice (once as I was setting and later once as I was getting) so my code appeared to be working but much later, when a different branch of code finally tried to access this property (correctly) I got the wrong value - but it took me several days before I realized that my results were just a bit off.
Now that I know that this is an issue, I am spending more time closely reading my code, but ideally I would have a way to catch this situation automatically - if I miss just one I can introduce an error that does not show up until a fair bit of time has passed...
So I am wondering, should I just switch to using good old setters and getters? Or is there some neat way to avoid this situation? Do people just rely on themselves to catch these errors manually? Alas I am not a professional programmer, just someone trying to get some stuff done here at work and I don't really know the best way to approach this.
Thanks.
P.S.
I understand that this is also one of the benefits of Python and I am not complaining about that. Just wondering whether I would be better off using explicit setters and getters.
Have you tried a static analysis tool? Here is a great thread about them.
Depending on how your code works, you could try using __slots__. You'll then get an AttributeError when you try to assign to an attribute that's not listed in __slots__, which will make such typos more obvious.
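A minimal sketch of the __slots__ idea applied to the class from the question. The single-underscore names _a, _get_a and _set_a are a simplification of the original double-underscore names, and the typo_caught flag exists only to show the outcome.

```python
class GotNoClass(object):
    # Only '_a' may be stored on instances; assigning any other
    # attribute name raises AttributeError.
    __slots__ = ('_a',)

    def __init__(self):
        self._a = None

    def _get_a(self):
        return self._a

    def _set_a(self, a):
        self._a = a

    paramName = property(_get_a, _set_a)


classy = GotNoClass()
classy.paramName = 100
try:
    classy.paranName = 200  # typo: no such slot or property
except AttributeError:
    typo_caught = True
else:
    typo_caught = False
```

One caveat: a subclass that does not define its own __slots__ gets a per-instance __dict__ again, so the protection only holds if every class in the hierarchy declares __slots__.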
There are times when compile-time checking really saves time. You seem to have identified one such case. By accident rather than careful choice I use getters and setters, and am happy ;-)
What is the easiest way to check if something is a list?
A method doSomething has the parameters a and b. In the method, it will loop through the list a and do something. I'd like a way to make sure a is a list, before looping through - thus avoiding an error or the unfortunate circumstance of passing in a string then getting back a letter from each loop.
This question must have been asked before - however my googles failed me. Cheers.
To enable more use cases, but still treat strings as scalars, don't check that a is a list; check that it isn't a string:
if not isinstance(a, basestring):
    ...
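Note that basestring only exists in Python 2; in Python 3 the equivalent check is against str. A small sketch under that assumption, with a hypothetical do_something standing in for the doSomething method from the question:

```python
def do_something(a, b):
    """Hypothetical version of doSomething that loops over a."""
    if isinstance(a, str):
        # Treat a lone string as a single item, not a sequence of letters.
        a = [a]
    return [(item, b) for item in a]
```

With this, do_something("abc", 1) yields one pair for the whole string, while lists, tuples and other iterables are looped over as usual.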
Typechecking hurts the generality, simplicity, and maintainability of your code. It is seldom used in good, idiomatic Python programs.
There are two main reasons people want to typecheck:
To issue errors if the caller provides the wrong type.
This is not worth your time. If the user provides an incompatible type for the operation you are performing, an error will already be raised when the incompatibility is hit. It is worrisome that this might not happen immediately, but it typically doesn't take long at all and results in code that is more robust, simple, efficient, and easier to write.
Oftentimes people insist on this in the hope that they can catch all the dumb things a user can do. If a user is willing to do arbitrarily dumb things, there is nothing you can do to stop him. Typechecking mainly has the potential to keep out a user who comes in with his own types that are drop-in replacements for the ones replaced, or a user who recognizes that your function should actually be polymorphic and provides something different that can accept the same operations.
If I had a big system where lots of things made by lots of people should fit together right, I would use a system like zope.interface to test that everything fits together right.
To do different things based on the types of the arguments received.
This makes your code worse because your API is inconsistent. A function or method should do one thing, not fundamentally different things. This ends up being a feature not usually worth supporting.
One common scenario is to have an argument that can either be a foo or a list of foos. A cleaner solution is simply to accept a list of foos. Your code is simpler and more consistent. If it's an important, common use case only to have one foo, you can consider having another convenience method/function that calls the one that accepts a list of foos and lose nothing. Providing the first API would not only have been more complicated and less consistent, but it would break when the types were not the exact values expected; in Python we distinguish between objects based on their capabilities, not their actual types. It's almost always better to accept an arbitrary iterable or a sequence instead of a list and anything that works like a foo instead of requiring a foo in particular.
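A sketch of the "accept a list of foos, plus a convenience function" design. The names process_foos and process_foo, and the doubling used as the per-item operation, are hypothetical placeholders.

```python
def process_foos(foos):
    """Accept any iterable of foos (list, tuple, generator, ...)."""
    return [foo * 2 for foo in foos]


def process_foo(foo):
    """Convenience wrapper for the common single-item case."""
    return process_foos([foo])[0]
```

Neither function inspects types: process_foos works with anything that can be iterated, and each item only has to support the operation applied to it.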
As you can tell, I do not think either reason is compelling enough to typecheck under normal circumstances.
I'd like a way to make sure a is a list, before looping through
Document the function.
Type checking is usually not considered good style in Python, but try:
if isinstance(a, list):
    ...
(I think you may also check if a.__iter__ exists.)