Most Pythonic way to call dependent methods - python

I have a class with a few methods - each one sets some internal state, and each usually requires some other method to be called first to prepare the stage.
Typical invocation goes like this:
c = MyMysteryClass()
c.connectToServer()
c.downloadData()
c.computeResults()
In some cases only connectToServer() and downloadData() will be called (or even just connectToServer() alone).
The question is: how should those methods behave when they are called in the wrong order (or, in other words, when the internal state is not yet ready for their task)?
I see two solutions:
They should throw an exception
They should call correct previous method internally
Currently I'm using the second approach, as it allows me to write less code (I can just call c.computeResults() and know that the two other methods will be called if necessary). Plus, when I call them multiple times, I don't have to keep track of what was already called, so I avoid reconnecting or re-downloading.
On the other hand, the first approach seems more predictable from the caller's perspective, and possibly less error-prone.
And of course, there is the possibility of a hybrid solution: throw an exception, and add another layer of methods that detect the internal state and call the previous methods as needed. But that seems like a bit of overkill.
Your suggestions?

They should throw an exception. As said in the Zen of Python: Explicit is better than implicit. And, for that matter, Errors should never pass silently. Unless explicitly silenced. If the methods are called out of order that's a programmer's mistake, and you shouldn't try to fix that by guessing what they mean. You might accidentally cover up an oversight in a way that looks like it works, but is not actually an accurate reflection of the programmer's intent. (That programmer may be future you.)
If these methods are usually called immediately one after another, you could consider collating them by adding a new method that simply calls them all in a row. That way you can use that method and not have to worry about getting it wrong.
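For illustration, here is a minimal sketch of the exception-raising approach plus such a collating method; the internal attributes, the placeholder bodies, and the choice of RuntimeError are assumptions, not details from the question:
class MyMysteryClass:
    def __init__(self):
        self.connection = None
        self.data = None

    def connectToServer(self):
        self.connection = ...  # open the connection (details omitted)

    def downloadData(self):
        if self.connection is None:
            raise RuntimeError("downloadData() called before connectToServer()")
        self.data = ...  # fetch the data over self.connection

    def computeResults(self):
        if self.data is None:
            raise RuntimeError("computeResults() called before downloadData()")
        return ...  # compute using self.data

    def run(self):
        # Convenience method: performs all three steps in the right order.
        self.connectToServer()
        self.downloadData()
        return self.computeResults()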
Note that classes that handle internal state in this way are sometimes called for but are often not, in fact, necessary. Depending on your use case and the needs of the rest of your application, you may be better off doing this with functions and actually passing connection objects, etc. from one method to another, rather than using a class to store internal state. See for instance Stop Writing Classes. This is just something to consider and not an imperative; plenty of reasonable people disagree with the theory behind Stop Writing Classes.

You should raise exceptions. It is good programming practice to raise exceptions to make your code easier to understand, for the following reasons:
What you describe fits the literal meaning of "exception": it is an exception to normal proceedings.
If you build in some kind of workaround, you will likely end up with "spaghetti code" = BAD.
When you, or someone else, goes back and reads this code later, it will be difficult to understand if you do not provide the hint that executing these methods out of order is an exception.
Here's a good source:
http://jeffknupp.com/blog/2013/02/06/write-cleaner-python-use-exceptions/
As my CS professor always said "Good programmers can write code that computers can read, but great programmers write code that humans and computers can read".
I hope this helps.

If it's possible, you should make the dependencies explicit.
For your example:
c = MyMysteryClass()
connection = c.connectToServer()
data = c.downloadData(connection)
results = c.computeResults(data)
This way, even if you don't know how the library works, there's only one order the methods could be called in.
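And, echoing the first answer's point about Stop Writing Classes, once the data flows explicitly like this the class may not need internal state at all. A hypothetical function-based version of the same flow (all names invented):
def connect_to_server(host):
    return ...  # return a connection object

def download_data(connection):
    return ...  # return the data fetched over the connection

def compute_results(data):
    return ...  # return results computed from the data

results = compute_results(download_data(connect_to_server("example.com")))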

Related

Big exception handler or lots of try...except clauses

I have a question about code design in Python.
I'm working on a project in which there is a certain set of error types that I have to handle often, which results in lots of places with a try...except clause that repeats itself.
Now the question is: would it be preferable to create one exception handler (a decorator) and decorate with it all those functions that have those repeating errors?
The trade-off here is that if I create this exception-handler decorator, it will become quite a big class/function, which forces the person reading the code to understand another piece of (maybe) complicated logic to see how the error is handled, whereas if I don't use the decorator, it's pretty clear to the reader how it is handled.
Another option is to create multiple decorators for each of the types of the errors.
Or maybe just leave all those try...except clauses even though they are being repeated.
Any opinions on the matter and maybe other solutions? Thanks!
A lot of this is subjective, but I personally think it's better for exception handling code to be close to where the error is occurring, for the sake of readability and debugging ease. So:
The trade off here is that if I create this exception handler decorator it will become quite a big of a class/function
I would recommend against the Mr. Fixit class. When an error occurs and the debugger drops you into the Mr. Fixit, you then have to walk back quite a bit before you can figure out why the error happened and what needs to be fixed to make it go away. Likewise, an unfamiliar developer reading your code loses the ability to understand just one small snippet pertaining to a particular error, and now has to work through a large class. As an added issue, a lot of what's in the Mr. Fixit is irrelevant to the one error they're looking at, and the place where the error handling occurs is in an entirely different place. With decorators especially, I feel you are sacrificing readability (particularly for someone less familiar with decorators than you) while gaining little.
If written with some care, try/except blocks are not very performance-intensive and do not clutter up code too much. I would suggest erring on the side of more try/excepts, with each one close to what it's handling, so that you can tell at a glance how errors are handled for any given piece of code (without having to go to a different file).
If you are repeating code a lot, you can refactor either by moving the code inside the except into a function that can be called wherever it's needed, or by making the code inside the try its own function that does its error handling inside its body. When in doubt, keep it simple.
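For instance, a minimal sketch of those two refactorings, using an invented example (string-to-int parsing) and an assumed error type:
import logging

def handle_bad_number(exc, text):
    # Shared handler: the repeated response to the error lives in one place.
    logging.warning("could not parse %r: %s", text, exc)

# Option 1: the try/except stays local, so each call site still shows
# at a glance how its errors are handled.
def read_port(text):
    try:
        return int(text)
    except ValueError as exc:
        handle_bad_number(exc, text)
        return None

# Option 2: wrap the risky operation itself, so callers never repeat
# the try/except at all.
def parse_int(text, default=None):
    try:
        return int(text)
    except ValueError as exc:
        handle_bad_number(exc, text)
        return default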
Also, I hate being a close/off-topic stickler so I won't flag, but I do think this question is better suited to Programmers.SE (being an abstract philosophy/conceptual question) and you might get better responses on that site.

python isinstance vs hasattr vs try/except: What is better?

I am trying to figure out the tradeoffs between different approaches to determining whether you can call do_stuff() on an object obj. As I understand it, there are three ways to determine whether this is possible:
# Way 1
if isinstance(obj, Foo):
    obj.do_stuff()

# Way 2
if hasattr(obj, 'do_stuff'):
    obj.do_stuff()

# Way 3
try:
    obj.do_stuff()
except:
    print('Do something else')
Which is the preferred method (and why)?
I believe that the last method is generally preferred by Python coders because of a motto taught in the Python community: "Easier to ask for forgiveness than permission" (EAFP).
In a nutshell, the motto means to avoid checking if you can do something before you do it. Instead, just run the operation. If it fails, handle it appropriately.
Also, the third method has the added advantage of making it clear that the operation should work.
With that said, you really should avoid using a bare except like that. Doing so will capture any/all exceptions, even the unrelated ones. Instead, it is best to capture exceptions specifically.
Here, you will want to catch an AttributeError:
try:
    obj.do_stuff()  # try to invoke do_stuff
except AttributeError:
    print('Do something else')  # if unsuccessful, do something else
Checking with isinstance runs counter to the Python convention of using duck typing.
hasattr works fine, but is Look Before you Leap instead of the more Pythonic EAFP.
Your implementation of way 3 is dangerous, since it catches any and all errors, including those raised by the do_stuff method. You could go with the more precise:
try:
    _ds = obj.do_stuff
except AttributeError:
    print('Do something else')
else:
    _ds()
But in this case, I'd prefer way 2 despite the slight overhead - it's just way more readable.
The correct answer is 'neither'
hasattr delivers the functionality; however, it is possibly the worst of all options.
We use the object-oriented nature of Python because it works. OO analysis is never perfectly accurate and often confuses, but we use class hierarchies because we know they help people do better work faster. People grasp objects, and a good object model helps coders change things more quickly and with fewer errors. The right code ends up clustered in the right places. The objects:
Can just be used without considering which implementation is present
Make it clear what needs to be changed and where
Isolate changes to some functionality from changes to some other functionality – you can fix X without fearing you will break Y
hasattr vs isinstance
Having to use isinstance or hasattr at all indicates that the object model is broken or that we are using it incorrectly. The right thing to do is to fix the object model or change how we are using it.
These two constructs have the same effect, and in the imperative ‘I need the code to do this’ sense they are equivalent. Structurally, there is a huge difference. On meeting this method for the first time (or after some months of doing other things), isinstance conveys a wealth more information about what is actually going on and what else is possible. hasattr does not ‘tell’ you anything.
A long history of development led us away from FORTRAN and code with loads of ‘who am I’ switches. We choose to use objects because we know they help make the code easier to work with. By choosing hasattr we deliver the functionality, but nothing is fixed; the code is more broken than it was before we started. When adding or changing this functionality in the future, we will have to deal with code that is unevenly grouped and has at least two organising principles: some of it is where it ‘should be’ and the rest is randomly scattered in other places. There is nothing to make it cohere. This is not one bug but a minefield of potential mistakes scattered over every execution path that passes through your hasattr.
So if there is any choice, the order is:
Use the object model, or fix it, or at least work out what is wrong with it and how to fix it (see the sketch below)
Use isinstance
Don’t use hasattr
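As a rough sketch of what "using the object model" could look like in practice (all names invented): the behavior lives on the classes themselves, so callers need neither isinstance nor hasattr.
import math

class Shape:
    def area(self):
        raise NotImplementedError

class Circle(Shape):
    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r ** 2

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

def total_area(shapes):
    # No type inspection: every Shape reports its own area.
    return sum(shape.area() for shape in shapes)

print(total_area([Circle(1), Square(2)]))  # -> pi + 4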

Can the usage of `setattr` (and `getattr`) be considered as bad practice?

setattr and getattr kind of got into my style of programming (mainly scientific stuff; my knowledge of Python is self-taught).
Considering that exec and eval carry potential danger, since in some cases they might lead to security issues, I was wondering whether the same argument is considered valid for setattr. (About getattr I found this question, which contains some info, although the argumentation is not very convincing.)
From what I know, setattr can be used without worrying too much, but to be honest I don't trust my Python knowledge enough to be sure, and if I'm wrong I'd like to try to get rid of the habit of using setattr.
Any input is very much appreciated!
First, it could definitely make it easier to exploit an existing security hole.
For example, let's say you have code that does exec, eval, SQL queries or URLs built via string formatting, etc. And let's say you're passing, say, locals() or a filtered __dict__ to the formatting command or as the eval context or whatever. Using setattr clearly widens the security hole, making it much easier for me to find ways to attack your code, because you can no longer be sure what you're going to be passing to those functions.
But what if you don't do anything else unsafe? Is setattr safe then?
Not as bad, but it's still not safe. If I can influence the names of the attributes you're setting, I can, e.g., replace any method I want on your objects.
You can try to protect against this by, e.g., first checking that the old value was not callable, or not a method-type descriptor, or whatever. In the same way you can try to protect against people calling functions in eval or putting quotes and semicolons in SQL parameters and so on. This is effectively the same as any of those other cases. And it's a lot harder to try to close all illegitimate paths to an open door, than to just not open the door in the first place.
What if the name never comes from anything that can be influenced by the user?
Well, in that case, why are you using setattr? There is no good reason to call setattr with a literal.
Anyway, when Lattyware said that there are often better ways to solve the problem at hand, he was almost certainly talking about readability, maintainability, and idiomaticness. But the side effect of using those better ways is that you also often avoid any security implications.
90% of the time, the solution is to use a dict instead of an object. Unlike JavaScript, they're not the same thing in Python, and they're not meant to be used the same way. A dict doesn't have methods, or inheritance, or built-in special names, so you don't have to worry about any of that. It also has a more convenient syntax, where you can say d['foo'] = v instead of setattr(o, 'foo', v). And it's probably more efficient. And so on. But ultimately, the reason to use a dict is the conceptual reason: a dict is a named collection of values; a class instance is a representation of a model-space object, and those are not the same thing.
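A small sketch of that contrast, with invented names; untrusted keys in a dict cannot clobber anything else, while the attribute version can:
user_fields = {"color": "red", "size": 12}   # e.g. parsed from user input

# Dict version: dynamic names are plain keys; nothing else can be clobbered.
settings = {}
for key, value in user_fields.items():
    settings[key] = value
print(settings["color"])                     # -> red

# Attribute version: a hostile key such as "save" could shadow a real method.
class Config:
    def save(self):
        print("writing settings to disk")

config = Config()
for key, value in user_fields.items():
    setattr(config, key, value)              # unsafe if keys are untrusted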
So, why does setattr even exist?
It's there for the same basic reasons as other low-level features, like being able to access im_func or func_closure, or having modules like traceback and imp, or treating special methods just like any other methods, or for that matter exec and eval.
First, you can build higher-level things out of these low-level tools. For example, to build collections.namedtuple, you'd need either exec or setattr.
Second, you occasionally need to monkey-patch code at runtime because you can't modify it (or maybe even see it) at compile time, and tools like setattr can be essential to doing that.
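For instance, a minimal, entirely hypothetical monkey-patching sketch of the kind you might use in a test when the original class cannot be modified:
class NetworkClient:                 # stands in for third-party code
    def fetch(self, url):
        raise IOError("no network available in tests")

def fake_fetch(self, url):
    return "canned response for " + url

# setattr installs the replacement at runtime, without touching the source.
setattr(NetworkClient, "fetch", fake_fetch)

client = NetworkClient()
print(client.fetch("http://example.com"))    # -> canned response for ...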
The setattr feature—much like eval—is often misused by people coming from JavaScript, Tcl, or a few other languages. But as long as it can be used for good, you don't want to take it out of the language. (TOOWTDI shouldn't be taken so literally that only one program can ever be written.)
But that doesn't mean you should go around using this stuff whenever possible. You wouldn't write mylist.__getitem__(slice(1, 10, 2)) instead of mylist[1:10:2]. Sometimes, being able to call __getitem__ directly or build slice objects explicitly is a foundation to something that lets the rest of your code be more pythonic, or way to localize a workaround to avoid infecting the rest of your code. Otherwise, there are clearer and simpler ways to do it.

Duck typing: how to avoid name collisions?

I think I understand the idea of duck typing, and I would like to use it more often in my code. However, I am concerned about one potential problem: name collisions.
Suppose I want an object to do something. I know the appropriate method, so I simply call it and see what happens. In general, there are three possible outcomes:
The method is not found and an AttributeError is raised. This indicates that the object isn't what I think it is. That's fine, since with duck typing I'm either catching such an exception or willing to let the outer scope deal with it (or let the program terminate).
The method is found, it does precisely what I want, and everything is great.
The method is found, but it's not the method that I want; it's a same-name method from an entirely unrelated class. Execution continues until either an inconsistent state is detected later or, in the worst case, the program silently produces incorrect output.
Now, I can see how good-quality names can reduce the chances of outcome #3. But projects are combined, code is reused, libraries are swapped, and it's quite possible that at some point two methods have the same name and are completely unrelated (i.e., they are not intended to substitute for each other polymorphically).
One solution I was thinking about is to add a registry of method names. Each registry record would contain:
method name (unique; i.e., only one record per name)
its generalized description (i.e., applicable to any instance it might be called on)
the set of classes which it is intended to be used in
If a method is added to a new class, the class needs to be added to the registry (by hand). At that time, the programmer would presumably notice if the method is not consistent with the meaning already attached to it, and if necessary, use another name.
Whenever a method is called, the program would automatically verify that the name is in the registry and the class of the instance is one of the classes in the record. If not, an exception would be raised.
I understand this is a very heavy approach, but in some cases where precision is critical, I can see it might be useful. Has it been tried (in Python or other dynamically typed languages)? Are there any tools that do something similar? Are there any other approaches worth considering?
Note: I'm not referring to name clashes at the global level, where avoiding namespace pollution would be the right approach. I'm referring to clashes at the method names; these are not affected by namespaces.
Well, if this is critical, you probably should not be using duck typing...
In practice, programs are finite systems, and the range of possible types passed into any particular routine does not cause the issues you are worrying about (most often there's only ever one type passed in).
But if you want to address this issue anyway, Python provides ABCs (abstract base classes). These allow you to associate a "type" with any set of methods, and so would work something like the registry you suggest (you can either inherit from an ABC in the normal way, or simply "register" with it).
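A minimal sketch of both routes, using only the standard library's abc module; the interface and class names are invented:
from abc import ABC, abstractmethod

class Quacker(ABC):                  # a hypothetical "duck" interface
    @abstractmethod
    def quack(self):
        ...

class Duck(Quacker):                 # route 1: inherit in the normal way
    def quack(self):
        return "quack"

class Robot:                         # route 2: an unrelated class...
    def quack(self):
        return "beep quack"

Quacker.register(Robot)              # ...registered as a virtual subclass

print(isinstance(Duck(), Quacker))   # True
print(isinstance(Robot(), Quacker))  # True, via registration (taken on trust:
                                     # register() does not check the methods)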
You can then check for these types manually or automate the checking with decorators from pytyp.
But, despite being the author of pytyp, and finding these questions interesting, I personally do not find such an approach useful. In practice, what you are worrying about simply does not happen (if you want to worry about something, focus on the lack of documentation from types when using higher order functions!).
PS note - ABCs are purely metadata. They do not enforce anything. Also, checking with pytyp decorators is horrendously inefficient - you really want to do this only where it is critical.
If you are following good programming practice, or, let me rather say, if your code is Pythonic, then chances are you will seldom face such issues. Refer to the FAQ What are the “best practices” for using import in a module?.
It is generally not advised to clutter the namespace, and about the only time there could be a conflict is if you try to reuse Python reserved names or standard library names, or if a name conflicts with a module name. But if you encounter such a conflict, there is a serious issue with the code. For example:
Why would someone name a variable as list or define a function called len?
Why would someone name a variable difflib when s/he is intending to import it in the current namespace?
To address your problem, look at abstract base classes. They're a Pythonic way to deal with this issue; you can define common behavior in a base class, and even define ways to determine if a particular object is a "virtual baseclass" of the abstract base class. This somewhat mimics the registry you're describing, without requiring all classes know about the registry beforehand.
In practice, though, this issue doesn't crop up as often as you might expect. Objects with an __iter__ method or a __str__ method are simply broken if the methods don't work the way you expect. Likewise, if you say an argument to your function requires a .callback() method defined on it, people are going to do the right thing.
If you are worried that the lack of static type checking will let some bugs get through, the answer isn't to bolt on type checking, it is to write tests.
In the presence of unit tests, a type checking system becomes largely redundant as a means of catching bugs. While it is true that a type checking system can catch some bugs, it will only catch a small subset of potential bugs. To catch the rest you'll need tests. Those unit tests will necessarily catch most of the type errors that a type checking system would have caught, as well as bugs that the type checking system cannot catch.
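As a tiny illustration (the function and tests are invented), a unit test exercises the types and the logic at once:
import unittest

def average(numbers):               # hypothetical function under test
    return sum(numbers) / len(numbers)

class TestAverage(unittest.TestCase):
    def test_basic(self):
        # Catches type mistakes and logic bugs (e.g. a wrong divisor) alike.
        self.assertEqual(average([2, 4, 6]), 4)

    def test_wrong_type_fails_loudly(self):
        with self.assertRaises(TypeError):
            average(5)              # not iterable: fails at the call site

if __name__ == "__main__":
    unittest.main()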

Check if something is a list

What is the easiest way to check if something is a list?
A method doSomething has the parameters a and b. In the method, it will loop through the list a and do something. I'd like a way to make sure a is a list before looping through it, thus avoiding an error or the unfortunate circumstance of passing in a string and then getting back a letter from each loop.
This question must have been asked before - however my Googling failed me. Cheers.
To enable more use cases, but still treat strings as scalars, don't check whether a is a list; check that it isn't a string:
if not isinstance(a, str):  # on Python 2, use basestring instead of str
    ...
Typechecking hurts the generality, simplicity, and maintainability of your code. It is seldom used in good, idiomatic Python programs.
There are two main reasons people want to typecheck:
To issue errors if the caller provides the wrong type.
This is not worth your time. If the user provides an incompatible type for the operation you are performing, an error will already be raised when the incompatibility is hit. It is worrisome that this might not happen immediately, but it typically doesn't take long at all, and this style results in code that is more robust, simple, efficient, and easier to write.
Oftentimes people insist on this in the hope of catching all the dumb things a user can do. If a user is willing to do arbitrarily dumb things, there is nothing you can do to stop them. Typechecking mainly has the potential of locking out a user who comes in with their own types that are drop-in replacements for the expected ones, or a user who recognizes that your function should actually be polymorphic and provides something different that can accept the same operations.
If I had a big system where lots of things made by lots of people had to fit together right, I would use a system like zope.interface to test that everything fits together right.
To do different things based on the types of the arguments received.
This makes your code worse because your API is inconsistent. A function or method should do one thing, not fundamentally different things. This ends up being a feature not usually worth supporting.
One common scenario is to have an argument that can either be a foo or a list of foos. A cleaner solution is simply to accept a list of foos. Your code is simpler and more consistent. If it's an important, common use case only to have one foo, you can consider having another convenience method/function that calls the one that accepts a list of foos and lose nothing. Providing the first API would not only have been more complicated and less consistent, but it would break when the types were not the exact values expected; in Python we distinguish between objects based on their capabilities, not their actual types. It's almost always better to accept an arbitrary iterable or a sequence instead of a list and anything that works like a foo instead of requiring a foo in particular.
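A hypothetical sketch of that cleaner design: one consistent API that accepts any iterable, plus a thin convenience wrapper for the single-item case (all names invented):
def process_items(items):
    # Works on any iterable of strings: list, tuple, generator, ...
    return [item.upper() for item in items]

def process_item(item):
    # Convenience wrapper for the common single-item case.
    return process_items([item])[0]

print(process_items(["a", "b"]))    # -> ['A', 'B']
print(process_item("a"))            # -> 'A'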
As you can tell, I do not think either reason is compelling enough to typecheck under normal circumstances.
I'd like a way to make sure a is a list, before looping through
Document the function.
Usually it's considered bad style to perform type checks in Python, but try:
if isinstance(a, list):
    ...
(I think you may also check if a.__iter__ exists.)
