Mutable Default Arguments - (Why) is my code dangerous?

Mutable Default Arguments - (Why) is my code dangerous? - python

My code triggers a warning in pylint:
def getInsertDefault(collection=['key', 'value'], usefile='defaultMode.xml'):
return doInsert(collection,usefile,True)
The warning is pretty clear, it's Mutable Default Arguments, I'm getting the point in several instances it can give a wrong impression of what's happening. There are several posts on SA already, but it doesn't feel this one here is covered.
Most questions and examples deal with empty lists which are weak-referenced and can cause an error.
I'm also aware it's better practice to change the code to getInsertDefault(collection=None ...) but in this method for default-initialization, I don't intend to do anything with the list except reading, (why) is my code dangerous or could result in a pitfall?
--EDIT--
To the point: Why is the empty dictionary a dangerous default value in Python? would be answering the question.
Kind of: I am aware my code is against the convention and could result in a pitfall - but in this very specific case: Am I safe?
I found the suggestion in the comments useful to use collection=('key', 'value') instead as it's conventional and safe. Still, out of pure interest: Is my previous attempt able to create some kind of major problem?

Assuming that doInsert() (and whatever code doInsert is calling) is only ever reading collection, there is indeed no immediate issue - just a time bomb.
As soon as any part of the code seeing this list starts mutating it, your code will break in the most unexpected way, and you may have a hard time debugging the issue (imagine if what changes is a 3rd part library function tens of stack frames away... and that's in the best case, where the issue and it's root cause are still in the same direct branch of the call stack - it might stored as an instance attribute somewhere and mutated by some unrelated call, and then you're in for some fun).
Now the odds that this ever happens are rather low, but given the potential for subtles (it might just result in incorrect results once in a while, not necessarily crash the program) and hard to track bugs this introduces, you should think twice before assuming that this is really "safe".

Related

Is it wrong practice to modify function arguments?

Is it a bad practice to modify function arguments?
_list = [1,2,3]
def modify_list(list):
list.append(4)
print(_list)
modify_list(_list)
print(_list)

At first it was supposed to be a comment, but it needed more formatting and space to explain the example. ;)
If you:
know what you're doing
can justify the use
don't use mutable default arguments (they are way too confusing in the way they behave, I can't imagine the reason their use would ever be justified)
don't use global mutables anywhere near that thing (modifying mutable global's contents AND modifying mutable argument's contents?)
and, most importantly, document this thing,
this thing shouldn't cause much harm (but still might bite you if you only think you know what you're doing, but in fact you don't) and can be useful!
Example:
I've worked with scripts (made by other programmers) that used mutable arguments. In this case: dictionaries.
The script was supposed to be run with threads but also allowed single-thread run. Using dictionaries instead of return values removed the difference of getting the result in single- and multiple-thread runs:
Normally value returned by a thread is encapsulated, but we only used the value after .join anyway and didn't care about threads killed by exceptions (single-thread run was mostly for debugging/local run).
That way, dictionaries (more than one in a single function) were used for appending new results in each run, without the need of collecting returned values manually and filtering them (the called function knew in which dict to put the result in, used lock to ensure thread safety).
Was it a "good" or "wrong" way of doing things?
In my opinion it was a pythonic way of doing things:
easily readable in both forms - dealing with the result was the same in single- and multi-threaded
data was "automatically" nicely formatted - as opposed to de-capsulating thread results, manual collecting and parsing them
and fairly easy to understand - with first and last point in my list above ;)

Most pythonic way to call dependant methods

I have a class with few methods - each one is setting some internal state, and usually requires some other method to be called first, to prepare stage.
Typical invocation goes like this:
c = MyMysteryClass()
c.connectToServer()
c.downloadData()
c.computeResults()
In some cases only connectToServer() and downloadData() will be called (or even just connectToServer() alone).
The question is: how should those methods behave when they are called in wrong order (or, in other words, when the internal state is not yet ready for their task)?
I see two solutions:
They should throw an exception
They should call correct previous method internally
Currently I'm using second approach, as it allows me to write less code (I can just write c.computeResults() and know that two other methods will be called if necessary). Plus, when I call them multiple times, I don't have to keep track of what was already called and so I avoid multiple reconnecting or downloading.
On the other hand, first approach seems more predictable from the caller perspective, and possibly less error prone.
And of course, there is a possibility for a hybrid solution: throw and exception, and add another layer of methods with internal state detection and proper calling of previous ones. But that seems to be a bit of an overkill.
Your suggestions?

They should throw an exception. As said in the Zen of Python: Explicit is better than implicit. And, for that matter, Errors should never pass silently. Unless explicitly silenced. If the methods are called out of order that's a programmer's mistake, and you shouldn't try to fix that by guessing what they mean. You might accidentally cover up an oversight in a way that looks like it works, but is not actually an accurate reflection of the programmer's intent. (That programmer may be future you.)
If these methods are usually called immediately one after another, you could consider collating them by adding a new method that simply calls them all in a row. That way you can use that method and not have to worry about getting it wrong.
Note that classes that handle internal state in this way are sometimes called for but are often not, in fact, necessary. Depending on your use case and the needs of the rest of your application, you may be better off doing this with functions and actually passing connection objects, etc. from one method to another, rather than using a class to store internal state. See for instance Stop Writing Classes. This is just something to consider and not an imperative; plenty of reasonable people disagree with the theory behind Stop Writing Classes.

You should write exceptions. It is good programming practice to write Exceptions to make your code easier to understand for the following reasons:
What you are describe fits the literal description of "exception" -- it is an exception to normal proceedings.
If you build in some kind of work around, you will likely have "spaghetti code" = BAD.
When you, or someone else goes back and reads this code later, it will be difficult to understand if you do not provide the hint that it is an exception to have these methods executed out of order.
Here's a good source:
http://jeffknupp.com/blog/2013/02/06/write-cleaner-python-use-exceptions/
As my CS professor always said "Good programmers can write code that computers can read, but great programmers write code that humans and computers can read".
I hope this helps.

If it's possible, you should make the dependencies explicit.
For your example:
c = MyMysteryClass()
connection = c.connectToServer()
data = c.downloadData(connection)
results = c.computeResults(data)
This way, even if you don't know how the library works, there's only one order the methods could be called in.

What is the pythonic way to bubble up error conditions

I have been working on a Python project that has grown somewhat large, and has several layers of functions. Due to some boneheaded early decisions I'm finding that I have to go a fix a lot of crashers because the lower level functions are returning a type I did not expect in the higher level functions (usually None).
Before I go through and clean this up, I got to wondering what is the most pythonic way of indicating error conditions and handling them in higher functions?
What I have been doing for the most part is if a function can not complete and return its expected result, I'll return None. This gets a little gross, as you end up having to always check for None in all the functions that call it.
def lowLevel():
## some error occurred
return None
## processing was good, return normal result string
return resultString
def highLevel():
resultFromLow = lowLevel()
if not resultFromLow:
return None
## some processing error occurred
return None
## processing was good, return normal result string
return resultString
I guess another solution might be to throw exceptions. With that you still get a lot of code in the calling functions to handle the exception.
Nothing seems super elegant. What do other people use? In obj-c a common pattern is to return an error parameter by reference, and then the caller checks that.

It really depends on what you want to do about the fact this processing can't be completed. If these are really exceptions to the ordinary control flow, and you need to clean up, bail out, email people etc. then exceptions are what you want to throw. In this case the handling code is just necessary work, and is basically unavoidable, although you can rethrow up the stack and handle the errors in one place to keep it tidier.
If these errors can be tolerated, then one elegant solution is to use the Null Object pattern. Rather than returning None, you return an object that mimics the interface of the real object that would normally be returned, but just doesn't do anything. This allows all code downstream of the failure to continue to operate, oblivious to the fact there was a failure. The main downside of this pattern is that it can make it hard to spot real errors, since your code will not crash, but may not produce anything useful at the end of its run.
A common example of the Null Object pattern in Python is returning an empty list or dict when you're lower level function has come up empty. Any subsequent function using this returned value and iterating through elements will just fall through silently, without the need for error checking. Of course, if you require the list to have at least one element for some reason, then this won't work, and you're back to handling an exceptional situation again.

On the bright side, you have discovered exactly the problem with using a return value to indicate an error condition.
Exceptions are the pythonic way to deal with problems. The question you have to answer (and I suspect you already did) is: Is there a useful default that can be returned by low_level functions? If so, take it and run with it; otherwise, raise an exception (`ValueError', 'TypeError' or even a custom error).
Then, further up the call stack, where you know how to deal with the problem, catch the exception and deal with it. You don't have to catch exceptions immediately -- if high_level calls mid-level calls low_level, it's okay to not have any try/except in mid_level and let high_level deal with it. It may be that all you can do is have a try/except at the top of your program to catch and log all uncaught and undealt-with errors, and that can be okay.

This is not necessarily Pytonic as such but experience has taught me to let exceptions "lie where they lie".
That is to say; Don't unnecessarily hide them or re-raise a different exception.
It's sometimes good practice to let the callee fail rather than trying to capture and hide all kinds of error conditions.
Obviously this topic is and can be a little subjective; but if you don't hide or raise a different exception, then it's much easier to debug your code and much easier for the callee of your functions or api to understand what went wrong.
Note: This answer is not complete -- See comments. Some or all of the answers presented in this Q&A should probably be combined in a nice way presneting the various problems and solutions in a clear and concise manner.

In Python, is set.pop() threadsafe?

I know that the builtin set type in python is not generally threadsafe, but this answer claims that it is safe to call pop() from two competing threads. Sure, you might get an exception, but your data isn't corrupted. I can't seem to find a doc that validates this claim. Is it true? Documentation, please!

If you look at the set.pop method in the CPython source you'll see that it doesn't release the GIL.
That means that only one set.pop will ever be happening at a time within a CPython process.
Since set.pop checks if the set is empty, you can't cause anything but an IndexError by trying to pop from an empty set.
So no, you can't corrupt the data by popping from a set in multiple threads with CPython.

I believe the Set "pop" operation to be thread-safe due to being atomic, in the sense that two threads won't be able to pop the same value.
I wouldn't rely on it's behavior if another thread was, for instance, iterating over that collection.
I couldn't find any concrete documentation either, just some topics that point in this direction. Python official documentation would indeed benefit with information of this kind.

Check if something is a list

What is the easiest way to check if something is a list?
A method doSomething has the parameters a and b. In the method, it will loop through the list a and do something. I'd like a way to make sure a is a list, before looping through - thus avoiding an error or the unfortunate circumstance of passing in a string then getting back a letter from each loop.
This question must have been asked before - however my googles failed me. Cheers.

To enable more usecases, but still treat strings as scalars, don't check for a being a list, check that it isn't a string:
if not isinstance(a, basestring):
...

Typechecking hurts the generality, simplicity, and maintainability of your code. It is seldom used in good, idiomatic Python programs.
There are two main reasons people want to typecheck:
To issue errors if the caller provides the wrong type.
This is not worth your time. If the user provides an incompatible type for the operation you are performing, an error will already be raised when the compatibility is hit. It is worrisome that this might not happen immediately, but it typically doesn't take long at all and results in code that is more robust, simple, efficient, and easier to write.
Oftentimes people insist on this with the hope they can catch all the dumb things a user can do. If a user is willing to do arbitrarily dumb things, there is nothing you can do to stop him. Typechecking mainly has the potential of keeping a user who comes in with his own types that are drop-in replacements for the ones replaced or when the user recognizes that your function should actually be polymorphic and provides something different that can accept the same operation.
If I had a big system where lots of things made by lots of people should fit together right, I would use a system like zope.interface to make testing that everything fits together right.
To do different things based on the types of the arguments received.
This makes your code worse because your API is inconsistent. A function or method should do one thing, not fundamentally different things. This ends up being a feature not usually worth supporting.
One common scenario is to have an argument that can either be a foo or a list of foos. A cleaner solution is simply to accept a list of foos. Your code is simpler and more consistent. If it's an important, common use case only to have one foo, you can consider having another convenience method/function that calls the one that accepts a list of foos and lose nothing. Providing the first API would not only have been more complicated and less consistent, but it would break when the types were not the exact values expected; in Python we distinguish between objects based on their capabilities, not their actual types. It's almost always better to accept an arbitrary iterable or a sequence instead of a list and anything that works like a foo instead of requiring a foo in particular.
As you can tell, I do not think either reason is compelling enough to typecheck under normal circumstances.

I'd like a way to make sure a is a list, before looping through
Document the function.

Usually it's considered not a good style to perform type-check in Python, but try
if isinstance(a, list):
...
(I think you may also check if a.__iter__ exists.)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Mutable Default Arguments - (Why) is my code dangerous? - python

Related

Is it wrong practice to modify function arguments?

Most pythonic way to call dependant methods

What is the pythonic way to bubble up error conditions

In Python, is set.pop() threadsafe?

Check if something is a list

Categories

Resources