Is it a bad practice to modify function arguments?
_list = [1, 2, 3]

def modify_list(lst):
    lst.append(4)

print(_list)        # [1, 2, 3]
modify_list(_list)
print(_list)        # [1, 2, 3, 4] - the caller's list was changed
At first this answer was supposed to be a comment, but it needed more formatting and space to explain the example. ;)
If you:
know what you're doing,
can justify the use,
don't use mutable default arguments (they behave in a far too confusing way - see the sketch after this list; I can't imagine a case where their use would ever be justified),
don't use global mutables anywhere near that thing (modifying a mutable global's contents AND modifying a mutable argument's contents?),
and, most importantly, document this thing,
then it shouldn't cause much harm (though it might still bite you if you only think you know what you're doing but in fact don't) and can be useful!
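To illustrate why mutable default arguments are so confusing, here is a minimal sketch (the function name is made up):

def append_to(item, target=[]):
    # The default list is created once, when the function is defined,
    # and is then shared by every call that omits the argument.
    target.append(item)
    return target

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] - not [2]; both calls mutated the same list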
Example:
I've worked with scripts (written by other programmers) that used mutable arguments - in this case, dictionaries.
The script was meant to be run with threads but also allowed a single-threaded run. Using dictionaries instead of return values removed the difference between collecting results in single- and multi-threaded runs:
Normally the value returned by a thread is encapsulated, but we only used the values after .join anyway and didn't care about threads killed by exceptions (the single-threaded run was mostly for debugging/local use).
That way, dictionaries (more than one per function) were used to accumulate new results on each run, without the need to collect returned values manually and filter them (the called function knew which dict to put each result in, and used a lock to ensure thread safety).
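A minimal sketch of that pattern (all names here are made up; the real script was more involved):

import threading

results = {}
lock = threading.Lock()

def compute(task, out):
    # Instead of returning, write the result into the shared dict;
    # the lock makes concurrent writes safe.
    value = task * task  # stand-in for the real work
    with lock:
        out[task] = value

threads = [threading.Thread(target=compute, args=(n, results)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A single-threaded run consumes results exactly the same way:
# for n in range(5): compute(n, results)
print(results)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16} (insertion order may vary)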
Was it a "good" or "wrong" way of doing things?
In my opinion it was a pythonic way of doing things:
easily readable in both forms - dealing with the result was the same in single- and multi-threaded
data was "automatically" nicely formatted - as opposed to de-capsulating thread results, manual collecting and parsing them
and fairly easy to understand - with first and last point in my list above ;)
Related
My code triggers a warning in pylint:
def getInsertDefault(collection=['key', 'value'], usefile='defaultMode.xml'):
    return doInsert(collection, usefile, True)
The warning is pretty clear: it's Mutable Default Arguments, and I get the point that in several instances it can give a wrong impression of what's happening. There are several posts on SO already, but none of them seems to cover this case.
Most questions and examples deal with empty lists, which persist between calls and can cause errors.
I'm also aware it's better practice to change the code to getInsertDefault(collection=None, ...), but in this method the default is only used for initialization and I don't intend to do anything with the list except read it. (Why) is my code dangerous, or could it result in a pitfall?
--EDIT--
To the point: "Why is the empty dictionary a dangerous default value in Python?" would mostly answer the question.
Kind of: I am aware my code goes against convention and could result in a pitfall - but in this very specific case: am I safe?
I found the suggestion in the comments useful to use collection=('key', 'value') instead, as it's conventional and safe. Still, out of pure interest: could my previous attempt create some kind of major problem?
Assuming that doInsert() (and whatever code doInsert is calling) only ever reads collection, there is indeed no immediate issue - just a time bomb.
As soon as any part of the code seeing this list starts mutating it, your code will break in the most unexpected way, and you may have a hard time debugging the issue (imagine if what changes it is a third-party library function tens of stack frames away... and that's the best case, where the issue and its root cause are still in the same direct branch of the call stack - the list might be stored as an instance attribute somewhere and mutated by some unrelated call, and then you're in for some fun).
Now the odds that this ever happens are rather low, but given the potential for subtle bugs (it might just produce incorrect results once in a while, not necessarily crash the program) and the hard-to-track issues this introduces, you should think twice before assuming this is really "safe".
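A minimal sketch of the time bomb, reusing the question's function (doInsert is stubbed out here for illustration):

def doInsert(collection, usefile, flag):
    return collection  # stand-in: today it "only reads" and passes the list through

def getInsertDefault(collection=['key', 'value'], usefile='defaultMode.xml'):
    return doInsert(collection, usefile, True)

result = getInsertDefault()
result.append('surprise')   # some distant caller mutates what looks like its own list

print(getInsertDefault())   # ['key', 'value', 'surprise'] - every later call is polluted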
I usually do this:
[worker.do_work() for worker in workers]
This has the advantage of being very readable and contained in a single line, but the drawback of creating an object (a list of the return values) that I do not need, which means garbage collection is triggered unnecessarily.
The obvious alternative:
for worker in workers:
    worker.do_work()
is also quite readable, but uses two lines.
Is there a single-line way of achieving the same result, without creating unnecessary objects?
Sure, there is.
def doLotsOfWork(wks):
    for w in wks:
        w.do_work()
And now, your "one liner":
doLotsOfWork(workers)
In short, there's no "shorter" (or better) way besides using a for loop. I'd advise you not to use the list comprehension, because it relies on side effects - that's a code smell.
"GC" in python is quite different from java. A ref-count decrement is much much cheaper than mark-and-sweep. Benchmark it, then decide if you're placing too much emphasis on a small cost.
To make it a one liner, simply define a helper function and then it's a single line to invoke it. Bury the function in an imported library if convenient.
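A quick way to run that benchmark with timeit (a rough sketch; the worker is a trivial stand-in, so the absolute numbers won't mean much):

import timeit

class Worker:
    def do_work(self):
        return 42

workers = [Worker() for _ in range(10_000)]

def with_comprehension():
    [w.do_work() for w in workers]   # builds and discards a 10,000-item list

def with_loop():
    for w in workers:
        w.do_work()                  # no throwaway list

print(timeit.timeit(with_comprehension, number=100))
print(timeit.timeit(with_loop, number=100))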
Is there an easy way to determine/remember which functions return a new object and which act on the existing object? For example, list.append('new stuff') acts on the actual object, whereas string.rstrip() returns a new string that needs to be assigned somewhere.
I'm forever having to look it up (or open the Python interpreter to check quickly) to see which functions act in place and which return a new value.
There's no completely reliable way, but there are a few good heuristics:
If an object is immutable, none of its methods mutate it, of course, even the ones like str.replace that really sound mutative.
Things like sorted or set.intersection, with adjective or noun names, generally produce a new object.
If an object is mutable, methods with verb names generally modify the object. This would be stuff like list.append or set.add.
Unfortunately, not everyone picks good names, so we have things like numpy.sort, which produces a sorted copy of a NumPy array and really should have been called numpy.sorted. The most reliable way will always be to check the docs or test it out in an interpreter session.
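A short demonstration of those heuristics in the interpreter:

s = "  hello  "
s.rstrip()           # strings are immutable: this returns a new string...
print(repr(s))       # '  hello  ' - ...and s is untouched until you reassign it

lst = [3, 1, 2]
print(sorted(lst))   # [1, 2, 3] - adjective name: produces a new list
print(lst)           # [3, 1, 2] - original unchanged

lst.sort()           # verb name on a mutable object: sorts in place, returns None
print(lst)           # [1, 2, 3]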
That is what I normally do: I open the Python interpreter and try things out in small snippets. There are, however, more powerful ways to get metadata to display as you type; for example, you could use PyCharm to give you type hinting, which will show the functions available on the object you are accessing, as well as the arguments they take and hints about their return values.
As far as I'm aware, no, there is not an easier way than those you describe.
The way most people (including me!) learn is to do more programming, remember the frequently used functions, and check the manual for those you don't know :)
I'm creating a basic database utility class in Python. I'm refactoring an old module into a class. I'm now working on an executeQuery() function, and I'm unsure of whether to keep the old design or change it. Here are the 2 options:
(The old design:) Have one generic executeQuery method that takes the query to execute and a boolean commit parameter indicating whether to commit (insert, update, delete) or not (select), and decides with an if statement whether to commit or to select and return.
(This is the way I'm used to, though that may just be because the languages I've worked with don't let a function sometimes return something and sometimes not:) Have two functions, executeQuery and executeUpdateQuery (or something equivalent). executeQuery will execute a simple query and return a result set, while executeUpdateQuery will make changes to the DB (insert, update, delete) and return nothing.
Is it accepted to use the first way? It seems unclear to me, but maybe it's more Pythonic...? Python is very flexible; maybe I should take advantage of a feature that can't really be used this way in stricter languages...
And a second part of this question, unrelated to the main idea - what is the best way to return query results in Python? Using which function to query the database, in what format...?
It's probably just me and my FP fetish, but I think a function executed solely for its side effects is very different from a non-destructive function that fetches some data, and the two should therefore have different names - especially if the generic function would do something different depending on exactly that (which is what the commit parameter seems to imply).
As for how to return results... I'm a huge fan of generators, but if the library you use for database connections returns a list anyway, you might as well pass that list on - a generator wouldn't buy you anything in this case. But if it allows you to iterate over the results one at a time, seize the opportunity to save a lot of memory on larger queries.
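For instance, a minimal generator over a DB-API cursor might look like this (the function name is made up):

def iter_rows(cursor):
    # Yield rows one at a time instead of materializing the whole
    # result set, so large queries never live in memory all at once.
    while True:
        row = cursor.fetchone()
        if row is None:
            break
        yield row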
I don't know how to answer the first part of your question, it seems like a matter of style more than anything else. Maybe you could invoke the Single Responsibility Principle to argue that it should be two separate functions.
When you're going to return a sequence of indeterminate length, it's best to use a generator.
I'd have two methods, one which updates the database and one which doesn't. Both could delegate to a common private method, if they share a lot of code.
By separating the two methods, it becomes clear to callers what the different semantics are between the two, makes documenting the different methods easier, and clarifies what return types to expect. Since you can pull out shared code into private methods on the object, there's no worry about duplicating code.
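A rough sketch of that layout (class, method, and attribute names are made up; error handling omitted):

class Database:
    def __init__(self, conn):
        self.conn = conn

    def _execute(self, sql, params=()):
        # Shared private helper; both public methods delegate here.
        cursor = self.conn.cursor()
        cursor.execute(sql, params)
        return cursor

    def executeQuery(self, sql, params=()):
        # Read-only query: returns the fetched rows.
        return self._execute(sql, params).fetchall()

    def executeUpdateQuery(self, sql, params=()):
        # Mutating statement (insert/update/delete): commits, returns nothing.
        self._execute(sql, params)
        self.conn.commit()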
As for returning query results, it'll depend on whether you're loading all the results from the database before returning, or returning a cursor object. I'd be tempted to do something like the following:
with db.executeQuery('SELECT * FROM my_table') as results:
    for row in results:
        print(row['col1'], row['col2'])
... so the executeQuery method returns a context manager object (which cleans up any open connections, if needed) that also acts as a generator, and the rows it yields act as read-only dicts.
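One way to build such an object is contextlib.contextmanager wrapped around a cursor - a sketch that assumes a DB-API connection, with rows being whatever the driver returns:

from contextlib import contextmanager

@contextmanager
def executeQuery(conn, sql, params=()):
    # Yields an iterator of rows; the cursor is closed when the
    # with-block exits, even if iteration stops early or fails.
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        yield iter(cursor.fetchone, None)  # produce rows until fetchone() returns None
    finally:
        cursor.close()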
What is the easiest way to check if something is a list?
A method doSomething has the parameters a and b. In the method, it will loop through the list a and do something. I'd like a way to make sure a is a list before looping through it - thus avoiding an error, or the unfortunate circumstance of passing in a string and getting back a letter from each loop iteration.
This question must have been asked before - however, my googling failed me. Cheers.
To enable more use cases but still treat strings as scalars, don't check for a being a list; check that it isn't a string:
if not isinstance(a, basestring):  # Python 2; in Python 3, use str instead
    ...
Typechecking hurts the generality, simplicity, and maintainability of your code. It is seldom used in good, idiomatic Python programs.
There are two main reasons people want to typecheck:
To issue errors if the caller provides the wrong type.
This is not worth your time. If the user provides an incompatible type for the operation you are performing, an error will already be raised when the incompatibility is hit. It is worrisome that this might not happen immediately, but it typically doesn't take long at all, and the result is code that is more robust, simple, efficient, and easier to write.
Oftentimes people insist on this in the hope that they can catch all the dumb things a user might do. If a user is willing to do arbitrarily dumb things, there is nothing you can do to stop them. Typechecking mainly has the potential of locking out a user who comes in with their own types that are drop-in replacements for the expected ones, or a user who recognizes that your function should actually be polymorphic and provides something different that supports the same operations.
If I had a big system where lots of things made by lots of people had to fit together right, I would use a system like zope.interface to test that everything fits together.
To do different things based on the types of the arguments received.
This makes your code worse because your API is inconsistent. A function or method should do one thing, not fundamentally different things. This ends up being a feature that is usually not worth supporting.
One common scenario is an argument that can be either a foo or a list of foos. A cleaner solution is simply to accept a list of foos. Your code is simpler and more consistent. If the single-foo case is important and common, you can add another convenience method/function that wraps the one accepting a list of foos, and lose nothing. Providing the first API would not only be more complicated and less consistent, it would also break when the types were not the exact ones expected; in Python we distinguish between objects based on their capabilities, not their actual types. It's almost always better to accept an arbitrary iterable or a sequence instead of a list, and anything that works like a foo instead of requiring a foo in particular.
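A minimal sketch of that advice (foo, handle, and the function names are all made up):

def handle(foo):
    print("handling", foo)  # stand-in for the real per-foo work

def process_foos(foos):
    # Accept any iterable of foos - list, tuple, generator, ...
    for foo in foos:
        handle(foo)

def process_foo(foo):
    # Convenience wrapper for the common single-foo case.
    process_foos([foo])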
As you can tell, I do not think either reason is compelling enough to typecheck under normal circumstances.
I'd like a way to make sure a is a list, before looping through
Document the function.
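For example, a docstring that states the contract (the signature here is hypothetical):

def doSomething(a, b):
    """Process each element of a.

    a -- an iterable of items (note: a string will be iterated
         character by character, so pass a list of strings instead)
    b -- ...
    """
    for item in a:
        ...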
Usually it's considered bad style to typecheck in Python, but try
if isinstance(a, list):
    ...
(I think you may also check whether a.__iter__ exists.)
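Building on that idea, a Python 3 sketch that accepts any iterable while still treating strings as scalars (the wrapping behavior is an assumption about what doSomething should do):

from collections.abc import Iterable

def doSomething(a, b):
    # Treat a lone string (or any non-iterable value) as a single item.
    if isinstance(a, str) or not isinstance(a, Iterable):
        a = [a]
    for item in a:
        print(item, b)  # stand-in for the real per-item work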