What's the pythonic way of conditional variable initialization? - python

Due to the scoping rules of Python, all variables once initialized within a scope are available thereafter. Since conditionals do not introduce new scope, constructs in other languages (such as initializing a variable before that condition) aren't necessarily needed. For example, we might have:
def foo(optionalvar = None):
    # some processing, resulting in...
    message = get_message()
    if optionalvar is not None:
        # some other processing, resulting in...
        message = get_other_message()
    # ... rest of function that uses message
or, we could have instead:
def foo(optionalvar = None):
    if optionalvar is None:
        # processing, resulting in...
        message = get_message()
    else:
        # other processing, resulting in...
        message = get_other_message()
    # ... rest of function that uses message
Of course, the get_message and get_other_message functions might be many lines of code and are basically irrelevant (you can assume that the state of the program after each path is the same); the goal here is making message ready for use beyond this section of the function.
I've seen the latter construct used several times in other questions, such as:
https://stackoverflow.com/a/6402327/18097
https://stackoverflow.com/a/7382688/18097
Which construct would be more acceptable?

Python also has a conditional expression (often called the ternary operator), which you can use here:
message = get_other_message() if optionalvar else get_message()
Or, if you want to compare strictly against None:
message = get_other_message() if optionalvar is not None else get_message()
Unlike the first example you posted, this doesn't call get_message() unnecessarily.
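A small sketch of why the conditional expression avoids the extra call: only the selected branch is evaluated. The two functions below are stand-ins for the question's get_message and get_other_message, and the calls list is purely illustrative bookkeeping.

```python
calls = []  # records which stand-in functions actually ran


def get_message():
    calls.append("get_message")
    return "default message"


def get_other_message():
    calls.append("get_other_message")
    return "other message"


def foo(optionalvar=None):
    # Only one side of the conditional expression is evaluated.
    message = get_other_message() if optionalvar is not None else get_message()
    return message


# 0 is falsy but not None, so the strict comparison picks the "other" branch.
result = foo(optionalvar=0)
```

Note that the shorter form, `if optionalvar else`, would treat 0, an empty string, or an empty list the same as None, which may or may not be what you want.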

In general, the second approach is better and more generic because it doesn't involve calling get_message unconditionally. That may be OK if the function is not resource intensive, but consider a search function:
def search(engine):
    results = get_from_google()
    if engine == 'bing':
        results = get_from_bing()
Obviously this is not good, and I can't think of a comparably bad scenario for the second case. So basically an approach which goes through all the options and finally falls back to the default is best, e.g.:
def search(engine):
    if engine == 'bing':
        results = get_from_bing()
    else:
        results = get_from_google()

I think it's more pythonic to not set an explicit rule about this, and instead just keep to the idea that smallish functions are better (in part because it's possible to keep in your mind just when new names are introduced).
I suppose though that if your conditional tests get much more complicated than an if/else you may run the risk of all of them failing and you later using an undefined name, resulting in a possible runtime error, unless you are very careful. That might be an argument for the first style, when it's possible.

The answer depends on if there are side effects of get_message() which are wanted.
In most cases clearly the second one wins, because the code which produces the unwanted result is not executed. But if you need the side effects, you should choose the first version.

It might be better (read: safer) to initialize your variable outside the conditions. If you have to define other conditions or even remove some, the user of message later on might get an uninitialized variable exception.
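To make the risk concrete, here is a minimal sketch, assuming a function where none of the branches matches. The function names and the kind values are made up for illustration. Inside a function, the "uninitialized variable exception" shows up as UnboundLocalError, a subclass of NameError.

```python
def pick_message(kind):
    # No branch assigns `message` when kind is neither "a" nor "b".
    if kind == "a":
        message = "first"
    elif kind == "b":
        message = "second"
    return message  # raises UnboundLocalError for any other kind


def pick_message_safe(kind):
    message = None  # initialized up front, so every path has a value
    if kind == "a":
        message = "first"
    elif kind == "b":
        message = "second"
    return message


try:
    pick_message("c")
except UnboundLocalError as exc:
    error_name = type(exc).__name__
```

The trade-off is that the safe version silently returns None for an unexpected kind, so the caller has to be prepared for that value.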

Related

How to call a python function when you don't need to assign result

Suppose you have a function Car.find_owner_from_plate_number(plate_number) that will raise an Exception if plate is unknown and return an Owner object if plate number exists.
Now, you do not need Owner information in your script, just to know if plate number exists (ie no exception raised)
owner = Car.find_owner_from_plate_number('ABC123')
_ = Car.find_owner_from_plate_number('ABC123')
Car.find_owner_from_plate_number('ABC123')
With first, IDE will complain that owner is not used afterwards
Second is ok since _ is a global variable, but will assign memory in line with Owner's size
Third should also do the job, cherry on the cake without consuming memory if I'm correct.
What's the best way / more pythonic between 2nd and 3rd way? I ask because I often see 2nd way but I would be tempted to say 3rd is best.
As a rule, if you don't care about the returned value of a function, then don't assign it to anything, and it will vanish once you're done with it. This applies not only in python, but also in Java and other garbage-collected languages. In C/C++ you need to be more careful with memory management to make sure you're not leaving any memory leaks, but they do allow this behavior, and if you didn't explicitly use malloc() it's usually fine.
As for whether it's good design for find_owner_from_plate_number() to be built this way: it's okay but not great. The optimal design (more idiomatic in Java, in my experience, less so in python since exceptions are more integrated into control flow in general) would probably be
if plate is present, return an Owner
if plate is not present, return None (or the language's equivalent null value, if not writing python)
and save exception-throwing for actual error behavior that isn't expected to happen during normal use (invalid license plate format, could not connect to database, etc.). In general, try to avoid using thrown exceptions for control flow, if you can avoid it.
That said, I've seen both patterns in use at various points, and using exception behavior in this way isn't unheard of.
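A rough sketch of the design described above, assuming a hypothetical in-memory OWNERS registry in place of the real Car class: an unknown plate returns None, while a malformed input still raises.

```python
OWNERS = {"ABC123": "Alice"}  # hypothetical registry, for illustration only


def find_owner_from_plate_number(plate_number):
    if not isinstance(plate_number, str) or not plate_number:
        # A genuine error: the input itself is malformed.
        raise ValueError("invalid plate number: %r" % (plate_number,))
    # An unknown plate is an expected outcome, so return None instead of raising.
    return OWNERS.get(plate_number)


# The caller only wants to know whether the plate exists.
plate_exists = find_owner_from_plate_number("ABC123") is not None
```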
TLDR: If you do not need the return value, do not assign it at all. If you need some parts of the return value, assign the rest to _.
Python explicitly has an "expression statement" that solely exists to evaluate an expression and ignore its return value. Since this is the simplest way to ignore return values, it should be the preferred approach whenever applicable.
Car.find_owner('ABC123')
When the return value should be ignored only partially, an "assignment statement" is required to assign the parts of interest. In this case, the name _ is idiomatic to mean "unused" for the parts that must be assigned to something but are not of interest; it is often used as *_¹ to ignore arbitrary many items.
# we only want the owner and ignore the model
owner, _ = Car.find_owner_and_model('ABC123')
# ignore everything but the first item
owner, *_ = Car.find_owner_model_make_and_other_stuff('ABC123')
Note that _ is still a fully functional variable which keeps its assigned object alive until the end of the scope or deletion. This is not usually an issue when used in short-lived functions, but may require explicit cleanup when used for large objects at global scope.
¹ The * is commonly known as the "star" or "splat" operator. It denotes iterable packing/unpacking.
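The three cases can be put side by side in a short sketch. The two functions are hypothetical stand-ins that return plain tuples; they are not part of any real Car API.

```python
def find_owner_and_model(plate_number):
    # Hypothetical stand-in returning a 2-tuple.
    return ("Alice", "Model T")


def find_owner_model_make_and_year(plate_number):
    # Hypothetical stand-in returning a longer tuple.
    return ("Alice", "Model T", "Ford", 1925)


# Expression statement: the call runs and its result is discarded.
find_owner_and_model("ABC123")

# Keep the first item, ignore the second.
owner, _ = find_owner_and_model("ABC123")

# Keep the first item, collect everything else into a throwaway list.
first_owner, *_ = find_owner_model_make_and_year("ABC123")

# `_` is an ordinary name and still references the leftover items.
leftovers = _
```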

Calling non-pure function in list comprehension

I have the following code (simplified):
def send_issue(issue):
    message = bot.send_issue(issue)
    return message

def send_issues(issues):
    return [send_issue(issue) for issue in issues]
As you see, send_issues and send_issue are non-pure functions. Is this considered a good practice (and Pythonic) to call non-pure functions in list comprehensions? The reason I want to do this is that this is convenient. The reason against this is that when you see a list comprehension, you expect that this code just generates the list and nothing more, but that's not the case.
UPD:
I actually want to create and return the list, contrary to this question.
The question here is - Do you really need to create the list?
If this is so it is okay but not the best design.
It is good practice for a function to do only one thing especially if it has a side effect like I/O.
In your case, the function is creating and sending a message.
To fix this you can create a function that is sending the message and a function which is generating the message.
It is better to write it as:
msgs = [bot.get_message(issue) for issue in issues]
for msg in msgs:
    bot.send(msg)
This is clearer and widens the use of the API while keeping the side effect isolated.
If you don't want to create another function, you can at least use map, since it says "map this function over every element":
list(map(bot.send_issue, issues))  # in Python 3, map is lazy; list() forces the calls and collects the results
Also, the function send_issue is not needed because it just wraps the bot.send_issue.
Adding such functions is only making the code noisy which is not a good practice.
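One caveat when relying on map for side effects: in Python 3 it returns a lazy iterator, so nothing is sent until the iterator is consumed. The sketch below uses a made-up send_issue that records its argument in a list instead of talking to a real bot.

```python
sent = []  # stands in for the side effect of sending


def send_issue(issue):
    sent.append(issue)
    return "sent:" + issue


issues = ["bug-1", "bug-2"]

lazy = map(send_issue, issues)  # nothing has been sent yet
sent_before = list(sent)        # snapshot taken before consuming the iterator

messages = list(lazy)           # consuming the iterator triggers the calls
```

This laziness is one more reason an explicit for loop tends to be the clearer choice when the side effect is the point.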

Use of a temporary variable vs repeatedly read same key/value from a dictionary

Background: I need to read the same key/value from a dictionary (exactly) twice.
Question: There are two ways, as shown below,
Method 1. Read it with the same key twice, e.g.,
sample_map = {'A':1,}
...
if sample_map.get('A', None) is not None:
    print("A's value in map is {}".format(sample_map.get('A')))
Method 2. Read it once and store it in a local variable, e.g.,
sample_map = {'A':1,}
...
ret_val = sample_map.get('A', None)
if ret_val is not None:
    print("A's value in map is {}".format(ret_val))
Which way is better? What are their Pros and Cons?
Note that I am aware that print() can naturally handle ret_val of None. This is a hypothetical example and I just use it for illustration purposes.
Under these conditions, I wouldn't use either. What you're really interested in is whether A is a valid key, and the KeyError (or lack thereof) raised by __getitem__ will tell you if it is or not.
try:
    print("A's value in map is {}".format(sample_map['A']))
except KeyError:
    pass
Of course, some would say there is too much code in the try block, in which case method 2 would be preferable.
try:
    ret_val = sample_map['A']
except KeyError:
    pass
else:
    print("A's value in map is {}".format(ret_val))
or the code you already have:
ret_val = sample_map.get('A')  # None is the default value for the second argument
if ret_val is not None:
    print("A's value in map is {}".format(ret_val))
There isn't any effective difference between the options you posted.
Python: List vs Dict for look up table
Lookups in a dict are about O(1) on average. The same goes for a variable you have stored.
Efficiency is about the same. In this case, I would skip defining the extra variable, since not much else is going on.
But in a case like below, where there's a lot of dict lookups going on, I have plans to refactor the code to make things more intelligible, as all of the lookups clutter or obfuscate the logic:
# At this point, assuming that these are floats is OK, since no thresholds had text values
if vname in paramRanges:
    """
    Making sure the variable is one that we have a threshold for
    """
    # We might care about it
    # Don't check the equal case, because that won't matter
    if float(tblChanges[vname][0]) < float(tblChanges[vname][1]):
        # Check lower tolerance
        # Distinction is important because tolerances are not always the same +/-
        if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(
                paramRanges[vname][2]):
            # Difference from default is greater than tolerance
            # vname : current value, default value, negative tolerance, tolerance units, change date
            alerts[vname] = (
                float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][2]),
                paramRanges[vname][0], tblChanges[vname][2]
            )
    if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(
            paramRanges[vname][1]):
        alerts[vname] = (
            float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][1]),
            paramRanges[vname][0], tblChanges[vname][2]
        )
In most cases—if you can't just rewrite your code to use EAFP as chepner suggests, which you probably can for this example—you want to avoid repeated method calls.
The only real benefit of repeating the get is saving an assignment statement.
If your code isn't crammed in the middle of a complex expression, that just means saving one line of vertical space—which isn't nothing, but isn't a huge deal.
If your code is crammed in the middle of a complex expression, pulling the get out may force you to rewrite things a bit. You may have to, e.g., turn a lambda into a def, or turn a while loop with a simple condition into a while True: with an if …: break. Usually that's a sign that you, e.g., really wanted a def in the first place, but "usually" isn't "always". So, this is where you might want to violate the rule of thumb—but see the section at the bottom first.
On the other side…
For dict.get, the performance cost of repeating the method is pretty tiny, and unlikely to impact your code. But what if you change the code to take an arbitrary mapping object from the caller, and someone passes you, say, a proxy that does a get by making a database query or an RPC to a remote server?
For single-threaded code, calling dict.get with the same arguments twice in a row without doing anything in between is correct. But what if you're taking a dict passed by the caller, and the caller has a background thread also modifying the same dict? Then your code is only correct if you put a Lock or other synchronization around the two accesses.
Or, what if your expression was something that might mutate some state, or do something dangerous?
Even if nothing like this is ever going to be an issue in your code, unless that fact is blindingly obvious to anyone reading your code, they're still going to have to think about the possibility of performance costs and ToCToU races and so on.
And, of course, it makes at least two of your lines longer. Assuming you're trying to write readable code that sticks to 72 or 79 or 99 columns, horizontal space is a scarce resource, while vertical space is much less of a big deal. I think your second version is easier to scan than your first, even without all of these other considerations, but imagine making the expression, say, 20 characters longer.
In the rare cases where pulling the repeated value out of an expression would be a problem, you still often want to assign it to a temporary.
Unfortunately, up to Python 3.7, you usually can't. It's either clumsy (e.g., requiring an extra nested comprehension or lambda just to give you an opportunity to bind a variable) or impossible.
But in Python 3.8, PEP 572 assignment expressions handle this case.
if (sample := sample_map.get('A', None)) is not None:
    print("A's value in map is {}".format(sample))
I don't think this is a great use of an assignment expression (see the PEP for some better examples), especially since I'd probably write this the way chepner suggested… but it does show how to get the best of both worlds (assigning a temporary, and being embeddable in an expression) when you really need to.
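The point about a caller passing in a mapping whose get is expensive can be illustrated with a small sketch. CountingMapping is a hypothetical class that only counts lookups; imagine each lookup being a database query or an RPC.

```python
from collections.abc import Mapping


class CountingMapping(Mapping):
    """Hypothetical mapping that counts lookups, standing in for a slow proxy."""

    def __init__(self, data):
        self._data = dict(data)
        self.lookups = 0

    def __getitem__(self, key):
        self.lookups += 1  # imagine a database query or RPC here
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def describe_twice(mapping):
    # Method 1: two lookups for the same key.
    if mapping.get("A") is not None:
        return "A's value in map is {}".format(mapping.get("A"))
    return None


def describe_once(mapping):
    # Method 2: one lookup, stored in a local variable.
    ret_val = mapping.get("A")
    if ret_val is not None:
        return "A's value in map is {}".format(ret_val)
    return None


twice = CountingMapping({"A": 1})
once = CountingMapping({"A": 1})
text_twice = describe_twice(twice)
text_once = describe_once(once)
```

Both functions return the same text, but the first one hits the underlying mapping twice, which is exactly the cost a reader has to think about when the temporary is omitted.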

Python: Should I avoid initialization of variables inside blocks?

Problem
I have a code like this
if condition:
    a = f(x)
else:
    a = g(y)
Initialization of a inside of the block looks bad for me. Can it be written better?
I cannot use ternary operator, because names of functions and/or lists of arguments are long.
Saying "long" I mean that the following expression
a = f(x) if condition else g(y)
will take more than 79 (sometimes even more than 119) symbols with real names instead of a, f, g, x, y and condition.
Usage of multiple slashes will make the code ugly and messy.
I don't want to initialize a with the result of one of the functions by default, because both functions are slow and I cannot allow such overhead:
a = g(y)
if condition:
    a = f(x)
I can initialize the variable with None, but is this solution pretty enough?
a = None
if condition:
    a = f(x)
else:
    a = g(y)
Let me explain my position: in C and C++ variables inside of a block have the block as their scope. In ES6 the let keyword was introduced — it allows to create variables with the same scoping rules as variables in C and C++. Variables defined with old var keyword have similar scoping rules as in Python.
That's why I think that initialization of variables should be made outside blocks if I want to use the variables outside these blocks.
Update
Here is more complicated example
for obj in gen:
    # do something with the `obj`
    if predicate(obj):
        try:
            result = f(obj)
        except Exception as e:
            log(e)
            continue
    else:
        result = g(obj)
    # do something useful with the `result`
else:
    result = h(obj)
display(result)
I go through elements of some generator gen, process them and perform some actions on the result on each iteration.
Then I want to do something with the last result outside of the loop.
Is it pythonic enough to not assign a dummy value to the result beforehand?
Doesn't this make the code less readable?
Question
Is it good to initialize variables inside if/else/for/etc. in Python?
Python has no block scope... the scope is the whole function and it's perfectly pythonic to write
if <condition>:
    a = f()
else:
    a = g()
If you want to write in C++ then write in C++ using C++, don't write C++ using Python... it's a bad idea.
Ok, there are two points that need to be clarified here which are fundamental to python.
There is no variable declaration/initialization in python. A statement like a = f(x) is simply a way to name the object that is returned by f as a. That name a can later be used to name any other object, no matter what its type is. See this answer.
A block in python is either the body of a module, a class or a function. Anything defined/named inside these objects is visible to later code until the end of the block. A loop or an if-else is not a block. So any name defined outside the loop or if/else will be visible inside and vice versa. See this. global and nonlocal objects are a little different. There is no let in python since that is the default behavior.
In your case the only concern is how you are using a further on in the code. If your code expects the type of objects returned by f or g, it should work fine unless there is an error, because at least one of the if or the else branches runs in normal operation, so a will refer to some kind of object (if the names were different in the if and the else, that would be a problem). If you want to make sure that the subsequent code does not break, you can use a try-except-else to catch any error raised by the functions and assign a default value to a in the except clause after appropriate reporting/logging of the error.
Hence to summarize and also to directly address your question, assigning names to objects inside an if-else statement or a loop is perfectly good practice provided:
The same name is used in both if and else clause so that the name is guaranteed to refer to an object at the end of the statement. Additional try-except-else error catching can take care of exceptions raised by the functions.
The names should not be too short, generic or something that does not make the intention of the code clear like a, res etc. A sensible name will lead to much better readability and prevent accidental use of the same name later for some other object thereby losing the original.
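The try-except-else suggestion might look roughly like this. It is only a sketch: f and g are hypothetical stand-ins, ZeroDivisionError is just the failure this particular f can produce, and the default value is an assumption about what the caller can tolerate.

```python
import logging

logger = logging.getLogger(__name__)


def f(x):
    # Hypothetical function that can fail.
    return 10 / x


def g(y):
    # Hypothetical alternative.
    return y * 2


def compute(condition, x, y, default=0.0):
    try:
        if condition:
            value = f(x)
        else:
            value = g(y)
    except ZeroDivisionError:
        logger.exception("computation failed; falling back to default")
        value = default  # the name is bound on the failure path too
    else:
        logger.debug("computed value %s", value)
    return value
```

Whichever path runs, value is bound by the time the function returns, so later code never sees an undefined name.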
Let me clarify what I meant in my comments.
# this is not, strictly, needed, but it makes the
# exception handler more robust
a = b = None
try:
    if condition:
        a = f(x)
        b = v(x)
    else:
        a = g(y)
        b = v2(x)
    return w(a, b)
except Exception as e:
    logger.exception("exception:%s" % (e))
    logger.exception(" the value of a was:%s" % (a))
    logger.exception(" the value of b was:%s" % (b))
    raise
This is pretty std code, you just want to wrap the whole thing in some logging code in case of exceptions. I re-raise the original exception, but could just as easily return a default value.
Problem is, unless the exception waits until return w(a, b) to happen, any access to a and b will throw its own NameError based on those variables not having been declared.
This has happened to me, a lot, with custom web unittesting code - I get a response from a get or post to a url and I run a bunch of tests against the response. If the original get/post failed, the response doesn't exist, so any diagnostics like pretty printing that response's attributes will throw an exception, forcing you to clean things up before your exception handler is useful.
So, to guard against this, I initialize any variable referred to in the exception handler to None. Of course, if needed, you also have to guard against a being None, with something like logger.exception("a.attr1:%s" % getattr(a, "attr1", "?"))
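A compact sketch of that failure mode, assuming a made-up fetch function that always fails: without the up-front None, the handler itself raises and masks the original error.

```python
def fetch(url):
    # Hypothetical stand-in for a request that fails.
    raise ConnectionError("could not reach %s" % url)


def check_without_init(url):
    try:
        response = fetch(url)
        return response
    except Exception:
        # `response` was never bound, so this line raises UnboundLocalError
        # and hides the original ConnectionError.
        return "status: %s" % getattr(response, "status", "?")


def check_with_init(url):
    response = None  # bound before the try, so the handler can use it safely
    try:
        response = fetch(url)
        return response
    except Exception:
        return "status: %s" % getattr(response, "status", "?")


try:
    check_without_init("http://example.invalid")
except UnboundLocalError as exc:
    masked_error = type(exc).__name__

diagnostic = check_with_init("http://example.invalid")
```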

When to type-check a function's arguments?

I'm asking about situations where if a wrong type of argument is passed to the function, it could:
Blow up the whole thing.
Return unexpected results
Return nothing
For instance, the function below expects the argument name to be a string. It would throw an exception for any type that doesn't have a startswith method.
def fruits(name):
    if name.startswith('O'):
        print('Is it Orange?')
There are other cases where a function could halt or cause damage to the system if execution proceeds without type-checking. Whenever there are a lot of functions or functions with a lot of arguments, type checking is tedious and makes the code unreadable. So, is there a standard for doing this? As to 'how to type check' - there are plenty of examples here on stackexchange, but I couldn't find any about where it would be appropriate to do so.
Another example would be:
def fruits(names):
    with open('important_file.txt', 'r+') as fil:
        for name in names:
            if name in fil:
                # Edit the file
Here, if names is a string, each character in it will influence the editing of the file. If it is any other iterable, each element provided by it will influence the editing. These two cases could produce different results.
So, when should we type-check an argument, and when should we not?
The answer off the top of my head would be: it depends where the input comes from.
If the functions are class methods that get invoked internally or things like that, you can assume the inputs are valid, because you wrote them!
For example
def add(x,y):
    return x + y

def multiply(a,b):
    product = 0
    for i in range(a):
        product = add(product, b)
    return product
In my add function, I could check that there is a + operator for the parameters x and y. But since I wrote the multiply function, and that is the only function that uses add, it is safe to assume the inputs will be int because that's how I wrote it. Now that argument stands on shaky ground for large code bases where you (hopefully) have shared code, so you can't be sure people don't misuse your functions. But that's why you comment them well to describe the correct use of said function.
If it has to read from a file, get user input, etc, then you may want to do some validation first.
I almost never do type checking in Python. In accordance with Pythonic philosophy I assume that me and other programmers are adult people capable of reading the code (or at least the documentation) and using it properly. I assume that we test our code before we let it destroy something important. After all in most cases if you do something wrong, you'll just see an error and Python's error messages are quite informative most of the time.
The only occasion when I sometimes check types is when I want my function to behave differently depending on the argument's type. But although I sometimes feel compelled to do this, I don't consider it a good practice.
Most often it happens when my function iterates over a list of strings and I fear (or want) I could get a single string passed into it by accident - this won't throw an error at once because unfortunately string is an iterable too.
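For that specific case, a single isinstance check at the boundary is usually enough. The helper below is a hypothetical sketch, not a standard library function.

```python
def normalize_names(names):
    # A lone string is iterable character by character, which is rarely
    # intended, so wrap it in a list before iterating.
    if isinstance(names, str):
        return [names]
    return list(names)


single = normalize_names("Orange")
several = normalize_names(("Orange", "Apple"))
```

Without the check, iterating over "Orange" would yield 'O', 'r', 'a', and so on, with no error to signal the mistake.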
