Choosing the right design pattern for execution on a dictionary key - python

Which design pattern is preferred?
For a dictionary d that may or may not have a key named 'foo'...
Pattern A
if d.get('foo'):
func(d.get('foo'))
Pattern B
foo = d.get('foo')
if foo:
func(foo)
I think I prefer the 2-line approach of Pattern A. Does the second lookup cost more than the extra assignment of Pattern B?

The two things you're writing do completely different things.
The first one uses the string 'foo' as the argument to func.
The second one uses whatever value is in d['foo'] as the argument to func.
Which one is right depends entirely on which one is what you actually want to do.
In your edited version, doing the lookup twice is silly.
Of course it "costs more"—you have to hash 'foo' twice, and look up that hash value (and possibly probe a few times) twice, and that costs twice as much as doing it once. But that performance cost is unlikely to matter in any real program. (Even if your key were expensive to hash—which a three-character string is not—most types that are expensive to hash will cache the hash value the first time…)
A more realistic potential problem is that you've added a race condition for no good reason if you ever want to make your code multithreaded or just reentrant.
But, more importantly, repeating yourself always introduces opportunities for error, because you have to repeat yourself exactly right, and it's not always obvious when you haven't done so, especially if you later edit one of the copies. (The fact that you got the question wrong in the first place is a pretty good argument…) And for the same reasons, you also introduce opportunities for people to read your code wrong, or just stumble over reading it and have to think about something that should have been obvious. That's why DRY (Don't Repeat Yourself—unless you have a good reason to do so) is a fundamental principle in programming.
That being said, the first one would be much better written as if 'foo' in d:. If you don't actually need the value, don't retrieve it.
And the second one might be better written with EAFP instead of LBYL:
try:
func(d['foo'])
except KeyError:
pass
Or, in Python 3.5:
func(d['foo']) except KeyError: None

Related

Basic python question about assignment and changing variable

Extremely basic question that I don't quite get.
If I have this line:
my_string = "how now brown cow"
and I change it like so
my_string.split()
Is this acceptable coding practice to just straight write it like that to change it?
or should I instead change it like so:
my_string = my_string.split()
don't both effectively do the same thing?
when would I use one over the other?
how does this ultimately affect my code?
always try to avoid:
my_string = my_string.split()
never, ever do something like that. the main problem with that is it's going to introduce a lot of code bugs in the future, especially for another maintainer of the code. the main problem with this, is that the result of this the split() operation is not a string anymore: it's a list. Therefore, assigning a result of this type to a variable named my_string is bound to cause more problems in the end.
The first line doesn't actually change it - it calls the .split() method on the string, but since you're not doing anything with what that function call returns, the results are just discarded.
In the second case, you assign the returned values to my_string - that means your original string is discarded, but my_string no refers to the parts returned by .split().
Both calls to .split() do the same thing, but the lines of your program do something different.
You would only use the first example if you wanted to know if a split would cause an error, for example:
try:
my_string.split()
except:
print('That was unexpected...')
The second example is the typical use, although you could us the result directly in some other way, for example passing it to a function:
print(my_string.split())
It's not a bad question though - you'll find that some libraries favour methods that change the contents of the object they are called on, while other libraries favour returning the processed result without touching the original. They are different programming paradigms and programmers can be very divided on the subject.
In most cases, Python itself (and its built-in functions and standard libraries) favours the more functional approach and will return the result of the operation, without changing the original, but there are exceptions.

Use of a temporary variable vs repeatedly read same key/value from a dictionary

Background: I need to read the same key/value from a dictionary (exactly) twice.
Question: There are two ways, as shown below,
Method 1. Read it with the same key twice, e.g.,
sample_map = {'A':1,}
...
if sample_map.get('A', None) is not None:
print("A's value in map is {}".format(sample_map.get('A')))
Method 2. Read it once and store it in a local variable, e.g,
sample_map = {'A':1,}
...
ret_val = sample.get('A', None)
if ret_val is not None:
print("A's value in map is {}".format(ret_val))
Which way is better? What are their Pros and Cons?
Note that I am aware that print() can naturally handle ret_val of None. This is a hypothetical example and I just use it for illustration purposes.
Under these conditions, I wouldn't use either. What you're really interested in is whether A is a valid key, and the KeyError (or lack thereof) raised by __getitem__ will tell you if it is or not.
try:
print("A's value in map is {}".format(sample['A'])
except KeyError:
pass
Or course, some would say there is too much code in the try block, in which case method 2 would be preferable.
try:
ret_val = sample['A']
except KeyError:
pass
else:
print("A's value in map is {}".format(ret_val))
or the code you already have:
ret_val = sample.get('A') # None is the default value for the second argument
if ret_val is not None:
print("A's value in map is {}".format(ret_val))
There isn't any effective difference between the options you posted.
Python: List vs Dict for look up table
Lookups in a dict are about o(1). Same goes for a variable you have stored.
Efficiency is about the same. In this case, I would skip defining the extra variable, since not much else is going on.
But in a case like below, where there's a lot of dict lookups going on, I have plans to refactor the code to make things more intelligible, as all of the lookups clutter or obfuscate the logic:
# At this point, assuming that these are floats is OK, since no thresholds had text values
if vname in paramRanges:
"""
Making sure the variable is one that we have a threshold for
"""
# We might care about it
# Don't check the equal case, because that won't matter
if float(tblChanges[vname][0]) < float(tblChanges[vname][1]):
# Check lower tolerance
# Distinction is important because tolerances are not always the same +/-
if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(
paramRanges[vname][2]):
# Difference from default is greater than tolerance
# vname : current value, default value, negative tolerance, tolerance units, change date
alerts[vname] = (
float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][2]),
paramRanges[vname][0], tblChanges[vname][2]
)
if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(
paramRanges[vname][1]):
alerts[vname] = (
float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][1]),
paramRanges[vname][0], tblChanges[vname][2]
)
In most cases—if you can't just rewrite your code to use EAFP as chepner suggests, which you probably can for this example—you want to avoid repeated method calls.
The only real benefit of repeating the get is saving an assignment statement.
If your code isn't crammed in the middle of a complex expression, that just means saving one line of vertical space—which isn't nothing, but isn't a huge deal.
If your code is crammed in the middle of a complex expression, pulling the get out may force you to rewrite things a bit. You may have to, e.g., turn a lambda into a def, or turn a while loop with a simple condition into a while True: with an if …: break. Usually that's a sign that you, e.g., really wanted a def in the first place, but "usually" isn't "always". So, this is where you might want to violate the rule of thumb—but see the section at the bottom first.
On the other side…
For dict.get, the performance cost of repeating the method is pretty tiny, and unlikely to impact your code. But what if you change the code to take an arbitrary mapping object from the caller, and someone passes you, say, a proxy that does a get by making a database query or an RPC to a remote server?
For single-threaded code, calling dict.get with the same arguments twice in a row without doing anything in between is correct. But what if you're taking a dict passed by the caller, and the caller has a background thread also modifying the same dict? Then your code is only correct if you put a Lock or other synchronization around the two accesses.
Or, what if your expression was something that might mutate some state, or do something dangerous?
Even if nothing like this is ever going to be an issue in your code, unless that fact is blindingly obvious to anyone reading your code, they're still going to have to think about the possibility of performance costs and ToCToU races and so on.
And, of course, it makes at least two of your lines longer. Assuming you're trying to write readable code that sticks to 72 or 79 or 99 columns, horizontal space is a scarce resource, while vertical space is much less of a big deal. I think your second version is easier to scan than your first, even without all of these other considerations, but imagine making the expression, say, 20 characters longer.
In the rare cases where pulling the repeated value out of an expression would be a problem, you still often want to assign it to a temporary.
Unfortunately, up to Python 3.7, you usually can't. It's either clumsy (e.g., requiring an extra nested comprehension or lambda just to give you an opportunity to bind a variable) or impossible.
But in Python 3.8, PEP 572 assignment expressions handle this case.
if (sample := sample_map.get('A', None)) is not None:
print("A's value in map is {}".format(sample))
I don't think this is a great use of an assignment expression (see the PEP for some better examples), especially since I'd probably write this the way chepner suggested… but it does show how to get the best of both worlds (assigning a temporary, and being embeddable in an expression) when you really need to.

Why isn't there an ignore special variable in python?

Let's say I want to partition a string. It returns a tuple of 3 items. I do not need the second item.
I have read that _ is used when a variable is to be ignored.
bar = 'asd cds'
start,_,end = bar.partition(' ')
If I understand it correctly, the _ is still filled with the value. I haven't heard of a way to ignore some part of the output.
Wouldn't that save cycles?
A bigger example would be
def foo():
return list(range(1000))
start,*_,end = foo()
It wouldn't really save any cycles to ignore the return argument, no, except for those which are trivially saved, by noting that there is no point to binding a name to a returned object that isn't used.
Python doesn't do any function inlining or cross-function optimization. It isn't targeting that niche in the slightest. Nor should it, as that would compromise many of the things that python is good at. A lot of core python functionality depends on the simplicity of its design.
Same for your list unpacking example. Its easy to think of syntactic sugar to have python pick the last and first item of the list, given that syntax. But there is no way, staying within the defining constraints of python, to actually not construct the whole list first. Who is to say that the construction of the list does not have relevant side-effects? Python, as a language, will certainly not guarantee you any such thing. As a dynamic language, python does not even have the slightest clue, or tries to concern itself, with the fact that foo might return a list, until the moment that it actually does so.
And will it return a list? What if you rebound the list identifier?
As per the docs, a valid variable name can be of this form
identifier ::= (letter|"_") (letter | digit | "_")*
It means that, first character of a variable name can be a letter or an underscore and rest of the name can have a letter or a digit or _. So, _ is a valid variable name in Python but that is less commonly used. So people normally use that like a use and throw variable.
And the syntax you have shown is not valid. It should have been
start,*_,end = foo()
Anyway, this kind of unpacking will work only in Python 3.x
In your example, you have used
list(range(1000))
So, the entire list is already constructed. When you return it, you are actually returning a reference to the list, the values are not copied actually. So, there is no specific way to ignore the values as such.
There certainly is a way to extract just a few elements. To wit:
l = foo()
start, end = foo[0], foo[-1]
The question you're asking is, then, "Why doesn't there exist a one-line shorthand for this?" There are two answers to that:
It's not common enough to need shorthand for. The two line solution is adequate for this uncommon scenario.
Features don't need a good reason to not exist. It's not like Guido van Rossum compiled a list of all possible ideas and then struck out yours. If you have an idea for improved syntax you could propose it to the Python community and see if you could get them to implement it.

Flexible Arguments in Python

This has probably been asked before, but I don't know how to look up the answer, because I'm not sure what's a good way to phrase the question.
The topic is function arguments that can semantically be expressed in many different ways. For example, to give a file to a function, you could either give the file directly, or you could give a string, which is the path to the file. To specify a number, you might allow an integer as an argument, or maybe you might allow a string (the numeral), or you might even allow a string such as "one". Another example might be a function that takes a list (of numbers, say), but as a convenience, it will convert a number into a list containing one element: that number.
Is there a more or less standard approach in Python to allowing this sort of flexibility? It certainly complicates the code for a program if you're not certain what types the arguments are, so my guess would be to try factor out the convenience functions into just one place, instead of scattered everywhere, but I don't really know how best to do that kind of factoring.
No, there is not more or less a "standard" approach to this.
Don't do that! ;)
But if you still want to, I'd suggest that you have an intermediate class or function handling this for you:
Pseudocode:
def printTheNumber(num):
print num
def intermediatePrintTheNumber(input):
num_int_dict = {'one':1, "two":2 ....
if input.isstring():
printTheNumber(num_int_dict[input])
elif input.isint():
printTheNumber(input)
else:
print "Sorry Dave, I don't understand you"
If this is pythonic I don't know, but that's how I'd solve it if I had to, of course with some more checking of the input to deem it valid.
When it comes to your comment you mention semantic similarity i.e "one" and 1 might mean the same thing.
Where should this kind of conversion be made you ask.
Well that depends on the design of your system, but I can tell you that it should not be done in the same function that I call printTheNumber for one very simple reason, and that is that that would give that function way to much responsibility.
Depending on the complexity of the input it could be the integer 1 or the string "1" or, in the worse case, "one" or maybe even worse "uno"|"one"|"yxi"|"ett" .. and so on. This should be handled by a function that has only that responsibility maybe with a database handling the mapping.
I would split it up so that I have one function handling the the strings "one", "two" ... etc, and one handling integers and have a third function that checks the input to see if it can be converted to an integer or not.
As I see it, there is a warning for a fundamental flaw in the design if you have to take measures for this kind of complexity, but you seem to be aware of that so I won't go on about it.
A good way to factor out code common to several functions is through decorators. For example,
from functools import wraps
def takes_list(func):
#wraps(func)
def wrapper(arg):
if not isinstance(arg, list):
arg = [arg]
return func(arg)
return wrapper
#takes_list
def my_func(x):
"Does something with list x."
I should note that for cases such as files, you don't want to get in the way of Python's duck typing: doing a check isinstance(arg, file) has the problem that it won't allow file-like things such as io.StringIO. Instead, check against str (or basestring) or even let open do the checking for you, with try-except.
However, it's generally better practice just to let the caller pass what they like into the function, and fail if it isn't valid.
Since you can not add methods at runtime to builtin classes such as int or str you would make a switch case statement like structure as mentioned by Daniel Figueroa.
Another way would be to just convert:
def func(i):
if not isinstance(i, int):
i = int(i) # objects can overwrite __int__ if needed.
If you have own classes that you can add methods to, you may use double dispatch to do the same thing for you. Smalltalk uses this for the Integer-Float-... conversion.
Another way would be to use subject oriented programming for which I have not found an implementation yet but I tried: https://gist.github.com/niccokunzmann/4971938

else vs return to use in a function to prematurely stop processing

[Edit] changed return 0 to return. Side effects of beinga Python n00b. :)
I'm defining a function, where i'm doing some 20 lines of processing. Before processing, i need to check if a certain condition is met. If so, then I should bypass all processing. I have defined the function this way.
def test_funciton(self,inputs):
if inputs == 0:
<Display Message box>
return
<20 line logic here>
Note that the 20 line logic does not return any value, and i'm not using the 0 returned in the first 'if'.
I want to know if this is better than using the below type of code (in terms of performance, or readability, or for any other matter), because the above method looks good to me as it is one indentation less:
def test_function(self,inputs):
if inputs == 0:
<Display Message box>
else:
<20 line logic here>
In general, it improves code readability to handle failure conditions as early as possible. Then the meat of your code doesn't have to worry about these, and the reader of your code doesn't have to consider them any more. In many cases you'd be raising exceptions, but if you really want to do nothing, I don't see that as a major problem even if you generally hew to the "single exit point" style.
But why return 0 instead of just return, since you're not using the value?
First, you can use return without anything after, you don't have to force a return 0.
For the performance way, this question seems to prove you won't notice any difference (except if you're realy unlucky ;) )
In this context, I think it's important to know why inputs can't be zero? Typically, I think the way most programs will handle this is to raise an exception if a bad value is passed. Then the exception can be handled (or not) in the calling routine.
You'll often see it written "Better to ask forgiveness" as opposed to "Look before you leap". Of course, If you're often passing 0 into the function, then the try / except clause could get expensive (try is cheap, except is not).
If you're set on "looking before you leap", I would probably use the first form to keep indentation down.
I doubt the performance is going to be significantly different in either case. Like you I would tend to lean more toward the first method for readability.
In addition to the smaller indentation(which doesn't really matter much IMO), it precludes the necessity to read further for when your inputs == 0:
In the second method one might assume that there is additional processing after the if/else statement, whereas the first one makes it obvious that the method is complete upon that condition.
It really just comes down to personal preference though, you will see both methods used in practice.
Your second example will return after it displays the message box in this case.
I prefer to "return early" as in my opinion it leads to better readability. But then again, most of my returns that happen prior to the actual end of the function tend to be more around short circuiting application logic if certain conditions are not met.

Categories

Resources