Let's say I want to partition a string. It returns a tuple of 3 items. I do not need the second item.
I have read that _ is used when a variable is to be ignored.
bar = 'asd cds'
start,_,end = bar.partition(' ')
If I understand it correctly, the _ is still filled with the value. I haven't heard of a way to ignore some part of the output.
Wouldn't that save cycles?
A bigger example would be
def foo():
return list(range(1000))
start,*_,end = foo()
It wouldn't really save any cycles to ignore the return argument, no, except for those which are trivially saved, by noting that there is no point to binding a name to a returned object that isn't used.
Python doesn't do any function inlining or cross-function optimization. It isn't targeting that niche in the slightest. Nor should it, as that would compromise many of the things that python is good at. A lot of core python functionality depends on the simplicity of its design.
Same for your list unpacking example. Its easy to think of syntactic sugar to have python pick the last and first item of the list, given that syntax. But there is no way, staying within the defining constraints of python, to actually not construct the whole list first. Who is to say that the construction of the list does not have relevant side-effects? Python, as a language, will certainly not guarantee you any such thing. As a dynamic language, python does not even have the slightest clue, or tries to concern itself, with the fact that foo might return a list, until the moment that it actually does so.
And will it return a list? What if you rebound the list identifier?
As per the docs, a valid variable name can be of this form
identifier ::= (letter|"_") (letter | digit | "_")*
It means that, first character of a variable name can be a letter or an underscore and rest of the name can have a letter or a digit or _. So, _ is a valid variable name in Python but that is less commonly used. So people normally use that like a use and throw variable.
And the syntax you have shown is not valid. It should have been
start,*_,end = foo()
Anyway, this kind of unpacking will work only in Python 3.x
In your example, you have used
list(range(1000))
So, the entire list is already constructed. When you return it, you are actually returning a reference to the list, the values are not copied actually. So, there is no specific way to ignore the values as such.
There certainly is a way to extract just a few elements. To wit:
l = foo()
start, end = foo[0], foo[-1]
The question you're asking is, then, "Why doesn't there exist a one-line shorthand for this?" There are two answers to that:
It's not common enough to need shorthand for. The two line solution is adequate for this uncommon scenario.
Features don't need a good reason to not exist. It's not like Guido van Rossum compiled a list of all possible ideas and then struck out yours. If you have an idea for improved syntax you could propose it to the Python community and see if you could get them to implement it.
Related
Clearly, it may not make that much sense to make an assignment to a function return, yet I've just encountered a strange situation and could not find a proper rule for that.
The code below does not run:
def foo():
a = [1,2,3,4,5]
return a
foo()= 10
however,
def foo():
a = [1,2,3,4,5]
return a
foo()[2] = 10
works perfectly. It is not clear to me what is going on here. It looks to me that there is an inconsistency here...
Thanks a lot.
It is not clear to me what is going on here.
It's fairly simple, really; the left-hand-side of an assignment statement is not allowed to just be any kind of expression, but some "kinds" of expression are valid assignment targets, and for those kinds of expression it doesn't matter what kinds of sub-expressions they may have. In your example, the left-hand-side is a subscription, which is a valid assignment target, so it's a syntactically valid assignment statement.
In particular, a subscription has two sub-expressions but it doesn't matter what lexical form those sub-expressions have. For example, the below may look perfectly normal to you, where the first sub-expression foo.bar is an attributeref and the second sub-expression index + 5 is an a_expr:
foo.bar[index + 5] = value
This example probably also looks perfectly normal: the first sub-expression baz[get_x()] is another subscription, and the second sub-expression get_y() is a call.
baz[get_x()][get_y()] = value
As for why it's allowed, the simple reason is that it would be a bit silly and arbitrary for the Python developers to decide on a list of restrictions on the sub-expressions. The meaning of the assignment statement is well-defined so long as the left-hand-side is one of the required lexical forms, so the only reason to impose further restrictions on the sub-expressions of those lexical forms would be to try to prevent mistakes by the programmer.
On the other hand, code like func_call()[index] = value is not necessarily a mistake anyway, because the list that func_call() returns might well be referenced elsewhere, so the mutation will be visible through other references to that list.
In python, not every value can be used of the left side of the assignment (so called lvalue). Basically, only 3 things can be lvalues:
a name, like abc
an attribute any_expression.name
a subscript any_expression[slice]
Other values, like literals, arithmetic expressions, function calls etc cannot be used on the left side:
123 = 123 # no
a + b = 123 # no
func() = 123 # no
See assignment in the grammar spec: https://docs.python.org/3/reference/grammar.html
Extremely basic question that I don't quite get.
If I have this line:
my_string = "how now brown cow"
and I change it like so
my_string.split()
Is this acceptable coding practice to just straight write it like that to change it?
or should I instead change it like so:
my_string = my_string.split()
don't both effectively do the same thing?
when would I use one over the other?
how does this ultimately affect my code?
always try to avoid:
my_string = my_string.split()
never, ever do something like that. the main problem with that is it's going to introduce a lot of code bugs in the future, especially for another maintainer of the code. the main problem with this, is that the result of this the split() operation is not a string anymore: it's a list. Therefore, assigning a result of this type to a variable named my_string is bound to cause more problems in the end.
The first line doesn't actually change it - it calls the .split() method on the string, but since you're not doing anything with what that function call returns, the results are just discarded.
In the second case, you assign the returned values to my_string - that means your original string is discarded, but my_string no refers to the parts returned by .split().
Both calls to .split() do the same thing, but the lines of your program do something different.
You would only use the first example if you wanted to know if a split would cause an error, for example:
try:
my_string.split()
except:
print('That was unexpected...')
The second example is the typical use, although you could us the result directly in some other way, for example passing it to a function:
print(my_string.split())
It's not a bad question though - you'll find that some libraries favour methods that change the contents of the object they are called on, while other libraries favour returning the processed result without touching the original. They are different programming paradigms and programmers can be very divided on the subject.
In most cases, Python itself (and its built-in functions and standard libraries) favours the more functional approach and will return the result of the operation, without changing the original, but there are exceptions.
Background: I need to read the same key/value from a dictionary (exactly) twice.
Question: There are two ways, as shown below,
Method 1. Read it with the same key twice, e.g.,
sample_map = {'A':1,}
...
if sample_map.get('A', None) is not None:
print("A's value in map is {}".format(sample_map.get('A')))
Method 2. Read it once and store it in a local variable, e.g,
sample_map = {'A':1,}
...
ret_val = sample.get('A', None)
if ret_val is not None:
print("A's value in map is {}".format(ret_val))
Which way is better? What are their Pros and Cons?
Note that I am aware that print() can naturally handle ret_val of None. This is a hypothetical example and I just use it for illustration purposes.
Under these conditions, I wouldn't use either. What you're really interested in is whether A is a valid key, and the KeyError (or lack thereof) raised by __getitem__ will tell you if it is or not.
try:
print("A's value in map is {}".format(sample['A'])
except KeyError:
pass
Or course, some would say there is too much code in the try block, in which case method 2 would be preferable.
try:
ret_val = sample['A']
except KeyError:
pass
else:
print("A's value in map is {}".format(ret_val))
or the code you already have:
ret_val = sample.get('A') # None is the default value for the second argument
if ret_val is not None:
print("A's value in map is {}".format(ret_val))
There isn't any effective difference between the options you posted.
Python: List vs Dict for look up table
Lookups in a dict are about o(1). Same goes for a variable you have stored.
Efficiency is about the same. In this case, I would skip defining the extra variable, since not much else is going on.
But in a case like below, where there's a lot of dict lookups going on, I have plans to refactor the code to make things more intelligible, as all of the lookups clutter or obfuscate the logic:
# At this point, assuming that these are floats is OK, since no thresholds had text values
if vname in paramRanges:
"""
Making sure the variable is one that we have a threshold for
"""
# We might care about it
# Don't check the equal case, because that won't matter
if float(tblChanges[vname][0]) < float(tblChanges[vname][1]):
# Check lower tolerance
# Distinction is important because tolerances are not always the same +/-
if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(
paramRanges[vname][2]):
# Difference from default is greater than tolerance
# vname : current value, default value, negative tolerance, tolerance units, change date
alerts[vname] = (
float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][2]),
paramRanges[vname][0], tblChanges[vname][2]
)
if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(
paramRanges[vname][1]):
alerts[vname] = (
float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][1]),
paramRanges[vname][0], tblChanges[vname][2]
)
In most cases—if you can't just rewrite your code to use EAFP as chepner suggests, which you probably can for this example—you want to avoid repeated method calls.
The only real benefit of repeating the get is saving an assignment statement.
If your code isn't crammed in the middle of a complex expression, that just means saving one line of vertical space—which isn't nothing, but isn't a huge deal.
If your code is crammed in the middle of a complex expression, pulling the get out may force you to rewrite things a bit. You may have to, e.g., turn a lambda into a def, or turn a while loop with a simple condition into a while True: with an if …: break. Usually that's a sign that you, e.g., really wanted a def in the first place, but "usually" isn't "always". So, this is where you might want to violate the rule of thumb—but see the section at the bottom first.
On the other side…
For dict.get, the performance cost of repeating the method is pretty tiny, and unlikely to impact your code. But what if you change the code to take an arbitrary mapping object from the caller, and someone passes you, say, a proxy that does a get by making a database query or an RPC to a remote server?
For single-threaded code, calling dict.get with the same arguments twice in a row without doing anything in between is correct. But what if you're taking a dict passed by the caller, and the caller has a background thread also modifying the same dict? Then your code is only correct if you put a Lock or other synchronization around the two accesses.
Or, what if your expression was something that might mutate some state, or do something dangerous?
Even if nothing like this is ever going to be an issue in your code, unless that fact is blindingly obvious to anyone reading your code, they're still going to have to think about the possibility of performance costs and ToCToU races and so on.
And, of course, it makes at least two of your lines longer. Assuming you're trying to write readable code that sticks to 72 or 79 or 99 columns, horizontal space is a scarce resource, while vertical space is much less of a big deal. I think your second version is easier to scan than your first, even without all of these other considerations, but imagine making the expression, say, 20 characters longer.
In the rare cases where pulling the repeated value out of an expression would be a problem, you still often want to assign it to a temporary.
Unfortunately, up to Python 3.7, you usually can't. It's either clumsy (e.g., requiring an extra nested comprehension or lambda just to give you an opportunity to bind a variable) or impossible.
But in Python 3.8, PEP 572 assignment expressions handle this case.
if (sample := sample_map.get('A', None)) is not None:
print("A's value in map is {}".format(sample))
I don't think this is a great use of an assignment expression (see the PEP for some better examples), especially since I'd probably write this the way chepner suggested… but it does show how to get the best of both worlds (assigning a temporary, and being embeddable in an expression) when you really need to.
As follow is my understanding of types & parameters passing in java and python:
In java, there are primitive types and non-primitive types. Former are not object, latter are objects.
In python, they are all objects.
In java, arguments are passed by value because:
primitive types are copied and then passed, so they are passed by value for sure. non-primitive types are passed by reference but reference(pointer) is also value, so they are also passed by value.
In python, the only difference is that 'primitive types'(for example, numbers) are not copied, but simply taken as objects.
Based on official doc, arguments are passed by assignment. What does it mean by 'passed by assignment'? Is objects in java work the same way as python? What result in the difference (passed by value in java and passed by argument in python)?
And is there any wrong understanding above?
tl;dr: You're right that Python's semantics are essentially Java's semantics, without any primitive types.
"Passed by assignment" is actually making a different distinction than the one you're asking about.1 The idea is that argument passing to functions (and other callables) works exactly the same way assignment works.
Consider:
def f(x):
pass
a = 3
b = a
f(a)
b = a means that the target b, in this case a name in the global namespace, becomes a reference to whatever value a references.
f(a) means that the target x, in this case a name in the local namespace of the frame built to execute f, becomes a reference to whatever value a references.
The semantics are identical. Whenever a value gets assigned to a target (which isn't always a simple name—e.g., think lst[0] = a or spam.eggs = a), it follows the same set of assignment rules—whether it's an assignment statement, a function call, an as clause, or a loop iteration variable, there's just one set of rules.
But overall, your intuitive idea that Python is like Java but with only reference types is accurate: You always "pass a reference by value".
Arguing over whether that counts as "pass by reference" or "pass by value" is pointless. Trying to come up with a new unambiguous name for it that nobody will argue about is even more pointless. Liskov invented the term "call by object" three decades ago, and if that never caught on, anything someone comes up with today isn't likely to do any better.
You understand the actual semantics, and that's what matters.
And yes, this means there is no copying. In Java, only primitive values are copied, and Python doesn't have primitive values, so nothing is copied.
the only difference is that 'primitive types'(for example, numbers) are not copied, but simply taken as objects
It's much better to see this as "the only difference is that there are no 'primitive types' (not even simple numbers)", just as you said at the start.
It's also worth asking why Python has no primitive types—or why Java does.2
Making everything "boxed" can be very slow. Adding 2 + 3 in Python means dereferencing the 2 and 3 objects, getting the native values out of them, adding them together, and wrapping the result up in a new 5 object (or looking it up in a table because you already have an existing 5 object). That's a lot more work than just adding two ints.3
While a good JIT like Hotspot—or like PyPy for Python—can often automatically do those optimizations, sometimes "often" isn't good enough. That's why Java has native types: to let you manually optimize things in those cases.
Python, instead, relies on third-party libraries like Numpy, which let you pay the boxing costs just once for a whole array, instead of once per element. Which keeps the language simpler, but at the cost of needing Numpy.4
1. As far as I know, "passed by assignment" appears a couple times in the FAQs, but is not actually used in the reference docs or glossary. The reference docs already lean toward intuitive over rigorous, but the FAQ, like the tutorial, goes much further in that direction. So, asking what a term in the FAQ means, beyond the intuitive idea it's trying to get across, may not be a meaningful question in the first place.
2. I'm going to ignore the issue of Java's lack of operator overloading here. There's no reason they couldn't include special language rules for a handful of core classes, even if they didn't let you do the same thing with your own classes—e.g., Go does exactly that for things like range, and people rarely complain.
3. … or even than looping over two arrays of 30-bit digits, which is what Python actually does. The cost of working on unlimited-size "bigints" is tiny compared to the cost of boxing, so Python just always pays that extra, barely-noticeable cost. Python 2 did, like Java, have separate fixed and bigint types, but a couple decades of experience showed that it wasn't getting any performance benefits out of the extra complexity.
4. The implementation of Numpy is of course far from simple. But using it is pretty simple, and a lot more people need to use Numpy than need to write Numpy, so that turns out to be a pretty decent tradeoff.
Similar to passing reference types by value in C#.
Docs: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/passing-reference-type-parameters#passing-reference-types-by-value
Code demo:
# mutable object
l = [9, 8, 7]
def createNewList(l1: list):
# l1+[0] will create a new list object, the reference address of the local variable l1 is changed without affecting the variable l
l1 = l1+[0]
def changeList(l1: list):
# Add an element to the end of the list, because l1 and l refer to the same object, so l will also change
l1.append(0)
print(l)
createNewList(l)
print(l)
changeList(l)
print(l)
# immutable object
num = 9
def changeValue(val: int):
# int is an immutable type, and changing the val makes the val point to the new object 8,
# it's not change the num value
value = 8
print(num)
changeValue(num)
print(num)
I have some functions in my code that accept either an object or an iterable of objects as input. I was taught to use meaningful names for everything, but I am not sure how to comply here. What should I call a parameter that can a sinlge object or an iterable of objects? I have come up with two ideas, but I don't like either of them:
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Normally I call iterables of objects just the plural of what I would call a single object. I know this might seem a little bit compulsive, but Python is supposed to be (among others) about readability.
I have some functions in my code that accept either an object or an iterable of objects as input.
This is a very exceptional and often very bad thing to do. It's trivially avoidable.
i.e., pass [foo] instead of foo when calling this function.
The only time you can justify doing this is when (1) you have an installed base of software that expects one form (iterable or singleton) and (2) you have to expand it to support the other use case. So. You only do this when expanding an existing function that has an existing code base.
If this is new development, Do Not Do This.
I have come up with two ideas, but I don't like either of them:
[Only two?]
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
What? Are you saying you provide NO other documentation, and no other training? No support? No advice? Who is the "someone not used to it"? Talk to them. Don't assume or imagine things about them.
Also, don't use Leading Upper Case Names.
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Terrible. Never. Do. This.
I looked in the Python library for examples. Most of the functions that do this have simple descriptions.
http://docs.python.org/library/functions.html#isinstance
isinstance(object, classinfo)
They call it "classinfo" and it can be a class or a tuple of classes.
You could do that, too.
You must consider the common use case and the exceptions. Follow the 80/20 rule.
80% of the time, you can replace this with an iterable and not have this problem.
In the remaining 20% of the cases, you have an installed base of software built around an assumption (either iterable or single item) and you need to add the other case. Don't change the name, just change the documentation. If it used to say "foo" it still says "foo" but you make it accept an iterable of "foo's" without making any change to the parameters. If it used to say "foo_list" or "foo_iter", then it still says "foo_list" or "foo_iter" but it will quietly tolerate a singleton without breaking.
80% of the code is the legacy ("foo" or "foo_list")
20% of the code is the new feature ("foo" can be an iterable or "foo_list" can be a single object.)
I guess I'm a little late to the party, but I'm suprised that nobody suggested a decorator.
def withmany(f):
def many(many_foos):
for foo in many_foos:
yield f(foo)
f.many = many
return f
#withmany
def process_foo(foo):
return foo + 1
processed_foo = process_foo(foo)
for processed_foo in process_foo.many(foos):
print processed_foo
I saw a similar pattern in one of Alex Martelli's posts but I don't remember the link off hand.
It sounds like you're agonizing over the ugliness of code like:
def ProcessWidget(widget_thing):
# Infer if we have a singleton instance and make it a
# length 1 list for consistency
if isinstance(widget_thing, WidgetType):
widget_thing = [widget_thing]
for widget in widget_thing:
#...
My suggestion is to avoid overloading your interface to handle two distinct cases. I tend to write code that favors re-use and clear naming of methods over clever dynamic use of parameters:
def ProcessOneWidget(widget):
#...
def ProcessManyWidgets(widgets):
for widget in widgets:
ProcessOneWidget(widget)
Often, I start with this simple pattern, but then have the opportunity to optimize the "Many" case when there are efficiencies to gain that offset the additional code complexity and partial duplication of functionality. If this convention seems overly verbose, one can opt for names like "ProcessWidget" and "ProcessWidgets", though the difference between the two is a single easily missed character.
You can use *args magic (varargs) to make your params always be iterable.
Pass a single item or multiple known items as normal function args like func(arg1, arg2, ...) and pass iterable arguments with an asterisk before, like func(*args)
Example:
# magic *args function
def foo(*args):
print args
# many ways to call it
foo(1)
foo(1, 2, 3)
args1 = (1, 2, 3)
args2 = [1, 2, 3]
args3 = iter((1, 2, 3))
foo(*args1)
foo(*args2)
foo(*args3)
Can you name your parameter in a very high-level way? people who read the code are more interested in knowing what the parameter represents ("clients") than what their type is ("list_of_tuples"); the type can be defined in the function documentation string, which is a good thing since it might change, in the future (the type is sometimes an implementation detail).
I would do 1 thing,
def myFunc(manyFoos):
if not type(manyFoos) in (list,tuple):
manyFoos = [manyFoos]
#do stuff here
so then you don't need to worry anymore about its name.
in a function you should try to achieve to have 1 action, accept the same parameter type and return the same type.
Instead of filling the functions with ifs you could have 2 functions.
Since you don't care exactly what kind of iterable you get, you could try to get an iterator for the parameter using iter(). If iter() raises a TypeError exception, the parameter is not iterable, so you then create a list or tuple of the one item, which is iterable and Bob's your uncle.
def doIt(foos):
try:
iter(foos)
except TypeError:
foos = [foos]
for foo in foos:
pass # do something here
The only problem with this approach is if foo is a string. A string is iterable, so passing in a single string rather than a list of strings will result in iterating over the characters in a string. If this is a concern, you could add an if test for it. At this point it's getting wordy for boilerplate code, so I'd break it out into its own function.
def iterfy(iterable):
if isinstance(iterable, basestring):
iterable = [iterable]
try:
iter(iterable)
except TypeError:
iterable = [iterable]
return iterable
def doIt(foos):
for foo in iterfy(foos):
pass # do something
Unlike some of those answering, I like doing this, since it eliminates one thing the caller could get wrong when using your API. "Be conservative in what you generate but liberal in what you accept."
To answer your original question, i.e. what you should name the parameter, I would still go with "foos" even though you will accept a single item, since your intent is to accept a list. If it's not iterable, that is technically a mistake, albeit one you will correct for the caller since processing just the one item is probably what they want. Also, if the caller thinks they must pass in an iterable even of one item, well, that will of course work fine and requires very little syntax, so why worry about correcting their misapprehension?
I would go with a name explaining that the parameter can be an instance or a list of instances. Say one_or_more_Foo_objects. I find it better than the bland param.
I'm working on a fairly big project now and we're passing maps around and just calling our parameter map. The map contents vary depending on the function that's being called. This probably isn't the best situation, but we reuse a lot of the same code on the maps, so copying and pasting is easier.
I would say instead of naming it what it is, you should name it what it's used for. Also, just be careful that you can't call use in on a not iterable.