Operations: Saving in Variables Then Operating vs Single Liners - python

I am writing a program in Python (using the numpy package) that contains a very long function involving many terms:
result = a + b + c + d +...
...whatever. These terms a, b, c, d, etc. are themselves matrices that involve many operations, for example in Python code:
a = np.identity(3, dtype = np.double)/3.0
b = np.kron(vec1, vec2).reshape(3,3) # Also with np.double precision.
Just taking two variables, I have been wondering if doing:
a = np.identity(3, dtype = np.double)/3.0
b = np.kron(vec1, vec2).reshape(3,3) # Also with np.double precision.
c = a + b
is the same as doing:
c = np.identity(3, dtype = np.double)/3.0 + np.kron(vec1, vec2).reshape(3,3)
This may sound silly, but I require very high numerical stability, i.e., introducing numerical errors, however subtle, might ruin the program or yield a weird result. Of course, this question can be extended to other programming languages.
Which is suggested? Does it matter? Any suggested references?

Under "normal" circumstances, both approaches are equivalent.
In other words, whether you use a value through an explicit expression (e.g., np.identity(3, dtype = np.double)/3.0) or through a variable-name that has been initialized with that expression (here, a), the outcome would "normally" be the same.
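As a quick sanity check, here is a minimal sketch (vec1 and vec2 are made-up 3-element vectors, not taken from the question) comparing the two spellings directly:
import numpy as np

vec1 = np.array([1.0, 2.0, 3.0], dtype=np.double)  # hypothetical inputs
vec2 = np.array([4.0, 5.0, 6.0], dtype=np.double)

# Approach 1: intermediate variables
a = np.identity(3, dtype=np.double) / 3.0
b = np.kron(vec1, vec2).reshape(3, 3)
c1 = a + b

# Approach 2: single expression
c2 = np.identity(3, dtype=np.double) / 3.0 + np.kron(vec1, vec2).reshape(3, 3)

print(np.array_equal(c1, c2))  # True: same operations, same order, same rounding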
There are some not-so-normal circumstances where they may produce different results. As far as I can see, all of these involve situations in which there are side-effects, such that the outcome depends on the order in which things happen. For example:
Consider a scenario where the initialization of the variable-name b involves a side-effect that affects the initialization of the variable-name a, and your code depends on that side-effect. In the first approach (where you first initialize the variable-names and then use only those variables), your code would have to initialize b first and a later; the order of initialization of the variable-names matters. In the second approach (where explicit expressions, rather than variable-names, participate in a larger expression), to achieve the same effect you would have to pay attention to the order in which the Python interpreter evaluates sub-expressions within an expression. If you don't, the order of evaluation of sub-expressions may not produce the side-effect that your code needs, and you might end up with a different result.
As for other programming languages, the answer is a big yes: the two approaches can yield different results in languages (such as Java) where variable names have associated data types, which can cause silent numerical conversions (such as truncations) to happen during assignment.
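A rough Python/numpy analogue of that Java pitfall, as a minimal sketch (the array values are made up): assigning into pre-typed storage can silently truncate, whereas a fresh assignment keeps the computed type.
import numpy as np

out = np.empty(3, dtype=np.int32)        # pre-typed storage, like a typed variable
out[:] = np.array([1.9, 2.5, -3.7])      # silently truncated toward zero on assignment
print(out)                               # [ 1  2 -3]

c = np.array([1.9, 2.5, -3.7])           # fresh assignment keeps float64
print(c.dtype)                           # float64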


Fortran allows array operations to be used quite easily. For example,
double precision :: a(3,3), b(3,3), c(3,3)
Given a and b are initialized, I am aware that a simple c=a+b would result in a matrix addition. The same can be achieved using c(:,:) = a(:,:)+b(:,:). I am aware that the second method allows slicing of arrays with appropriate indexing. Apart from this, are there any differences between these two methods? Should one method be preferred over the other?
In the expression
c = a + b
the references a, b and c are to whole arrays. In
c(:,:) = a(:,:) + b(:,:)
the references a(:,:), b(:,:) and c(:,:) are to array sections. These are different things.
In general an array section does not have the properties of the whole array: if c is a pointer or allocatable array, then c(:,:) is not. Such an aspect is most notable when reallocation may occur on assignment.
In c=a+b, c may be unallocated and will be allocated in response, but in c(:)=... c must be allocated before the assignment. If c is allocated by the time of the assignment c=..., then it will be deallocated if:
the right-hand side is a different shape from that of c; or
any length type parameters of the right-hand side differ from that of c; or
c is polymorphic and has either a different dynamic type or corresponding kind type parameters of the right-hand side expression differ from those of c.
If there is such deallocation then c is re-allocated to match the right-hand side expression.
With the array section no such freedom exists: c(:) must suitably match the right-hand side expression, or there must be appropriate conversion available (or defined assignment instead).
There are other aspects following from the distinction of whole array and array section.
In the specific context of the question, where the arrays are of explicit shape, there is less to worry about.
In terms of style, one may view using the array section as adding clarity to human readers of code as "this is an array assignment" or using whole arrays as an aid to compilers in optimizing array operations. The balance between these is situation specific and King notes a related question which considers performance aspects.
Further, because of the deallocation/reallocation mentioned above, compilers are obliged to perform (potentially expensive) shape/type/type parameter checks on intrinsic assignment to allocatable whole arrays (to determine whether deallocation must happen). Using an array section means that these tests are not necessary. For example, with
c(:,:) = array_expr
the programmer guarantees that the array expression array_expr is of the same shape as c (if this is not the case then the fragment cannot be valid Fortran) and the compiler need not run the deallocation checks. Again, using this is a choice for the individual situation. (Also note that the compiler may offer runtime checks which look at whether such expressions match: if using this "trick" one should disable these checks.)

Python performance: repeating calculations vs temp variable

Does python recalculate every repeating expression in code?
For example does
a = [1,23,45,45,456,34]
b = len(a) + 213
c = len(a) + 3432
differ in performance from
a = [1,23,45,45,456,34]
l = len(a)
b = l + 213
c = l + 3432
I would guess the second one uses more memory (to store l) but less CPU. Am I correct?
Does python recalculate every repeating expression in code?
It is unspecified in the language specification. In fact, this is highly dependent on the Python implementation. The mainstream Python implementation, called CPython, does recompute the expression. PyPy (an alternative implementation focusing on performance) usually does not recompute the expression in hot portions of the code, thanks to just-in-time compilation. There are many other implementations of Python (e.g. Pyston, Jython, IronPython) and each one can behave differently.
I would guess second one uses more memory (to store l) but less cpu.
Yes, but the difference is actually marginal and still dependent on the Python implementation used (e.g. PyPy may not require more memory in this case). Note that calling len on a list is very fast and is done in constant time.
While the second code should be slightly faster, such micro-optimization will likely have no significant impact on a big codebase. Keep in mind that readable code is generally easier to maintain, improve and optimize.
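If you want to measure the difference yourself, here is a minimal timeit sketch (CPython assumed; absolute numbers will vary by machine):
import timeit

setup = "a = [1, 23, 45, 45, 456, 34]"
repeated = "b = len(a) + 213; c = len(a) + 3432"
cached = "l = len(a); b = l + 213; c = l + 3432"

print(timeit.timeit(repeated, setup=setup, number=1_000_000))
print(timeit.timeit(cached, setup=setup, number=1_000_000))
# Expect a difference of only tens of nanoseconds per iteration.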

In tensorflow, how do you create a graph that adds more than two arguments?

In tensorflow, I want to do something like:
A = tf.add(a, b, c)
That is, I want to create a graph that adds more than two arguments. How do I do this?
TL;DR
Use tf.accumulate_n.
More on that
The answer is actually more complicated in tensorflow than one would anticipate.
You can chain the additions:
res = a + b + c
but then you create as many nodes as there are additions. What's more, you force the order of the additions: a and b are summed first (so TF has to wait for their values to be ready) then c is added.
The solution seems to be
res = tf.add_n([a, b, c])
which creates a single node. Alas, tf.add_n is not efficient. First, it waits for all inputs to be ready before summing them, so it is actually less efficient than chaining additions, which can start as soon as a and b are ready. Second, all inputs must be in memory at the same time, which wastes memory — again, when chaining additions, a and b can be discarded before summing c.
The better way to sum multiple inputs is to use tf.accumulate_n, which alleviates the problems of tf.add_n, because it sums inputs as they come.
IMHO the only reason for still having tf.add_n around is for compatibility with TF < 1.7, for which tf.accumulate_n does not pass gradients through — a major con if you intend to support older versions of TF.
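A minimal sketch of the three options, assuming TF 1.x-style graph mode where tf.accumulate_n is available at the top level (the tensor values are made up):
import tensorflow as tf

a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])
c = tf.constant([5.0, 6.0])

chained = a + b + c                        # two Add nodes, order fixed: (a + b) first
added = tf.add_n([a, b, c])                # one AddN node, waits for all inputs
accumulated = tf.accumulate_n([a, b, c])   # sums inputs as they become available

with tf.Session() as sess:
    print(sess.run([chained, added, accumulated]))  # three identical [9., 12.] results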

what does it mean by 'passed by assignment'?

The following is my understanding of types & parameter passing in Java and Python:
In Java, there are primitive types and non-primitive types. The former are not objects; the latter are objects.
In Python, they are all objects.
In Java, arguments are passed by value because:
primitive types are copied and then passed, so they are passed by value for sure; non-primitive types are passed by reference, but a reference (pointer) is also a value, so they are also passed by value.
In Python, the only difference is that 'primitive types' (for example, numbers) are not copied, but simply taken as objects.
Based on the official docs, arguments are passed by assignment. What does 'passed by assignment' mean? Do objects in Java work the same way as in Python? What causes the difference (passed by value in Java vs. passed by assignment in Python)?
And is there anything wrong with my understanding above?
tl;dr: You're right that Python's semantics are essentially Java's semantics, without any primitive types.
"Passed by assignment" is actually making a different distinction than the one you're asking about.1 The idea is that argument passing to functions (and other callables) works exactly the same way assignment works.
Consider:
def f(x):
    pass
a = 3
b = a
f(a)
b = a means that the target b, in this case a name in the global namespace, becomes a reference to whatever value a references.
f(a) means that the target x, in this case a name in the local namespace of the frame built to execute f, becomes a reference to whatever value a references.
The semantics are identical. Whenever a value gets assigned to a target (which isn't always a simple name—e.g., think lst[0] = a or spam.eggs = a), it follows the same set of assignment rules—whether it's an assignment statement, a function call, an as clause, or a loop iteration variable, there's just one set of rules.
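A minimal sketch of that equivalence (the names here are arbitrary):
a = 3

def f(x):
    # x is bound to the same object that a refers to, exactly as b = a binds b
    print(x is a)

b = a
print(b is a)   # True
f(a)            # True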
But overall, your intuitive idea that Python is like Java but with only reference types is accurate: You always "pass a reference by value".
Arguing over whether that counts as "pass by reference" or "pass by value" is pointless. Trying to come up with a new unambiguous name for it that nobody will argue about is even more pointless. Liskov invented the term "call by object" three decades ago, and if that never caught on, anything someone comes up with today isn't likely to do any better.
You understand the actual semantics, and that's what matters.
And yes, this means there is no copying. In Java, only primitive values are copied, and Python doesn't have primitive values, so nothing is copied.
the only difference is that 'primitive types' (for example, numbers) are not copied, but simply taken as objects
It's much better to see this as "the only difference is that there are no 'primitive types' (not even simple numbers)", just as you said at the start.
It's also worth asking why Python has no primitive types—or why Java does.2
Making everything "boxed" can be very slow. Adding 2 + 3 in Python means dereferencing the 2 and 3 objects, getting the native values out of them, adding them together, and wrapping the result up in a new 5 object (or looking it up in a table because you already have an existing 5 object). That's a lot more work than just adding two ints.3
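You can see the "existing 5 object" case directly in CPython, which caches small integers (an implementation detail, not a language guarantee):
x = 2 + 3
y = 10 - 5
print(id(x) == id(y))   # True in CPython: both names refer to the cached 5 object

n = 10 ** 6
a = n + 1
b = n + 1
print(id(a) == id(b))   # typically False: larger ints are freshly boxed each time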
While a good JIT like Hotspot—or like PyPy for Python—can often automatically do those optimizations, sometimes "often" isn't good enough. That's why Java has native types: to let you manually optimize things in those cases.
Python, instead, relies on third-party libraries like Numpy, which let you pay the boxing costs just once for a whole array, instead of once per element. Which keeps the language simpler, but at the cost of needing Numpy.4
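A rough illustration of paying the boxing cost once per array instead of once per element, as a minimal sketch (exact timings vary):
import timeit

setup = """
import numpy as np
xs = list(range(1_000_000))
arr = np.arange(1_000_000)
"""

print(timeit.timeit("sum(xs)", setup=setup, number=10))    # unboxes each element
print(timeit.timeit("arr.sum()", setup=setup, number=10))  # one native loop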
1. As far as I know, "passed by assignment" appears a couple times in the FAQs, but is not actually used in the reference docs or glossary. The reference docs already lean toward intuitive over rigorous, but the FAQ, like the tutorial, goes much further in that direction. So, asking what a term in the FAQ means, beyond the intuitive idea it's trying to get across, may not be a meaningful question in the first place.
2. I'm going to ignore the issue of Java's lack of operator overloading here. There's no reason they couldn't include special language rules for a handful of core classes, even if they didn't let you do the same thing with your own classes—e.g., Go does exactly that for things like range, and people rarely complain.
3. … or even than looping over two arrays of 30-bit digits, which is what Python actually does. The cost of working on unlimited-size "bigints" is tiny compared to the cost of boxing, so Python just always pays that extra, barely-noticeable cost. Python 2 did, like Java, have separate fixed and bigint types, but a couple decades of experience showed that it wasn't getting any performance benefits out of the extra complexity.
4. The implementation of Numpy is of course far from simple. But using it is pretty simple, and a lot more people need to use Numpy than need to write Numpy, so that turns out to be a pretty decent tradeoff.
Similar to passing reference types by value in C#.
Docs: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/passing-reference-type-parameters#passing-reference-types-by-value
Code demo:
# mutable object
l = [9, 8, 7]

def createNewList(l1: list):
    # l1 + [0] creates a new list object; the local variable l1 is rebound
    # to it without affecting the caller's variable l
    l1 = l1 + [0]

def changeList(l1: list):
    # Append an element to the end of the list; because l1 and l refer to
    # the same object, l will also change
    l1.append(0)

print(l)          # [9, 8, 7]
createNewList(l)
print(l)          # [9, 8, 7] - unchanged
changeList(l)
print(l)          # [9, 8, 7, 0] - mutated in place

# immutable object
num = 9

def changeValue(val: int):
    # int is an immutable type; rebinding val makes it point to the new
    # object 8 and does not change num
    val = 8

print(num)        # 9
changeValue(num)
print(num)        # 9 - unchanged

Why are there no ++ and -- operators in Python?

Why are there no ++ and -- operators in Python?
It's not because it doesn't make sense; it makes perfect sense to define "x++" as "x += 1, evaluating to the previous binding of x".
If you want to know the original reason, you'll have to either wade through old Python mailing lists or ask somebody who was there (e.g. Guido), but it's easy enough to justify after the fact:
Simple increment and decrement aren't needed as much as in other languages. You don't write things like for(int i = 0; i < 10; ++i) in Python very often; instead you do things like for i in range(0, 10).
Since it's not needed nearly as often, there's much less reason to give it its own special syntax; when you do need to increment, += is usually just fine.
It's not a decision of whether it makes sense, or whether it can be done--it does, and it can. It's a question of whether the benefit is worth adding to the core syntax of the language. Remember, this is four operators--postinc, postdec, preinc, predec, and each of these would need to have its own class overloads; they all need to be specified, and tested; it would add opcodes to the language (implying a larger, and therefore slower, VM engine); every class that supports a logical increment would need to implement them (on top of += and -=).
This is all redundant with += and -=, so it would become a net loss.
What follows was my original answer; it repeats a myth from the folklore of computing, debunked by Dennis Ritchie as "historically impossible", as noted in the letters to the editors of Communications of the ACM, July 2012, doi:10.1145/2209249.2209251.
The C increment/decrement operators were invented at a time when the C compiler wasn't very smart and the authors wanted to be able to specify the direct intent that a machine language operator should be used which saved a handful of cycles for a compiler which might do a
load memory
load 1
add
store memory
instead of
inc memory
and the PDP-11 even supported "autoincrement" and "autoincrement deferred" instructions corresponding to *++p and *p++, respectively. See section 5.3 of the manual if horribly curious.
As compilers are smart enough to handle the high-level optimization tricks built into the syntax of C, they are just a syntactic convenience now.
Python doesn't have tricks to convey intentions to the assembler because it doesn't use one.
I always assumed it had to do with this line of the zen of python:
There should be one — and preferably only one — obvious way to do it.
x++ and x+=1 do the exact same thing, so there is no reason to have both.
Of course, we could say "Guido just decided that way", but I think the question is really about the reasons for that decision. I think there are several reasons:
It mixes together statements and expressions, which is not good practice. See http://norvig.com/python-iaq.html
It generally encourages people to write less readable code
Extra complexity in the language implementation, which is unnecessary in Python, as already mentioned
Because, in Python, integers are immutable (int's += actually returns a different object).
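A minimal sketch of that point (using a value outside CPython's small-integer cache so the ids clearly differ):
a = 1000
print(id(a))
a += 1          # rebinds a to a brand-new int object 1001; 1000 is not mutated
print(id(a))    # a different id in CPython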
Also, with ++/-- you need to worry about pre- versus post- increment/decrement, and it takes only one more keystroke to write x+=1. In other words, it avoids potential confusion at the expense of very little gain.
Clarity!
Python is a lot about clarity and no programmer is likely to correctly guess the meaning of --a unless s/he's learned a language having that construct.
Python is also a lot about avoiding constructs that invite mistakes and the ++ operators are known to be rich sources of defects.
These two reasons are enough not to have those operators in Python.
The decision that Python uses indentation to mark blocks rather than syntactical means such as some form of begin/end bracketing or mandatory end marking is based largely on the same considerations.
For illustration, have a look at the discussion around introducing a conditional operator (in C: cond ? resultif : resultelse) into Python in 2005.
Read at least the first message and the decision message of that discussion (which had several precursors on the same topic previously).
Trivia:
The PEP frequently mentioned therein is the "Python Enhancement Proposal" PEP 308. LC means list comprehension, GE means generator expression (and don't worry if those confuse you, they are none of the few complicated spots of Python).
My understanding of why Python does not have a ++ operator is the following: when you write a=b=c=1 in Python you get three variables (labels) pointing at the same object (whose value is 1). You can verify this by using the id function, which returns an object's memory address:
In [19]: id(a)
Out[19]: 34019256
In [20]: id(b)
Out[20]: 34019256
In [21]: id(c)
Out[21]: 34019256
All three variables (labels) point to the same object. Now increment one of variable and see how it affects memory addresses:
In [22]: a = a + 1
In [23]: id(a)
Out[23]: 34019232
In [24]: id(b)
Out[24]: 34019256
In [25]: id(c)
Out[25]: 34019256
You can see that variable a now points to a different object than variables b and c. Because you've used a = a + 1, this is explicitly clear: in other words, you assign a completely different object to the label a. Imagine that you could write a++; it would suggest that you did not assign a new object to variable a but rather incremented the old one. All this stuff is, IMHO, to minimize confusion. For a better understanding, see how Python variables work:
In Python, why can a function modify some arguments as perceived by the caller, but not others?
Is Python call-by-value or call-by-reference? Neither.
Does Python pass by value, or by reference?
Is Python pass-by-reference or pass-by-value?
Python: How do I pass a variable by reference?
Understanding Python variables and Memory Management
Emulating pass-by-value behaviour in python
Python functions call by reference
Code Like a Pythonista: Idiomatic Python
It was just designed that way. Increment and decrement operators are just shortcuts for x = x + 1. Python has typically adopted a design strategy which reduces the number of alternative means of performing an operation. Augmented assignment is the closest thing to increment/decrement operators in Python, and they weren't even added until Python 2.0.
I'm very new to Python, but I suspect the reason is the emphasis on the distinction between mutable and immutable objects within the language. Now, I know that x++ can easily be interpreted as x = x + 1, but it LOOKS like you're incrementing in-place an object which could be immutable.
Just my guess/feeling/hunch.
To complete already good answers on that page:
Let's suppose we decided to do this. The prefix form (++i) would break the unary + and - operators.
Today, prefixing with ++ or -- does nothing, because it applies the unary plus operator twice (which does nothing) or the unary minus operator twice (which cancels itself out):
>>> i=12
>>> ++i
12
>>> --i
12
So that would potentially break that logic.
Now, if one needs it for list comprehensions or lambdas, from Python 3.8 it's possible with the new := assignment operator (PEP 572).
Pre-incrementing a and assigning it to b:
>>> a = 1
>>> b = (a:=a+1)
>>> b
2
>>> a
2
Post-incrementing just needs to compensate for the premature add by subtracting 1:
>>> a = 1
>>> b = (a:=a+1)-1
>>> b
1
>>> a
2
I believe it stems from the Python creed that "explicit is better than implicit".
First, Python is only indirectly influenced by C; it is heavily influenced by ABC, which apparently does not have these operators, so it should not be any great surprise not to find them in Python either.
Secondly, as others have said, increment and decrement are supported by += and -= already.
Third, full support for a ++ and -- operator set usually includes supporting both the prefix and postfix versions of them. In C and C++, this can lead to all kinds of "lovely" constructs that seem (to me) to be against the spirit of simplicity and straightforwardness that Python embraces.
For example, while the C statement while(*t++ = *s++); may seem simple and elegant to an experienced programmer, to someone learning it, it is anything but simple. Throw in a mixture of prefix and postfix increments and decrements, and even many pros will have to stop and think a bit.
The ++ class of operators are expressions with side effects. This is something generally not found in Python.
For the same reason an assignment is not an expression in Python, thus preventing the common if (a = f(...)) { /* using a here */ } idiom.
Lastly, I suspect that these operators are not very consistent with Python's reference semantics. Remember, Python does not have variables (or pointers) with the semantics known from C/C++.
As I understand it, it's so you won't think the value in memory has changed.
In C, when you do x++, the value of x in memory changes.
But in Python all numbers are immutable, so the address that x pointed to still holds x, not x+1. If you wrote x++ you would think that x itself changed; what really happens is that the name x is rebound to a location in memory where x+1 is stored (or that location is created if it does not already exist).
Other answers have described why it's not needed for iterators, but sometimes it is useful to increment a variable inline while assigning, and you can achieve the same effect using tuples and multiple assignment:
b = ++a becomes:
a,b = (a+1,)*2
and b = a++ becomes:
a,b = a+1, a
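A quick check of both emulations, as a minimal sketch:
a = 5
a, b = (a + 1,) * 2      # "b = ++a": both names end up with the incremented value
print(a, b)              # 6 6

a = 5
a, b = a + 1, a          # "b = a++": b gets the old value, a the new one
print(a, b)              # 6 5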
Python 3.8 introduces the assignment operator :=, allowing us to achieve foo(++a) with
foo(a:=a+1)
foo(a++) is still elusive though.
Maybe a better question would be to ask why these operators exist in C. K&R calls increment and decrement operators 'unusual' (Section 2.8, page 46). The Introduction calls them 'more concise and often more efficient'. I suspect that the fact that these operations always come up in pointer manipulation also played a part in their introduction.
In Python it was probably decided that it made no sense to try to optimise increments (in fact I just did a test in C, and it seems that the gcc-generated assembly uses addl instead of incl in both cases), and there is no pointer arithmetic; so it would have been just One More Way to Do It, and we know Python loathes that.
This may be because @GlennMaynard is looking at the matter in comparison with other languages, but in Python, you do things the Python way. It's not a 'why' question. It's there, and you can do things to the same effect with x += 1. The Zen of Python says: "There should be one-- and preferably only one --obvious way to do it." Multiple choices are great in art (freedom of expression) but lousy in engineering.
I think this relates to the concepts of mutability and immutability of objects. 2, 3, 4, 5 are immutable in Python: 2 has a fixed id for the lifetime of the Python process.
x++ would essentially mean an in-place increment like C. In C, x++ performs an in-place increment, so with x = 3, x++ would increment the 3 in memory to 4, unlike Python, where the 3 would still exist in memory.
Thus in Python you don't need to recreate a value in memory. This may lead to performance optimizations.
This is a hunch based answer.
I know this is an old thread, but the most common use case for ++i is not covered: manually indexing sets when there are no provided indices. This situation is why Python provides enumerate().
Example: in any given language, when you use a construct like foreach to iterate over a set - for the sake of the example we'll even say it's an unordered set and you need a unique index for everything to tell the items apart - you might write something like:
i = 0
stuff = {'a': 'b', 'c': 'd', 'e': 'f'}
uniquestuff = {}
for key, val in stuff.items():
    uniquestuff[key] = '{0}{1}'.format(val, i)
    i += 1
In cases like this, python provides an enumerate method, e.g.
for i, (key, val) in enumerate(stuff.items()):
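For completeness, the whole loop rewritten with enumerate (a sketch using the same made-up data as above):
stuff = {'a': 'b', 'c': 'd', 'e': 'f'}
uniquestuff = {}
for i, (key, val) in enumerate(stuff.items()):
    uniquestuff[key] = '{0}{1}'.format(val, i)
print(uniquestuff)   # {'a': 'b0', 'c': 'd1', 'e': 'f2'}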
In addition to the other excellent answers here, ++ and -- are also notorious for undefined behavior. For example, what happens in this code?
foo[bar] = bar++;
It's so innocent-looking, but it's wrong C (and C++), because you don't know whether the first bar will have been incremented or not. One compiler might do it one way, another might do it another way, and a third might make demons fly out of your nose. All would be perfectly conformant with the C and C++ standards.
(EDIT: C++17 has changed the behavior of the given code so that it is defined; it will be equivalent to foo[bar+1] = bar; ++bar; — which nonetheless might not be what the programmer is expecting.)
Undefined behavior is seen as a necessary evil in C and C++, but in Python, it's just evil, and avoided as much as possible.
