Python: is the iteration of the multidimensional array super slow?

Python: is the iteration of the multidimensional array super slow? - python

I have to iterate all items in two-dimensional array of integers and change the value (according to some rule, not important).
I'm surprised how significant difference in performance is there between python runtime and C# or java runtime. Did I wrote totally wrong python code (v2.7.2)?
import numpy
a = numpy.ndarray((5000,5000), dtype = numpy.int32)
for x in numpy.nditer(a.T):
x = 123
>python -m timeit -n 2 -r 2 -s "import numpy; a = numpy.ndarray((5000,5000), dtype=numpy.int32)" "for x in numpy.nditer(a.T):" " x = 123"
2 loops, best of 2: 4.34 sec per loop
For example the C# code performs only 50ms, i.e. python is almost 100 times slower! (suppose the matrix variable is already initialized)
for (y = 0; y < 5000; y++)
for (x = 0; x < 5000; x++)
matrix[y][x] = 123;

Yep! Iterating through numpy arrays in python is slow. (Slower than iterating through a python list, as well.)
Typically, you avoid iterating through them directly.
If you can give us an example of the rule you're changing things based on, there's a good chance that it's easy to vectorize.
As a toy example:
import numpy as np
x = np.linspace(0, 8*np.pi, 100)
y = np.cos(x)
x[y > 0] = 100
However, in many cases you have to iterate, either due to the algorithm (e.g. finite difference methods) or to lessen the memory cost of temporary arrays.
In that case, have a look at Cython, Weave, or something similar.

The example you gave was presumably meant to set all items of a two-dimensional NumPy array to 123. This can be done efficiently like this:
a.fill(123)
or
a[:] = 123

Python is a much more dynamic language than C or C#. The main reason why the loop is so slow is that on every pass, the CPython interpreter is doing some extra work that wastes time: specifically, it is binding the name x with the next object from the iterator, then when it evaluates the assignment it has to look up the name x again.
As #Sven Marnach noted, you can call a method function numpy.fill() and it is fast. That function is compiled C or maybe Fortran, and it will simply loop over the addresses of the numpy.array data structure and fill in the values. Much less dynamic than Python, which is good for this simple case.
But now consider PyPy. Once you run your program under PyPy, a JIT analyzes what your code is actually doing. In this example, it notes that the name x isn't used for anything but the assignment, and it can optimize away binding the name. This example should be one that PyPy speeds up tremendously; likely PyPy will be ten times faster than plain Python (so only one-tenth as fast as C, rather than 1/100 as fast).
http://pypy.org
As I understand it, PyPy won't be working with Numpy for a while yet, so you can't just run your existing Numpy code under PyPy yet. But the day is coming.
I'm excited about PyPy. It offers the hope that we can write in a very high-level language (Python) and yet get nearly the performance of writing things in "portable assembly language" (C). For examples like this one, the Numpy might even beat the performance of naive C code, by using SIMD instructions from the CPU (SSE2, NEON, or whatever). For this example, with SIMD, you could set four integers to 123 with each loop, and that would be faster than a plain C loop. (Unless the C compiler used a SIMD optimization also! Which, come to think of it, is likely for this case. So we are back to "nearly the speed of C" rather than faster for this example. But we can imagine trickier cases that the C compiler isn't smart enough to optimize, where a future PyPy might.)
But never mind PyPy for now. If you will be working with Numpy, it is a good idea to learn all the functions like numpy.fill() that are there to speed up your code.

C++ emphasizes machine time over programmer time.
Python emphasizes programmer time over machine time.
Pypy is a python written in python, and they have the beginnings of numpy; you might try that. Pypy has a nice JIT that makes things quite fast.
You could also try cython, which allows you to translate a dialect of Python to C, and compile the C to a Python C extension module; this allows one to continue using CPython for most of your code, while still getting a bit of a speedup. However, in the one microbenchmark I've tried comparing Pypy and Cython, Pypy was quite a bit faster than Cython.
Cython uses a highly pythonish syntax, but it allows you to pretty freely intermix Python datatypes with C datatypes. If you redo your hotspots with C datatypes, it should be pretty fast. Continuing to use Python datatypes is sped up by Cython too, but not as much.

The nditer code does not assign a value to the elements of a. This doesn't affect the timings issue, but I mention it because it should not be taken as a good use of nditer.
a correct version is:
for i in np.nditer(a, op_flags=[["readwrite"]]):
i[...] = 123
The [...] is needed to retain the reference to loop value, which is an array of shape ().
There's no point in using A.T, since its the values of the base A that get changed.
I agree that the proper way of doing this assignment is a[:]=123.

If you need to do operations on a multidimensional array that depend on the value of the array but don't depend on the position inside the array then .itemset is 5 times faster than nditer for me.
So instead of doing something like
image = np.random.random_sample((200, 200,3));
with np.nditer(image, op_flags=['readwrite']) as it:
for x in it:
x[...] = x*4.5 if x<0.2 else x
You can do this
image2 = np.random.random_sample((200, 200,3));
for i in range(0,image2.size):
x = image2.item(i)
image2.itemset(i, x*4.5 if x<0.2 else x);

Related

Python performance: repeating calculations vs temp variable

Does python recalculate every repeating expression in code?
For example does
a = [1,23,45,45,456,34]
b = len(a) + 213
c = len(a) + 3432
differ in performance from
a = [1,23,45,45,456,34]
l = len(a)
b = l + 213
c = l + 3432
I would guess second one uses more memory (to store l) but less cpu. Am I correct?

Does python recalculate every repeating expression in code?
It is unspecified in the language specification. In fact, this is highly dependent of the Python implementation. The mainstream Python implementation, called CPython, does recompute the expression. PyPy (an alternative implementation focusing on performance) usually do not recompute the expression in hot portions of the code, thanks to just-in-time compilation. There are many other implementation of Python (eg. Pyston, Jython, IronPython) and each one can behave differently.
I would guess second one uses more memory (to store l) but less cpu.
Yes, but the difference is actually marginal and still dependent of the Python implementation used (eg. PyPy may not require more memory in this case). Note that calling len on a list is very fast and this is done in constant time.
While the second code should be slightly faster, such micro-optimization will likely have no significant impact on a big code. Keep in mind that readable code are generally easier to maintain, improve and optimize.

Why is `word == word[::-1]` to test for palindrome faster than a more algorithmic solution in Python?

I wrote a disaster of a question on Code Review asking why Python programmers normally test if a string is a palindrome by comparing the string to itself reversed, instead of a more algorithmic way with lower complexity, assuming that the normal way would be faster.
Here is the pythonic way:
def is_palindrome_pythonic(word):
# The slice requires N operations, plus memory
# and the equality requires N operations in the worst case
return word == word[::-1]
Here is my attempt at a more efficient way to accomplish this:
def is_palindrome_normal(word):
# This requires N/2 operations in the worst case
low = 0
high = len(word) - 1
while low < high:
if word[low] != word[high]:
return False
low += 1
high -= 1
return True
I would expect the normal way would be faster than the pythonic way. See for example this great article
Timing it with timeit, however, brought exactly the opposite result:
setup = '''
def is_palindrome_pythonic(word):
# ...
def is_palindrome_normal(word):
# ...
# N here is 2000
first_half = ''.join(map(str, (i for i in range(1000))))
word = first_half + first_half[::-1]
'''
timeit.timeit('is_palindrome_pythonic(word)', setup=setup, number=1000)
# 0.0052
timeit.timeit('is_palindrome_normal(word)', setup=setup, number=1000)
# 0.4268
I then figured that my n was too small, so I changed the length of word from 2000 to 2,000,000. The pythonic way took about 16 seconds on average, whereas the normal way ran several minutes before I canceled it.
Incidentally, in the best case scenario, where the very first letter does not match the very last letter, the normal algorithm was much faster.
What explains the extreme difference between the speeds of the two algorithms?

Because the "Pythonic" way with slicing is implemented in C. The interpreter / VM doesn't need to execute more than approximately once. The bulk of the algorithm is spent in a tight loop of native code.

As much as I love Python, I have to say that if you want maximum speed you probably shouldn't be using Python. ;)
The rule of thumb in Python time optimization is to use operators or module functions that do the bulk of the work at C speed rather than equivalent code running at Python speed. Even if the two equivalent approaches are using algorithms with the same big-O complexity, the time scaling factor of (mostly) running directly on the CPU vs running on the Python virtual machine has a big impact.
This is even true of an algorithm that's mostly just integer arithmetic, since Python integers are immutable objects, so when you do arithmetic there's the overhead of allocating and initialising a new integer object and disposing of the old one. CPython tries to be frugal, and is pretty smart at managing memory (so every new object doesn't require a system call to allocate memory), and of course the CPython interpreter maintains a cache of integers from -5 to 256 (inclusive) so that arithmetic with small numbers isn't so bad. But it's certainly slower than doing arithmetic at C speed with machine integers.
You can see the difference even with a simple counting loop. On my admittedly ancient 32 bit machine running Python 3.6, using the Bash time command to do the timings,
m = 5000000
for i in range(m):
i
is roughly twice as fast as
m = 5000000
i = 0
while i<m:
i += 1
because range can do the arithmetic at C speed, even though it still has to create a new integer object on each iteration. If you replace the i line in the range version with pass the time is roughly halved.
With more complicated algorithms the time differences can be much more significant, eg string or list copying that happens at the C level can often be done with efficient CPU operators that are much faster than chugging along on the Python virtual machine with Python code.
I agree that this can take a while to get used to if you come from a language that gets compiled to native machine code. And I admit that even after over 10 years of using Python it still feels a little weird to me that when (for example) you need to do some bit manipulation stuff that it can often be faster in Python to do it using string operations on a string composed of '0's and '1's that to do it using the traditional bitwise and arithmetic integer operators.
OTOH, I think it's useful to know the traditional algorithms as well as the Pythonic ones. It's rare that a programmer will work only in Python, so it's good to know how to do things in languages that don't work the way that Python does.

In python, is math.acos() faster than numpy.arccos() for scalars?

I'm doing some scientific computing in Python with a lot of geometric calculations, and I ran across a significant difference between using numpy versus the standard math library.
>>> x = timeit.Timer('v = np.arccos(a)', 'import numpy as np; a = 0.6')
>>> x.timeit(100000)
0.15387153439223766
>>> y = timeit.Timer('v = math.acos(a)', 'import math; a = 0.6')
>>> y.timeit(100000)
0.012333301827311516
That's more than a 10x speedup! I'm using numpy for almost all standard math functions, and I just assumed it was optimized and at least as fast as math. For long enough vectors, numpy.arccos() will eventually win vs. looping with math.acos(), but since I only use the scalar case, is there any downside to using math.acos(), math.asin(), math.atan() across the board, instead of the numpy versions?

Using the functions from the math module for scalars is perfectly fine. The numpy.arccos function is likely to be slower due to
conversion to an array (and a C data type)
C function call overhead
conversion of the result back to a python type
If this difference in performance is important for your problem, you should check if you really can't use array operations. As user2357112 said in the comments, arrays are what numpy really is great at.

Interpretation between fortran and python

Now I am trying to rewrite the fortran code to python script.
In original fortran code, it declares real number as:
real a(n),b(n),D(64)
Here how can I convert this D(64) in python code?
a(n), b(n) are values from the data which I used, but D(64) is not.
I need to put this into sub-module fortran code which I wrapped with f2py.
That sub-module code looks like below. M is just a integer which will be define in main code.
subroutine multires (a,b,M, D)
implicit none
real a(*),b(*),D(*)
.
.
if(nw.gt.1) D(ms+1)=(sumab/nw)

If you code is looking for good performance, stay away from python lists. Instead you want to use numpy arrays.
import numpy as np
D = np.zeros((64), np.float32)
This will construct a numpy ndarray with 64 elements of 32 bit reals initialized to 0. Using these kind of arrays rather than lists can greatly improve the performance of your python code over lists. They also give you finer control over the typing for interoperability and you incur less overhead, especially if you get into cython.

Python is dynamically typed, so you have not to declare variables with their type. A Fortran array can be translated in a Python list, but as you should not want to dynamically add elements to the list, you should initialize it to its size. So Fortran real D(64) could become in Python:
D = [ 0. for i in range(64) ]
(declares D to be a list and initializes it with 64 0. values)
Of course, if you can use numpy and not just plain Python, you could use numpy.array type. Among other qualities (efficiency, type control, ...), it can be declared with F (for Fortran) order, meaning first-index varies the fastest, as Fortran programmers are used to.

Why are there no ++ and -- operators in Python?

Why are there no ++ and -- operators in Python?

It's not because it doesn't make sense; it makes perfect sense to define "x++" as "x += 1, evaluating to the previous binding of x".
If you want to know the original reason, you'll have to either wade through old Python mailing lists or ask somebody who was there (eg. Guido), but it's easy enough to justify after the fact:
Simple increment and decrement aren't needed as much as in other languages. You don't write things like for(int i = 0; i < 10; ++i) in Python very often; instead you do things like for i in range(0, 10).
Since it's not needed nearly as often, there's much less reason to give it its own special syntax; when you do need to increment, += is usually just fine.
It's not a decision of whether it makes sense, or whether it can be done--it does, and it can. It's a question of whether the benefit is worth adding to the core syntax of the language. Remember, this is four operators--postinc, postdec, preinc, predec, and each of these would need to have its own class overloads; they all need to be specified, and tested; it would add opcodes to the language (implying a larger, and therefore slower, VM engine); every class that supports a logical increment would need to implement them (on top of += and -=).
This is all redundant with += and -=, so it would become a net loss.

This original answer I wrote is a myth from the folklore of computing: debunked by Dennis Ritchie as "historically impossible" as noted in the letters to the editors of Communications of the ACM July 2012 doi:10.1145/2209249.2209251
The C increment/decrement operators were invented at a time when the C compiler wasn't very smart and the authors wanted to be able to specify the direct intent that a machine language operator should be used which saved a handful of cycles for a compiler which might do a
load memory
load 1
add
store memory
instead of
inc memory
and the PDP-11 even supported "autoincrement" and "autoincrement deferred" instructions corresponding to *++p and *p++, respectively. See section 5.3 of the manual if horribly curious.
As compilers are smart enough to handle the high-level optimization tricks built into the syntax of C, they are just a syntactic convenience now.
Python doesn't have tricks to convey intentions to the assembler because it doesn't use one.

I always assumed it had to do with this line of the zen of python:
There should be one — and preferably only one — obvious way to do it.
x++ and x+=1 do the exact same thing, so there is no reason to have both.

Of course, we could say "Guido just decided that way", but I think the question is really about the reasons for that decision. I think there are several reasons:
It mixes together statements and expressions, which is not good practice. See http://norvig.com/python-iaq.html
It generally encourages people to write less readable code
Extra complexity in the language implementation, which is unnecessary in Python, as already mentioned

Because, in Python, integers are immutable (int's += actually returns a different object).
Also, with ++/-- you need to worry about pre- versus post- increment/decrement, and it takes only one more keystroke to write x+=1. In other words, it avoids potential confusion at the expense of very little gain.

Clarity!
Python is a lot about clarity and no programmer is likely to correctly guess the meaning of --a unless s/he's learned a language having that construct.
Python is also a lot about avoiding constructs that invite mistakes and the ++ operators are known to be rich sources of defects.
These two reasons are enough not to have those operators in Python.
The decision that Python uses indentation to mark blocks rather
than syntactical means such as some form of begin/end bracketing
or mandatory end marking is based largely on the same considerations.
For illustration, have a look at the discussion around introducing a conditional operator (in C: cond ? resultif : resultelse) into Python in 2005.
Read at least the first message and the decision message of that discussion (which had several precursors on the same topic previously).
Trivia:
The PEP frequently mentioned therein is the "Python Enhancement Proposal" PEP 308. LC means list comprehension, GE means generator expression (and don't worry if those confuse you, they are none of the few complicated spots of Python).

My understanding of why python does not have ++ operator is following: When you write this in python a=b=c=1 you will get three variables (labels) pointing at same object (which value is 1). You can verify this by using id function which will return an object memory address:
In [19]: id(a)
Out[19]: 34019256
In [20]: id(b)
Out[20]: 34019256
In [21]: id(c)
Out[21]: 34019256
All three variables (labels) point to the same object. Now increment one of variable and see how it affects memory addresses:
In [22] a = a + 1
In [23]: id(a)
Out[23]: 34019232
In [24]: id(b)
Out[24]: 34019256
In [25]: id(c)
Out[25]: 34019256
You can see that variable a now points to another object as variables b and c. Because you've used a = a + 1 it is explicitly clear. In other words you assign completely another object to label a. Imagine that you can write a++ it would suggest that you did not assign to variable a new object but ratter increment the old one. All this stuff is IMHO for minimization of confusion. For better understanding see how python variables works:
In Python, why can a function modify some arguments as perceived by the caller, but not others?
Is Python call-by-value or call-by-reference? Neither.
Does Python pass by value, or by reference?
Is Python pass-by-reference or pass-by-value?
Python: How do I pass a variable by reference?
Understanding Python variables and Memory Management
Emulating pass-by-value behaviour in python
Python functions call by reference
Code Like a Pythonista: Idiomatic Python

It was just designed that way. Increment and decrement operators are just shortcuts for x = x + 1. Python has typically adopted a design strategy which reduces the number of alternative means of performing an operation. Augmented assignment is the closest thing to increment/decrement operators in Python, and they weren't even added until Python 2.0.

I'm very new to python but I suspect the reason is because of the emphasis between mutable and immutable objects within the language. Now, I know that x++ can easily be interpreted as x = x + 1, but it LOOKS like you're incrementing in-place an object which could be immutable.
Just my guess/feeling/hunch.

To complete already good answers on that page:
Let's suppose we decide to do this, prefix (++i) that would break the unary + and - operators.
Today, prefixing by ++ or -- does nothing, because it enables unary plus operator twice (does nothing) or unary minus twice (twice: cancels itself)
>>> i=12
>>> ++i
12
>>> --i
12
So that would potentially break that logic.
now if one needs it for list comprehensions or lambdas, from python 3.8 it's possible with the new := assignment operator (PEP572)
pre-incrementing a and assign it to b:
>>> a = 1
>>> b = (a:=a+1)
>>> b
2
>>> a
2
post-incrementing just needs to make up the premature add by subtracting 1:
>>> a = 1
>>> b = (a:=a+1)-1
>>> b
1
>>> a
2

I believe it stems from the Python creed that "explicit is better than implicit".

First, Python is only indirectly influenced by C; it is heavily influenced by ABC, which apparently does not have these operators, so it should not be any great surprise not to find them in Python either.
Secondly, as others have said, increment and decrement are supported by += and -= already.
Third, full support for a ++ and -- operator set usually includes supporting both the prefix and postfix versions of them. In C and C++, this can lead to all kinds of "lovely" constructs that seem (to me) to be against the spirit of simplicity and straight-forwardness that Python embraces.
For example, while the C statement while(*t++ = *s++); may seem simple and elegant to an experienced programmer, to someone learning it, it is anything but simple. Throw in a mixture of prefix and postfix increments and decrements, and even many pros will have to stop and think a bit.

The ++ class of operators are expressions with side effects. This is something generally not found in Python.
For the same reason an assignment is not an expression in Python, thus preventing the common if (a = f(...)) { /* using a here */ } idiom.
Lastly I suspect that there operator are not very consistent with Pythons reference semantics. Remember, Python does not have variables (or pointers) with the semantics known from C/C++.

as i understood it so you won't think the value in memory is changed.
in c when you do x++ the value of x in memory changes.
but in python all numbers are immutable hence the address that x pointed as still has x not x+1. when you write x++ you would think that x change what really happens is that x refrence is changed to a location in memory where x+1 is stored or recreate this location if doe's not exists.

Other answers have described why it's not needed for iterators, but sometimes it is useful when assigning to increase a variable in-line, you can achieve the same effect using tuples and multiple assignment:
b = ++a becomes:
a,b = (a+1,)*2
and b = a++ becomes:
a,b = a+1, a
Python 3.8 introduces the assignment := operator, allowing us to achievefoo(++a) with
foo(a:=a+1)
foo(a++) is still elusive though.

Maybe a better question would be to ask why do these operators exist in C. K&R calls increment and decrement operators 'unusual' (Section 2.8page 46). The Introduction calls them 'more concise and often more efficient'. I suspect that the fact that these operations always come up in pointer manipulation also has played a part in their introduction.
In Python it has been probably decided that it made no sense to try to optimise increments (in fact I just did a test in C, and it seems that the gcc-generated assembly uses addl instead of incl in both cases) and there is no pointer arithmetic; so it would have been just One More Way to Do It and we know Python loathes that.

This may be because #GlennMaynard is looking at the matter as in comparison with other languages, but in Python, you do things the python way. It's not a 'why' question. It's there and you can do things to the same effect with x+=. In The Zen of Python, it is given: "there should only be one way to solve a problem." Multiple choices are great in art (freedom of expression) but lousy in engineering.

I think this relates to the concepts of mutability and immutability of objects. 2,3,4,5 are immutable in python. Refer to the image below. 2 has fixed id until this python process.
x++ would essentially mean an in-place increment like C. In C, x++ performs in-place increments. So, x=3, and x++ would increment 3 in the memory to 4, unlike python where 3 would still exist in memory.
Thus in python, you don't need to recreate a value in memory. This may lead to performance optimizations.
This is a hunch based answer.

I know this is an old thread, but the most common use case for ++i is not covered, that being manually indexing sets when there are no provided indices. This situation is why python provides enumerate()
Example : In any given language, when you use a construct like foreach to iterate over a set - for the sake of the example we'll even say it's an unordered set and you need a unique index for everything to tell them apart, say
i = 0
stuff = {'a': 'b', 'c': 'd', 'e': 'f'}
uniquestuff = {}
for key, val in stuff.items() :
uniquestuff[key] = '{0}{1}'.format(val, i)
i += 1
In cases like this, python provides an enumerate method, e.g.
for i, (key, val) in enumerate(stuff.items()) :

In addition to the other excellent answers here, ++ and -- are also notorious for undefined behavior. For example, what happens in this code?
foo[bar] = bar++;
It's so innocent-looking, but it's wrong C (and C++), because you don't know whether the first bar will have been incremented or not. One compiler might do it one way, another might do it another way, and a third might make demons fly out of your nose. All would be perfectly conformant with the C and C++ standards.
(EDIT: C++17 has changed the behavior of the given code so that it is defined; it will be equivalent to foo[bar+1] = bar; ++bar; — which nonetheless might not be what the programmer is expecting.)
Undefined behavior is seen as a necessary evil in C and C++, but in Python, it's just evil, and avoided as much as possible.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.