I'm doing some scientific computing in Python with a lot of geometric calculations, and I ran across a significant difference between using numpy versus the standard math library.
>>> x = timeit.Timer('v = np.arccos(a)', 'import numpy as np; a = 0.6')
>>> x.timeit(100000)
0.15387153439223766
>>> y = timeit.Timer('v = math.acos(a)', 'import math; a = 0.6')
>>> y.timeit(100000)
0.012333301827311516
That's more than a 10x speedup! I'm using numpy for almost all standard math functions, and I just assumed it was optimized and at least as fast as math. For long enough vectors, numpy.arccos() will eventually win vs. looping with math.acos(), but since I only use the scalar case, is there any downside to using math.acos(), math.asin(), math.atan() across the board, instead of the numpy versions?
Using the functions from the math module for scalars is perfectly fine. The numpy.arccos function is likely to be slower due to
conversion to an array (and a C data type)
C function call overhead
conversion of the result back to a Python type
If this difference in performance matters for your problem, you should check whether you really can't use array operations. As user2357112 said in the comments, arrays are what numpy is really great at.
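As a rough sketch of how the trade-off flips once you work on whole arrays (the array size is just illustrative, and the timings are not measured here):

import math
import numpy as np
import timeit

angles = np.random.uniform(-1.0, 1.0, 100000)

# Scalar path: a Python-level loop calling math.acos on each element.
t_loop = timeit.timeit(lambda: [math.acos(a) for a in angles], number=10)

# Array path: a single vectorized call that loops in C.
t_vec = timeit.timeit(lambda: np.arccos(angles), number=10)

print(t_loop, t_vec)  # the vectorized call should win by a wide margin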
I've just started using numexpr's evaluate function and I've come across an annoying error.
I want it to evaluate and print, say, sin(10), and it does that perfectly, but if I do sec(10), I get "TypeError: 'VariableNode' object is not callable"
Example code:
import mpmath as mp
from numexpr import evaluate as ne
cos = mp.cos
sin = mp.sin
csc = mp.csc
sec = mp.sec
print(ne('cos(50)'))
>>> 0.9649660284921133
print(ne('sin(50)'))
>>> -0.26237485370392877
print(ne('csc(50)'))
>>> TypeError: 'VariableNode' object is not callable
print(ne('sec(50)'))
>>> TypeError: 'VariableNode' object is not callable
When I use eval, it instead returns the correct values, like it should.
Why does this occur? Is it because numexpr is an extension of numpy and automatically sources its functions from numpy (which doesn't have sec, csc, or cot), and thus cannot source functions from mpmath?
Many thanks in advance! :)
Taking a look at the documentation, there is a paragraph about supported functions, and it appears that csc and sec are not among them.
However, the division operator and sin and cos are supported, so substituting 1 / sin(50) for csc(50) and 1 / cos(50) for sec(50) works.
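For example, reusing the setup from the question (the values follow from the sin(50) and cos(50) results above):

from numexpr import evaluate as ne

print(ne('1 / sin(50)'))   # plays the role of csc(50)
print(ne('1 / cos(50)'))   # plays the role of sec(50)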
There would be little point in implementing these auxiliary and somewhat rare functions as they can be easily substituted with supported operators and functions.
Typically, numexpr is used to improve performance when working on large arrays.
I need to perform operations on very large arrays (several million entries), with the cumulative size of these arrays close to the available memory.
I understand that a naive numpy operation like a = a*3 + b - c**2 creates several temporary arrays and thus occupies more memory.
Since I plan to work at the limit of memory occupancy, I'm afraid this simple approach won't work, so I'd like to start my development with the right approach.
I know that packages like numba or pythran can help with improving performance when manipulating arrays, but it is not clear to me whether they can automatically deal with in-place operations and avoid temporary objects.
As a simple example, here's one function I'll have to use on large arrays:
import numpy as np

def find_bins(a, indices):
    global offset, width, nstep
    i = (a - offset) * nstep / width
    i = np.where(i < 0, 0, i)
    i = np.where(i >= nstep, nstep, i)
    indices[:] = i.astype(int)
So something that mixes arithmetic operations and calls to numpy functions.
How easy would it be to write such functions using numba or pythran (or something else)?
What would be the pros and cons in each case ?
Thanks for any hint !
PS: I know about numexpr, but I'm not sure it is convenient or well adapted to functions more complex than a single arithmetic expression.
One option is to use numexpr. For example:
import numexpr
numexpr.evaluate("a+b*c", out=a)
This can help you avoid the temporary arrays; for more on this, see High Performance Python by Micha Gorelick and Ian Ozsvald.
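As an untested sketch of how the find_bins expression from the question might look, using numexpr's where() in place of np.where and passing the globals in as arguments (the function name find_bins_ne is just for illustration); the second evaluate call writes into the same buffer, so no extra arrays are allocated:

import numexpr
import numpy as np

def find_bins_ne(a, indices, offset, width, nstep):
    # numexpr picks up a, offset, width and nstep by name from the local scope
    # and evaluates the expression blockwise instead of building whole
    # intermediate arrays.
    t = numexpr.evaluate("(a - offset) * nstep / width")
    # Clamp to [0, nstep], reusing the buffer t as the output.
    numexpr.evaluate("where(t < 0, 0, where(t >= nstep, nstep, t))", out=t)
    indices[:] = t.astype(int)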
Pythran avoids many temporary arrays by design. For the simple expression you're pointing at, that would be
#pythran export find_bins(float[], int[], float, float, int)
import numpy as np
def find_bins(a, indices, offset, width, nstep):
    i = (a - offset) * nstep / width
    i = np.where(i < 0, 0, i)
    i = np.where(i >= nstep, nstep, i)
    indices[:] = i.astype(int)
This both avoids temporaries and speeds up the computation.
Note that you could use the np.clip function here; it's supported by Pythran as well.
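A sketch of the same function with np.clip (the name find_bins_clip is just for illustration):

#pythran export find_bins_clip(float[], int[], float, float, int)
import numpy as np
def find_bins_clip(a, indices, offset, width, nstep):
    # clip expresses both np.where calls at once; per the note above,
    # Pythran supports it and still avoids the temporary arrays.
    i = np.clip((a - offset) * nstep / width, 0, nstep)
    indices[:] = i.astype(int)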
I am currently looking into the use of Numba to speed up my python software. I am entirely new to the concept and currently trying to learn the absolute basics. What I am stuck on for now is:
I don't understand what the big benefit of the vectorize decorator is.
The documentation explains that the decorator is used to turn a normal Python function into a NumPy ufunc. From what I understand, the benefit of a ufunc is that it can take numpy arrays (instead of scalars) and provides features such as broadcasting.
But all the examples I can find online can be solved just as easily without this decorator.
Take, for instance, this example from the numba documentation.
from numba import vectorize, float64

@vectorize([float64(float64, float64)])
def f(x, y):
    return x + y
They claim that the function now works like a numpy ufunc. But doesn't it anyway, even without the decorator? If I were to just run the following code:
import numpy as np

def f(x, y):
    return x + y

x = np.arange(10)
y = np.arange(10)
print(f(x, y))
That works just fine. The function already accepts arguments other than scalars.
What am I misunderstanding here?
Just read the docs a few lines below:
You might ask yourself, “why would I go through this instead of compiling a simple iteration loop using the @jit decorator?”. The answer is that NumPy ufuncs automatically get other features such as reduction, accumulation or broadcasting.
For example, f.reduce(arr) will sum all the elements of arr at C speed, which the plain, undecorated f cannot do.
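For instance, with the decorated f from the example above (a quick sketch; the undecorated Python version has none of these ufunc methods):

import numpy as np
from numba import vectorize, float64

@vectorize([float64(float64, float64)])
def f(x, y):
    return x + y

arr = np.arange(5.0)

print(f.reduce(arr))         # 10.0 -- sums all elements at C speed
print(f.accumulate(arr))     # [ 0.  1.  3.  6. 10.]
print(f(arr[:, None], arr))  # broadcasting: a 5x5 table of pairwise sums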
scipy.special.gammainc cannot take negative values for the first argument. Are there any other implementations in Python that can? I could do the integration manually, of course, but I'd like to know whether good alternatives already exist.
Correct result: 1 - Gamma[-1,1] = 0.85
Use Scipy: scipy.special.gammainc(-1, 1) = 0
Thanks.
I typically reach for mpmath whenever I need special functions and I'm not too concerned about performance. (Although its performance in many cases is pretty good anyway.)
For example:
>>> import mpmath
>>> mpmath.gammainc(-1,1)
mpf('0.14849550677592205')
>>> 1-mpmath.gammainc(-1,1)
mpf('0.85150449322407795')
>>> mpmath.mp.dps = 50 # arbitrary precision!
>>> 1-mpmath.gammainc(-1,1)
mpf('0.85150449322407795208164000529866078158523616237514084')
I just had the same issue and ended up using the recurrence relations for the function when a<0.
http://en.wikipedia.org/wiki/Incomplete_gamma_function#Properties
Note also that the scipy functions gammainc and gammaincc give the regularized forms gamma(a,x)/Gamma(a) and Gamma(a,x)/Gamma(a).
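For example, a sketch of that approach for the Gamma(-1, 1) case from the question: the recurrence Gamma(a+1, x) = a*Gamma(a, x) + x**a * exp(-x) rearranges to Gamma(a, x) = (Gamma(a+1, x) - x**a * exp(-x)) / a, and scipy exposes Gamma(0, x) as the exponential integral exp1 (the function name below is just for illustration):

import numpy as np
from scipy.special import exp1

def upper_gamma_minus_one(x):
    # Gamma(-1, x) = (Gamma(0, x) - exp(-x)/x) / (-1), with Gamma(0, x) = E1(x)
    return (exp1(x) - np.exp(-x) / x) / (-1.0)

print(upper_gamma_minus_one(1.0))      # ~0.148495..., matching the mpmath value above
print(1 - upper_gamma_minus_one(1.0))  # ~0.851504...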
Still an issue in 2021, and scipy still hasn't improved this. It is especially frustrating that scipy does not even provide unregularised versions of the upper and lower incomplete gamma functions. I also ended up using mpmath, which uses its own data type (here mpf, for mpmath float, which supports arbitrary precision). To cook up something quick for the upper and lower incomplete gamma functions that works with numpy arrays and behaves as one would expect from evaluating those integrals, I came up with the following:
import numpy as np
from mpmath import gammainc
"""
In both functinos below a is a float and z is a numpy.array.
"""
def gammainc_up(a,z):
return np.asarray([gammainc(a, zi, regularized=False)
for zi in z]).astype(float)
def gammainc_low(a,z):
return np.asarray([gamainc(a, 0, zi, regularized=False)
for zi in z]).astype(float)
Note again, this is for the un-regularised functions (Eq. 8.2.1 and 8.2.2 in the DLMF); the regularised functions (Eq. 8.2.3 and 8.2.4) can be obtained in mpmath by setting the keyword regularized=True.
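A quick usage sketch (the expected value for Gamma(-1, 1) matches the mpmath result quoted earlier; the sample z values are arbitrary):

z = np.array([0.5, 1.0, 2.0])
print(gammainc_up(-1, z))     # upper incomplete gamma; at z=1 this is ~0.148495
print(gammainc_low(0.5, z))   # lower incomplete gamma (a must be > 0 for convergence)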
I have to iterate over all items in a two-dimensional array of integers and change their values (according to some rule that is not important here).
I'm surprised by how significant the difference in performance is between the Python runtime and the C# or Java runtimes. Did I write totally wrong Python code (v2.7.2)?
import numpy
a = numpy.ndarray((5000,5000), dtype = numpy.int32)
for x in numpy.nditer(a.T):
    x = 123
>python -m timeit -n 2 -r 2 -s "import numpy; a = numpy.ndarray((5000,5000), dtype=numpy.int32)" "for x in numpy.nditer(a.T):" " x = 123"
2 loops, best of 2: 4.34 sec per loop
For comparison, the equivalent C# code takes only 50 ms, i.e. Python is almost 100 times slower! (Assume the matrix variable is already initialized.)
for (y = 0; y < 5000; y++)
    for (x = 0; x < 5000; x++)
        matrix[y][x] = 123;
Yep! Iterating through numpy arrays in python is slow. (Slower than iterating through a python list, as well.)
Typically, you avoid iterating through them directly.
If you can give us an example of the rule you're changing things based on, there's a good chance that it's easy to vectorize.
As a toy example:
import numpy as np
x = np.linspace(0, 8*np.pi, 100)
y = np.cos(x)
x[y > 0] = 100
However, in many cases you have to iterate, either due to the algorithm (e.g. finite difference methods) or to lessen the memory cost of temporary arrays.
In that case, have a look at Cython, Weave, or something similar.
The example you gave was presumably meant to set all items of a two-dimensional NumPy array to 123. This can be done efficiently like this:
a.fill(123)
or
a[:] = 123
Python is a much more dynamic language than C or C#. The main reason why the loop is so slow is that on every pass, the CPython interpreter is doing some extra work that wastes time: specifically, it is binding the name x with the next object from the iterator, then when it evaluates the assignment it has to look up the name x again.
As @Sven Marnach noted, you can call the ndarray.fill() method and it is fast. That method is compiled C (or maybe Fortran), and it simply loops over the addresses of the numpy array's data and fills in the values. Much less dynamic than Python, which is good for this simple case.
But now consider PyPy. Once you run your program under PyPy, a JIT analyzes what your code is actually doing. In this example, it notes that the name x isn't used for anything but the assignment, and it can optimize away binding the name. This example should be one that PyPy speeds up tremendously; likely PyPy will be ten times faster than plain Python (so only one-tenth as fast as C, rather than 1/100 as fast).
http://pypy.org
As I understand it, PyPy won't be working with Numpy for a while yet, so you can't just run your existing Numpy code under PyPy yet. But the day is coming.
I'm excited about PyPy. It offers the hope that we can write in a very high-level language (Python) and yet get nearly the performance of writing things in "portable assembly language" (C). For examples like this one, the Numpy might even beat the performance of naive C code, by using SIMD instructions from the CPU (SSE2, NEON, or whatever). For this example, with SIMD, you could set four integers to 123 with each loop, and that would be faster than a plain C loop. (Unless the C compiler used a SIMD optimization also! Which, come to think of it, is likely for this case. So we are back to "nearly the speed of C" rather than faster for this example. But we can imagine trickier cases that the C compiler isn't smart enough to optimize, where a future PyPy might.)
But never mind PyPy for now. If you will be working with Numpy, it is a good idea to learn all the functions like ndarray.fill() that are there to speed up your code.
C++ emphasizes machine time over programmer time.
Python emphasizes programmer time over machine time.
PyPy is a Python implementation written in Python, and it has the beginnings of numpy support; you might try that. PyPy has a nice JIT that makes things quite fast.
You could also try cython, which allows you to translate a dialect of Python to C, and compile the C to a Python C extension module; this allows one to continue using CPython for most of your code, while still getting a bit of a speedup. However, in the one microbenchmark I've tried comparing Pypy and Cython, Pypy was quite a bit faster than Cython.
Cython uses a highly pythonish syntax, but it allows you to pretty freely intermix Python datatypes with C datatypes. If you redo your hotspots with C datatypes, it should be pretty fast. Continuing to use Python datatypes is sped up by Cython too, but not as much.
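For the fill example above, a minimal Cython sketch (the file and function names are just examples) could look like this; the typed memoryview lets the inner loop compile down to plain C:

# fill_loop.pyx
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def fill_value(int[:, :] matrix, int value):
    # Plain nested C loops over a typed memoryview of the array.
    cdef Py_ssize_t y, x
    for y in range(matrix.shape[0]):
        for x in range(matrix.shape[1]):
            matrix[y, x] = value

After compiling the extension module, calling fill_value(a, 123) on an array with dtype=numpy.intc should run at roughly C speed.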
The nditer code does not assign a value to the elements of a. This doesn't affect the timings issue, but I mention it because it should not be taken as a good use of nditer.
a correct version is:
for i in np.nditer(a, op_flags=[["readwrite"]]):
    i[...] = 123
The [...] is needed to retain the reference to the loop value, which is an array of shape ().
There's no point in using a.T, since it's the values of the base a that get changed.
I agree that the proper way of doing this assignment is a[:]=123.
If you need to do operations on a multidimensional array that depend on the values of the array but not on the positions inside the array, then .itemset is 5 times faster than nditer for me.
So instead of doing something like
import numpy as np

image = np.random.random_sample((200, 200, 3))
with np.nditer(image, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = x*4.5 if x < 0.2 else x
You can do this
image2 = np.random.random_sample((200, 200, 3))
for i in range(image2.size):
    x = image2.item(i)
    image2.itemset(i, x*4.5 if x < 0.2 else x)
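That said, since this particular rule depends only on the values, it can also be written without any Python-level loop, e.g.:

import numpy as np

image3 = np.random.random_sample((200, 200, 3))
image3 = np.where(image3 < 0.2, image3 * 4.5, image3)  # fully vectorized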