Good day, I'm writing a Python module for some numeric work. Since there's a lot of stuff going on, I've been spending the last few days optimizing code to improve calculation times.
However, I have a question concerning Numba.
Basically, I have a class with some fields which are numpy arrays, which I initialize in the following way:
def __init__(self):
    a = numpy.arange(0, self.max_i, 1)
    self.vibr_energy = self.calculate_vibr_energy(a)

def calculate_vibr_energy(self, i):
    return numpy.exp(-self.harmonic * i - self.anharmonic * (i ** 2))
So, the code is vectorized, and using Numba's JIT results in some improvement. However, sometimes I need to access the calculate_vibr_energy function from outside the class, and pass a single integer instead of an array in place of i.
As far as I understand, if I use Numba's JIT on the calculate_vibr_energy, it will have to always take an array as an argument.
So, which of the following options is better:
1) Create a new function calculate_vibr_energy_single(i), which will only take a single integer number, and use Numba on it too
2) Replace all usages of the function that are similar to this one:
myclass.calculate_vibr_energy(1)
with this:
tmp = np.array([1])
myclass.calculate_vibr_energy(tmp)[0]
Or are there other, more efficient (or at least, more Python-ic) ways of doing that?
I have only played with numba a little so far, so I may be mistaken, but as far as I understand it, using the "autojit" decorator should give functions that can take arguments of any type.
See e.g. http://numba.pydata.org/numba-doc/dev/pythonstuff.html
Related
Previously, I asked a question about a relatively simple loop that Numba was failing to parallelize. The solution turned out to be making all the loops explicit.
Now, I need to do a simpler version of the same task: I now have arrays alpha and beta of shape (m,n) and (b,m,n) respectively, and I want to compute the Frobenius product of 2D slices of the arguments and find the slice of beta which maximizes this product. Previously there was an additional, large first dimension of alpha, so it was over this dimension that I parallelized; now I want to parallelize over the first dimension of beta, as the calculation becomes expensive when b > 1000.
If I naively modify the code that worked for the previous problem, I obtain:
@njit(parallel=True)
def parallel_value_numba(alpha, beta):
    dot = np.zeros(beta.shape[0])
    for i in prange(beta.shape[0]):
        for j in prange(beta.shape[1]):
            for k in prange(beta.shape[2]):
                dot[i] += alpha[j, k] * beta[i, j, k]
    index = np.argmax(dot)
    value = dot[index]
    return value, index
But Numba doesn't like this for some reason and complains:
numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
scalar type memoryview(float64, 2d, C) given for non scalar argument #3
So instead, I tried
@njit(parallel=True)
def parallel_value_numba_2(alpha, beta):
    product = np.multiply(alpha, beta)
    dot1 = np.sum(product, axis=2)
    dot2 = np.sum(dot1, axis=1)
    index = np.argmax(dot2)
    value = dot2[index]
    return value, index
This compiles as long as you broadcast alpha to beta.shape before passing it to the function, and in principle Numba is capable of parallelizing the NumPy operations. But it runs painfully slowly, much slower than the following serial code:
def einsum_value(alpha, beta):
    dot = np.einsum('kl,jkl->j', alpha, beta)
    index = np.argmax(dot)
    value = dot[index]
    return value, index
So, my current working code uses this last implementation, but this function is still bottlenecking the runtime and I'd like to speed it up. Can anyone convince Numba to parallelize this function with an appreciable speedup?
This is not exactly an answer with a solution, but formatting code in a comment is harder.
Numba generates different code depending on the arguments passed to the function. For example, your code works with the following example:
>>> alpha = np.random.random((5, 4))
>>> beta = np.random.random((3, 5, 4))
>>> parallel_value_numba(alpha, beta)
(5.89447648574048, 0)
In order to diagnose the problem, it's necessary to have an example of the specific argument values causing the problem.
Reading the error message, it seems you are passing a memoryview object, but Numba may not have full support for it.
As a side comment, you don't need to use prange in every loop. It's normally enough to use it in the outer loop, as long as the number of expected iterations is larger than the number of cores in your machine.
I am currently looking into the use of Numba to speed up my python software. I am entirely new to the concept and currently trying to learn the absolute basics. What I am stuck on for now is:
I don't understand what the big benefit of the vectorize decorator is.
The documentation explains that the decorator is used to turn a normal Python function into a NumPy ufunc. From what I understand, the benefit of a ufunc is that it can take numpy arrays (instead of scalars) and provide features such as broadcasting.
But all the examples I can find online can be solved just as easily without this decorator.
Take for instance, this example from the numba documentation.
@vectorize([float64(float64, float64)])
def f(x, y):
    return x + y
They claim that the function now works like a numpy ufunc. But doesn't it anyway, even without the decorator? If I were to just run the following code:
def f(x, y):
    return x + y

x = np.arange(10)
y = np.arange(10)
print(f(x, y))
That works just fine. The function already takes arguments other than scalars.
What am I misunderstanding here?
Just read the docs a few lines below:
You might ask yourself, "why would I go through this instead of compiling a simple iteration loop using the @jit decorator?". The answer is that NumPy ufuncs automatically get other features such as reduction, accumulation or broadcasting.
For example, f.reduce(arr) will sum all the elements of arr at C speed, which the plain f cannot.
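You can see what ufunc status buys you with the built-in np.add ufunc, which is exactly the kind of object @vectorize turns f into here:

```python
import numpy as np

# np.add is a ufunc, just like the @vectorize-decorated f above.
arr = np.arange(10)
total = np.add.reduce(arr)        # like f.reduce(arr): C-speed sum -> 45
running = np.add.accumulate(arr)  # running sums: [0, 1, 3, 6, ...]
table = np.add.outer(np.arange(3), np.arange(3))  # broadcast 3x3 addition table
print(total, running[-1], table.shape)
```

A plain Python f(x, y) has none of these methods, even though it happens to accept arrays.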
I need to use ctypes functions to reduce the running time of quad in Python. Here is my original question, but now I know what path I need to follow: the same steps as in this similar problem.
However in my case the function that will be handled in the numerical integration is calling another python function. Like this:
from sklearn.neighbors import KernelDensity
from scipy.integrate import quad
import numpy as np

funcA = lambda x: np.exp(kde_bad.score_samples([[x]]))
quad(funcA, 0, cut_off)
where cut_off is just a scalar that I decide in my code, and kde_bad is the kernel object created using KernelDensity.
So my question is: how do I specify the function in C? That is, the equivalent of this:
//testlib.c
double f(int n, double args[n])
{
    return args[0] - args[1] * args[2]; // corresponds to x0 - x1 * x2
}
Any input is appreciated!
You can do this using ctypes's callback function facilities.
That said, it's debatable whether or not you'll actually achieve any speed gains if your function calls something from Python. There are essentially two reasons that ctypes speeds up integration: (1) the integrand function itself is faster as compiled C than as Python bytecode, and (2) it avoids calling back to Python from the compiled (Fortran!) QUADPACK routines. What you're proposing completely eliminates the second of these sources of performance gains, and might even increase the penalty if you make such a call more than once. If, however, the large bulk of the execution time of your integrand is in its own code, rather than in these other Python functions that you need to call, then you might see some benefit.
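For reference, here is the standard pattern for handing a ctypes function with the double f(int n, double *args) signature to quad via scipy.LowLevelCallable. Purely to keep this sketch self-contained, the "C" function below is a CFUNCTYPE-wrapped Python callback; in real use you would instead load a compiled library, e.g. lib = ctypes.CDLL('./testlib.so'), set lib.f.restype/argtypes, and wrap lib.f:

```python
import ctypes
from scipy import integrate, LowLevelCallable

# The signature quad expects from a C integrand: double f(int n, double *args)
c_sig = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_int,
                         ctypes.POINTER(ctypes.c_double))

@c_sig
def f(n, args):
    return args[0] * args[0]  # integrand x**2, purely illustrative

result, error = integrate.quad(LowLevelCallable(f), 0, 1)
print(result)  # close to 1/3
```

Note that with a Python callback like this you keep the per-call trip back into Python, which is exactly the overhead the answer above warns about; the speedup comes only when the wrapped pointer is genuinely compiled code.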
As answered in the other question, quadpy is here to save the day with its vectorized computation capabilities.
I understand a bit about python function decorators. I think the answer to my question is no, but I want to make sure. With a decorator and a numpy array of x = np.array([1,2,3]) I can override x.sqrt() and change the behavior. Is there some way I can override np.sqrt(x) in Python?
Use case: working on the quantities package. Would like to be able to take square root of uncertain quantities without changing code base that currently uses np.sqrt().
Edit:
I'd like to modify np.sqrt in the quantities package so that the following code works (all three should print identical results; note the 0 uncertainty when using np.sqrt()). I hope not to require end-users to modify their code, but to properly wrap/decorate np.sqrt() in the quantities package. Currently many numpy functions are decorated (see https://github.com/python-quantities/python-quantities/blob/ca87253a5529c0a6bee37a9f7d576f1b693c0ddd/quantities/quantity.py), but they seem to work only when x.func() is called, not numpy.func(x).
import numpy as np
import quantities as pq
x = pq.UncertainQuantity(2, pq.m, 2)
print x.sqrt()
>>> 1.41421356237 m**0.5 +/- 0.707106781187 m**0.5 (1 sigma)
print x**0.5
>>> 1.41421356237 m**0.5 +/- 0.707106781187 m**0.5 (1 sigma)
print np.sqrt(x)
>>> 1.41421356237 m**0.5 +/- 0.0 dimensionless (1 sigma)
Monkeypatching
If I understand your situation correctly, your use case is not really about decoration (modifying a function you write, in a standard manner), but rather about monkey patching: modifying a function somebody else wrote without actually changing that function's source code.
The idiom for what you then need is something like
import numpy as np  # provide local access to the numpy module object

original_np_sqrt = np.sqrt

def my_improved_np_sqrt(x):
    # do whatever you please, including:
    # - contemplating the UncertainQuantity-ness of x and
    # - calling original_np_sqrt as needed
    return original_np_sqrt(x)

np.sqrt = my_improved_np_sqrt
Of course, this can change only the future meaning of numpy.sqrt,
not the past one.
So if anybody has imported numpy before the above and has already used numpy.sqrt in a way you would have liked to influence, you lose.
(And the name under which they imported numpy does not matter.)
But after the above code was executed, the meaning of numpy.sqrt in all
modules (whether they imported numpy before it or after it)
will be that of my_improved_np_sqrt, whether the creators of those modules
like it or not (and of course unless some more monkeypatching of numpy.sqrt
is going on elsewhere).
Note that
When you do weird things, Python can become a weird platform!
When you do weird things, Python can become a weird platform!
When you do weird things, Python can become a weird platform!
This is why monkey patching is not normally considered good design style.
So if you take that route, make sure you announce it very prominently
in all relevant documentation.
Oh, and if you do not want to modify other code than that which is
directly or indirectly executed from your own methods, you could
introduce a decorator that performs monkeypatching before the call
and un-monkeypatching (reassigning original_np_sqrt)
after the call and apply that decorator to
all your functions in question.
Make sure you handle exceptions in that decorator then, so that
the un-monkeypatching is really executed in all cases.
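A sketch of that scoped approach (names hypothetical): patch before the call and restore in a finally block, so the un-monkeypatching happens even if the wrapped function raises:

```python
import functools
import numpy as np

def with_patched_sqrt(replacement):
    """Decorator: np.sqrt is `replacement` only inside the wrapped call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            original = np.sqrt
            np.sqrt = replacement
            try:
                return fn(*args, **kwargs)
            finally:
                np.sqrt = original  # restored even on exceptions
        return wrapper
    return decorator
```

Applied to your own entry-point functions, this keeps the patch invisible to any code running outside them.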
Maybe, as BrenBarn stated,
np.sqrt = decorator(np.sqrt)
because a decorator is just a callable that takes an object and returns a modified object.
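A sketch of what that one-line decorator could look like, assuming (as in the question) that quantity objects carry their own sqrt() method while plain arrays and scalars do not:

```python
import numpy as np

def quantity_aware(original_sqrt):
    # Route objects that define their own sqrt() (e.g. UncertainQuantity)
    # to that method; everything else falls through to the original ufunc.
    def wrapped(x, *args, **kwargs):
        if hasattr(x, 'sqrt'):
            return x.sqrt()
        return original_sqrt(x, *args, **kwargs)
    return wrapped

np.sqrt = quantity_aware(np.sqrt)
```

After this, np.sqrt(x) on an uncertain quantity delegates to x.sqrt(), so the uncertainty is propagated the same way as in the x.sqrt() call.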
First, let me show you the codez:
a = array([...])
for n in range(10000):
    func_curry = functools.partial(func, y=n)
    result = array(map(func_curry, a))
    do_something_else(result)
    ...
What I'm doing here is trying to apply func to an array, changing the value of func's second parameter on every iteration. This is SLOOOOW (creating a new function every iteration surely does not help), and I also feel I've missed the pythonic way of doing it. Any suggestions?
Could a solution that gives me a 2D array be a good idea? I don't know, but maybe it is.
Answers to possible questions:
Yes, this is (using a broad definition), an optimization problem (do_something_else() hides this)
No, scipy.optimize hasn't worked because I'm dealing with boolean values and it never seems to converge.
Did you try numpy.vectorize?
...
vfunc_curry = vectorize(functools.partial(func, y=n))
result = vfunc_curry(a)
...
If a is of significant size the bottleneck should not be the creation of the function, but the duplication of the array.
Can you rewrite the function? If possible, you should write the function to take two numpy arrays a and numpy.arange(n). You may need to reshape to get the arrays to line up for broadcasting.
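A sketch of that rewrite with a hypothetical elementwise func (the real func is not shown in the question): reshape so the 10000 values of y broadcast against a in a single vectorized call, producing exactly the 2D array the question wondered about:

```python
import numpy as np

def func(a, y):
    return a * y + 1  # hypothetical elementwise function

a = np.array([1.0, 2.0, 3.0])
ys = np.arange(10000)

# Broadcasting shape (10000, 1) against (3,) gives a (10000, 3) result;
# row n equals the old array(map(func_curry, a)) with y = n.
result = func(a[np.newaxis, :], ys[:, np.newaxis])
```

Each row of result then feeds do_something_else, with no per-iteration function creation at all.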