Does numba np.diff have a bug?

I'm having this problem where the numba implementation of np.diff does not work on a slice of a matrix. Is this a bug or am I doing something wrong?
import numpy as np
from numba import njit

v = np.ones((2, 2))
np.diff(v[:, 0])
# array([0.])

@njit
def numbadiff(x):
    return np.diff(x)

numbadiff(v[:, 0])
This last call results in an error, but I'm not sure why.

The problem is that np.diff in Numba does an internal reshape, which is only supported for contiguous arrays. The slice that you are making, v[:, 0], is not contiguous, hence the error. You can get it to work using np.ascontiguousarray, which returns a contiguous copy of the given array if it is not already contiguous:
numbadiff(np.ascontiguousarray(v[:, 0]))
Note you can also just avoid np.diff and redefine numbadiff as:
@njit
def numbadiff(x):
    return x[1:] - x[:-1]
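As a quick check (my own example, not part of the original answer), the redefined version should accept the non-contiguous column slice directly, since basic slicing and elementwise subtraction do not require contiguity:

import numpy as np
from numba import njit

@njit
def numbadiff(x):
    # same result as np.diff for 1D input, but without the internal reshape
    return x[1:] - x[:-1]

v = np.ones((2, 2))
print(numbadiff(v[:, 0]))  # the non-contiguous column slice now works: [0.]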

When you encounter an error, the polite thing is to show the error. Sometimes the full error with traceback is appropriate. For numba that may be too much, but you should try to post a summary. It makes it easier for us, especially if we aren't in a position to run your code and see the error ourselves. You might even learn something.
I ran your example and got (in part):
In [428]: numbadiff(np.ones((2,2))[:,0])
---------------------------------------------------------------------------
TypingError
...
TypeError: reshape() supports contiguous array only
...
def diff_impl(a, n=1):
<source elided>
# To make things easier, normalize input and output into 2d arrays
a2 = a.reshape((-1, size))
...
TypeError: reshape() supports contiguous array only
....
This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.
This supports the diagnosis and fix that @jdehesa provides. It's not a bug in numba; it's a problem with your input.
One disadvantage of using numba is that its errors are harder to understand. Another, apparently, is that it isn't quite as flexible about inputs such as this array view. If you seriously want the speed advantages, you need to be willing to dig into the error messages yourself.

Related

Parallelizing a numba loop

Previously, I asked a question about a relatively simple loop that Numba was failing to parallelize. The solution turned out to be making all the loops explicit.
Now, I need to do a simpler version of the same task: I now have arrays alpha and beta, respectively of shape (m,n) and (b,m,n), and I want to compute the Frobenius product of 2D slices of the arguments and find the slice of beta which maximizes this product. Previously, there was an additional, large first dimension of alpha, so it was over this dimension that I parallelized; now I want to parallelize over the first dimension of beta, as the calculation becomes expensive when b>1000.
If I naively modify the code that worked for the previous problem, I obtain:
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def parallel_value_numba(alpha, beta):
    dot = np.zeros(beta.shape[0])
    for i in prange(beta.shape[0]):
        for j in prange(beta.shape[1]):
            for k in prange(beta.shape[2]):
                dot[i] += alpha[j, k] * beta[i, j, k]
    index = np.argmax(dot)
    value = dot[index]
    return value, index
But Numba doesn't like this for some reason and complains:
numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
scalar type memoryview(float64, 2d, C) given for non scalar argument #3
So instead, I tried
@njit(parallel=True)
def parallel_value_numba_2(alpha, beta):
    product = np.multiply(alpha, beta)
    dot1 = np.sum(product, axis=2)
    dot2 = np.sum(dot1, axis=1)
    index = np.argmax(dot2)
    value = dot2[index]
    return value, index
This compiles as long as you broadcast alpha to beta.shape before passing it to the function, and in principle Numba is capable of parallelizing the numpy operations. But it runs painfully slowly, much slower than the serial, pure Python code
def einsum_value(alpha, beta):
    dot = np.einsum('kl,jkl->j', alpha, beta)
    index = np.argmax(dot)
    value = dot[index]
    return value, index
So, my current working code uses this last implementation, but this function is still bottlenecking the runtime and I'd like to speed it up. Can anyone convince Numba to parallelize this function with an appreciable speedup?
This is not exactly an answer with a solution, but formatting comments is harder.
Numba generates different code depending on the arguments passed to the function. For example, your code works with the following example:
>>> alpha = np.random.random((5, 4))
>>> beta = np.random.random((3, 5, 4))
>>> parallel_value_numba(alpha, beta)
(5.89447648574048, 0)
In order to diagnose the problem, it's necessary to have an example of the specific argument values causing the problem.
Reading the error message, it seems you are passing a memoryview object, but Numba may not have full support for it.
As a side comment, you don't need to use prange in every loop. It's normally enough to use it in the outer loop, as long as the number of expected iterations is larger than the number of cores in your machine.
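A minimal sketch of that suggestion (not verified against the asker's exact inputs; it assumes alpha and beta are plain contiguous NumPy arrays of shape (m, n) and (b, m, n)) would parallelize only the outer loop and accumulate into a local scalar:

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def parallel_value_sketch(alpha, beta):
    dot = np.zeros(beta.shape[0])
    for i in prange(beta.shape[0]):      # parallel over the first dimension of beta only
        acc = 0.0
        for j in range(beta.shape[1]):   # plain serial loops inside each parallel task
            for k in range(beta.shape[2]):
                acc += alpha[j, k] * beta[i, j, k]
        dot[i] = acc                     # one write per i, no shared accumulator
    index = np.argmax(dot)
    return dot[index], index

Accumulating into the local acc also avoids having every inner iteration update dot[i], which can otherwise force Numba to treat the nested loops as a reduction.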

Very large in-place numpy array operations : numba, pythran or other?

I need to perform operations on very large arrays (several million entries), with the cumulated size of these arrays close to the available memory.
I understand that when doing a naive operation with numpy like a = a*3 + b - c**2, several temporary arrays are created and thus occupy more memory.
As I'm planning to work at the limit of memory occupancy, I'm afraid this simple approach won't work, so I'd like to start my development with the right approach.
I know that packages like numba or pythran can help with improving performance when manipulating arrays, but it is not clear to me whether they can automatically handle in-place operations and avoid temporary objects.
As a simple example, here's one function I'll have to use on large arrays:
def find_bins(a, indices):
    global offset, width, nstep
    i = (a - offset) * nstep / width
    i = np.where(i < 0, 0, i)
    i = np.where(i >= nstep, nstep, i)
    indices[:] = i.astype(int)
So something that mixes arithmetic operations and calls to numpy functions.
How easy would it be to write such functions using numba or pythran (or something else)?
What would be the pros and cons in each case?
Thanks for any hint!
PS: I know about numexpr, but I'm not sure whether it is convenient or well adapted to functions more complex than a single arithmetic expression.
You can use numexpr. For example:
import numexpr
numexpr.evaluate("a+b*c", out=a)  # writes the result directly into a, no temporary output array
This can help you avoid the temporary arrays; for more details, see High Performance Python by M. Gorelick and I. Ozsvald.
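As a rough, untested sketch of how the asker's find_bins could look with numexpr, assuming offset, width and nstep are passed in as arguments rather than globals:

import numpy as np
import numexpr as ne

def find_bins_ne(a, indices, offset, width, nstep):
    # One preallocated work buffer, reused as the output of both fused expressions
    i = np.empty_like(a)
    ne.evaluate("(a - offset) * nstep / width", out=i)
    ne.evaluate("where(i < 0, 0, where(i >= nstep, nstep, i))", out=i)
    indices[:] = i.astype(int)

Each evaluate call fuses its whole expression into a single pass over the data, so apart from the explicit buffer i no intermediate arrays are materialized.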
Pythran avoids many temporary arrays by design. For the simple expression you're pointing at, that would be
#pythran export find_bins(float[], int[], float, float, int)
import numpy as np
def find_bins(a, indices, offset, width, nstep):
    i = (a - offset) * nstep / width
    i = np.where(i < 0, 0, i)
    i = np.where(i >= nstep, nstep, i)
    indices[:] = i.astype(int)
This both avoids temporaries and speeds up the computation.
Note that you could use the np.clip function here; it's supported by Pythran as well.
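For example, a clip-based variant might look like this (a sketch only, assuming np.clip and .astype are available in your Pythran version):

#pythran export find_bins_clip(float[], int[], float, float, int)
import numpy as np

def find_bins_clip(a, indices, offset, width, nstep):
    # np.clip replaces the two np.where calls with a single bounded operation
    indices[:] = np.clip((a - offset) * nstep / width, 0, nstep).astype(int)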

Indexing empty array, Numba vs. Numpy

I was experimenting with the behavior of Numba vs Numpy for array indexing, and I came across something which I do not quite understand; so I hoped someone can point me in the right direction for what is probably a very simple question. Below are two functions, both of which create an empty array using the np.arange command. I then "append" (experimenting with a variety of methods to see how both Numba and Numpy perform/break) to the array using an index of 0, example[0] = 1.
The Numba function with jit runs without error, but the Numpy example gives the error:
IndexError: index 0 is out of bounds for axis 0 with size 0
The Numpy error makes sense, but I am unsure as to why Numba with jit enabled allows the operation without error.
import numba as nb
import numpy as np

@nb.jit()
def funcnumba():
    '''
    Add item to position 0 using Numba
    '''
    example = np.arange(0)
    example[0] = 1
    return example

def funcnumpy():
    '''
    Add item to position 0 using Numpy. This produces an error which makes sense
    '''
    example = np.arange(0)
    example[0] = 1
    return example

print(funcnumba())
print(funcnumpy())
See the Numba documentation on arrays:
Currently there are no bounds checking for array indexing and slicing (...)
That means that you will be writing out of the bounds of the array in this case. Since it is just one element you may be lucky and get away with it, but you can also crash your program or, even worse, silently overwrite some other value. See issue #730 for a discussion about it.
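If you want Numba to raise an error instead of silently writing out of bounds, recent Numba versions offer an opt-in bounds check (this assumes a Numba version that supports the boundscheck option; check your version's documentation):

import numba as nb
import numpy as np

@nb.njit(boundscheck=True)  # bounds checking is off by default for speed
def funcnumba_checked():
    example = np.arange(0)
    example[0] = 1  # with boundscheck=True this raises IndexError instead of corrupting memory
    return example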

I have issues understanding the use of Numbas vectorize decorator in Python

I am currently looking into the use of Numba to speed up my python software. I am entirely new to the concept and currently trying to learn the absolute basics. What I am stuck on for now is:
I don't understand what the big benefit of the vectorize decorator is.
The documentation explains that the decorator is used to turn a normal Python function into a NumPy ufunc. From what I understand, the benefit of a ufunc is that it can take numpy arrays (instead of scalars) and provide features such as broadcasting.
But all the examples I can find online can be solved just as easily without this decorator.
Take for instance, this example from the numba documentation.
from numba import vectorize, float64

@vectorize([float64(float64, float64)])
def f(x, y):
    return x + y
They claim that the function now works like a numpy ufunc. But doesn't it anyway, even without the decorator? If I were to just run the following code:
import numpy as np

def f(x, y):
    return x + y

x = np.arange(10)
y = np.arange(10)
print(f(x, y))
That works just fine. The function already takes arguments other than scalars.
What am I misunderstanding here?
Just read the docs a few lines below:
You might ask yourself, “why would I go through this instead of compiling a simple iteration loop using the @jit decorator?”. The answer is that NumPy ufuncs automatically get other features such as reduction, accumulation or broadcasting.
For example, f.reduce(arr) will sum all the elements of arr at C speed, which a plain f cannot do.
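A small illustration of those extra ufunc features, using the decorated f from the documentation example above:

import numpy as np
from numba import vectorize, float64

@vectorize([float64(float64, float64)])
def f(x, y):
    return x + y

arr = np.arange(5.0)
print(f.reduce(arr))         # 10.0, sums all elements like np.add.reduce
print(f.accumulate(arr))     # [ 0.  1.  3.  6. 10.], running sums
print(f(arr[:, None], arr))  # a (5, 5) array of pairwise sums via broadcasting

A plain @jit-compiled function gives you none of these methods; it is just a fast function.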

input/output validation/casting of a numpy calculation

This is a situation that happens quite often in my code. Say I have a function do_sth(a, b) that, only for the sake of this example, simply calculates a+b, with a and b being either 1D numpy arrays or scalars. On many occasions, I need the function to broadcast the operation, so that if both a and b are 1D arrays, the result will be a 2D array. An example of what I mean follows:
do_sth(1,2) -> 3
do_sth([1,2],0) -> array([1, 2])
do_sth(0,[3,4]) -> array([3, 4])
do_sth([1,2],[3,4]) -> array([[4, 5], [5, 6]])
This is a bit similar to how a numpy ufunc behaves. A possible implementation follows:
from numpy import newaxis, atleast_1d

def do_sth(a, b):
    "a, b should be either 1d numpy arrays or scalars"
    a, b = map(atleast_1d, [a, b])
    # the line below mocks a more complicated calculation
    res = a[:, newaxis] + b[newaxis]
    conds = [a.size == 1, b.size == 1]
    if all(conds):
        return res[0, 0]
    elif any(conds):
        return res.ravel()
    else:
        return res
As you can see, there's quite a lot of boilerplate. The first question is: is this the right way to do this input/output casting? Is there any reason to not use a decorator to deal with a situation like this? Is there any guideline on the matter?
Moreover, the more complicated calculation, here mocked by the addition, often fails badly if a or b are numpy arrays with 2D,3D shape for example. I say badly in the sense that the point where the calculation fails is not obvious, or may change with time in different revisions of the code, and it is hard to see the connection between the error and the wrong input shape. I think it is then NOT advisable to put the complicated calculation in a try/except block (following python EAFP). In this case, is it correct to check the shape of the 2 arrays at the beginning of the function? Is there any alternative? Is there a numpy function that allows at the same time to convert the input to a numpy array, and also check that the input is compatible with a certain number of dimensions, something like asarray_withdim(arr,ndim=5)?
Regarding the use of decorators - I haven't seen much use of decorators in numpy code, but I think that's because most of the functionality was developed before decorators became common in Python. If you can make it work, there shouldn't be any downside (but I'm not an expert with either decorators or ufuncs).
Non-compiled numpy functions often have a lot of code that massages the inputs into convenient dimensions. Then they do the core action, followed by final reshaping and type wrapping. They might use functions like np.atleast_2d to ensure there are enough dimensions, and .reshape(-1,1,1) to compress excess dimensions.
np.tensordot is an example of one that performs axes transpose plus reshape on the inputs so it can apply the compiled np.dot. np.insert starts with a number of ndim and isinstance tests. Special cases are handled early, while the general one is left to the end. np.einsum is compiled, but there's a lot of preprocessing being done in C code, before it finally creates an nditer object and does the calculation.
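As far as I know there is no single public numpy function that combines conversion with a rank check, but a small helper along the lines of the asker's hypothetical asarray_withdim is easy to write (the name and behaviour here are my own illustration):

import numpy as np

def asarray_withdim(arr, ndim):
    # Convert to an ndarray and fail fast, at the entry point, on a wrong number of dimensions
    arr = np.asarray(arr)
    if arr.ndim != ndim:
        raise ValueError(
            "expected a %d-d array, got %d-d with shape %s" % (ndim, arr.ndim, arr.shape))
    return arr

Calling a = asarray_withdim(a, 1) at the top of do_sth makes a wrong input shape fail immediately with a clear message, rather than somewhere deep inside the calculation.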
