I was experimenting with the behavior of Numba vs. NumPy for array indexing, and I came across something I do not quite understand, so I am hoping someone can point me in the right direction for what is probably a very simple question. Below are two functions, both of which create an empty array using np.arange(0). I then "append" to the array (experimenting with a variety of methods to see how both Numba and NumPy perform/break) by assigning to index 0: example[0] = 1.
The Numba function with jit runs without error, but the Numpy example gives the error:
IndexError: index 0 is out of bounds for axis 0 with size 0
The Numpy error makes sense, but I am unsure as to why Numba with jit enabled allows the operation without error.
import numba as nb
import numpy as np

@nb.jit()
def funcnumba():
    '''
    Add item to position 0 using Numba
    '''
    example = np.arange(0)
    example[0] = 1
    return example

def funcnumpy():
    '''
    Add item to position 0 using Numpy. This produces an error which makes sense
    '''
    example = np.arange(0)
    example[0] = 1
    return example

print(funcnumba())
print(funcnumpy())
See the Numba documentation on arrays:
Currently there are no bounds checking for array indexing and slicing (...)
That means that you will be writing out of the bounds of the array in this case. Since it is just one element you may be lucky and get away with it, but you can also crash your program or, even worse, silently overwrite some other value. See issue #730 for a discussion about it.
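If you want the out-of-bounds access to be caught inside compiled code, newer Numba releases expose an opt-in bounds-checking flag (also available via the NUMBA_BOUNDSCHECK environment variable). A minimal sketch, assuming a Numba version that supports it:

import numba as nb
import numpy as np

@nb.njit(boundscheck=True)   # bounds checking is off by default for speed
def funcnumba_checked():
    example = np.arange(0)
    example[0] = 1           # with boundscheck=True this raises IndexError
    return example

# funcnumba_checked()        # expected to raise IndexError instead of silently writing out of bounds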
Related
Previously, I asked a question about a relatively simple loop that Numba was failing to parallelize. A solution turned out to make all the loops explicit.
Now, I need to do a simpler version of the same task: I have arrays alpha and beta of shape (m,n) and (b,m,n) respectively, and I want to compute the Frobenius product of alpha with each 2D slice of beta and find the slice of beta which maximizes this product. Previously, alpha had an additional, large first dimension, and it was over this dimension that I parallelized; now I want to parallelize over the first dimension of beta, as the calculation becomes expensive when b > 1000.
If I naively modify the code that worked for the previous problem, I obtain:
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def parallel_value_numba(alpha, beta):
    dot = np.zeros(beta.shape[0])
    for i in prange(beta.shape[0]):
        for j in prange(beta.shape[1]):
            for k in prange(beta.shape[2]):
                dot[i] += alpha[j, k] * beta[i, j, k]
    index = np.argmax(dot)
    value = dot[index]
    return value, index
But Numba doesn't like this for some reason and complains:
numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
scalar type memoryview(float64, 2d, C) given for non scalar argument #3
So instead, I tried
@njit(parallel=True)
def parallel_value_numba_2(alpha, beta):
    product = np.multiply(alpha, beta)
    dot1 = np.sum(product, axis=2)
    dot2 = np.sum(dot1, axis=1)
    index = np.argmax(dot2)
    value = dot2[index]
    return value, index
This compiles as long as you broadcast alpha to beta.shape before passing it to the function, and in principle Numba is capable of parallelizing the numpy operations. But it runs painfully slowly, much slower than the serial, pure Python code
def einsum_value(alpha, beta):
    dot = np.einsum('kl,jkl->j', alpha, beta)
    index = np.argmax(dot)
    value = dot[index]
    return value, index
So, my current working code uses this last implementation, but this function is still bottlenecking the runtime and I'd like to speed it up. Can anyone convince Numba to parallelize this function with an appreciable speedup?
This is not exactly an answer with a solution, but formatting comments is harder.
Numba generates different code depending on the arguments passed to the function. For example, your code works with the following example:
>>> alpha = np.random.random((5, 4))
>>> beta = np.random.random((3, 5, 4))
>>> parallel_value_numba(alpha, beta)
(5.89447648574048, 0)
In order to diagnose the problem, it's necessary to have an example of the specific argument values causing the problem.
Reading the error message, it seems you are passing a memoryview object, but Numba may not have full support for it.
As a side comment, you don't need to use prange in every loop. It's normally enough to use it in the outer loop, as long as the number of expected iterations is larger than the number of cores in your machine.
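To illustrate that side comment, here is a minimal sketch with prange only on the outer loop and plain range inside; the function name is mine and I have not benchmarked it against your data:

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def parallel_value_numba_outer(alpha, beta):
    dot = np.zeros(beta.shape[0])
    for i in prange(beta.shape[0]):       # parallelize only over the large first axis of beta
        acc = 0.0
        for j in range(beta.shape[1]):    # serial inner loops inside each parallel task
            for k in range(beta.shape[2]):
                acc += alpha[j, k] * beta[i, j, k]
        dot[i] = acc
    index = np.argmax(dot)
    return dot[index], index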
I'm having this problem where the numba implementation of np.diff does not work on a slice of a matrix. Is this a bug or am I doing something wrong?
import numpy as np
from numba import njit

v = np.ones((2, 2))
np.diff(v[:, 0])  # -> array([0.])

@njit
def numbadiff(x):
    return np.diff(x)

numbadiff(v[:, 0])
This last call results in an error, but I'm not sure why.
The problem is that np.diff in Numba does an internal reshape, which is only supported for contiguous arrays. The slice that you are making, v[:, 0], is not contiguous, hence the error. You can get it to work using np.ascontiguousarray, which returns a contiguous copy of the given array if it is not already contiguous:
numbadiff(np.ascontiguousarray(v[:, 0]))
Note you can also just avoid np.diff and redefine numbadiff as:
@njit
def numbadiff(x):
    return x[1:] - x[:-1]
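As far as I can tell, this version can also be called directly on the non-contiguous slice, since slicing and elementwise subtraction do not require a contiguous layout:

numbadiff(v[:, 0])  # no np.ascontiguousarray needed here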
When you encounter an error, the polite thing is to show the error. Sometimes the full error with traceback is appropriate. For numba that may be too much, but you should try to post a summary. It makes it easier for us, especially if we aren't in a position to run your code and see the error ourselves. You might even learn something.
I ran your example and got (in part):
In [428]: numbadiff(np.ones((2,2))[:,0])
---------------------------------------------------------------------------
TypingError
...
TypeError: reshape() supports contiguous array only
...
def diff_impl(a, n=1):
<source elided>
# To make things easier, normalize input and output into 2d arrays
a2 = a.reshape((-1, size))
...
TypeError: reshape() supports contiguous array only
....
This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.
This supports the diagnosis and fix that @jdehesa provides. It's not a bug in numba; it's a problem with your input.
One disadvantage with using numba is that the errors are harder to understand. Another apparently is that it isn't quite as flexible about inputs such as this array view. If you seriously want the speed advantages, you need to be willing to dig into the error messages yourself.
I am currently looking into the use of Numba to speed up my python software. I am entirely new to the concept and currently trying to learn the absolute basics. What I am stuck on for now is:
I don't understand what the big benefit of the vectorize decorator is.
The documentation explains that the decorator is used to turn a normal Python function into a NumPy ufunc. From what I understand, the benefit of a ufunc is that it can take numpy arrays (instead of scalars) and provide features such as broadcasting.
But all the examples I can find online can be solved just as easily without this decorator.
Take for instance, this example from the numba documentation.
from numba import vectorize, float64

@vectorize([float64(float64, float64)])
def f(x, y):
    return x + y
They claim that the function now works like a NumPy ufunc. But doesn't it anyway, even without the decorator? If I were to just run the following code:
import numpy as np

def f(x, y):
    return x + y

x = np.arange(10)
y = np.arange(10)
print(f(x, y))
That works just fine. The function already accepts arguments other than scalars.
What am I misunderstanding here?
Just read the docs a few lines below:
You might ask yourself, “why would I go through this instead of compiling a simple iteration loop using the @jit decorator?”. The answer is that NumPy ufuncs automatically get other features such as reduction, accumulation or broadcasting.
For example, f.reduce(arr) will sum all the elements of arr at C speed, which the plain Python f cannot do.
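To make that concrete, here is a small demonstration using the decorated f from the documentation example; reduce and accumulate are standard ufunc methods, so this assumes only that the @vectorize version compiled successfully:

import numpy as np
from numba import vectorize, float64

@vectorize([float64(float64, float64)])
def f(x, y):
    return x + y

arr = np.arange(5.0)
print(f.reduce(arr))       # 10.0 -- ufunc reduction, like np.add.reduce
print(f.accumulate(arr))   # [ 0.  1.  3.  6. 10.] -- running totals
# A plain Python f(x, y) has no .reduce or .accumulate; those come from being a true ufunc.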
I'm writing a function with njit to speed up a very slow reservoir operations optimization code. The function returns the maximum value for spill releases based on the reservoir level and gate availability. I am passing in a parameter size that specifies the number of flows to calculate (in some calls it's one and in some it's many). I'm also passing in a numpy.zeros array that I can then fill with the function output. A simplified version of the function is written as follows:
import numpy as np
from numba import njit

@njit(cache=True)
def fncMaxFlow(elev, flag, size, MaxQ):
    if flag == 1:  # SPOG2 running
        if size == 0:
            if elev > 367.28:
                return 861.1
            else:
                return 0
        else:
            for i in range(size):
                if (elev[i] > 367.28) & (elev[i] < 385):
                    MaxQ[i] = 861.1
            return MaxQ
    else:
        if size == 0:
            return 0
        else:
            return MaxQ

fncMaxFlow(np.random.randint(368, 380, 3), 1, 3, np.zeros(3))
The error I'm getting:
Can't unify return type from the following types: array(float64, 1d, C), float64, int32
What is the reason for this? Is there any workaround or some step I'm missing so I can use numba to speed things up? This function and others like it are being called millions of times so they are a major factor in the computational efficiency. Any advice would help - I'm pretty new to python.
A variable within a numba function must have a consistent type, including the return value. In your code you can return either MaxQ (an array), 861.1 (a float), or 0 (an int), depending on the code path.
You need to refactor this code so that it always returns a consistent type regardless of code path.
Also note that in several places where you are comparing a numpy array to a scalar (elev > 367.28), what you are getting back is an array of boolean values, which is going to cause you issues. Your example function doesn't run as a pure python function (dropping the numba decorator) because of this.
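One way to refactor along those lines is to make every path fill and return the preallocated array, treating a single flow as a length-1 array. This is only a sketch; the name fncMaxFlowArr and the simplified branching are mine, not the original logic:

import numpy as np
from numba import njit

@njit(cache=True)
def fncMaxFlowArr(elev, flag, MaxQ):
    # Every code path returns the same float64 array, so Numba can unify the return type.
    if flag == 1:
        for i in range(elev.shape[0]):
            if (elev[i] > 367.28) and (elev[i] < 385):
                MaxQ[i] = 861.1
    return MaxQ

# Single flow: wrap the level in a length-1 array and read element 0 of the result.
print(fncMaxFlowArr(np.array([370.0]), 1, np.zeros(1))[0])
print(fncMaxFlowArr(np.random.uniform(360, 390, 3), 1, np.zeros(3)))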
This is a situation that happens quite often in my code. Say I have a function do_sth(a, b) that, only for the sake of this example, simply calculates a + b, with a and b either 1D numpy arrays or scalars. On many occasions, I need the function to broadcast the operation, so that if both a and b are 1D arrays, the result will be a 2D array. An example of what I mean follows:
do_sth(1,2) -> 3
do_sth([1,2],0) -> array([1, 2])
do_sth(0,[3,4]) -> array([3, 4])
do_sth([1,2],[3,4]) -> array([[4, 5], [5, 6]])
This is a bit similar to how a numpy ufunc behaves. A possible implementation follows:
from numpy import newaxis, atleast_1d

def do_sth(a, b):
    "a, b should be either 1d numpy arrays or scalars"
    a, b = map(atleast_1d, [a, b])
    # the line below mocks a more complicated calculation
    res = a[:, newaxis] + b[newaxis]
    conds = [a.size == 1, b.size == 1]
    if all(conds):
        return res[0, 0]
    elif any(conds):
        return res.ravel()
    else:
        return res
As you can see, there's quite a lot of boilerplate. The first question is: is this the right way to do this input/output casting? Is there any reason to not use a decorator to deal with a situation like this? Is there any guideline on the matter?
Moreover, the more complicated calculation, here mocked by the addition, often fails badly if a or b are numpy arrays with a 2D or 3D shape, for example. I say badly in the sense that the point where the calculation fails is not obvious, or may change over time in different revisions of the code, and it is hard to see the connection between the error and the wrong input shape. I think it is then NOT advisable to put the complicated calculation in a try/except block (following Python EAFP). In this case, is it correct to check the shape of the two arrays at the beginning of the function? Is there any alternative? Is there a numpy function that both converts the input to a numpy array and checks that it is compatible with a certain number of dimensions, something like asarray_withdim(arr, ndim=5)?
Regarding the use of decorators - I haven't seen much use of decorators in numpy code, but I think that's because most of the functionality was developed before decorators became common in Python. If you can make it work, there shouldn't be any downside (but I'm not an expert with either decorators or ufuncs).
Non-compiled numpy functions often have a lot of code that massages the inputs into convenient dimensions. Then they do the core action, followed by final reshaping and type wrapping. They might use functions like np.atleast_2d to ensure there are enough dimensions, and .reshape(-1,1,1) to compress excess dimensions.
np.tensordot is an example of one that performs axes transpose plus reshape on the inputs so it can apply the compiled np.dot. np.insert starts with a number of ndim and isinstance tests. Special cases are handled early, while the general one is left to the end. np.einsum is compiled, but there's a lot of preprocessing being done in C code, before it finally creates an nditer object and does the calculation.
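On the last question: I don't know of a single NumPy function that both converts the input and enforces a dimension count, but a helper in the spirit of the asarray_withdim the question imagines is short to write. The name and behavior here are hypothetical, not an existing NumPy API:

import numpy as np

def asarray_withdim(arr, ndim):
    """Hypothetical helper: coerce to an ndarray and fail fast if it has too many dimensions."""
    arr = np.atleast_1d(np.asarray(arr))
    if arr.ndim > ndim:
        raise ValueError(f"expected at most {ndim} dimensions, got {arr.ndim}")
    return arr

# In do_sth this could replace the bare atleast_1d call and reject 2D/3D inputs up front:
# a, b = (asarray_withdim(x, ndim=1) for x in (a, b))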