int64 vs array(int64, 0d, C) in numba - python

I defined a jitted function that returns a tuple using Numba. It's something like the following:
import numba as nb
from numba.types import Tuple
import numpy as np
FOO_T = Tuple.from_types((nb.types.NPDatetime('M'), nb.int64))
@nb.jit([FOO_T(nb.types.NPDatetime('M'), nb.types.NPDatetime('M'))], nopython=True)
def foo(input1, input2):
    temp1 = input1
    temp2 = np.array(input1 - input2).view(nb.int64)
    output = (temp1, temp2)
    return output
A TypingError is reported, as shown below. The second element of the output tuple is declared as int64, but it is actually compiled as array(int64, 0d, C).
TypingError: No conversion from Tuple(datetime64[M], array(int64, 0d, C)) to Tuple(datetime64[M], int64) for '$38return_value.15', defined at None
I have no idea how to make them consistent. Thanks for your help.

np.array(input1 - input2).view(nb.int64) is an array of int64, not a scalar, which is why Numba reports an error. Note that np.array(input1 - input2) results in an unusual type: a 0-dimensional array. AFAIK, this is what NumPy uses to represent scalars, but such an array cannot be indexed in Numba nor converted to a scalar.
You could subtract the two scalars, build an array with np.array([input1 - input2]) and then call view. That being said, view is probably not what you want here, as it reinterprets the binary representation of an NPDatetime as an integer. This is really unsafe, and AFAIK there is no reason to assume that it will work. You can simply take the difference and cast the result with (np.uint64)(input1 - input2).
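The difference between a scalar and a 0-dimensional array can be reproduced in plain NumPy, outside Numba. A minimal sketch, with arbitrarily chosen example dates; astype is used here as one way to get a real int64 scalar in plain NumPy, not as a claim about what Numba's nopython mode supports:

```python
import numpy as np

d1 = np.datetime64('2020-03', 'M')
d2 = np.datetime64('2020-01', 'M')

diff = d1 - d2                      # a timedelta64 scalar (2 months)
wrapped = np.array(diff)            # a 0-dimensional array, not a scalar
print(wrapped.ndim, wrapped.shape)  # 0 ()

as_view = wrapped.view(np.int64)    # still a 0-d array, now with dtype int64
as_int = diff.astype(np.int64)      # a genuine int64 scalar
print(type(as_view).__name__, int(as_int))
```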

Related

How do I specify a tuple in a Numba Vectorize signature?

I'm defining a function and want to use Numba's vectorize with the CUDA target to speed it up. I'm having trouble with the function signature. The function will return a float64 value. I want to pass two float64 values, which will be vectorized, and in addition a 9-tuple of float64 values, which will be scalars.
Here is my function header:
from numba import vectorize
@vectorize(['float64(float64, float64, UniTuple(float64, 9))'], target='cuda')
def fn_vec(E, L, fparams):
    # calculations...
    return result
but this gives an error:
TypeError: data type "(float64 x 9)" not understood
I've tried many variations, including (float64, ..., float64) in place of the UniTuple(), but can't get anything to work. How do I do this?
How do I specify a tuple in a Numba Vectorize signature?
In a numba.vectorize function you cannot use a tuple. That's because vectorize vectorizes the code for arrays of these types.
So using a float, float, tuple signature creates a function that expects two arrays containing floats and one array containing tuples. The problem is that there is no dtype for an array containing tuples - it could work if you use a structured array instead of an array containing tuples but I haven't tried that.
How do I specify a tuple in a Numba jit signature?
The correct way to specify a UniTuple in a numba signature is with numba.types.containers.UniTuple. In your case:
nb.types.containers.UniTuple(nb.types.float64, 9)
So the correct signature would be something like this:
import numba as nb
@nb.njit(
    nb.types.float64(
        nb.types.float64,
        nb.types.float64,
        nb.types.containers.UniTuple(nb.types.float64, 9)))
def func(f1, f2, ftuple):
    # ...
    return f1
I often avoid typing my numba functions explicitly, but when I do, I find it very useful to use numba.typeof, for example:
>>> nb.typeof((1.0, ) * 9)
tuple(float64 x 9)
>>> type(nb.typeof((1.0, ) * 9))
numba.types.containers.UniTuple
>>> help(type(nb.typeof((1.0, ) * 9))) # I shortened the result:
Help on class UniTuple in module numba.types.containers:
class UniTuple(BaseAnonymousTuple, _HomogeneousTuple, numba.types.abstract.Sequence)
| UniTuple(*args, **kwargs)
|
| Type class for homogeneous tuples.
|
| Methods defined here:
|
| __init__(self, dtype, count)
| Initialize self. See help(type(self)) for accurate signature.
So the information is all there: it's numba.types.containers.UniTuple, and you instantiate it with two arguments: the dtype (here float64) and the count (in this case 9).
In case you wanted to vectorize over the float arrays only
If you did not want to vectorize the function for the tuple argument you could simply create the vectorized function inside another function and call it there:
import numba as nb
import numpy as np
def func(E, L, fparams):
    @nb.vectorize(['float64(float64, float64)'])
    def fn_vec(e, l):
        return e + l + fparams[1]  # just to illustrate that the tuple is available
    return fn_vec(E, L)
This makes the tuple available inside the vectorized function. However, it has to create the inner function and compile it every time you call the outer function, so this may actually be slower. I'm also not sure that this will work with target="cuda"; you may need to test that yourself.
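For comparison only: plain NumPy's np.vectorize (not Numba, not compiled, and with no CUDA target) can leave an argument out of vectorization through its excluded parameter. A minimal sketch, where fn and params are illustrative names; it is slow Python-level looping and only shows the "vectorize over the floats, pass the tuple through" idea:

```python
import numpy as np

def fn(e, l, fparams):
    # e and l arrive element by element; fparams arrives as the whole tuple
    return e + l + fparams[1]

# Exclude the keyword argument 'fparams' from broadcasting
fn_vec = np.vectorize(fn, excluded={'fparams'})

E = np.array([1.0, 2.0])
L = np.array([10.0, 20.0])
params = tuple(float(i) for i in range(9))  # a 9-tuple of floats

result = fn_vec(E, L, fparams=params)
print(result)  # [12. 23.]
```

Note that fparams must be passed by keyword here, because the excluded set names the keyword argument.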

Cython prange with an array of string

I'm trying to use prange in order to process multiple strings.
As it is not possible to do this with a python list, I'm using a numpy array.
With an array of floats, this function works:
from cython.parallel import prange
cimport numpy as np
from numpy cimport ndarray as ar
cpdef func_float(ar[np.float64_t,cast=True] x, double alpha):
    cdef int i
    for i in prange(x.shape[0], nogil=True):
        x[i] = alpha * x[i]
    return x
When I try this simple one:
cpdef func_string(ar[np.str,cast=True] x):
    cdef int i
    for i in prange(x.shape[0], nogil=True):
        x[i] = x[i] + str(i)
    return x
I'm getting this:
>> func_string(x = np.array(["apple","pear"],dtype=np.str))
File "processing.pyx", line 8, in processing.func_string
cpdef func_string(ar[np.str,cast=True] x):
ValueError: Item size of buffer (20 bytes) does not match size of 'str object' (8 bytes)
I'm probably missing something and I can't find an alternative to str.
Is there a way to properly use prange with an array of strings?
Besides the fact that your code should fail when cythonized, because you try to create a Python object (i.e. str(i)) without the GIL, your code isn't doing what you think it does.
In order to analyse what is going on, let's take a look at a much simpler Cython version:
%%cython -2
cimport numpy as np
from numpy cimport ndarray as ar
cpdef func_string(ar[np.str, cast=True] x):
    print(len(x))
From your error message, one can deduce that you use Python 3 and that the Cython extension is built with language_level=2 (at the time still the default), so I'm using -2 in the %%cython magic cell.
And now:
>>> x = np.array(["apple", "pear"], dtype=np.str)
>>> func_string(x)
ValueError: Item size of buffer (20 bytes) does not match size of 'str object' (8 bytes)
What is going on?
x is not what you think it is
First, let's take a look at x:
>>> x.dtype
<U5
So x isn't a collection of unicode objects. Each element of x consists of 5 unicode characters, and these elements are stored contiguously in memory, one after another. What is important: it is the same information as in unicode objects, stored in a different memory layout.
This is one of numpy's quirks and how np.array works: every element in the list is converted to a unicode object, then the maximal element size is determined, and the resulting dtype (in this case <U5) is used.
np.str is interpreted differently in cython code (ar[np.str] x) (twice!)
First difference: in your Python 3 code np.str means unicode, but in your Cython code, which is cythonized with language_level=2, np.str means bytes (see doc).
Second difference: when it sees np.str, Cython interprets it as an array of Python objects (arguably a Cython bug). It is almost the same as if the dtype were np.object; in fact, the only difference from np.object is a slightly different error message.
With this information we can understand the error message. At runtime, the input array is checked (before the first line of the function is executed!):
expected is an array of Python objects, i.e. 8-byte pointers, i.e. an array with an element size of 8 bytes
received is an array with an element size of 5*4=20 bytes (one unicode character takes 4 bytes)
thus the cast cannot be done and the observed exception is thrown.
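The two element sizes in this check can be inspected directly in NumPy. A small sketch; the itemsize of an object array is the pointer size, which is 8 bytes on a typical 64-bit build:

```python
import numpy as np

x = np.array(["apple", "pear"])
print(x.dtype, x.itemsize)      # <U5 20 (5 characters x 4 bytes each)

obj = np.array(["apple", "pear"], dtype=object)
print(obj.dtype, obj.itemsize)  # one pointer per element (8 on a 64-bit build)
```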
you cannot change the size of an element in a <U.. numpy array:
Now let's take a look at the following:
>>> x = np.array(["apple", "pear"], dtype=np.str)
>>> x[0] = x[0]+str(0)
>>> x[0]
'apple'
The element didn't change, because the string x[0]+str(0) was truncated when it was written back to x: there is only room for 5 characters! It does work (to some degree, as long as the resulting string has no more than 5 characters) with "pear", though:
>>> x[1] = x[1]+str(1)
>>> x[1]
'pear1'
Where does this all leave you?
you probably want to use bytes and not unicode (i.e. dtype=np.bytes_)
since you don't know the element size of your numpy array at compile time, you should declare the input array x as ar x in the signature and do the runtime checks yourself, similar to what is done in Cython's "deprecated" numpy tutorial.
if changes should be done in-place, the elements of the input array must be big enough for the resulting strings.
None of the above has anything to do with prange. However, to use prange you cannot use str(i), because it creates a Python object, which requires the GIL.
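The in-place requirement can be illustrated in plain NumPy with a Python-level loop (so no prange and no nogil here). A sketch that assumes a fixed width of 10 bytes per element, a size chosen only for this example:

```python
import numpy as np

# 'S10' reserves 10 bytes per element: enough room for the appended digit
x = np.array([b"apple", b"pear"], dtype="S10")
for i in range(x.shape[0]):
    x[i] = x[i] + str(i).encode()
print(x)     # [b'apple0' b'pear1']

# Without an explicit width NumPy infers S5, so the longer value is truncated
y = np.array([b"apple", b"pear"])
y[0] = y[0] + b"0"
print(y[0])  # b'apple'
```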

python numpy - Function to perform vector & matrix addition

I'm working on the problem which asks me:
Add two NumPy vectors or matrices together, if possible. If it is not possible to add the two vectors/matrices together (because their sizes differ), return False.
Here is my approach:
import numpy as np
def mat_addition(A, B):
    if A.shape != B.shape:
        return False
    else:
        return np.sum(A,B)
But when I run the code for testing, it says
TypeError: only integer scalar arrays can be converted to a scalar index
Can someone tell me what's wrong with my code?
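np.sum can in fact be used the way you want, just not with two positional arrays: its second positional parameter is axis, so np.sum(A, B) tries to use the array B as an axis, which is what raises the TypeError. Wrap the arguments you pass to np.sum in a list and sum along the first axis:
import numpy as np
def mat_addition(A, B):
    if A.shape != B.shape:
        return False
    else:
        return np.sum([A, B], axis=0)
a = np.arange(5*3).reshape(5,3)
b = np.arange(5*3, 5*3*2).reshape(5,3)
print(mat_addition(a,b))
Output:
[[15 17 19]
 [21 23 25]
 [27 29 31]
 [33 35 37]
 [39 41 43]]
As per the numpy.sum docs, this function expects a single "array_like" object as its first argument. A list of arrays is a perfectly valid "array_like" object (here it is treated as one array of shape (2, 5, 3)), and axis=0 adds the two matrices element-wise. Without axis=0, np.sum([A, B]) would add up all elements into a single scalar (435 for this input). For plain element-wise addition, A + B (or np.add(A, B)) is the more direct choice.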
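A short sketch, with A and B built the same way as a and b above, showing what the second positional argument of np.sum means and comparing ways of adding two matrices:

```python
import numpy as np

A = np.arange(5 * 3).reshape(5, 3)
B = np.arange(5 * 3, 5 * 3 * 2).reshape(5, 3)

# The second positional parameter of np.sum is `axis`: np.sum(A, 1) sums along
# axis 1, so np.sum(A, B) would try to use the whole array B as an axis.
print(np.sum(A, 1))                                    # [ 3 12 21 30 39]

# Element-wise addition, three equivalent ways
print(np.array_equal(A + B, np.add(A, B)))             # True
print(np.array_equal(A + B, np.sum([A, B], axis=0)))   # True

# Without axis=0, np.sum collapses everything into one scalar
print(np.sum([A, B]))                                  # 435
```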

How do I vectorize a function with Numba, when the function takes arrays as arguments?

I'd like to use Numba to vectorize a function that will evaluate each row of a matrix. This would essentially apply a Numpy ufunc to the matrix, as opposed to looping over the rows. According to the docs:
You might ask yourself, “why would I go through this instead of compiling a simple iteration loop using the @jit decorator?”. The answer is that NumPy ufuncs automatically get other features such as reduction, accumulation or broadcasting.
With that in mind, I can't get even a toy example to work. The following simple example tries to calculate the sum of elements in each row.
import numba, numpy as np
# Define the row-wise function to be vectorized:
@numba.guvectorize(["void(float64[:],float64)"],"(n)->()")
def f(a,b):
    b = a.sum()
# Apply the function to an array with five rows:
a = np.arange(10).reshape(5,2)
b = f(a)
I used the @guvectorize decorator, since I'd like the decorated function to take the argument a as each row of the matrix, which is an array; @vectorize takes only scalar inputs. I also wrote the signature to take an array argument and modify a scalar output. As per the docs, the decorated function does not use a return statement.
The result should be b = [1,5,9,13,17], but instead I got b=[0.,1.,2.,3.,4.]. Clearly, I'm missing something. I'd appreciate some direction, keeping in mind that the sum is just an example.
In Python, b = a.sum() only rebinds the local name b; it can never modify the caller's original b.
Numba gets around this by requiring every parameter of a gufunc to be an array: scalar outputs are just arrays of length 1 that you can then assign into. So both parameters must be declared as arrays, and the assignment must use []:
@numba.guvectorize(["void(float64[:],float64[:])"],"(n)->()")
def f(a,b):
    b[:] = a.sum()
    # or b[0] = a.sum()
f(a)
Out[246]: array([ 1.,  5.,  9., 13., 17.])
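The rebinding problem is plain Python, independent of Numba. A minimal sketch with two illustrative helpers (rebind and write_into are hypothetical names):

```python
import numpy as np

def rebind(out, value):
    out = value       # rebinds the local name only; the caller's array is untouched

def write_into(out, value):
    out[:] = value    # writes into the caller's array in place

b = np.zeros(1)
rebind(b, 5.0)
print(b)  # [0.]
write_into(b, 5.0)
print(b)  # [5.]
```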
@chrisb has a great answer above. This answer adds a bit of clarification for those newer to vectorization.
In terms of vectorization (in numpy and numba), you pass vectors of inputs.
For example:
import numpy as np
a=[1,2]
b=[3,4]
@np.vectorize
def f(x_1,x_2):
    return x_1+x_2
print(f(a,b))
#-> [4 6]
In numba, you would traditionally need to pass in input types to the vectorize decorator. In more recent versions of numba, you do not need to specify vector input types if you pass in numpy arrays as inputs to a generically vectorized function.
For example:
import numpy as np
import numba as nb
a=np.array([1,2])
b=np.array([3,4])
# Note a generic vectorize decorator with input types not specified
@nb.vectorize
def f(x_1,x_2):
    return x_1+x_2
print(f(a,b))
#-> [4 6]
So far, variables are simple single objects that get passed into the function from the input arrays. This makes it possible for numba to convert the python code to simple ufuncs that can operate on the numpy arrays.
In your example of summing up a vector, you would need to pass the data as a single vector of vectors. To do this, you need to create ufuncs that operate on vectors themselves. This requires a bit more work, plus a specification of how you want the arbitrary outputs to be created. Enter the guvectorize function (docs here and here).
Since you are providing a vector of vectors, your outer vector is handled similarly to how you use vectorize above. Now you need to specify what each inner vector looks like for your input values.
E.g., summing an arbitrary vector of integers (this will not work, for a few reasons explained below):
@nb.guvectorize([(nb.int64[:])])
def f(x):
    return x.sum()
Now you will also need to add an extra input to your function and decorator. This allows you to specify an arbitrary type to store the output of your function. Instead of returning output, you will now update this input variable. Think of this final variable as a custom variable numba uses to generate an arbitrary output vector when creating the ufunc for numpy evaluation.
This input also needs to be specified in the decorator and your function should look something like this:
@nb.guvectorize([(nb.int64[:],nb.int64[:])])
def f(x, out):
    out[:]=x.sum()
Finally, you need to specify the input and output layouts in the decorator. These are given as shapes in the order of the input vectors, with an arrow indicating the output shape (which is actually your final input). In this case you are taking a vector of size n and outputting the result as a single value rather than a vector, so your layout should be (n)->().
As a more complex example, if you have two input matrices for a matrix multiplication, of sizes (m,n) and (n,o), and you want your output to be of size (m,o), your layout would be (m,n),(n,o)->(m,o).
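The same layout strings are also understood by plain NumPy's np.vectorize through its signature parameter, which makes it easy to experiment with layouts without compiling anything (this is NumPy, not Numba, and it runs at Python speed). A small sketch; row_sum and matmul are illustrative names:

```python
import numpy as np

# '(n)->()': each length-n row is reduced to a single value
row_sum = np.vectorize(lambda x: x.sum(), signature='(n)->()')

# '(m,n),(n,o)->(m,o)': core dimensions for a matrix product
matmul = np.vectorize(lambda x, y: x @ y, signature='(m,n),(n,o)->(m,o)')

a = np.arange(10).reshape(5, 2)
print(row_sum(a))  # [ 1  5  9 13 17]

# A stack of three (2, 3) matrices times one (3, 4) matrix broadcasts to (3, 2, 4)
x = np.ones((3, 2, 3))
y = np.ones((3, 4))
print(matmul(x, y).shape)  # (3, 2, 4)
```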
A complete function for the current problem would look something like:
@nb.guvectorize([(nb.int64[:],nb.int64[:])], '(n)->()')
def f(x, out):
    out[:]=x.sum()
Your end code should look something like:
import numpy as np
import numba as nb
a=np.arange(10).reshape(5,2)
# Equivalent to
# a=np.array([
# [0,1],
# [2,3],
# [4,5],
# [6,7],
# [8,9]
# ])
@nb.guvectorize([(nb.int64[:],nb.int64[:])], '(n)->()')
def f(x, out):
    out[:]=x.sum()
print(f(a))
#-> [ 1  5  9 13 17]

Pass numpy array of list of integers in Cython method from Python

I would like to pass the following array of lists of integers (i.e., it's not a two-dimensional array) to a Cython method from Python code.
Python Sample Code:
import numpy as np
from result import process_result
a = np.array([[1], [2,3]], dtype=object)
process_result(a)
The output of a is array([list([1]), list([2, 3])], dtype=object)
Cython Sample Code:
def process_result(int[:,:] a):
    pass
The above code gives the following error:
ValueError: Buffer has wrong number of dimensions (expected 2, got 1)
I tried to pass a plain Python list instead of a numpy array and got the following error:
a = [[1], [2,3]]
process_result(a)
TypeError: a bytes-like object is required, not 'list'
Kindly advise how to pass the value of a into the Cython method process_result, and what exact datatype I need to use to receive this value in the Cython method.
I think you're using the wrong data type. Instead of a numpy array of lists, you should be using a list of numpy arrays. There is very little benefit in using numpy arrays of Python objects (such as lists): unlike numeric types they aren't stored particularly efficiently, they aren't quick to do calculations on, and you can't accelerate them in Cython. Therefore the outermost level may as well be a normal Python list.
However, the inner levels all look to be homogeneous arrays of integers, and so would be ideal candidates for Numpy arrays (especially if you want to process them in Cython).
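Therefore, build your list as follows (np.intc matches the C int that a Cython int[:] memoryview expects; np.int was only an alias for Python's int and has been removed from recent NumPy versions):
a = [ np.array([1],dtype=np.intc), np.array([2,3],dtype=np.intc) ]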
(Or use tolist on a numpy array)
For your function you can define it like:
def process_result(list a):
    cdef int[:] item
    for item in a:
        # operations on the inner arrays are fast!
        pass
Here I've assumed that you most likely want to iterate over the list. Note that there's very little benefit in typing a as list, so you could just leave it untyped (to accept any Python object), and then you could pass it other iterables too, like your original numpy array.
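A pure NumPy sketch (no Cython compilation needed) of preparing such input from the original ragged data: a plain Python list whose items are C-contiguous arrays of np.intc, the dtype that a Cython int[:] memoryview expects. The names ragged and items are illustrative:

```python
import numpy as np

# Ragged input as in the question; recent NumPy requires dtype=object explicitly
ragged = np.array([[1], [2, 3]], dtype=object)

# Outer level: a plain Python list. Inner level: contiguous arrays of np.intc,
# which matches the C int element type of a Cython `int[:]` memoryview.
items = [np.ascontiguousarray(row, dtype=np.intc) for row in ragged]

for item in items:
    print(item.dtype, item.tolist())
```

The resulting items list is what would be passed to process_result.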
Convert the array of lists of integers to a list of objects (i.e., a list of lists of integers; it's not a two-dimensional array)
Python Code:
import numpy as np
from result import process_result
a = np.array([[1], [2,3]], dtype=object).tolist()
process_result(a)
The output of a is [[1], [2,3]]
Cython Sample Code:
def process_result(list a):
    pass
Change the int[:, :] to list. It works fine.
Note: if anyone knows the optimal answer, kindly post it; it will be helpful.
