I wonder if anyone has an elegant solution to being able to pass a python list, a numpy vector (shape(n,)) or a numpy vector (shape(n,1)) to a function. The idea would be to generalize a function such that any of the three would be valid without adding complexity.
Initial thoughts:
1) Use a type checking decorator function and cast to a standard representation.
2) Add type checking logic inline (significantly less ideal than #1).
3) ?
I do not generally use python builtin array types, but suspect a solution to this question would also support those.
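I think the simplest thing to do is to start off your function with numpy.atleast_2d. Note that a list or a shape (n,) array comes back with shape (1, n), while a shape (n, 1) array is returned unchanged, so add reshape(-1, 1) to bring all 3 of your possibilities to the x.shape == (n, 1) case. You can then use that to simplify your function.
For example,
def sum(x):
    x = np.atleast_2d(x).reshape(-1, 1)  # always a column of shape (n, 1)
    return np.dot(x.T, np.ones((x.shape[0], 1)))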
atleast_2d returns a view on that array, so there won't be much overhead if you pass in something that's already an ndarray. However, if you plan to modify x and therefore want to make a copy instead, you can do x = np.atleast_2d(np.array(x)).
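A quick way to check what each of the three inputs turns into is sketched below. The dictionary and variable names are illustrative only. It shows that atleast_2d gives shape (1, n) for a list or a flat array and leaves an (n, 1) array alone, that a following reshape(-1, 1) yields a common column shape, and that the result shares memory with an ndarray input.

```python
import numpy as np

# The three kinds of input from the question, with n = 3.
inputs = {
    "list": [1, 2, 3],
    "shape (3,)": np.array([1, 2, 3]),
    "shape (3, 1)": np.array([[1], [2], [3]]),
}

# Shape after atleast_2d alone, and after forcing a column.
shapes_2d = {name: np.atleast_2d(x).shape for name, x in inputs.items()}
shapes_col = {name: np.atleast_2d(x).reshape(-1, 1).shape for name, x in inputs.items()}

print(shapes_2d)
print(shapes_col)

# For an existing ndarray, atleast_2d returns a view rather than a copy.
base = np.array([1, 2, 3])
view = np.atleast_2d(base)
shares = np.shares_memory(base, view)
print(shares)
```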
You can convert the three types to a "canonical" type, which is a 1dim array, using:
arr = np.asarray(arr).ravel()
Put in a decorator:
import numpy as np
import functools

def takes_1dim_array(func):
    @functools.wraps(func)
    def f(arr, *a, **kw):
        arr = np.asarray(arr).ravel()
        return func(arr, *a, **kw)
    return f
Then:
@takes_1dim_array
def func(arr):
    print(arr.shape)
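As a rough usage sketch, the same decorator idea can be exercised with all three input kinds. The function describe and the sample values are hypothetical, chosen only to show that each call sees a flat array.

```python
import functools
import numpy as np

def takes_1dim_array(func):
    """Flatten the first argument to a 1-D ndarray before calling func."""
    @functools.wraps(func)
    def wrapper(arr, *args, **kwargs):
        return func(np.asarray(arr).ravel(), *args, **kwargs)
    return wrapper

@takes_1dim_array
def describe(arr):
    # Hypothetical example function: report the shape and the total.
    return arr.shape, float(arr.sum())

as_list = [1.0, 2.0, 3.0]
as_flat = np.array(as_list)                   # shape (3,)
as_column = np.array(as_list).reshape(3, 1)   # shape (3, 1)

results = [describe(v) for v in (as_list, as_flat, as_column)]
print(results)
```

Because np.asarray does not copy an existing ndarray and ravel returns a view where it can, the wrapped function may be looking at the caller's data; copy first if you intend to modify it.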
I'm defining a function and want to use Numba Vectorize to speed it up, with cuda. I'm having trouble with the function signature. The function will return a float64 value. I want to pass two float64 values, which will be vectorized, and in addition a 9-tuple of float64 values, which will be scalars.
Here is my function header:
from numba import vectorize

@vectorize(['float64(float64, float64, UniTuple(float64, 9))'], target='cuda')
def fn_vec(E, L, fparams):
    # calculations...
    return result
but this gives an error:
TypeError: data type "(float64 x 9)" not understood
I've tried many variations, including (float64, ..., float64) in place of the UniTuple(), but can't get anything to work. How do I do this?
How do I specify a tuple in a Numba Vectorize signature?
In a numba.vectorize function you cannot use a tuple. That's because vectorize vectorizes the code for arrays of these types.
So using a float, float, tuple signature creates a function that expects two arrays containing floats and one array containing tuples. The problem is that there is no dtype for an array containing tuples - it could work if you use a structured array instead of an array containing tuples but I haven't tried that.
How do I specify a tuple in a Numba jit signature?
The correct way to specify a UniTuple in a numba signature is with numba.types.containers.UniTuple. In your case:
nb.types.containers.UniTuple(nb.types.float64, 9)
So the correct signature would be something like this:
import numba as nb

@nb.njit(
    nb.types.float64(
        nb.types.float64,
        nb.types.float64,
        nb.types.containers.UniTuple(nb.types.float64, 9)))
def func(f1, f2, ftuple):
    # ...
    return f1
I often avoid typing my numba functions explicitly, but when I do, I find numba.typeof very useful. For example:
>>> nb.typeof((1.0, ) * 9)
tuple(float64 x 9)
>>> type(nb.typeof((1.0, ) * 9))
numba.types.containers.UniTuple
>>> help(type(nb.typeof((1.0, ) * 9))) # I shortened the result:
Help on class UniTuple in module numba.types.containers:
class UniTuple(BaseAnonymousTuple, _HomogeneousTuple, numba.types.abstract.Sequence)
| UniTuple(*args, **kwargs)
|
| Type class for homogeneous tuples.
|
| Methods defined here:
|
| __init__(self, dtype, count)
| Initialize self. See help(type(self)) for accurate signature.
So the information is all there: it's numba.types.containers.UniTuple, and you instantiate it with two arguments, the dtype (here float64) and the count (in this case 9).
In case you wanted to vectorize over the float arrays only
If you did not want to vectorize the function for the tuple argument you could simply create the vectorized function inside another function and call it there:
import numba as nb
import numpy as np

def func(E, L, fparams):
    @nb.vectorize(['float64(float64, float64)'])
    def fn_vec(e, l):
        return e + l + fparams[1]  # just to illustrate that the tuple is available
    return fn_vec(E, L)
This makes the tuple available inside the vectorized function. However, it has to create and compile the inner function every time you call the outer function, so this may actually be slower. I'm also not sure that this will work with target="cuda"; you may need to test that yourself.
I'm working on the problem which asks me:
Add two NumPy vectors or matrices together, if possible. If it is not possible to add the two vectors/matrices together (because their sizes differ), return False.
Here is my approach:
import numpy as np

def mat_addition(A, B):
    if A.shape != B.shape:
        return False
    else:
        return np.sum(A,B)
But when I run the code for testing, it says
TypeError: only integer scalar arrays can be converted to a scalar index
Can someone tell me what's wrong with my code?
np.sum can take both arrays if you wrap them in a list, so that they are passed as a single argument:
import numpy as np

def mat_addition(A, B):
    if A.shape != B.shape:
        return False
    else:
        return np.sum([A,B])

a = np.arange(5*3).reshape(5,3)
b = np.arange(5*3, 5*3*2).reshape(5,3)
print(mat_addition(a,b))
Output:
435
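As per the numpy.sum docs, this function expects a single "array_like" object as its first argument; the second positional argument is axis, which is why np.sum(A, B) raises the TypeError. A list of arrays is a perfectly valid "array_like" object, so the above code runs. Note, however, that without an axis argument it adds up every element of both arrays and returns one scalar (435 here). For an element-wise sum of the two matrices, pass axis=0 or simply use A + B.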
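If the goal is the element-wise sum that the exercise asks for, a minimal sketch could look like the following. It reuses the function name from the question; the np.asarray conversion and the variable names total and mismatch are additions for illustration.

```python
import numpy as np

def mat_addition(A, B):
    """Return the element-wise sum of A and B, or False if their shapes differ."""
    A, B = np.asarray(A), np.asarray(B)
    if A.shape != B.shape:
        return False
    return A + B  # same result as np.add(A, B) or np.sum([A, B], axis=0)

a = np.arange(5 * 3).reshape(5, 3)
b = np.arange(5 * 3, 5 * 3 * 2).reshape(5, 3)

total = mat_addition(a, b)         # a (5, 3) array
mismatch = mat_addition(a, b[:4])  # shapes (5, 3) vs (4, 3)
print(total.shape, mismatch)
```

The explicit shape check matters because A + B would otherwise broadcast compatible but unequal shapes, such as (5, 3) and (3,), instead of failing.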
I'd like to use Numba to vectorize a function that will evaluate each row of a matrix. This would essentially apply a Numpy ufunc to the matrix, as opposed to looping over the rows. According to the docs:
You might ask yourself, “why would I go through this instead of compiling a simple iteration loop using the @jit decorator?”. The answer is that NumPy ufuncs automatically get other features such as reduction, accumulation or broadcasting.
With that in mind, I can't get even a toy example to work. The following simple example tries to calculate the sum of elements in each row.
import numba, numpy as np

# Define the row-wise function to be vectorized:
@numba.guvectorize(["void(float64[:],float64)"],"(n)->()")
def f(a,b):
    b = a.sum()

# Apply the function to an array with five rows:
a = np.arange(10).reshape(5,2)
b = f(a)
I used the @guvectorize decorator, since I'd like the decorated function to take the argument a as each row of the matrix, which is an array; @vectorize takes only scalar inputs. I also wrote the signature to take an array argument and modify a scalar output. As per the docs, the decorated function does not use a return statement.
The result should be b = [1,5,9,13,17], but instead I got b=[0.,1.,2.,3.,4.]. Clearly, I'm missing something. I'd appreciate some direction, keeping in mind that the sum is just an example.
b = a.sum() can't ever modify the original value of b in python syntax.
numba gets around this by requiring every param to a gufunc be an array - scalars are just length 1, that you can then assign into. So you need both params as arrays, and the assignment must use []
@numba.guvectorize(["void(float64[:],float64[:])"],"(n)->()")
def f(a,b):
    b[:] = a.sum()
    # or b[0] = a.sum()

f(a)
Out[246]: array([ 1., 5., 9., 13., 17.])
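The difference between rebinding a name and assigning into an array can be seen without numba at all. The sketch below uses plain NumPy and two hypothetical helper functions, rebind and assign, to mimic the two versions of f.

```python
import numpy as np

def rebind(a, b):
    b = a.sum()      # rebinds the local name b; the caller's array is untouched

def assign(a, b):
    b[:] = a.sum()   # writes into the caller's array in place

row = np.array([2.0, 3.0])
out_rebind = np.zeros(1)
out_assign = np.zeros(1)

rebind(row, out_rebind)
assign(row, out_assign)
print(out_rebind, out_assign)
```

Only the in-place version changes the output buffer, which is why the gufunc body has to use b[:] or b[0].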
@chrisb has a great answer above. This answer should add a bit of clarification for those newer to vectorization.
In terms of vectorization (in numpy and numba), you pass vectors of inputs.
For example:
import numpy as np

a=[1,2]
b=[3,4]

@np.vectorize
def f(x_1,x_2):
    return x_1+x_2

print(f(a,b))
#-> [4 6]
In numba, you would traditionally need to pass in input types to the vectorize decorator. In more recent versions of numba, you do not need to specify vector input types if you pass in numpy arrays as inputs to a generically vectorized function.
For example:
import numpy as np
import numba as nb

a=np.array([1,2])
b=np.array([3,4])

# Note a generic vectorize decorator with input types not specified
@nb.vectorize
def f(x_1,x_2):
    return x_1+x_2

print(f(a,b))
#-> [4 6]
So far, variables are simple single objects that get passed into the function from the input arrays. This makes it possible for numba to convert the python code to simple ufuncs that can operate on the numpy arrays.
In your example of summing up a vector, you would need to pass data as a single vector of vectors. To do this you need to create ufuncs that operate on vectors themselves. This requires a bit more work, because you have to specify how the outputs are to be created. Enter the guvectorize function.
Since you are providing a vector of vectors, the outer vector is handled much as with vectorize above. Now you need to specify what each inner vector looks like for your input values.
E.g., summing an arbitrary vector of integers. (This will not work, for a few reasons explained below.)
@nb.guvectorize([(nb.int64[:])])
def f(x):
    return x.sum()
Now you will also need to add an extra input to your function and decorator. This allows you to specify an arbitrary type to store the output of your function. Instead of returning output, you will now update this input variable. Think of this final variable as a custom variable numba uses to generate an arbitrary output vector when creating the ufunc for numpy evaluation.
This input also needs to be specified in the decorator and your function should look something like this:
@nb.guvectorize([(nb.int64[:],nb.int64[:])])
def f(x, out):
    out[:]=x.sum()
Finally, you need to specify the input and output layouts in the decorator. These are given as shapes in the order of the input vectors, with an arrow indicating the output shape (which is actually your final input). In this case you are taking a vector of size n and outputting the result as a single value rather than a vector, so your layout should be (n)->().
As a more complex example, assuming you have two input vectors for matrix multiplication of size (m,n) and (n,o) and you wanted your output vector to be of size (m,o) your decorator format would look like (m,n),(n,o)->(m,o).
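The same layout strings are also accepted by np.vectorize through its signature argument, which makes it possible to try them out without compiling anything. The sketch below is only an illustration of how the layouts are read, not a substitute for the numba version: np.vectorize loops in Python, and the names row_sum, mat_mul, stack_x and stack_y are made up for this example.

```python
import numpy as np

# '(n)->()' : each length-n row maps to one scalar.
row_sum = np.vectorize(lambda row: row.sum(), signature='(n)->()')

# '(m,n),(n,o)->(m,o)' : each pair of core matrices maps to one matrix.
mat_mul = np.vectorize(lambda x, y: x @ y, signature='(m,n),(n,o)->(m,o)')

a = np.arange(10).reshape(5, 2)
sums = row_sum(a)              # loops over the 5 rows

stack_x = np.ones((4, 2, 3))   # four (2, 3) matrices
stack_y = np.ones((4, 3, 5))   # four (3, 5) matrices
products = mat_mul(stack_x, stack_y)

print(sums, products.shape)
```

Any leading dimensions that are not named in the layout (5 in the first call, 4 in the second) are the ones that get looped over.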
A complete function for the current problem would look something like:
@nb.guvectorize([(nb.int64[:],nb.int64[:])], '(n)->()')
def f(x, out):
    out[:]=x.sum()
Your end code should look something like:
import numpy as np
import numba as nb

a=np.arange(10).reshape(5,2)
# Equivalent to
# a=np.array([
#     [0,1],
#     [2,3],
#     [4,5],
#     [6,7],
#     [8,9]
# ])

@nb.guvectorize([(nb.int64[:],nb.int64[:])], '(n)->()')
def f(x, out):
    out[:]=x.sum()

print(f(a))
#-> [ 1 5 9 13 17]
I would like to apply a function to every element of a numpy.ndarray, something like this:
import numpy
import math
a = numpy.arange(10).reshape(2,5)
b = map(math.sin, a)
print b
but this gives:
TypeError: only length-1 arrays can be converted to Python scalars
I know I can do this:
import numpy
import math

a = numpy.arange(10).reshape(2,5)

def recursive_map(function, value):
    if isinstance(value, (list, numpy.ndarray)):
        out = numpy.array(map(lambda x: recursive_map(function, x), value))
    else:
        out = function(value)
    return out

c = recursive_map(math.sin, a)
print c
My question is: is there a built-in function or method to do this? It seems elementary, but I haven't been able to find it. I am using Python 2.7.
Use np.sin; it already works element-wise on an ndarray.
You can also reshape to a 1D array and the native map should just work. Then you can use reshape again to restore the original dimensions.
You can also use np.vectorize to write functions that can work like np.sin.
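The three options can be compared side by side. The sketch below is written for Python 3 (where map returns an iterator, hence the list call), and the variable names are illustrative.

```python
import math
import numpy as np

a = np.arange(10).reshape(2, 5)

# 1) A NumPy ufunc is applied element-wise directly.
b_ufunc = np.sin(a)

# 2) Flatten, map a scalar function over the elements, restore the shape.
b_map = np.array(list(map(math.sin, a.ravel()))).reshape(a.shape)

# 3) Wrap the scalar function so that it broadcasts over arrays.
#    np.vectorize is a convenience loop, not a speed-up.
vsin = np.vectorize(math.sin)
b_vec = vsin(a)

print(np.allclose(b_ufunc, b_map), np.allclose(b_ufunc, b_vec))
```

Where a ufunc such as np.sin exists, it is normally the fastest of the three, because the loop runs in compiled code.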
A toy-case for my problem:
I have a numpy array of size, say, 1000:
import numpy as np
a = np.arange(1000)
I also have a "projection array" p which is a mapping from a to another array b:
p = np.random.randint(0,1000,(1000,1000))
It is easy to get b from a using "fancy indexing":
b = a[p]
But b is not a view, as noted by several previous questions/answers and the numpy documentation.
Unfortunately, in my case only the values in a change over the course of a long simulation and using fancy indexing at each iteration to obtain b becomes very costly. I only read from b and do not modify it.
I understand it is not possible (yet) to solve this with fancy indexing.
I was wondering if anyone had a similar problem/bottleneck and came up with some other workaround?
What you're asking for isn't practical, and that's why the numpy folks haven't implemented it. You could do it yourself with something like:
class FancyView(object):
    def __init__(self, array, index):
        self._array = array
        self._index = index.copy()

    def __array__(self):
        return self._array[self._index]

    def __getitem__(self, index):
        return self._array[self._index[index]]

b = FancyView(a, p)
But notice that the expensive a[p] operation will get called every time you use b as an array. There is no other practical way of making a 'view' of this kind. Numpy can get away with using views for basic slicing because it can manipulate the strides, but there is no way to do something like this using strides.
If you only need parts of b you might be able to get some time savings by indexing the fancy view instead of using it as an array.
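A small usage sketch of this idea follows, with a much smaller a and p than in the question. The optional dtype and copy parameters on __array__ are an assumption added here so that newer NumPy versions can call it with those keywords; the rest follows the class above.

```python
import numpy as np

class FancyView:
    """Read-only stand-in for a[p] that re-indexes a on every access."""
    def __init__(self, array, index):
        self._array = array
        self._index = index.copy()

    def __array__(self, dtype=None, copy=None):
        # Full (expensive) gather; dtype/copy are accepted for newer NumPy versions.
        result = self._array[self._index]
        return result if dtype is None else result.astype(dtype)

    def __getitem__(self, index):
        # Gather only the requested part of the projection.
        return self._array[self._index[index]]

rng = np.random.default_rng(0)
a = np.arange(10)
p = rng.integers(0, 10, size=(4, 4))
b = FancyView(a, p)

a *= 2                  # update a in place; b needs no rebuild
row = b[0]              # indexes a with only p[0]
full = np.asarray(b)    # triggers the full a[p] gather
print(row, full.shape)
```

Reading b[0] touches only one row of p, which is where the possible time saving comes from; converting the whole object with np.asarray costs the same as a[p].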