I've been experimenting with Numba lately, and here's something that I still cannot understand:
In a normal Python function with NumPy arrays you can do something like this:
# Subtracts two NumPy arrays and returns an array as the result
def sub(a, b):
    res = a - b
    return res
But when you use Numba's @guvectorize decorator like so:
# Subtracts two NumPy arrays and returns an array as the result
@guvectorize(['void(float32[:], float32[:], float32[:])'],'(n),(n)->(n)')
def subT(a, b, res):
    res = a - b
The result is not even correct. Worse still, there are instances where it complains about "Invalid usage of [math operator] with [parameters]"
I am baffled. Even if I try this:
# Subtracts two NumPy arrays and returns an array as the result
@guvectorize(['void(float32[:], float32[:], float32[:])'],'(n),(n)->(n)')
def subTt(a, b, res):
    res = np.subtract(a,b)
The result is still incorrect. Considering that this is supposed to be a supported Math operation, I don't see why it doesn't work.
I know the standard way is like this:
# Subtracts two NumPy arrays and returns an array as the result
@guvectorize(['void(float32[:], float32[:], float32[:])'],'(n),(n)->(n)')
def subTtt(a, b, res):
    for i in range(a.shape[0]):
        res[i] = a[i] - b[i]
and this does work as expected.
But what is wrong with my way?
P/S This is just a trivial example to explain my problem, I don't actually plan to use @guvectorize just to subtract arrays :P
P/P/S I suspect it has something to do with how the arrays are copied to gpu memory, but I am not sure...
P/P/P/S This looked relevant but the function here operates only on a single thread right...
The correct way to write this is:
@guvectorize(['void(float32[:], float32[:], float32[:])'],'(n),(n)->(n)')
def subT(a, b, res):
    res[:] = a - b
What you tried doesn't work because of how assignment works in Python; it is not particular to Numba.
name = expr rebinds name to the value of expr. It can never mutate the object that name previously referred to, as assignment through a C++ reference could.
name[...] = expr calls (in essence) name.__setitem__, which can modify the object in place, as NumPy arrays do. The full slice [:] refers to the whole array.
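The difference between rebinding and in-place assignment can be seen without Numba at all. The following is a minimal sketch using plain Python lists; the function names rebind and assign_in_place are made up for illustration.

```python
def rebind(a, b, res):
    # Rebinding: the local name `res` now points at a brand-new list;
    # the list the caller passed in is left untouched.
    res = [x - y for x, y in zip(a, b)]


def assign_in_place(a, b, res):
    # Slice assignment goes through res.__setitem__ and overwrites
    # the contents of the caller's list.
    res[:] = [x - y for x, y in zip(a, b)]


out_rebind = [0.0, 0.0]
rebind([5.0, 7.0], [1.0, 2.0], out_rebind)

out_in_place = [0.0, 0.0]
assign_in_place([5.0, 7.0], [1.0, 2.0], out_in_place)

print(out_rebind)    # [0.0, 0.0]
print(out_in_place)  # [4.0, 5.0]
```

A gufunc's output argument behaves like res here: the caller only ever sees what was written into the array it handed over.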
I'd like to use Numba to vectorize a function that will evaluate each row of a matrix. This would essentially apply a Numpy ufunc to the matrix, as opposed to looping over the rows. According to the docs:
You might ask yourself, “why would I go through this instead of compiling a simple iteration loop using the @jit decorator?”. The answer is that NumPy ufuncs automatically get other features such as reduction, accumulation or broadcasting.
With that in mind, I can't get even a toy example to work. The following simple example tries to calculate the sum of elements in each row.
import numba, numpy as np
# Define the row-wise function to be vectorized:
@numba.guvectorize(["void(float64[:],float64)"],"(n)->()")
def f(a,b):
    b = a.sum()
# Apply the function to an array with five rows:
a = np.arange(10).reshape(5,2)
b = f(a)
I used the @guvectorize decorator, since I'd like the decorated function to take the argument a as each row of the matrix, which is an array; @vectorize takes only scalar inputs. I also wrote the signature to take an array argument and modify a scalar output. As per the docs, the decorated function does not use a return statement.
The result should be b = [1,5,9,13,17], but instead I got b=[0.,1.,2.,3.,4.]. Clearly, I'm missing something. I'd appreciate some direction, keeping in mind that the sum is just an example.
b = a.sum() can never modify the original value of b in Python; it only rebinds the local name.
Numba gets around this by requiring every parameter of a gufunc to be an array. A scalar output is just an array of length 1 that you can assign into. So both parameters need to be declared as arrays, and the assignment must use []:
@numba.guvectorize(["void(float64[:],float64[:])"],"(n)->()")
def f(a,b):
    b[:] = a.sum()
    # or b[0] = a.sum()
f(a)
Out[246]: array([ 1., 5., 9., 13., 17.])
@chrisb has a great answer above. This answer should add a bit of clarification for those newer to vectorization.
In terms of vectorization (in numpy and numba), you pass vectors of inputs.
For example:
import numpy as np
a=[1,2]
b=[3,4]
@np.vectorize
def f(x_1,x_2):
    return x_1+x_2
print(f(a,b))
#-> [4,6]
In numba, you would traditionally need to pass in input types to the vectorize decorator. In more recent versions of numba, you do not need to specify vector input types if you pass in numpy arrays as inputs to a generically vectorized function.
For example:
import numpy as np
import numba as nb
a=np.array([1,2])
b=np.array([3,4])
# Note a generic vectorize decorator with input types not specified
@nb.vectorize
def f(x_1,x_2):
    return x_1+x_2
print(f(a,b))
#-> [4,6]
So far, variables are simple single objects that get passed into the function from the input arrays. This makes it possible for numba to convert the python code to simple ufuncs that can operate on the numpy arrays.
In your example of summing up a vector, you would need to pass data as a single vector of vectors. To do this you need to create ufuncs that operate on vectors themselves. This requires a bit more work and specification for how you want the arbitrary outputs to be created. Enter the guvectorize function (docs here and here).
Since you are providing a vector of vectors, your outer vector is approached similarly to how you use vectorize above. Now you need to specify what each inner vector looks like for your input values.
E.g. adding an arbitrary vector of integers. (This will not work, for a few reasons explained below.)
@nb.guvectorize([(nb.int64[:])])
def f(x):
    return x.sum()
Now you will also need to add an extra input to your function and decorator. This allows you to specify an arbitrary type to store the output of your function. Instead of returning output, you will now update this input variable. Think of this final variable as a custom variable numba uses to generate an arbitrary output vector when creating the ufunc for numpy evaluation.
This input also needs to be specified in the decorator and your function should look something like this:
@nb.guvectorize([(nb.int64[:],nb.int64[:])])
def f(x, out):
    out[:]=x.sum()
Finally, you need to specify input and output formats in the decorator. These are given as matrix shapes in the order of the input vectors, with an arrow indicating the output vector shape (which is actually your final input). In this case you are taking a vector of size n and outputting the result as a single value and not a vector. Your format should be (n)->().
As a more complex example, assuming you have two input vectors for matrix multiplication of size (m,n) and (n,o) and you wanted your output vector to be of size (m,o) your decorator format would look like (m,n),(n,o)->(m,o).
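The layout strings themselves can be tried out without Numba, because np.vectorize accepts the same notation through its signature parameter. This is only a sketch of how the core dimensions are interpreted: unlike a guvectorize kernel, a function wrapped by np.vectorize returns its result instead of writing into a final output argument, and it gives no speed-up. The names row_sum and matmul are made up for illustration.

```python
import numpy as np

# Layout '(n)->()': each call sees one 1-D vector and yields one scalar.
row_sum = np.vectorize(lambda x: x.sum(), signature='(n)->()')

# Layout '(m,n),(n,o)->(m,o)': each call sees one pair of matrices.
matmul = np.vectorize(lambda x, y: x @ y, signature='(m,n),(n,o)->(m,o)')

a = np.arange(10).reshape(5, 2)
print(row_sum(a).tolist())  # [1, 5, 9, 13, 17]

# The leading axis (length 3) is the loop dimension; the trailing two
# axes of each input are the core dimensions named in the signature.
stack_x = np.ones((3, 2, 4))
stack_y = np.ones((3, 4, 5))
print(matmul(stack_x, stack_y).shape)  # (3, 2, 5)
```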
A complete function for the current problem would look something like:
@nb.guvectorize([(nb.int64[:],nb.int64[:])], '(n)->()')
def f(x, out):
    out[:]=x.sum()
Your end code should look something like:
import numpy as np
import numba as nb
a=np.arange(10).reshape(5,2)
# Equivalent to
# a=np.array([
# [0,1],
# [2,3],
# [4,5],
# [6,7],
# [8,9]
# ])
@nb.guvectorize([(nb.int64[:],nb.int64[:])], '(n)->()')
def f(x, out):
    out[:]=x.sum()
print(f(a))
#-> [ 1 5 9 13 17]
I want to generate symmetric zero-diagonal matrices. The symmetric part works, but when I use fill_diagonal from NumPy, the result I get is None. My code is below. Thank you for reading.
import numpy as np
matrix_size = int(input("Size of the matrix \n"))
random_matrix = np.random.random_integers(-4,4,size=(matrix_size,matrix_size))
symmetric_matrix = (random_matrix + random_matrix.T)/2
print(symmetric_matrix)
zero_diogonal_matrix = np.fill_diagonal(symmetric_matrix,0)
print(zero_diogonal_matrix)
np.fill_diagonal(), like many other functions and methods across Python and NumPy, works in place (compare: Why does “return list.sort()” return None, not the list?). That is, it directly alters the object in memory and does not create a new object. The return value of such functions is None. Therefore, change:
zero_diogonal_matrix = np.fill_diagonal(symmetric_matrix,0)
To just:
np.fill_diagonal(symmetric_matrix,0)
You will then see the change reflected in symmetric_matrix.
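A short check of this behaviour, as a sketch with a small hand-written matrix (the names m and result are only for illustration):

```python
import numpy as np

m = np.array([[1.0, 2.0],
              [2.0, 3.0]])

# fill_diagonal writes into m and has no return value.
result = np.fill_diagonal(m, 0)

print(result)      # None
print(m.tolist())  # [[0.0, 2.0], [2.0, 0.0]]
```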
It's probably overkill, but in case you want to preserve the tenet of minimising surprise, you could wrap this (and other functions like it) in a function that takes care of preserving the original array:
def fill_diagonal(source_array, diagonal):
    copy = source_array.copy()
    np.fill_diagonal(copy, diagonal)
    return copy
But the question then becomes "who exactly is going to be least surprised by doing it this way?"
I wonder if anyone has an elegant solution to being able to pass a python list, a numpy vector (shape(n,)) or a numpy vector (shape(n,1)) to a function. The idea would be to generalize a function such that any of the three would be valid without adding complexity.
Initial thoughts:
1) Use a type checking decorator function and cast to a standard representation.
2) Add type checking logic inline (significantly less ideal than #1).
3) ?
I do not generally use python builtin array types, but suspect a solution to this question would also support those.
I think the simplest thing to do is to start off your function with numpy.atleast_2d. A list or a shape (n,) array then becomes a shape (1, n) array, while a shape (n, 1) array is left as it is, so following it with reshape(-1, 1) brings all 3 of your possibilities to the x.shape == (n, 1) case, and you can use that to simplify your function.
For example,
def sum(x):
    x = np.atleast_2d(x).reshape(-1, 1)
    return np.dot(x.T, np.ones((x.shape[0], 1)))
atleast_2d returns a view on that array, so there won't be much overhead if you pass in something that's already an ndarray. However, if you plan to modify x and therefore want to make a copy instead, you can do x = np.atleast_2d(np.array(x)).
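The shapes and the view-versus-copy behaviour can be checked directly. This is a small sketch; the variable names are made up, and np.shares_memory is used only to show whether a copy was made.

```python
import numpy as np

as_list = [1, 2, 3]
as_1d = np.array([1, 2, 3])
as_column = np.array([[1], [2], [3]])

print(np.atleast_2d(as_list).shape)    # (1, 3)
print(np.atleast_2d(as_1d).shape)      # (1, 3)
print(np.atleast_2d(as_column).shape)  # (3, 1)

# An ndarray input is not copied: the result shares its memory.
view = np.atleast_2d(as_1d)
print(np.shares_memory(view, as_1d))   # True

# Wrapping the input in np.array first forces a copy.
copy = np.atleast_2d(np.array(as_1d))
print(np.shares_memory(copy, as_1d))   # False
```

Note that 1-D input gains its new axis at the front, so a list comes back as a row, not a column.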
You can convert the three types to a "canonical" type, which is a 1dim array, using:
arr = np.asarray(arr).ravel()
Put in a decorator:
import numpy as np
import functools
def takes_1dim_array(func):
    @functools.wraps(func)
    def f(arr, *a, **kw):
        arr = np.asarray(arr).ravel()
        return func(arr, *a, **kw)
    return f
Then:
@takes_1dim_array
def func(arr):
    print(arr.shape)
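Put together, the decorator approach can be exercised on all three input kinds from the question. This is a self-contained sketch; shape_of is a hypothetical example function, not part of the original answer.

```python
import functools

import numpy as np


def takes_1dim_array(func):
    """Convert the first argument to a 1-D array before calling func."""
    @functools.wraps(func)
    def wrapper(arr, *args, **kwargs):
        return func(np.asarray(arr).ravel(), *args, **kwargs)
    return wrapper


@takes_1dim_array
def shape_of(arr):  # hypothetical example function
    return arr.shape


shapes = [
    shape_of([1, 2, 3]),                  # Python list
    shape_of(np.array([1, 2, 3])),        # shape (n,)
    shape_of(np.array([[1], [2], [3]])),  # shape (n, 1)
]
print(shapes)  # [(3,), (3,), (3,)]
```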
I am trying to convert part of a native python function to cython to improve the compute time. I would like to write a cython function just for the loop component that is taking up the time (as ipython lprun kindly told me). However this function takes in variably sized matrices .. and I can't see how to bring that across easily to statically typed cython.
for index1 in range(0,num_products):
    for index2 in range(0,num_products):
        cond_prob = (data[index1] * data[index2]).sum() / max(col_sums[index1], col_sums[index2])
        prox[index1][index2] = cond_prob
The issue is that num_products changes from year to year, so the size of the matrix (data) is variable.
What is the best strategy here?
Should I write two C functions: one to create a matrix of a certain dimension using memalloc, and one to loop over the created matrix?
Is there some fancy cython/numpy wizardry to help in this scenario? Can I write a C function that takes in a variably sized Numpy Array in memory and pass the size?
Cython code is (strategically) statically typed, but that doesn't mean that arrays must have a fixed size. In straight C, passing a multidimensional array to a function can be a little awkward, but in Cython you should be able to do something like the following:
Note I took the function and variable names from your follow-up question.
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False)
@cython.cdivision(True)
def cooccurance_probability_cy(double[:,:] X):
    cdef int P, N, i, j, k
    P = X.shape[0]
    N = X.shape[1]  # number of columns; need not equal P
    cdef double item
    cdef double [:] CS = np.sum(X, axis=1)
    # np.float was removed from recent NumPy releases; use np.float64
    cdef double [:,:] D = np.empty((P, P), dtype=np.float64)
    for i in range(P):
        for j in range(P):
            item = 0
            # inner product of rows i and j runs over the columns
            for k in range(N):
                item += X[i,k] * X[j,k]
            D[i,j] = item / max(CS[i], CS[j])
    return D
On the other hand, using just Numpy should also be quite fast for this problem, if you use the right functions and some broadcasting. In fact, as the calculation complexity is dominated by the matrix multiplication, I found the following is much faster than the Cython code above (np.inner uses a highly optimized BLAS routine):
def new(X):
    CS = np.sum(X, axis=1, keepdims=True)
    D = np.inner(X,X) / np.maximum(CS, CS.T)
    return D
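To check that the vectorized form computes the same thing as the original double loop, the two can be compared on random data. This is a sketch under the assumption (taken from the answer's CS = np.sum(X, axis=1)) that col_sums in the question holds the row sums of data; the function names and the test matrix are made up.

```python
import numpy as np


def cond_prob_loops(X):
    # Direct transcription of the double loop from the question.
    P = X.shape[0]
    CS = X.sum(axis=1)  # assumed meaning of col_sums
    D = np.empty((P, P))
    for i in range(P):
        for j in range(P):
            D[i, j] = (X[i] * X[j]).sum() / max(CS[i], CS[j])
    return D


def cond_prob_vectorized(X):
    # keepdims=True gives a (P, 1) column, so CS and CS.T broadcast
    # to a (P, P) grid of pairwise maxima.
    CS = np.sum(X, axis=1, keepdims=True)
    return np.inner(X, X) / np.maximum(CS, CS.T)


rng = np.random.default_rng(0)
X = rng.random((6, 4)) + 0.1  # strictly positive, so no division by zero
print(np.allclose(cond_prob_loops(X), cond_prob_vectorized(X)))  # True
```

The test matrix is deliberately non-square (6 rows, 4 columns) to confirm that nothing depends on the two dimensions being equal.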
Have you tried getting rid of the for loops using NumPy?
For the first part of your equation you could, for example, try:
(data[np.newaxis,:] * data[:,np.newaxis]).sum(2)
If memory is an issue, you can also use the np.einsum() function.
For the second part one could probably also cook up a NumPy expression (a bit more difficult), if you've not already tried that.
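One possible way to fill in both parts is sketched below. The einsum subscripts and the use of np.maximum.outer for the denominator are my own suggestion rather than something stated in the answer, and col_sums is assumed to be the per-row sum of data.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((5, 3)) + 0.1
col_sums = data.sum(axis=1)  # assumption: one total per product (row)

# First part via broadcasting: builds a (P, P, N) intermediate array.
numer_broadcast = (data[np.newaxis, :] * data[:, np.newaxis]).sum(2)

# Same result via einsum, without the large intermediate.
numer_einsum = np.einsum('ik,jk->ij', data, data)

# Second part: pairwise maximum of the sums, shape (P, P).
denom = np.maximum.outer(col_sums, col_sums)

prox = numer_einsum / denom
print(np.allclose(numer_broadcast, numer_einsum))  # True
print(prox.shape)                                  # (5, 5)
```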
A toy-case for my problem:
I have a numpy array of size, say, 1000:
import numpy as np
a = np.arange(1000)
I also have a "projection array" p which is a mapping from a to another array b:
p = np.random.randint(0,1000,(1000,1000))
It is easy to get b from a using "fancy indexing":
b = a[p]
But b is not a view, as noted by several previous questions/answers and the numpy documentation.
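The copy-versus-view distinction is easy to confirm on a small array. This is a minimal sketch with made-up values:

```python
import numpy as np

a = np.arange(10)
p = np.array([3, 3, 7])

b = a[p]    # fancy indexing: a new array with its own data
s = a[2:5]  # basic slicing: a view onto a's data

a[3] = 100  # change the source array afterwards

print(b.tolist())  # [3, 3, 7]
print(s.tolist())  # [2, 100, 4]
print(np.shares_memory(a, b), np.shares_memory(a, s))  # False True
```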
Unfortunately, in my case only the values in a change over the course of a long simulation and using fancy indexing at each iteration to obtain b becomes very costly. I only read from b and do not modify it.
I understand it is not possible (yet) to solve this with fancy indexing.
I was wondering if anyone had a similar problem/bottleneck and came up with some other workaround?
What you're asking for isn't practical, and that's why the NumPy folks haven't implemented it. You could do it yourself with something like:
class FancyView(object):
    def __init__(self, array, index):
        self._array = array
        self._index = index.copy()
    def __array__(self):
        return self._array[self._index]
    def __getitem__(self, index):
        return self._array[self._index[index]]
b = FancyView(a, p)
But notice that the expensive a[p] operation will get called every time you use b as an array. There is no other practical way of making a 'view' of this kind. NumPy can get away with using views for basic slicing because it can manipulate the strides, but there is no way to do something like this using strides.
If you only need parts of b you might be able to get some time savings by indexing the fancy view instead of using it as an array.
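A small usage sketch of such a wrapper is shown below. It repeats the class so that it runs on its own; the optional dtype and copy parameters on __array__ are my addition, meant to match the call signature that newer NumPy versions use, and the arrays are made-up toy data.

```python
import numpy as np


class FancyView:
    def __init__(self, array, index):
        self._array = array          # kept by reference, so updates are seen
        self._index = index.copy()

    def __array__(self, dtype=None, copy=None):
        # Full materialisation: performs the expensive a[p] every time.
        result = self._array[self._index]
        return result if dtype is None else result.astype(dtype)

    def __getitem__(self, index):
        # Partial lookup: only indexes the requested part of the mapping.
        return self._array[self._index[index]]


a = np.arange(5)
p = np.array([[4, 0], [2, 2]])
b = FancyView(a, p)

a[4] = 40  # change the source after the "view" was created

print(np.asarray(b).tolist())  # [[40, 0], [2, 2]]
print(b[0].tolist())           # [40, 0]
```

Indexing b[0] touches only one row of p, which is where the time savings mentioned above would come from.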