The sparse matrix format (dok) assumes that values of keys not in the dictionary are equal to zero. Is there any way to make it use a default value other than zero?
Also, is there a way to calculate the log of a sparse matrix (akin to np.log for a regular NumPy array)?
That feature is not built in, but if you really need it, you should be able to write your own dok_matrix class or subclass SciPy's. The SciPy implementation is here. At the very least, the default value needs to be changed wherever dict.* calls are made, and other changes may be needed as well.
However, I'd try to reformulate the problem so that this is not needed. If, for instance, you are doing linear algebra, you can isolate the constant term and instead do:
from scipy.sparse.linalg import LinearOperator
A = whatever_dok_matrix_minus_constant_term
def my_matvec(x):
    return A*x + constant_term * x.sum()
op = LinearOperator(A.shape, matvec=my_matvec)
To most linear algebra routines (e.g. iterative solvers), you can pass in op instead of A.
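To make the idea above concrete, here is a minimal sketch. The default value c, the matrix size and the stored entries are made up for illustration; the only assumption is that the stored entries hold "true value minus c", so that an unstored entry stands for c.

```python
import numpy as np
from scipy.sparse import dok_matrix
from scipy.sparse.linalg import LinearOperator

n = 4
c = 0.5  # hypothetical default value for entries that are not stored

# Sparse part: stored entries hold (true value - c), so unstored entries mean "c".
A = dok_matrix((n, n), dtype=float)
A[0, 1] = 2.0 - c
A[2, 3] = -1.0 - c
A = A.tocsr()

def my_matvec(x):
    x = np.ravel(x)  # the operator may be handed a column vector of shape (n, 1)
    # (A + c * ones((n, n))) @ x equals A @ x plus c * sum(x) in every component
    return A @ x + c * x.sum()

op = LinearOperator(A.shape, matvec=my_matvec, dtype=float)

x = np.arange(1.0, n + 1)
result = op.matvec(x)
# Dense reference, only feasible for small n: the full matrix the operator represents
expected = (A.toarray() + c) @ x
```

The dense reference is there only to check the identity on a small case; for large matrices you would pass op to the solver and never form the dense matrix.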
As to the matrix logarithm: the logarithm of a sparse matrix (as in scipy.linalg.logm) is typically dense, so you should just convert the matrix to a dense one first and then compute the logarithm as usual. As far as I can see, using a sparse matrix would give no performance gain. If you only need the product of the logarithm and a vector, log(A) * v, some Krylov method might help, though.
If, on the other hand, you want to compute the logarithm elementwise, you can modify the .data attribute directly (available at least in COO, CSR, and CSC):
x = A.tocoo()
x.data = np.log(x.data)
A = x.todok()
This leaves the zero elements alone, but as above, this allows treating the constant part separately.
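A small self-contained version of the snippet above, with made-up entries, shows that only the stored values are touched:

```python
import numpy as np
from scipy.sparse import dok_matrix

# Hypothetical 3 x 3 matrix with three stored entries
A = dok_matrix((3, 3), dtype=float)
A[0, 0] = 1.0
A[1, 2] = np.e
A[2, 1] = 10.0

x = A.tocoo()
x.data = np.log(x.data)  # transforms the stored entries only
logA = x.todok()
```

Entries that were never stored still read back as zero afterwards, which is why a non-zero default has to be handled separately.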
I'm looking for a numpy function (or a function from any other package) that would efficiently evaluate $\prod_{i=1}^{N} f(x_i)$, with f being a vector-valued function of a vector-valued input x. The product is taken to be a simple component-wise multiplication.
The issue here is that both the length of each x vector and the total number of result vectors (f of x) to be multiplied (N) are very large, on the order of millions. Therefore, it is impossible to generate all the results at once (they wouldn't fit in memory) and then multiply them afterwards using np.multiply.reduce or the like.
A toy example of the type of code I would like to replace is:
import numpy as np
x = np.ones(1000000)
prod = f(x)
for i in range(2, 1000000):
    prod *= f(i * np.ones(1000000))
with f a vector-valued function with the dimension of its output equal to the dimension of its input.
To be sure: I'm not looking for equivalent code, but for a single, highly optimized function. Is there such a thing?
For those familiar with Wolfram Mathematica: It would be the equivalent to Product. In Mathematica, I would be able to simply write Product[f[i ConstantArray[1,1000000]],{i,1000000}].
Numpy ufuncs all have a reduce method. np.multiply is a ufunc. So it's a one-liner:
np.multiply.reduce(v)
Where v is the vector of values you compute in what is hopefully an equally efficient manner.
To compute the vector, just apply your function to the input:
v = f(x)
So with your example:
np.multiply.reduce(np.sin(x))
Alternative
A simpler way to phrase the same thing is np.prod:
np.prod(v)
You can also use the prod method directly on your vector:
v.prod()
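As a quick check of the three spellings above, here is a minimal sketch. The function f is a made-up stand-in, and the second half is my own assumption about how to handle the case from the question where the N result vectors cannot all be held in memory: keep one running vector and multiply into it in place.

```python
import numpy as np

def f(x):
    # Hypothetical stand-in for the question's vector-valued function
    return 1.0 + 1.0 / (x * x)

v = f(np.arange(1.0, 6.0))
# Three equivalent spellings of the product of all elements of v
p_reduce = np.multiply.reduce(v)
p_prod = np.prod(v)
p_method = v.prod()

# Running component-wise product over N vectors, holding one result at a time
n, N = 1000, 50
ones = np.ones(n)
running = f(ones)
for i in range(2, N + 1):
    np.multiply(running, f(i * ones), out=running)  # in place, no second accumulator

# Reference: stack every result and reduce along axis 0 (only feasible for small N)
stacked = np.stack([f(i * ones) for i in range(1, N + 1)])
reference = np.multiply.reduce(stacked, axis=0)
```

The loop is still a Python loop, but each iteration is a single vectorised multiply, so the per-iteration overhead is small compared with evaluating f on a million-element vector.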
I want to do constrained optimisation using a vector of constraints using the scipy.optimize library. In particular, I am supplying a vector of 3d coordinates r0 of N points -- hence a matrix of size N x 3 -- as input to the function. The coordinates are Cartesian, and I wish to freeze out all y dependence. So that means that I need the second column of my N x 3 matrix to be held to a constant, y0 say. How do I go about defining such a list of constraints?
To be concrete, let's consider the COBYLA algorithm (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_cobyla.html#scipy.optimize.fmin_cobyla). I tried the following construction:
cons = []
for i in range(xyz0.shape[0]):
    def f(x):
        return x[i,1]-xyz0cyl[i,1]
    cons.append(f)
fmin_cobyla(energy, xyz0, cons, rhoend=1e-7)
and got the error:
     41 for i in range(xyz0.shape[0]):
     42     def f(x):
---> 43         return x[i,1]-xyz0cyl[i,1]
     44     cons.append(f)
     45
IndexError: too many indices for array
What is going on?
Your approach is wrong in quite a number of ways.
First, the optimiser flattens its starting point, so your Nx3 array is flattened before it is passed to the constraint functions, leaving you with an array of only one dimension. Therefore you can't index it with a tuple unless you reshape the array back to the original Nx3 inside the constraint functions, which could add noticeable overhead for large N:
return x.reshape(-1, 3)[i,1] - xyz0cyl[i,1]
Secondly, closures in Python are late binding; all of the constraint functions will use the last value of i after the for loop has completed. You'll only find out later, after fixing the first bug, that the optimisation does not go as expected. See How do lexical closures work? to learn more.
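The late-binding problem is easy to reproduce without any optimiser. In this sketch the coordinates are made up, and the default-argument trick (i=i) is one common way to bind the loop variable at definition time; it is not the only fix.

```python
import numpy as np

xyz0 = np.arange(12.0).reshape(4, 3)  # hypothetical 4 x 3 starting coordinates

# Late binding: each closure looks up i when it is called, so all see the final i.
late = []
for i in range(xyz0.shape[0]):
    def f(x):
        return x.reshape(-1, 3)[i, 1] - xyz0[i, 1]
    late.append(f)

# Early binding: a default argument captures the value of i at definition time.
early = []
for i in range(xyz0.shape[0]):
    def g(x, i=i):
        return x.reshape(-1, 3)[i, 1] - xyz0[i, 1]
    early.append(g)

x = xyz0.ravel().copy()
x[1] += 5.0  # move the y coordinate of point 0 only

late_values = [c(x) for c in late]    # every entry checks the last point
early_values = [c(x) for c in early]  # entry k checks point k
```

Only the early-bound list notices that point 0 has moved; the late-bound constraints all test the last point and report no violation.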
A better approach is to hold the y coordinates (i.e. the second column, index 1) fixed inside your energy function, or simply to pass an Nx2 matrix to fmin_cobyla instead.
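One way to do that, sketched under assumptions: make_reduced_energy is a hypothetical helper, and the energy function here is a made-up placeholder. The wrapper takes a flat vector of x and z values, reinserts the frozen y column, and calls the original energy.

```python
import numpy as np

def make_reduced_energy(energy, y0):
    """Wrap energy(xyz) so that only x and z are free; y is pinned to y0."""
    y0 = np.asarray(y0, dtype=float)

    def reduced(xz_flat):
        xz = np.asarray(xz_flat).reshape(-1, 2)
        # Rebuild the full N x 3 array with the frozen y column in the middle
        xyz = np.column_stack([xz[:, 0], y0, xz[:, 1]])
        return energy(xyz)

    return reduced

def energy(xyz):
    # Hypothetical energy: sum of squared distances from the origin
    return float(np.sum(xyz ** 2))

xyz0 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
reduced_energy = make_reduced_energy(energy, xyz0[:, 1])
xz0 = xyz0[:, [0, 2]].ravel()  # starting point with the y column removed
value = reduced_energy(xz0)
```

You would then hand reduced_energy and xz0 to the optimiser, which removes the need for one equality constraint per point.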
As the question suggests, I would like to compute the gradient with respect to a matrix row. In code:
import numpy.random as rng
import theano.tensor as T
from theano import function
t_x = T.matrix('X')
t_w = T.matrix('W')
t_y = T.dot(t_x, t_w.T)
t_g = T.grad(t_y[0,0], t_x[0]) # my wish, but DisconnectedInputError
t_g = T.grad(t_y[0,0], t_x) # no problems, but a lot of unnecessary zeros
f = function([t_x, t_w], [t_y, t_g])
y,g = f(rng.randn(2,5), rng.randn(7,5))
As the comments indicate, the code works without any problems when I compute the gradient with respect to the entire matrix. In this case the gradient is correctly computed, but the problem is that the result has only non-zero entries in row 0 (because other rows of x obviously do not appear in the equations for the first row of y).
I have found this question, which suggests storing all rows of the matrix in separate variables and building graphs from these variables. In my setting, though, I have no idea how many rows there might be in X.
Would anybody have an idea how to get the gradient with respect to a single row of a matrix or how I could omit the extra zeros in the output? If anybody would have suggestions how an arbitrary amount of vectors can be stacked, that should work as well, I guess.
I realised that it is possible to get rid of the zeros when computing derivatives with respect to the entries in row i:
t_g = T.grad(t_y[i,0], t_x)[i]
and for computing the Jacobian, I found out that
t_g = T.jacobian(t_y[i], t_x)[:,i]
does the trick. However it seems to have a rather heavy impact on computation speed.
It would also be possible to approach this problem mathematically. For the product t_y = T.dot(t_x, t_w.T), the Jacobian of a row t_y[i] w.r.t. the matching row t_x[i] is simply the transpose of t_w.T, which is t_w itself (the transpose of the transpose is the original matrix). Thus, the computation would be as simple as
t_g = t_w
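This can be checked numerically without Theano. The sketch below uses plain NumPy with the same shapes as in the question (X is 2 x 5, W is 7 x 5) and compares a central finite-difference Jacobian of row i of the product against W; the random data and the step size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))
W = rng.standard_normal((7, 5))

def y_row(x_row):
    # One row of Y = X @ W.T, as a function of the matching row of X
    return x_row @ W.T

i, eps = 0, 1e-6
jac = np.empty((7, 5))
for k in range(5):
    step = np.zeros(5)
    step[k] = eps
    # Central difference with respect to the k-th entry of X[i]
    jac[:, k] = (y_row(X[i] + step) - y_row(X[i] - step)) / (2 * eps)
```

Because the map is linear, the finite-difference result matches W up to rounding error, for any row i.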
I am trying to hack tf.gradients in TensorFlow so that, for a tensor y of shape (M,N) and a tensor x of shape (Q,P), it gives a gradient tensor of shape (M,N,Q,P), as one would naturally expect.
As pointed out in multiple questions on this site*, what one gets is a tensor of shape (Q,P), which is the gradient of the sum of the elements of y. What I can't figure out from looking at the TensorFlow code is where that sum over the elements of y is made. Is it at the beginning or at the end? Could someone help me pinpoint the lines of code where that is done?
*
Tensorflow gradients: without automatic implicit sum
TensorFlow: Compute Hessian matrix (and higher order derivatives)
Unaggregated gradients / gradients per example in tensorflow
Separate gradients in tf.gradients
I've answered it here, but I'm guessing it's not very useful, because you can't use this knowledge to differentiate with respect to a non-scalar y. The scalar-y assumption is central to the design of the reverse AD algorithm, and there's no single place you can modify to support non-scalar ys. Since this confusion keeps coming up, let me go into a bit more detail as to why it's non-trivial:
First of all, how reverse AD works -- suppose we have a function f that's composition of component functions f_i. Each component function takes a vector of length n and produces a vector of length n.
Its derivative can be expressed as a sequence of matrix multiplications: writing $f = f_k \circ \dots \circ f_1$ and $J_i$ for the n x n Jacobian of $f_i$, the chain rule gives $Df = J_k J_{k-1} \cdots J_1$.
When differentiating, function composition becomes matrix multiplication of corresponding component function Jacobians.
Note that this involves matrix/matrix products, which prove too expensive for neural networks. For example, AlexNet contains 8k activations in its convnet->fc transition layer. Doing matrix multiplications where each matrix is 8k x 8k would take too long. The trick that makes it efficient is to assume that the last function in the chain produces a scalar. Then its Jacobian is a vector, and the whole thing can be rewritten in terms of vector-matrix multiplies instead of matrix-matrix multiplies.
This product can be computed efficiently by doing the multiplication left to right, so that everything you do is a vector-matrix multiply with an n x n matrix instead of an n x n matrix-matrix multiply.
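A small NumPy sketch of the two evaluation orders follows. The Jacobians are random placeholders rather than derivatives of real layers, and the sizes n and k are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 4
# Hypothetical Jacobians J_1..J_k of the component functions (each n x n),
# plus the length-n Jacobian of a final scalar-valued function.
jacobians = [rng.standard_normal((n, n)) for _ in range(k)]
last_row = rng.standard_normal(n)

# Matrix-matrix route: form J_k @ ... @ J_1 first, O(n^3) work per product.
full = np.eye(n)
for J in jacobians:  # J_1 is applied first, so it ends up rightmost
    full = J @ full
grad_via_matrices = last_row @ full

# Reverse-mode route: carry a length-n row vector left to right, O(n^2) per product.
v = last_row
for J in reversed(jacobians):
    v = v @ J
grad_via_vjp = v
```

Both routes give the same gradient; the second never holds more than one length-n vector besides the Jacobians themselves.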
You can make it even more efficient by never forming those nxn derivative matrices in the first place, and instead associating each component function with an op that does the vector x Jacobian matrix product implicitly. That's what TensorFlow's tf.RegisterGradient does. Here's an illustration of the "grad" associated with a component function.
Now, this is done for vector-valued functions; what if your functions are matrix-valued? This is a typical situation we deal with in neural networks. For example, in a layer that does a matrix multiply, the matrix that you multiply by is an unknown, and it is matrix-valued. In that case, the last derivative has rank 2, and the remaining derivatives have rank 3.
Now to apply the chain rule you'd have to deal with extra notation because now "x" in chain rule means matrix multiplication generalized to tensors of rank-3.
However, note that we never have to do the multiplication explicitly, since we are using a grad operator. So in practice, this operator takes values of rank 2 and produces values of rank 2.
So in all of this, there's an assumption that final target is scalar which allows fully connected layers to be differentiated by passing matrices around.
If you want to extend this to support a non-scalar target, you would need to modify the reverse AD algorithm to propagate more information. For example, for fully connected feed-forward nets you would propagate rank-3 tensors around instead of matrices.
With the jacobian function in Tensorflow 2, this is a straightforward task.
with tf.GradientTape() as tape1:
    with tf.GradientTape() as tape2:
        y = layer(x)
        loss = tf.reduce_mean(y ** 2)
    first_order_gradient = tape2.gradient(loss, layer.trainable_weights)
hessian = tape1.jacobian(first_order_gradient, layer.trainable_weights)
https://www.tensorflow.org/guide/advanced_autodiff#hessian
Is there a way to use numpy.linalg.det or numpy.linalg.inv on an nx3x3 array (a line in a multiband image), for example? Right now I am doing something like:
det = numpy.array([numpy.linalg.det(i) for i in X])
but surely there is a more efficient way. Of course, I could use map:
det = numpy.array(list(map(numpy.linalg.det, X)))
Any other more direct way?
I'm pretty sure there is no substantially more efficient way than what you have. You can save some memory by first creating an empty array for the results and writing all results directly to that array:
res = numpy.empty_like(X)
for i, A in enumerate(X):
    res[i] = numpy.linalg.inv(A)
This won't be any faster, though -- it will only use less memory.
A "normal" determinant is only defined for a matrix (dimension=2), so if that's what you want, I don't see another way.
If you really want to compute the determinant of a cube, then you could try to implement one of the ways described here:
http://en.wikipedia.org/wiki/Hyperdeterminant
Notice that it is not necessarily the same value as the one you're currently computing.
New answer to an old question: Since version 1.8.0, numpy supports evaluating a batch of 2D matrices. For a batch of MxM matrices, the input and output now look like:
linalg.det(a)
    Compute the determinant of an array.
    Parameters: a : (…, M, M) array_like
        Input array to compute determinants for.
    Returns: det : (…) array_like
        Determinant of a.
Note the ellipsis. There can be multiple "batch dimensions", so that, for example, you can evaluate determinants on a meshgrid.
https://numpy.org/doc/stable/reference/generated/numpy.linalg.det.html
https://numpy.org/doc/stable/reference/generated/numpy.linalg.inv.html
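For the nx3x3 case from the question, a minimal sketch with made-up data looks like this. The shift by 3 times the identity is only there to keep the random matrices comfortably invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical batch of six 3 x 3 matrices
X = rng.standard_normal((6, 3, 3)) + 3.0 * np.eye(3)

det_batch = np.linalg.det(X)  # shape (6,): one determinant per matrix
inv_batch = np.linalg.inv(X)  # shape (6, 3, 3): one inverse per matrix

# The list-comprehension version from the question, for comparison
det_loop = np.array([np.linalg.det(m) for m in X])
```

The batched calls return the same values as the loop, with the leading dimension treated as the batch axis.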