I recently hit some performance bottlenecks with symbolic matrix derivatives in Sympy (specifically, the single line of code evaluating symbolic matrices via substitution using lambdas was taking ~90% of the program's runtime), so I decided to give Theano a go.
The previous application was evaluating the partial derivatives of a Gaussian process with respect to its hyperparameters, where a (1, k) matrix of Sympy symbols (MatrixSymbol) worked nicely: I could iterate over that list and differentiate the matrix with respect to each item.
However, this approach doesn't carry over to Theano, and the documentation doesn't seem to detail how to do it. Indexing a symbolic vector in Theano returns a Subtensor, which is not a valid variable to take a gradient with respect to.
Below is a simple version of what I'm attempting to do (algorithmically it is meaningless; it is stripped down to just the functionality I'm trying to obtain).
EDIT: As suggested below, I have modified the code sample so that the data is passed into the function as a tensor. I also tried using a list of separate scalar tensors instead, since I cannot index the values of a symbolic Theano vector, but also to no avail.
import theano
import numpy as np
# Sample data
data = np.array(10*np.random.rand(5, 3), dtype='int64')
# Not including data as tensor, incorrect/invalid indexing of symbolic vector
l_scales_sym = theano.tensor.dvector('l_scales')
x = theano.tensor.dmatrix('x')
f = x/l_scales_sym
f_eval = theano.function([x, l_scales_sym], f)
# This is the failing step: l_scales_sym[0] is a Subtensor, not a variable the gradient can be taken with respect to
df_dl = theano.gradient.jacobian(f.flatten(), l_scales_sym[0])
df_dl_eval = theano.function([x, l_scales_sym], df_dl)
The second-to-last line of the code snippet is where I am trying to get the partial derivative with respect to one of the elements of the list of 'length scale' variables, but this sort of indexing does not work on symbolic vectors.
Any help would be greatly appreciated!
When using Theano, all variables should be defined as Theano tensors (or shared variables); otherwise, the variable does not become part of the computational graph. In f = data/l_scales_sym the variable data is a NumPy array. Try also defining it as a tensor; it should work.
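For what it's worth, a minimal sketch along those lines (note that taking the Jacobian with respect to the whole vector and then slicing out a column is my own workaround, not part of the question's code; column j of the result is the partial derivative with respect to l_scales[j]):
import theano
import theano.tensor as T
import numpy as np

data = np.array(10 * np.random.rand(5, 3), dtype='float64')  # float64 so it matches the dmatrix input

x = T.dmatrix('x')                   # the data enters the graph through this tensor
l_scales_sym = T.dvector('l_scales')
f = x / l_scales_sym

# Differentiate w.r.t. the whole vector instead of the Subtensor l_scales_sym[0]
df_dl = theano.gradient.jacobian(f.flatten(), l_scales_sym)
df_dl_eval = theano.function([x, l_scales_sym], df_dl)

# Column 0 corresponds to the partial derivative w.r.t. l_scales[0]
print(df_dl_eval(data, np.ones(3))[:, 0])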
Related
This may seem like a basic question, but I am unable to work it through.
In the forward pass of my neural network, I have an output tensor of shape 8x3x3, where 8 is my batch size. We can assume each 3x3 tensor to be a non-singular matrix. I need to find the inverse of these matrices.
The PyTorch inverse() function only works on square matrices. Since I now have 8x3x3, how do I apply this function to every matrix in the batch in a differentiable manner?
If I iterate through the samples and append the inverses to a Python list, which I then convert to a PyTorch tensor, would that be a problem during backprop? (I am asking because converting PyTorch tensors to NumPy to perform some operations and then back to a tensor won't compute gradients during backprop for those operations.)
I also get the following error when I try to do something like that.
a = torch.arange(0,8).view(-1,2,2)
b = [m.inverse() for m in a]
c = torch.FloatTensor(b)
TypeError: 'torch.FloatTensor' object does not support indexing
EDIT:
As of PyTorch version 1.0, torch.inverse now supports batches of tensors. See here. So you can simply use the built-in function torch.inverse.
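For example, a minimal sketch (the shapes mirror the question; the random matrices are assumed to be non-singular, which is almost surely the case for torch.randn):
import torch

a = torch.randn(8, 3, 3)   # batch of 8 square 3x3 matrices
a_inv = torch.inverse(a)   # batched inverse, shape (8, 3, 3), differentiable
print(a_inv.shape)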
OLD ANSWER
There are plans to implement batched inverse soon. For discussion, see for example issue 7500 or issue 9102. However, as of the time of writing, no batch inverse operation is available in the current stable version (0.4.1).
Having said that, recently batch support for torch.gesv was added. This can be (ab)used to define your own batched inverse operation along the following lines:
def b_inv(b_mat):
    # batched identity matrix with the same shape as b_mat
    eye = b_mat.new_ones(b_mat.size(-1)).diag().expand_as(b_mat)
    # solve b_mat @ X = I for every matrix in the batch, i.e. X = b_mat^-1
    b_inv, _ = torch.gesv(eye, b_mat)
    return b_inv
I found that this gives good speed-ups over a for loop when running on GPU.
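For example, a quick sanity check of the helper above (my own usage sketch, in the same PyTorch version the answer targets, with a batch of random and therefore almost surely invertible matrices):
mats = torch.randn(8, 3, 3)
inv = b_inv(mats)
# each product should be close to the 3x3 identity (prints True for well-conditioned batches)
eye = torch.eye(3).expand_as(mats)
print(torch.allclose(torch.bmm(mats, inv), eye, atol=1e-3))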
You could split the tensor using torch.unbind(), apply inverse() to every element of the result, and then stack back with torch.stack():
a = torch.arange(0., 8.).view(-1, 2, 2)  # float values, so that inverse() is defined
b = [t.inverse() for t in torch.unbind(a)]
c = torch.stack(b)
I have this Python code snippet that I am trying to understand. I don't understand how scalars operate on arrays in all cases; in most code I read, it makes sense that operations apply to each value of an array.
sig_sq_samples = beta*invgamma.rvs(alpha,size=n_samples)
var_norm = sqrt(sig_sq_samples/kN)
mu_samples = norm.rvs(mean_norm,scale=var_norm,size=n_samples)
I want to know how each line is functioning. The reason is that I don't have a Linux machine set up with the library, and I cannot set up the environment in a reasonable amount of time, so I'm hoping someone can help me understand this Python code I found in an article.
invgamma.rvs() - returns an array of numeric values
beta - is a scalar value
sig_sq_samples - (I'm assuming) an array of beta times each value of the array that invgamma.rvs() returns.
var_norm - I have no idea what this value is supposed to be, because the norm.rvs function underneath appears to take a scalar (scale=var_norm).
In short, how is sqrt(sig_sq_samples/kN), with kN also a scalar, returning back a scalar? What is happening here? This one line is what is getting me. As I said earlier, sig_sq_samples is an array (I hope I'm not wrong about the line producing it), yet at some point the values being worked on must be scalars. I come from C#, where types are strict, and I have worked with scripting languages such as Perl, so I have experience with what "shortcut" operations do; C#, for example, does not allow you to multiply a scalar by an array. I tried to look up how scalars work with arrays, but it didn't clarify this code for me. Anyone answering is more than welcome to look up the functions above in case I am wrong about anything. I have put in a lot of effort and have many years of development experience; either this code snippet is wrong or I'm just not seeing something obvious.
In the line
mu_samples = norm.rvs(mean_norm,scale=var_norm,size=n_samples)
var_norm has length n_samples, so what is happening is that the ith of the n_samples draws is generated using the ith scale parameter, var_norm[i].
Internally, the code does
vals = vals * scale + loc
When scale is an array, this uses broadcasting, which is a common feature of numpy. norm.rvs has already generated an array of n_samples random values; when that array is multiplied by scale, an element-wise multiplication between the two arrays takes place, so the result on the left-hand side is also an array. For more information see here.
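A small, made-up example of that broadcasting behaviour (the numbers are arbitrary and mine, not from the question):
import numpy as np
from scipy.stats import norm

n_samples = 5
scale = np.array([0.1, 1.0, 10.0, 100.0, 1000.0])  # one scale per draw
draws = norm.rvs(0.0, scale=scale, size=n_samples)  # draw i uses scale[i]
print(draws.shape)  # (5,)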
sig_sq_samples = beta*invgamma.rvs(alpha,size=n_samples)
if
invgamma.rvs() - returns an array of numeric values
beta - is a scalar value
then
sig_sq_samples = beta*invgamma.rvs(alpha,size=n_samples)
produces another array of the same size. Scalar beta just multiplies each element.
In
var_norm = sqrt(sig_sq_samples/kN)
kN is a scalar doing the same thing - dividing each element. I assume sqrt is numpy.sqrt, which takes the square root of each element. So var_norm is again an array of the original size (that of invgamma.rvs()).
mu_samples = norm.rvs(mean_norm,scale=var_norm,size=n_samples)
I don't know what norm.rvs does, or where it is from. It's not numpy, but it could be from a package in scipy; I'd have to google it. It takes one positional argument, here mean_norm, and (at least) two keyword arguments. n_samples is probably a number, e.g. 100. But scale could certainly take an array, such as var_norm.
======================
http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.rv_continuous.rvs.html#scipy.stats.rv_continuous.rvs
appears to be the documentation for the rvs method (norm is a subclass of rv_continuous).
Arguments are:
arg1, arg2, arg3,... : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
scale : array_like, optional
Scale parameter (default=1).
size : int or tuple of ints, optional
Defining number of random variates (default is 1).
and the result is
rvs : ndarray or scalar
Random variates of given size.
I'm guessing invgamma.rvs is the similar method for a different subclass. alpha must be the shape argument for the first, and mean_norm the corresponding argument for the second.
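To make the shapes concrete, here is the whole snippet run end to end with made-up scalar values for alpha, beta, kN and mean_norm (my own example, not from the article):
import numpy as np
from scipy.stats import invgamma, norm

alpha, beta, kN, mean_norm, n_samples = 3.0, 2.0, 10.0, 0.0, 5

sig_sq_samples = beta * invgamma.rvs(alpha, size=n_samples)       # array of 5 values
var_norm = np.sqrt(sig_sq_samples / kN)                           # still an array of 5
mu_samples = norm.rvs(mean_norm, scale=var_norm, size=n_samples)  # one draw per scale
print(sig_sq_samples.shape, var_norm.shape, mu_samples.shape)     # (5,) (5,) (5,)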
I am trying to implement the k-means clustering algorithm for a small project. I came upon this article, which suggests that
K-Means is much faster if you write the update functions using operations on numpy arrays, instead of manually looping over the arrays and updating the values yourself.
I am doing exactly that: iterating over each element of the array to update it. For each of the z elements in the dataset, I assign the nearest centroid to the cluster array by iterating over the centroids:
for i in range(z):
    clstr[i] = closest_center(data[i], cen)
and my update function is
def closest_center(x, clist):
    dlist = [fabs(x - i) for i in clist]
    return clist[dlist.index(min(dlist))]
Since I am using a grayscale image, I use the absolute value to calculate the Euclidean distance.
I noticed that opencv has this algorithm too. It takes less than 2s to execute the algorithm while mine takes more than 70s. May I know what the article is suggesting?
My images are imported as grayscale and represented as 2D numpy arrays. I further convert them into 1D arrays because they are easier to process.
The list comprehension is likely to slow down execution. I would suggest vectorizing the function closest_center. This is straightforward for 1-dimensional arrays:
import numpy as np
def closest_center(x, clist):
    return clist[np.argmin(np.abs(x - clist))]
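Going one step further (this is my own sketch of what the article most likely means, assuming data is the flattened grayscale image and cen is the 1-D array of centroids), the per-pixel Python loop can also be replaced by a single broadcasted operation:
import numpy as np

def assign_clusters(data, cen):
    # pairwise absolute distances between every pixel and every centroid, shape (len(data), len(cen))
    dists = np.abs(data[:, None] - cen[None, :])
    # for each pixel, pick the value of the nearest centroid
    return cen[np.argmin(dists, axis=1)]

# replaces: for i in range(z): clstr[i] = closest_center(data[i], cen)
# clstr = assign_clusters(data, cen)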
The numpy.linalg.svd function gives the full SVD of the input matrix. However, I want only the first singular vectors.
I was wondering if there is any function for that in numpy, or in any other Python library?
One possibility is sklearn.utils.extmath.randomized_svd
from sklearn.utils.extmath import randomized_svd
U, S, Vt = randomized_svd(X, n_components=1)
In addition to randomized svd, you can run ARPACK on the squared problem, via scipy.sparse.linalg.svds.
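A sketch of the latter, using a random placeholder matrix in place of your data (svds requires k to be strictly smaller than min(X.shape), and by default it returns the largest singular values):
import numpy as np
from scipy.sparse.linalg import svds

X = np.random.rand(100, 50)        # placeholder matrix standing in for your data
U, S, Vt = svds(X, k=1)            # keep only the leading singular triplet
print(U.shape, S.shape, Vt.shape)  # (100, 1) (1,) (1, 50)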
I'm new to theano and I'm trying to adapt the autoencoder script here to work on text data. This code uses the MNIST dataset as training data. This data is in the form of a numpy 2d array.
My data is a csr sparse matrix of about 100,000 instances with about 50,000 features. The matrix is the result of using sklearn's tfidfvectorizer to fit and transform the text data. As I'm using sparse matrices I modify the code to use the theano.sparse package to represent my input.
My training set is the symbolic variable:
train_set_x = theano.sparse.shared(train_set)
However, theano.sparse matrices cannot perform all of the operations used in the original script (there is a list of supported sparse operations here). The code uses dot and sum from the tensor methods on the input. I have changed the dot to sparse.dot, but I can't find what to replace the sum with, so I am converting the training batches to dense matrices and using the original tensor methods, as shown in this cost function:
def get_cost(self):
    tilde_x = self.get_corrupted_input(self.x, self.corruption)
    y = self.get_hidden_values(tilde_x)
    z = self.get_reconstructed_input(y)
    # make dense, must be a better way to do this
    L = - T.sum(SP.dense_from_sparse(self.x) * T.log(z) + (1 - SP.dense_from_sparse(self.x)) * T.log(1 - z), axis=1)
    cost = T.mean(L)
    return cost

def get_hidden_values(self, input):
    # use theano.sparse.dot instead of T.dot
    return T.nnet.sigmoid(theano.sparse.dot(input, self.W) + self.b)
The get_corrupted_input and get_reconstructed_input methods remain as they are in the link above. My question is: is there a faster way to do this?
Converting the matrices to dense is making running the training very slow. Currently it takes 20.67m to do one training epoch with a batch size of 20 training instances.
Any help or tips you could give would be greatly appreciated!
In the most recent master branch of theano.sparse there is an sp_sum method listed (see here).
If you're not using the bleeding-edge version, I'd install that and see whether calling it works and whether doing so speeds things up:
pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git
(And if it does, noting it here would be nice; it's not always clear that the sparse functionality is much faster than using dense calculations all the way through, especially on the GPU.)
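For reference, a minimal sketch of what calling it could look like (this assumes the bleeding-edge build exposes theano.sparse.sp_sum with the documented signature; I haven't timed it against the dense version):
import theano
import theano.sparse as SP

x_sp = SP.csr_matrix('x_sp')        # symbolic sparse input
row_sums = SP.sp_sum(x_sp, axis=1)  # per-row sums, returned as a dense vector
f = theano.function([x_sp], row_sums)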