I am trying to derive the conditional distribution of the visible variables, $p(\mathbf{V} \mid \mathbf{h})$, for the Replicated Softmax Model (RSM) or, equivalently, the Restricted Boltzmann Machine (RBM) for word counts, according to the paper "Replicated Softmax: an Undirected Topic Model" by Salakhutdinov and Hinton.
Paper can be found at: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=B04C8D67D381B8106FF6FA4203A86264?doi=10.1.1.164.71&rep=rep1&type=pdf
However, despite all efforts, I've been unable to see how the conditional can turn out to be a softmax distribution:

$$p(v_k^i = 1 \mid \mathbf{h}) = \frac{\exp\big(b_k + \sum_{j=1}^F h_j W_{jk}\big)}{\sum_{q=1}^K \exp\big(b_q + \sum_{j=1}^F h_j W_{jq}\big)}$$

Also, I'm confused about whether the weights are a 3D tensor ($W_{ijk}$, with $i$ indexing word position) and the visible biases a 2D matrix ($b_k^i$), or instead a 2D matrix and a vector, respectively. I believe it is the latter. Hoping someone can demonstrate the derivation.
I am looking to implement the RSM to do topic modelling in Python's Theano. I am aware that there is code out there, but I prefer to understand the derivation myself so that I can extend or optimize the code without the risk of breaking the model.
P.S. Apologies, this is a repost of https://math.stackexchange.com/questions/2085616/rbm-deriving-the-replicated-softmax-model-rsm, but I did so as there aren't as many users on math.stackexchange.
After some time I found out where I misunderstood things and managed to derive the equations. Please refer to math.stackexchange:
https://math.stackexchange.com/questions/2085616/rbm-deriving-the-replicated-softmax-model-rsm/2087272#2087272
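In brief, a compressed sketch of the key step, using the paper's notation (the full derivation is in the linked answer): with tied weights $W_{ijk} = W_{jk}$ and biases $b_k^i = b_k$, the energy decomposes into a sum of independent per-position terms, so the conditional over the visibles factorizes into $D$ identical softmaxes.

    % Energy with tied weights, written as a sum over the D word positions
    E(\mathbf{V}, \mathbf{h})
      = -\sum_{i=1}^{D} \Big( \sum_{j=1}^{F} \sum_{k=1}^{K} W_{jk} h_j v_k^i
          + \sum_{k=1}^{K} b_k v_k^i + \sum_{j=1}^{F} a_j h_j \Big)
    % Since p(V | h) \propto exp(-E(V, h)) and the energy is additive over i,
    % the conditional factorizes over positions; normalizing each factor over
    % the K values position i can take gives the softmax
    p(v_k^i = 1 \mid \mathbf{h})
      = \frac{\exp\big(b_k + \sum_{j=1}^{F} h_j W_{jk}\big)}
             {\sum_{q=1}^{K} \exp\big(b_q + \sum_{j=1}^{F} h_j W_{jq}\big)}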
I already asked this question here: Can Convolutional Neural Networks (CNN) be represented by a Mathematical formula? but I feel that I was not clear enough and also the proposed idea did not work for me.
Let's say that using my computer, I train a certain machine learning algorithm (e.g. naive Bayes, decision tree, linear regression, and others). So I already have a trained model to which I can give an input value and it returns the result of the prediction (e.g. 1 or 0).
Now, let's say that I still want to give an input and get a predicted output. However, this time I would like my input value to be, for example, multiplied by some sort of mathematical formula, weights, or matrix that represents my "trained model".

In other words, I would like my trained model to be "transformed" into some sort of formula to which I can give an input and get the predicted number.
The reason why I want to do this is that I want to train on a big dataset with a complex prediction model, and then use this trained prediction model on simpler hardware such as a PIC32 microcontroller. The PIC32 microcontroller would not train the machine learning model or store all inputs. Instead, the microcontroller would simply read certain numbers from the system, apply a math formula or some sort of matrix multiplication, and give me the predicted output. With that, I can use "fancy" neural networks on much simpler devices that can easily evaluate math formulas.
If I read this properly, you want a generally continuous function in many variables to replace a CNN. The central reason a CNN exists in a world with ANNs ("normal" neural networks) is that it includes irruptive transformations: non-linearities, discontinuities, etc. that enable the CNN to develop recognitions and relationships that simple linear combinations -- such as matrix multiplication -- cannot handle.

If you want to understand this better, I recommend that you choose an introduction to Deep Learning and CNNs in whatever presentation mode fits your learning style.
Essentially, every machine learning algorithm is a parameterized formula, with your trained model being the learned parameters that are applied to the input.
So what you're actually asking is to simplify arbitrary computations to, more or less, a matrix multiplication. I'm afraid that's mathematically impossible. If you ever do come up with a solution to this, make sure to share it - you'd instantly become famous, most likely rich, and put a hell of a lot of researchers out of business. If you can't train a matrix multiplication to get the accuracy you want from the start, what makes you think you can boil down arbitrary "complex prediction models" to such simple computations?
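That said, for simple models the learned parameters genuinely can be exported and applied with a handful of multiply-adds. A minimal sketch, assuming a scikit-learn logistic regression (coef_ and intercept_ are scikit-learn's attributes; porting the final arithmetic to a PIC32 is then just a dot-product loop in C):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Train on the big machine (synthetic data for the sketch)
    np.random.seed(0)
    X = np.random.rand(100, 3)
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
    clf = LogisticRegression().fit(X, y)

    # Export the learned parameters: these few floats ARE the model
    w = clf.coef_[0]         # weight vector, shape (n_features,)
    b = clf.intercept_[0]    # bias term

    # On the microcontroller, prediction is a dot product plus a threshold
    def predict(x):
        return 1 if np.dot(w, x) + b > 0 else 0

    x_new = np.array([0.9, 0.8, 0.1])
    print(predict(x_new), clf.predict([x_new])[0])  # the two should agree

This works because logistic regression's decision function is exactly a linear combination; for deep networks the exported "formula" is instead the full chain of matrix multiplies and non-linearities, which is larger but still just stored numbers plus arithmetic.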
For a project I am working on, I need to find a model for the data graphed below that includes a sine or cosine component (hard to tell from the image but the data does follow a trig-like function for each period, although the amplitude/max/mins are changing).
(figure: plot of the data)
I originally planned on finding a simple regression model for my data using Desmos before I saw how complex the data was, but alas, I do not think I am capable of determining what equation to use without the help of Python. I don't have much experience with regression in Python; I've only done basic linear modeling where I knew the type of equation and was just determining the coefficients/constants. Could anyone offer a guiding example, git code, or resources that would be useful for this?
Your question is pretty generic, and looking at the graph we cannot tell much about the data to give you a more detailed answer, but I'd say have a look at OLS in statsmodels:
https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html
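For example, a minimal OLS fit in statsmodels (the arrays here are synthetic stand-ins for your data; sm.add_constant just appends the intercept column):

    import numpy as np
    import statsmodels.api as sm

    # Synthetic stand-in for your x, y data
    x = np.linspace(0.0, 10.0, 50)
    y = 2.0 * x + 1.0 + np.random.normal(0.0, 0.5, x.size)

    X = sm.add_constant(x)      # adds the intercept column
    results = sm.OLS(y, X).fit()
    print(results.params)       # [intercept, slope]
    print(results.summary())    # full regression report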
You could also look at scikit learn for the various regression models it provides.
http://scikit-learn.org/stable/modules/linear_model.html
Essentially, these packages will help you figure out the equation you are looking for to describe your data.

Also, it looks like your graph has an outlier. Please note that regression is very sensitive to outliers, so you may want to handle those data points before fitting the model.
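Given the trig-like shape with changing amplitude you describe, a sketch with scipy.optimize.curve_fit may be closer to what you need. The model below (a linearly varying amplitude times a sine) is purely an assumed form; adjust it to your data:

    import numpy as np
    from scipy.optimize import curve_fit

    # Assumed model: a sine whose amplitude varies linearly with x.
    # The functional form is a guess; change it to match your data.
    def model(x, a0, a1, freq, phase, offset):
        return (a0 + a1 * x) * np.sin(freq * x + phase) + offset

    # Replace these synthetic arrays with your actual x, y data
    x = np.linspace(0.0, 10.0, 200)
    y = (2.0 - 0.1 * x) * np.sin(3.0 * x + 0.5) + 1.0 \
        + np.random.normal(0.0, 0.1, x.size)

    # Rough initial guesses (p0) matter a lot for oscillatory fits
    params, cov = curve_fit(model, x, y, p0=[2.0, 0.0, 3.0, 0.0, 1.0])
    print(params)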
I would like to get the gradient of tf.cholesky with respect to its input. As of the moment, the tf.cholesky does not have a registered gradient:
LookupError: No gradient defined for operation 'Cholesky' (op type: Cholesky)
The code used to generate this error is:
import tensorflow as tf
A = tf.diag(tf.ones([3]))
chol = tf.cholesky(A)
cholgrad = tf.gradients(chol, A)
While it is possible for me to compute the gradient myself and register it, the only existing means by which I've seen the Cholesky gradient computed involve the use of for loops and need the shape of the input matrix. However, to the best of my knowledge, symbolic loops aren't currently available in TensorFlow.
One possible workaround to getting the shape of the input matrix A would probably be to use:
[int(elem) for elem in list(A.get_shape())]
But this approach doesn't work if the dimensions of A are dependent on a TensorFlow placeholder object with shape TensorShape([Dimension(None)]).
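One thing I have considered (assuming graph-mode semantics) is tf.shape, which returns the shape as a tensor evaluated at run time rather than at graph-construction time:

    import tensorflow as tf

    # A placeholder whose static shape is unknown (Dimension(None))
    A = tf.placeholder(tf.float32, shape=[None, None])

    # tf.shape(A) yields the actual dimensions at execution time,
    # even when A.get_shape() only reports None
    n_rows = tf.shape(A)[0]

    with tf.Session() as sess:
        print(sess.run(n_rows, feed_dict={A: [[1.0, 2.0], [3.0, 4.0]]}))  # 2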
If anyone has any idea for how to compute and register a gradient of tf.cholesky, I would very much appreciate knowing about it.
We discussed this a bit in the answers and comments to this question: TensorFlow cholesky decomposition.
It might (?) be possible to port the Theano implementation of CholeskyGrad, provided its semantics are actually what you want. Theano's is based upon Smith's "Differentiation of the Cholesky Algorithm".
If you implement it as a C++ operation that the Python just calls into, you have unrestricted access to all the looping constructs you could desire, and anything Eigen provides. If you wanted to do it in pure tensorflow, you could use the control flow ops, such as tf.control_flow_ops.While to loop.
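For instance, here is a minimal sketch of a symbolic loop using the tf.while_loop wrapper (the exact entry point for the control flow ops depends on your TensorFlow version; in early versions it lived under tf.control_flow_ops):

    import tensorflow as tf

    # Sum the integers 0..n-1 with a symbolic loop; i and acc are loop vars
    n = tf.constant(10)
    cond = lambda i, acc: i < n
    body = lambda i, acc: (i + 1, acc + i)
    i_final, total = tf.while_loop(cond, body, [tf.constant(0), tf.constant(0)])

    with tf.Session() as sess:
        print(sess.run(total))  # 45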
Once you know the actual formula you want to apply, the answer here: matrix determinant differentiation in tensorflow
shows how to implement and register a gradient for an op in tensorflow.
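For illustration, a hedged sketch of what such a registration might look like, using the closed-form derivative from Smith's paper (the tf.matrix_* calls assume a TF 1.x-style API, the sketch assumes a single square matrix rather than a batch, and note that once TensorFlow ships its own Cholesky gradient this registration will raise an error):

    import tensorflow as tf

    @tf.RegisterGradient("Cholesky")
    def _cholesky_grad(op, grad):
        # L = op.outputs[0] is the Cholesky factor, with A = L L^T;
        # 'grad' is dLoss/dL flowing in from downstream ops.
        L = op.outputs[0]
        n = tf.shape(L)[0]
        L_inv = tf.matrix_triangular_solve(L, tf.eye(n, dtype=L.dtype))
        # Phi(L^T grad): lower triangle with the diagonal halved
        middle = tf.matmul(L, grad, transpose_a=True)
        middle = tf.matrix_band_part(middle, -1, 0)
        middle = tf.matrix_set_diag(middle, 0.5 * tf.matrix_diag_part(middle))
        # dLoss/dA = L^{-T} Phi(L^T dL) L^{-1}, symmetrized since A is symmetric
        grad_a = tf.matmul(tf.matmul(L_inv, middle, transpose_a=True), L_inv)
        return 0.5 * (grad_a + tf.transpose(grad_a))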
You could also create an issue on github to request this feature, though, of course, you'll probably get it faster if you implement it yourself and then send in a pull request. :)
Is there a way to give an x, y pair dataset to a function that will return a list of curve-fit models and their coefficients? The program DataFit does this with about 200 different models, but we are looking for a Pythonic way that covers everything from exponential to inverse polynomial models, etc.

I have seen many posts about manually typing each model into scipy, but this is not feasible for the number of models we want to test.
The closest I found was pyeq2, but this is not returning the list of functions, and seems to be a rabbit hole to code for.
If R has this available we could use that, but Python is really the goal.

Below is an example of the data; we want to find the best way to describe this curve.
You can try the splines library in R. I have used it for higher-order curve fitting of univariate data. You can vary the fit and compare the alternatives using the corresponding R^2 errors.
You can decide to do one of the following:

1. Choose a model and fit its parameters. The model should be based on a single independent variable, and the fit can be done with Python's scipy.optimize curve_fit function. You could choose something like a hyperbola (see the sketch below).

2. Choose a model that is complex and likely represents an underlying mechanism of something at work, like the system of ODEs from an SIR disease model. Fitting the parameters will be no easy task; it is done with Markov Chain Monte Carlo (MCMC) methods and is VERY difficult.

3. Realise that you have data and can use machine learning via scikit-learn to predict from your data. This method doesn't require an explicit parametric model.

Machine learning and neural networks don't fit an explicit formula and can't really tell you about the underlying mechanism, but they can make predictions just as a best-fit model would... dare I say even better.
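For option 1, a minimal sketch of looping curve_fit over a dictionary of candidate model functions and ranking them by residual sum of squares; the handful of models listed here is illustrative, nothing like DataFit's 200 forms:

    import numpy as np
    from scipy.optimize import curve_fit

    # Candidate models to try; extend this dict with as many forms as needed
    models = {
        "exponential": lambda x, a, b: a * np.exp(b * x),
        "power":       lambda x, a, b: a * np.power(x, b),
        "hyperbola":   lambda x, a, b: a / x + b,
        "linear":      lambda x, a, b: a * x + b,
    }

    def rank_models(x, y):
        results = []
        for name, f in models.items():
            try:
                params, _ = curve_fit(f, x, y, maxfev=10000)
            except RuntimeError:
                continue  # this model failed to converge; skip it
            rss = float(np.sum((y - f(x, *params)) ** 2))
            results.append((rss, name, params))
        return sorted(results, key=lambda r: r[0])  # best fit first

    # Demo on synthetic exponential data
    x = np.linspace(1.0, 10.0, 50)
    y = 3.0 * np.exp(0.4 * x) + np.random.normal(0.0, 1.0, x.size)
    for rss, name, params in rank_models(x, y):
        print(name, rss, params)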
In the end, we found that Eureqa software was able to achieve this. https://www.nutonian.com/products/eureqa/
I had some trouble finding a good transductive SVM (semi-supervised support vector machine, or S3VM) implementation for Python. Finally I found the implementation by Fabian Gieseke of Oldenburg University, Germany (code is here: https://www.ci.uni-oldenburg.de/60506.html, paper title: Fast and Simple Gradient-Based Optimization for Semi-Supervised Support Vector Machines).
I now try to integrate the learned model into my scikit-learn code.
1) This works already:
I've got a binary classification problem. I defined a new method inside the S3VM code returning the self.__c coefficients (these are needed for the decision function of the classifier).

I then assign these (in my own scikit-learn code, where clf stands for an svm.SVC classifier) to clf.dual_coef_ and properly change clf.support_ too (which holds the indices of the support vectors). It takes a while because sometimes you need NumPy arrays and sometimes lists, etc., but it works quite well.
2) This doesn't work at all:
I want to adapt this now to a multi-class-classification problem (again with an svm.SVC-classifier).
I see the scheme for multi-class dual_coef_ in the docs at
http://scikit-learn.org/stable/modules/svm.html
I tried some things already but seem to mess it up all the time. My strategy is as follows:
for all pairs of classes:
    calculate the coefficients with qns3vm for the properly binarized labeled training set (filling 0s into the coefficient vector at the positions of labeled training instances that are not in the current class pair) --> get a 1 x (l+u) np.array of coefficients
stack these rows to get a (n_class*(n_class-1)/2) x (l+u) matrix (I do not have a clue why the docs say this should be of shape [n_class-1, n_SV(=l+u)]?)
replace clf.dual_coef_ with this matrix
Does anybody know the right way to replace dual_coef_ in the multi-class-setting? Or is there a neat piece of example code someone can recommend? Or at least a better explanation for the shape of dual_coef_ in the one-vs-one-multiclass-setting?
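For reference, here is a small script (plain scikit-learn, nothing from the S3VM code) that inspects the shapes a natively trained one-vs-one SVC produces, which is what any replacement has to match:

    from sklearn import svm, datasets

    # Fit a standard one-vs-one SVC on a 3-class problem
    iris = datasets.load_iris()
    clf = svm.SVC(kernel='linear').fit(iris.data, iris.target)

    print(clf.dual_coef_.shape)  # (n_class - 1, n_SV): here (2, n_SV)
    print(clf.support_.shape)    # (n_SV,): indices of the support vectors
    print(clf.n_support_)        # number of support vectors per class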
Thanks!
Damian