Implementing a CRF-CNN model in Python

I am trying to implement a research paper that uses a CNN and a CRF for page object detection. According to the paper, we have to build two neural networks (named Unary-Net and Pairwise-Net). The training data (a set of images) is then passed through both CNNs to train them. After that we are supposed to apply the CRF.
The CRF is built from U and V, the unary and pairwise potentials obtained from the two CNNs (the exact equations are given in the paper).
A maximum a posteriori (MAP) strategy is used to predict the labels of line regions in a new document, and MAP inference for the CRF is formulated as an optimization problem over the combined potentials.
The parameters of the CRF are the Unary-Net weights, the Pairwise-Net weights, and a combination coefficient vector λ that mixes U and V. The weights w of U and V are learned with SGD; they are then fixed and λ is learned using the pseudo-likelihood method.
I have created the neural networks but I am not able to implement the CRF part. Can someone help me implement this, or suggest a Python library that makes it easier? (I tried the pystruct library but could not install it.)
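What I imagine the MAP step could look like once U, V, and λ are available is something like the rough sketch below: greedy MAP inference (iterated conditional modes) over the combined unary and pairwise potentials. The array shapes, the higher-score-is-better convention, and the ICM-style greedy update are my own guesses, not the paper's algorithm; libraries such as pgmpy or PyMaxflow may also be worth a look if pystruct won't install.

import numpy as np

def map_inference_icm(U, V, edges, lam=(1.0, 1.0), n_iters=10):
    """Greedy MAP inference (iterated conditional modes) over line regions.

    U:     (n_nodes, n_labels) array of unary potentials from Unary-Net
    V:     dict {(i, j): (n_labels, n_labels) array} of pairwise potentials from Pairwise-Net
    edges: list of (i, j) node pairs connected in the CRF graph
    lam:   combination coefficients (lambda_u, lambda_v)
    """
    lam_u, lam_v = lam
    y = U.argmax(axis=1)                       # start from unary-only labels
    for _ in range(n_iters):
        for i in range(U.shape[0]):
            score = lam_u * U[i].copy()        # score of each candidate label for node i
            for (a, b) in edges:
                if a == i:
                    score += lam_v * V[(a, b)][:, y[b]]
                elif b == i:
                    score += lam_v * V[(a, b)][y[a], :]
            y[i] = score.argmax()              # greedily pick the best label for node i
    return y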

Related

Adding a "closeness" estimate to the output of a neural network model

I have a neural network that maps a set of 4 floating-point input parameters to a set of 10 floating-point outputs, trained on a dataset of ~300 points. The points themselves are intrinsically multi-modal, and there are some sparse areas in the training set that I don't currently have any good way to gather data for (although in real-world deployment they will eventually be encountered).
The model trained as expected (the test-split loss decreased steadily during training, and the errors are all within acceptable levels), so I believe the model maps the variables against each other well. However, I'm concerned about how well the model generalizes in areas where it has no training data.
So I'm looking to add an additional output to the model that provides a "closeness" estimate to the training points. My current implementation uses scipy to compute a Gaussian KDE over the 4 input parameters and then checks a point's closeness to the training space with it. In deployment, I return a warning/error if the inputs are too far from the space the model was trained on. This works okay, but I have to pass the entire "X" set around with the model, which is inconvenient and kludgy.
Is there a way to embed this closeness estimate in the model itself? Or is there any more formalized way to handle this (e.g., a "confidence" estimate in the model output)?
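Roughly, my current check looks like the following simplified sketch (the array shapes and the threshold value here are placeholders):

import numpy as np
from scipy.stats import gaussian_kde

X_train = np.random.rand(300, 4)       # stand-in for the ~300 training points (4 inputs each)
kde = gaussian_kde(X_train.T)          # gaussian_kde expects shape (n_dims, n_points)

def far_from_training(x, threshold=1e-3):
    """Return True if the query point lies in a low-density region of the training inputs."""
    density = kde(np.asarray(x, dtype=float).reshape(4, 1))[0]
    return density < threshold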
I think you want to take another look at the loss value of your model. At its core, a loss function is a measure of how well your prediction model predicts the expected outcome (or value). We convert the learning problem into an optimization problem: define a loss function, then optimize the algorithm to minimize it.
The loss value gives you this closeness estimate between the data and the model output.
This loss value can be accessed through the history object returned when fitting a TensorFlow model:
>>> history = model.fit(np.arange(100).reshape(5, 20), np.zeros(5),
...                     epochs=10, verbose=1)
>>> print(history.history.keys())
dict_keys(['loss'])
If you want a closeness estimate between two different models, you need the KL divergence (Kullback–Leibler divergence), also known as relative entropy.
From Wikipedia:
KL divergence is a type of statistical distance: a measure of how one probability distribution P is different from a second, reference probability distribution Q. A simple interpretation of the KL divergence of P from Q is the expected excess surprise from using Q as a model when the actual distribution is P.
In the simple case, a relative entropy of 0 indicates that the two distributions in question have identical quantities of information.
KL divergence is used to distill the knowledge of a teacher model into a student model and to check whether the two carry identical quantities of information.
Trivia: knowledge distillation is used to compress a deep learning model's size with only a small compromise in quality.
KLDivergence can be imported directly from TensorFlow as follows:
from tensorflow.keras.losses import KLDivergence
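As a quick illustration (the distributions here are made up), the loss object can be called directly on two probability distributions:

from tensorflow.keras.losses import KLDivergence

kl = KLDivergence()
p = [[0.7, 0.2, 0.1]]   # "true" distribution P
q = [[0.4, 0.4, 0.2]]   # reference/model distribution Q
print(float(kl(p, q)))  # expected excess surprise from using Q as a model for P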
You can find a full-fledged knowledge-distillation implementation in Keras at https://keras.io/examples/vision/knowledge_distillation/.

Is there a differentiable algorithm for image quantization?

I am implementing an autoencoder, used to rebuild color images. The loss function I want to use requires a reduced color set (max ~100 different colors) but I am struggling to find a suitable differentiable algorithm.
Another doubt I have is the following: is it better to apply such quantization directly in the loss function, or can I implement it in a custom non-trainable layer? In the second case, does the algorithm need to be differentiable?
My first idea when approaching this problem was to quantize the images before feeding them to the network, but I don't know how to "force" the network to produce only the quantized colors as output.
Any suggestion is greatly appreciated; I do not need code, just some ideas or new perspectives. Being pretty new to TensorFlow, I am probably missing something.
If you want to compress the image, it seems you want to find a discrete color set for image compression. In that case an autoencoder is not a suitable approach.
A general autoencoder compresses a tensor of images (B x C x H x W) into a latent code for each image (B x D, typically D = 512). The beauty of this approach is that the optimal latent space is found "automatically".
Nevertheless, if you want to use TensorFlow's gradient-based optimization tools, a continuous relaxation technique such as interpolation could be helpful.
In the following paper, the authors use continuous relaxation for discrete path selection in a neural network:
Liu, H., Simonyan, K., & Yang, Y. (2018). DARTS: Differentiable Architecture Search. ICLR.
In the following paper, the authors use interpolation to learn a quantized kernel bank stored in a look-up table:
Jo, Y., & Kim, S. J. (2021). Practical Single-Image Super-Resolution Using Look-Up Table. CVPR.
Both papers provide code.
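For intuition only, here is a toy sketch of such a continuous relaxation: each pixel is softly assigned to the colors of a fixed palette, so gradients can flow, and a low temperature pushes the output towards hard nearest-color quantization. The palette, temperature, and tensor shapes are my own assumptions, not taken from the papers above.

import tensorflow as tf

def soft_quantize(images, palette, temperature=0.05):
    """Differentiable stand-in for color quantization.

    images:  (B, H, W, 3) float tensor with values in [0, 1]
    palette: (K, 3) float tensor holding the reduced color set (K ~ 100)
    """
    pixels = tf.reshape(images, (-1, 1, 3))                           # (N, 1, 3)
    dists = tf.reduce_sum((pixels - palette[None, :, :]) ** 2, -1)    # (N, K) squared color distances
    weights = tf.nn.softmax(-dists / temperature, axis=-1)            # soft nearest-color weights
    quantized = tf.matmul(weights, palette)                           # (N, 3) convex combination of palette colors
    return tf.reshape(quantized, tf.shape(images))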

Neural network and the law of large numbers

I am struggling to implement the following relation in Python, which should hold by the law of large numbers (roughly, an expectation E(X_t) approximated by a sum of ANN terms),
where ANN stands for artificial neural network.
I have created a sample from which I take several subsamples. I want to feed the subsamples one at a time, increasingly, to train a neural network. That implies I will have a neural network for each subsample:
ANN(X_t, N, \theta_1, 1) + ANN(X_t, N, \theta_2, 2) + ...
and each of them needs to be incorporated into the sum.
However, I have no idea how to implement this, since I would need to store not the values but the neural network itself after each computation. Are there any references on how to solve a problem of this kind? I have looked at the recurrent neural networks implemented in Python, namely the LSTM, but that does not "store" each neural network; furthermore, it selects the variables that are most meaningful across time.
Thanks in advance.
By invoking (artificial) neural networks and the Central Limit Theorem you step into quite a few concepts. Let me try to elaborate on these concepts before trying to suggest a solution.
First, the fact that $\frac{1}{J}\sum_{j=1}^{J} X_{j} \to E[X]$ (as $J \to \infty$) holds P-almost surely for a family of random variables $X_{1}, X_{2}, \dots$ that are iid (independently and identically distributed) like the random variable X is called the Strong Law of Large Numbers (LLN). In contrast, the Central Limit Theorem (CLT) refers to the limiting distribution (as the name suggests), which is Gaussian. Both theorems require proper scaling, namely $\frac{1}{J}$ for the LLN and $\frac{1}{\sqrt{J}}$ for the CLT, respectively. Both theorems allow approximation through a finite sum of up to J summands, which is what you attempt. However, equality is lost and approximate equality, i.e. ≈, is appropriate. Moreover, there is no normalization in your summation, which will cause the sum to diverge. Note that the limits hold for certain functions applied to X; you assume that this function is ANN(X_t, N, Θ, j).
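As a quick numerical illustration of the scaling point (the distribution here is just an example):

import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=100_000)   # iid samples with E[X] = 2
for J in (10, 1_000, 100_000):
    print(J, X[:J].mean())                     # (1/J) * sum of the first J samples approaches E[X] = 2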
Second, the (artificial) neural network. Like any statistical model, a neural network takes in a data input X, hyperparameters that determine the network architecture (e.g. the depth and size of the layers involved), which might be N in your case, and a parameter vector Θ. The latter is only obtained after the model has been trained on data. In turn, I'd interpret your function
def ANN(X_t, N, Θ)
as the inference function that assembles a previously trained neural network by combining the hyperparameter value N and the parameter vector Θ, and applies it to the current data input X_t. However, you don't clarify what the input j is. j and Θ_j seem to suggest a recurrent neural network (RNN); an LSTM is a special type of RNN. Still, it is unclear what the inputs actually are, as you leave this vague. RNNs are used on speech, text, and numeric time-series data. This is further complicated by the fact that X_t appears on the left-hand side inside the expectation and on the right-hand side as the input to the neural network.
Finally, the suggested solution. If the ANNs are in fact independent and you meant to write E(Y), then your equation vaguely describes ensemble learning: several neural networks (of the same architecture) are trained on the same dataset and their predictions are averaged (not summed) to obtain a more accurate prediction of the expectation of Y. If, on the other hand, you do describe RNNs, the equation above for E(X) vaguely describes convergence of non-independent random variables, as X_{t+1} and Θ_{t+1} depend on the previous X_t's and Θ_t's. Intuitively, you are trying to show that the output of an RNN converges to some numeric value when applied iteratively. Mathematically speaking, there are LLN-like results for non-iid random variables, but they impose other very specific assumptions, e.g. on the type of dependence.
Regarding storing neural networks: you can implement your own ANN program, which is a lot of work (as it requires training and inference functions). Alternatively, virtually every deep learning framework in Python allows storing/loading a parameter vector Θ, which lets you implement your procedure regardless of what mathematical meaning you'd like to derive from it. In Keras, for example, a model can be saved via
model.save(PARAMETER_PATH)
and later re-loaded via
keras.models.load_model(PARAMETER_PATH)
see the reference. Similar methods exist for PyTorch, another very popular deep learning framework in Python.
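To make the storing-and-summing idea concrete, here is a minimal sketch (the architecture, file names, and data are placeholders of my own): one small Keras network is trained per subsample, each trained model is saved to disk, and the predictions are then averaged in the ensemble-learning sense rather than summed.

import numpy as np
from tensorflow import keras

def make_model():
    model = keras.Sequential([keras.Input(shape=(4,)),
                              keras.layers.Dense(16, activation="relu"),
                              keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    return model

# placeholder subsamples: 5 subsamples of 50 points with 4 features each
subsamples = [(np.random.rand(50, 4), np.random.rand(50, 1)) for _ in range(5)]

models = []
for j, (X_j, y_j) in enumerate(subsamples):
    model = make_model()
    model.fit(X_j, y_j, epochs=5, verbose=0)
    model.save(f"ann_{j}.keras")               # each trained network is stored, not just its output
    models.append(model)

X_new = np.random.rand(10, 4)
prediction = np.mean([m.predict(X_new, verbose=0) for m in models], axis=0)  # ensemble average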

Recommendation for best neural network type (in TensorFlow or PyTorch) for fitting problems

I am looking to develop a simple Neural Network in PyTorch or TensorFlow to predict one numeric value based on several inputs.
For example, if one has data describing the interior comfort parameters for a building, the NN should predict the numeric value for the energy consumption.
Both the PyTorch and TensorFlow documented examples and tutorials are generally focused on classification and time-dependent series (which is not my case). Any idea which NN available in those libraries is best for this kind of problem? I'm just looking for a hint about the type, not code.
Thanks!
The type of problem you are talking about is called a regression problem. In such problems you have a single output neuron with a linear activation (or no activation), and you train the network with MSE or MAE.
If your problem is a time series (where you are using previous values to predict the current/next value), then you could try multi-variate time-series forecasting using LSTMs.
If your problem is not a time series, then you can just use a vanilla feed-forward neural network. This article explains the concepts of data correlation really well, and you might find it useful for deciding what type of neural network to use based on the type of data and output you have.
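For example, a minimal feed-forward regression network in TensorFlow might look like the following (the number of inputs and the layer sizes are placeholders):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),                   # e.g. 8 comfort parameters as inputs
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1)                      # single linear output: energy consumption
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(X_train, y_train, epochs=100, validation_split=0.2)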

Getting some sort of Math Formula from a Machine Learning trained model

I already asked this question here: Can Convolutional Neural Networks (CNN) be represented by a Mathematical formula? but I feel that I was not clear enough and also the proposed idea did not work for me.
Let's say that, using my computer, I train a certain machine learning algorithm (e.g. naive Bayes, decision tree, linear regression, or others). So I already have a trained model to which I can give an input value and it returns the result of the prediction (e.g. 1 or 0).
Now, let's say that I still want to give an input and get a predicted output. However, this time I would like my input value to be, for example, multiplied by some sort of mathematical formula, weights, or matrix that represents my "trained model".
In other words, I would like my trained model to be "transformed" into some sort of formula to which I can give an input and get the predicted number.
The reason I want to do this is that I want to train on a big dataset with a complex prediction model, and then use the trained prediction model on simpler hardware such as a PIC32 microcontroller. The PIC32 microcontroller would not train the machine learning model or store all the inputs. Instead, it would simply read certain numbers from the system, apply a math formula or some sort of matrix multiplication, and give me the predicted output. That way I could use "fancy" neural networks on much simpler devices that can easily evaluate math formulas.
If I read this properly, you want a generally continuous function in many variables to replace a CNN. The central point of a CNN existing in a world with ANNs ("normal" neural networks) is that it includes irruptive transformations: non-linearities, discontinuities, etc. that enable the CNN to develop recognitions and relationships that simple linear combinations, such as matrix multiplication, cannot handle.
If you want to understand this better, I recommend that you choose an introduction to Deep Learning and CNNs in whatever presentation mode fits your learning styles.
Essentially, every machine learning algorithm is a parameterized formula, with your trained model being the learned parameters that are applied to the input.
So what you're actually asking is to simplify arbitrary computations to, more or less, a matrix multiplication. I'm afraid that's mathematically impossible. If you ever do come up with a solution to this, make sure to share it - you'd instantly become famous, most likely rich, and put a hell of a lot of researchers out of business. If you can't train a matrix multiplication to get the accuracy you want from the start, what makes you think you can boil down arbitrary "complex prediction models" to such simple computations?
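To illustrate the "parameterized formula" point in the simplest case (a toy single-layer model with made-up sizes): once trained, the model is literally a matrix W and a bias b, and y = x @ W + b can be evaluated on a microcontroller with a handful of multiply-adds. Deeper networks are the same idea with more matrices and non-linearities in between, which is exactly where the simplification breaks down.

import numpy as np
import tensorflow as tf

# toy linear model: 3 inputs -> 1 output
model = tf.keras.Sequential([tf.keras.Input(shape=(3,)), tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")
model.fit(np.random.rand(100, 3), np.random.rand(100, 1), epochs=5, verbose=0)

W, b = model.get_weights()                  # the learned parameters of the "formula"
x = np.random.rand(1, 3)
print(model.predict(x, verbose=0))          # framework prediction
print(x @ W + b)                            # same result from plain matrix math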
