Neural network and the law of large numbers

Neural network and the law of large numbers - python

I am struggling to implement the following function in python, which holds by the law of large numbers:
where ANN stands for artificial neural network.
I have created a sample from where I have several subsamples. I want to feed each subsample at a time, increaigly, to train a neural network. That implies I will have a neural network for each subsample:
ANN((X_t,N,\theta_1,1)+ANN(X_t,N,\theta_2,2)+....
And each needs to be incorporated in a sum.
However I have no idea on how to implement this, once I would need to store, not the values but the neural network itself after each computation. Is there any references on how to solve a problem of this kind? I have looked at the recurrent neural networks implemented in Python, namely the LSTM, but that does not "store" each neural network, furthemore it selects the variablles that are more meaningful across time.
Thanks in advance.

By invoking (artificial) neural networks and the Central Limit Theorem you step into quite a few concepts. Let me try to elaborate on these concepts before trying to suggest a solution.
First, the fact that
holds P-almost surely for a family of random variables X_{1},X_{2},... that are iid (independently and identically distributed) like the random variable X is called the Strong Law of Large Numbers (LLN). In contrast, the Central Limit Theorem (CLT) refers to the limiting distribution (as the name suggests) which is Gaussian. Both theorems require proper scaling, namely for the LLN
and for the CLT, respectively. Both theorems allow approximation through a finite sum of up to J summands which is what you attempt. However, equality is lost and approximate equality i.e. ≈ is appropriate. Moreover, there is no normalization in your summation which will cause the term to diverge. Note that the limits hold for certain functions being applied to X. You assume that the function ANN(X_t, N, Θ, j).
Second, the (artificial) neural network. Like any statistical model, a neural network takes in data input X, hyperparameters that determine the network architecture (e.g. depth and size of the involved layers) that might be N in your case, and a parameter vector Θ. The latter is only obtained after the model has been trained on data. In turn, I'd interpret your function
def ANN(X_t, N, Θ)
as the inference function that compiles a previously trained neural network by combining hyperparameter value N the parameter vector Θ and applies it to the current data input X_{t}. However, you don't clarify what the input j is. j and Θ_j seem to suggest a recurrent neural network (RNN). An LSTM is a special type of RNN. However, it is unclear what the inputs actually are as you leave this vague. RNNs are used on speech, text, and numeric time-series data. This is further complicated by the fact that $X_{t}$ is on the left-hand side in the expectation and on the right-hand side as the input to the neural network.
Finally, the suggested solution. If the ANNs are in fact independent and you meant to write E(Y), then your equation vaguely describes ensemble learning. There, several neural networks (of the same architecture) are trained on the same dataset and their prediction is averaged (not summed) to gain a more accurate prediction of the expectation of Y. If, on the other hand, you do describe RNNs, the equation above for E(X) vaguely describes a convergence of non-independent random variables as X_{t+1} and Θ_{t+1} depend on the previous X_t's and Θ_t's. Intuitively, you try to show that the output of an RNN converges to some numeric value when applied iteratively. Mathematically speaking, there are LLM-like results for non-iid random variables but they impose other very specific assumptions e.g. on the type of dependence.
Regarding storing neural networks. You can implement your own ANN program which is a lot of work (as it requires training and inference functions). Virtually every deep learning framework in Python allows storing/loading a parameter vector Θ which would allow you to implement your procedure regardless of what mathematical meaning you'd like to derive from it. In keras, for example, a model can be saved via
model.save(PARAMETER_PATH)
and later re-loaded via
keras.models.load_model(PARAMETER_PATH)
see the reference. Similar methods exist for PyTorch another very popular deep learning framework in Python.

Related

How to use TF Adam to optimize multiple outputs from the same neural-network; but with a varying linear transformation?

So this is a tough question to ask, as i dont have examples of code.
I have a task where the same neural network predicts a time-series characteristic(say a single sigmoid activation) for single time period ahead. So you have several areas with information at time t say A(t),B(t),C(t) -- this same neural network will take in some past information and give you various outputs Y(A;t),Y(B;t).Y(C;t).
You want to take the time based outputs of this neural network for various inputs, and at each output apply a time-varying but known linear transformation. At this point you have a collection of outputs depending on the network parameters, time-dependent inputs and linear transformations.
After this you basically you want to collect a number of these outputs over a period of a month. You then compute a value X based on all of these outputs (after their linear transformation) for that month.
I now want to use ADAM to optimize the weights of this neural network(the same network is used each time). I have something that depends on the outputs of a lot of different fixed neural network inputs, and a lot of different fixed linear transformations.
So i am not sure how to use tensorflow ADAM to change the weights, if you can simply use the model.output or predict in a custom version of the optimizer and it will understand this is a function of the weights.
Sorry if this is hard to understand, i am finding it hard to explain and i dont have code for it as that would make the question redundant.

Recommendation for Best Neural Network Type (in TensorFLow or PyTorch) For Fitting Problems

I am looking to develop a simple Neural Network in PyTorch or TensorFlow to predict one numeric value based on several inputs.
For example, if one has data describing the interior comfort parameters for a building, the NN should predict the numeric value for the energy consumption.
Both PyTorch or TensorFlow documented examples and tutorials are generally focused on classification and time dependent series (which is not the case). Any idea on which NN available in those libraries is best for this kind of problems? I'm just looking for a hint about the type, not code.
Thanks!

The type of problem you are talking about is called a regression problem. In such types of problems, you would have a single output neuron with a linear activation (or no activation). You would use MSE or MAE to train your network.
If your problem is time series(where you are using previous values to predict current/next value) then you could try doing multi-variate time series forecasting using LSTMs.
If your problem is not time series, then you could just use a vanilla feed forward neural network. This article explains the concepts of data correlation really well and you might find it useful in deciding what type of neural networks to use based on the type of data and output you have.

Getting some sort of Math Formula from a Machine Learning trained model

I already asked this question here: Can Convolutional Neural Networks (CNN) be represented by a Mathematical formula? but I feel that I was not clear enough and also the proposed idea did not work for me.
Let's say that using my computer, I train a certain machine learning algorithm (i.e. naive bayes, decision tree, linear regression, and others). So I already have a trained model which I can give a input value and it returns the result of the prediction (i.e. 1 or 0).
Now, let's say that I still want to give an input and get a predicted output. However, at this time I would like that my input value to be, for example, multiplied by some sort of mathematical formula, weights, or matrix that represents my "trained model".
In other words, I would like that my trained model "transformed" in some sort of formula which I can give an input and get the predicted number.
The reason why I want to do this is because I wanna train a big dataset and use complex prediction model. And use this trained prediciton model in simpler hardwares such as a PIC32 microcontroler. The PIC32 Microntroler would not train the machine learning or store all inputs. Instead, the microcontroler would simple read from the system certain numbers, apply a math formula or some sort of matrix multiplication and give me the predicted output. With that, I can use "fancy" neural networks in much simpler devices that can easily operate math formulas.

If I read this properly, you want a generally continuous function in many variables to replace a CNN. The central point of a CNN existing in a world with ANNs ("normal" neural networks) is that in includes irruptive transformations: non-linearities, discontinuities, etc. that enable the CNN to develop recognitions and relationships that simple linear combinations -- such as matrix multiplication -- cannot handle.
If you want to understand this better, I recommend that you choose an introduction to Deep Learning and CNNs in whatever presentation mode fits your learning styles.

Essentially, every machine learning algorithm is a parameterized formula, with your trained model being the learned parameters that are applied to the input.
So what you're actually asking is to simplify arbitrary computations to, more or less, a matrix multiplication. I'm afraid that's mathematically impossible. If you ever do come up with a solution to this, make sure to share it - you'd instantly become famous, most likely rich, and put a hell of a lot of researchers out of business. If you can't train a matrix multiplication to get the accuracy you want from the start, what makes you think you can boil down arbitrary "complex prediction models" to such simple computations?

What is the key feature in MNIST Dataset that is used to classify images

I was recently learning about neural networks and came across MNIST data set. i understood that a sigmoid cost function is used to reduce the loss. Also, weights and biases gets adjusted and an optimum weights and biases are found after the training. the thing i did not understand is, on what basis the images are classified. For example, to classify whether a patient has cancer or not, data like age, location, etc., becomes features. in MNIST dataset, i did not find any of that. Am i missing something here. Please help me with this

First of all the Network pipeline consists of 3 main parts:
Input Manipulation:
Parameters that effect the finding of minimum:
Parameters like your descission function in your interpretation
layer (often fully connected layer)
In contrast to your regular machine learning pipeline where you have to extract features manually a CNN uses filters. (Filters like in edge detection or viola and jones).
If a filter runs across the images and is convolved with pixels it Produces an output.
This output is then interpreted by a neuron. If the output is above a threshold it is considered as valid (Step function counts 1 if valid or in case of Sigmoid it has a value on the sigmoid function).
The next steps are the same as before.
This is progressed until the interpretation layer (often softmax). This layer interprets your computation (if the filters are good adapted to your problem you will get a good predicted label) which means you have a low difference between (y_guess - y_true_label).
Now you can see that for the guess of y we have multiplied the input x with many weights w and also used functions on it. This can be seen like a chain rule in analysis.
To get better results the effect of a single weight on the input must be known. Therefore, you use Backpropagation which is a derivative of the Error with respect to all w. The Trick is that you can reuse derivatives which is more or less Backpropagation and it becomes easier since you can use Matrix vector notation.
If you have your gradient, you can use the normal concept of minimization where you walk along the steepest descent. (There are also many other gradient methods like adagrad or adam etc).
The steps will repeat until convergence or until you reach the maximum epochs.
So the answer is: THE COMPUTED WEIGHTS (FILTERS) ARE THE KEY TO DETECT NUMBERS AND DIGITS :)

implementing a perceptron classifier

Hi I'm pretty new to Python and to NLP. I need to implement a perceptron classifier. I searched through some websites but didn't find enough information. For now I have a number of documents which I grouped according to category(sports, entertainment etc). I also have a list of the most used words in these documents along with their frequencies. On a particular website there was stated that I must have some sort of a decision function accepting arguments x and w. x apparently is some sort of vector ( i dont know what w is). But I dont know how to use the information I have to build the perceptron algorithm and how to use it to classify my documents. Have you got any ideas? Thanks :)

How a perceptron looks like
From the outside, a perceptron is a function that takes n arguments (i.e an n-dimensional vector) and produces m outputs (i.e. an m-dimensional vector).
On the inside, a perceptron consists of layers of neurons, such that each neuron in a layer receives input from all neurons of the previous layer and uses that input to calculate a single output. The first layer consists of n neurons and it receives the input. The last layer consist of m neurons and holds the output after the perceptron has finished processing the input.
How the output is calculated from the input
Each connection from a neuron i to a neuron j has a weight w(i,j) (I'll explain later where they come from). The total input of a neuron p of the second layer is the sum of the weighted output of the neurons from the first layer. So
total_input(p) = Σ(output(k) * w(k,p))
where k runs over all neurons of the first layer. The activation of a neuron is calculated from the total input of the neuron by applying an activation function. An often used activation function is the Fermi function, so
activation(p) = 1/(1-exp(-total_input(p))).
The output of a neuron is calculated from the activation of the neuron by applying an output function. An often used output function is the identity f(x) = x (and indeed some authors see the output function as part of the activation function). I will just assume that
output(p) = activation(p)
When the output off all neurons of the second layer is calculated, use that output to calculate the output of the third layer. Iterate until you reach the output layer.
Where the weights come from
At first the weights are chosen randomly. Then you select some examples (from which you know the desired output). Feed each example to the perceptron and calculate the error, i.e. how far off from the desired output is the actual output. Use that error to update the weights. One of the fastest algorithms for calculating the new weights is Resilient Propagation.
How to construct a Perceptron
Some questions you need to address are
What are the relevant characteristics of the documents and how can they be encoded into an n-dimansional vector?
Which examples should be chosen to adjust the weights?
How shall the output be interpreted to classify a document? Example: A single output that yields the most likely class versus a vector that assigns probabilities to each class.
How many hidden layers are needed and how large should they be? I recommend starting with one hidden layer with n neurons.
The first and second points are very critical to the quality of the classifier. The perceptron might classify the examples correctly but fail on new documents. You will probably have to experiment. To determine the quality of the classifier, choose two sets of examples; one for training, one for validation. Unfortunately I cannot give you more detailed hints to answering these questions due to lack of practical experience.

I think that trying to solve an NLP problem with a Neural Network when you're not familiar with either might be a step too far. That you're doing it in a new language is the least of your worries.
I'll link you to my Neural Computation module slides that gets taught at my university. You'll want the slides from session 1 and session 2 in week 2. Right at the bottom of the page is a link to how to implement a neural network in C. With a few modifications should be able to port it to python. You should note that it details how to implement a multilayer perceptron. You only need to implement a single layer perceptron, so ignore anything that talks about hidden layers.
A quick explanation of x and w. Both x and w are vectors. x is the input vector. x contains normalised frequencies for each word you are concerned about. w contains weights for each word you are concerned with. The perceptron works by multiplying the input frequency for each word by its respective weight and summing them up. It passes the result to a function (typically a sigmoid function) that turns the result into a value between 0 and 1. 1 means the perceptron is positive that the inputs are an instance of the class it represents and 0 means it is sure that the inputs really aren't an example of its class.
With NLP you typically learn about the bag of words model first, before moving on to other, more complex, models. With a neural network, hopefully, it will learn its own model. The problem with this is that the neural network will not give you much of an understanding of NLP, other than documents can be classified by the words they contain, and that usually the number and type of words in a document contains most of the information you need to classify a document -- context and grammar do not add much extra detail.
Anyway, I hope that gives a better place from which to start your project. If you're still stuck on a particular part then ask again and I'll do my best to help.

You should take a look at this survey paper on text classification by Frabizio Sebastiani. It tells you all of the best ways to do text classification.
Now, I'm not going to bother you to read the whole thing, but there's one table near the end, where he compares how lots of different people's techniques stack up on lots of different test corpora. Find it, pick the best one (the best perceptron one, if you assignment is specifically to learn how to do this with perceptron), and read the paper he cites that describes that method in detail.
You now know how to construct a good topical text classifier.
Turning the algorithm that Oswald gave you (and that you posted in your other question) into code is a Small Matter of Programming (TM). And if you encounter unfamiliar terms like TF-IDF while you're working, ask your teacher to help you by explaining those terms.

MultiLayer perceptrons (A specific NeuralNet architecture for general classification problem.) Now available for Python from the GraphLab folks:
https://dato.com/products/create/docs/generated/graphlab.deeplearning.MultiLayerPerceptrons.html#graphlab.deeplearning.MultiLayerPerceptrons

I had a try at implementing something similar the other day. I made some code to recognize english looking text vs non-english. I hadn't done AI or statistics in many years, so it was a bit of a shotgun attempt.
My code is here (don't want to bloat the post): http://cnippit.com/content/perceptron-statistically-recognizing-english
Inputs:
I take a text file, split it up into
tri-grams (eg "abcdef" => ["abc",
"bcd", "cde", "def"])
I calculate the relative frequencies of each, and feed that as the inputs to the perceptron (so there are 26^3 inputs)
Despite me not really knowing what I was doing, it seems to work fairly well. The success depends quite heavily on the training data though. I was getting poor results until I trained it on more french/spanish/german text etc.
It's a very small example though, with lots of "lucky guesses" at values (eg. initial weights, bias, threshold, etc.).
Multiple classes:
If you have multiple classes you want to distinquish between (ie. not as simple as "is A or NOT-A"), then one approach is to use a perceptron for each class. Eg. one for sport, one for news, etc.
Train the sport-perceptron on data grouped as either sport or NOT-sport. Similar for news or Not-news, etc.
When classifying new data, you pass your input to all perceptrons, and whichever one returns true (or "fires"), then that's the class the data belongs to.
I used this approach way back in university, where we used a set of perceptrons for recognizing handwritten characters. It's simple and worked pretty effectively (>98% accuracy if I recall correctly).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.