Initializing a bias term in my nonlinear regression model using TensorFlow - python

I am trying to make a basic nonlinear regression model that will predict the return index of companies in the FTSE350.
I am unsure as to what my bias term should look like in terms of dimensions and whether I am using it properly in the calculations method:
w1 = tf.Variable(tf.truncated_normal([4, 10], mean=0.0, stddev=1.0, dtype=tf.float64))
b1 = tf.Variable(tf.constant(0.1, shape=[4,10], dtype = tf.float64))
w2 = tf.Variable(tf.truncated_normal([10, 1], mean=0.0, stddev=1.0, dtype=tf.float64))
b2 = tf.Variable(tf.constant(0.1, shape=[1], dtype = tf.float64))
def calculations(x, y):
    w1d = tf.matmul(x, w1)
    h1 = (tf.nn.sigmoid(tf.add(w1d, b1)))
    h1w2 = tf.matmul(h1, w2)
    activation = tf.add(tf.nn.sigmoid(tf.matmul(h1, w2)), b2)
    error = tf.reduce_sum(tf.pow(activation - y,2))/(len(x))
    return [ activation, error ]
My initial thoughts were that it should be the same size as my weights but I get this error:
ValueError: Dimensions must be equal, but are 251 and 4 for 'Add' (op: 'Add') with input shapes: [251,10], [4,10]
I've played around with different ideas but don't seem to be getting anywhere.
(My input data has 4 features)
The network structure I have attempted is 4 neurons in the input layer, 10 in the hidden layer, and 1 in the output layer, but I feel like I may have mixed up the dimensions of my weights too?

When you are constructing the layers for a feed-forward fully-connected neural network (like in your example), the shape of the biases should be equal to the number of nodes in the corresponding layer. So in your case, since your weight matrix has a shape of (4, 10), you have 10 nodes in that layer and you should be using:
b1 = tf.Variable(tf.constant(0.1, shape=[10], dtype = tf.float64))
The reason for this is that when you do w1d = tf.matmul(x, w1), you actually get a matrix of shape (batch_size, 10) (where batch_size is the number of rows in your input matrix). This is because you are matrix multiplying a (batch_size, 4) matrix by a (4, 10) weight matrix. Then you add one bias value to each of the 10 columns of w1d (the same 10 values are added to every row of the batch), so the biases can be represented as a 10-dimensional vector, which is what you get if you make the shape of b1 [10].
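A quick way to check these shapes without building the graph is to mimic the forward pass in NumPy, which uses the same broadcasting rules for addition as TensorFlow. This is a minimal sketch assuming 251 rows (taken from the error message) and random data; it is not the question's TensorFlow code.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_features, n_hidden = 251, 4, 10  # 251 comes from the error message

x = rng.normal(size=(batch_size, n_features))   # input matrix
w1 = rng.normal(size=(n_features, n_hidden))    # weights, shape (4, 10)
b1 = np.full(n_hidden, 0.1)                     # one bias per hidden unit, shape (10,)

w1d = x @ w1          # (251, 4) @ (4, 10) -> (251, 10)
h1_pre = w1d + b1     # b1 is broadcast across all 251 rows
print(w1d.shape, h1_pre.shape)

# A (4, 10) bias cannot be broadcast against a (251, 10) matrix,
# which is the same mismatch TensorFlow reports.
try:
    w1d + np.full((n_features, n_hidden), 0.1)
except ValueError as err:
    print("ValueError:", err)
```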
Without the non-linearity (sigmoid) afterward, this is called an affine transformation, which you can read more about here: https://en.wikipedia.org/wiki/Affine_transformation.
Another fantastic resource is the Stanford Deep Learning Tutorial, which has a good explanation of how these feed-forward models work here:
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/.
Hope that helped!

I think your b1 should just be of dimension 10, and then your code should run.
Since 4 is the number of features and 10 is the number of neurons in your first layer (I think in terms of neural nets...),
you must add a bias of dimension 10.
You can also see the biases as adding an extra feature with a constant value of 1.
See this PDF if you have time; it explains this very well: https://cs.stanford.edu/~quocle/tutorial1.pdf
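The "extra feature with constant value 1" view can be checked numerically: appending a column of ones to the inputs and appending the bias vector as an extra row of the weight matrix gives exactly the same result as x @ w + b. A small NumPy sketch with arbitrary random values:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 4))     # 5 examples, 4 features
w = rng.normal(size=(4, 10))    # weights
b = rng.normal(size=10)         # biases, one per unit

# Standard affine layer: x @ w + b
affine = x @ w + b

# Same thing with the bias folded in: append a constant feature of 1 to every
# example and append the bias vector as an extra row of the weight matrix.
x_aug = np.hstack([x, np.ones((x.shape[0], 1))])   # shape (5, 5)
w_aug = np.vstack([w, b])                          # shape (5, 10)
folded = x_aug @ w_aug

print(np.allclose(affine, folded))  # True
```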

Related

Calculate Jacobian Matrix of LSTM Model - Python

I have a trained LSTM model with 1 LSTM layer and 3 Dense layers. I am using it for sequence-to-one prediction. I have 4 input variables and 1 output variable. I am using the values of the last 20 timesteps to predict the next value of my output variable. The architecture of the model is shown below:
model = Sequential()
model.add(LSTM(units = 120, activation ='relu', return_sequences = False, input_shape = (train_in.shape[1], 5)))
model.add(Dense(100,activation='relu'))
model.add(Dense(50,activation='relu'))
model.add(Dense(1))
The shapes of training input and training output are as shown below
train_in.shape , train_out.shape
((89264, 20, 5), (89264,))
I want to calculate the jacobian matrix for this model.
Say, Y = f(x1,x2,x3,x4) is the representation of the above neural network where:
Y -- Output variable of the trained model, f -- Is the function representing the Model; x1,x2,x3,x4 --input parameters.
How can I calculate the Jacobian matrix? Please share your thoughts on this, and any valuable references if you know of any.
Thank you :)
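For reference, with a single scalar output the Jacobian is the row vector of partial derivatives of Y with respect to each input. Because the model in the code actually receives a window of 20 timesteps with 5 values each (a matrix X of shape (20, 5)), the derivative is taken with respect to every element of that window, so for one sample and one output the Jacobian has 20 × 5 = 100 entries:

```latex
J = \frac{\partial Y}{\partial \mathbf{x}}
  = \begin{bmatrix}
      \dfrac{\partial Y}{\partial x_1} &
      \dfrac{\partial Y}{\partial x_2} &
      \dfrac{\partial Y}{\partial x_3} &
      \dfrac{\partial Y}{\partial x_4}
    \end{bmatrix},
\qquad
J_{t,k} = \frac{\partial Y}{\partial X_{t,k}}, \quad t = 1,\dots,20,\; k = 1,\dots,5
```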
You might want to take a look at tf.GradientTape in TensorFlow. Gradient tape is a very simple way to auto-differentiate your computation, and its documentation has some basic examples.
However, your model is already quite big. If you differentiate with respect to the model's n parameters, the Jacobian has n values for every output, and I believe your model already has more than 10000 parameters. You might need to make it smaller.
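To put "more than 10000 parameters" in perspective, the parameter count of the architecture in the question can be worked out by hand. This sketch assumes the standard Keras formulas: an LSTM layer has four gates, each with input weights, recurrent weights and a bias, and a Dense layer has inputs × units weights plus one bias per unit.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with (input_dim + units) * units weights and units biases
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # weight matrix plus one bias per unit
    return input_dim * units + units

layers = [
    lstm_params(5, 120),     # LSTM(120) on 5 input features per timestep
    dense_params(120, 100),  # Dense(100)
    dense_params(100, 50),   # Dense(50)
    dense_params(50, 1),     # Dense(1)
]
total = sum(layers)
print(layers, total)  # [60480, 12100, 5050, 51] 77681
```

A Jacobian of the single output with respect to all of these parameters would have 77681 entries per sample, whereas the Jacobian with respect to one (20, 5) input window has only 100.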
I found a way to get the Jacobian matrix of the LSTM model's output with respect to its input. I am posting it here so that it might help someone in the future. Please share if there is a better or simpler way to do the same.
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import load_model
tf.compat.v1.enable_eager_execution()  # Enable eager execution (required in TF 1.x; it is on by default in TF 2.x)
tf.executing_eagerly()  # Check that eager execution is enabled; should return True
data = pd.read_excel("FileName or Location ")
# My data is a DataFrame with 127549 rows and 5 columns (127549*5)
a = data[:20] #shape is (20,5)
b = data[50:70] # shape is (20,5)
A = [a,b] # making a list
A = np.array(A) # convert into array size (2,20,5)
At = tf.convert_to_tensor(A, np.float32) #convert into tensor
At.shape # TensorShape([Dimension(2), Dimension(20), Dimension(5)])
model = load_model('EKF-LSTM-1.h5') # Load the trained model
# I have a trained model which is shown in the question above.
# Output of this model is a single value
with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
    tape.watch(At)
    y1 = model(At)  # define the output as a function of the input variables
print(y1, type(y1))
# output
tf.Tensor([[0.04251503],[0.04634088]], shape=(2, 1), dtype=float32) <class 'tensorflow.python.framework.ops.EagerTensor'>
jacobian=tape.jacobian(y1,At) #jacobian of output w.r.t both inputs
jacobian.shape
Output
TensorShape([Dimension(2), Dimension(1), Dimension(2), Dimension(20), Dimension(5)])
Here I calculated the Jacobian w.r.t. 2 inputs, each of size (20,5). If you want to calculate it w.r.t. only one input of size (20,5), then use this:
jacobian=tape.jacobian(y1,At[0]) #jacobian of output w.r.t only 1st input in 'At'
jacobian.shape
Output
TensorShape([Dimension(1), Dimension(1), Dimension(1), Dimension(20), Dimension(5)])
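The result of tape.jacobian(y1, At) has shape y1.shape + At.shape, i.e. (2, 1) + (2, 20, 5). Because each sample's output depends only on its own input window, the blocks that pair output i with input j ≠ i are zero, and the per-sample Jacobian is the slice jacobian[i, :, i]. Slicing the full result this way also avoids differentiating with respect to a tensor such as At[0] that is created after the forward pass and is therefore not part of the recorded computation. The sketch below only illustrates the shape bookkeeping: it uses NumPy central finite differences (not GradientTape) on a hypothetical linear toy_model standing in for the trained LSTM.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian with shape f(x).shape + x.shape."""
    y0 = f(x)
    jac = np.zeros(y0.shape + x.shape)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        # Fill the slice of the Jacobian that belongs to input element idx
        jac[(...,) + idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return jac

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 5))  # hypothetical stand-in for the trained model

def toy_model(batch):
    # Maps a (batch, 20, 5) input to a (batch, 1) output, one value per sample
    return np.tensordot(batch, W, axes=([1, 2], [0, 1]))[:, None]

At = rng.normal(size=(2, 20, 5))
jac = numerical_jacobian(toy_model, At)
print(jac.shape)  # (2, 1, 2, 20, 5)

# Per-sample Jacobians sit on the "diagonal" over the batch axis;
# cross-sample blocks are zero because samples do not interact.
jac_sample0 = jac[0, :, 0]   # shape (1, 20, 5)
print(np.allclose(jac_sample0[0], W, atol=1e-6), np.allclose(jac[0, :, 1], 0, atol=1e-6))
```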

How do you implement neurons in ANN?

From what I understand about neural networks, you have a number of hidden layers, each consisting of X neurons. A neuron takes in a number of inputs and their respective weights, then uses an activation function (sigmoid in my case) to produce an output.
My task is to implement a network from scratch (only using numpy) with 2 hidden layers, a sigmoid activation function and 500 neurons in each hidden layer. What I don't understand is how I can implement the concept of neurons. According to this article, one neuron is when all inputs are weighted and passed into the activation function. So do I feed in the same inputs 500 times, with different weights each time (in the first layer, then again in the second)? I've also read this topic, where the following is said:
The neuron is nothing more than a set of inputs, a set of weights, and an activation function. The neuron translates these inputs into a single output, which can then be picked up as input for another layer of neurons later on.
So according to this, I indeed should weigh the inputs differently, 500 times, and then pass these forward to the next layer which will do the same. Am I understanding this correctly?
Here is the code I have written so far (it is very elementary but I did not want to proceed further before I clear this up), but have no idea how I would be implementing this:
class NeuralNetwork:
    def __init__(self, data, y, neurons, hidden):
        self.input = data
        self.y = y
        self.output = np.zeros(y.shape)
        self.layers = hidden
        self.neurons = neurons
        self.weights = self.generateWeightArray()
        print(self.weights)
    def generateWeightArray(self):
        weightarr = []
        #Last weight array is for inbetween hidden and output layer
        for i in range(self.layers + 1):
            weightarr.append(self.generateWeightMatrix())
        return np.asarray(weightarr)
    def generateWeightMatrix(self):
        return np.random.rand(self.input.shape[0], self.input.shape[1]-1)
    def sigmoid(self, x):
        return 1/(1+np.exp(-x))
    def dsigmoid(self, x):
        return self.sigmoid(x)*(1-self.sigmoid(x))
    def train(self):
        pass
    def run(self):
        #Since between each layer we have a matrix of weights, we can just keep going for the number of hidden
        #layers we have
        for i in range(self.layers):
            out = np.dot(self.input.transpose(), self.weights[i]).transpose() #step1
            self.input = self.sigmoid(out) #step2
        print(self.input)
net = NeuralNetwork(np.array([[1,2,3,4],[3,5,1,2],[5,6,7,8]]), np.array([1,0,1]), 500, 2)
net.run()
EDIT
I have changed my code as follows
class NeuralNetwork:
    def __init__(self, data, y, neurons, hidden):
        self.input = data
        self.y = y
        self.output = np.zeros(y.shape)
        self.layers = hidden
        self.neurons = neurons
        self.weights_to_hidden = np.random.rand(self.neurons, self.input.shape[1])
        self.weights = self.generateWeightArray()
        self.weights_to_output = np.random.rand(self.neurons,1)
        print(self.weights_to_output)
    #Generate a matrix with h+1 weight matrices, where h is the number of hidden layers (+1 for output)
    def generateWeightArray(self):
        weightarr = []
        #Last weight array is for inbetween hidden and output layer
        for i in range(self.layers):
            weightarr.append(self.generateWeightMatrix())
        return np.asarray(weightarr)
    #Generate a matrix with n columns and m rows, where n is the number of features and m is the number of neurons
    #in the layer
    def generateWeightMatrix(self):
        return np.random.rand(self.neurons, self.neurons)
    def sigmoid(self, x):
        return 1/(1+np.exp(-x))
    def dsigmoid(self, x):
        return self.sigmoid(x)*(1-self.sigmoid(x))
    def train(self):
        #2 hidden layers, then hidden -> output layer
        hidden_in = self.sigmoid(np.dot(self.input, self.weights_to_hidden.transpose()).transpose())
        print("Going into hidden layer:")
        print(hidden_in)
        for i in range(self.layers):
            in_hidden = self.sigmoid(np.dot(hidden_in.transpose(), self.weights[i]).transpose())
            print("After ",str(i+1), " hidden layer:")
            print(in_hidden)
        print("Output")
        out = self.sigmoid(np.dot(hidden_in.transpose(), self.weights_to_output).transpose())
        print(out)
net = NeuralNetwork(np.array([[1,2,3,4],[3,5,1,2],[5,6,7,8]]), np.array([1,0,1]), 5, 2)
net.train()
And the output after running is
[[0.89405222 0.89501672 0.89717842]]
I'm not sure if self.weights_to_output has the correct shape, though, because it's (n, 1), so all features (in each record) will have the same weight, rather than having 3 weights for each row (?)
The 'neurons' (usually called 'units' these days) are the activations in each layer. These are represented as a vector with one element for each unit. You will represent a layer's activations as a 1D array. So the short answer is that the neurons are elements in 1D arrays.
Let's look at the activation of one unit in layer 3 of a deep neural network:
a_3 = sigma(w_3 @ a_2 + b_3)
So:
a_3 will be a scalar — the activation for this unit;
sigma is the activation function (e.g. logistic function, tanh, or ReLU)
w_3 is the vector of weights for this layer (one element for each unit of layer 2)
a_2 is the vector of activations for the previous layer (one element for each unit of layer 2)
b_3 is the bias for this unit.
Note that w_3 and a_2 are both 1D arrays (vectors). The @ operator does matrix multiplication in Python 3.5+ (for example with NumPy arrays), and in this case it performs the dot product.
Now, thanks to the magic of linear algebra, it turns out we don't need to loop over each unit to compute all the activations. If we let the weight vector w_3 be a matrix, W_3 (note the capital letter), then it can represent the weights for all units in layer 2, connecting to all units in layer 3. Then:
a_3 = sigma(W_3 @ a_2 + b_3)
Now a_3 will be a vector.
The tricky part is keeping track of all the shapes. For W @ a to work, the shapes must be compatible. For example, imagine we have a network with 2 units in layer 2 (so a_2 is a 1D array with 2 elements) and 3 units in layer 3 (so a_3 needs to have 3 elements). Then W_3 needs to be 3 × 2 and a_2 needs to be 2 × 1, and the matrix multiply works. You can use np.reshape() and np.transpose() to get the shapes you need.
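To see that the matrix form is just the per-unit dot product done for every unit at once, here is a small NumPy sketch of the 2-unit/3-unit example, with arbitrary random values. With NumPy, a_2 can also stay a plain 1-D array of 2 elements: @ treats it as a vector, so no reshape is needed in this case.

```python
import numpy as np

def sigma(z):
    # logistic activation
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
a_2 = rng.normal(size=2)        # activations of the 2 units in layer 2
W_3 = rng.normal(size=(3, 2))   # one row of weights per unit in layer 3
b_3 = rng.normal(size=3)        # one bias per unit in layer 3

# One unit at a time: a dot product per unit, as in a_3 = sigma(w_3 @ a_2 + b_3)
a_3_loop = np.array([sigma(W_3[j] @ a_2 + b_3[j]) for j in range(3)])

# All units at once with the weight matrix
a_3 = sigma(W_3 @ a_2 + b_3)

print(a_3.shape, np.allclose(a_3, a_3_loop))  # (3,) True
```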
I hope this helps... it's a lot of words.
Maybe this diagram (from this article) helps explain:
The diagram doesn't say how many records there are. There are 3 features per data instance (i.e. we have an M × 3 input matrix). The input layer is 'just' another layer, you can think of the inputs as just another set of activations. You could think of x as a_0 (the 0-th layer).
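Putting this together for the network in the question (4 input features, two hidden layers of 500 sigmoid units, one output), a forward pass needs one weight matrix and one bias vector per layer, and the same inputs are indeed reused by all 500 units, each with its own column of weights. The sketch below stacks the records as rows, so it computes X @ W (each W of shape inputs × units) instead of the W @ a column-vector convention above; the two are transposes of each other. The random initialisation and the helper names init_layers and forward are illustrative assumptions, and training is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, rng):
    # One (n_in, n_out) weight matrix and one n_out bias vector per layer
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(X, layers):
    a = X                       # layer-0 "activations" are the inputs
    for W, b in layers:
        a = sigmoid(a @ W + b)  # every record (row) goes through all units at once
    return a

rng = np.random.default_rng(0)
X = np.array([[1, 2, 3, 4], [3, 5, 1, 2], [5, 6, 7, 8]], dtype=float)
layers = init_layers([4, 500, 500, 1], rng)
out = forward(X, layers)
print([W.shape for W, _ in layers])  # [(4, 500), (500, 500), (500, 1)]
print(out.shape)                     # (3, 1): one prediction per record
```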
This 3 Blue 1 Brown video is well worth watching too: https://www.youtube.com/watch?v=aircAruvnKk

Why does this TensorFlow example not have a summation before the activation function?

I'm trying to understand a TensorFlow code snippet. What I've been taught is that we sum all the incoming inputs and then pass them to an activation function. Shown in the picture below is a single neuron. Notice that we compute a weighted sum of the inputs and THEN compute the activation.
In most examples of the multi-layer perceptron, they don't include the summation step. I find this very confusing.
Here is an example of one of those snippets:
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 256 neurons
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    # Hidden fully connected layer with 256 neurons
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    # Output fully connected layer with a neuron for each class
    out_layer = tf.nn.relu(tf.matmul(layer_2, weights['out']) + biases['out'])
    return out_layer
In each layer, we first multiply the inputs by a weight matrix. Afterwards, we add the bias term. Then we pass the result to tf.nn.relu. Where does the summation happen? It looks like we've skipped it!
Any help would be really great!
The last layer of your model, out_layer, outputs the probabilities of each class, Prob(y=yi|X), and has shape [batch_size, n_classes]. To calculate these probabilities, the softmax function is applied. For each single input data point x that your model receives, it outputs a vector of probabilities y whose size is the number of classes. You then pick the class with the highest probability by applying argmax to the output vector, class=argmax(P(y|x)), which can be written in TensorFlow as y_pred = tf.argmax(out_layer, 1).
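A minimal NumPy sketch of that last step, assuming made-up logits for a batch of 2 examples and 3 classes: softmax turns each row into probabilities that sum to 1, and argmax along axis 1 picks the predicted class, mirroring tf.argmax(out_layer, 1). Subtracting the row maximum is a standard trick for numerical stability and does not change the result.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 1.0]])   # hypothetical out_layer values, shape [batch_size, n_classes]
probs = softmax(logits)
y_pred = np.argmax(probs, axis=1)     # same idea as tf.argmax(out_layer, 1)
print(probs.sum(axis=1), y_pred)      # [1. 1.] [0 1]
```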
Consider a network with a single layer. You have an input matrix X of shape [n_samples, x_dimension] and you multiply it by some matrix W that has shape [x_dimension, model_output]. The summation that you're talking about is the dot product between a row of matrix X and a column of matrix W. The output will then have shape [n_samples, model_output]. On this output you apply the activation function (if it is the final layer, you probably want softmax). Perhaps the picture that you've shown is a bit misleading.
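Mathematically, the layer without bias can be described as $Y = XW$. Suppose that the first row of matrix $X$ (the first row is a single input data point) is
$x_1 = [x_{11}, x_{12}, \dots, x_{1n}]$, where $n$ is x_dimension,
and the first column of $W$ is
$w_1 = [w_{11}, w_{21}, \dots, w_{n1}]^T$.
The result of this dot product is given by
$x_1 \cdot w_1 = \sum_{i=1}^{n} x_{1i} w_{i1} = x_{11} w_{11} + x_{12} w_{21} + \dots + x_{1n} w_{n1}$,
which is your summation. You repeat this for each column in matrix $W$, and the result is a vector of size model_output (which corresponds to the number of columns in $W$). To this vector you add the bias (if needed) and then apply the activation.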
The tf.matmul operator performs a matrix multiplication, which means that each element in the resulting matrix is a sum of products (which corresponds exactly to what you describe).
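To make that concrete, the sketch below computes X @ W once with NumPy's matrix multiply (the NumPy counterpart of tf.matmul) and once with explicit loops, where every output element is the weighted sum over x_dimension. Shapes and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, x_dimension, model_output = 4, 3, 2
X = rng.normal(size=(n_samples, x_dimension))
W = rng.normal(size=(x_dimension, model_output))

fast = X @ W  # matrix multiply, shape (n_samples, model_output)

# Explicit version: element (r, c) is the weighted sum of row r of X with column c of W
slow = np.zeros((n_samples, model_output))
for r in range(n_samples):
    for c in range(model_output):
        slow[r, c] = sum(X[r, i] * W[i, c] for i in range(x_dimension))

print(np.allclose(fast, slow))  # True
```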
Take a simple example with a row vector and a column vector, as would be the case if you had exactly one neuron and an input vector (as per the graphic you shared above). Note that tf.matmul needs both arguments to be at least 2-D:
x = [[2, 3, 1]]
y = [[3],
     [1],
     [2]]
Then the result would be:
tf.matmul(x, y) = [[2*3 + 3*1 + 1*2]] = [[11]]
There you can see the weighted sum.
P.S.: tf.multiply performs element-wise multiplication, which is not what we want here.
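The difference is easy to see in NumPy, whose np.multiply and np.matmul behave like tf.multiply and tf.matmul for these inputs: the element-wise product of inputs and weights only becomes the neuron's weighted sum after you add up its entries, whereas the matrix product does the multiplication and the summation in one step.

```python
import numpy as np

x = np.array([[2, 3, 1]])        # 1 x 3 row vector (inputs)
y = np.array([[3], [1], [2]])    # 3 x 1 column vector (weights)

weighted_sum = np.matmul(x, y)           # [[11]]: multiply and sum in one step
elementwise = np.multiply(x, y.T)        # [[6, 3, 2]]: products only, no sum
print(weighted_sum, elementwise, elementwise.sum())  # [[11]] [[6 3 2]] 11
```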

Tensorflow LSTM for noisy sequence

I tried to solve Experiment 3a described in the original LSTM paper here: http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf with tensorflow LSTM and failed
From the paper: The task is to observe and then classify input sequences. There are two classes, each occurring with probability 0.5. There is only one input line. Only the first N real-valued sequence elements convey relevant information about the class. Sequence elements at positions t > N are generated by a Gaussian with mean zero and variance 0.2.
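For anyone trying to reproduce this, here is a minimal NumPy sketch of a data generator for the task as quoted above: T steps, the first N of which carry the class, followed by Gaussian noise with mean 0 and variance 0.2 (so standard deviation sqrt(0.2)). How the class is encoded in the first N elements is an assumption here (+1.0 for label 1, -1.0 for label 0, a made-up choice); check the paper for the exact setup. The function name make_sequence is hypothetical.

```python
import numpy as np

def make_sequence(T, N, rng):
    """Return (sequence, label) for one training example.

    ASSUMPTION: the first N elements are +1.0 for label 1 and -1.0 for label 0;
    the paper's exact encoding may differ.
    """
    label = int(rng.integers(0, 2))              # two classes, each with probability 0.5
    seq = rng.normal(0.0, np.sqrt(0.2), size=T)  # noise: mean 0, variance 0.2
    seq[:N] = 1.0 if label == 1 else -1.0        # only the first N steps carry the class
    return seq, label

rng = np.random.default_rng(0)
seq, label = make_sequence(T=100, N=3, rng=rng)
print(seq.shape, label, seq[:3])
```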
The net architecture that he described in the paper:
"We use a 3-layer net with 1 input unit, 1 output unit, and 3 cell blocks of size 1. The output layer receives connections only from memory cells. Memory cells and gate units receive inputs from input units, memory cells and gate units, and have bias weights. Gate units and output unit are logistic sigmoid in [0; 1], h in [-1; 1], and g in [-2; 2]"
I tried to reproduce it with LSTM with 3 hidden units for T=100 and N=3 but failed.
I used online training (i.e. update the weights after each sequence) as described in the original paper
The core of my code was as follows:
self.batch_size = batch_size = config.batch_size
hidden_size = 3
self._input_data = tf.placeholder(tf.float32, (1, T))
self._targets = tf.placeholder(tf.float32, [1, 1])
lstm_cell = rnn_cell.BasicLSTMCell(hidden_size , forget_bias=1.0)
cell = rnn_cell.MultiRNNCell([lstm_cell] * 1)
self._initial_state = cell.zero_state(1, tf.float32)
weights_hidden = tf.constant(1.0, shape= [config.num_features, config.n_hidden])
# prepare the input
inputs = []
for k in range(num_steps):
    nextitem = tf.matmul(tf.reshape(self._input_data[:, k], [1, 1]), weights_hidden)
    inputs.append(nextitem)
outputs, states = rnn.rnn(cell, inputs, initial_state=self._initial_state)
# use the last output
pred = tf.sigmoid(tf.matmul(outputs[-1], tf.get_variable("weights_out", [config.n_hidden,1])) + tf.get_variable("bias_out", [1]))
self._final_state = states[-1]
self._cost = cost = tf.reduce_mean(tf.square((pred - self.targets)))
self._result = tf.abs(pred[0, 0] - self.targets[0,0])
optimizer = tf.train.GradientDescentOptimizer(learning_rate = config.learning_rate).minimize(cost)
Any idea why it couldn't learn?
My first instinct was to create 2 outputs one for each class but in the paper he specifically mentioned only one output unit.
Thanks
It seems that I needed forget_bias > 1.0. For long sequences, the network couldn't work with the default forget_bias; for T=50, for example, I needed forget_bias = 2.1.
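One rough way to see why a larger forget_bias can help on long sequences: in BasicLSTMCell the cell state is multiplied at every step by the forget gate sigmoid(f + forget_bias), so if the learned pre-activation f starts near zero, the fraction of the early signal that survives T steps is roughly sigmoid(forget_bias) ** T. This back-of-the-envelope sketch ignores the input and output gates and any learning, so treat it as intuition rather than an explanation of the result above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def retained_fraction(forget_bias, T, z=0.0):
    # Product of T identical forget-gate values, assuming a constant pre-activation z
    return sigmoid(z + forget_bias) ** T

for bias in (1.0, 2.1):
    print(bias, retained_fraction(bias, T=50))
```

With T=50 this gives on the order of 1e-7 for forget_bias = 1.0 versus a few times 1e-3 for forget_bias = 2.1, i.e. far more of the information from the first N steps can reach the end of the sequence.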

Tensorflow embedding_lookup

I am trying to learn the word representation of the imdb dataset "from scratch" through the TensorFlow tf.nn.embedding_lookup() function. If I understand it correctly, I have to set up an embedding layer before the other hidden layer, and then when I perform gradient descent, the layer will "learn" a word representation in the weights of this layer. However, when I try to do this, I get a shape error between my embedding layer and the first fully-connected layer of my network.
def multilayer_perceptron(_X, _weights, _biases):
    with tf.device('/cpu:0'), tf.name_scope("embedding"):
        W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),name="W")
        embedding_layer = tf.nn.embedding_lookup(W, _X)
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(embedding_layer, _weights['h1']), _biases['b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']))
    return tf.matmul(layer_2, weights['out']) + biases['out']
x = tf.placeholder(tf.int32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
pred = multilayer_perceptron(x, weights, biases)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred,y))
train_step = tf.train.GradientDescentOptimizer(0.3).minimize(cost)
init = tf.initialize_all_variables()
The error I get is:
ValueError: Shapes TensorShape([Dimension(None), Dimension(300), Dimension(128)])
and TensorShape([Dimension(None), Dimension(None)]) must have the same rank
The shape error arises because you are using a two-dimensional tensor, x, to index into a two-dimensional embedding tensor, W. Think of tf.nn.embedding_lookup() (and its close cousin tf.gather()) as taking each integer value i in x and replacing it with the row W[i, :]. From the error message, one can infer that n_input = 300 and embedding_size = 128. In general, the result of tf.nn.embedding_lookup() has a number of dimensions equal to rank(x) + rank(W) - 1, which in this case is 3. The error arises when you try to multiply this result by _weights['h1'], which is a (two-dimensional) matrix.
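The rank rule is easy to check with NumPy, where indexing a 2-D array with an integer array (W[x], or np.take(W, x, axis=0)) does the same row gathering. Only n_input = 300 and embedding_size = 128 come from the error message; vocab_size and batch_size are arbitrary example values.

```python
import numpy as np

vocab_size, embedding_size = 1000, 128   # vocab_size is an arbitrary example value
batch_size, n_input = 4, 300

rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(vocab_size, embedding_size))   # embedding matrix
x = rng.integers(0, vocab_size, size=(batch_size, n_input))     # word ids

embedded = W[x]                  # each id i is replaced by the row W[i, :]
print(embedded.shape)            # (4, 300, 128): rank(x) + rank(W) - 1 = 3
print(np.array_equal(embedded, np.take(W, x, axis=0)))  # True
```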
To fix this code, it depends on what you're trying to do, and why you are passing in a matrix of inputs to the embedding. One common thing to do is to aggregate the embedding vectors for each input example into a single row per example using an operation like tf.reduce_sum(). For example, you might do the following:
W = tf.Variable(
tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0) ,name="W")
embedding_layer = tf.nn.embedding_lookup(W, _X)
# Reduce along dimension 1 (`n_input`) to get a single vector (row)
# per input example.
embedding_aggregated = tf.reduce_sum(embedding_layer, [1])
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(
embedding_aggregated, _weights['h1']), _biases['b1']))
Another possible solution: instead of adding the embedding vectors, concatenate them into a single vector and increase the number of neurons in the hidden layer.
I used:
embedding_aggregated = tf.reshape(embedding_layer, [-1, embedding_size * sequence_length])
I also changed the number of neurons in the hidden layer to embedding_size * sequence_length.
Observation: accuracy also improved when using concatenation rather than addition.
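The two fixes differ in the size of the vector that reaches the first dense layer, which is why the concatenation approach also needs a wider first weight matrix. A NumPy sketch of both shapes, with made-up sizes (sequence_length plays the role of n_input here):

```python
import numpy as np

batch_size, sequence_length, embedding_size = 4, 300, 128
rng = np.random.default_rng(0)
embedding_layer = rng.normal(size=(batch_size, sequence_length, embedding_size))

# Summing: add the word vectors of each example -> one embedding_size vector per example
summed = embedding_layer.sum(axis=1)

# Concatenating: lay the word vectors side by side -> one long vector per example
concatenated = embedding_layer.reshape(-1, embedding_size * sequence_length)

print(summed.shape, concatenated.shape)  # (4, 128) (4, 38400)
```

Summation discards word order, while concatenation keeps each position separate but ties the first weight matrix to a fixed sequence_length.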
