I am new to CNTK, and using its awesome python API. I have problem figuring out how I may define a Recurrent Convolutional network layer since the Recurrence() seems to assume a regular network layer only.
Be more specific, I would like to have recurrence among convolutional layers.
Any pointer or even a simple example would be highly appreciated. Thank you.
There are two ways to do this in a meaningful way (i.e. without destroying the structure of natural images that convolutions rely on). The simplest is to just have an LSTM at the final layer i.e.
convnet = C.layers.Sequential([Convolution(...), MaxPooling(...), Convolution(...), ...])
z = C.layers.Sequential([convnet, C.layers.Recurrence(LSTM(100)), C.layers.Dense(10)])
for a 10-class problem.
The more complex way would be to define your own recurrent cell that only uses convolutions and thus respects the structure of natural images. To define a recurrent cell you need to write a function that takes the previous state and an input (i.e. a single frame if you are processing video) and outputs the next state and output. For example you can look into the implementation of the GRU in the CNTK layers module, and adapt it to use convolution instead of times everywhere. If this is what you want I can try to provide such an example. However, I encourage you to try the simple way first.
Update: I wrote a barebones convolutional GRU. You need to pay special attention to how the initial state is defined but otherwise it seems to work fine. Here's the layer definition
def ConvolutionalGRU(kernel_shape, outputs, activation=C.tanh, init=C.glorot_uniform(), init_bias=0, name=''):
conv_filter_shape = (outputs, C.InferredDimension) + kernel_shape
bias_shape = (outputs,1,1)
# parameters
bz = C.Parameter(bias_shape, init=init_bias, name='bz') # bias
br = C.Parameter(bias_shape, init=init_bias, name='br') # bias
bh = C.Parameter(bias_shape, init=init_bias, name='bc') # bias
Wz = C.Parameter(conv_filter_shape, init=init, name='Wz') # input
Wr = C.Parameter(conv_filter_shape, init=init, name='Wr') # input
Uz = C.Parameter(conv_filter_shape, init=init, name='Uz') # hidden-to-hidden
Ur = C.Parameter(conv_filter_shape, init=init, name='Hz') # hidden-to-hidden
Wh = C.Parameter(conv_filter_shape, init=init, name='Wc') # input
Uh = C.Parameter(conv_filter_shape, init=init, name='Hc') # hidden-to-hidden
# Convolutional GRU model function
def conv_gru(dh, x):
zt = C.sigmoid (bz + C.convolution(Wz, x) + C.convolution(Uz, dh)) # update gate z(t)
rt = C.sigmoid (br + C.convolution(Wr, x) + C.convolution(Ur, dh)) # reset gate r(t)
rs = dh * rt # hidden state after reset
ht = zt * dh + (1-zt) * activation(bh + C.convolution(Wh, x) + C.convolution(Uh, rs))
return ht
return conv_gru
and here is how to use it
x = C.sequence.input_variable(3,224,224))
z = C.layers.Recurrence(ConvolutionalGRU((3,3), 32), initial_state=C.constant(0, (32,224,224)))
y = z(x)
x0 = np.random.randn(16,3,224,224).astype('f') # a single seq. with 16 random "frames"
output = y.eval({x:x0})
output[0].shape
(16, 32, 224, 224)
Related
I've created a neural network to estimate the sin(x) function for an input x. The network has 21 output neurons (representing numbers -1.0, -0.9, ..., 0.9, 1.0) with numpy that does not learn, as I think I implemented the neuron architecture incorrectly when I defined the feedforward mechanism.
When I execute the code, the amount of test data it estimates correctly sits around 48/1000. This happens to be the average data point count per category if you split 1000 test data points between 21 categories. Looking at the network output, you can see that the network seems to just start picking a single output value for every input. For example, it may pick -0.5 as the estimate for y regardless of the x you give it. Where did I go wrong here? This is my first network. Thank you!
import random
import numpy as np
import math
class Network(object):
def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):
#Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
self.layer1_activations = np.zeros((hiddenLayerSize, 1))
self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
self.layer2_activations = np.zeros((outputLayerSize, 1))
self.outputLayerSize = outputLayerSize
self.inputLayerSize = inputLayerSize
self.hiddenLayerSize = hiddenLayerSize
# print(self.layer1)
# print()
# print(self.layer2)
# self.weights = [np.random.randn(y,x)
# for x, y in zip(sizes[:-1], sizes[1:])]
def feedforward(self, network_input):
#Propogate forward through network as if doing this by hand.
#first layer's output activations:
for neuron in range(self.hiddenLayerSize):
self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))
#second layer's output activations use layer1's activations as input:
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
#convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
return(outputs[np.argmax(self.layer2_activations)])
def train(self, training_pairs, epochs, minibatchsize, learn_rate):
#apply gradient descent
test_data = build_sinx_data(1000)
for epoch in range(epochs):
random.shuffle(training_pairs)
minibatches = [training_pairs[k:k + minibatchsize] for k in range(0, len(training_pairs), minibatchsize)]
for minibatch in minibatches:
loss = 0 #calculate loss for each minibatch
#Begin training
for x, y in minibatch:
network_output = self.feedforward(x)
loss += (network_output - y) ** 2
#adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output)*learn_rate
loss /= (2*len(minibatch))
adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
self.layer1 += adjustWeights
#print(adjustWeights)
self.layer2 += adjustWeights
#when line 63 placed here, results did not improve during minibatch.
print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
print("Training Complete")
def evaluate(self, test_data):
"""
Returns number of test inputs which network evaluates correctly.
The ouput assumed to be neuron in output layer with highest activation
:param test_data: test data set identical in form to train data set.
:return: integer sum
"""
correct = 0
for x, y in test_data:
output = self.feedforward(x)
if output == y:
correct+=1
return(correct)
def build_sinx_data(data_points):
"""
Creates a list of tuples (x value, expected y value) for Sin(x) function.
:param data_points: number of desired data points
:return: list of tuples (x value, expected y value
"""
x_vals = []
y_vals = []
for i in range(data_points):
#parameter of randint signifies range of x values to be used*10
x_vals.append(random.randint(-2000,2000)/10)
y_vals.append(round(math.sin(x_vals[i]),1))
return (list(zip(x_vals,y_vals)))
# training_pairs, epochs, minibatchsize, learn_rate
sinx_test = Network(1,21,21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600),20,10,2)
print(sinx_test.feedforward(10))
I didn't examine thoroughly all of your code, but some issues are clearly visible:
* operator doesn't perform matrix multiplication in numpy, you have to use numpy.dot. This affects, for instance, these lines: network_input * self.layer1[neuron], self.layer1_activations[weight]*self.layer2[neuron][weight], etc.
Seems like you are solving your problem via classification (selecting 1 out of 21 classes), but using L2 loss. This is somewhat mixed up. You have two options: either stick to classification and use a cross entropy loss function, or perform regression (i.e. predict the numeric value) with L2 loss.
You should definitely extract sigmoid function to avoid writing the same expression all over again:
def sigmoid(z):
return 1 / (1 + np.exp(-z))
def sigmoid_derivative(x):
return sigmoid(x) * (1 - sigmoid(x))
You perform the same update of self.layer1 and self.layer2, which clearly wrong. Take some time analyzing how exactly backpropagation works.
I edited how my loss function was integrated into my function and also correctly implemented gradient descent. I also removed the use of mini-batches and simplified what my network was trying to do. I now have a network which attempts to classify something as even or odd.
Some extremely helpful guides I used to fix things up:
Chapter 1 and 2 of Neural Networks and Deep Learning, by Michael Nielsen, available for free at http://neuralnetworksanddeeplearning.com/chap1.html . This book gives thorough explanations for how Neural Nets work, including breakdowns of the math behind their execution.
Backpropagation from the Beginning, by Erik Hallström, linked by Maxim. https://medium.com/#erikhallstrm/backpropagation-from-the-beginning-77356edf427d
. Not as thorough as the above guide, but I kept both open concurrently, as this guide is more to the point about what is important and how to apply the mathematical formulas that are thoroughly explained in Nielsen's book.
How to build a simple neural network in 9 lines of Python code https://medium.com/technology-invention-and-more/how-to-build-a-simple-neural-network-in-9-lines-of-python-code-cc8f23647ca1
. A useful and fast introduction to some neural networking basics.
Here is my (now functioning) code:
import random
import numpy as np
import scipy
import math
class Network(object):
def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):
#Layers represented both by their weights array and activation and inputsums vectors.
self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
self.layer1_activations = np.zeros((hiddenLayerSize, 1))
self.layer2_activations = np.zeros((outputLayerSize, 1))
self.layer1_inputsums = np.zeros((hiddenLayerSize, 1))
self.layer2_inputsums = np.zeros((outputLayerSize, 1))
self.layer1_errorsignals = np.zeros((hiddenLayerSize, 1))
self.layer2_errorsignals = np.zeros((outputLayerSize, 1))
self.layer1_deltaw = np.zeros((hiddenLayerSize, inputLayerSize))
self.layer2_deltaw = np.zeros((outputLayerSize, hiddenLayerSize))
self.outputLayerSize = outputLayerSize
self.inputLayerSize = inputLayerSize
self.hiddenLayerSize = hiddenLayerSize
print()
print(self.layer1)
print()
print(self.layer2)
print()
# self.weights = [np.random.randn(y,x)
# for x, y in zip(sizes[:-1], sizes[1:])]
def feedforward(self, network_input):
#Calculate inputsum and and activations for each neuron in the first layer
for neuron in range(self.hiddenLayerSize):
self.layer1_inputsums[neuron] = network_input * self.layer1[neuron]
self.layer1_activations[neuron] = self.sigmoid(self.layer1_inputsums[neuron])
# Calculate inputsum and and activations for each neuron in the second layer. Notice that each neuron in the second layer represented by
# weights vector, consisting of all weights leading out of the kth neuron in (l-1) layer to the jth neuron in layer l.
self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = self.sigmoid(self.layer2_inputsums[neuron])
return self.layer2_activations
def interpreted_output(self, network_input):
#convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
self.feedforward(network_input)
outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
return(outputs[np.argmax(self.layer2_activations)])
# def build_expected_output(self, training_data):
# #Views expected output number y for each x to generate an expected output vector from the network
# index=0
# for pair in training_data:
# expected_output_vector = np.zeros((self.outputLayerSize,1))
# x = training_data[0]
# y = training_data[1]
# for i in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1, 1):
# if y == i / 10:
# expected_output_vector[i] = 1
# #expect the target category to be a 1.
# break
# training_data[index][1] = expected_output_vector
# index+=1
# return training_data
def train(self, training_data, learn_rate):
self.backpropagate(training_data, learn_rate)
def backpropagate(self, train_data, learn_rate):
#Perform for each x,y pair.
for datapair in range(len(train_data)):
x = train_data[datapair][0]
y = train_data[datapair][1]
self.feedforward(x)
# print("l2a " + str(self.layer2_activations))
# print("l1a " + str(self.layer1_activations))
# print("l2 " + str(self.layer2))
# print("l1 " + str(self.layer1))
for neuron in range(self.outputLayerSize):
#Calculate first error equation for error signals of output layer neurons
self.layer2_errorsignals[neuron] = (self.layer2_activations[neuron] - y[neuron]) * self.sigmoid_prime(self.layer2_inputsums[neuron])
#Use recursive formula to calculate error signals of hidden layer neurons
self.layer1_errorsignals = np.multiply(np.array(np.matrix(self.layer2.T) * np.matrix(self.layer2_errorsignals)) , self.sigmoid_prime(self.layer1_inputsums))
#print(self.layer1_errorsignals)
# for neuron in range(self.hiddenLayerSize):
# #Use recursive formula to calculate error signals of hidden layer neurons
# self.layer1_errorsignals[neuron] = np.multiply(self.layer2[neuron].T,self.layer2_errorsignals[neuron]) * self.sigmoid_prime(self.layer1_inputsums[neuron])
#Partial derivative of C with respect to weight for connection from kth neuron in (l-1)th layer to jth neuron in lth layer is
#(jth error signal in lth layer) * (kth activation in (l-1)th layer.)
#Update all weights for network at each iteration of a training pair.
#Update weights in second layer
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_deltaw[neuron][weight] = self.layer2_errorsignals[neuron]*self.layer1_activations[weight]*(-learn_rate)
self.layer2 += self.layer2_deltaw
#Update weights in first layer
for neuron in range(self.hiddenLayerSize):
self.layer1_deltaw[neuron] = self.layer1_errorsignals[neuron]*(x)*(-learn_rate)
self.layer1 += self.layer1_deltaw
#Comment/Uncomment to enable error evaluation.
#print("Epoch {0}: Error: {1}".format(datapair, self.evaluate(test_data)))
# print("l2a " + str(self.layer2_activations))
# print("l1a " + str(self.layer1_activations))
# print("l1 " + str(self.layer1))
# print("l2 " + str(self.layer2))
def evaluate(self, test_data):
error = 0
for x, y in test_data:
#x is integer, y is single element np.array
output = self.feedforward(x)
error += y - output
return error
#eval function for sin(x)
# def evaluate(self, test_data):
# """
# Returns number of test inputs which network evaluates correctly.
# The ouput assumed to be neuron in output layer with highest activation
# :param test_data: test data set identical in form to train data set.
# :return: integer sum
# """
# correct = 0
# for x, y in test_data:
# outputs = [x / 10 for x in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1,
# 1)] # range(-10, 11, 1)
# newy = outputs[np.argmax(y)]
# output = self.interpreted_output(x)
# #print("output: " + str(output))
# if output == newy:
# correct+=1
# return(correct)
def sigmoid(self, z):
return 1 / (1 + np.exp(-z))
def sigmoid_prime(self, z):
return (1 - self.sigmoid(z)) * self.sigmoid(z)
def build_simple_data(data_points):
x_vals = []
y_vals = []
for each in range(data_points):
x = random.randint(-3,3)
expected_output_vector = np.zeros((1, 1))
if x > 0:
expected_output_vector[[0]] = 1
else:
expected_output_vector[[0]] = 0
x_vals.append(x)
y_vals.append(expected_output_vector)
print(list(zip(x_vals,y_vals)))
print()
return (list(zip(x_vals,y_vals)))
simpleNet = Network(1, 3, 1)
# print("Pretest")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
# init_weights_l1 = simpleNet.layer1
# init_weights_l2 = simpleNet.layer2
# simpleNet.train(build_simple_data(10000),.1)
# #sometimes Error converges to 0, sometimes error converges to 10.
# print("Initial Weights:")
# print(init_weights_l1)
# print(init_weights_l2)
# print("Final Weights")
# print(simpleNet.layer1)
# print(simpleNet.layer2)
# print("Post-test")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
def test_network(iterations,net,training_points):
"""
Casually evaluates pre and post test
:param iterations: number of trials to be run
:param net: name of network to evaluate.
;param training_points: size of training data to be used
:return: four 1x1 arrays.
"""
pretest_negative = 0
pretest_positive = 0
posttest_negative = 0
posttest_positive = 0
for each in range(iterations):
pretest_negative += net.feedforward(-10)
pretest_positive += net.feedforward(10)
net.train(build_simple_data(training_points),.1)
for each in range(iterations):
posttest_negative += net.feedforward(-10)
posttest_positive += net.feedforward(10)
return(pretest_negative/iterations, pretest_positive/iterations, posttest_negative/iterations, posttest_positive/iterations)
print(test_network(10000, simpleNet, 10000))
While much differs between this code and the code posted in the OP, there is a particular difference that is interesting. In the original feedforward method notice
#second layer's output activations use layer1's activations as input:
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
The line
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
Resembles
self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
In the updated code. This line performs the dot product between each weight vector and each input vector (the activations from layer 1) to arrive at the input_sum for a neuron, commonly referred to as z (think sigmoid(z)). In my network, the derivative of the sigmoid function, sigmoid_prime, is used to calculate the gradient of the cost function with respect to all the weights. By multiplying sigmoid_prime(z) * network error between actual and expected output. If z is very big (and positive), the neuron will have an activation value very close to 1. That means that the network is confident that that neuron should be activating. The same is true if z is very negative. The network, then, doesn't want to radically adjust weights that it is happy with, so the scale of the change in each weight for a neuron is given by the gradient of sigmoid(z), sigmoid_prime(z). Very large z means very small gradient and very small change applied to weights (the gradient of sigmoid is maximized at z = 0, when the network is unconfident about how a neuron should be categorized and when the activation for that neuron is 0.5).
Since I was continually adding on to each neuron's input_sum (z) and never resetting the value for new inputs of dot(weights, activations), the value for z kept growing, continually slowing the rate of change for the weights until weight modification grew to a standstill. I added the following line to cope with this:
self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
The new posted network can be copy and pasted into an editor and executed so long as you have the numpy module installed. The final line of output to print will be a list of 4 arrays representing final network output. The first two are the pretest values for a negative and positive input, respectively. These should be random. The second two are post-test values to determine how well the network classifies as positive and negative number. A number near 0 denotes negative, near 1 denotes positive.
import tensorflow as tf
x = tf.placeholder(tf.float32, [None,4]) # input vector
w1 = tf.Variable(tf.random_normal([4,2])) # weights between first and second layers
b1 = tf.Variable(tf.zeros([2])) # biases added to hidden layer
w2 = tf.Variable(tf.random_normal([2,1])) # weights between second and third layer
b2 = tf.Variable(tf.zeros([1])) # biases added to third (output) layer
def feedForward(x,w,b): # function for forward propagation
Input = tf.add(tf.matmul(x,w), b)
Output = tf.sigmoid(Input)
return Output
>>> Out1 = feedForward(x,w1,b1) # output of first layer
>>> Out2 = feedForward(Out1,w2,b2) # output of second layer
>>> MHat = 50*Out2 # final prediction is in the range (0,50)
>>> M = tf.placeholder(tf.float32, [None,1]) # placeholder for actual (target value of marks)
>>> J = tf.reduce_mean(tf.square(MHat - M)) # cost function -- mean square errors
>>> train_step = tf.train.GradientDescentOptimizer(0.05).minimize(J) # minimize J using Gradient Descent
>>> sess = tf.InteractiveSession() # create interactive session
>>> tf.global_variables_initializer().run() # initialize all weight and bias variables with specified values
>>> xs = [[1,3,9,7],
[7,9,8,2], # x training data
[2,4,6,5]]
>>> Ms = [[47],
[43], # M training data
[39]]
>>> for _ in range(1000): # performing learning process on training data 1000 times
sess.run(train_step, feed_dict = {x:xs, M:Ms})
>>> print(sess.run(MHat, feed_dict = {x:[[1,3,9,7]]}))
[[ 50.]]
>>> print(sess.run(MHat, feed_dict = {x:[[1,15,9,7]]}))
[[ 50.]]
>>> print(sess.run(tf.transpose(MHat), feed_dict = {x:[[1,15,9,7]]}))
[[ 50.]]
In this code, I am trying to predict the marks M of a student out of 50 given how many hours he/she slept, studied, used electronics and played. These 4 features come under the input feature vector x.
To solve this regression problem, I am using a deep neural network with
an input layer with 4 perceptrons (the input features) , a hidden layer with two perceptrons and an output layer with one perceptron. I have used sigmoid as activation function. But, I am getting the exact same prediction([[50.0]]) for M for all possible input vectors I feed in. Can someone please tell me
what is wrong with the code below. I HIGHLY APPRECIATE THE HELP! (IN ADVANCE)
You would need to modify your feedforward() function. Here you don't need to apply sigmoid() at last layer (simply return the activation function!) and also no need to multiply output of this function by 50.
def feedForward(X,W1,b1,W2,b2):
Z=tf.sigmoid(tf.matmul(X,W1)+b1)
return tf.matmul(Z,W2)+b2
MHat = feedForward(x,w1,b1,w2,b2)
Hope this helps!
Don't forget to let us know if it solved your problem :)
I have trained an LSTM model mapping 9 inputs to 7 outputs each time step using Tensorflow. Now I have reconstructed the model in Matlab, but the results (I plotted the trajectories) don't seem to be correct.
I did this by just load the trained weight-matrices of the LSTM network in Matlab. Furthermore I implemented the forward-propagation rules in Matlab by hand and plugged in the weight-matrices
function [outputs,states] = unfoldNetwork(obj,inputs)
numInputs = size(inputs,1);
input_output = [inputs(1,:)' ; zeros(obj.num_hidden,1)];
state = zeros(obj.num_hidden,1);
outputs = zeros(numInputs, obj.output_dim);
states = zeros(numInputs, obj.num_hidden);
for i=1:numInputs
[output,state] = fp(obj,state,input_output);
input_output = [inputs(i,:)' ; output];
outputs(i,:) = (obj.output_W * output + obj.output_b)'; % transfer to output-dimension
states(i,:) = state';
end
end
function [output,state] = fp(obj,state,input_output)
i = obj.sigmoid(obj.weights(1:obj.num_hidden,:) * input_output + obj.biases(1:obj.num_hidden));
g = tanh(obj.weights(obj.num_hidden+1:2*obj.num_hidden,:) * input_output + obj.biases(obj.num_hidden+1:2*obj.num_hidden));
f = obj.sigmoid(obj.weights(2*obj.num_hidden+1:3*obj.num_hidden,:) * input_output + obj.biases(2*obj.num_hidden+1:3*obj.num_hidden));
o = obj.sigmoid(obj.weights(3*obj.num_hidden+1:4*obj.num_hidden,:) * input_output + obj.biases(3*obj.num_hidden+1:4*obj.num_hidden));
state = f.*state + i.*g;
output = o.* tanh(state);
end
function [argout] = sigmoid(obj,x)
argout = 1./(1+exp(-x));
end
The network in Tensorflow is basically this
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(config.n_hidden, forget_bias=1.0, state_is_tuple=False)
# Get LSTM cell output
outputs, states = tf.nn.dynamic_rnn(lstm_cell_1, _X, initial_state=_S, dtype=tf.float32)
# output layer
# bring it to a shape for being able to multiply with output layer matrix
outputs = tf.reshape(outputs,[-1,config.n_hidden])
final_outputs = [tf.matmul(outputs, config.W['output']) + config.biases['output'] ]
final_outputs = tf.reshape(final_outputs,[batch_size,timesteps,config.n_output]) # convert it back to sequence
Ground Truth, Python prediction, and Matlab prediction
On the image, you can see at the beginning the prediction in Matlab do (a little bit shifted) correspond with the one in Python. After that, it gets somehow biased. I am pretty sure that the forward propagation rules and the weight matrices are correct. Because I did change some matrices or the forward-propagation rules just a little bit and after that everything gets really messy.
There is very helpful method in matlab called "getwb()".
For developers that coding neural network, this method returns the weights and biases at the final iteration.
I have neural network (using tensorflow tools).
There is possible to convert this method in some way?
I tried alot with tensorFlow.saver() and restore() but I don`t really understand this issue.
thanks!
EDIT:
my model is:
def neuralNetworkModel(x):
# first step: (input * weights) + bias, linear operation like y = ax + b
# each layer connection to other layer will represent by nodes(i) * nodes(i+1)
for i in range(0,numberOfLayers):
if i == 0:
hiddenLayers.append({"weights": tensorFlow.Variable(tensorFlow.random_normal([sizeOfRow, nodesLayer[i]])),
"biases": tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i]]))})
elif i > 0 and i < numberOfLayers-1:
hiddenLayers.append({"weights" : tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i], nodesLayer[i+1]])),
"biases" : tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i+1]]))})
else:
outputLayer = {"weights": tensorFlow.Variable(tensorFlow.random_normal([nodesLayer[i], classes])),
"biases": tensorFlow.Variable(tensorFlow.random_normal([classes]))}
# create the layers
for i in range(numberOfLayers):
if i == 0:
layers.append(tensorFlow.add(tensorFlow.matmul(x, hiddenLayers[i]["weights"]), hiddenLayers[i]["biases"]))
layers.append(tensorFlow.nn.relu(layers[i])) # pass values to activation function (i.e sigmoid, softmax) and add it to the layer
elif i >0 and i < numberOfLayers-1:
layers.append(tensorFlow.add(tensorFlow.matmul(layers[i-1], hiddenLayers[i]["weights"]), hiddenLayers[i]["biases"]))
layers.append(tensorFlow.nn.relu(layers[i]))
output = tensorFlow.matmul(layers[numberOfLayers-1], outputLayer["weights"]) + outputLayer["biases"]
finalOutput = output
return output
In your code you create a bunch of variables for weights and biases of hidden and output layers. You should be able to retrieve them at any moment (when a session is active) by using tf.Session.run() like follows:
import tensorflow as tf
tf.reset_default_graph()
v = tf.Variable(tf.random_normal((5, 5)))
init = tf.initialize_all_variables()
with tf.Session() as sess:
sess.run(init)
v_val = sess.run(v)
print v_val
I also advise to use tf.learn library that contains useful abstractions like fully_connected layer.
I currently have the following code for a series of chained together RNNs in tensorflow. I am not using MultiRNN since I was to do something later on with the output of each layer.
for r in range(RNNS):
with tf.variable_scope('recurent_%d' % r) as scope:
state = [tf.zeros((BATCH_SIZE, sz)) for sz in rnn_func.state_size]
time_outputs = [None] * TIME_STEPS
for t in range(TIME_STEPS):
rnn_input = getTimeStep(rnn_outputs[r - 1], t)
time_outputs[t], state = rnn_func(rnn_input, state)
time_outputs[t] = tf.reshape(time_outputs[t], (-1, 1, RNN_SIZE))
scope.reuse_variables()
rnn_outputs[r] = tf.concat(1, time_outputs)
Currently I have a fixed number of time steps. However I would like to change it to have only one timestep but remember the state between batches. I would therefore need to create a state variable for each layer and assign it the final state of each of the layers. Something like this.
for r in range(RNNS):
with tf.variable_scope('recurent_%d' % r) as scope:
saved_state = tf.get_variable('saved_state', ...)
rnn_outputs[r], state = rnn_func(rnn_outputs[r - 1], saved_state)
saved_state = tf.assign(saved_state, state)
Then for each of the layers I would need to evaluate the saved state in my sess.run function as well as calling my training function. I would need to do this for every rnn layer. This seems like kind of a hassle. I would need to track every saved state and evaluate it in run. Also then run would need to copy the state from my GPU to host memory which would be inefficient and unnecessary. Is there a better way of doing this?
Here is the code to update the LSTM's initial state, when state_is_tuple=True by defining state variables. It also supports multiple layers.
We define two functions - one for getting the state variables with an initial zero state and one function for returning an operation, which we can pass to session.run in order to update the state variables with the LSTM's last hidden state.
def get_state_variables(batch_size, cell):
# For each layer, get the initial state and make a variable out of it
# to enable updating its value.
state_variables = []
for state_c, state_h in cell.zero_state(batch_size, tf.float32):
state_variables.append(tf.contrib.rnn.LSTMStateTuple(
tf.Variable(state_c, trainable=False),
tf.Variable(state_h, trainable=False)))
# Return as a tuple, so that it can be fed to dynamic_rnn as an initial state
return tuple(state_variables)
def get_state_update_op(state_variables, new_states):
# Add an operation to update the train states with the last state tensors
update_ops = []
for state_variable, new_state in zip(state_variables, new_states):
# Assign the new state to the state variables on this layer
update_ops.extend([state_variable[0].assign(new_state[0]),
state_variable[1].assign(new_state[1])])
# Return a tuple in order to combine all update_ops into a single operation.
# The tuple's actual value should not be used.
return tf.tuple(update_ops)
We can use that to update the LSTM's state after each batch. Note that I use tf.nn.dynamic_rnn for unrolling:
data = tf.placeholder(tf.float32, (batch_size, max_length, frame_size))
cell_layer = tf.contrib.rnn.GRUCell(256)
cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers)
# For each layer, get the initial state. states will be a tuple of LSTMStateTuples.
states = get_state_variables(batch_size, cell)
# Unroll the LSTM
outputs, new_states = tf.nn.dynamic_rnn(cell, data, initial_state=states)
# Add an operation to update the train states with the last state tensors.
update_op = get_state_update_op(states, new_states)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run([outputs, update_op], {data: ...})
The main difference to this answer is that state_is_tuple=True makes the LSTM's state a LSTMStateTuple containing two variables (cell state and hidden state) instead of just a single variable. Using multiple layers then makes the LSTM's state a tuple of LSTMStateTuples - one per layer.
Resetting to zero
When using a trained model for prediction / decoding, you might want to reset the state to zero. Then, you can make use of this function:
def get_state_reset_op(state_variables, cell, batch_size):
# Return an operation to set each variable in a list of LSTMStateTuples to zero
zero_states = cell.zero_state(batch_size, tf.float32)
return get_state_update_op(state_variables, zero_states)
For example like above:
reset_state_op = get_state_reset_op(state, cell, max_batch_size)
# Reset the state to zero before feeding input
sess.run([reset_state_op])
sess.run([outputs, update_op], {data: ...})
I am now saving the RNN states using the tf.control_dependencies. Here is an example.
saved_states = [tf.get_variable('saved_state_%d' % i, shape = (BATCH_SIZE, sz), trainable = False, initializer = tf.constant_initializer()) for i, sz in enumerate(rnn.state_size)]
W = tf.get_variable('W', shape = (2 * RNN_SIZE, RNN_SIZE), initializer = tf.truncated_normal_initializer(0.0, 1 / np.sqrt(2 * RNN_SIZE)))
b = tf.get_variable('b', shape = (RNN_SIZE,), initializer = tf.constant_initializer())
rnn_output, states = rnn(last_output, saved_states)
with tf.control_dependencies([tf.assign(a, b) for a, b in zip(saved_states, states)]):
dense_input = tf.concat(1, (last_output, rnn_output))
dense_output = tf.tanh(tf.matmul(dense_input, W) + b)
last_output = dense_output + last_output
I just make sure that part of my graph is dependent on saving the state.
These two links are also related and useful for this question:
https://github.com/tensorflow/tensorflow/issues/2695
https://github.com/tensorflow/tensorflow/issues/2838