import tensorflow as tf
x = tf.placeholder(tf.float32, [None,4]) # input vector
w1 = tf.Variable(tf.random_normal([4,2])) # weights between first and second layers
b1 = tf.Variable(tf.zeros([2])) # biases added to hidden layer
w2 = tf.Variable(tf.random_normal([2,1])) # weights between second and third layer
b2 = tf.Variable(tf.zeros([1])) # biases added to third (output) layer
def feedForward(x,w,b): # function for forward propagation
    Input = tf.add(tf.matmul(x,w), b)
    Output = tf.sigmoid(Input)
    return Output
>>> Out1 = feedForward(x,w1,b1) # output of first layer
>>> Out2 = feedForward(Out1,w2,b2) # output of second layer
>>> MHat = 50*Out2 # final prediction is in the range (0,50)
>>> M = tf.placeholder(tf.float32, [None,1]) # placeholder for actual (target value of marks)
>>> J = tf.reduce_mean(tf.square(MHat - M)) # cost function -- mean square errors
>>> train_step = tf.train.GradientDescentOptimizer(0.05).minimize(J) # minimize J using Gradient Descent
>>> sess = tf.InteractiveSession() # create interactive session
>>> tf.global_variables_initializer().run() # initialize all weight and bias variables with specified values
>>> xs = [[1,3,9,7],
...       [7,9,8,2], # x training data
...       [2,4,6,5]]
>>> Ms = [[47],
...       [43], # M training data
...       [39]]
>>> for _ in range(1000): # performing learning process on training data 1000 times
...     sess.run(train_step, feed_dict = {x:xs, M:Ms})
>>> print(sess.run(MHat, feed_dict = {x:[[1,3,9,7]]}))
[[ 50.]]
>>> print(sess.run(MHat, feed_dict = {x:[[1,15,9,7]]}))
[[ 50.]]
>>> print(sess.run(tf.transpose(MHat), feed_dict = {x:[[1,15,9,7]]}))
[[ 50.]]
In this code, I am trying to predict a student's marks M (out of 50) from how many hours he/she slept, studied, used electronics and played. These 4 features make up the input feature vector x.
To solve this regression problem, I am using a neural network with an input layer of 4 units (the input features), a hidden layer with two perceptrons and an output layer with one perceptron. I have used sigmoid as the activation function. However, I get exactly the same prediction ([[50.0]]) for M no matter which input vector I feed in. Can someone please tell me what is wrong with the code above? I highly appreciate the help (in advance)!
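You need to modify your feedForward() function. For a regression output you don't need to apply sigmoid() at the last layer (simply return the linear output, without an activation function), and there is no need to multiply the output of this function by 50. A sigmoid output saturates easily: once its input is large, 50*sigmoid(...) sits at 50 and its gradient is close to zero, so training can hardly move it.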
def feedForward(X,W1,b1,W2,b2):
    Z=tf.sigmoid(tf.matmul(X,W1)+b1)
    return tf.matmul(Z,W2)+b2
MHat = feedForward(x,w1,b1,w2,b2)
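To see why the original 50*sigmoid output can get stuck at 50, here is a small sketch in plain Python (no TensorFlow). The helper names are made up for illustration, and the pre-activation values are arbitrary examples rather than numbers taken from the actual run:

```python
import math

def sigmoid(z):
    """Logistic function; fine for the moderate z values used here."""
    return 1.0 / (1.0 + math.exp(-z))

def scaled_sigmoid_output(z, scale=50.0):
    """Last layer of the original model: scale * sigmoid(z)."""
    return scale * sigmoid(z)

def scaled_sigmoid_grad(z, scale=50.0):
    """Derivative of scale * sigmoid(z) with respect to z."""
    s = sigmoid(z)
    return scale * s * (1.0 - s)

# Arbitrary example pre-activations: the larger z gets, the closer the
# output is to 50 and the smaller the gradient becomes.
for z in (0.0, 5.0, 20.0):
    print(z, scaled_sigmoid_output(z), scaled_sigmoid_grad(z))
```

If a few large early updates push the pre-activation far to the right, the output rounds to 50 and the gradient is too small to pull it back, which would match the constant [[ 50.]] predictions. A linear output layer does not have this flat region.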
Hope this helps!
Don't forget to let us know if it solved your problem :)
I was wondering whether it is possible to train a neural network on its inputs part by part. For example, suppose I have a neural network with 256 inputs and 256 outputs. What I am asking about is the possibility of taking groups, where each group contains only 16 of the 256 inputs, predicting each group with a single independently trained model, and then concatenating all the groups into the final output.
For example, consider the code below:
from matplotlib import pyplot as plt
import tensorflow as tf
tf.reset_default_graph()
x_train = [[0.,0.],[1.,1.],[1.,0.],[0.,1.]]
y_train = [[0.],[0.],[1.],[1.]]
x_test = [[0.,0.],[.5,.5],[.5,0.],[0.,.5]]
y_test = [[0.],[0.],[2.],[2.]]
# use placeholder instead so you can have different inputs
x = tf.placeholder('float32', [None, 2])
y = tf.placeholder('float32',)
# Layer 1 = the 2x3 hidden sigmoid
m1 = tf.Variable(tf.random_uniform([2,3], minval=0.1, maxval=0.9, dtype=tf.float32))
b1 = tf.Variable(tf.random_uniform([3], minval=0.1, maxval=0.9, dtype=tf.float32))
h1 = tf.sigmoid(tf.matmul(x, m1) + b1)
# Layer 2 = the 3x1 sigmoid output
m2 = tf.Variable(tf.random_uniform([3,1], minval=0.1, maxval=0.9, dtype=tf.float32))
b2 = tf.Variable(tf.random_uniform([1], minval=0.1, maxval=0.9, dtype=tf.float32))
y_out = tf.sigmoid(tf.matmul(h1, m2) + b2)
### loss
# loss : sum of the squares of y0 - y_out
loss = tf.reduce_sum(tf.square(y - y_out))
# training step : gradient decent (1.0) to minimize loss
train = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
# the two feed dictionaries
feeddict_train = {x: x_train, y: y_train}
feeddict_test = {x: x_test, y: y_test}
### training
# run 500 times using all the X and Y
# print out the loss and any other interesting info
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_loss, test_loss = [], []
    for step in range(500):
        loss_train, _ = sess.run([loss, train], feed_dict=feeddict_train)
        train_loss.append(loss_train)
        # under the same tensorflow graph (in the session), use another feed dictionary
        loss_test = sess.run(loss, feed_dict=feeddict_test)
        test_loss.append(loss_test)
plt.plot(train_loss, 'r', label='train_loss')
plt.plot(test_loss, 'b', label='test_loss')
plt.legend(loc='best')
Here, in the command loss_test = sess.run(loss, feed_dict=feeddict_test), the whole of feeddict_test is fed in and evaluated at once. What if I want to split it into two groups, each containing only 2 of the 4 available items, evaluate them independently and concatenate the outputs? Is that possible?
How can I do that? Could you please help me with this, if possible?
Thank you in advance.
Your question is ambiguous, so it can be interpreted in a few ways.
First interpretation:
If you are asking whether a network that takes an input vector of size 256 and outputs a vector of size 256 can be fed only part of that vector, then the answer is no: you can't feed in part of the vector and expect it to work.
Second interpretation:
If you are asking whether, given 256 data points (each an n-sized vector), you can train the network by feeding in the first 16, then the second 16, and so on until the 16th group of 16, then yes, that is very much possible. Based on the example code you've given, all you need is a for loop that runs 2 times (because in your example there are 4 data points and you want to feed them in groups of 2).
Change these lines of code:
for step in range(500):
    loss_train, _ = sess.run([loss, train], feed_dict=feeddict_train)
to
for step in range(500):
    temp_list = [] # an empty list
    for i in range(0,4,2):
        loss_train, _ = sess.run([loss, train], feed_dict={x:x_train[i:i+2], y:y_train[i:i+2]})
        temp_list.append(loss_train) # append the loss of the network for each group of data
This allows the network to be trained on the two groups of data separately and to learn from both. You can simply create an empty list before the new for loop and concatenate the outputs in it.
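The same slicing idea works for evaluation. The sketch below is plain Python with a made-up toy_model standing in for a call such as sess.run(y_out, ...), so the function names and the XOR-like rule are assumptions for illustration only:

```python
def iter_groups(inputs, targets, group_size):
    """Yield (inputs, targets) slices of at most group_size rows each."""
    for start in range(0, len(inputs), group_size):
        stop = start + group_size
        yield inputs[start:stop], targets[start:stop]

def toy_model(batch):
    """Hypothetical stand-in for the network: an XOR-like rule per row."""
    return [[float(a != b)] for a, b in batch]

x_data = [[0., 0.], [1., 1.], [1., 0.], [0., 1.]]
y_data = [[0.], [0.], [1.], [1.]]

all_outputs = []
for x_group, y_group in iter_groups(x_data, y_data, group_size=2):
    # Each group is processed on its own; outputs are concatenated in order.
    all_outputs.extend(toy_model(x_group))

print(all_outputs)
```

Because the groups are taken in order and appended in order, the concatenated result lines up row by row with the original data.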
Hope this helps. Do let me know if I understood your questions wrongly. Cheers.
I am new to TensorFlow and to more advanced machine learning, so I tried to get a better grasp of RNNs by implementing one by hand instead of using tf.contrib.rnn.RNNCell. My first problem was that I needed to unroll the net for backpropagation, so I looped through my sequence. I needed to keep the weights and biases consistent across time steps, so I couldn't reinitialize a dense layer with tf.layers.dense each time; but I also needed the layer to be connected to the current time step of my sequence, and I couldn't find a way to change what a dense layer is connected to. To work around this I tried to implement my own version of tf.layers.dense, and this worked fine until I got the error NotImplementedError("Trying to update a Tensor " ...) when I tried to optimize my custom dense layers.
My code:
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn
import random
# -----------------
# WORD PARAMETERS
# -----------------
target_string = ['Hello ','Hello ','World ','World ', '!']
number_input_words = 1
# --------------------------
# TRAINING HYPERPARAMETERS
# --------------------------
training_steps = 4000
batch_size = 9
learning_rate = 0.01
display_step = 150
hidden_cells = 20
# ----------------------
# PREPARE DATA AS DICT
# ----------------------
# TODO AUTOMATICALLY CREATE DICT
dictionary = {'Hello ': 0, 'World ': 1, '!': 2}
reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
vocab_size = len(dictionary)
# ------------
# LSTM MODEL
# ------------
class LSTM:
    def __init__(self, sequence_length, number_input_words, hidden_cells, mem_size_x, mem_size_y, learning_rate):
        self.sequence = tf.placeholder(tf.float32, (sequence_length, vocab_size), 'sequence')
        self.memory = tf.zeros([mem_size_x, mem_size_y])
        # sequence_length = self.sequence.shape[0]
        units = [vocab_size, 5,4,2,6, vocab_size]
        weights = [tf.random_uniform((units[i-1], units[i])) for i in range(len(units))[1:]]
        biases = [tf.random_uniform((1, units[i])) for i in range(len(units))[1:]]
        self.total_loss = 0
        self.outputs = []
        for word in range(sequence_length-1):
            sequence_w = tf.reshape(self.sequence[word], [1, vocab_size])
            layers = []
            for i in range(len(weights)):
                if i == 0:
                    layers.append(tf.matmul(sequence_w, weights[0]) + biases[0])
                else:
                    layers.append(tf.matmul(layers[i-1], weights[i]) + biases[i])
            percentages = tf.nn.softmax(logits=layers[-1])
            self.outputs.append(percentages)
            self.total_loss += tf.losses.absolute_difference(tf.reshape(self.sequence[word+1], (1, vocab_size)), tf.reshape(percentages, (1, vocab_size)))
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        self.train_operation = optimizer.minimize(loss=self.total_loss, var_list=weights+biases, global_step=tf.train.get_global_step())
lstm = LSTM(len(target_string), number_input_words, hidden_cells, 10, 5, learning_rate)
# ---------------
# START SESSION
# ---------------
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())
    sequence = []
    for i in range(len(target_string)):
        x = [0]*vocab_size
        x[dictionary[target_string[i]]] = 1
        sequence.append(x)
    print(sequence)
    for x in range(1000):
        sess.run(lstm.train_operation, feed_dict={lstm.sequence: sequence})
    prediction, loss = sess.run((lstm.outputs, lstm.total_loss), feed_dict= {lstm.sequence: sequence})
    print(prediction)
    print(loss)
Any answer that tells me either how to connect tf.layers.dense to different variables each time or how to get around my NotImplementedError would be greatly appreciated. I apologize if this question is lengthy or badly worded; I'm still new to Stack Overflow.
EDIT:
I've updated the LSTM class part of my code to:
(inside __init__)
self.sequence = [tf.placeholder(tf.float32, (batch_size, vocab_size), 'sequence') for _ in range(sequence_length-1)]
self.total_loss = 0
self.outputs = []
rnn_cell = rnn.BasicLSTMCell(hidden_cells)
h = tf.zeros((batch_size, hidden_cells))
for i in range(sequence_length-1):
    current_sequence = self.sequence[i]
    h = rnn_cell(current_sequence, h)
    self.outputs.append(h)
But I still get an error on the line: h = rnn_cell(current_sequence, h) about not being able to iterate over tensors. I'm not trying to iterate over any tensors, and if I am I don't mean to.
There is a standard way of approaching this (the best approach I know of): instead of trying to create a new list of dense layers, do the following. First, let's assume your hidden layer size is h_dim, the number of steps to unroll is num_unroll and the batch size is batch_size.
In a for loop, calculate the output of the RNNCell for each unrolled input:
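state = tf.zeros(...) # initial state; for an LSTM cell, rnn_cell.zero_state(batch_size, tf.float32) builds the (c, h) state tuple
outputs = []
for ui in range(num_unroll):
    out, state = rnn_cell(x[ui], state) # the cell returns (output, new_state)
    outputs.append(out)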
Now concatenate all the outputs into a single tensor of size [batch_size*num_unroll, h_dim].
Send this through a single dense layer of size [h_dim, num_classes]:
logits = tf.matmul(tf.concat(outputs,...), w) + b
predictions = tf.nn.softmax(logits)
You now have the logits for all the unrolled inputs. It is then just a matter of reshaping the tensor to [num_unroll, batch_size, num_classes] (the rows are ordered step by step if the outputs were concatenated along axis 0) and transposing it if you want a [batch_size, num_unroll, num_classes] tensor.
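The row ordering after concatenation is easy to get wrong, so here is a plain-Python sketch of the bookkeeping. It assumes the per-step outputs are concatenated along axis 0; the helper names are invented for illustration and nested lists stand in for tensors:

```python
def concat_steps(outputs):
    """Concatenate per-step [batch, dim] lists along axis 0."""
    return [row for step in outputs for row in step]

def split_steps(flat, num_unroll, batch_size):
    """Undo concat_steps: result[t][b] is the row for step t, example b."""
    return [flat[t * batch_size:(t + 1) * batch_size] for t in range(num_unroll)]

def to_batch_major(time_major):
    """Swap the first two axes: [time][batch] -> [batch][time]."""
    return [list(per_example) for per_example in zip(*time_major)]

# Labels instead of numbers so the ordering is visible.
outputs = [[["t0b0"], ["t0b1"]],
           [["t1b0"], ["t1b1"]],
           [["t2b0"], ["t2b1"]]]

flat = concat_steps(outputs)              # 6 rows, ordered step by step
time_major = split_steps(flat, num_unroll=3, batch_size=2)
batch_major = to_batch_major(time_major)  # one list of steps per example
print(batch_major)
```

Under that assumption, row t*batch_size + b of the flat tensor belongs to step t and example b, which is why the reshape recovers a time-major layout first.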
Edited (Feeding in Data): The data will be presented in the form of a list of num_unroll many placeholders. So,
x = [tf.placeholder(shape=[batch_size,3]...) for ui in range(num_unroll)]
Now say you have data like below,
Hello world bye
Bye hello world
Here the batch size is 2 and the sequence length is 3. Once converted to one-hot encoding, your data looks like this (shape [time_steps, batch_size, 3]):
data = [ [ [1,0,0], [0,0,1] ], [ [0,1,0], [1,0,0] ], [ [0,0,1], [0,1,0] ] ]
Now feed data in, in the following format.
feed_dict = {}
for ui in range(3):
    feed_dict[x[ui]] = data[ui]
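If it helps, the conversion from the two sentences to that time-major one-hot layout can be sketched in plain Python. The vocabulary mapping and the helper names are assumptions chosen so that the result matches the data shown above:

```python
def one_hot(index, size):
    """Return a list of zeros with a single 1 at position index."""
    return [1 if i == index else 0 for i in range(size)]

def to_time_major(sentences, vocab):
    """Return data[t][b] = one-hot vector of word t in sentence b."""
    tokenised = [sentence.lower().split() for sentence in sentences]
    num_steps = len(tokenised[0])
    if any(len(words) != num_steps for words in tokenised):
        raise ValueError("all sentences are assumed to have the same length")
    return [[one_hot(vocab[words[t]], len(vocab)) for words in tokenised]
            for t in range(num_steps)]

vocab = {"hello": 0, "world": 1, "bye": 2}  # assumed word-to-index mapping
data = to_time_major(["Hello world bye", "Bye hello world"], vocab)
print(data)
```

Each data[ui] then has shape [batch_size, 3] and can be fed to the ui-th placeholder.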
Hi,
I am trying to write a small program to solve a regression problem. My dataset consists of 4 random x values (x1, x2, x3 and x4) and 1 y value. One of the rows looks like this:
0.634585 0.552366 0.873447 0.196890 8.75
I now want to predict the y value as closely as possible, so after training I would like to evaluate how good my model is by showing the loss. Unfortunately I always receive
Training cost= nan
The most important lines of code are:
X_data = tf.placeholder(shape=[None, 4], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
# Input neurons : 4
# Hidden neurons : 8
# Output neurons : 1
hidden_layer_nodes = 8
w1 = tf.Variable(tf.random_normal(shape=[4,hidden_layer_nodes])) # Inputs -> Hidden Layer1
b1 = tf.Variable(tf.random_normal(shape=[hidden_layer_nodes])) # First Bias
w2 = tf.Variable(tf.random_normal(shape=[hidden_layer_nodes,1])) # Hidden layer2 -> Outputs
b2 = tf.Variable(tf.random_normal(shape=[1])) # Third Bias
hidden_output = tf.nn.relu(tf.add(tf.matmul(X_data, w1), b1))
final_output = tf.nn.relu(tf.add(tf.matmul(hidden_output, w2), b2))
loss = tf.reduce_mean(-tf.reduce_sum(y_target * tf.log(final_output), axis=0))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
steps = 10000
with tf.Session() as sess:
    sess.run(init)
    for i in range(steps):
        sess.run(train,feed_dict={X_data:X_train,y_target:y_train})
        # PRINT OUT A MESSAGE EVERY 500 STEPS
        if i%500 == 0:
            print('Currently on step {}'.format(i))
    training_cost = sess.run(loss, feed_dict={X_data:X_test,y_target:y_test})
    print("Training cost=", training_cost)
Maybe someone knows where my mistake is or even better, how to constantly show the error during my training :) I know how this is done with the tf.estimator, but not without. If you need the dataset, let me know.
Cheers!
This is probably because the ReLU activation function lets the gradients explode. You therefore need to reduce the learning rate accordingly. You can also try a different activation function (for this you may have to normalize your dataset first).
A similar problem is discussed in the question "In simple multi-layer FFNN only ReLU activation function doesn't converge". Read the answer there and you will understand.
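Separately from the exploding-gradient explanation, the loss itself may also produce nan: it takes the log of a ReLU output, and a ReLU output can be exactly 0. The plain-Python sketch below illustrates that possibility. It is an assumption about the cause, not something confirmed for this dataset, and the pre-activation value and helper names are made up:

```python
import math

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def log_or_neg_inf(v):
    """math.log raises on 0; return -inf instead, as floating-point log does."""
    return math.log(v) if v > 0 else float("-inf")

def d_log(v):
    """Derivative of log(v); +inf at v == 0 (IEEE-style 1/0)."""
    return 1.0 / v if v > 0 else float("inf")

y = 8.75    # target from the sample row in the question
z = -0.3    # hypothetical negative pre-activation of the output unit
pred = relu(z)                                # 0.0
log_loss = -y * log_or_neg_inf(pred)          # infinite loss
grad_z = -y * d_log(pred) * relu_grad(z)      # -inf * 0.0 gives nan
mse = (y - pred) ** 2                         # squared error stays finite

print(log_loss, grad_z, mse)
```

For a regression target such as 8.75, a squared-error loss on a linear output avoids the log altogether, so it is a reasonable thing to try alongside a smaller learning rate.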
Hope this helps.
I am trying to implement simple autoencoder like below.
The number of input features is 2, and I want to build a sparse autoencoder that reduces the dimension to 1 feature. I chose layer sizes of 2 (input), 8 (hidden), 1 (reduced feature), 8 (hidden), 2 (output) to add some more complexity than using only (2, 1, 2) nodes. The number of samples N is around 10000.
'DATA' is just a 2x10000 matrix containing integer values.
import tensorflow as tf
x = tf.placeholder(shape=[None, 2])
w1 = tf.Variable(tf.random_normal(shape=[2, 8]))
w2 = tf.Variable(tf.random_normal(shape=[8, 1]))
h1 = tf.nn.relu(tf.matmul(x, w1))
encoded = tf.matmul(h1, w2)
h2 = tf.nn.relu(encoded)
h3 = tf.nn.relu(tf.matmul(h2, tf.transpose(w2)))
y = tf.matmul(h3, tf.transpose(w1))
mse = tf.reduce_mean(tf.squared_difference(x, y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(mse)
sess = tf.Session()
sess.run(init)
fd = {x: DATA}
loss_value, reduced_feature = sess.run([mse, encoded], feed_dict=fd)
I have 2 questions about the implementation, as the result was quite different from what I expected.
Is this implementation correct? Will the variable 'reduced_feature' show the reduced feature(1d feature) from 2 feature inputs?
Should I add some sparsity condition if I want to use more hidden nodes than input? If yes, can you show some sample code for this task?
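For the second question, one common form of sparsity condition is an L1 penalty on the hidden activations added to the reconstruction error. The sketch below shows the arithmetic in plain Python; the function names, the penalty weight and the toy numbers are all assumptions for illustration, not part of the code above:

```python
def mean_squared_error(inputs, reconstructions):
    """Mean of squared element-wise differences over all samples and features."""
    total, count = 0.0, 0
    for x_row, y_row in zip(inputs, reconstructions):
        for x, y in zip(x_row, y_row):
            total += (x - y) ** 2
            count += 1
    return total / count

def l1_activity(hidden):
    """Mean absolute hidden activation; small when most units are near zero."""
    values = [abs(h) for row in hidden for h in row]
    return sum(values) / len(values)

def sparse_loss(inputs, reconstructions, hidden, sparsity_weight=0.1):
    """Reconstruction error plus a weighted L1 sparsity penalty."""
    return (mean_squared_error(inputs, reconstructions)
            + sparsity_weight * l1_activity(hidden))

# Toy numbers, chosen only to make the arithmetic easy to follow.
inputs = [[1.0, 2.0], [3.0, 4.0]]
reconstructions = [[1.0, 2.0], [3.0, 2.0]]
hidden = [[0.0, 2.0], [4.0, 0.0]]
loss_value = sparse_loss(inputs, reconstructions, hidden)
print(loss_value)
```

In the graph above this would roughly correspond to minimizing mse + sparsity_weight * tf.reduce_mean(tf.abs(h1)) instead of mse alone, with sparsity_weight tuned by hand.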
Suppose I have a trained RNN (e.g. a language model) and I want to see what it would generate on its own. How should I feed its output back into its input?
I read the following related questions:
TensorFlow using LSTMs for generating text
TensorFlow LSTM Generative Model
Theoretically it is clear to me, that in tensorflow we use truncated backpropagation, so we have to define the max step which we would like to "trace". Also we reserve a dimension for batches, therefore if I'd like to train a sine wave, I have to feed [None, num_step, 1] inputs.
The following code works:
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
tf.reset_default_graph()
n_samples=100
state_size=5
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, n_samples, 1])
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
Y = np.roll(def_x, 1)
loss = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
opt = tf.train.AdamOptimizer().minimize(loss)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# Initial state run
plt.show(plt.plot(output.eval()[0]))
plt.plot(def_x.squeeze())
plt.show(plt.plot(pred.eval().squeeze()))
steps = 1001
for i in range(steps):
    p, l, _= sess.run([pred, loss, opt])
The state size of the LSTM can be varied, also I experimented with feeding sine wave into the network and zeros, and in both cases it converged in ~500 iterations. So far I have understood that in this case the graph consists n_samples number of LSTM cells sharing their parameters, and it is only up to me that I feed input to them as a time series. However when generating samples the network is explicitly depending on its previous output - meaning that I cannot feed the unrolled model at once. I tried to compute the state and output at every step:
with tf.variable_scope('sine', reuse=True):
    X_test = tf.placeholder(tf.float64)
    X_reshaped = tf.reshape(X_test, [1, -1, 1])
    output, last_states = tf.nn.dynamic_rnn(lstm_cell, X_reshaped, dtype=tf.float64)
    pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
test_vals = [0.]
for i in range(1000):
    val = pred.eval({X_test:np.array(test_vals)[None, :, None]})
    test_vals.append(val)
However in this model it seems that there is no continuity between the LSTM cells. What is going on here?
Do I have to initialize a zero array with i.e. 100 time steps, and assign each run's result into the array? Like feeding the network with this:
run 0: input_feed = [0, 0, 0 ... 0]; res1 = result
run 1: input_feed = [res1, 0, 0 ... 0]; res2 = result
run 2: input_feed = [res1, res2, 0 ... 0]; res3 = result
etc...
What to do if I want to use this trained network to use its own output as its input in the following time step?
If I understood you correctly, you want to find a way to feed the output of time step t as input to time step t+1, right? To do so, there is a relatively easy work around that you can use at test time:
Make sure your input placeholders can accept a dynamic sequence length, i.e. the size of the time dimension is None.
Make sure you are using tf.nn.dynamic_rnn (which you do in the posted example).
Pass the initial state into dynamic_rnn.
Then, at test time, you can loop through your sequence and feed each time step individually (i.e. max sequence length is 1). Additionally, you just have to carry over the internal state of the RNN. See pseudo code below (the variable names refer to your code snippet).
I.e., change the definition of the model to something like this:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, None, 1]) # [batch_size, seq_length, dimension of input]
batch_size = tf.shape(X)[0] # dynamic batch size taken from the input
initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float64) # same dtype as X
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64,
                                         initial_state=initial_state)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
Then you can perform inference like so:
fetches = {'final_state': last_states,
           'prediction': pred}
toy_initial_input = np.array([[[1]]]) # put suitable data here
seq_length = 20 # put whatever is reasonable here for you
# get the output for the first time step
feed_dict = {X: toy_initial_input}
eval_out = sess.run(fetches, feed_dict)
outputs = [eval_out['prediction']]
next_state = eval_out['final_state']
for i in range(1, seq_length):
    # feed the previous prediction as input and carry over the RNN state
    feed_dict = {X: outputs[-1],
                 initial_state: next_state}
    eval_out = sess.run(fetches, feed_dict)
    outputs.append(eval_out['prediction'])
    next_state = eval_out['final_state']
# outputs now contains the sequence you want
Note that this also works for batches; however, it can be a bit more complicated if you have sequences of different lengths in the same batch.
If you want to perform this kind of prediction not only at test time, but also at training time, it is also possible to do, but a bit more complicated to implement.
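Stripped of TensorFlow, the inference loop above boils down to the following pattern. This is a plain-Python sketch with a made-up one-unit cell (toy_cell and its weights are assumptions for illustration), not the LSTM from the question:

```python
import math

def toy_cell(x, state, w_x=0.5, w_h=0.8):
    """Hypothetical one-unit recurrent cell: returns (output, new_state)."""
    new_state = math.tanh(w_x * x + w_h * state)
    return new_state, new_state

def generate(seed, num_steps, initial_state=0.0):
    """Feed each output back in as the next input, carrying the state along."""
    outputs, x, state = [], seed, initial_state
    for _ in range(num_steps):
        x, state = toy_cell(x, state)  # the output becomes the next input
        outputs.append(x)
    return outputs

samples = generate(seed=1.0, num_steps=20)
print(samples[:3])
```

The two things carried from one step to the next are the previous output (as the new input) and the cell state, which is exactly what the feed_dict in the loop above passes along.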
You can use its own output (last state) as the next-step input (initial state).
One way to do this is to:
use zero-initialized variables as the input state at every time step
each time you complete a truncated sequence and get an output state, update the state variables with the output state you just got.
The second can be done by either:
fetching the states to python and feeding them back next time, as done in the ptb example in tensorflow/models
build an update op in the graph and add a dependency, as done in the ptb example in tensorpack.
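To illustrate why carrying the state across truncated sequences matters, here is a plain-Python sketch with a made-up one-unit RNN (run_rnn and its weights are assumptions, not TensorFlow code). Processing a sequence in chunks while passing the final state on reproduces the result of processing it in one go, whereas resetting the state at each chunk does not:

```python
import math

def run_rnn(inputs, state=0.0, w_x=0.5, w_h=0.8):
    """Run a toy one-unit RNN over inputs; return (outputs, final_state)."""
    outputs = []
    for x in inputs:
        state = math.tanh(w_x * x + w_h * state)
        outputs.append(state)
    return outputs, state

sequence = [math.sin(0.1 * t) for t in range(12)]
full_outputs, _ = run_rnn(sequence)

chunk_len = 4
state = 0.0
chunked_outputs, reset_outputs = [], []
for start in range(0, len(sequence), chunk_len):
    chunk = sequence[start:start + chunk_len]
    out, state = run_rnn(chunk, state)  # carry the state into the next chunk
    chunked_outputs.extend(out)
    out_reset, _ = run_rnn(chunk)       # start from a zero state every time
    reset_outputs.extend(out_reset)

print(chunked_outputs == full_outputs, reset_outputs == full_outputs)
```

Fetching the state to Python and feeding it back, or updating state variables inside the graph, are two ways of doing the carry step shown here.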
I know I'm a bit late to the party but I think this gist could be useful:
https://gist.github.com/CharlieCodex/f494b27698157ec9a802bc231d8dcf31
It lets you autofeed the input through a filter and back into the network as input. To make shapes match up processing can be set as a tf.layers.Dense layer.
Please ask any questions!
Edit:
In your particular case, create a lambda which performs the processing of the dynamic_rnn outputs into your character vector space. Ex:
# if you have:
W = tf.Variable( ... )
B = tf.Variable( ... )
Yo, Ho = tf.nn.dynamic_rnn( cell , inputs , state )
logits = tf.matmul(W, Yo) + B
...
# use self_feeding_rnn as
process_yo = lambda Yo: tf.matmul(W, Yo) + B
Yo, Ho = self_feeding_rnn( cell, seed, initial_state, processing=process_yo)