I am trying to make a simple MLP to predict the values of a pixel of an image (original blog).
Here's my earlier attempt using Keras in Python (link).
I've tried to do the same in TensorFlow, but I am getting very large output values (~10^12) when they should be less than 1.
Here's my code:
import numpy as np
import cv2
from random import shuffle
import tensorflow as tf
'''
Image preprocessing
'''
image_file = cv2.imread("Mona Lisa.jpg")
h = image_file.shape[0]
w = image_file.shape[1]
preX = []
preY = []
for i in xrange(h):
    for j in xrange(w):
        preX.append([i,j])
        preY.append(image_file[i,j,:].astype('float32')/255.0)
print preX[:5], preY[:5]
zipped = [i for i in zip(preX,preY)]
shuffle(zipped)
X_train = np.array([i for (i,j) in zipped]).astype('float32')
Y_train = np.array([j for (i,j) in zipped]).astype('float32')
print X_train[:10], Y_train[:10]
'''
Tensorflow code
'''
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
x = tf.placeholder(tf.float32, shape=[None,2])
y = tf.placeholder(tf.float32, shape=[None,3])
'''
Layers
'''
w1 = weight_variable([2,300])
b1 = bias_variable([300])
L1 = tf.nn.relu(tf.matmul(X_train,w1)+b1)
w2 = weight_variable([300,3])
b2 = bias_variable([3])
y_model = tf.matmul(L1,w2)+b2
'''
Training
'''
# criterion
MSE = tf.reduce_mean(tf.square(tf.sub(y,y_model)))
# trainer
train_op = tf.train.GradientDescentOptimizer(learning_rate = 0.01).minimize(MSE)
nb_epochs = 10
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
cost = 0
for i in range(nb_epochs):
    sess.run(train_op, feed_dict ={x: X_train, y: Y_train})
    cost += sess.run(MSE, feed_dict ={x: X_train, y: Y_train})
cost /= nb_epochs
print cost
'''
Prediction
'''
pred = sess.run(y_model,feed_dict = {x:X_train})*255.0
print pred[:10]
output_image = []
index = 0
h = image_file.shape[0]
w = image_file.shape[1]
for i in xrange(h):
    row = []
    for j in xrange(w):
        row.append(pred[index])
        index += 1
    row = np.array(row)
    output_image.append(row)
output_image = np.array(output_image)
output_image = output_image.astype('uint8')
cv2.imwrite('out_mona_300x3_tf.png',output_image)
First of all, instead of running train_op and then MSE in two separate sess.run calls, you can run both ops in a single call and avoid a second full forward pass:
for i in range(nb_epochs):
    mse_value, _ = sess.run([MSE, train_op], feed_dict={x: X_train, y: Y_train})
    cost += mse_value
Secondly, I suggest always logging your cost so you can see what is going on during the training phase. Either print it manually or use TensorBoard to log and plot it (you can find examples on the official TF page).
You can also monitor your weights to check that they aren't blowing up.
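A minimal sketch of that TensorBoard logging idea, reusing MSE, w1, the placeholders and the session from the code above, and assuming a TF version that has the tf.summary API (older versions used tf.scalar_summary instead); the summary names and log directory are my own choices:
mse_summary = tf.summary.scalar('MSE', MSE)
weight_summary = tf.summary.histogram('w1', w1)
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('./logs', sess.graph)
for i in range(nb_epochs):
    # One training step plus the merged summaries in a single run call.
    _, summary = sess.run([train_op, merged], feed_dict={x: X_train, y: Y_train})
    writer.add_summary(summary, i)
# Then inspect the curves with: tensorboard --logdir=./logs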
A few things you can try:
Reduce the learning rate and add regularization to the weights (see the sketch after this list).
Check that your training set (the pixel values) really consists of the values you expect.
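For instance, a minimal sketch of the regularization idea, applied to the MSE from the question (the 1e-4 regularization strength and the 0.001 learning rate are arbitrary values of mine, not something from the original code):
# L2 penalty on both weight matrices, added to the original MSE loss.
l2_penalty = 1e-4 * (tf.reduce_sum(tf.square(w1)) + tf.reduce_sum(tf.square(w2)))
loss = MSE + l2_penalty
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)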
You give the input-layer weights and the output-layer weights the same names w and b, so it seems something goes wrong in the gradient-descent procedure. Actually, I'm surprised TensorFlow doesn't issue an error or at least a warning (or am I missing something?).
Related
I just started using the GPU version of TensorFlow hoping that it would speed up the training of my feed-forward neural networks. I am able to train on my GPU (GTX 1080 Ti), but unfortunately it is not notably faster than the same training on my CPU (i7-8700K) with my current implementation. During training the GPU appears to be barely utilized at all, which makes me suspect that the bottleneck in my implementation is how the data is copied from the host to the device using feed_dict.
I've heard that TensorFlow has something called the tf.data pipeline, which is supposed to make it easier and faster to feed data to GPUs, etc. However, I have not been able to find any simple examples where this concept is used for multilayer perceptron training as a replacement for feed_dict.
Is anyone aware of such an example who can point me to it? Preferably as simple as possible, since I'm new to TensorFlow in general. Or is there something else I should change in my current implementation to make it more efficient? I'm pasting the code I have here:
import tensorflow as tf
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
tf.reset_default_graph()
import time
# Function for iris dataset.
def get_iris_data():
iris = datasets.load_iris()
data = iris["data"]
target = iris["target"]
# Convert to one-hot vectors
num_labels = len(np.unique(target))
all_Y = np.eye(num_labels)[target]
return train_test_split(data, all_Y, test_size=0.33, random_state=89)
# Function which initializes tensorflow weights & biases for feed-forward NN.
def InitWeights(LayerSizes):
with tf.device('/gpu:0'):
# Make tf placeholders for network inputs and outputs.
X = tf.placeholder( shape = (None,LayerSizes[0]),
dtype = tf.float32,
name ='InputData')
y = tf.placeholder( shape = (None,LayerSizes[-1]),
dtype = tf.float32,
name ='OutputData')
# Initialize weights and biases.
W = {}; b = {};
for ii in range(len(LayerSizes)-1):
layername = 'layer%s' % ii
with tf.variable_scope(layername):
ny = LayerSizes[ii]
nx = LayerSizes[ii+1]
# Weights (initialized with Xavier initialization).
W['Weights_'+layername] = tf.get_variable(
name = 'Weights_'+layername,
shape = (ny, nx),
initializer = tf.contrib.layers.xavier_initializer(),
dtype = tf.float32
)
# Bias (initialized with Xavier initialization).
b['Bias_'+layername] = tf.get_variable(
name = 'Bias_'+layername,
shape = (nx,),
initializer = tf.contrib.layers.xavier_initializer(),
dtype = tf.float32
)
return W, b, X, y
# Function for forward propagation of NN.
def FeedForward(X, W, b):
with tf.device('/gpu:0'):
# Initialize 'a' of first layer to the placeholder of the network input.
a = X
# Loop all layers of the network.
for ii in range(len(W)):
# Use name of each layer as index.
layername = 'layer%s' % ii
## Weighted sum: z = input*W + b
z = tf.add(tf.matmul(a, W['Weights_'+layername], name = 'WeightedSum_z_'+layername), b['Bias_'+layername])
## Passed through actication fcn: a = h(z)
if ii == len(W)-1:
a = z
else:
a = tf.nn.relu(z, name = 'activation_a_'+layername)
return a
if __name__ == "__main__":
# Import data
train_X, test_X, train_y, test_y = get_iris_data()
# Define network size [ninputs-by-256-by-outputs]
LayerSizes = [4, 256, 3]
# Initialize weights and biases.
W, b, X, y = InitWeights(LayerSizes)
# Define loss function to optimize.
yhat = FeedForward(X, W, b)
loss = tf.reduce_sum(tf.square(y - yhat),reduction_indices=[0])
# Define optimizer to use when minimizing loss function.
all_variables = tf.trainable_variables()
optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.0001)
train_op = optimizer.minimize(loss, var_list = all_variables)
# Start tf session and initialize variables.
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# Train 10000 minibatches and time how long it takes.
t0 = time.time()
for i in range(10000):
ObservationsToUse = np.random.choice(len(train_X), 32)
X_minibatch = train_X[ObservationsToUse,:]
y_minibatch = train_y[ObservationsToUse,:]
sess.run(train_op, feed_dict={X : X_minibatch, y : y_minibatch})
t1 = time.time()
print('Training took %0.2f seconds' %(t1-t0))
sess.close()
The speed might be low because you are creating placeholders: with feed_dict, the NumPy data is copied into the placeholders on every sess.run call and only then becomes tensors in the graph.
By using tf.data.Dataset, you can create a direct pipeline which makes the data flow into the graph without the need for placeholders. It is fast, scalable and has a number of functions to play around with.
with np.load("/var/data/training_data.npy") as data:
    features = data["features"]
    labels = data["labels"]
# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
Some useful functions:
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(32) # Creating batches
dataset = dataset.repeat(num_epochs) # repeat the dataset 'N' times
iterator = dataset.make_one_shot_iterator() # Create an iterator to retrieve batches of data
X, Y = iterator.get_next()
Here, 32 is the batch size.
In your case:
dataset = tf.data.Dataset.from_tensor_slices((data, targets))
Hence, there is no need for placeholders. You can directly run
session.run(train_op)  # no feed_dict needed!
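To make this concrete, here is a minimal sketch of how the minibatch loop from the question could be rewritten around tf.data, assuming the FeedForward function and the W, b returned by InitWeights above; the shuffle buffer, batch size and step count are either carried over from the question or chosen by me:
# Build a dataset that yields shuffled minibatches of the training data indefinitely.
dataset = tf.data.Dataset.from_tensor_slices((train_X.astype(np.float32),
                                              train_y.astype(np.float32)))
dataset = dataset.shuffle(buffer_size=len(train_X)).repeat().batch(32)
iterator = dataset.make_one_shot_iterator()
X_batch, y_batch = iterator.get_next()

# Build the network on the iterator output instead of on placeholders.
yhat = FeedForward(X_batch, W, b)
loss = tf.reduce_sum(tf.square(y_batch - yhat), reduction_indices=[0])
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.0001).minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10000):
    sess.run(train_op)  # no feed_dict: the iterator feeds the graph directly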
For a complex text classification task I am training an RNN model. Somehow the weights of the RNN only change at the beginning and then only in very tiny steps (the gradients are in the range of 1e-7):
For those of you who are not familiar with TensorBoard: this shows the distribution of values for the bias and the weights of the RNN (the x axis is the value, the y axis the number of values and z the training iterations).
I constructed a toy example which does not make sense but reproduces the same behaviour:
import numpy as np
import tensorflow as tf
tensorboard_save_path = "../RNN/tensorboard/supersimple/"
x = np.random.normal(size=(33, 20, 5000))
y = np.array([1 if i>0.5 else 0 for i in np.random.random(33)])
##### NETWORK #########
with tf.name_scope("RNN"):
rnn_cell = tf.contrib.rnn.BasicRNNCell(1)
outputs, states = tf.nn.dynamic_rnn(rnn_cell, x, dtype=tf.float64)
rnn_weights, rnn_biases = rnn_cell.variables
tf.summary.histogram("RNN weights", rnn_weights)
tf.summary.histogram("RNN biases", rnn_biases)
pred = tf.sigmoid(outputs[:,-1])
with tf.name_scope("cost"):
cost = tf.losses.mean_squared_error(predictions=pred, labels=np.reshape(y, (33,1)))
with tf.name_scope("train"):
optimizer = tf.train.AdagradOptimizer(learning_rate=0.5).minimize(cost)
init = tf.global_variables_initializer()
merged_summary = tf.summary.merge_all()
writer = tf.summary.FileWriter(tensorboard_save_path)
print("\ttensorboard --logdir=" + tensorboard_save_path)
sess = tf.Session()
sess.run(init)
for i in range(1000):
    sess.run([optimizer])
    if i % 50 == 0:
        c, s = sess.run([cost, merged_summary])
        writer.add_summary(s, i)
        print("cost is %f" % c)
The expected behaviour would be that the model overfits, given the huge number of variables compared to the few training samples. Any idea what's going wrong here?
I have a dataset with 5 columns; I am feeding in the first 3 columns as my inputs and the other 2 columns as my outputs.
I have successfully executed the program, but I am not sure how to test the model by giving my own values as input and getting a predicted output from the model.
Can anyone please help me: how can I actually test the model with my own values after training is done?
I am using TensorFlow in Python. I am able to display the test accuracy, but how do I actually get a prediction if I pass some arbitrary input? (Here, I need to pass 3 input values to get 2 output values.)
Here is my code:
# Implementation of a simple MLP network with one hidden layer. Tested on the iris data set.
# Requires: numpy, sklearn>=0.18.1, tensorflow>=1.0
# NOTE: In order to make the code simple, we rewrite x * W_1 + b_1 = x' * W_1'
# where x' = [x | 1] and W_1' is the matrix W_1 appended with a new row with elements b_1's.
# Similarly, for h * W_2 + b_2
import tensorflow as tf
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
import pandas as pd
RANDOM_SEED = 1000
tf.set_random_seed(RANDOM_SEED)
def init_weights(shape):
""" Weight initialization """
weights = tf.random_normal(shape, stddev=0.1)
return tf.Variable(weights)
def forwardprop(X, w_1, w_2):
"""
Forward-propagation.
IMPORTANT: yhat is not softmax since TensorFlow's softmax_cross_entropy_with_logits() does that internally.
"""
h = tf.nn.sigmoid(tf.matmul(X, w_1)) # The \sigma function
yhat = tf.matmul(h, w_2) # The \varphi function
return yhat
def get_iris_data():
""" Read the iris data set and split them into training and test sets """
df = pd.read_csv(r"H:\MiniThessis\Sample.csv")
train_X = np.array(df[df.columns[0:3]])
train_Y = np.array(df[df.columns[3:]])
print(train_X)
# Convert into one-hot vectors
#num_labels = len(np.unique(train_Y))
#all_Y = np.eye(num_labels)[train_Y] # One liner trick!
#print()
return train_test_split(train_X, train_Y, test_size=0.33, random_state=RANDOM_SEED)
def main():
train_X, test_X, train_y, test_y = get_iris_data()
# Layer's sizes
x_size = train_X.shape[1] # Number of input nodes (3 features in this dataset)
h_size = 256 # Number of hidden nodes
y_size = train_y.shape[1] # Number of output columns (2 in this dataset)
# Symbols
X = tf.placeholder("float", shape=[None, x_size])
y = tf.placeholder("float", shape=[None, y_size])
# Weight initializations
w_1 = init_weights((x_size, h_size))
w_2 = init_weights((h_size, y_size))
# Forward propagation
yhat = forwardprop(X, w_1, w_2)
predict = tf.argmax(yhat, axis=1)
# Backward propagation
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=yhat))
updates = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
# Run SGD
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
for epoch in range(3):
# Train with each example
for i in range(len(train_X)):
sess.run(updates, feed_dict={X: train_X[i: i + 1], y: train_y[i: i + 1]})
train_accuracy = np.mean(np.argmax(train_y, axis=1) == sess.run(predict, feed_dict={X: train_X, y: train_y}))
test_accuracy = np.mean(np.argmax(test_y, axis=1) ==sess.run(predict, feed_dict={X: test_X, y: test_y}))
print("Epoch = %d, train accuracy = %.2f%%, test accuracy = %.2f%%"
% (epoch + 1, 100. * train_accuracy, 100. * test_accuracy))
correct_Prediction = tf.equal((tf.arg_max(predict,1)),(tf.arg_max(y,1)))
best = sess.run([predict], feed_dict={X: np.array([[20.14, 46.93, 1014.66]])})
#print(correct_Prediction)
print(best)
sess.close()
if __name__ == '__main__':
main()
Note, this answer makes some assumptions about the input, which you did not provide in your original question.
At this point I think your homework will require some more work to be finished. The 4th and 5th columns are probably the petal width (a real value) and the iris type (encoded as a category). This means you will probably need a real-valued output (petal width) and a categorical prediction, and that softmax_cross_entropy_with_logits will probably not play nicely with the petal width. Also, you are applying an argmax as part of the prediction, and this will return the index with the highest value (in this case either the petal width or the categorically encoded value), which also does not make sense. So as a debugging aid, why not start with:
print(sess.run([yhat], feed_dict={X: np.array([[20.14, 46.93, 1014.66]])}))
This will print out yhat, based on the inputs [20.14, 46.93, 1014.66] (that's a massive flower, completely unlike anything inside the dataset), and it will contain 2 outputs.
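If the two target columns really are one real value plus one category, a rough sketch of splitting the network into two heads could look like the following; the column split, the binary encoding of the category and the equal loss weighting are my assumptions, not something taken from the original code:
# Hypothetical split: output column 0 is real-valued, column 1 is a 0/1 category.
y_reg = y[:, 0:1]   # real-valued target (e.g. petal width)
y_cls = y[:, 1:2]   # categorical target, assumed here to be encoded as 0 or 1

yhat_reg = yhat[:, 0:1]              # regression head: raw linear output
yhat_cls = tf.sigmoid(yhat[:, 1:2])  # classification head: probability of class 1

reg_loss = tf.reduce_mean(tf.square(y_reg - yhat_reg))
cls_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    labels=y_cls, logits=yhat[:, 1:2]))
cost = tf.reduce_mean(reg_loss + cls_loss)
updates = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# After training, a custom prediction (3 inputs in, 2 outputs out) would be:
# sess.run([yhat_reg, yhat_cls], feed_dict={X: np.array([[20.14, 46.93, 1014.66]])})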
This is a simple example of using the LSTM cell from TensorFlow. I am generating a sine wave, training my network on ten periods, and trying to predict the eleventh period. The predictor values X are a one-step lag of the true y. After training, I save the session to disk and restore it at prediction time - this is typical of training and deploying models to production.
When I predict the last period, y_predicted matches the true y very well.
If I try to predict the sine wave using an arbitrary starting point, i.e. by uncommenting line 114,
test_data = test_data[16:]
such that the true values of y are shifted by a quarter period, it seems like the LSTM prediction still starts at zero and takes a couple of steps to catch up with the true values, eventually matching the previous prediction. As a matter of fact, the prediction in the second case still seems to be a full sine wave instead of the 3/4 wave.
Why is this happening? If I implement a regressor, I would like to be able to use it starting from any point.
https://github.com/fbora/mytensorflow/issues/1
import os
import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
def sin_signal():
'''
generate a sin function
the train set is ten periods in length
the test set is one additional period
the return variable is in pandas format for easy plotting
'''
phase = np.arange(0, 2*np.pi*11, 0.1)
y = np.sin(phase)
data = pd.DataFrame.from_dict({'phase': phase, 'y':y})
# fill the last element by 0 - it's the end of the period anyways
data['X'] = data.y.shift(-1).fillna(0.0)
train_data = data[data.phase<=2*np.pi*10].copy()
test_data = data[data.phase>2*np.pi*10].copy()
return train_data, test_data
class lstm_model():
def __init__(self, size_x, size_y, num_units=32, num_layers=3, keep_prob=0.5):
# def single_unit():
# return rnn.DropoutWrapper(
# rnn.LSTMCell(num_units), output_keep_prob=keep_prob)
def single_unit():
return rnn.LSTMCell(num_units)
self.graph = tf.Graph()
with self.graph.as_default():
'''input place holders'''
self.X = tf.placeholder(tf.float32, [None, size_x], name='X')
self.y = tf.placeholder(tf.float32, [None, size_y], name='y')
'''network'''
cell = rnn.MultiRNNCell([single_unit() for _ in range(num_layers)])
X = tf.expand_dims(self.X, -1)
val, state = tf.nn.dynamic_rnn(cell, X, time_major=True, dtype=tf.float32)
val = tf.transpose(val, [1, 0, 2])
last = tf.gather(val, int(val.get_shape()[0])-1)
weights = tf.Variable(tf.truncated_normal([num_units, size_y], 0.0, 1.0), name='weights')
bias = tf.Variable(tf.zeros(size_y), name='bias')
predicted_y = tf.nn.xw_plus_b(last, weights, bias, name='predicted_y')
'''optimizer'''
optimizer = tf.train.AdamOptimizer(name='adam_optimizer')
global_step = tf.Variable(0, trainable=False, name='global_step')
self.loss = tf.reduce_mean(tf.squared_difference(predicted_y, self.y), name='mse_loss')
self.train_op = optimizer.minimize(self.loss, global_step=global_step, name='training_op')
'''initializer'''
self.init_op = tf.global_variables_initializer()
class lstm_regressor():
def __init__(self):
if not os.path.isdir('./check_pts'):
os.mkdir('./check_pts')
@staticmethod
def get_shape(dataframe):
df_shape = dataframe.shape
num_rows = df_shape[0]
num_cols = 1 if len(df_shape)<2 else df_shape[1]
return num_rows, num_cols
def train(self, X_train, y_train, iterations):
train_pts, size_x = lstm_regressor.get_shape(X_train)
train_pts, size_y = lstm_regressor.get_shape(y_train)
model = lstm_model(size_x=size_x, size_y=size_y, num_units=32, num_layers=1)
with tf.Session(graph=model.graph) as sess:
sess.run(model.init_op)
saver = tf.train.Saver()
feed_dict={
model.X: X_train.values.reshape(-1, size_x),
model.y: y_train.values.reshape(-1, size_y)
}
for step in range(iterations):
_, loss = sess.run([model.train_op, model.loss], feed_dict=feed_dict)
if step%100==0:
print('step={}, loss={}'.format(step, loss))
saver.save(sess, './check_pts/lstm')
def predict(self, X_test):
test_pts, size_x = lstm_regressor.get_shape(X_test)
X_np = X_test.values.reshape(-1, size_x)
graph = tf.Graph()
with graph.as_default():
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
saver = tf.train.import_meta_graph('./check_pts/lstm.meta')
saver.restore(sess, './check_pts/lstm')
X = graph.get_tensor_by_name('X:0')
y_tf = graph.get_tensor_by_name('predicted_y:0')
y_np = sess.run(y_tf, feed_dict={X: X_np})
return y_np.reshape(test_pts)
def main():
train_data, test_data = sin_signal()
regressor = lstm_regressor()
regressor.train(train_data.X, train_data.y, iterations=1000)
# test_data = test_data[16:]
y_predicted = regressor.predict(test_data.X)
test_data['y_predicted'] = y_predicted
test_data[['y', 'y_predicted']].plot()
if __name__ == '__main__':
main()
I suspect that since you are starting your predictions at an arbitrary point in the future, there is a gap between the values your model was trained on and the values it starts seeing at prediction time, and the state of your LSTM has not been updated with the values in that gap.
*** UPDATE:
In your code, you have this:
val, state = tf.nn.dynamic_rnn(cell, X, time_major=True, dtype=tf.float32)
and then during training this:
_, loss = sess.run([model.train_op, model.loss], feed_dict=feed_dict)
I would suggest feeding the initial State into dynamic_rnn and re-feeding the updated state at each training iteration, something like this:
inState = tf.placeholder(tf.float32, [YOUR_DIMENSIONS], name='inState')
val, state = tf.nn.dynamic_rnn(cell, X, time_major=True, dtype=tf.float32, initial_state=inState)
And during training:
iState = np.zeros([YOUR_DIMENSIONS])
feed_dict={
    model.X: X_train.values.reshape(-1, size_x),
    model.y: y_train.values.reshape(-1, size_y),
    inState: iState  # feed initial value for state placeholder
}
_, loss, oState = sess.run([model.train_op, model.loss, model.state], feed_dict=feed_dict)  # run one additional variable from the session
iState = oState  # assign latest out-state to be re-fed as in-state
So, this way your model not only learns the parameters during training, but also keeps track of everything that it's seen during training in the State. NOW, you save this State with the rest of your session and use it during the prediction stage.
The small difficulty with this is that technically this State is a placeholder, so it won't be saved in the Graph automatically in my experience. So you create another variable manually at the end of training and assign the State to it; this way it is saved in the graph for later:
# make sure this variable is declared BEFORE the saver is declared
savedState = tf.get_variable('savedState', shape=[YOUR_DIMENSIONS])
# then, at the end of training:
assignOp = tf.assign(savedState, oState)
sess.run(assignOp)
# now save your graph
So now once you restore the Graph, if you want to start your predictions after some artificial gap, then somehow you still have to run your model through this gap so as to update the state. In my case, I just run one dummy prediction for the whole gap, just so as to update the state, and then you continue at your normal intervals from here.
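As a rough sketch of the prediction side, assuming the savedState variable, the inState placeholder and the checkpoint paths introduced above (the tensor names 'inState:0' and 'savedState:0' follow from those definitions but are still my assumption, and X_np is the test input array as in the original predict method):
graph = tf.Graph()
with graph.as_default():
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph('./check_pts/lstm.meta')
        saver.restore(sess, './check_pts/lstm')
        X = graph.get_tensor_by_name('X:0')
        inState = graph.get_tensor_by_name('inState:0')
        y_tf = graph.get_tensor_by_name('predicted_y:0')
        # Read back the state that was assigned to savedState at the end of training.
        lastState = sess.run(graph.get_tensor_by_name('savedState:0'))
        # Optionally run a dummy prediction over the gap first to bring the state up to date,
        # then predict from the arbitrary starting point while feeding the state explicitly.
        y_np = sess.run(y_tf, feed_dict={X: X_np, inState: lastState})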
Hope this helps...
I have this problem that after one iteration nearly all my parameters (cost function, weights, hypothesis function, etc.) output 'NaN'. My code is similar to the TensorFlow MNIST-Expert tutorial (https://www.tensorflow.org/versions/r0.9/tutorials/mnist/pros/index.html). I have looked for solutions already, and so far I have tried: reducing the learning rate to nearly zero and setting it to zero, using AdamOptimizer instead of gradient descent, using a sigmoid function for the hypothesis function in the last layer, and using only numpy functions. I have some negative and zero values in my input data, so I can't use the logarithmic cross entropy instead of the quadratic cost function. The result is the same. My input data consists of stresses and strains of soils.
import tensorflow as tf
import Datafiles3_pv_complete as soil
import numpy as np
m_training = int(18.0)
m_cv = int(5.0)
m_test = int(5.0)
total_examples = 28
" range for running "
range_training = xrange(0,m_training)
range_cv = xrange(m_training,(m_training+m_cv))
range_test = xrange((m_training+m_cv),total_examples)
""" Using interactive Sessions"""
sess = tf.InteractiveSession()
""" creating input and output vectors """
x = tf.placeholder(tf.float32, shape=[None, 11])
y_true = tf.placeholder(tf.float32, shape=[None, 3])
""" Standard Deviation Calculation"""
stdev = np.divide(2.0,np.sqrt(np.prod(x.get_shape().as_list()[1:])))
""" Weights and Biases """
def weights(shape):
initial = tf.truncated_normal(shape, stddev=stdev)
return tf.Variable(initial)
def bias(shape):
initial = tf.truncated_normal(shape, stddev=1.0)
return tf.Variable(initial)
""" Creating weights and biases for all layers """
theta1 = weights([11,7])
bias1 = bias([1,7])
theta2 = weights([7,7])
bias2 = bias([1,7])
"Last layer"
theta3 = weights([7,3])
bias3 = bias([1,3])
""" Hidden layer input (Sum of weights, activation functions and bias)
z = theta^T * activation + bias
"""
def Z_Layer(activation,theta,bias):
return tf.add(tf.matmul(activation,theta),bias)
""" Creating the sigmoid function
sigmoid = 1 / (1 + exp(-z))
"""
def Sigmoid(z):
return tf.div(tf.constant(1.0),tf.add(tf.constant(1.0), tf.exp(tf.neg(z))))
""" hypothesis functions - predicted output """
' layer 1 - input layer '
hyp1 = x
' layer 2 '
z2 = Z_Layer(hyp1, theta1, bias1)
hyp2 = Sigmoid(z2)
' layer 3 '
z3 = Z_Layer(hyp2, theta2, bias2)
hyp3 = Sigmoid(z3)
' layer 4 - output layer '
zL = Z_Layer(hyp3, theta3, bias3)
hypL = tf.add( tf.add(tf.pow(zL,3), tf.pow(zL,2) ), zL)
""" Cost function """
cost_function = tf.mul( tf.div(0.5, m_training), tf.pow( tf.sub(hypL, y_true), 2))
#cross_entropy = -tf.reduce_sum(y_true*tf.log(hypL) + (1-y_true)*tf.log(1-hypL))
""" Gradient Descent """
train_step = tf.train.GradientDescentOptimizer(learning_rate=0.003).minimize(cost_function)
""" Training and Evaluation """
correct_prediction = tf.equal(tf.arg_max(hypL, 1), tf.arg_max(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.initialize_all_variables())
keep_prob = tf.placeholder(tf.float32)
""" Testing - Initialise lists """
hyp1_test = []
z2_test = []
hyp2_test = []
z3_test = []
hyp3_test = []
zL_test = []
hypL_test = []
cost_function_test =[]
complete_error_test = []
theta1_test = []
theta2_test = []
theta3_test = []
bias1_test = []
bias2_test = []
bias3_test = []
""" ------------------------- """
complete_error_init = tf.abs(tf.reduce_mean(tf.sub(hypL,y_true),1))
training_error=[]
for j in range_training:
feedj = {x: soil.input_scale[j], y_true: soil.output_scale[j] , keep_prob: 1.0}
""" ------------------------- """
'Testing - adding to list'
z2_init = z2.eval(feed_dict=feedj)
z2_test.append(z2_init)
hyp2_init = hyp2.eval(feed_dict=feedj)
hyp2_test.append(hyp2_init)
z3_init = z3.eval(feed_dict=feedj)
z3_test.append(z3_init)
hyp3_init = hyp3.eval(feed_dict=feedj)
hyp3_test.append(hyp3_init)
zL_init = zL.eval(feed_dict=feedj)
zL_test.append(zL_init)
hypL_init = hypL.eval(feed_dict=feedj)
hypL_test.append(hypL_init)
cost_function_init = cost_function.eval(feed_dict=feedj)
cost_function_test.append(cost_function_init)
complete_error = complete_error_init.eval(feed_dict=feedj)
complete_error_test.append(complete_error)
print 'number iterations: %g, error (S1, S2, S3): %g, %g, %g' % (j, complete_error[0], complete_error[1], complete_error[2])
theta1_init = theta1.eval()
theta1_test.append(theta1_init)
theta2_init = theta2.eval()
theta2_test.append(theta2_init)
theta3_init = theta3.eval()
theta3_test.append(theta3_init)
bias1_init = bias1.eval()
bias1_test.append(bias1_init)
bias2_init = bias2.eval()
bias2_test.append(bias2_init)
bias3_init = bias3.eval()
bias3_test.append(bias3_init)
""" ------------------------- """
train_accuracy = accuracy.eval(feed_dict=feedj)
print("step %d, training accuracy %g" % (j, train_accuracy))
train_step.run(feed_dict=feedj)
training_error.append(1 - train_accuracy)
cv_error=[]
for k in range_cv:
feedk = {x: soil.input_scale[k], y_true: soil.output_scale[k] , keep_prob: 1.0}
cv_accuracy = accuracy.eval(feed_dict=feedk)
print("cross-validation accuracy %g" % cv_accuracy)
cv_error.append(1-cv_accuracy)
for l in range_test:
print("test accuracy %g" % accuracy.eval(feed_dict={x: soil.input_matrixs[l], y_true: soil.output_matrixs[l], keep_prob: 1.0}))
Over the last few weeks I was working on a Unit-model for this problem, but the same output occurred. I have no idea what to try next. I hope someone can help me.
Edit:
I checked some parameters in detail again. The hypothesis function (hyp) and the activation (z) of layers 3 and 4 (the last layer) have the same entries for each data point, i.e. the same value in every row of a given column.
1e-3 is still fairly high for the classifier you've described. NaN actually means that the weights have tended to infinity, so I would suggest exploring even lower learning rates, around 1e-7 specifically. If it continues to diverge, multiply your learning rate by 0.1, and repeat until the weights are finite-valued.
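A minimal sketch of that kind of learning-rate sweep, reusing the cost_function from the code above and a feed dictionary like feedj from the training loop (the list of rates and the trial length are my own choices):
# Try progressively smaller learning rates until the cost stays finite.
for lr in [1e-3, 1e-4, 1e-5, 1e-6, 1e-7]:
    train_step = tf.train.GradientDescentOptimizer(learning_rate=lr).minimize(cost_function)
    sess.run(tf.initialize_all_variables())  # reset the weights for each trial
    for _ in range(100):  # short trial run
        train_step.run(feed_dict=feedj)
    c = cost_function.eval(feed_dict=feedj)
    if np.all(np.isfinite(c)):
        print('learning rate %g keeps the cost finite' % lr)
        break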
Finally, no more NaN values. The solution was to scale my input and output data. The result (accuracy) is still not good, but at least I get real values for the parameters. I had tried feature scaling before in other attempts (where I probably had some other mistakes as well) and assumed it wouldn't help with my problem either.
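For reference, a minimal sketch of the kind of per-column min-max scaling that fixed this, applied to hypothetical 2-D input and output arrays (the array names input_matrix and output_matrix are placeholders of mine, not the actual variables in the soil module):
import numpy as np

def min_max_scale(a):
    # Scale each column to the range [0, 1]; guard against constant columns.
    a_min = a.min(axis=0)
    a_range = a.max(axis=0) - a_min
    a_range[a_range == 0] = 1.0
    return (a - a_min) / a_range

# input_matrix and output_matrix stand in for the raw stress/strain arrays.
input_scale = min_max_scale(input_matrix)
output_scale = min_max_scale(output_matrix)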