Getting the opposite outputs from Tensorflow learn with OR gate

I am training a DNN (a simple multilayer perceptron) with 2 hidden layers of 5 and 3 units respectively to recognize the OR gate.
Using tensorflow learn, it seems to give me the reverse output and I have no idea why:
import numpy as np
from tensorflow.contrib import learn
classifier = learn.DNNClassifier(hidden_units=[5, 3], n_classes=2)
or_input = np.array([[0.,0.], [0.,1.], [1.,0.]])
or_output = np.array([[0,1,1]]).T
classifier.fit(or_input, or_output, steps=0.05, batch_size=3)
classifier.predict(np.array([ [1., 1.], [1., 0.] , [0., 0.] , [0., 1.]]))
[out]:
array([0, 0, 1, 0])
If I do it "old-school", without tensorflow.learn, as follows, I get the expected answer:
import numpy as np
import tensorflow as tf
# Parameters
learning_rate = 1.0
num_epochs = 1000
# Network Parameters
input_dim = 2 # Input dimensions.
hidden_dim_1 = 5 # 1st layer number of features
hidden_dim_2 = 3 # 2nd layer number of features
output_dim = 1 # Output dimensions.
# tf Graph input
x = tf.placeholder("float", [None, input_dim])
y = tf.placeholder("float", [hidden_dim_2, output_dim])
# With biases.
weights = {
    'syn0': tf.Variable(tf.random_normal([input_dim, hidden_dim_1])),
    'syn1': tf.Variable(tf.random_normal([hidden_dim_1, hidden_dim_2])),
    'syn2': tf.Variable(tf.random_normal([hidden_dim_2, output_dim]))
}
biases = {
    'b0': tf.Variable(tf.random_normal([hidden_dim_1])),
    'b1': tf.Variable(tf.random_normal([hidden_dim_2])),
    'b2': tf.Variable(tf.random_normal([output_dim]))
}
# Create a model
def multilayer_perceptron(X, weights, biases):
    # Hidden layer 1 + sigmoid activation function
    layer_1 = tf.add(tf.matmul(X, weights['syn0']), biases['b0'])
    layer_1 = tf.nn.sigmoid(layer_1)
    # Hidden layer 2 + sigmoid activation function
    layer_2 = tf.add(tf.matmul(layer_1, weights['syn1']), biases['b1'])
    layer_2 = tf.nn.sigmoid(layer_2)
    # Output layer
    out_layer = tf.matmul(layer_2, weights['syn2']) + biases['b2']
    out_layer = tf.nn.sigmoid(out_layer)
    return out_layer
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
cost = tf.sub(y, pred)
# Or you can use fancy cost like:
##tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
init = tf.initialize_all_variables()
or_input = np.array([[0.,0.], [0.,1.], [1.,0.]])
or_output = np.array([[0.,1.,1.]]).T
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(num_epochs):
        batch_x, batch_y = or_input, or_output # Loop over all data points.
        # Run optimization op (backprop) and cost op (to get loss value)
        _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
        #print (c)
    # Now let's test it on the unknown dataset.
    new_inputs = np.array([[1.,1.], [1.,0.]])
    feed_dict = {x: new_inputs}
    predictions = sess.run(pred, feed_dict)
    print (predictions)
[out]:
[[ 0.99998868]
[ 0.99998868]]
Why am I getting the reversed output with tensorflow.learn? Am I doing something wrong when using it?
How do I get the tensorflow.learn code to produce the same output as the "old-school" tensorflow code?

If you pass a sensible value for steps, you get the expected results:
classifier.fit(or_input, or_output, steps=1000, batch_size=3)
Result:
array([1, 1, 0, 1])
How does steps work
The steps argument specifies the number of times you run the training operation. Let me give you some examples:
With batch_size = 16 and steps = 10, you will see a total of 160 examples.
In your example, with batch_size = 3 and steps = 1000, the algorithm will see 3000 examples. More precisely, it will see the same 3 examples you provided 1000 times.
So, steps is not the number of epochs, it is the number of times you run the training op, or the number of times you see a new batch.
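The arithmetic above can be sketched in a few lines of plain Python. This is only an illustration: the helper names examples_seen and passes_over_data are made up for this example and are not part of tf.learn.

```python
def examples_seen(batch_size, steps):
    """One batch is consumed per training step."""
    return batch_size * steps


def passes_over_data(dataset_size, batch_size, steps):
    """How many times the whole dataset is traversed (i.e. epochs)."""
    return examples_seen(batch_size, steps) / dataset_size


print(examples_seen(16, 10))          # 160
print(examples_seen(3, 1000))         # 3000
print(passes_over_data(3, 3, 1000))   # 1000.0
```

With a dataset of only 3 rows and batch_size = 3, each step happens to be one full pass, which is why steps and epochs coincide in this particular case but not in general.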
Why is steps = 0.05 allowed?
The tf.learn code does not check that steps is an integer. It simply runs a while loop with this condition:
last_step < max_steps
So if max_steps = 0.05, it will behave the same as if max_steps = 1 (last_step is incremented in the loop).
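To see why a fractional value behaves like 1, here is a simplified stand-in for that loop. It is not the actual tf.learn source, only a sketch of the condition quoted above; count_training_steps is a hypothetical name.

```python
def count_training_steps(max_steps):
    """Count how often the body runs while last_step < max_steps."""
    last_step = 0
    steps_run = 0
    while last_step < max_steps:
        # In the real code, one training op would run here.
        last_step += 1
        steps_run += 1
    return steps_run


print(count_training_steps(0.05))  # 1
print(count_training_steps(1))     # 1
print(count_training_steps(1000))  # 1000
```

A single step on three examples is far too little training, so the classifier's predictions are essentially those of its initial weights.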

Related

Why does this RNN in tensorflow not learn?

I am trying to train an RNN without using the RNN API in tensorflow (2) in Python 3.7, so the code is very basic. Something is going really wrong, but I'm not sure what it is.
As a reference, I am using a dataset from this tensorflow tutorial, so I know roughly what the error should converge to. My RNN code is below. It tries to use the previous 20 timesteps to predict the value of a series at the 21st timestep. I am training in batches of size 256.
While the loss does decrease over time, it levels off at approximately 10x the value I get if I follow the tutorial approach. Could it be some problem with the backpropagation through time?
state_size = 20 #dimensionality of the network
BATCH_SIZE = 256
#define recurrent weights and biases. W has 1 more dimension than the state
#dimension as it also processes the inputs
W = tf.Variable(np.random.rand(state_size+1, state_size), dtype=tf.float32)
b = tf.Variable(np.zeros((1,state_size)), dtype=tf.float32)
#weights and biases for the output
W2 = tf.Variable(np.random.rand(state_size, 1),dtype=tf.float32)
b2 = tf.Variable(np.zeros((1,1)), dtype=tf.float32)
init_state = tf.Variable(np.random.normal(size=[BATCH_SIZE,state_size]),dtype='float32')
optimizer = tf.keras.optimizers.Adam(1e-3)
losses = []
for epoch in range(20):
    with tf.GradientTape() as tape:
        loss = 0
        for batch_idx in range(200):
            current_state = init_state
            batchx = x_train_uni[batch_idx*BATCH_SIZE:(batch_idx+1)*BATCH_SIZE].swapaxes(0,1)
            batchy = y_train_uni[batch_idx*BATCH_SIZE:(batch_idx+1)*BATCH_SIZE]
            #forward pass through the timesteps
            for x in batchx:
                inst = tf.concat([current_state,x],1) #concatenate state and inputs for that timepoint
                current_state = tf.tanh(tf.matmul(inst, W) + b) #
            #predict using the hidden state after the full forward pass
            pred = tf.matmul(current_state,W2) + b2
            loss += tf.reduce_mean(tf.abs(batchy-pred))
    #get gradients with respect to parameters
    gradients = tape.gradient(loss, [W,b,W2,b2])
    #apply gradients
    optimizer.apply_gradients(zip(gradients, [W,b,W2,b2]))
    losses.append(loss)
    print(loss)

Tensorflow's loss function returns NAN after changing RNN to LSTM cell

I am training a model to predict Time Series using an RNN model. This model is trained without any issue. Here's the original code:
tf.reset_default_graph()
num_inputs = 1
num_neurons = 100
num_outputs = 1
learning_rate = 0.0001
num_train_iterations = 2000
batch_size = 1
X = tf.placeholder(tf.float32, [None, time_steps-1, num_inputs])
y = tf.placeholder(tf.float32, [None, time_steps-1, num_outputs])
cell = tf.contrib.rnn.OutputProjectionWrapper(
tf.contrib.rnn.BasicRNNCell(num_units=num_neurons, activation=tf.nn.relu),
output_size=num_outputs)
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.75)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    sess.run(init)
    for iteration in range(num_train_iterations):
        elx,ely = next_batch(training_data, time_steps)
        sess.run(train, feed_dict={X: elx, y: ely})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: elx, y: ely})
            print(iteration, "\tMSE:", mse)
The problem comes when I change tf.contrib.rnn.BasicRNNCell to tf.contrib.rnn.BasicLSTMCell: there is a huge slowdown and the loss (the mse variable) becomes NaN. My best guess is that MSE is the wrong loss function and that I should try cross entropy. I searched for similar code and found that tf.nn.softmax_cross_entropy_with_logits() could be the solution, but I still don't understand how to apply it to my problem.

Usually the "NAN" occurs when your gradients blow up.
Here is some code that uses softmax cross entropy. Give it a try.
#Output Layer
logit = tf.add(tf.matmul(H1,w2),b2)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logit, labels=Y)
#Cost
cost = (tf.reduce_mean(cross_entropy))
#Optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
#Prediction
y_pred = tf.nn.softmax(logit)
pred = tf.argmax(y_pred, axis=1 )
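Since exploding gradients are named above as the usual cause of NaN, one common remedy, which is not part of the answer's code, is to clip the gradients before applying them. The sketch below shows clipping by global norm in plain Python, the same idea that tf.clip_by_global_norm implements; the function name and the sample values are illustrative assumptions.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradient vectors together so their joint L2 norm
    does not exceed max_norm. Returns (clipped_grads, original_norm)."""
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if global_norm <= max_norm:
        return [list(vec) for vec in grads], global_norm
    scale = max_norm / global_norm
    return [[g * scale for g in vec] for vec in grads], global_norm


# Hypothetical gradients for two parameter tensors, flattened to lists.
grads = [[300.0, -400.0], [0.0, 1200.0]]
clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
print(norm)     # 1300.0
print(clipped)  # same direction, joint norm scaled down to about 5.0
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its size is limited.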

MLP(ReLu) stops learning after few iterations. Tensor Flow

2-layer MLP (ReLU) + Softmax
After 20 iterations, TensorFlow just gives up and stops updating any weights or biases.
I initially thought that my ReLUs were dying, so I displayed histograms to make sure none of them were 0. And none of them are!
They just stop changing after a few iterations and cross entropy is still high. ReLU, sigmoid and tanh give the same results. Changing the GradientDescentOptimizer learning rate from 0.01 to 0.5 also doesn't change much.
There has to be a bug somewhere. Like an actual bug in my code. I can't even overfit a small sample set!
Here are my histograms and here's my code. If anyone could check it out, that would be a major help.
We have 3000 samples, each with 6 values between 0 and 255,
to classify into two classes: [1,0] or [0,1]
(I made sure to randomise the order)
def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
    with tf.name_scope(layer_name):
        weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
        tf.summary.histogram('weights', weights)
        biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
        tf.summary.histogram('biases', biases)
        preactivate = tf.matmul(input_tensor, weights) + biases
        tf.summary.histogram('pre_activations', preactivate)
        #act=tf.nn.relu
        activations = act(preactivate, name='activation')
        tf.summary.histogram('activations', activations)
        return activations
#We have 3000 scalars with 6 values between 0 and 255 to classify in two classes
x = tf.placeholder(tf.float32, [None, 6])
y = tf.placeholder(tf.float32, [None, 2])
#After normalisation, input is between 0 and 1
normalised = tf.scalar_mul(1/255,x)
#Two layers
hidden1 = nn_layer(normalised, 6, 4, "hidden1")
hidden2 = nn_layer(hidden1, 4, 2, "hidden2")
#Finish by a softmax
softmax = tf.nn.softmax(hidden2)
#Defining loss, accuracy etc..
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=softmax))
tf.summary.scalar('cross_entropy', cross_entropy)
correct_prediction = tf.equal(tf.argmax(softmax, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tf.summary.scalar('accuracy', accuracy)
#Init session and writers and misc
session = tf.Session()
train_writer = tf.summary.FileWriter('log', session.graph)
train_writer.add_graph(session.graph)
init= tf.global_variables_initializer()
session.run(init)
merged = tf.summary.merge_all()
#Train
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
batch_x, batch_y = self.trainData
for _ in range(1000):
    session.run(train_step, {x: batch_x, y: batch_y})
    #Every 10 steps, add to the summary
    if _ % 10 == 0:
        s = session.run(merged, {x: batch_x, y: batch_y})
        train_writer.add_summary(s, _)
#Evaluate
evaluate_x, evaluate_y = self.evaluateData
print(session.run(accuracy, {x: batch_x, y: batch_y}))
print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))
Hidden Layer 1. The output isn't zero, so that's not a dying ReLU problem. But still, the weights are constant! TF didn't even try to modify them.
Same for Hidden Layer 2. TF tried tweaking them a bit and gave up pretty fast.
Cross entropy does decrease, but stays staggeringly high.
EDIT:
LOTS of mistakes in my code.
The first one is that 1/255 = 0 in Python (integer division in Python 2). I changed it to 1.0/255.0 and my code came to life.
So basically, my input was multiplied by 0 and the neural network was completely blind. It tried to get the best result it could while blind and then gave up, which fully explains its behaviour.
I was also applying a softmax twice. Fixing that helped too.
And by trying different learning rates and different numbers of epochs I finally found something good.
Here is the final working code:
def runModel(self):
    def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
        with tf.name_scope(layer_name):
            #This is a standard weight initialisation for neural networks with ReLU.
            #I divide by math.sqrt(float(6)) because my input has 6 values
            weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
            tf.summary.histogram('weights', weights)
            #I chose this bias myself. It works. Not sure why.
            biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
            tf.summary.histogram('biases', biases)
            preactivate = tf.matmul(input_tensor, weights) + biases
            tf.summary.histogram('pre_activations', preactivate)
            #Some neurons will have ReLU as activation function
            #Some won't have any activation function
            if act == "None":
                activations = preactivate
            else:
                activations = act(preactivate, name='activation')
            tf.summary.histogram('activations', activations)
            return activations
    #We have 3000 samples with 6 values between 0 and 255 to classify into two classes
    x = tf.placeholder(tf.float32, [None, 6])
    y = tf.placeholder(tf.float32, [None, 2])
    #After normalisation, input is between 0 and 1
    #Normalising input really helps. Nothing is doable without it
    #But my ERROR was to write 1/255. Because in Python
    #1/255 = 0 .... (integer division)
    #But 1.0/255.0 = 0.003921568 (float division)
    normalised = tf.scalar_mul(1.0/255.0,x)
    #Three layers total. The first one is just a matrix multiplication
    input = nn_layer(normalised, 6, 4, "input", act="None")
    #The second one has a ReLU after a matrix multiplication
    hidden1 = nn_layer(input, 4, 4, "hidden", act=tf.nn.relu)
    #The last one is also just a matrix multiplication
    #WARNING! No softmax here! Because later we call a function
    #that implicitly does a softmax
    #And it's bad practice to do two softmaxes one after the other
    output = nn_layer(hidden1, 4, 2, "output", act="None")
    #Tried different learning rates
    #Higher learning rate means finding a result faster
    #But it could be a local minimum
    #Lower learning rate means we need many more epochs
    learning_rate = 0.03
    with tf.name_scope('learning_rate_'+str(learning_rate)):
        #Defining loss, accuracy etc..
        cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output))
        tf.summary.scalar('cross_entropy', cross_entropy)
        correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar('accuracy', accuracy)
    #Init session and writers and misc
    session = tf.Session()
    train_writer = tf.summary.FileWriter('log', session.graph)
    train_writer.add_graph(session.graph)
    init= tf.global_variables_initializer()
    session.run(init)
    merged = tf.summary.merge_all()
    #Train
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
    batch_x, batch_y = self.trainData
    for _ in range(1000):
        session.run(train_step, {x: batch_x, y: batch_y})
        #Every 10 steps, add to the summary
        if _ % 10 == 0:
            s = session.run(merged, {x: batch_x, y: batch_y})
            train_writer.add_summary(s, _)
    #Evaluate
    evaluate_x, evaluate_y = self.evaluateData
    print(session.run(accuracy, {x: batch_x, y: batch_y}))
    print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))
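The two main mistakes described in the edit (integer division and a doubled softmax) can be reproduced without TensorFlow. The following is a minimal sketch in plain Python 3, where `//` stands in for the Python 2 behaviour of `1/255`; the softmax and cross_entropy helpers and the logit values are illustrative assumptions, not the asker's code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Cross entropy for a single one-hot label."""
    return -math.log(probs[true_index])


# Mistake 1: integer division makes the scaling factor exactly zero,
# so every normalised input becomes 0 and the network sees nothing.
scale_integer = 1 // 255       # what 1/255 evaluates to in Python 2
scale_float = 1.0 / 255.0
print(scale_integer, scale_float)

# Mistake 2: softmax applied twice. The second softmax receives values
# in [0, 1], so for two classes its output can never exceed about 0.731.
logits = [8.0, -8.0]           # a very confident, correct prediction
once = softmax(logits)
twice = softmax(once)
print(cross_entropy(once, 0))   # close to 0
print(cross_entropy(twice, 0))  # stuck above roughly 0.31
```

This matches the symptom in the question: with a doubled softmax the loss has a floor well above zero, so cross entropy can decrease a little and then stall.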

I'm afraid you have to reduce your learning rate. It's too high. A high learning rate can overshoot and often leaves you in a local minimum rather than the global one.
Try 0.001, 0.0001 or even 0.00001. Or make your learning rate adaptive.
I did not check the code, so try tuning the learning rate first.

Just in case someone needs it in the future:
I had initialized the layers of my two-layer network with np.random.randn, but the network refused to learn. Switching to He initialization (for the ReLU layer) and Xavier initialization (for the softmax layer) fixed it.
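For reference, the two schemes mentioned above differ only in the standard deviation used to draw the weights. The sketch below uses the common normal-distribution variants (He: sqrt(2 / fan_in), Xavier/Glorot: sqrt(2 / (fan_in + fan_out))); the layer sizes and function names are assumptions chosen for illustration, not taken from the answer.

```python
import math
import random


def he_stddev(fan_in):
    """He initialisation, usually paired with ReLU layers."""
    return math.sqrt(2.0 / fan_in)


def xavier_stddev(fan_in, fan_out):
    """Xavier (Glorot) initialisation, normal variant."""
    return math.sqrt(2.0 / (fan_in + fan_out))


def init_weights(fan_in, fan_out, stddev, seed=0):
    """Draw a fan_in x fan_out weight matrix from N(0, stddev^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, stddev) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Hypothetical two-layer network: 6 inputs -> 4 ReLU units -> 2 softmax outputs.
hidden_weights = init_weights(6, 4, he_stddev(6))
output_weights = init_weights(4, 2, xavier_stddev(4, 2))
print(he_stddev(6), xavier_stddev(4, 2))
```

Plain np.random.randn draws with a standard deviation of 1 regardless of layer size, which is considerably larger than either value here.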

LSTM-RNN: num_classes usage

I am using an LSTM RNN to detect whether a heartbeat is arrhythmic or not. The output classes are [0, 1] and n_classes=2, but when this code is executed:
# Fit training using batch data
_, loss, acc = sess.run(
    [optimizer, cost, accuracy],
    feed_dict={
        x: batch_xs,
        y: batch_ys
    }
)
it gives the following error:
ValueError: Cannot feed value of shape (1, 1) for Tensor 'Placeholder_1:0', which has shape '(?, 2)'
Here is the whole code:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import tensorflow as tf # Version 1.0.0 (some previous versions are used in past commits)
from sklearn import metrics
import _pickle as cPickle
import os
import pandas as pd
import functions as f
[ml2_train_input,ml2_train_output,ml2_train_peaks,ml2_test_input,ml2_test_output,ml2_test_peaks]=f.get_ml2(0.5)
ml2_train_output=f.get_binary_output(ml2_train_output[:52500])
ml2_test_output=f.get_binary_output(ml2_test_output[:52500])
# Output classes to learn how to classify
LABELS = [0,1 ]
training_data_count = len(ml2_train_input[:52500]) # training series
test_data_count = len(ml2_test_input[:52500]) # testing series
n_input = 360 # 360 input parameters per timestep
# LSTM Neural Network's internal structure
n_hidden = 8 # Hidden layer num of features
n_classes = 2 # Total classes
# Training
learning_rate = 0.005
lambda_loss_amount = 0.0015
training_iters = training_data_count * 10 # Loop 10 times on the dataset
batch_size = 500
display_iter = 1000 # To show test set accuracy during training
X_test=np.array(ml2_test_input[:52500])
y_test=np.array(ml2_test_output[:52500])
# Some debugging info
print("Some useful info to get an insight on dataset's shape and normalisation:")
print("(X shape, y shape, every X's mean, every X's standard deviation)")
print(X_test.shape, y_test.shape, np.mean(X_test), np.std(X_test))
print("The dataset is therefore properly normalised, as expected, but not yet one-hot encoded.")
def LSTM_RNN(_X, _weights, _biases):
    # Function returns a tensorflow LSTM (RNN) artificial neural network from given parameters.
    # Moreover, two LSTM cells are stacked which adds deepness to the neural network.
    # Note, some code of this notebook is inspired from a slightly different
    # RNN architecture used on another dataset, some of the credits goes to
    # "aymericdamien" under the MIT license.
    # (NOTE: This step could be greatly optimised by shaping the dataset once
    # input shape: (batch_size, n_steps, n_input)
    # permute n_steps and batch_size
    # Reshape to prepare input to hidden activation
    #_X = tf.reshape(_X, [-1, n_input])
    # new shape: (n_steps*batch_size, n_input)
    # Linear activation
    _X = tf.nn.relu(tf.matmul(_X, _weights['hidden']) + _biases['hidden'])
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(_X, 500,0)
    # new shape: n_steps * (batch_size, n_hidden)
    # Define two stacked LSTM cells (two recurrent layers deep) with tensorflow
    lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True,reuse=None)
    lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True,reuse=None)
    lstm_cells = tf.contrib.rnn.MultiRNNCell([lstm_cell_1, lstm_cell_2], state_is_tuple=True)
    # Get LSTM cell output
    outputs, states = tf.contrib.rnn.static_rnn(lstm_cells, _X, dtype=tf.float32)
    # Get last time step's output feature for a "many to one" style classifier,
    # as in the image describing RNNs at the top of this page
    lstm_last_output = outputs[-1]
    # Linear activation
    return tf.matmul(lstm_last_output, _weights['out']) + _biases['out']
def extract_batch_size(_train, step, batch_size):
    # Function to fetch a "batch_size" amount of data from "(X|y)_train" data.
    shape = list(_train.shape)
    shape[0] = batch_size
    batch_s = np.empty(shape)
    for i in range(batch_size):
        # Loop index
        index = ((step-1)*batch_size + i) % len(_train)
        batch_s[i] = _train[index]
    return batch_s
def one_hot(y_):
    # Function to encode output labels from number indexes
    # e.g.: [[5], [0], [3]] --> [[0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
    y_ = y_.reshape(len(y_))
    n_values = int(np.max(y_)) + 1
    return np.eye(n_values)[np.array(y_, dtype=np.int32)] # Returns FLOATS
# Graph input/output
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
# Graph weights
weights = {
    'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])), # Hidden layer weights
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes], mean=1.0))
}
biases = {
    'hidden': tf.Variable(tf.random_normal([n_hidden])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
pred = LSTM_RNN(x, weights, biases)
# Loss, optimizer and evaluation
l2 = lambda_loss_amount * sum(
    tf.nn.l2_loss(tf_var) for tf_var in tf.trainable_variables()
) # L2 loss prevents this overkill neural network to overfit the data
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred)) + l2 # Softmax loss
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost) # Adam Optimizer
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# To keep track of training's performance
test_losses = []
test_accuracies = []
train_losses = []
train_accuracies = []
# Launch the graph
sess = tf.InteractiveSession(config=tf.ConfigProto(log_device_placement=True))
init = tf.global_variables_initializer()
sess.run(init)
X_train=np.array(ml2_train_input[:52500])
y_train=np.array(ml2_train_output[:52500])
step = 1
while step * batch_size <= training_iters:
    batch_xs = extract_batch_size(X_train, step, batch_size)
    batch_ys = one_hot(extract_batch_size(y_train, step, batch_size))
    # Fit training using batch data
    _, loss, acc = sess.run(
        [optimizer, cost, accuracy],
        feed_dict={
            x: batch_xs,
            y: batch_ys
        }
    )
    train_losses.append(loss)
    train_accuracies.append(acc)
    # Evaluate network only at some steps for faster training:
    if (step*batch_size % display_iter == 0) or (step == 1) or (step * batch_size > training_iters):
        # To not spam console, show training accuracy/loss in this "if"
        print("Training iter #" + str(step*batch_size) + \
              ": Batch Loss = " + "{:.6f}".format(loss) + \
              ", Accuracy = {}".format(acc))
        # Evaluation on the test set (no learning made here - just evaluation for diagnosis)
        loss, acc = sess.run(
            [cost, accuracy],
            feed_dict={
                x: X_test,
                y: one_hot(y_test)
            }
        )
        test_losses.append(loss)
        test_accuracies.append(acc)
        print("PERFORMANCE ON TEST SET: " + \
              "Batch Loss = {}".format(loss) + \
              ", Accuracy = {}".format(acc))
    step += 1
print("Optimization Finished!")
Please help!

I think you need to convert your Y values to categorical (one-hot encoded) form with one column per class; then it should work.
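One thing worth checking in this context: the one_hot helper in the question derives the number of columns from the largest label in the batch (np.max(y_) + 1), so a batch that happens to contain only label 0 would be encoded with a single column instead of n_classes columns. Whether that is what triggers the error here is an assumption. The sketch below, in plain Python with made-up helper names, contrasts that behaviour with an encoder that takes the class count explicitly.

```python
def one_hot_inferred(labels):
    """Mimics the question's helper: width depends on the batch contents."""
    n_values = max(labels) + 1
    return [[1.0 if i == label else 0.0 for i in range(n_values)]
            for label in labels]


def one_hot_fixed(labels, n_classes):
    """Always produces n_classes columns, whatever labels are present."""
    return [[1.0 if i == label else 0.0 for i in range(n_classes)]
            for label in labels]


batch = [0, 0, 0]                   # a batch with only the "normal" class
print(one_hot_inferred(batch))      # [[1.0], [1.0], [1.0]]
print(one_hot_fixed(batch, 2))      # [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
```

With the fixed-width version, the encoded labels always match a placeholder of shape (?, n_classes).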

tensorflow deep neural network for regression always predict same results in one batch

I use TensorFlow to implement a simple multi-layer perceptron for regression. The code is modified from the standard MNIST classifier: I only changed the output cost to MSE (using tf.reduce_mean(tf.square(pred-y))) and some input and output size settings. However, when I train the network for regression, after several epochs the outputs within a batch are all identical. For example:
target: 48.129, estimated: 42.634
target: 46.590, estimated: 42.634
target: 34.209, estimated: 42.634
target: 69.677, estimated: 42.634
......
I have tried different batch sizes, different initializations, and input normalization using sklearn.preprocessing.scale (the ranges of my inputs differ considerably). None of them worked. I have also tried one of the sklearn examples from Tensorflow (Deep Neural Network Regression with Boston Data), but I got another error in line 40:
'module' object has no attribute 'infer_real_valued_columns_from_input'
Does anyone have a clue where the problem is? Thank you.
My code is listed below. It may be a little long, but it is very straightforward:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow.contrib import learn
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn import datasets, linear_model
from sklearn import cross_validation
import numpy as np
boston = learn.datasets.load_dataset('boston')
x, y = boston.data, boston.target
X_train, X_test, Y_train, Y_test = cross_validation.train_test_split(
x, y, test_size=0.2, random_state=42)
total_len = X_train.shape[0]
# Parameters
learning_rate = 0.001
training_epochs = 500
batch_size = 10
display_step = 1
dropout_rate = 0.9
# Network Parameters
n_hidden_1 = 32 # 1st layer number of features
n_hidden_2 = 200 # 2nd layer number of features
n_hidden_3 = 200
n_hidden_4 = 256
n_input = X_train.shape[1]
n_classes = 1
# tf Graph input
x = tf.placeholder("float", [None, 13])
y = tf.placeholder("float", [None])
# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Hidden layer with RELU activation
    layer_3 = tf.add(tf.matmul(layer_2, weights['h3']), biases['b3'])
    layer_3 = tf.nn.relu(layer_3)
    # Hidden layer with RELU activation
    layer_4 = tf.add(tf.matmul(layer_3, weights['h4']), biases['b4'])
    layer_4 = tf.nn.relu(layer_4)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_4, weights['out']) + biases['out']
    return out_layer
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], 0, 0.1)),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], 0, 0.1)),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], 0, 0.1)),
    'h4': tf.Variable(tf.random_normal([n_hidden_3, n_hidden_4], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_hidden_4, n_classes], 0, 0.1))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1], 0, 0.1)),
    'b2': tf.Variable(tf.random_normal([n_hidden_2], 0, 0.1)),
    'b3': tf.Variable(tf.random_normal([n_hidden_3], 0, 0.1)),
    'b4': tf.Variable(tf.random_normal([n_hidden_4], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_classes], 0, 0.1))
}
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.square(pred-y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Launch the graph
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(total_len/batch_size)
        # Loop over all batches
        for i in range(total_batch-1):
            batch_x = X_train[i*batch_size:(i+1)*batch_size]
            batch_y = Y_train[i*batch_size:(i+1)*batch_size]
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c, p = sess.run([optimizer, cost, pred], feed_dict={x: batch_x,
                                                                   y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # sample prediction
        label_value = batch_y
        estimate = p
        err = label_value-estimate
        print ("num batch:", total_batch)
        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
            print ("[*]----------------------------")
            for i in xrange(3):
                print ("label value:", label_value[i], \
                    "estimated value:", estimate[i])
            print ("[*]============================")
    print ("Optimization Finished!")
    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print ("Accuracy:", accuracy.eval({x: X_test, y: Y_test}))

Short answer:
Transpose your pred vector using tf.transpose(pred).
Longer answer:
The problem is that pred (the predictions) and y (the labels) do not have the same shape: one is a row vector and the other a column vector. When you apply an element-wise operation to them, broadcasting produces a matrix, which is not what you want.
The solution is to transpose the prediction vector using tf.transpose() so that the shapes line up and the loss is computed correctly. In fact, if you set the batch size to 1 in your example you'll see that it works even without the fix, because transposing a 1x1 vector is a no-op.
I applied this fix to your example code and observed the following behaviour. Before the fix:
Epoch: 0245 cost= 84.743440580
[*]----------------------------
label value: 23 estimated value: [ 27.47437096]
label value: 50 estimated value: [ 24.71126747]
label value: 22 estimated value: [ 23.87785912]
And after the fix at the same point in time:
Epoch: 0245 cost= 4.181439120
[*]----------------------------
label value: 23 estimated value: [ 21.64333534]
label value: 50 estimated value: [ 48.76105118]
label value: 22 estimated value: [ 24.27996063]
You'll see that the cost is much lower and that it actually learned the value 50 properly. You'll have to do some fine-tuning on the learning rate and such to improve your results of course.
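The shape mismatch described in this answer also explains why every prediction in a batch collapses to the same value. The sketch below imitates the broadcasting with plain Python lists instead of tensors (an assumption made to keep it dependency-free); the function names are illustrative and the targets are the four values from the question.

```python
def broadcast_mse(pred_column, targets):
    """MSE when pred has shape (N, 1) and y has shape (N,): the
    subtraction broadcasts to an N x N matrix of all pairwise differences."""
    diffs = [[p - t for t in targets] for (p,) in pred_column]
    count = len(diffs) * len(targets)
    return sum(d * d for row in diffs for d in row) / count


def elementwise_mse(pred_flat, targets):
    """MSE with matching shapes: each prediction meets only its own label."""
    return sum((p - t) ** 2 for p, t in zip(pred_flat, targets)) / len(targets)


targets = [48.129, 46.590, 34.209, 69.677]
mean_target = sum(targets) / len(targets)

perfect = [[t] for t in targets]             # predicts every label exactly
constant = [[mean_target] for _ in targets]  # predicts the batch mean everywhere

print(broadcast_mse(perfect, targets))    # large, despite perfect predictions
print(broadcast_mse(constant, targets))   # smaller: the constant guess "wins"
print(elementwise_mse(targets, targets))  # 0.0 once the shapes match
```

Under the broadcast loss each prediction is compared with every label in the batch, so the optimizer is rewarded for outputting the batch mean for all inputs, which is the behaviour reported in the question.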

There is likely a problem with your dataset loading or indexing implementation. If you only modified the cost to MSE, make sure pred and y are correctly being updated and you did not overwrite them with a different graph operation.
Another thing to help debug would be to predict the actual regression outputs. It would also help if you posted more of your code so we can see your specific data loading implementation, etc.
