I have a csv data set with 2 columns, an input column and an output column. When I use Excel to find the trend line, I get:
y = -0.4571x + 0.9011
When I run the following code, w and b converge to different values depending on the learning rate and batch size I choose. I have played around with different values without any luck. Maybe I am missing something?
The cost doesn't seem to change either.
import numpy as np
import pandas as pd
import tensorflow as tf

learningRate = 0.001
epochs = 2000
batchSize = 20

df = pd.read_csv("C:\\Users\\Brian\\Desktop\\data.csv")
X = df[df.columns[0]].values
Y = df[df.columns[1]].values

def getBatch(batchSize, inputs, outputs):
    idx = np.arange(0, len(inputs))
    np.random.shuffle(idx)
    idx = idx[:batchSize]
    xBatch = [inputs[i] for i in idx]
    yBatch = [outputs[i] for i in idx]
    xBatch = np.reshape(xBatch, (batchSize, 1))
    return np.asarray(xBatch), np.asarray(yBatch)
w = tf.Variable(0.0, tf.float32)
b = tf.Variable(0.0, tf.float32)
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
prediction = tf.add(tf.multiply(x,w), b)
cost = tf.reduce_sum(tf.square(prediction-y))
optimizer = tf.train.GradientDescentOptimizer(learningRate).minimize(cost)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        xBatch, yBatch = getBatch(batchSize, X, Y)
        #for (trainX, trainY) in zip(xBatch, yBatch):
        sess.run(optimizer, feed_dict={x: xBatch, y: yBatch})
        if (epoch + 1) % 50 == 0:
            c = sess.run(cost, feed_dict={x: X, y: Y})
            print("Epoch:", (epoch + 1), "cost=", "{:.4f}".format(c), "w=", sess.run(w), "b=", sess.run(b))
    print("Optimization Finished")
    trainingCost = sess.run(cost, feed_dict={x: X, y: Y})
    print("Training cost=", trainingCost, "w=", sess.run(w), "b=", sess.run(b))
When I run the following code, w and b converge to different values depending on the learning rate and batch size I choose.

That is expected: every time you run sess.run(optimizer, feed_dict={x: xBatch, y: yBatch}), TensorFlow performs an update along the lines of
w -= learningRate * dw
where dw is the gradient computed by the gradient descent optimizer.
So if you change learningRate and run the program again, you get a different value of w. Moreover, w affects dw, and dw affects the next w, so it is difficult to predict what value w will end up at if you change learningRate.
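As a minimal sketch of that update rule in plain NumPy (with synthetic data standing in for the asker's CSV, so the numbers are illustrative only), each step moves w and b by learningRate times the gradient of the summed squared error, which is why the trajectory depends on the learning rate and on which batch was sampled:

import numpy as np

np.random.seed(0)
X = np.linspace(0, 1, 100)
Y = -0.4571 * X + 0.9011 + np.random.normal(0, 0.01, 100)   # synthetic stand-in for the CSV data

w, b = 0.0, 0.0
learningRate = 0.001
for epoch in range(2000):
    pred = w * X + b
    # gradients of sum((pred - Y)**2) with respect to w and b
    dw = np.sum(2 * (pred - Y) * X)
    db = np.sum(2 * (pred - Y))
    w -= learningRate * dw   # the same kind of update GradientDescentOptimizer applies
    b -= learningRate * db

print(w, b)   # approaches the Excel trend line for this learning rate

Note also that with the sum-of-squares cost in the question the gradient grows with the number of points fed in, so a learning rate tuned for one batch size behaves differently for another; tf.reduce_mean would make the step size roughly independent of the batch size.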
Related
I have training data with 1000 rows. I am using TensorFlow to train on this data and am trying to divide it into mini-batches of size 32. While training, I am getting the error mentioned below:
InvalidArgumentError: Incompatible shapes: [1000] vs. [32]
[[{{node logistic_loss_1/mul}}]]
On the contrary, if I don't divide my training data into minibatches, or use a single minibatch of size 1000, the code works fine.
I have defined the weights as tf.Variables and am running the TensorFlow session. See the code below:
def sigmoid_cost(z, Y):
    print("Entered Cost")
    z = tf.squeeze(z)
    Y = tf.cast(Y_train, tf.float64)
    logits = tf.transpose(z)
    labels = (Y)
    print(logits.shape)
    print(labels.shape)
    return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
          num_epochs = 1500, minibatch_size = 32, print_cost = True):
    hidden_layer = 4
    m, n = X_train.shape
    n_y = Y_train.shape[0]

    X = tf.placeholder(tf.float64, shape=(None, n), name="X")
    Y = tf.placeholder(tf.float64, shape=(None), name="Y")

    parameters = init_params(n)
    z4, parameters = fwd_model(X, parameters)
    cost = sigmoid_cost(z4, Y)

    num_minibatch = m / minibatch_size
    print("Getting Minibatches")
    num_minibatch = tf.cast(num_minibatch, tf.int32)

    optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)
    print("Gradient Defination Done")

    init = tf.global_variables_initializer()
    init_op = tf.initialize_all_variables()

    with tf.Session() as sess:
        sess.run(init)
        sess.run(init_op)
        for epoch in range(0, num_epochs):
            minibatches = []
            minibatches = minibatch(X_train, Y_train, minibatch_size)
            minibatch_cost = 0
            for i in range(0, len(minibatches)):
                (X_m, Y_m) = minibatches[i]
                Y_m = np.squeeze(Y_m)
                print("Minibatch %d X shape Y Shape ", i, X_m.shape, Y_m.shape)
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: X_m, Y: Y_m})
                print("Mini Batch Cost is ", minibatch_cost)
            epoch_cost = minibatch_cost / num_minibatch
            if print_cost == True and epoch % 100 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))
                print(epoch_cost)
For some reason, while running the cost function, the shape of either X or Y is being taken as 1000 while the other is 32, or vice versa. Any help would be appreciated.
I think you are getting the above error because of the Y = tf.cast(Y_train, tf.float64) line inside the sigmoid_cost function. There, Y_train has 1000 rows, but the loss function is expecting 32 rows (your batch size).
It should be Y = tf.cast(Y, tf.float64). In fact, there is no need to cast the data type at all here, as Y is already of type tf.float64. Check this line:
Y = tf.placeholder(tf.float64, shape=(None), name="Y")
That's why your code worked fine when you used a single minibatch of size 1000 (the full Y_train data).
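A sketch of the corrected cost function based on that suggestion (keeping the question's other names, which are assumed to stay unchanged):

def sigmoid_cost(z, Y):
    # use the Y placeholder that was actually fed, not the full Y_train array
    z = tf.squeeze(z)
    logits = tf.transpose(z)
    labels = Y   # already tf.float64, so no cast is needed
    return tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))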
I have written a script that demonstrates the linear regression algorithm as follows:
training_epochs = 100
learning_rate = 0.01
# the training set
x_train = np.linspace(0, 10, 100)
y_train = x_train + np.random.normal(0,1,100)
# set up placeholders for input and output
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
# set up variables for weights
w0 = tf.Variable(0.0, name="w0")
w1 = tf.Variable(0.0, name="w1")
y_predicted = X*w1 + w0
# Define the cost function
costF = 0.5*tf.square(Y-y_predicted)
# Define the operation that will be called on each iteration
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(costF)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
# Loop through the data training
for epoch in range(training_epochs):
    for (x, y) in zip(x_train, y_train):
        sess.run(train_op, feed_dict={X: x, Y: y})
# get values of the final weights
w_val_0,w_val_1 = sess.run([w0,w1])
sess.close()
With the script above, I can compute w_val_0 and w_val_1 easily. But if I change y_predicted to the following:
w0 = tf.Variable(0.0, name="w0")
w1 = tf.Variable(0.0, name="w1")
w2 = tf.Variable(0.0, name="w2")
y_predicted = X*X*w2 + X*w1 + w0
...
w_val_0,w_val_1,w_val_2 = sess.run([w0,w1,w2])
then I couldn't compute w_val_0, w_val_1, w_val_2. Please help me!
When you compute X*X, the weights (w0, w1 and w2) grow rapidly, reaching inf, which results in nan values in the loss, and no training happens. As a rule of thumb, always normalize the data to zero mean and unit variance.
Fixed code
training_epochs = 100
learning_rate = 0.01
# the training set
x_train = np.linspace(0, 10, 100)
y_train = x_train + np.random.normal(0,1,100)
# Normalize the data
x_mean = np.mean(x_train)
x_std = np.std(x_train)
x_train_ = (x_train - x_mean)/x_std
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
# set up variables for weights
w0 = tf.Variable(0.0, name="w0")
w1 = tf.Variable(0.0, name="w1")
w2 = tf.Variable(0.0, name="w2")
y_predicted = X*X*w1 + X*w2 + w0
# Define the cost function
costF = 0.5*tf.square(Y-y_predicted)
# Define the operation that will be called on each iteration
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(costF)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
# Loop through the data training
for epoch in range(training_epochs):
    for (x, y) in zip(x_train_, y_train):
        sess.run(train_op, feed_dict={X: x, Y: y})
y_hat = sess.run(y_predicted, feed_dict={X: x_train_})
print (sess.run([w0,w1,w2]))
sess.close()
plt.plot(x_train, y_train)
plt.plot(x_train, y_hat)
plt.show()
output:
[4.9228806, -0.08735728, 3.029659]
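A usage note on the fixed code: new inputs have to be normalized with the same x_mean and x_std before being fed to y_predicted. A small sketch (the x_new values are made up, and it has to run before sess.close()):

x_new = np.array([2.5, 7.0])
x_new_ = (x_new - x_mean) / x_std                           # same scaling as the training data
y_hat_new = sess.run(y_predicted, feed_dict={X: x_new_})    # predictions for the new points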
I'm working through this RNN tutorial to get a general idea of how to write an RNN using the lower level TensorFlow API. While I've gotten everything to work, I am getting different values for my total_loss depending on how I evaluate it within the session.
What is the difference in how the losses below are calculated? Why does running the train step with other nodes (i.e. in the same run statement) result in different loss values than running the train step and the other nodes separately (i.e. in different run statements)?
Here is the graph:
X = tf.placeholder(tf.int32, [batch_size, num_steps], name = 'X')
Y = tf.placeholder(tf.int32, [batch_size, num_steps], name = 'Y')
initial_state = tf.zeros([batch_size, state_size])

X_one_hot = tf.one_hot(X, num_classes)
rnn_inputs = tf.unstack(X_one_hot, axis = 1)

Y_one_hot = tf.one_hot(Y, num_classes)
Y_one_hot_list = tf.unstack(Y_one_hot, axis = 1)

with tf.variable_scope('RNN_cell'):
    W = tf.get_variable('W', [num_classes + state_size, state_size])
    b = tf.get_variable('b', [state_size], initializer = tf.constant_initializer(0.0))
    tf.summary.histogram('RNN_cell/weights', W)

# define the RNN cell
def RNNCell(rnn_input, state, activation = tf.tanh):
    with tf.variable_scope('RNN_cell', reuse = True):
        W = tf.get_variable('W', [num_classes + state_size, state_size])
        b = tf.get_variable('b', [state_size], initializer = tf.constant_initializer(0))
        H = activation(tf.matmul(tf.concat([rnn_input, state], axis = 1), W) + b)
    return H

# add RNN cells to the computational graph
state = initial_state
rnn_outputs = []
for rnn_input in rnn_inputs:
    state = RNNCell(rnn_input, state, tf.tanh)
    rnn_outputs.append(state)
final_state = rnn_outputs[-1]

# set up the softmax output layer
with tf.variable_scope('softmax_output'):
    W = tf.get_variable('W', [state_size, num_classes])
    b = tf.get_variable('b', [num_classes], initializer = tf.constant_initializer(0.0))
    tf.summary.histogram('softmax_output/weights', W)

logits = [tf.matmul(rnn_output, W) + b for rnn_output in rnn_outputs]
probabilties = [tf.nn.softmax(logit) for logit in logits]
predictions = [tf.argmax(logit, 1) for logit in logits]

# set up loss function
losses = [tf.nn.softmax_cross_entropy_with_logits(labels = label, logits = logit) for
          logit, label in zip(logits, Y_one_hot_list)]
total_loss = tf.reduce_mean(losses)

# set up the optimizer
train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

tf.summary.scalar('loss', total_loss)
This version of the session evaluates the training loss, takes a train_step, and then evaluates the loss again.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter( './RNN_Tutorial/temp1', sess.graph)
    summary = tf.summary.merge_all()

    for index, epoch in enumerate(gen_epochs(num_epochs, num_steps)):
        training_state = np.zeros((batch_size, state_size))
        for step, (x, y) in enumerate(epoch):
            training_loss1 = sess.run(total_loss, feed_dict = {X: x, Y: y, initial_state: training_state})
            sess.run(train_step, feed_dict = {X: x, Y: y, initial_state: training_state})
            training_loss2 = sess.run(total_loss, feed_dict = {X: x, Y: y, initial_state: training_state})
            if step % 1 == 0:
                train_writer.add_summary(summary_str, global_step = step)
                print(step, training_loss1, training_loss2)
The output looks like the model is not really learning. Here is the (partial) output, which doesn't really change through all 1000 iterations. It just sticks around 0.65 - 0.7
0 0.6757775 0.66556937
1 0.6581067 0.6867344
2 0.70850086 0.66878074
3 0.67115635 0.68184483
4 0.67868954 0.6858209
5 0.6853568 0.66989964
6 0.672376 0.6554015
7 0.66563135 0.6655373
8 0.660332 0.6666234
9 0.6514224 0.6536864
10 0.65912485 0.6518013
And here is the session when I run total_loss, losses, and final_state with the train_step:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter( './RNN_Tutorial/temp1', sess.graph)
    summary = tf.summary.merge_all()

    for index, epoch in enumerate(gen_epochs(num_epochs, num_steps)):
        training_state = np.zeros((batch_size, state_size))
        for step, (x, y) in enumerate(epoch):
            training_loss1 = sess.run(total_loss, feed_dict = {X: x, Y: y, initial_state: training_state})
            tr_losses, training_loss_, training_state, _, summary_str = \
                sess.run([losses,
                          total_loss,
                          final_state,
                          train_step,
                          summary], feed_dict={X: x, Y: y, initial_state: training_state})
            training_loss2 = sess.run(total_loss, feed_dict = {X: x, Y: y, initial_state: training_state})
            if step % 1 == 0:
                train_writer.add_summary(summary_str, global_step = step)
                print(step, training_loss1, training_loss_, training_loss2)
In this output, however, the total_loss calculated before the train step and the total loss calculated with the train step show a steady decline and then plateau around 0.53, while the loss calculated after the train step (training_loss2) still fluctuates around 0.65 - 0.7, just as it did in the first session. Below is another partial output:
900 0.50464576 0.50464576 0.6973026
901 0.51603603 0.51603603 0.7115394
902 0.5465342 0.5465342 0.74994177
903 0.50591564 0.50591564 0.69172275
904 0.54837495 0.54837495 0.7333309
905 0.51697487 0.51697487 0.674438
906 0.5259896 0.5259896 0.70118546
907 0.5242365 0.5242365 0.71549624
908 0.50699174 0.50699174 0.7007787
909 0.5292892 0.5292892 0.7045353
910 0.49432433 0.49432433 0.73515224
I would think that the training loss would be the same for both versions of the session block. Why does using sess.run(total_loss, ...) then sess.run(train_step, ...) alone (i.e. in the first version) result in different loss values than when using sess.run([losses, total_loss, final_state, train_step], ...)?
Figured it out. The issue was running the session without fetching final_state and updating training_state with it inside the second for loop. Without that, the model doesn't learn the longer dependencies built into the generated data.
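A sketch of how the first version of the loop changes with that fix (same names as the question's code; summary writing omitted):

for step, (x, y) in enumerate(epoch):
    # fetch final_state in the same run call as train_step,
    # then carry the returned state over to the next batch
    _, training_state, training_loss = sess.run(
        [train_step, final_state, total_loss],
        feed_dict={X: x, Y: y, initial_state: training_state})
    print(step, training_loss)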
I've tried the code provided by TensorFlow here.
I've also tried the solution provided by Nicolas, but I encountered an error:
ValueError: Shape () must have rank at least 1
Regardless, I am unable to manipulate the code such that I can grab the data and place it in the train_X and train_Y variables.
I'm currently using hard-coded data for the train_X and train_Y variables.
My csv file contains 2 columns, Height & State of Charge (SoC), where Height is a float value and SoC is a whole number (int) starting from 0 and increasing in increments of 10 up to a maximum of 100.
I want to grab the data from these columns and use it in a linear regression model, where Height is the y value and SoC is the x value.
Here's my code:
filename_queue = tf.train.string_input_producer("battdata.csv")
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[1], [1]]
col1, col2 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.stack([col1, col2])

with tf.Session() as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(1200):
        # Retrieve a single instance:
        example, label = sess.run([features, col2])

    coord.request_stop()
    coord.join(threads)
I want to use the csv data in this model instead:
# Parameters
learning_rate = 0.01
training_epochs = 1000
display_step = 50
# Training Data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])
n_samples = train_X.shape[0]
# tf Graph Input
X = tf.placeholder("float")#Charge
Y = tf.placeholder("float")#Height
# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")
# Construct a linear model
pred = tf.add(tf.multiply(X, W), b) # XW + b <- y = mx + b where W is gradient, b is intercept
# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                  "W=", sess.run(W), "b=", sess.run(b))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')

    # Graphic display
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()
EDIT:
I've also tried the solution provided by Nicolas, and I encountered an error:
ValueError: Shape () must have rank at least 1
I solved this issue by adding square brackets around my file name like so:
filename_queue = tf.train.string_input_producer(['battdata.csv'])
All you need to do is replace your placeholder tensors with the ops you get from the decode_csv method. That way, whenever you run the optimiser, the TensorFlow graph will ask for a new row to be read from the file through the various tensor dependencies:
optimiser => cost => pred => X
optimiser => cost => Y
It would give something like that:
filename_queue = tf.train.string_input_producer(['battdata.csv'])
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[1.], [1]]
X, Y = tf.decode_csv(
    value, record_defaults=record_defaults)
# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")
# Construct a linear model
pred = tf.add(tf.multiply(X, W), b) # XW + b <- y = mx + b where W is gradient, b is intercept
# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
with tf.Session() as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    # Fit all training data
    for epoch in range(training_epochs):
        _, cost_value = sess.run([optimizer, cost])

        [...] # The rest of your code

    coord.request_stop()
    coord.join(threads)
I had the same problem, and it was resolved with:
tf.train.string_input_producer(tf.train.match_filenames_once("medal.csv"))
Found this here: TensorFlow From CSV to API
I'm trying to make a price prediction on a Kaggle dataset with TensorFlow.
My neural network is learning, but my cost function is really high and my predictions are far from the real output.
I tried to change my network by adding or removing some layers, neurons and activation functions.
I have played a lot with my hyper-parameters, but that doesn't change much.
I don't think the problem comes from my data; I checked on Kaggle and it's the dataset most people use.
If you have any idea why my cost is so high and how to reduce it, and if you could explain it to me, that would be really great!
Here's my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.utils import shuffle
df = pd.read_csv(r"C:\Users\User\Documents\TENSORFLOW\Prediction prix\train2.csv", sep=';')
df.head()
df = df.loc[:, ['OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'SalePrice']]
df = df.replace(np.nan, 0)
df
%matplotlib inline
plt = sns.pairplot(df)
plt
df = shuffle(df)
df_train = df[0:1000]
df_test = df[1001:1451]
inputX = df_train.drop('SalePrice', 1).as_matrix()
inputX = inputX.astype(int)
inputY = df_train.loc[:, ['SalePrice']].as_matrix()
inputY = inputY.astype(int)
inputX_test = df_test.drop('SalePrice', 1).as_matrix()
inputX_test = inputX_test.astype(int)
inputY_test = df_test.loc[:, ['SalePrice']].as_matrix()
inputY_test = inputY_test.astype(int)
# Parameters
learning_rate = 0.01
training_epochs = 1000
batch_size = 500
display_step = 50
n_samples = inputX.shape[0]
x = tf.placeholder(tf.float32, [None, 5])
y = tf.placeholder(tf.float32, [None, 1])
def add_layer(inputs, in_size, out_size, activation_function=None):
    Weights = tf.Variable(tf.random_normal([in_size, out_size], stddev=0.1))
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        output = Wx_plus_b
    else:
        output = activation_function(Wx_plus_b)
    return output
l1 = add_layer(x, 5, 3, activation_function=tf.nn.relu)
pred = add_layer(l1, 3, 1)
# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = batch_size
        # Loop over all batches
        for i in range(total_batch):
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: inputX,
                                                          y: inputY})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))

    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(pred, y)
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: inputX, y: inputY}))
    print(sess.run(pred, feed_dict={x: inputX_test}))
Epoch: 0001 cost= 10142407502702304395526144.000000000
Epoch: 0051 cost= 3256106752.000019550
Epoch: 0101 cost= 3256106752.000019550
Epoch: 0151 cost= 3256106752.000019550
Epoch: 0201 cost= 3256106752.000019550
...
Thanks for your help !
I see a couple of problems with the implementation:
Inputs are not scaled.
Use sklearn's StandardScaler to scale the inputs inputX and inputY (and also inputX_test and inputY_test) so they have zero mean and unit variance. You can use inverse_transform to convert the outputs back to the proper scale again (see the sketch after the snippet below).
from sklearn.preprocessing import StandardScaler

sc = StandardScaler().fit(inputX)
inputX = sc.transform(inputX)
inputX_test = sc.transform(inputX_test)
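If the targets are scaled as well (with a second scaler, called sc1 in the working code further down), predictions can be mapped back to real prices with inverse_transform. A small sketch reusing the question's pred, x and session:

pred_scaled = sess.run(pred, feed_dict={x: inputX_test})   # predictions in standardized units
pred_prices = sc1.inverse_transform(pred_scaled)           # back to the original SalePrice scale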
The batch_size is too large; you are passing the entire set as a single batch. This should not cause the particular problem you are facing, but for better convergence try a reduced batch size. Implement a get_batch() generator function and do the following:
for batch_X, batch_Y in get_batch(input_X, input_Y, batch_size):
    _, c = sess.run([optimizer, cost], feed_dict={x: batch_X,
                                                  y: batch_Y})
Try a smaller weight initialization (stddev) if you still see issues.
WORKING CODE BELOW:
inputX = df_train.drop('SalePrice', 1).as_matrix()
inputX = inputX.astype(int)
sc = StandardScaler().fit(inputX)
inputX = sc.transform(inputX)
inputY = df_train.loc[:, ['SalePrice']].as_matrix()
inputY = inputY.astype(int)
sc1 = StandardScaler().fit(inputY)
inputY = sc1.transform(inputY)
inputX_test = df_test.drop('SalePrice', 1).as_matrix()
inputX_test = inputX_test.astype(int)
inputX_test = sc.transform(inputX_test)
inputY_test = df_test.loc[:, ['SalePrice']].as_matrix()
inputY_test = inputY_test.astype(int)
inputY_test = sc1.transform(inputY_test)
learning_rate = 0.01
training_epochs = 1000
batch_size = 50
display_step = 50
n_samples = inputX.shape[0]
x = tf.placeholder(tf.float32, [None, 5])
y = tf.placeholder(tf.float32, [None, 1])
def get_batch(inputX, inputY, batch_size):
    duration = len(inputX)
    for i in range(0, duration//batch_size):
        idx = i*batch_size
        yield inputX[idx:idx+batch_size], inputY[idx:idx+batch_size]

def add_layer(inputs, in_size, out_size, activation_function=None):
    Weights = tf.Variable(tf.random_normal([in_size, out_size], stddev=0.005))
    biases = tf.Variable(tf.zeros([1, out_size]))
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        output = Wx_plus_b
    else:
        output = activation_function(Wx_plus_b)
    return output
l1 = add_layer(x, 5, 3, activation_function=tf.nn.relu)
pred = add_layer(l1, 3, 1)
# Mean squared error
cost = tf.reduce_mean(tf.pow(tf.subtract(pred, y), 2))
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = batch_size
        # Loop over all batches
        #for i in range(total_batch):
        for batch_x, batch_y in get_batch(inputX, inputY, batch_size):
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c, _l1, _pred = sess.run([optimizer, cost, l1, pred], feed_dict={x: batch_x, y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f} ".format(avg_cost))
            #print(_l1, _pred)

    print("Optimization Finished!")
I have already had a similar problem of a very high cost being reached after a few training steps, with the cost then remaining constant. For me it was a kind of overflow, with the gradients too big, creating NaN values quite early in training. I solved it by starting with a smaller learning rate (potentially much smaller) until the cost and gradients became more reasonable (a few dozen steps), and then going back to a regular one (higher at the start, potentially decaying).
See my answer to this post for a similar case that was solved just by using a smaller learning rate at the start.
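One way to do that warm-up with this TF 1.x graph is to feed the learning rate through a placeholder. A minimal sketch, where the 50-step threshold, the two rates, and the num_steps/batch_x/batch_y names are illustrative assumptions rather than values from the answer:

lr = tf.placeholder(tf.float32, shape=[])
optimizer = tf.train.GradientDescentOptimizer(lr).minimize(cost)

for step in range(num_steps):
    current_lr = 1e-4 if step < 50 else 1e-2   # small at first, then the regular rate
    sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, lr: current_lr})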
You can also clip your gradients to avoid this problem, using tf.clip_by_value. It sets a minimum and a maximum value for your gradients, which avoids huge ones that send your weights straight to NaN after the first few iterations. To use it (with min and max at -1 and 1, which is probably too tight), replace
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
by
opt= tf.train.GradientDescentOptimizer(learning_rate)
gvs = opt.compute_gradients(cost)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
optimizer = opt.apply_gradients(capped_gvs)