TensorFlow - why doesn't this sofmax regression learn anything? - python

I am aiming to do big things with TensorFlow, but I'm trying to start small.
I have small greyscale squares (with a little noise) and I want to classify them according to their colour (e.g. 3 categories: black, grey, white). I wrote a little Python class to generate squares, and 1-hot vectors, and modified their basic MNIST example to feed them in.
But it won't learn anything - e.g. for 3 categories it always guesses ≈33% correct.
import tensorflow as tf
import generate_data.generate_greyscale
data_generator = generate_data.generate_greyscale.GenerateGreyScale(28, 28, 3, 0.05)
ds = data_generator.generate_data(10000)
ds_validation = data_generator.generate_data(500)
xs = ds[0]
ys = ds[1]
num_categories = data_generator.num_categories
x = tf.placeholder("float", [None, 28*28])
W = tf.Variable(tf.zeros([28*28, num_categories]))
b = tf.Variable(tf.zeros([num_categories]))
y = tf.nn.softmax(tf.matmul(x,W) + b)
y_ = tf.placeholder("float", [None,num_categories])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
# let batch_size = 100 --> therefore there are 100 batches of training data
xs = xs.reshape(100, 100, 28*28) # reshape into 100 minibatches of size 100
ys = ys.reshape((100, 100, num_categories)) # reshape into 100 minibatches of size 100
for i in range(100):
batch_xs = xs[i]
batch_ys = ys[i]
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
xs_validation = ds_validation[0]
ys_validation = ds_validation[1]
print sess.run(accuracy, feed_dict={x: xs_validation, y_: ys_validation})
My data generator looks like this:
import numpy as np
import random
class GenerateGreyScale():
def __init__(self, num_rows, num_cols, num_categories, noise):
self.num_rows = num_rows
self.num_cols = num_cols
self.num_categories = num_categories
# set a level of noisiness for the data
self.noise = noise
def generate_label(self):
lab = np.zeros(self.num_categories)
lab[random.randint(0, self.num_categories-1)] = 1
return lab
def generate_datum(self, lab):
i = np.where(lab==1)[0][0]
frac = float(1)/(self.num_categories-1) * i
arr = np.random.uniform(max(0, frac-self.noise), min(1, frac+self.noise), self.num_rows*self.num_cols)
return arr
def generate_data(self, num):
data_arr = np.zeros((num, self.num_rows*self.num_cols))
label_arr = np.zeros((num, self.num_categories))
for i in range(0, num):
label = self.generate_label()
datum = self.generate_datum(label)
data_arr[i] = datum
label_arr[i] = label
#data_arr = data_arr.astype(np.float32)
#label_arr = label_arr.astype(np.float32)
return data_arr, label_arr

For starters, try initializing your W matrix with random values, not zeros - you're not giving the optimizer anything to work with when the output is all zeros for all inputs.
Instead of:
W = tf.Variable(tf.zeros([28*28, num_categories]))
Try:
W = tf.Variable(tf.truncated_normal([28*28, num_categories],
stddev=0.1))

You issue is that the your gradients are increasing/decreasing without bounds, causing the loss function to become nan.
Take a look at this question: Why does TensorFlow example fail when increasing batch size?
Furthermore, make sure that you run the model for a sufficient number of steps. You are only running it once through your train dataset (100 times * 100 examples), and this is not enough for it to converge. Increase it to something like 2000 at a minimum (running 20 times through your dataset).
Edit (can't comment, so i'll add my thoughts here):
The point of the post i linked is that you can use GradientDescentOptimizer, as long as you make the learning rate something like 0.001. That's the issue, your learning rate was too high for the loss function you were using.
Alternatively, use a different loss function, that doesn't increase/decrease the gradients as much. Use tf.reduce_mean instead of tf.reduce_sum in the definition of crossEntropy.

While dga and syncd's responses were helpful, I tried using non-zero weight initialization and larger datasets but to no avail. The thing that finally worked was using a different optimization algorithm.
I replaced:
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
with
train_step = tf.train.AdamOptimizer(0.0005).minimize(cross_entropy)
I also embedded the training for loop in another for loop to train for several epochs, resulting in convergence like this:
===# EPOCH 0 #===
Error: 0.370000004768
===# EPOCH 1 #===
Error: 0.333999991417
===# EPOCH 2 #===
Error: 0.282000005245
===# EPOCH 3 #===
Error: 0.222000002861
===# EPOCH 4 #===
Error: 0.152000010014
===# EPOCH 5 #===
Error: 0.111999988556
===# EPOCH 6 #===
Error: 0.0680000185966
===# EPOCH 7 #===
Error: 0.0239999890327
===# EPOCH 8 #===
Error: 0.00999999046326
===# EPOCH 9 #===
Error: 0.00400000810623
EDIT - WHY IT WORKS: I suppose the problem was that I didn't manually choose a good learning rate schedule, and Adam was able to generate a better one automatically.

Found this question when I was having a similar issue..I fixed mine by scaling the features.
A little background: I was following the tensorflow tutorial, however I wanted to use the data from Kaggle(see data here) to do the modeling, but in the beginning I kepted having the same issue: the model just doesn't learn..after rounds of trouble-shooting, I realized that the Kaggle data was on a completely different scale. Therefore, I scaled the data so that it shares the same scale(0,1) as the tensorflow's MNIST dataset.
Just figured that I would add my two cents here..in case some beginners who are trying to follow the tutorial's settings get stuck like I did =)

Related

Trouble with tensorboard two 1D graphs instead of one 2D graph

I'm curently working on a machine-learning project.
I use Python3 with tensorflow to train a CNN neural network and I would like to measure its performance by using tensorboard.
I would like to measure the loss value per epoch. But instead of having only 1 graph with in X the value of the epoch and in Y the value of the loss, I have 2 graphs, one with epochs value and the other the loss value.
I put a screen-shot here:
There is the training part of my code:
with tf.Session(config = config) as sess:
sess.run(tf.global_variables_initializer())
writer = tf.summary.FileWriter(graphDirectory, sess.graph)
# Generate shepp logan for validation
x_arr_validate, y_arr_validate, x_true_arr_validate, y_true_arr_validate = generateData(datasize,nbiter,reco_space,operator,pseudoinverse,validation=True )
for step in tqdm(range(epoch)):
#Generate trading data
x_arr, y_arr, x_true_arr, y_true_arr = generateData(datasize,nbiter,reco_space,operator,pseudoinverse)
#Training
feed_dict = {x0: x_arr,
x_true: x_true_arr,
y: y_arr}
_,loss_training = sess.run([optimizer, loss], feed_dict)
#Validation
feed_dictValidate = {x0 : x_arr_validate,
x_true : x_true_arr_validate,
y : y_arr_validate}
x_values_result, loss_result = sess.run([x_values, loss], feed_dictValidate)
lossSummary = tf.Summary(value=[tf.Summary.Value(tag="loss", simple_value=loss_result)])
epochSummary = tf.Summary(value=[tf.Summary.Value(tag="epoch", simple_value=step)])
writer.add_summary(lossSummary)
writer.add_summary(epochSummary)
saver.save(sess, sessFileName,write_meta_graph=True)
writer.close()
I try to change:
writer.add_summary(lossSummary)
writer.add_summary(epochSummary)
By:
writer.add_summary(lossSummary,epochSummary)
But that doesn't work.
I aslo try to create an array:
step_per_epoch = []
...
x_values_result, loss_result = sess.run([x_values, loss], feed_dictValidate)
step_per_epoch.append(loss_result)
lossSummary = tf.Summary(value=[tf.Summary.Value(tag="loss", simple_value=step_per_epoch)])
writer.add_summary(lossSummary)
But got the following error:
lossSummary = tf.Summary(value=[tf.Summary.Value(tag="loss", simple_value=loss_per_epoch)])
TypeError: [] has type list, but expected one of: int, long, float
I have no idea. Any hints or tips ? Thanks you
If you want to see the epochs in the horizontal axis, then you have to pass a global_step parameter along with the summary (see the documentation for tf.summary.FileWriter.add_summary). In your case, that would be:
writer.add_summary(lossSummary, step)
writer.add_summary(epochSummary, step)
Alternatively, if you change the "Horizontal Axis" selection in this panel:
From "Step" to "Relative" or "Wall" you will have relative or absolute time stamps in the X axis, which will allow you to see the progress.

Always same output for tensorflow autoencoder

At the moment I try to build an Autoencoder for timeseries data in tensorflow. I have nearly 500 days of data where each day have 24 datapoints. Since this is my first try my architecture is very simple. After my input of size 24 the hidden layers are of size: 10; 3; 10 with an output of again 24. I normalized the data (datapoints are in range [-0.5; 0.5]), use the sigmoid activation function and the RMSPropOptimizer.
After training (loss function in picture) the output is the same for every timedata i give into the network. Does someone know what is the reason for that? Is it possible that my Dataset is the issue (code below)?
class TimeDataset:
def __init__(self,data):
self._index_in_epoch = 0
self._epochs_completed = 0
self._data = data
self._num_examples = data.shape[0]
pass
#property
def data(self):
return self._data
def next_batch(self, batch_size, shuffle=True):
start = self._index_in_epoch
# first call
if start == 0 and self._epochs_completed == 0:
idx = np.arange(0, self._num_examples) # get all possible indexes
np.random.shuffle(idx) # shuffle indexe
self._data = self.data[idx] # get list of `num` random samples
if start + batch_size > self._num_examples:
# not enough samples left -> go to the next batch
self._epochs_completed += 1
rest_num_examples = self._num_examples - start
data_rest_part = self.data[start:self._num_examples]
idx0 = np.arange(0, self._num_examples) # get all possible indexes
np.random.shuffle(idx0) # shuffle indexes
self._data = self.data[idx0] # get list of `num` random samples
start = 0
self._index_in_epoch = batch_size - rest_num_examples #avoid the case where the #sample != integar times of batch_size
end = self._index_in_epoch
data_new_part = self._data[start:end]
return np.concatenate((data_rest_part, data_new_part), axis=0)
else:
# get next batch
self._index_in_epoch += batch_size
end = self._index_in_epoch
return self._data[start:end]
*edit: here are some examples of the output (red original, blue reconstructed):
**edit: I just saw an autoencoder example with a more complicant luss function than mine. Someone know if the loss function self.loss = tf.reduce_mean(tf.pow(self.X - self.decoded, 2)) is sufficient?
***edit: some more code to describe my training
This is my Autoencoder Class:
class AutoEncoder():
def __init__(self):
# Training Parameters
self.learning_rate = 0.005
self.alpha = 0.5
# Network Parameters
self.num_input = 24 # one day as input
self.num_hidden_1 = 10 # 2nd layer num features
self.num_hidden_2 = 3 # 2nd layer num features (the latent dim)
self.X = tf.placeholder("float", [None, self.num_input])
self.weights = {
'encoder_h1': tf.Variable(tf.random_normal([self.num_input, self.num_hidden_1])),
'encoder_h2': tf.Variable(tf.random_normal([self.num_hidden_1, self.num_hidden_2])),
'decoder_h1': tf.Variable(tf.random_normal([self.num_hidden_2, self.num_hidden_1])),
'decoder_h2': tf.Variable(tf.random_normal([self.num_hidden_1, self.num_input])),
}
self.biases = {
'encoder_b1': tf.Variable(tf.random_normal([self.num_hidden_1])),
'encoder_b2': tf.Variable(tf.random_normal([self.num_hidden_2])),
'decoder_b1': tf.Variable(tf.random_normal([self.num_hidden_1])),
'decoder_b2': tf.Variable(tf.random_normal([self.num_input])),
}
self.encoded = self.encoder(self.X)
self.decoded = self.decoder(self.encoded)
# Define loss and optimizer, minimize the squared error
self.loss = tf.reduce_mean(tf.pow(self.X - self.decoded, 2))
self.optimizer = tf.train.RMSPropOptimizer(self.learning_rate).minimize(self.loss)
def encoder(self, x):
# sigmoid, tanh, relu
en_layer_1 = tf.nn.sigmoid (tf.add(tf.matmul(x, self.weights['encoder_h1']),
self.biases['encoder_b1']))
en_layer_2 = tf.nn.sigmoid (tf.add(tf.matmul(en_layer_1, self.weights['encoder_h2']),
self.biases['encoder_b2']))
return en_layer_2
def decoder(self, x):
de_layer_1 = tf.nn.sigmoid (tf.add(tf.matmul(x, self.weights['decoder_h1']),
self.biases['decoder_b1']))
de_layer_2 = tf.nn.sigmoid (tf.add(tf.matmul(de_layer_1, self.weights['decoder_h2']),
self.biases['decoder_b2']))
return de_layer_2
and this is how I train my network (input data have shape (number_days, 24)):
model = autoencoder.AutoEncoder()
num_epochs = 3
batch_size = 50
num_batches = 300
display_batch = 50
examples_to_show = 16
loss_values = []
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
#training
for e in range(1, num_epochs+1):
print('starting epoch {}'.format(e))
for b in range(num_batches):
# get next batch of data
batch_x = dataset.next_batch(batch_size)
# Run optimization op (backprop) and cost op (to get loss value)
l = sess.run([model.loss], feed_dict={model.X: batch_x})
sess.run(model.optimizer, feed_dict={model.X: batch_x})
# Display logs
if b % display_batch == 0:
print('Epoch {}: Batch ({}) Loss: {}'.format(e, b, l))
loss_values.append(l)
# testing
test_data = dataset.next_batch(batch_size)
decoded_test_data = sess.run(model.decoded, feed_dict={model.X: test_data})
Just a suggestion, I have had some issues with autoencoders using the sigmoid function.
I switched to tanh or relu and those improved the results.
With the autoencoder it is basically learning to recreate the output from the input, by encoding and decoding. If you mean it's the same as the input, then you are getting what you want. It has learned the data set.
Ultimately you can compare by reviewing the Mean Squared Error between the input and output and see if it is exactly the same. If you mean that the output is exactly the same regardless of the input, that isn't something I've run into. I guess if your input doesn't vary much from day to day, then I could imagine that would have some impact. Are you looking for anomalies?
Also, if you have a time series for training, I wouldn't shuffle the data in this particular case. If the temporal order is significant, you introduce data leakage (basically introducing future data into the training set) depending on what you are trying to achieve.
Ah, I didn't initially see your post with the graph results.. thanks for adding.
The sigmoid output is floored at 0, so it cannot reproduce your data that is below 0.
If you want to use a sigmoid output, then rescale your data between ]0;1[ (0 and 1 excluded).
I know this is a very old post, so this is just an attempt to help whoever wonders here again with the same problem.... If the autoencoder is converging to the same encoding for all the different instances, there may be a problem in the loss function.... Check the size and shape of the return of the loss function, as it may be getting confused and evaluating the wrong tensors (i.e. you may need to transpose something somewhere) Basically, assuming you are using the autoencoder to encode M features of N training instances, your loss function should return N values. the size of your loss tensor should be the amount of instances in your training set. I found that the hard way.....

Tensorflow accuracy issues

I've wrote my first Tensorflow program ( using my own data) . It works well at least it doesn't crash! but I'm getting a wired accuracy values either 0 oder 1 ?
.................................
the previous part of the code, is only about handeling csv file an getting Data in correct format / shapes
......................................................
# Tensoflow
x = tf.placeholder(tf.float32,[None,len(Training_Data[0])],name='Train_data')# each input has a 457 lenght
y_ = tf.placeholder(tf.float32,[None, numberOFClasses],name='Labels')#
#w = tf.Variable(tf.zeros([len(Training_Data[0]),numberOFClasses]),name='Weights')
w = tf.Variable(tf.truncated_normal([len(Training_Data[0]),numberOFClasses],stddev=1./10),name='Weights')
b = tf.Variable(tf.zeros([numberOFClasses]),name='Biases')
model = tf.add(tf.matmul(x,w),b)
y = tf.nn.softmax(model)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
#cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for j in range(len(train_data)):
if(np.shape(train_data) == (batchSize,numberOFClasses)):
sess.run(train_step,feed_dict={x:train_data[j],y_:np.reshape(train_labels[j],(batchSize,numberOFClasses)) })
correct_prediction = tf.equal(tf.arg_max(y,1),tf.arg_max(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,"float"))
accuracy_vector= []
current_class =[]
for i in range(len(Testing_Data)):
if( np.shape(Testing_Labels[i]) == (numberOFClasses,)):
accuracy_vector.append(sess.run(accuracy,feed_dict={x:np.reshape(Testing_Data[i],(1,457)),y_:np.reshape(Testing_Labels[i],(1,19))}))#,i)#,Test_Labels[i])
current_class.append(int(Test_Raw[i][-1]))
ploting theaccuracy_vector delivers the following :
[]
any idea what I'm missing here ?
thanks a lot for any hint !
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
tf.nn.softmax_cross_entropy_with_logits wants unscaled logits.
From the doc:
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
this means that the line y = tf.nn.softmax(model) is wrong.
Instead, you want to pass unscaled logits to that function, thus:
y = model
Moreover, once you fix this problem, if the network doesn't work, try to lower the learning rate from 0.01 to something about 1e-3 or 1e-4. (I tell you this because 1e-2 usually is an "high" learning rate)
You're testing on batches of size 1, so either the prediction is good or it's false, so you can only get 0 or 1 accuracy:
accuracy_vector.append(sess.run(accuracy,feed_dict={x:np.reshape(Testing_Data[i],(1,457)),y_:np.reshape(Testing_Labels[i],(1,19))}))#,i)#,Test_Labels[i])
Just use a bigger batch size :
accuracy_vector.append(sess.run(accuracy,feed_dict={x:np.reshape(Testing_Data[i:i+batch_size],(batch_size,457)),y_:np.reshape(Testing_Labels[i:i+batch_size],(batch_size,19))}))

Linear regression with tensorflow

I trying to understand linear regression... here is script that I tried to understand:
'''
A linear regression learning algorithm example using TensorFlow library.
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''
from __future__ import print_function
import tensorflow as tf
from numpy import *
import numpy
import matplotlib.pyplot as plt
rng = numpy.random
# Parameters
learning_rate = 0.0001
training_epochs = 1000
display_step = 50
# Training Data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
2.827,3.465,1.65,2.904,2.42,2.94,1.3])
train_X=numpy.asarray(train_X)
train_Y=numpy.asarray(train_Y)
n_samples = train_X.shape[0]
# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")
# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")
# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)
# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
sess.run(init)
# Fit all training data
for epoch in range(training_epochs):
for (x, y) in zip(train_X, train_Y):
sess.run(optimizer, feed_dict={X: x, Y: y})
# Display logs per epoch step
if (epoch+1) % display_step == 0:
c = sess.run(cost, feed_dict={X: train_X, Y:train_Y})
print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
"W=", sess.run(W), "b=", sess.run(b))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')
# Graphic display
plt.plot(train_X, train_Y, 'ro', label='Original data')
plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
plt.legend()
plt.show()
Question is what this part represent:
# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")
And why are there random float numbers?
Also could you show me some math with formals represents cost, pred, optimizer variables?
let's try to put up some intuition&sources together with the tfapproach.
General intuition:
Regression as presented here is a supervised learning problem. In it, as defined in Russel&Norvig's Artificial Intelligence, the task is:
given a training set (X, y) of m input-output pairs (x1, y1), (x2, y2), ... , (xm, ym), where each output was generated by an unknown function y = f(x), discover a function h that approximates the true function f
For that sake, the h hypothesis function combines somehow each x with the to-be-learned parameters, in order to have an output that is as close to the corresponding y as possible, and this for the whole dataset. The hope is that the resulting function will be close to f.
But how to learn this parameters? in order to be able to learn, the model has to be able to evaluate. Here comes the cost (also called loss, energy, merit...) function to play: it is a metric function that compares the output of h with the corresponding y, and penalizes big differences.
Now it should be clear what is exactly the "learning" process here: alter the parameters in order to achieve a lower value for the cost function.
Linear Regression:
The example that you are posting performs a parametric linear regression, optimized with gradient descent based on the mean squared error as cost function. Which means:
Parametric: The set of parameters is fixed. They are held in the exact same memory placeholders thorough the learning process.
Linear: The output of h is merely a linear (actually, affine) combination between the input x and your parameters. So if x and w are real-valued vectors of the same dimensionality, and b is a real number, it holds that h(x,w, b)= w.transposed()*x+b. Page 107 of the Deep Learning Book brings more quality insights and intuitions into that.
Cost function: Now this is the interesting part. The average squared error is a convex function. This means it has a single, global optimum, and furthermore, it can be directly found with the set of normal equations (also explained in the DLB). In the case of your example, the stochastic (and/or minibatch) gradient descent method is used: this is the preferred method when optimizing non-convex cost functions (which is the case in more advanced models like neural networks) or when your dataset has a huge dimensionality (also explained in the DLB).
Gradient descent: tf deals with this for you, so it is enough to say that GD minimizes the cost function by following its derivative "downwards", in small steps, until reaching a saddle point. If you totally need to know, the exact technique applied by TF is called automatic differentiation, kind of a compromise between the numeric and symbolic approaches. For convex functions like yours this point will be the global optimum, and (if your learning rate is not too big) it will always converge to it, so it doesn't matter which values you initialize your Variables with. The random initialization is necessary in more complex architectures like neural networks. There is some extra code regarding the management of the minibatches, but I won't get into that because it is not the main focus of your question.
The TensorFlow approach:
Deep Learning frameworks are nowadays about nesting lots of functions by building computational graphs (you may want to take a look at the presentation on DL frameworks that I did some weeks ago). For constructing and running the graph, TensoFlow follows a declarative style, which means that the graph has to be first completely defined and compiled, before it is deployed and executed. It is very reccommended to read this short wiki article, if you haven't yet. In this context, the setup is split in two parts:
Firstly, you define your computational Graph, where you put your dataset and parameters in memory placeholders, define the hypothesis and cost functions building on them, and tell tf which optimization technique to apply.
Then you run the computation in a Session and the library will be able to (re)load the data placeholders and perform the optimization.
The code:
The code of the example follows this approach closely:
Define the test data X and labels Y, and prepare a placeholder in the Graph for them (which is fed in the feed_dict part).
Define the 'W' and 'b' placeholders for the parameters. They have to be Variables because they will be updated during the Session.
Define pred (our hypothesis) and cost as explained before.
From this, the rest of the code should be clearer. Regarding the optimizer, as I said, tf already knows how to deal with this but you may want to look into gradient descent for more details (again, the DLB is a pretty good reference for that)
Cheers!
Andres
CODE EXAMPLES: GRADIENT DESCENT VS. NORMAL EQUATIONS
This small snippets generate simple multi-dimensional datasets and test both approaches. Notice that the normal equations approach doesn't require looping, and brings better results. For small dimensionality (DIMENSIONS<30k) is probably the preferred approach:
from __future__ import absolute_import, division, print_function
import numpy as np
import tensorflow as tf
####################################################################################################
### GLOBALS
####################################################################################################
DIMENSIONS = 5
f = lambda(x): sum(x) # the "true" function: f = 0 + 1*x1 + 1*x2 + 1*x3 ...
noise = lambda: np.random.normal(0,10) # some noise
####################################################################################################
### GRADIENT DESCENT APPROACH
####################################################################################################
# dataset globals
DS_SIZE = 5000
TRAIN_RATIO = 0.6 # 60% of the dataset is used for training
_train_size = int(DS_SIZE*TRAIN_RATIO)
_test_size = DS_SIZE - _train_size
ALPHA = 1e-8 # learning rate
LAMBDA = 0.5 # L2 regularization factor
TRAINING_STEPS = 1000
# generate the dataset, the labels and split into train/test
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)] # synthesize data
# ds = normalize_data(ds)
ds = [(x, [f(x)+noise()]) for x in ds] # add labels
np.random.shuffle(ds)
train_data, train_labels = zip(*ds[0:_train_size])
test_data, test_labels = zip(*ds[_train_size:])
# define the computational graph
graph = tf.Graph()
with graph.as_default():
# declare graph inputs
x_train = tf.placeholder(tf.float32, shape=(_train_size, DIMENSIONS))
y_train = tf.placeholder(tf.float32, shape=(_train_size, 1))
x_test = tf.placeholder(tf.float32, shape=(_test_size, DIMENSIONS))
y_test = tf.placeholder(tf.float32, shape=(_test_size, 1))
theta = tf.Variable([[0.0] for _ in range(DIMENSIONS)])
theta_0 = tf.Variable([[0.0]]) # don't forget the bias term!
# forward propagation
train_prediction = tf.matmul(x_train, theta)+theta_0
test_prediction = tf.matmul(x_test, theta) +theta_0
# cost function and optimizer
train_cost = (tf.nn.l2_loss(train_prediction - y_train)+LAMBDA*tf.nn.l2_loss(theta))/float(_train_size)
optimizer = tf.train.GradientDescentOptimizer(ALPHA).minimize(train_cost)
# test results
test_cost = (tf.nn.l2_loss(test_prediction - y_test)+LAMBDA*tf.nn.l2_loss(theta))/float(_test_size)
# run the computation
with tf.Session(graph=graph) as s:
tf.initialize_all_variables().run()
print("initialized"); print(theta.eval())
for step in range(TRAINING_STEPS):
_, train_c, test_c = s.run([optimizer, train_cost, test_cost],
feed_dict={x_train: train_data, y_train: train_labels,
x_test: test_data, y_test: test_labels })
if (step%100==0):
# it should return bias close to zero and parameters all close to 1 (see definition of f)
print("\nAfter", step, "iterations:")
#print(" Bias =", theta_0.eval(), ", Weights = ", theta.eval())
print(" train cost =", train_c); print(" test cost =", test_c)
PARAMETERS_GRADDESC = tf.concat(0, [theta_0, theta]).eval()
print("Solution for parameters:\n", PARAMETERS_GRADDESC)
####################################################################################################
### NORMAL EQUATIONS APPROACH
####################################################################################################
# dataset globals
DIMENSIONS = 5
DS_SIZE = 5000
TRAIN_RATIO = 0.6 # 60% of the dataset isused for training
_train_size = int(DS_SIZE*TRAIN_RATIO)
_test_size = DS_SIZE - _train_size
f = lambda(x): sum(x) # the "true" function: f = 0 + 1*x1 + 1*x2 + 1*x3 ...
noise = lambda: np.random.normal(0,10) # some noise
# training globals
LAMBDA = 1e6 # L2 regularization factor
# generate the dataset, the labels and split into train/test
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)]
ds = [([1]+x, [f(x)+noise()]) for x in ds] # add x[0]=1 dimension and labels
np.random.shuffle(ds)
train_data, train_labels = zip(*ds[0:_train_size])
test_data, test_labels = zip(*ds[_train_size:])
# define the computational graph
graph = tf.Graph()
with graph.as_default():
# declare graph inputs
x_train = tf.placeholder(tf.float32, shape=(_train_size, DIMENSIONS+1))
y_train = tf.placeholder(tf.float32, shape=(_train_size, 1))
theta = tf.Variable([[0.0] for _ in range(DIMENSIONS+1)]) # implicit bias!
# optimum
optimum = tf.matrix_solve_ls(x_train, y_train, LAMBDA, fast=True)
# run the computation: no loop needed!
with tf.Session(graph=graph) as s:
tf.initialize_all_variables().run()
print("initialized")
opt = s.run(optimum, feed_dict={x_train:train_data, y_train:train_labels})
PARAMETERS_NORMEQ = opt
print("Solution for parameters:\n",PARAMETERS_NORMEQ)
####################################################################################################
### PREDICTION AND ERROR RATE
####################################################################################################
# generate test dataset
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)]
ds = [([1]+x, [f(x)+noise()]) for x in ds] # add x[0]=1 dimension and labels
test_data, test_labels = zip(*ds)
# define hypothesis
h_gd = lambda(x): PARAMETERS_GRADDESC.T.dot(x)
h_ne = lambda(x): PARAMETERS_NORMEQ.T.dot(x)
# define cost
mse = lambda pred, lab: ((pred-np.array(lab))**2).sum()/DS_SIZE
# make predictions!
predictions_gd = np.array([h_gd(x) for x in test_data])
predictions_ne = np.array([h_ne(x) for x in test_data])
# calculate and print total error
cost_gd = mse(predictions_gd, test_labels)
cost_ne = mse(predictions_ne, test_labels)
print("total cost with gradient descent:", cost_gd)
print("total cost with normal equations:", cost_ne)
Variables allow us to add trainable parameters to a graph. They are constructed with a type and initial value:
W = tf.Variable([.3], tf.float32)
b = tf.Variable([-.3], tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W * x + b
The variable with type tf.Variable is the parameter which we will learn use TensorFlow. Assume you use the gradient descent to minimize the loss function. You need initial these parameter first. The rng.randn() is used to generate a random value for this purpose.
I think the Getting Started With TensorFlow is a good start point for you.
I'll first define the variables:
W is a multidimensional line that spans R^d (same dimensionality as X)
b is a scalar value (bias)
Y is also a scalar value i.e. the value at X
pred = W (dot) X + b # dot here refers to dot product
# cost equals the average squared error
cost = ((pred - Y)^2) / 2*num_samples
#finally optimizer
# optimizer computes the gradient with respect to each variable and the update
W += learning_rate * (pred - Y)/num_samples * X
b += learning_rate * (pred - Y)/num_samples
Why are W and b set to random well this updates based on gradients from the error calculated from the cost so W and b could have been initialized to anything. It isn't performing linear regression via least squares method although both will converge to the same solution.
Look here for more information: Getting Started

How to test my trained Tensor Flow model

I currently have a regression model that tries to predict a value based on 25 other ones.
Here is the code I currently gave
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
rng = np.random
learning_rate = 0.11
training_epochs = 1000
display_step = 50
X = np.random.randint(5,size=(100,25)).astype('float32')
y_data = np.random.randint(5,size=(100,1)).astype('float32')
m = 100
epochs = 100
W = tf.Variable(tf.zeros([25,1]))
b = tf.Variable(tf.zeros([1]))
y = tf.add( tf.matmul(X,W), b)
loss = tf.reduce_sum(tf.square(y - y_data)) / (2 * m)
loss = tf.Print(loss, [loss], "loss: ")
optimizer = tf.train.GradientDescentOptimizer(.01)
train = optimizer.minimize(loss)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(epochs):
sess.run(train)
sess.close()
I understand that right now these variables are all random so the accuracy would not be very good anyways, but I just want to know how to make a test set and find the accuracy of the predictions.
Typically, you split your training set into two pieces: roughly 2/3 for training and 1/3 for testing (opinions vary on the proportions). Train your model with the first set. Check the training accuracy (run the training set back through the model to see how many it gets right).
Now, run the remainder (test set) through the model and check how well the predictions match the results. "Find the accuracy" depends on what sort of predictions you're making: classification vs scoring, binary vs disjoint vs contiguous, etc.

Categories

Resources