Unexpected output for tf.nn.sparse_softmax_cross_entropy_with_logits - python

The TensorFlow documentation for tf.nn.sparse_softmax_cross_entropy_with_logits explicitly declares that I should not apply softmax to the inputs of this op:
This op expects unscaled logits, since it performs a softmax on logits
internally for efficiency. Do not call this op with the output of
softmax, as it will produce incorrect results.
However if I use cross entropy without softmax it gives me unexpected results. According to CS231n course the expected loss value is around 2.3 for CIFAR-10:
For example, for CIFAR-10 with a Softmax classifier we would expect
the initial loss to be 2.302, because we expect a diffuse probability
of 0.1 for each class (since there are 10 classes), and Softmax loss
is the negative log probability of the correct class so: -ln(0.1) =
2.302.
However without softmax I get much bigger values, for example 108.91984.
What exactly am I doing wrong with sparse_softmax_cross_entropy_with_logits? The TF code is shown below.
import tensorflow as tf
import numpy as np
from tensorflow.python import keras
(_, _), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_test = np.reshape(x_test, [-1, 32, 32, 3])
y_test = np.reshape(y_test, (10000,))
y_test = y_test.astype(np.int32)
x = tf.placeholder(dtype=tf.float32, shape=(None, 32, 32, 3))
y = tf.placeholder(dtype=tf.int32, shape=(None,))
layer = tf.layers.Conv2D(filters=16, kernel_size=3)(x)
layer = tf.nn.relu(layer)
layer = tf.layers.Flatten()(layer)
layer = tf.layers.Dense(units=1000)(layer)
layer = tf.nn.relu(layer)
logits = tf.layers.Dense(units=10)(layer)
# If this line is uncommented I get expected value around 2.3
# logits = tf.nn.softmax(logits)
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
logits=logits)
loss = tf.reduce_mean(loss, name='cross_entropy')
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
res = sess.run(loss, feed_dict={x: x_test[0:256], y: y_test[0:256]})
print("loss: ", res)
# Expected output is value close to 2.3
# Real outputs are 108.91984, 72.82324, etc.

The issue is not in the lines
# If this line is uncommented I get expected value around 2.3
# logits = tf.nn.softmax(logits)
Images in cifar10 dataset are in RGB, thus pixel values are in range [0, 256). If you divide your x_test by 255
x_test = np.reshape(x_test, [-1, 32, 32, 3]).astype(np.float32) / 255
the values will be rescaled to [0,1] and tf.nn.sparse_softmax_cross_entropy_with_logits will return expected values

Related

Tensorflow - replace MNIST on other dataset

I have a problem with using diffrent dataset then default from tensorflow.
I have code using MNIST dataset to recognize digits. In this application there is generated graph, which is imported later by android app.
Now I would like to recognize digits and math's operators (basic one: +, -, *, /).
I found script to generate data I need. I have two .pickle files.
But even with the dataset which suits for me, still I don't know how to import this dataset to my app with tensorflow.
I would be grateful for help with this or maybe to give me other (maybe easier) solution.
EDIT
I did some changes in the code which were adviced by gabriele.
Now I have error:
(x, label) = train_pickle_reader('train.pickle')
ValueError: too many values to unpack (expected 2)
I found the description of the dataset I used:
Extracts trace groups from inkml files.
Converts extracted trace groups into images. Images are square shaped bitmaps with only black (value 0) and white (value 1) pixels. Black color denotes patterns (ROI).
Labels those images (according to inkml files).
Flattens images to one-dimensional vectors.
Converts labels to one-hot format.
Dumps training and testing sets separately into outputs folder.
Below there is code in python:
import tensorflow as tf
import pickle
def train_pickle_reader(filename):
with open(filename, 'rb') as f:
x = pickle.load(f)
# assuming x is already of the form (all_train_input, all_train_labels):
return x
def test_pickle_reader(filename):
with open(filename, 'rb') as f:
x = pickle.load(f)
# assuming x is already of the form (all_train_input, all_train_labels):
return x
# Function to create a weight neuron using a random number. Training will assign a real weight later
def weight_variable(shape, name):
initial = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial, name=name)
# Function to create a bias neuron. Bias of 0.1 will help to prevent any 1 neuron from being chosen too often
def biases_variable(shape, name):
initial = tf.constant(0.1, shape=shape)
return tf.Variable(initial, name=name)
# Function to create a convolutional neuron. Convolutes input from 4d to 2d. This helps streamline inputs
def conv_2d(x, W, name):
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME', name=name)
# Function to create a neuron to represent the max input. Helps to make the best prediction for what comes next
def max_pool(x, name):
return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)
# A way to input images (as 784 element arrays of pixel values 0 - 1)
x_input = tf.placeholder(dtype=tf.float32, shape=[None, 784], name='x_input')
# A way to input labels to show model what the correct answer is during training
y_input = tf.placeholder(dtype=tf.float32, shape=[None, 10], name='y_input')
# First convolutional layer - reshape/resize images
# A weight variable that examines batches of 5x5 pixels, returns 32 features (1 feature per bit value in 32 bit float)
W_conv1 = weight_variable([5, 5, 1, 32], 'W_conv1')
# Bias variable to add to each of the 32 features
b_conv1 = biases_variable([32], 'b_conv1')
# Reshape each input image into a 28 x 28 x 1 pixel matrix
x_image = tf.reshape(x_input, [-1, 28, 28, 1], name='x_image')
# Flattens filter (W_conv1) to [5 * 5 * 1, 32], multiplies by [None, 28, 28, 1] to associate each 5x5 batch with the
# 32 features, and adds biases
h_conv1 = tf.nn.relu(conv_2d(x_image, W_conv1, name='conv1') + b_conv1, name='h_conv1')
# Takes windows of size 2x2 and computes a reduction on the output of h_conv1 (computes max, used for better prediction)
# Images are reduced to size 14 x 14 for analysis
h_pool1 = max_pool(h_conv1, name='h_pool1')
# Second convolutional layer, reshape/resize images
# Does mostly the same as above but converts each 32 unit output tensor from layer 1 to a 64 feature tensor
W_conv2 = weight_variable([5, 5, 32, 64], 'W_conv2')
b_conv2 = biases_variable([64], 'b_conv2')
h_conv2 = tf.nn.relu(conv_2d(h_pool1, W_conv2, name='conv2') + b_conv2, name='h_conv2')
# Images at this point are reduced to size 7 x 7 for analysis
h_pool2 = max_pool(h_conv2, name='h_pool2')
# First dense layer, performing calculation based on previous layer output
# Each image is 7 x 7 at the end of the previous section and outputs 64 features, we want 32 x 32 neurons = 1024
W_dense1 = weight_variable([7 * 7 * 64, 1024], name='W_dense1')
# bias variable added to each output feature
b_dense1 = biases_variable([1024], name='b_dense1')
# Flatten each of the images into size [None, 7 x 7 x 64]
h_pool_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64], name='h_pool_flat')
# Multiply weights by the outputs of the flatten neuron and add biases
h_dense1 = tf.nn.relu(tf.matmul(h_pool_flat, W_dense1, name='matmul_dense1') + b_dense1, name='h_dense1')
# Dropout layer prevents overfitting or recognizing patterns where none exist
# Depending on what value we enter into keep_prob, it will apply or not apply dropout layer
keep_prob = tf.placeholder(dtype=tf.float32, name='keep_prob')
# Dropout layer will be applied during training but not testing or predicting
h_drop1 = tf.nn.dropout(h_dense1, keep_prob, name='h_drop1')
# Readout layer used to format output
# Weight variable takes inputs from each of the 1024 neurons from before and outputs an array of 10 elements
W_readout1 = weight_variable([1024, 10], name='W_readout1')
# Apply bias to each of the 10 outputs
b_readout1 = biases_variable([10], name='b_readout1')
# Perform final calculation by multiplying each of the neurons from dropout layer by weights and adding biases
y_readout1 = tf.add(tf.matmul(h_drop1, W_readout1, name='matmul_readout1'), b_readout1, name='y_readout1')
# Softmax cross entropy loss function compares expected answers (labels) vs actual answers (logits)
cross_entropy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_input, logits=y_readout1))
# Adam optimizer aims to minimize loss
train_step = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy_loss)
# Compare actual vs expected outputs to see if highest number is at the same index, true if they match and false if not
correct_prediction = tf.equal(tf.argmax(y_input, 1), tf.argmax(y_readout1, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Used to save the graph and weights
saver = tf.train.Saver()
# Run in with statement so session only exists within it
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
# Save the graph shape and node names to pbtxt file
tf.train.write_graph(sess.graph_def, '.', 'advanced_mnist.pbtxt', False)
(x, label) = train_pickle_reader('train.pickle')
batch_size = 64 # the batch size you want to use
num_batches = len(x)//batch_size
# Train the model, running through data 20000 times in batches of 50
# Print out step # and accuracy every 100 steps and final accuracy at the end of training
# Train by running train_step and apply dropout by setting keep_prob to 0.5
for i in range(20000):
for j in range(num_batches):
x_batch = x[j * batch_size: (j + 1) * batch_size]
label_batch = label[j * batch_size: (j + 1)*batch_size]
train_step.run(feed_dict={x_input: x_batch, y_input: label_batch, keep_prob: 0.5})
# Save the session with graph shape and node weights
saver.save(sess, 'advanced_mnist.ckpt')
# Make a prediction
(x, labels) = test_pickle_reader('test.pickle')
print(sess.run(y_readout1, feed_dict={x_input: x, keep_prob: 1.0}))
In your code, after instantiating a tf.Session(), the line batch = mnist_data.train.next_batch(50) calls a built in function which returns a tuple of the kind (input, label). In order to feed the network with your data, here you need to define some function returning i.e. a numpy array having the input data and the associated label. For example, assuming you have a pickle file containing your training data, your code should look something like:
def pikle_reader(filename):
with open(filename, 'r') as f:
x = pickle.load(f)
# assuming x is already of the form (all_train_input, all_train_labels):
return x
[...]
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
[...]
# get your data:
(x, label) = pikle_reader(filename)
batch_size = 64 # the batch size you want to use
num_batches = len(x)//batch_size
for i in range(20000): # number of epochs
for j in range(num_batches):
x_batch = x[j*batch_size: (j+1)*batch_size]
label_batch = label[j* batch_size: (j+1)batch_size]
train_step.run(feed_dict={x_input: x_batch, y_input: label_batch, keep_prob: 0.5})
Here, feed_dict feeds the placeholders x_input with the values in x_batch and the placeholder y_input with label_batch. Then in the session the code will run the train_step operation.
Instead, when you want to make a prediction the code is basically the same:
(x, label) = pikle_reader(test_data_filename)
print(sess.run(y_readout1, feed_dict={x_input: x, keep_prob: 1.0}))

tensorflow prediction on test dataset

I am having trouble printing out the prediction on test data.
Can anyone help me filling in the input to sess.run at output step Thanks!
def nn_model(data):
convnet = conv_2d(in_data, 32, 3, padding='same', activation='relu')
convnet = max_pool_2d(convnet, 2)
logits = nn_model(next_element)
prediction = tf.argmax(logits, 1)
with tf.Session() as sess:
sess.run(init_op)
sess.run(training_init_op)
for i in range(epochs):
l, _, acc = sess.run([loss, optimizer, accuracy])
output = sess.run(prediction, ***{logits:nn_model(test_data)}***)
output = np.argmax(output)
print("The prediction for test is :", output)
On this line:
output = sess.run(prediction, ***{logits:nn_model(test_data)}***)
You seem to be trying to pass in your test data (presumably in numpy format) to your logits. logits is traditionally the name of the output of your model, which is quite confusing. In your nn_model function you should be returning both the logits (output of the model) and placeholders for your input. Normally you have something like this:
x = tf.placeholder(tf.float32, shape=(None, 1024))
labels = tf.placeholder(tf.float32, shape=(None,))
Now your output looks something like:
output = sess.run(logits, feed_dict={x:test_data, y:test_labels}
In the case of test data you might not need to pass in labels, but if you wanted to compute accuracy you would need them, you decide depending on your need.
Here are some really nice examples you can follow:
https://github.com/aymericdamien/TensorFlow-Examples

Tensorflow Sampled Softmax Loss Correct Usage

In a classification problem with many classes, tensorflow docs suggests using sampled_softmax_loss over a simple softmax to reduce training runtime.
According to the docs and source (line 1180), the call pattern for sampled_softmax_loss is:
tf.nn.sampled_softmax_loss(weights, # Shape (num_classes, dim) - floatXX
biases, # Shape (num_classes) - floatXX
labels, # Shape (batch_size, num_true) - int64
inputs, # Shape (batch_size, dim) - floatXX
num_sampled, # - int
num_classes, # - int
num_true=1,
sampled_values=None,
remove_accidental_hits=True,
partition_strategy="mod",
name="sampled_softmax_loss")
It's unclear (at least to me) how to convert a real world problem into the shapes that this loss function requires. I think the 'inputs' field is the problem.
Here is a copy-paste-ready minimum working example that throws a matrix multiplication shape error when calling the loss function.
import tensorflow as tf
# Network Parameters
n_hidden_1 = 256 # 1st layer number of features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# Dependent & Independent Variable Placeholders
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes]) #
# Weights and Biases
weights = {
'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
}
biases = {
'b1': tf.Variable(tf.random_normal([n_hidden_1])),
'out': tf.Variable(tf.random_normal([n_classes]))
}
# Super simple model builder
def tiny_perceptron(x, weights, biases):
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
out_layer = tf.matmul(layer_1, weights['out']) + biases['out']
return out_layer
# Create the model
pred = tiny_perceptron(x, weights, biases)
# Set up loss function inputs and inspect their shapes
w = tf.transpose(weights['out'])
b = biases['out']
labels = tf.reshape(tf.argmax(y, 1), [-1,1])
inputs = pred
num_sampled = 3
num_true = 1
num_classes = n_classes
print('Shapes\n------\nw:\t%s\nb:\t%s\nlabels:\t%s\ninputs:\t%s' % (w.shape, b.shape, labels.shape, inputs.shape))
# Shapes
# ------
# w: (10, 256) # Requires (num_classes, dim) - CORRECT
# b: (10,) # Requires (num_classes) - CORRECT
# labels: (?, 1) # Requires (batch_size, num_true) - CORRECT
# inputs: (?, 10) # Requires (batch_size, dim) - Not sure
loss_function = tf.reduce_mean(tf.nn.sampled_softmax_loss(
weights=w,
biases=b,
labels=labels,
inputs=inputs,
num_sampled=num_sampled,
num_true=num_true,
num_classes=num_classes))
The final line triggers and ValueError, stating that you cant multiply tensors with shape (?,10) and (?,256). As a general rule, I'd agree with that statement. Full error shown below:
ValueError: Dimensions must be equal, but are 10 and 256 for 'sampled_softmax_loss_2/MatMul_1' (op: 'MatMul') with input shapes: [?,10], [?,256].
If the 'dim' value from tensorflow docs is intended to be constant, either the 'weights' or 'inputs' variables going into the loss function are incorrect.
Any thoughts would be awesome, I'm totally stumped on how to use this loss function correctly & it would have a huge impact on training time for the model we're using it for (500k classes). Thanks!
---EDIT---
It is possible to get the sample shown above to run without errors by playing with parameters and ignoring the sampled_softmax_loss call pattern's expected inputs. If you do that, it results in a trainable model that has 0 impact on prediction accuracy (as you would expect).
In your softmax layer you are multiplying your network predictions, which have dimension (num_classes,) by your w matrix which has dimension (num_classes, num_hidden_1), so you end up trying to compare your target labels of size (num_classes,) to something that is now size (num_hidden_1,). Change your tiny perceptron to output layer_1 instead, then change the definition of your cost. The code below might do the trick.
def tiny_perceptron(x, weights, biases):
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
return layer_1
layer_1 = tiny_perceptron(x, weights, biases)
loss_function = tf.reduce_mean(tf.nn.sampled_softmax_loss(
weights=weights['h1'],
biases=biases['b1'],
labels=labels,
inputs=layer_1,
num_sampled=num_sampled,
num_true=num_true,
num_classes=num_classes))
When you train your network with some optimizer, you will tell it to minimize loss_function, which should mean that it will adjust both sets of weights and biases.
The key point is to pass right shape of weight, bias, input and label. The shape of weight passed to sampled_softmax is not the the same with the general situation.
For example, logits = xw + b, call sampled_softmax like this:
sampled_softmax(weight=tf.transpose(w), bias=b, inputs=x), NOT sampled_softmax(weight=w, bias=b, inputs=logits)!!
Besides, label is not one-hot representation. if your labels are one-hot represented, pass labels=tf.reshape(tf.argmax(labels_one_hot, 1), [-1,1])

Tensorflow: Convert Tensor to numpy array then pass into a feed_dict

I'm trying to build a softmax regression model for CIFAR classification. At first when I tried to pass in my images and labels into the feed dictionary, I got an error that said that feed dictionaries do not accept Tensors. I then converted them into numpy arrays using .eval() but the program hangs at the .eval() line and does not continue any further. How can I pass this data into the feed_dict?
CIFARIMAGELOADING.PY
import tensorflow as tf
import os
import tensorflow.models.image.cifar10 as cf
IMAGE_SIZE = 24
BATCH_SIZE = 128
def loadimagesandlabels(size):
# Load the images from the CIFAR data directory
FLAGS = tf.app.flags.FLAGS
data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i) for i in xrange(1, 6)]
filename_queue = tf.train.string_input_producer(filenames)
read_input = cf.cifar10_input.read_cifar10(filename_queue)
# Reshape and crop the image
height = IMAGE_SIZE
width = IMAGE_SIZE
reshaped_image = tf.cast(read_input.uint8image, tf.float32)
cropped_image = tf.random_crop(reshaped_image, [height, width, 3])
# Generate a batch of images and labels by building up a queue of examples
print('Filling queue with CIFAR images')
num_preprocess_threads = 16
min_fraction_of_examples_in_queue = 0.4
min_queue_examples = int(BATCH_SIZE*min_fraction_of_examples_in_queue)
images, label_batch = tf.train.batch([cropped_image,read_input.label],batch_size=BATCH_SIZE, num_threads=num_preprocess_threads, capacity=min_queue_examples+3*BATCH_SIZE)
print(images)
print(label_batch)
return images, tf.reshape(label_batch, [BATCH_SIZE])
CIFAR.PY
#Set up placeholder vectors for image and labels
x = tf.placeholder(tf.float32, shape = [None, 1728])
y_ = tf.placeholder(tf.float32, shape = [None,10])
W = tf.Variable(tf.zeros([1728,10]))
b = tf.Variable(tf.zeros([10]))
#Implement regression model. Multiply input images x by weight matrix W, add the bias b
#Compute the softmax probabilities that are assigned to each class
y = tf.nn.softmax(tf.matmul(x,W) + b)
#Define cross entropy
#tf.reduce sum sums across all classes and tf.reduce_mean takes the average over these sums
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_*tf.log(y), reduction_indices = [1]))
#Train the model
#Each training iteration we load 128 training examples. We then run the train_step operation
#using feed_dict to replace the placeholder tensors x and y_ with the training examples
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
#Open up a Session
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000) :
images, labels = CIFARImageLoading.loadimagesandlabels(size=BATCH_SIZE)
unrolled_images = tf.reshape(images, (1728, BATCH_SIZE))
#convert labels to their one_hot representations
# should produce [[1,0,0,...],[0,1,0...],...]
one_hot_labels = tf.one_hot(indices= labels, depth=NUM_CLASSES, on_value=1.0, off_value= 0.0, axis=-1)
print(unrolled_images)
print(one_hot_labels)
images_numpy, labels_numpy = unrolled_images.eval(session=sess), one_hot_labels.eval(session=sess)
sess.run(train_step, feed_dict = {x: images_numpy, y_:labels_numpy})
#Evaluate the model
#.equal returns a tensor of booleans, we want to cast these as floats and then take their mean
#to get percent correctness (accuracy)
print("evaluating")
test_images, test_labels = CIFARImageLoading.loadimagesandlabels(TEST_SIZE)
test_images_unrolled = tf.reshape(test_images, (1728, TEST_SIZE))
test_images_one_hot = tf.one_hot(indices= test_labels, depth=NUM_CLASSES, on_value=1.0, off_value= 0.0, axis=-1)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval(feed_dict = {x: unrolled_images.eval(), y_ : test_images_one_hot.eval()}))
Theres a couple of things that you not are understanding really well. Throughout your graph you will work with Tensors. You define Tensors by either using tf.placeholder and feeding them in the session.run(feed_dict{}) or with tf.Variable and initializing it with session.run(tf.initialize_all_variables()). You must feed your input this way, and it should be numpy arrays in the same as shape as you expect in the placeholders. Here's a simple example:
images = tf.placeholder(type, [1728, BATCH_SIZE])
labels = tf.placeholder(type, [size])
'''
Build your network here so you have the variable: Output
'''
images_feed, labels_feed = CIFARImageLoading.loadimagesandlabels(size=BATCH_SIZE)
# here you can see your output
print sess.run(Output, feed_dict = {x: images_feed, y_:labels_feed})
You do not feed tf.functions with numpy arrays, you always feed them with Tensors. And the feed_dict is always fed with numpy arrays. The thing is: you will never have to convert tensors to numpy arrays for the input, that does not make sense. Your input must be numpy arrays, if it's a list, you can use np.asarray(list), if it's a tensor, you are doing this wrong.
I do not know what CIFARImageLoading.loadimagesandlabels returns to you, but I imagine it's not a Tensor, it's probably a numpy array already, so just get rid of this .eval().

tensorflow deep neural network for regression always predict same results in one batch

I use a tensorflow to implement a simple multi-layer perceptron for regression. The code is modified from standard mnist classifier, that I only changed the output cost to MSE (use tf.reduce_mean(tf.square(pred-y))), and some input, output size settings. However, if I train the network using regression, after several epochs, the output batch are totally the same. for example:
target: 48.129, estimated: 42.634
target: 46.590, estimated: 42.634
target: 34.209, estimated: 42.634
target: 69.677, estimated: 42.634
......
I have tried different batch size, different initialization, input normalization using sklearn.preprocessing.scale (my inputs range are quite different). However, none of them worked. I have also tried one of sklearn example from Tensorflow (Deep Neural Network Regression with Boston Data). But I got another error in line 40:
'module' object has no attribute 'infer_real_valued_columns_from_input'
Anyone has clues on where the problem is? Thank you
My code is listed below, may be a little bit long, but very straghtforward:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow.contrib import learn
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn import datasets, linear_model
from sklearn import cross_validation
import numpy as np
boston = learn.datasets.load_dataset('boston')
x, y = boston.data, boston.target
X_train, X_test, Y_train, Y_test = cross_validation.train_test_split(
x, y, test_size=0.2, random_state=42)
total_len = X_train.shape[0]
# Parameters
learning_rate = 0.001
training_epochs = 500
batch_size = 10
display_step = 1
dropout_rate = 0.9
# Network Parameters
n_hidden_1 = 32 # 1st layer number of features
n_hidden_2 = 200 # 2nd layer number of features
n_hidden_3 = 200
n_hidden_4 = 256
n_input = X_train.shape[1]
n_classes = 1
# tf Graph input
x = tf.placeholder("float", [None, 13])
y = tf.placeholder("float", [None])
# Create model
def multilayer_perceptron(x, weights, biases):
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# Hidden layer with RELU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Hidden layer with RELU activation
layer_3 = tf.add(tf.matmul(layer_2, weights['h3']), biases['b3'])
layer_3 = tf.nn.relu(layer_3)
# Hidden layer with RELU activation
layer_4 = tf.add(tf.matmul(layer_3, weights['h4']), biases['b4'])
layer_4 = tf.nn.relu(layer_4)
# Output layer with linear activation
out_layer = tf.matmul(layer_4, weights['out']) + biases['out']
return out_layer
# Store layers weight & bias
weights = {
'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], 0, 0.1)),
'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], 0, 0.1)),
'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], 0, 0.1)),
'h4': tf.Variable(tf.random_normal([n_hidden_3, n_hidden_4], 0, 0.1)),
'out': tf.Variable(tf.random_normal([n_hidden_4, n_classes], 0, 0.1))
}
biases = {
'b1': tf.Variable(tf.random_normal([n_hidden_1], 0, 0.1)),
'b2': tf.Variable(tf.random_normal([n_hidden_2], 0, 0.1)),
'b3': tf.Variable(tf.random_normal([n_hidden_3], 0, 0.1)),
'b4': tf.Variable(tf.random_normal([n_hidden_4], 0, 0.1)),
'out': tf.Variable(tf.random_normal([n_classes], 0, 0.1))
}
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.square(pred-y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Launch the graph
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
# Training cycle
for epoch in range(training_epochs):
avg_cost = 0.
total_batch = int(total_len/batch_size)
# Loop over all batches
for i in range(total_batch-1):
batch_x = X_train[i*batch_size:(i+1)*batch_size]
batch_y = Y_train[i*batch_size:(i+1)*batch_size]
# Run optimization op (backprop) and cost op (to get loss value)
_, c, p = sess.run([optimizer, cost, pred], feed_dict={x: batch_x,
y: batch_y})
# Compute average loss
avg_cost += c / total_batch
# sample prediction
label_value = batch_y
estimate = p
err = label_value-estimate
print ("num batch:", total_batch)
# Display logs per epoch step
if epoch % display_step == 0:
print ("Epoch:", '%04d' % (epoch+1), "cost=", \
"{:.9f}".format(avg_cost))
print ("[*]----------------------------")
for i in xrange(3):
print ("label value:", label_value[i], \
"estimated value:", estimate[i])
print ("[*]============================")
print ("Optimization Finished!")
# Test model
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print ("Accuracy:", accuracy.eval({x: X_test, y: Y_test}))
Short answer:
Transpose your pred vector using tf.transpose(pred).
Longer answer:
The problem is that pred (the predictions) and y (the labels) are not of the same shape: one is a row vector and the other a column vector. Apparently when you apply an element-wise operation on them, you'll get a matrix, which is not what you want.
The solution is to transpose the prediction vector using tf.transpose() to get a proper vector and thus a proper loss function. Actually, if you set the batch size to 1 in your example you'll see that it works even without the fix, because transposing a 1x1 vector is a no-op.
I applied this fix to your example code and observed the following behaviour. Before the fix:
Epoch: 0245 cost= 84.743440580
[*]----------------------------
label value: 23 estimated value: [ 27.47437096]
label value: 50 estimated value: [ 24.71126747]
label value: 22 estimated value: [ 23.87785912]
And after the fix at the same point in time:
Epoch: 0245 cost= 4.181439120
[*]----------------------------
label value: 23 estimated value: [ 21.64333534]
label value: 50 estimated value: [ 48.76105118]
label value: 22 estimated value: [ 24.27996063]
You'll see that the cost is much lower and that it actually learned the value 50 properly. You'll have to do some fine-tuning on the learning rate and such to improve your results of course.
There is likely a problem with your dataset loading or indexing implementation. If you only modified the cost to MSE, make sure pred and y are correctly being updated and you did not overwrite them with a different graph operation.
Another thing to help debug would be to predict the actual regression outputs. It would also help if you posted more of your code so we can see your specific data loading implementation, etc.

Categories

Resources