Problems with KNN implementation in TensorFlow - python

I am struggling to implement K-Nearest Neighbor in TensorFlow. I think that I am either overlooking a mistake or doing something terribly wrong.
The following code always predicts MNIST labels as 0.
from __future__ import print_function
import numpy as np
import tensorflow as tf
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
K = 4
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# In this example, we limit mnist data
Xtr, Ytr = mnist.train.next_batch(55000) # whole training set
Xte, Yte = mnist.test.next_batch(10000) # whole test set
# tf Graph Input
xtr = tf.placeholder("float", [None, 784])
ytr = tf.placeholder("float", [None, 10])
xte = tf.placeholder("float", [784])
# Euclidean Distance
distance = tf.neg(tf.sqrt(tf.reduce_sum(tf.square(tf.sub(xtr, xte)), reduction_indices=1)))
# Prediction: Get min distance neighbors
values, indices = tf.nn.top_k(distance, k=K, sorted=False)
nearest_neighbors = []
for i in range(K):
    nearest_neighbors.append(np.argmax(ytr[indices[i]]))
sorted_neighbors, counts = np.unique(nearest_neighbors, return_counts=True)
pred = tf.Variable(nearest_neighbors[np.argmax(counts)])
# not works either
# neighbors_tensor = tf.pack(nearest_neighbors)
# y, idx, count = tf.unique_with_counts(neighbors_tensor)
# pred = tf.slice(y, begin=[tf.arg_max(count, 0)], size=tf.constant([1], dtype=tf.int64))[0]
accuracy = 0.
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # loop over test data
    for i in range(len(Xte)):
        # Get nearest neighbor
        nn_index = sess.run(pred, feed_dict={xtr: Xtr, xte: Xte[i, :]})
        # Get nearest neighbor class label and compare it to its true label
        print("Test", i, "Prediction:", nn_index,
              "True Class:", np.argmax(Yte[i]))
        # Calculate accuracy
        if nn_index == np.argmax(Yte[i]):
            accuracy += 1. / len(Xte)
    print("Done!")
    print("Accuracy:", accuracy)
Any help is greatly appreciated.

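In general, it's not a good idea to call NumPy functions while defining your TensorFlow model: they run once, while the graph is being built, on symbolic tensors rather than on the data you feed later. That's precisely why your code wasn't working. I made two main changes to your code. I replaced np.argmax with tf.argmax, and I uncommented the block under "# not works either" so that the vote is computed with tf.unique_with_counts. The code below also feeds ytr in feed_dict and uses the newer op names tf.negative, tf.subtract and tf.stack in place of tf.neg, tf.sub and tf.pack.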
Here is the complete working code:
from __future__ import print_function
import numpy as np
import tensorflow as tf
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
K = 4
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# In this example, we limit mnist data
Xtr, Ytr = mnist.train.next_batch(55000) # whole training set
Xte, Yte = mnist.test.next_batch(10000) # whole test set
# tf Graph Input
xtr = tf.placeholder("float", [None, 784])
ytr = tf.placeholder("float", [None, 10])
xte = tf.placeholder("float", [784])
# Euclidean Distance
distance = tf.negative(tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(xtr, xte)), reduction_indices=1)))
# Prediction: Get min distance neighbors
values, indices = tf.nn.top_k(distance, k=K, sorted=False)
nearest_neighbors = []
for i in range(K):
    nearest_neighbors.append(tf.argmax(ytr[indices[i]], 0))
neighbors_tensor = tf.stack(nearest_neighbors)
y, idx, count = tf.unique_with_counts(neighbors_tensor)
pred = tf.slice(y, begin=[tf.argmax(count, 0)], size=tf.constant([1], dtype=tf.int64))[0]
accuracy = 0.
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # loop over test data
    for i in range(len(Xte)):
        # Get nearest neighbor
        nn_index = sess.run(pred, feed_dict={xtr: Xtr, ytr: Ytr, xte: Xte[i, :]})
        # Get nearest neighbor class label and compare it to its true label
        print("Test", i, "Prediction:", nn_index,
              "True Class:", np.argmax(Yte[i]))
        # Calculate accuracy
        if nn_index == np.argmax(Yte[i]):
            accuracy += 1. / len(Xte)
    print("Done!")
    print("Accuracy:", accuracy)
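To sanity-check what the graph above should compute, here is a minimal plain-Python sketch of the same K-nearest-neighbour majority vote. It does not use TensorFlow; the function name knn_predict and the toy data are illustrative assumptions, and labels are plain integers rather than the one-hot vectors used with MNIST.

```python
from collections import Counter
import math


def knn_predict(train_x, train_labels, query, k=4):
    """Return the majority label among the k training points closest to query.

    Hypothetical helper for illustration only.
    """
    # Euclidean distance from the query to every training point
    distances = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(row, query)))
        for row in train_x
    ]
    # Indices of the k smallest distances
    nearest = sorted(range(len(train_x)), key=lambda i: distances[i])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters labelled 0 and 1
train_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_labels, [0.05, 0.1], k=3))
print(knn_predict(train_x, train_labels, [5.0, 5.1], k=3))
```

If the TensorFlow version always returns the same class on data like this, the vote is most likely being computed at graph-construction time instead of inside the session.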

Related

tensorflow does not train RNN variables

For a complex text classification task I am training an RNN model. Somehow the weights of the RNN only change at the beginning and then with very tiny steps (gradients are in the range of e^-7):
For those of you who are not familiar with tensorboard: This shows the distribution of values for the bias and the weights of the RNN (x axis is the value, y axis the number of values and z the training iterations).
I constructed a toy example which does not make sense but reproduces the same behaviour:
import numpy as np
import tensorflow as tf
tensorboard_save_path = "../RNN/tensorboard/supersimple/"
x = np.random.normal(size=(33, 20, 5000))
y = np.array([1 if i>0.5 else 0 for i in np.random.random(33)])
##### NETWORK #########
with tf.name_scope("RNN"):
    rnn_cell = tf.contrib.rnn.BasicRNNCell(1)
    outputs, states = tf.nn.dynamic_rnn(rnn_cell, x, dtype=tf.float64)
    rnn_weights, rnn_biases = rnn_cell.variables
    tf.summary.histogram("RNN weights", rnn_weights)
    tf.summary.histogram("RNN biases", rnn_biases)
    pred = tf.sigmoid(outputs[:,-1])
with tf.name_scope("cost"):
    cost = tf.losses.mean_squared_error(predictions=pred, labels=np.reshape(y, (33,1)))
with tf.name_scope("train"):
    optimizer = tf.train.AdagradOptimizer(learning_rate=0.5).minimize(cost)
init = tf.global_variables_initializer()
merged_summary = tf.summary.merge_all()
writer = tf.summary.FileWriter(tensorboard_save_path)
print("\ttensorboard --logdir=" + tensorboard_save_path)
sess = tf.Session()
sess.run(init)
for i in range(1000):
    sess.run([optimizer])
    if i % 50 == 0:
        c, s = sess.run([cost, merged_summary])
        writer.add_summary(s, i)
        print("cost is %f" % c)
The expected behaviour would be that the model overfits, given the large number of variables compared to the few training samples. Any idea what's going wrong here?

tensorflow kmeans doesn't seem to take new initial points

I'm trying to find the best clustering of my data by running many k-means trials in TensorFlow and keeping the result with the lowest average distance.
But my code doesn't pick new initial centroids in each trial, so all results are the same.
Here is my first file, code1 - tensor_kmeans.py
import numpy as np
import pandas as pd
import random
import tensorflow as tf
from tensorflow.contrib.factorization import KMeans
from sklearn import metrics
import imp
import pickle
# load as DataFrame
pkl = 'fasttext_words_k.pkl'
with open(pkl, 'rb') as f:
    unique_words_in_fasttext = pickle.load(f).T
vector =[]
for i in range(len(unique_words_in_fasttext)):
    vector.append(list(unique_words_in_fasttext.iloc[i,:]))
vector = [np.array(f) for f in vector ]
# Import data
full_data_x = vector
# Parameters
num_steps = 100 # Total steps to train
batch_size = 1024 # The number of samples per batch
n_clusters = 1300 # The number of clusters
num_classes = 100 # The number of classes
num_rows = 13074
num_features = 300 # Dimension of each word vector
### tensor kmeans ###
# Input images
X = tf.placeholder(tf.float32, shape=[None , num_features])
# Labels (for assigning a label to a centroid and testing)
# Y = tf.placeholder(tf.float32, shape=[None, num_classes])
# K-Means Parameters
kmeans = KMeans(inputs=X, num_clusters=n_clusters, distance_metric='cosine',
                use_mini_batch=True, initial_clusters="random")
# Build KMeans graph
training_graph = kmeans.training_graph()
if len(training_graph) > 6: # Tensorflow 1.4+
    (all_scores, cluster_idx, scores, cluster_centers_initialized,
     cluster_centers_var, init_op, train_op) = training_graph
else:
    (all_scores, cluster_idx, scores, cluster_centers_initialized,
     init_op, train_op) = training_graph
cluster_idx = cluster_idx[0] # fix for cluster_idx being a tuple
avg_distance = tf.reduce_mean(scores)
# Initialize the variables (i.e. assign their default value)
init_vars = tf.global_variables_initializer()
# Start TensorFlow session
sess = tf.Session()
# Run the initializer
sess.run(init_vars, feed_dict={X: full_data_x})
sess.run(init_op, feed_dict={X: full_data_x})
# Training
for i in range(1, num_steps + 1):
    _, d, idx = sess.run([train_op, avg_distance, cluster_idx],
                         feed_dict={X: full_data_x})
    if i % 10 == 0 or i == 1:
        print("Step %i, Avg Distance: %f" % (i, d))
labels = list(range(num_rows))
# Assign a label to each centroid
# Count total number of labels per centroid, using the label of each training
# sample to their closest centroid (given by 'idx')
counts = np.zeros(shape=(n_clusters, num_classes))
for i in range(len(idx)):
    counts[idx[i]] += labels[i]
# Assign the most frequent label to the centroid
labels_map = [np.argmax(c) for c in counts]
labels_map = tf.convert_to_tensor(labels_map)
# Evaluation ops
# Lookup: centroid_id -> label
cluster_label = tf.nn.embedding_lookup(labels_map, cluster_idx)
# assign variables
cluster_list_k = idx
And here is the code that calls code1 from outside:
k_li=[]
rotation = 50
best_labels = []
best_k = -1
for i in range(rotation):
    import tensor_kmeans
    k_li.append(tensor_kmeans.k)
    if len(k_li) > 0:
        for i in range(len(k_li)):
            if k_li[i] > best_k:
                best_labels = tensor_kmeans.cluster_list_k
                best_k = k_li[i]
    tensor_kmeans = imp.reload(tensor_kmeans)
Where is the problem?
Thank you in advance for your answers.
Each time you call KMeans() you should use a new random_seed, i.e.
kmeans = KMeans(inputs=X, num_clusters=n_clusters, distance_metric='cosine',
                use_mini_batch=True, initial_clusters="random", random_seed=SOME_NEW_VALUE)
Otherwise the function KMeans() will assume random_seed=0, so that the results are reproducible (i.e. the results are always the same).
A simple way to resolve your issue would be to turn code1 (tensor_kmeans.py) into a function, then call this function with a new random_seed (as an input parameter) for each trial.
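The pattern of giving each trial its own seed and keeping the best result can be sketched without TensorFlow. The following is a small plain-Python k-means on 1-D data, used only to illustrate the seeding pattern; run_kmeans, best_of_trials and the toy data are hypothetical, and this is not the tf.contrib KMeans implementation.

```python
import random


def run_kmeans(points, n_clusters, seed, num_steps=20):
    """One k-means trial on 1-D points; the seed controls the initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    for _ in range(num_steps):
        # Assign each point to its nearest centroid
        clusters = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    # Average distance of each point to its closest centroid
    avg_distance = sum(min(abs(p - c) for c in centroids) for p in points) / len(points)
    return avg_distance, centroids


def best_of_trials(points, n_clusters, n_trials):
    """Run several trials, each with a different seed, and keep the lowest average distance."""
    results = [run_kmeans(points, n_clusters, seed=trial) for trial in range(n_trials)]
    return min(results, key=lambda r: r[0])


points = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 10.0, 10.2, 10.4]
best_distance, best_centroids = best_of_trials(points, n_clusters=3, n_trials=10)
print(best_distance, sorted(best_centroids))
```

Passing the same seed twice reproduces the same trial, which is the behaviour described above for a fixed random_seed.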

Making predictions with tensorflow

I'm a complete beginner with TensorFlow and with this field in general, but I've watched all of Andrej Karpathy's lectures in the CS231n class, so I understand the code.
So this is the code (not mine): https://github.com/nfmcclure/tensorflow_cookbook/tree/master/09_Recurrent_Neural_Networks/02_Implementing_RNN_for_Spam_Prediction
# Implementing an RNN in TensorFlow
# ----------------------------------
#
# We implement an RNN in TensorFlow to predict spam/ham from texts
#
# https://github.com/nfmcclure/tensorflow_cookbook/blob/master/09_Recurrent_Neural_Networks/02_Implementing_RNN_for_Spam_Prediction/02_implementing_rnn.py
import os
import re
import io
import glob
import requests
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from zipfile import ZipFile
from tensorflow.python.framework import ops
ops.reset_default_graph()
# Start a graph
sess = tf.Session()
# Set RNN parameters
epochs = 20
batch_size = 250
max_sequence_length = 25
rnn_size = 10
embedding_size = 50
min_word_frequency = 10
learning_rate = 0.0005
dropout_keep_prob = tf.placeholder(tf.float32)
# Download or open data
data_dir = 'temp'
data_file = 'text_data.txt'
if not os.path.exists(data_dir):
    os.makedirs(data_dir)
if not os.path.isfile(os.path.join(data_dir, data_file)):
    zip_url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip'
    r = requests.get(zip_url)
    z = ZipFile(io.BytesIO(r.content))
    file = z.read('SMSSpamCollection')
    # Format Data
    text_data = file.decode()
    text_data = text_data.encode('ascii', errors='ignore')
    text_data = text_data.decode().split('\n')
    # Save data to text file
    with open(os.path.join(data_dir, data_file), 'w') as file_conn:
        for text in text_data:
            file_conn.write("{}\n".format(text))
else:
    # Open data from text file
    text_data = []
    with open(os.path.join(data_dir, data_file), 'r') as file_conn:
        for row in file_conn:
            text_data.append(row)
    text_data = text_data[:-1]
text_data = [x.split('\t') for x in text_data if len(x) >= 1]
text_data = [x for x in text_data if len(x) > 1]
print([list(x) for x in zip(*text_data)])
[text_data_target, text_data_train] = [list(x) for x in zip(*text_data)]
# Create a text cleaning function
def clean_text(text_string):
    text_string = re.sub(r'([^\s\w]|_|[0-9])+', '', text_string)
    text_string = " ".join(text_string.split())
    text_string = text_string.lower()
    return (text_string)
# Clean texts
text_data_train = [clean_text(x) for x in text_data_train]
# Change texts into numeric vectors
vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(max_sequence_length,
                                                                     min_frequency=min_word_frequency)
text_processed = np.array(list(vocab_processor.fit_transform(text_data_train)))
# Shuffle and split data
text_processed = np.array(text_processed)
text_data_target = np.array([1 if x == 'ham' else 0 for x in text_data_target])
shuffled_ix = np.random.permutation(np.arange(len(text_data_target)))
x_shuffled = text_processed[shuffled_ix]
y_shuffled = text_data_target[shuffled_ix]
# Split train/test set
ix_cutoff = int(len(y_shuffled) * 0.80)
x_train, x_test = x_shuffled[:ix_cutoff], x_shuffled[ix_cutoff:]
y_train, y_test = y_shuffled[:ix_cutoff], y_shuffled[ix_cutoff:]
vocab_size = len(vocab_processor.vocabulary_)
print("Vocabulary Size: {:d}".format(vocab_size))
print("80-20 Train Test split: {:d} -- {:d}".format(len(y_train), len(y_test)))
# Create placeholders
x_data = tf.placeholder(tf.int32, [None, max_sequence_length])
y_output = tf.placeholder(tf.int32, [None])
# Create embedding
embedding_mat = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0))
embedding_output = tf.nn.embedding_lookup(embedding_mat, x_data)
# embedding_output_expanded = tf.expand_dims(embedding_output, -1)
# Define the RNN cell
# tensorflow change >= 1.0, rnn is put into tensorflow.contrib directory. Prior version not test.
if tf.__version__[0] >= '1':
    cell = tf.contrib.rnn.BasicRNNCell(num_units=rnn_size)
else:
    cell = tf.nn.rnn_cell.BasicRNNCell(num_units=rnn_size)
output, state = tf.nn.dynamic_rnn(cell, embedding_output, dtype=tf.float32)
output = tf.nn.dropout(output, dropout_keep_prob)
# Get output of RNN sequence
output = tf.transpose(output, [1, 0, 2])
last = tf.gather(output, int(output.get_shape()[0]) - 1)
weight = tf.Variable(tf.truncated_normal([rnn_size, 2], stddev=0.1))
bias = tf.Variable(tf.constant(0.1, shape=[2]))
logits_out = tf.matmul(last, weight) + bias
# Loss function
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits_out,
                                                        labels=y_output) # logits=float32, labels=int32
loss = tf.reduce_mean(losses)
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits_out, 1), tf.cast(y_output, tf.int64)), tf.float32))
optimizer = tf.train.RMSPropOptimizer(learning_rate)
train_step = optimizer.minimize(loss)
init = tf.global_variables_initializer()
sess.run(init)
train_loss = []
test_loss = []
train_accuracy = []
test_accuracy = []
# Start training
for epoch in range(epochs):
    # Shuffle training data
    shuffled_ix = np.random.permutation(np.arange(len(x_train)))
    x_train = x_train[shuffled_ix]
    y_train = y_train[shuffled_ix]
    num_batches = int(len(x_train) / batch_size) + 1
    # TO DO CALCULATE GENERATIONS ExACTLY
    for i in range(num_batches):
        # Select train data
        min_ix = i * batch_size
        max_ix = np.min([len(x_train), ((i + 1) * batch_size)])
        x_train_batch = x_train[min_ix:max_ix]
        y_train_batch = y_train[min_ix:max_ix]
        # Run train step
        train_dict = {x_data: x_train_batch, y_output: y_train_batch, dropout_keep_prob: 0.5}
        sess.run(train_step, feed_dict=train_dict)
    # Run loss and accuracy for training
    temp_train_loss, temp_train_acc = sess.run([loss, accuracy], feed_dict=train_dict)
    train_loss.append(temp_train_loss)
    train_accuracy.append(temp_train_acc)
    # Run Eval Step
    test_dict = {x_data: x_test, y_output: y_test, dropout_keep_prob: 1.0}
    temp_test_loss, temp_test_acc = sess.run([loss, accuracy], feed_dict=test_dict)
    test_loss.append(temp_test_loss)
    test_accuracy.append(temp_test_acc)
    print('Epoch: {}, Test Loss: {:.2}, Test Acc: {:.2}'.format(epoch + 1, temp_test_loss, temp_test_acc))
# Plot loss over time
epoch_seq = np.arange(1, epochs + 1)
plt.plot(epoch_seq, train_loss, 'k--', label='Train Set')
plt.plot(epoch_seq, test_loss, 'r-', label='Test Set')
plt.title('Softmax Loss')
plt.xlabel('Epochs')
plt.ylabel('Softmax Loss')
plt.legend(loc='upper left')
plt.show()
# Plot accuracy over time
plt.plot(epoch_seq, train_accuracy, 'k--', label='Train Set')
plt.plot(epoch_seq, test_accuracy, 'r-', label='Test Set')
plt.title('Test Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='upper left')
plt.show()
def findFiles(path): return glob.glob(path)
pred_array = "words"
pred_num = np.array(list(vocab_processor.fit_transform(pred_array)))
print(pred_num)
pred_output = tf.placeholder(tf.float32,[1,len(pred_array),max_sequence_length])
feed_dict = {pred_output: [pred_num]}
classification = sess.run(losses, feed_dict)
print(classification)
It's an RNN spam classifier, and it's working great (except for the part I wrote at the end, where I'm trying to create the predictions).
I just want to understand how to create a prediction function for this, something that looks like this:
def predict(text): # text is a string (my mail)
    # Doing prediction stuff
    return (top result) # ham or spam
The last few lines are my latest attempt, which gives me the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
I also tried to follow "Making predictions with a TensorFlow model", and I read https://www.tensorflow.org/serving/serving_basic, but everything I've tried failed...
Since I'm just a beginner, explanations are welcome, but I'm not sure I'll know how to code it, so please post the code for the answer too.
(Python 3.6 btw)
Thanks!
If you take a look at how the original code does the training and testing steps, specifically how they set up their train_dict and test_dict, you see that they feed values to each of the tensors defined as placeholder in the graph. Basically placeholders need to be given some value if they are going to be used in whatever calculation you are asking your network to do. Since you are looking for predictions from the network, you probably do not need to provide an expected output, but you will need to give it input data x_data, and a value for dropout_keep_prob. This should be dropout_keep_prob=1.0 for prediction.
You also want a prediction, not the loss of the network. The loss is basically a measure of how far your network's output is from what you expect, but since you are trying to predict something for new data you really just want to see what the network says it is. You can do this using the logits_out op directly, or we can add an op that converts your logits into a probability distribution over your classes. Either way you can look at the distribution to get an idea of how likely the network thinks your data falls into each category, or you can take the max value of this vector to just output the network's best guess.
So you might try something like:
prediction = tf.nn.softmax(logits_out)
feed_dict = {x_data: your_input_data, dropout_keep_prob: 1.0}
pred = sess.run(prediction, feed_dict)
best_guess = np.argmax(pred) # highest-rated class
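To turn the network output into the "ham"/"spam" string asked for in the question, something like the following could be used. It is a plain-Python sketch of the softmax-and-argmax step only; label_from_logits is a hypothetical helper, the logits are made up, and the label order (index 0 = spam, index 1 = ham) is taken from the line in the question's code that encodes 'ham' as 1.

```python
import math

# Index 1 is 'ham' and index 0 is 'spam', matching
# "1 if x == 'ham' else 0" in the training code above.
LABELS = ['spam', 'ham']


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits):
    """Hypothetical helper: map one row of logits to (label, probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


# Example with made-up logits for a single message
label, prob = label_from_logits([-1.2, 2.3])
print(label, round(prob, 3))
```

In the TensorFlow code, the row of logits would come from running logits_out with x_data and dropout_keep_prob=1.0 in the feed_dict. The input text would first need to be converted to word indices with the same vocab_processor that was fitted on the training data (for example with its transform method, rather than calling fit_transform again).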

Training LSTM network and predicting with different starting points

This is a simple example of using an LSTM cell from TensorFlow. I generate a sine wave, train my network on ten periods, and try to predict the eleventh period. The predictor values X are one epoch lag of the true y. After training, I save the session to disk and restore it at prediction time, as is typical when training and deploying models to production.
When I predict the last period, y_predicted matches the true y very well.
If I try to predict the sine wave from an arbitrary starting point (i.e. uncomment line 114 of the script, the commented-out line in main()):
test_data = test_data[16:]
so that the true values of y are shifted by a quarter period, the LSTM prediction still seems to start at zero and takes a couple of epochs to catch up with the true values, eventually matching the previous prediction. In fact, the prediction in the second case still looks like a full sine wave instead of a 3/4 wave.
Why is this happening? If I implement a regressor, I would like to be able to use it starting from any point.
https://github.com/fbora/mytensorflow/issues/1
import os
import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
def sin_signal():
    '''
    generate a sin function
    the train set is ten periods in length
    the test set is one additional period
    the return variable is in pandas format for easy plotting
    '''
    phase = np.arange(0, 2*np.pi*11, 0.1)
    y = np.sin(phase)
    data = pd.DataFrame.from_dict({'phase': phase, 'y':y})
    # fill the last element by 0 - it's the end of the period anyways
    data['X'] = data.y.shift(-1).fillna(0.0)
    train_data = data[data.phase<=2*np.pi*10].copy()
    test_data = data[data.phase>2*np.pi*10].copy()
    return train_data, test_data
class lstm_model():
    def __init__(self, size_x, size_y, num_units=32, num_layers=3, keep_prob=0.5):
        # def single_unit():
        #     return rnn.DropoutWrapper(
        #         rnn.LSTMCell(num_units), output_keep_prob=keep_prob)
        def single_unit():
            return rnn.LSTMCell(num_units)
        self.graph = tf.Graph()
        with self.graph.as_default():
            '''input place holders'''
            self.X = tf.placeholder(tf.float32, [None, size_x], name='X')
            self.y = tf.placeholder(tf.float32, [None, size_y], name='y')
            '''network'''
            cell = rnn.MultiRNNCell([single_unit() for _ in range(num_layers)])
            X = tf.expand_dims(self.X, -1)
            val, state = tf.nn.dynamic_rnn(cell, X, time_major=True, dtype=tf.float32)
            val = tf.transpose(val, [1, 0, 2])
            last = tf.gather(val, int(val.get_shape()[0])-1)
            weights = tf.Variable(tf.truncated_normal([num_units, size_y], 0.0, 1.0), name='weights')
            bias = tf.Variable(tf.zeros(size_y), name='bias')
            predicted_y = tf.nn.xw_plus_b(last, weights, bias, name='predicted_y')
            '''optimizer'''
            optimizer = tf.train.AdamOptimizer(name='adam_optimizer')
            global_step = tf.Variable(0, trainable=False, name='global_step')
            self.loss = tf.reduce_mean(tf.squared_difference(predicted_y, self.y), name='mse_loss')
            self.train_op = optimizer.minimize(self.loss, global_step=global_step, name='training_op')
            '''initializer'''
            self.init_op = tf.global_variables_initializer()
class lstm_regressor():
    def __init__(self):
        if not os.path.isdir('./check_pts'):
            os.mkdir('./check_pts')
    @staticmethod
    def get_shape(dataframe):
        df_shape = dataframe.shape
        num_rows = df_shape[0]
        num_cols = 1 if len(df_shape)<2 else df_shape[1]
        return num_rows, num_cols
    def train(self, X_train, y_train, iterations):
        train_pts, size_x = lstm_regressor.get_shape(X_train)
        train_pts, size_y = lstm_regressor.get_shape(y_train)
        model = lstm_model(size_x=size_x, size_y=size_y, num_units=32, num_layers=1)
        with tf.Session(graph=model.graph) as sess:
            sess.run(model.init_op)
            saver = tf.train.Saver()
            feed_dict={
                model.X: X_train.values.reshape(-1, size_x),
                model.y: y_train.values.reshape(-1, size_y)
            }
            for step in range(iterations):
                _, loss = sess.run([model.train_op, model.loss], feed_dict=feed_dict)
                if step%100==0:
                    print('step={}, loss={}'.format(step, loss))
            saver.save(sess, './check_pts/lstm')
    def predict(self, X_test):
        test_pts, size_x = lstm_regressor.get_shape(X_test)
        X_np = X_test.values.reshape(-1, size_x)
        graph = tf.Graph()
        with graph.as_default():
            with tf.Session() as sess:
                sess.run(tf.global_variables_initializer())
                saver = tf.train.import_meta_graph('./check_pts/lstm.meta')
                saver.restore(sess, './check_pts/lstm')
                X = graph.get_tensor_by_name('X:0')
                y_tf = graph.get_tensor_by_name('predicted_y:0')
                y_np = sess.run(y_tf, feed_dict={X: X_np})
        return y_np.reshape(test_pts)
def main():
    train_data, test_data = sin_signal()
    regressor = lstm_regressor()
    regressor.train(train_data.X, train_data.y, iterations=1000)
    # test_data = test_data[16:]
    y_predicted = regressor.predict(test_data.X)
    test_data['y_predicted'] = y_predicted
    test_data[['y', 'y_predicted']].plot()
if __name__ == '__main__':
    main()
I suspect that since you are starting your predictions at an arbitrary starting point in the future, there is a gap of values between what your model was trained on and what it is starting to see for predictions, and the State of your LSTM has not updated with the values in that gap?
*** UPDATE:
In your code, you have this:
val, state = tf.nn.dynamic_rnn(cell, X, time_major=True, dtype=tf.float32)
and then during training this:
_, loss = sess.run([model.train_op, model.loss], feed_dict=feed_dict)
I would suggest feeding the initial State into dynamic_rnn and re-feeding the updated state at each training iteration, something like this:
inState = tf.placeholder(tf.float32, [YOUR_DIMENSIONS], name='inState')
val, state = tf.nn.dynamic_rnn(cell, X, time_major=True, dtype=tf.float32, initial_state=inState)
And during training:
iState = np.zeros([YOUR_DIMENSIONS])
feed_dict={
    model.X: X_train.values.reshape(-1, size_x),
    model.y: y_train.values.reshape(-1, size_y),
    inState: iState # feed initial value for state placeholder
}
_, loss, oState = sess.run([model.train_op, model.loss, model.state], feed_dict=feed_dict) # run one additional variable from the session
iState = oState # assign latest out-state to be re-fed as in-state
So, this way your model not only learns the parameters during training, but also keeps track of everything that it's seen during training in the State. NOW, you save this State with the rest of your session and use it during the prediction stage.
The small difficulty with this is that technically this State is a placeholder, so it won't be saved in the Graph automatically in my experience. So you create another variable manually at the end of training and assign the State to it; this way it is saved in the graph for later:
# make sure this variable is declared BEFORE the saver is declared
savedState = tf.get_variable('savedState', shape=[YOUR_DIMENSIONS])
# then, at the end of training:
assignOp = tf.assign(savedState, oState)
sess.run(assignOp)
# now save your graph
So now once you restore the Graph, if you want to start your predictions after some artificial gap, then somehow you still have to run your model through this gap so as to update the state. In my case, I just run one dummy prediction for the whole gap, just so as to update the state, and then you continue at your normal intervals from here.
Hope this helps...
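The effect of the recurrent state can be illustrated with a toy recurrence in plain Python. This is not an LSTM; it is a simple stateful update, h = a*h + b*x, with made-up coefficients. It is meant only to show that outputs after a gap match the uninterrupted run when the state is first carried through the gap, and differ when the state is reset to zero.

```python
import math


def run_recurrence(inputs, state=0.0, a=0.9, b=0.1):
    """Feed inputs one at a time through h = a*h + b*x; return outputs and final state."""
    outputs = []
    for x in inputs:
        state = a * state + b * x
        outputs.append(state)
    return outputs, state


signal = [math.sin(0.1 * t) for t in range(100)]
history, gap, future = signal[:60], signal[60:76], signal[76:]

# Reference: one uninterrupted pass over the whole signal
reference, _ = run_recurrence(signal)

# Cold start: jump straight to the future segment with a zero state
cold, _ = run_recurrence(future)

# Warm start: run through the history and the gap first, only to update the state
_, state = run_recurrence(history)
_, state = run_recurrence(gap, state)
warm, _ = run_recurrence(future, state)

# Error of the first output after the gap, cold start vs warm start
print(abs(cold[0] - reference[76]), abs(warm[0] - reference[76]))
```

The warm-started run reproduces the reference because it has seen exactly the same inputs in the same order; the cold-started run needs several steps before its state catches up.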

Getting very high values in linear regression

I am trying to build a simple MLP that predicts the pixel values of an image (the idea comes from a blog post).
My earlier attempt used Keras in Python.
I've tried to do the same in TensorFlow, but I am getting very large output values (~10^12) when they should be less than 1.
Here's my code:
import numpy as np
import cv2
from random import shuffle
import tensorflow as tf
'''
Image preprocessing
'''
image_file = cv2.imread("Mona Lisa.jpg")
h = image_file.shape[0]
w = image_file.shape[1]
preX = []
preY = []
for i in xrange(h):
    for j in xrange(w):
        preX.append([i,j])
        preY.append(image_file[i,j,:].astype('float32')/255.0)
print preX[:5], preY[:5]
zipped = [i for i in zip(preX,preY)]
shuffle(zipped)
X_train = np.array([i for (i,j) in zipped]).astype('float32')
Y_train = np.array([j for (i,j) in zipped]).astype('float32')
print X_train[:10], Y_train[:10]
'''
Tensorflow code
'''
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
x = tf.placeholder(tf.float32, shape=[None,2])
y = tf.placeholder(tf.float32, shape=[None,3])
'''
Layers
'''
w1 = weight_variable([2,300])
b1 = bias_variable([300])
L1 = tf.nn.relu(tf.matmul(X_train,w1)+b1)
w2 = weight_variable([300,3])
b2 = bias_variable([3])
y_model = tf.matmul(L1,w2)+b2
'''
Training
'''
# criterion
MSE = tf.reduce_mean(tf.square(tf.sub(y,y_model)))
# trainer
train_op = tf.train.GradientDescentOptimizer(learning_rate = 0.01).minimize(MSE)
nb_epochs = 10
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
cost = 0
for i in range(nb_epochs):
    sess.run(train_op, feed_dict ={x: X_train, y: Y_train})
    cost += sess.run(MSE, feed_dict ={x: X_train, y: Y_train})
cost /= nb_epochs
print cost
'''
Prediction
'''
pred = sess.run(y_model,feed_dict = {x:X_train})*255.0
print pred[:10]
output_image = []
index = 0
h = image_file.shape[0]
w = image_file.shape[1]
for i in xrange(h):
    row = []
    for j in xrange(w):
        row.append(pred[index])
        index += 1
    row = np.array(row)
    output_image.append(row)
output_image = np.array(output_image)
output_image = output_image.astype('uint8')
cv2.imwrite('out_mona_300x3_tf.png',output_image)
First of all, instead of running train_op and then MSE in two separate calls,
you can run both ops in one sess.run call, which avoids a second forward pass over the data.
for i in range(nb_epochs):
    c, _ = sess.run([MSE, train_op], feed_dict ={x: X_train, y: Y_train})
    cost += c
Secondly, I suggest always writing out your cost function so you can see what is going on during the training phase. Either manually print it out or use tensorboard to log your cost and plot it (you can find examples on the official tf page).
You can also monitor your weights to see that they aren't blowing up.
A few things you can try:
Reduce the learning rate and add regularization to the weights.
Check that your training set (pixels) really consists of the values that you expect.
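A quick check of the value ranges, as suggested in the list above, might look like the following. It is a plain-Python sketch on made-up data; describe_range and min_max_scale are hypothetical helpers, and rescaling the inputs is a common remedy offered here as an assumption, not something the answer above prescribes.

```python
def describe_range(name, rows):
    """Print and return the (min, max) over all values in a list of rows."""
    flat = [v for row in rows for v in row]
    lowest, highest = min(flat), max(flat)
    print("{}: min={}, max={}".format(name, lowest, highest))
    return lowest, highest


def min_max_scale(rows):
    """Rescale every column to [0, 1] (constant columns map to 0)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - low) / span for v, low, span in zip(row, lows, spans)]
            for row in rows]


# Made-up stand-ins for X_train (pixel coordinates) and Y_train (colours / 255)
h, w = 4, 6
coords = [[float(i), float(j)] for i in range(h) for j in range(w)]
colours = [[(i + j) / 10.0] * 3 for i in range(h) for j in range(w)]

describe_range("inputs", coords)          # coordinates grow with the image size
describe_range("targets", colours)        # within [0, 1] for this toy data
scaled = min_max_scale(coords)
describe_range("scaled inputs", scaled)   # rescaled to [0, 1]
```

On a real image the raw coordinates run into the hundreds while the targets stay below 1, which is the kind of mismatch this check is meant to surface.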
You give the input layer weights and the output layer weights the same names w and b, so it seems something goes wrong in the gradient-descent procedure. Actually, I'm surprised TensorFlow doesn't issue an error or at least a warning (or am I missing something?)
