TensorFlow: parameters do not update when training - python

I'm implementing a classification model using TensorFlow.
The problem that I'm facing is that my weights and error are not being updated when I run the training step. As a result, my network keeps returning the same results.
I've developed my model based on the MNIST example from the TensorFlow website.
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
#load dataset
dataset = np.loadtxt('char8k.txt', dtype='float', comments='#', delimiter=",")
Y = np.asmatrix( dataset[:,0] )
X = np.asmatrix( dataset[:,1:1201] )
m = 11527
labels = 26
# convert Y to a one-hot matrix of shape 11527x26
Yt = np.zeros((m,labels))
for i in range(0,m):
    index = int(Y[0,i]) - 1  # labels are stored as 1-based floats; cast before indexing
    Yt[i,index]= 1
Y = Yt
Y = np.asmatrix(Y)
#graph settings
x = tf.placeholder(tf.float32, shape=[None, 1200])
y_ = tf.placeholder(tf.float32, shape=[None, 26])
Wtest = tf.Variable(tf.truncated_normal([1200,26], stddev=0.001))
W = tf.Variable(tf.truncated_normal([1200,26], stddev=0.001))
b = tf.Variable(tf.zeros([26]))
y = tf.nn.softmax(tf.matmul(x,W) + b)
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
Wtest = W
for i in range(10):
    Xbatch = X[np.random.randint(X.shape[0],size=100),:]
    Ybatch = Y[np.random.randint(Y.shape[0],size=100),:]
    train_step.run(feed_dict={x: Xbatch, y_: Ybatch})
    print("weight update")
    print(Wtest==W)  # monitor the weight update
    correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(accuracy.eval(feed_dict={x: X, y_: Y}))
    print(" ")
    print(" ")
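Two details in the loop above may be worth checking. The sketch below illustrates them in plain NumPy with made-up stand-in arrays; the names batch_idx, alias, and snapshot are mine, not from the question. First, Xbatch and Ybatch are drawn with two independent calls to np.random.randint, so the rows of one do not correspond to the rows of the other. Second, Wtest = W binds a second name to the same object, so comparing the two cannot reveal whether the weights changed.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical stand-ins for X and Y: row i of Y is the label for row i of X.
X = np.arange(20).reshape(10, 2)
Y = np.arange(10).reshape(10, 1)

# Draw ONE set of row indices and reuse it, so features and labels stay paired.
batch_idx = rng.randint(X.shape[0], size=4)
Xbatch = X[batch_idx, :]
Ybatch = Y[batch_idx, :]

# Aliasing versus snapshot: `alias` is the same object as W, `snapshot` is a copy.
W = np.zeros((2, 3))
alias = W
snapshot = W.copy()
W += 1.0  # stands in for an in-place training update

alias_changed = not np.array_equal(alias, W)        # the alias always equals W
snapshot_changed = not np.array_equal(snapshot, W)  # the copy kept the old values
```

In the TensorFlow 1.x code, the analogous snapshot would presumably be the value returned by W.eval() before the training step, compared against W.eval() afterwards.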

The issue probably arises from how you initialize the weight matrix, W. If it is initialized to all zeros, every neuron follows the same gradient at each step, so the network does not train. Replacing the line
W = tf.Variable(tf.zeros([1200,26]))
...with something like
W = tf.Variable(tf.truncated_normal([1200,26], stddev=0.001))
...should cause it to start training.
This question on the CrossValidated site has a good explanation of why you should not initialize all of your weights to zero.
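The symmetry argument can be checked numerically. The following is a minimal NumPy sketch, not the asker's model: it assumes a small network with one sigmoid hidden layer and squared error, and the function name train_hidden_layer and all sizes are made up for illustration. As far as I can tell the argument concerns hidden units; a single softmax layer with no hidden layer, as in the MNIST tutorial, can still train from zeros.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_hidden_layer(W1, W2, X, T, lr=0.1, steps=50):
    """Plain gradient descent on a one-hidden-layer net with squared error."""
    W1, W2 = W1.copy(), W2.copy()
    for _ in range(steps):
        H = sigmoid(X @ W1)               # hidden activations, shape (n, hidden)
        Y = H @ W2                        # linear output, shape (n, out)
        dY = (Y - T) / len(X)             # gradient of the halved mean squared error
        dW2 = H.T @ dY
        dH = dY @ W2.T
        dW1 = X.T @ (dH * H * (1.0 - H))  # backpropagate through the sigmoid
        W1 -= lr * dW1
        W2 -= lr * dW2
    return W1, W2

rng = np.random.RandomState(0)
X = rng.normal(size=(20, 4))
T = rng.normal(size=(20, 2))

# All-zero start: every hidden unit receives the same update at every step.
W1_zero, W2_zero = train_hidden_layer(np.zeros((4, 3)), np.zeros((3, 2)), X, T)

# Small random start: the hidden units are free to diverge from each other.
W1_rand, W2_rand = train_hidden_layer(
    rng.normal(scale=0.1, size=(4, 3)), rng.normal(scale=0.1, size=(3, 2)), X, T)

# Compare every hidden unit's incoming weights with those of the first unit.
zero_init_units_identical = np.allclose(W1_zero, W1_zero[:, :1])
rand_init_units_identical = np.allclose(W1_rand, W1_rand[:, :1])
```

With the zero start the weights do move, but all hidden units stay copies of each other, so the layer behaves like a single unit.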


tensorflow does not train RNN variables

For a complex text classification task I am training an RNN model. Somehow the weights of the RNN only change at the beginning, and after that only in very tiny steps (gradients are in the range of e^-7).
For those of you who are not familiar with TensorBoard: the histograms show the distribution of values for the biases and the weights of the RNN (x axis is the value, y axis the number of values, and z the training iterations).
I constructed a toy example which does not make sense but reproduces the same behaviour:
import numpy as np
import tensorflow as tf
tensorboard_save_path = "../RNN/tensorboard/supersimple/"
x = np.random.normal(size=(33, 20, 5000))
y = np.array([1 if i>0.5 else 0 for i in np.random.random(33)])
##### NETWORK #########
with tf.name_scope("RNN"):
    rnn_cell = tf.contrib.rnn.BasicRNNCell(1)
    outputs, states = tf.nn.dynamic_rnn(rnn_cell, x, dtype=tf.float64)
    rnn_weights, rnn_biases = rnn_cell.variables
    tf.summary.histogram("RNN weights", rnn_weights)
    tf.summary.histogram("RNN biases", rnn_biases)
pred = tf.sigmoid(outputs[:,-1])
with tf.name_scope("cost"):
    cost = tf.losses.mean_squared_error(predictions=pred, labels=np.reshape(y, (33,1)))
with tf.name_scope("train"):
    optimizer = tf.train.AdagradOptimizer(learning_rate=0.5).minimize(cost)
init = tf.global_variables_initializer()
merged_summary = tf.summary.merge_all()
writer = tf.summary.FileWriter(tensorboard_save_path)
print("\ttensorboard --logdir=" + tensorboard_save_path)
sess = tf.Session()
for i in range(1000):
    if i % 50 == 0:
        c, s = sess.run([cost, merged_summary])
        writer.add_summary(s, i)
        print("cost is %f" % c)
Expected behaviour would be that the model overfits due to the huge amount of variables in contrast to the few training samples. Any idea what's going wrong here?
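One factor that may contribute to the "large changes first, tiny steps afterwards" pattern is the optimizer itself. Adagrad divides each update by the square root of the accumulated squared gradients, so its steps shrink over time even when the gradient stays the same. The sketch below is a scalar NumPy illustration under stated assumptions: the function name adagrad_steps is made up, the gradient sequence is hypothetical, and the initial accumulator of 0.1 mirrors what I believe is the TensorFlow 1.x default. It does not explain the tiny gradients themselves, which may have a separate cause such as a saturated tanh unit.

```python
import numpy as np

def adagrad_steps(gradients, learning_rate=0.5, initial_accumulator=0.1):
    """Return the size of each Adagrad update for a sequence of scalar gradients."""
    accumulator = initial_accumulator
    steps = []
    for g in gradients:
        accumulator += g * g  # running sum of squared gradients
        steps.append(learning_rate * abs(g) / np.sqrt(accumulator))
    return np.array(steps)

# Hypothetical gradient sequence: a constant gradient of 1.0 for 1000 iterations.
steps = adagrad_steps(np.ones(1000))
first_step, last_step = steps[0], steps[-1]
```

Logging the gradient norm next to the weight histograms would help separate the two effects: a shrinking step with a steady gradient points at the optimizer, while a collapsing gradient points at the network.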

It is hard to use tf.train.batch with np.array data

import tensorflow as tf
import numpy as np
xy = np.loadtxt('data-01-test-score.csv', delimiter = ',', dtype = np.float32)
# convert the NumPy arrays to tensors
x_imp_np = xy[:,:-1]
y_imp_np = xy[:,[-1]]
x_imp_ten = tf.constant(x_imp_np)
y_imp_ten = tf.constant(y_imp_np)
# make batches for data
x_batch, y_batch = tf.train.batch([x_imp_ten, y_imp_ten], batch_size = 10)
x = tf.placeholder(tf.float32, shape = [None,3])
y = tf.placeholder(tf.float32, shape = [None,1])
w = tf.Variable(tf.random_normal([3,1]), name = 'weight')
b = tf.Variable(tf.random_normal([1]), name = 'bias')
hypothesis = tf.matmul(x,w) + b
cost = tf.reduce_mean(tf.square(hypothesis - y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate = 1e-5)
train = optimizer.minimize(cost)
sess = tf.Session()
x_data, y_data = sess.run([x_imp_ten,y_imp_ten])
for step in range(2001):
    x_batch_tr, y_batch_tr = sess.run([x_batch,y_batch])
    _, cost_val , hypothesis_val = sess.run([train, cost,hypothesis], feed_dict= {x: x_data, y: y_data})
    if step % 10 == 0:
        print(step, cost_val)
The code above is just a simple linear regression problem from Sung Kim's lecture. I have a problem with tf.train.batch: when the queue is used it works well, but if I don't use the queue it doesn't work. Is there a way to load the data without a queue?
As written, it takes so long that it is almost useless.
I just want tf.train.batch-style batching for that NumPy array using simple array slicing.
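For data that already fits in memory as a NumPy array, batching by slicing could look roughly like this. It is a minimal sketch, not a replacement for tf.train.batch: the generator name iterate_minibatches, its parameters, and the sample data are made up for illustration.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) pairs that together cover the data once."""
    n = x.shape[0]
    order = np.arange(n)
    if shuffle:
        np.random.RandomState(seed).shuffle(order)
    for start in range(0, n, batch_size):
        rows = order[start:start + batch_size]  # the last batch may be smaller
        yield x[rows], y[rows]

# Hypothetical data with the same layout as the question: 3 features, 1 target.
x_data = np.arange(75, dtype=np.float32).reshape(25, 3)
y_data = np.arange(25, dtype=np.float32).reshape(25, 1)

batches = list(iterate_minibatches(x_data, y_data, batch_size=10, seed=0))
```

Each pair can then be passed to the placeholders, for example with feed_dict={x: x_batch, y: y_batch}, so no queue runners are involved.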

tensorflow mnist example with my own get_next_minibatch

I just started using TensorFlow and followed the tutorial example on the MNIST dataset. It went well; I got around 90% accuracy.
But after I replaced next_batch with my own version, the result was much worse than before, usually around 50%.
Instead of using the data TensorFlow downloaded and parsed, I downloaded the dataset from this website and used NumPy to get what I want.
df = pd.read_csv('mnist_train.csv', header=None)
X = df.drop(0,1)
Y = df[0]
temp = np.zeros((Y.size, Y.max()+1))
temp[np.arange(Y.size),Y] = 1
I do the same thing to the test data and then follow the tutorial; nothing else is changed.
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
X = np.load('X.npy')
Y = np.load('Y.npy')
X_test = np.load('X_test.npy')
Y_test = np.load('Y_test.npy')
BATCHES = 1000
W = tf.Variable(tf.truncated_normal([784,10], stddev=0.1))
# W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess = tf.InteractiveSession()
Here is my own get_mini_batch. I shuffle the original data's indices, then each time I take 100 samples out of it, which seems to be exactly what the example code does. The only difference is that I throw away some of the data at the tail.
pos = 0
idx = np.arange(X.shape[0])
for _ in range(1000):
    batch_xs, batch_ys = X[idx[range(pos,pos+BATCHES)],:], Y[idx[range(pos,pos+BATCHES)],]
    if pos+BATCHES >= X.shape[0]:
        pos = 0
        idx = np.arange(X.shape[0])
    pos += BATCHES
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
print(sess.run(accuracy, feed_dict={x: X_test, y_: Y_test}))
It confuses me why my version is way worse than the tutorial one.
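One thing in the graph definition may be worth a look independently of the batching: y is already the output of tf.nn.softmax, and it is then passed as logits to tf.nn.softmax_cross_entropy_with_logits, which expects unscaled scores and applies softmax itself. The NumPy sketch below shows the effect of applying softmax twice; the helper names and the example scores are made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy_from_logits(logits, labels):
    """Mean cross-entropy, applying softmax to `logits` internally."""
    return float(np.mean(-np.sum(labels * np.log(softmax(logits)), axis=1)))

logits = np.array([[8.0, 0.0, 0.0]])  # a confident, correct score vector
labels = np.array([[1.0, 0.0, 0.0]])

loss_once = cross_entropy_from_logits(logits, labels)            # softmax applied once
loss_twice = cross_entropy_from_logits(softmax(logits), labels)  # softmax applied twice
```

Because probabilities lie in [0, 1], the second softmax flattens them, so the loss stays well above zero even for a perfect prediction. In the question's code the usual remedy would be to pass tf.matmul(x, W) + b as logits and keep the softmax output only for prediction.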
As lejilot said, the data should be normalized before it is fed into the neural network.
See this post.
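A minimal sketch of that normalization, assuming the CSV holds 8-bit pixel values in the range 0 to 255 (the helper name scale_pixels is made up). As far as I know the tutorial's own loader returns pixels already scaled to [0, 1], which would explain why the tutorial data trains better with the same learning rate.

```python
import numpy as np

def scale_pixels(raw):
    """Map 8-bit pixel values from [0, 255] to float32 values in [0, 1]."""
    return np.asarray(raw, dtype=np.float32) / 255.0

# Hypothetical row of raw pixel values.
raw = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = scale_pixels(raw)
```

The same scaling has to be applied to X_test, otherwise training and evaluation see inputs on different scales.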

Why is my simple neural network not learning?

I am new to TensorFlow and neural networks. I am trying to build a neural network that can classify images in the CIFAR-10 dataset.
Here is my code:
import tensorflow as tf
import pickle
import numpy as np
import random
image_size= 32*32*3 # because 3 channels
n_classes = 10
lay1_size = 50
batch_size = 100
def unpickle(filename):
    with open(filename,'rb') as f:
        data = pickle.load(f, encoding='latin1')
    x = data['data']
    y = data['labels']
    # shuffle the data
    z = list(zip(x,y))
    random.shuffle(z)  # the shuffle call itself was missing; added to match the comment
    x, y = zip(*z)
    x = x[:batch_size]
    y = y[:batch_size]
    # convert decimals to one hot arrays
    y = np.eye(n_classes)[[y]]
    return x, y
# set up network
def add_layer(inputs, in_size, out_size, activation_function=None):
    W = tf.Variable(tf.random_normal([in_size, out_size]), dtype=tf.float32)
    b = tf.Variable(tf.zeros([1,out_size]) + 0.1, dtype=tf.float32)
    Wx_plus_b = tf.matmul(inputs, W) + b
    if activation_function is None:
        output = Wx_plus_b
    else:
        output = activation_function(Wx_plus_b)
    return output
def compute_accuracy(v_xs, v_ys):
    global prediction
    y_pre = sess.run(prediction, feed_dict={xs:v_xs})
    correct_prediction = tf.equal(tf.argmax(y_pre,1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs:v_xs, ys:v_ys})
    return result
xs = tf.placeholder(tf.float32, [None,image_size])
ys = tf.placeholder(tf.float32)
lay1 = add_layer(xs, image_size, lay1_size, activation_function=tf.nn.tanh)
lay2 = add_layer(lay1, lay1_size, lay1_size, activation_function=tf.nn.tanh)
prediction = add_layer(lay2, lay1_size, n_classes, activation_function=tf.nn.softmax)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys*tf.log(prediction), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
#train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# run network
sess = tf.Session()
x_test, y_test = unpickle('test_batch')
for i in range(1000):
    x_train, y_train = unpickle('data_batch_1')
    sess.run(train_step, feed_dict={xs:x_train,ys:y_train})
    if i % 50 == 0:
        print(compute_accuracy(x_test, y_test))
I am using two hidden layers with 50 nodes in each layer. I am running 1,000 cycles, where in each cycle I shuffle data in the dataset and pick the first 100 images of that shuffle to train on.
I am consistently getting ~0.1 accuracy; the machine is not learning at all.
When I modify the code to use the MNIST dataset instead of the CIFAR-10 dataset I get ~0.87 accuracy.
I took code from an MNIST tutorial and am trying to modify it to classify CIFAR-10 data.
I can't figure out what's wrong here. How do I get my algorithm to learn?
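One difference from MNIST that may matter here: the CIFAR-10 batch files store raw pixel values from 0 to 255, and tf.random_normal defaults to a standard deviation of 1. With 3072 inputs per unit, that combination pushes tanh far into saturation, where gradients are close to zero. The NumPy sketch below estimates how many hidden units saturate under each setup. It uses random integers as a hypothetical stand-in for real images, and the helper name saturated_fraction is made up.

```python
import numpy as np

rng = np.random.RandomState(0)
n_inputs, n_hidden = 32 * 32 * 3, 50

# Hypothetical stand-in for a batch of raw CIFAR-10 rows: integers in [0, 255].
raw = rng.randint(0, 256, size=(100, n_inputs)).astype(np.float32)

def saturated_fraction(inputs, weight_stddev):
    """Fraction of tanh hidden units whose output magnitude exceeds 0.99."""
    W = rng.normal(scale=weight_stddev, size=(n_inputs, n_hidden))
    hidden = np.tanh(inputs @ W + 0.1)  # same 0.1 bias as add_layer
    return float(np.mean(np.abs(hidden) > 0.99))

# Setup resembling the question: raw pixels, weights with standard deviation 1.
frac_raw = saturated_fraction(raw, weight_stddev=1.0)

# Alternative: pixels scaled to [0, 1] and centred, weights scaled by 1/sqrt(fan_in).
scaled = raw / 255.0 - 0.5
frac_scaled = saturated_fraction(scaled, weight_stddev=1.0 / np.sqrt(n_inputs))
```

In the TensorFlow code this would correspond to scaling x inside unpickle and passing a stddev argument to tf.random_normal; the exact values are a judgment call, not something the question specifies.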

Tensorflow - Testing a mnist neural net with my own images

I'm trying to write a script that will allow me to draw an image of a digit and then determine what digit it is with a model trained on MNIST.
Here is my code:
import random
import image
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
import scipy.ndimage
mnist = input_data.read_data_sets( "MNIST_data/", one_hot=True )
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize (cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
for i in range( 1000 ):
    batch_xs, batch_ys = mnist.train.next_batch( 1000 )
    sess.run(train_step, feed_dict= {x: batch_xs, y_: batch_ys})
print ("done with training")
data = np.ndarray.flatten(scipy.ndimage.imread("im_01.jpg", flatten=True))
result = sess.run(tf.argmax(y,1), feed_dict={x: [data]})
print (' '.join(map(str, result)))
For some reason the results are always wrong, even though the model reaches 92% accuracy when I use the standard testing method.
I think the problem might be how I encoded the image:
data = np.ndarray.flatten(scipy.ndimage.imread("im_01.jpg", flatten=True))
I tried looking in the tensorflow code for the next_batch() function to see how they did it, but I have no idea how I can compare against my approach.
The problem might be somewhere else too.
Any help to make the accuracy 80+% would be greatly appreciated.
I found my mistake: the image was encoded in reverse, with blacks at 255 instead of 0.
data = np.vectorize(lambda x: 255 - x)(np.ndarray.flatten(scipy.ndimage.imread("im_01.jpg", flatten=True)))
Fixed it.
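The same inversion can be written without np.vectorize, since NumPy subtracts element-wise. This is a minimal sketch assuming a 28x28 8-bit grayscale drawing with a bright background and dark strokes; the helper name to_mnist_input and the sample canvas are made up for illustration.

```python
import numpy as np

def to_mnist_input(gray_image):
    """Invert an 8-bit grayscale image and flatten it to a single input row.

    Assumes values in [0, 255] with a bright background and dark digit strokes.
    """
    pixels = np.asarray(gray_image, dtype=np.float32)
    inverted = 255.0 - pixels       # strokes become bright, background becomes 0
    return inverted.reshape(1, -1)  # shape (1, 784) for a 28x28 image

# Hypothetical drawing: a white canvas with one dark stroke pixel.
canvas = np.full((28, 28), 255, dtype=np.uint8)
canvas[14, 14] = 0
row = to_mnist_input(canvas)
```

The result can be fed directly as feed_dict={x: row}. Dividing by 255.0 as well would bring the values onto the same [0, 1] scale that, as far as I know, the tutorial's training data uses.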

