I have a very simple TensorFlow setup, but one aspect of it (calculating the accuracy) keeps taking longer and longer to run. I'm confused about why this is. I've simplified the code down as much as I can while still keeping the error. Here is the code:
import time
import tensorflow as tf
import numpy as np
# dummy data
data = np.zeros((12, 784))
labels = np.zeros((12, 10))
xs = tf.placeholder(tf.float32, [12, 784])
ys = tf.placeholder(tf.float32, [12, 10])
weights = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
prediction = tf.nn.softmax(tf.matmul(xs, weights))
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
while True:
    y_pre = sess.run(prediction, feed_dict={xs: data})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    start_time = time.time()
    r = sess.run(accuracy, feed_dict={xs: data, ys: labels})
    time_taken = time.time() - start_time
    # why does time_taken keep growing?
    print("time_taken", time_taken)
I suspect it's something I'm doing wrong in the while True loop. In my experience, time_taken starts off low, around 0.01, but then grows seemingly indefinitely to 0.30 and beyond if you leave it long enough. Is there some way to keep time_taken constant? Any help would be appreciated, thanks.
Can you take a look at your RAM during execution?
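If it helps, here is a minimal sketch of one way to keep the timing flat, assuming the same dummy data as above: build the comparison and accuracy ops once, before the loop, so the graph stops growing (it reuses the ys placeholder instead of wrapping the numpy labels in new ops on every pass):
# Build the accuracy ops once; the graph no longer grows inside the loop.
correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(ys, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
while True:
    start_time = time.time()
    r = sess.run(accuracy, feed_dict={xs: data, ys: labels})
    # time_taken should now stay roughly constant
    print("time_taken", time.time() - start_time)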
import numpy as np
import tensorflow as tf
import pandas as pd
data = pd.read_csv('mnist_train.csv')
X = data.drop('label', axis=1).values
y = data['label'].values
with tf.Session() as sess:
    Y = tf.one_hot(y, 10).eval()

hidden = [5, 4, 3]

def costa(y, yhat):
    loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=yhat, labels=y)
    loss = tf.reduce_sum(loss)
    return loss

def train(cost):
    train_op = tf.train.GradientDescentOptimizer(0.0001).minimize(cost)
    return train_op

with tf.Graph().as_default():
    X1 = tf.placeholder(tf.float32, [None, 784])
    y1 = tf.placeholder(tf.float32, [None, 10])
    w1 = tf.Variable(tf.random_normal((784, hidden[0])))
    w2 = tf.Variable(tf.random_normal((hidden[0], hidden[1])))
    w3 = tf.Variable(tf.random_normal((hidden[1], hidden[2])))
    wo = tf.Variable(tf.random_normal((hidden[2], 10)))
    b1 = tf.Variable(tf.random_normal((1, hidden[0])))
    b2 = tf.Variable(tf.random_normal((1, hidden[1])))
    b3 = tf.Variable(tf.random_normal((1, hidden[2])))
    bo = tf.Variable(tf.random_normal((1, 10)))
    layer1 = tf.nn.relu(tf.matmul(X1, w1) + b1)
    layer2 = tf.nn.relu(tf.matmul(layer1, w2) + b2)
    layer3 = tf.nn.relu(tf.matmul(layer2, w3) + b3)
    layerout = (tf.matmul(layer3, wo) + bo)
    yhat = layerout
    cost = costa(y1, yhat)
    train_op = train(cost)
    init_op = tf.global_variables_initializer()
    for epoch in range(1000):
        with tf.Session() as sess:
            sess.run(init_op)
            sess.run(train_op, feed_dict={X1:X, y1:Y})
            loss = sess.run(cost, feed_dict={X1:X, y1:Y})
            print("Loss for epoch {}: {}".format(epoch, loss))
The loss stays around the same value and jumps up and down a lot, but it does not decrease as it should.
I can't seem to find what is going wrong here; any help would be appreciated.
Is it the activations of the layers, or am I getting the cost function wrong?
There are a couple of issues here:
You are running sess.run(init_op) every epoch. This means that the model parameters are reset to random numbers every epoch and therefore cannot learn. Try moving this op to before the for epoch in range(1000) loop.
You are creating a new session every epoch. Change your code so it looks like this:
with tf.Session() as sess:
    sess.run(init_op)
    for epoch in range(1000):
        sess.run(train_op, feed_dict={X1:X, y1:Y})
        loss = sess.run(cost, feed_dict={X1:X, y1:Y})
        print("Loss for epoch {}: {}".format(epoch, loss))
Initialising weights with a standard deviation of (2.0/neurons_in_prev_layer)**0.5 worked like a charm for me!
I also changed the hidden layers to 2 hidden layers of 256 neurons each.
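For what it's worth, a rough sketch of that initialisation applied to the question's first two weight matrices (the stddev is just the (2.0/neurons_in_prev_layer)**0.5 rule mentioned above):
# He-style init: stddev = sqrt(2 / fan_in), where fan_in is the size of the previous layer.
w1 = tf.Variable(tf.random_normal((784, hidden[0]), stddev=(2.0 / 784) ** 0.5))
w2 = tf.Variable(tf.random_normal((hidden[0], hidden[1]), stddev=(2.0 / hidden[0]) ** 0.5))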
Okay, one little tweak did the trick: I used RMSPropOptimizer instead, and the loss started decreasing as expected.
I still have to figure out why this works; I'm still learning, but for now this is the solution I have.
The loss decreases very slowly, though.
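For reference, a minimal sketch of that tweak in the train helper from the question (same 0.0001 learning rate; only the optimizer class changes):
def train(cost):
    # RMSProp keeps a per-parameter running average of squared gradients,
    # which often copes better with unnormalised inputs than plain SGD.
    train_op = tf.train.RMSPropOptimizer(0.0001).minimize(cost)
    return train_op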
I am new to TensorFlow. The following code runs successfully, without any error. In the first 10 lines of output, the computation is fast, and the output (defined in the last line) scrolls by quickly, line by line. However, as the iterations go on, the computation becomes slower and slower, and finally becomes intolerable. So I wonder whether there are any modifications that can speed this up.
Here is a brief description of this code:
This code applies a single-hidden-layer neural network to the dataset. It aims to find the best values for rate[0] and rate[1], which are parameters that affect the loss function. During each training step, one tuple is fed to the model, and the accuracy on that tuple is immediately evaluated (this kind of data arrives as a stream in the real world).
import tensorflow as tf
import numpy as np
n_hidden=50
n_input=37
n_output=2
data_raw=np.genfromtxt(r'data.csv',delimiter=",",dtype=None)
data_info=np.genfromtxt(r'data2.csv',delimiter=",",dtype=None)
def pre_process(tuple):
    ans = []
    temp = [0 for i in range(24)]
    temp[int(tuple[0])] = 1
    # np.append(ans,np.array(temp))
    ans.extend(temp)
    temp = [0 for i in range(7)]
    temp[int(tuple[1]) - 1] = 1
    ans.extend(temp)
    # np.append(ans,np.array(temp))
    temp = [0 for i in range(3)]
    temp[int(tuple[3])] = 1
    ans.extend(temp)
    temp = [0 for i in range(2)]
    temp[int(tuple[4])] = 1
    ans.extend(temp)
    ans.extend([int(tuple[5])])
    return np.array(ans)
x=tf.placeholder(tf.float32, shape=[1,n_input])
y_=tf.placeholder(tf.float32,shape=[n_output])
y_r=tf.placeholder(tf.float32,shape=[n_output])
W1=tf.Variable(tf.random_uniform([n_input, n_hidden]))
b1=tf.Variable(tf.zeros([n_hidden]))
W2=tf.Variable(tf.zeros([n_hidden,n_output]))
b2=tf.Variable(tf.zeros([n_output]))
logits_1 = tf.matmul(x, W1) + b1
relu_layer= tf.nn.relu(logits_1)
logits_2 = tf.matmul(relu_layer, W2) + b2
correct_prediction = tf.equal(tf.argmax(logits_2,1), tf.argmax(y_,0))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
rate=[0,0]
for i in range(-100,200,10):
    rate[0]=i
    for j in range(-100,i,10):
        rate[1]=j
        loss=tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits_2)*[rate[0],rate[1]])
        # loss2=tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_r, logits=logits_2)*[rate[2],rate[3]])
        # loss=loss1+loss2
        train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
        data_line=1
        accur=0
        local_local=0
        remote_remote=0
        local_remote=0
        remote_local=0
        total=0
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for i in range(200):
                # print(int(data_raw[data_line][0]),data_info[i][0])
                if i>100:
                    total+=1
                if int(data_raw[data_line][0])==data_info[i][0]:
                    sess.run(train_step,feed_dict={x:pre_process(data_info[i]).reshape(1,-1),y_:[1,0],y_r:[0,1]})
                    # print(sess.run(logits_2,{x:pre_process(data_info[i]).reshape(1,-1), y_: [1,0]}))
                    data_line+=1
                    if data_line==len(data_raw):
                        break
                    if i>100:
                        acc=accuracy.eval(feed_dict={x: pre_process(data_info[i]).reshape(1,-1), y_: [1,0], y_r:[0,1]})
                        local_local+=acc
                        local_remote+=1-acc
                        accur+=acc
                else:
                    sess.run(train_step,feed_dict={x:pre_process(data_info[i]).reshape(1,-1),y_:[0,1], y_r:[1,0]})
                    # print(sess.run(logits_2,{x: pre_process(data_info[i]).reshape(1,-1), y_: [0,1]}))
                    if i>100:
                        acc=accuracy.eval(feed_dict={x: pre_process(data_info[i]).reshape(1,-1), y_: [0,1], y_r:[1,0]})
                        remote_remote+=acc
                        remote_local+=1-acc
                        accur+=acc
        print("correctness: (%.3d,%.3d): \t%.2f %.2f %.2f %.2f %.2f" % (rate[0],rate[1],accur/total,local_local/total,local_remote/total,remote_local/total,remote_remote/total))
Although GPhilo's answer addresses why running the code gets slower and slower, that solution still ends up creating the computation graph again and again, which is not good.
The following two lines of code (which GPhilo has also mentioned) keep adding operations to your graph on every iteration:
loss=tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits( \
labels=y_, logits=logits_2)*[rate[0],rate[1]])
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
As far as I can see, you have two values, rate[0] and rate[1], which need to be supplied to your graph. Why not supply these two values through placeholders and define your graph only once? Once you start running a Session, you shouldn't add more operations to your graph. Also, you shouldn't re-initialize your Session on every iteration.
Check this modified code (only the important parts):
# To clear any previously created graph present in memory.
tf.reset_default_graph()

x=tf.placeholder(tf.float32, shape=[1,n_input])
y_=tf.placeholder(tf.float32,shape=[n_output])
y_r=tf.placeholder(tf.float32,shape=[n_output])

# Add these two placeholders (assuming they are single float values).
rate0 = tf.placeholder(tf.float32, shape=[])
rate1 = tf.placeholder(tf.float32, shape=[])

W1=tf.Variable(tf.random_uniform([n_input, n_hidden]))
....
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Bring this code outside the loop (note the replacement of rate[0] with a placeholder).
loss=tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits_2) * [rate0, rate1])
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Instantiate the session only once.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Move the subsequent looping code inside.
    rate=[0,0]
    for i in range(-100,200,10):
        rate[0]=i
After this modification, whenever your Session runs train_step, you need to supply these two extra placeholders in your feed_dict.
Ex:
sess.run(train_step,feed_dict={x:pre_process(data_info[i]).reshape(1,-1),
y_:[1,0],y_r:[0,1], rate0: rate[0], rate1: rate[1]})
This way, you will not be creating a graph on every iteration, and in fact this code will be faster than GPhilo's solution.
Every time you run train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss) you're adding (quite some) operations to your graph, which becomes bigger and bigger with more loops of your program. The bigger the graph, the slower the execution.
Put your model definition in the inner loop's body and call tf.reset_default_graph() each time you start a new iteration:
rate=[0,0]
for i in range(-100,200,10):
    rate[0]=i
    for j in range(-100,i,10):
        tf.reset_default_graph()
        x=tf.placeholder(tf.float32, shape=[1,n_input])
        y_=tf.placeholder(tf.float32,shape=[n_output])
        y_r=tf.placeholder(tf.float32,shape=[n_output])
        W1=tf.Variable(tf.random_uniform([n_input, n_hidden]))
        b1=tf.Variable(tf.zeros([n_hidden]))
        W2=tf.Variable(tf.zeros([n_hidden,n_output]))
        b2=tf.Variable(tf.zeros([n_output]))
        logits_1 = tf.matmul(x, W1) + b1
        relu_layer= tf.nn.relu(logits_1)
        logits_2 = tf.matmul(relu_layer, W2) + b2
        correct_prediction = tf.equal(tf.argmax(logits_2,1), tf.argmax(y_,0))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        rate[1]=j
        #...
I just started using TensorFlow, and I followed the tutorial example on the MNIST dataset. It went well; I got around 90% accuracy.
But after I replaced next_batch with my own version, the result was much worse than it used to be, usually around 50%.
Instead of using the data TensorFlow downloads and parses, I download the dataset from this website and use numpy to get what I want.
import numpy as np
import pandas as pd

df = pd.read_csv('mnist_train.csv', header=None)
X = df.drop(0,1)
Y = df[0]
temp = np.zeros((Y.size, Y.max()+1))
temp[np.arange(Y.size),Y] = 1
np.save('X',X)
np.save('Y',temp)
I do the same thing to the test data, then follow the tutorial; nothing else is changed.
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
X = np.load('X.npy')
Y = np.load('Y.npy')
X_test = np.load('X_test.npy')
Y_test = np.load('Y_test.npy')
BATCHES = 1000
W = tf.Variable(tf.truncated_normal([784,10], stddev=0.1))
# W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
Right here is my own get_mini_batch: I shuffle the original data's indices, then each time take 100 examples out of it, which seems to be exactly what the example code does. The only difference is that I throw away some of the data at the tail.
pos = 0
idx = np.arange(X.shape[0])
np.random.shuffle(idx)
for _ in range(1000):
    batch_xs, batch_ys = X[idx[range(pos,pos+BATCHES)],:], Y[idx[range(pos,pos+BATCHES)],]
    if pos+BATCHES >= X.shape[0]:
        pos = 0
        idx = np.arange(X.shape[0])
        np.random.shuffle(idx)
    pos += BATCHES
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
print(sess.run(accuracy, feed_dict={x: X_test, y_: Y_test}))
It confuses me why my version is way worse than the tutorial one.
Like lejilot said, we should normalize the data before we push it into the neural network.
See this post
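A minimal sketch of that normalisation, assuming the CSV pixel values are in the usual 0-255 range:
# Scale raw 0-255 pixel values down to [0, 1] before feeding the network.
X = X.astype(np.float32) / 255.0
X_test = X_test.astype(np.float32) / 255.0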
Below is the simple MNIST tutorial (i.e. the single-layer softmax) from the TensorFlow website, which I tried to extend with a multi-threaded training step:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import threading
# Training loop executed in each thread
def training_func():
    while True:
        batch = mnist.train.next_batch(100)
        global_step_val, _ = sess.run([global_step, train_step], feed_dict={x: batch[0], y_: batch[1]})
        print("global step: %d" % global_step_val)
        if global_step_val >= 4000:
            break
# create session and graph
sess = tf.Session()
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
global_step = tf.Variable(0, name="global_step")
y = tf.matmul(x,W) + b
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
inc = global_step.assign_add(1)
with tf.control_dependencies([inc]):
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# initialize graph and create mnist loader
sess.run(tf.global_variables_initializer())
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
# create workers and execute threads
workers = []
for _ in range(8):
    t = threading.Thread(target=training_func)
    t.start()
    workers.append(t)
for t in workers:
    t.join()
# evaluate accuracy of the model
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels},
session=sess))
I must be missing something, as 8 threads (as below) yield inconsistent results (accuracy approx. 0.1), whereas with 1 thread the expected accuracy is obtained (approx. 0.92). Does anybody have a clue about my mistake(s)? Thanks!
Note that, unfortunately, threading with Python doesn't create real parallelism because of the GIL. So what happens here is that you have multiple threads which all run on the same CPU, and in reality they run sequentially. Therefore, I would suggest using a Coordinator in TensorFlow (a minimal sketch follows the links below). More information about Coordinator can be found here:
https://www.tensorflow.org/programmers_guide/threading_and_queues
https://www.tensorflow.org/programmers_guide/reading_data
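A minimal sketch of the Coordinator pattern applied to the training loop above (should_stop, request_stop and join are the standard tf.train.Coordinator calls; everything else comes from the question's code):
coord = tf.train.Coordinator()

def training_func():
    # Each worker trains until some thread requests a stop.
    while not coord.should_stop():
        batch = mnist.train.next_batch(100)
        step, _ = sess.run([global_step, train_step], feed_dict={x: batch[0], y_: batch[1]})
        if step >= 4000:
            coord.request_stop()   # signal every worker to finish

workers = [threading.Thread(target=training_func) for _ in range(8)]
for t in workers:
    t.start()
coord.join(workers)   # block until all workers have stopped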
Finally, I would suggest you say:
with tf.device('/cpu:0'):
    # your code should go here... 'for the first thread'
Then use another CPU for the other thread, and so on...
Hope this answer finds you well!!
I'm trying to write a script that will allow me to draw an image of a digit and then determine what digit it is with a model trained on MNIST.
Here is my code:
import random
import image
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
import scipy.ndimage
mnist = input_data.read_data_sets( "MNIST_data/", one_hot=True )
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize (cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(1000)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
print ("done with training")
data = np.ndarray.flatten(scipy.ndimage.imread("im_01.jpg", flatten=True))
result = sess.run(tf.argmax(y,1), feed_dict={x: [data]})
print (' '.join(map(str, result)))
For some reason the results are always wrong, even though the model gets 92% accuracy when I use the standard testing method.
I think the problem might be how I encoded the image:
data = np.ndarray.flatten(scipy.ndimage.imread("im_01.jpg", flatten=True))
I tried looking in the TensorFlow code for the next_batch() function to see how they did it, but I have no idea how to compare it against my approach.
The problem might be somewhere else too.
Any help to make the accuracy 80+% would be greatly appreciated.
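One quick sanity check, a minimal sketch assuming the mnist loader and the flattened data array from the code above are still in scope, is to compare the value ranges of the two encodings side by side:
# The tutorial loader yields floats in [0, 1]; a raw imread is typically 0-255.
print("mnist batch range:", batch_xs.min(), batch_xs.max())
print("my image range:   ", data.min(), data.max())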
I found my mistake: the image was encoded in reverse, with blacks at 255 instead of 0.
data = np.vectorize(lambda x: 255 - x)(np.ndarray.flatten(scipy.ndimage.imread("im_01.jpg", flatten=True)))
Fixed it.
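As a side note, the same inversion can be written with plain NumPy broadcasting instead of np.vectorize, which is shorter and much faster:
# 255 - array flips every pixel in one vectorised operation.
data = 255.0 - np.ndarray.flatten(scipy.ndimage.imread("im_01.jpg", flatten=True))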