Can't get gradients from loaded model in tensorflow/keras - python

I have a pre-trained model that I load and it effectively works (i.e. I can make predictions). I want to get the gradients of the model with respect to a certain parameter, but I cannot manage to get any meaningful result. The output is always None.
My code:
sess = tf.Session()
K.set_session(sess)
x = X_test[0].reshape(1,100)
y = np.reshape(Y_test[0], (1,1))
tf_y = tf.convert_to_tensor(y,dtype=np.float32)
model2 = ClassificationModel(config, logging).model
model2.load_weights("class_models/model.382-0.46-0.87.h5")
# predict real x_test
y_hat = model2.predict(x)
tf_y_hat = tf.convert_to_tensor(y_hat, dtype=np.float32)
loss = keras.losses.binary_crossentropy(tf_y,tf_y_hat)
grad, = K.gradients(loss,x)
print(grad)
And the output I get for the print is None. What am I doing wrong? How do I get the gradient given my model?

With your current code, TensorFlow cannot connect x to the computational graph of loss, since loss is built from a numpy array (y_hat) and x is also just a numpy array. The following code should work instead:
tf_x = tf.convert_to_tensor(x, dtype=np.float32)
loss = tf.keras.losses.binary_crossentropy(tf_y, model2(tf_x))
grad, = K.gradients(loss, tf_x)
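Note that grad is still a symbolic tensor at this point; to get actual numbers you have to evaluate it in the session. A minimal sketch of that last step, assuming the same sess from the question and the tf_x and grad defined above:
# Evaluate the symbolic gradient tensor; no feed_dict is needed because
# tf_x and tf_y were built from constant numpy arrays.
grad_value = sess.run(grad)
print(grad_value.shape)  # same shape as x, i.e. (1, 100)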

Related

Efficient example implementation of GPU-training of a simple feed-forward NN in TensorFlow? Maybe with tf.data?

I just started using the GPU version of TensorFlow, hoping that it would speed up the training of my feed-forward neural networks. I am able to train on my GPU (GTX1080ti), but unfortunately it is not notably faster than doing the same training on my CPU (i7-8700K) with my current implementation. During training, the GPU appears to be barely utilized at all, which makes me suspect that the bottleneck in my implementation is how the data is copied from the host to the device using feed_dict.
I’ve heard that TensorFlow has something called the “tf.data” pipeline, which is supposed to make it easier and faster to feed data to GPUs. However, I have not been able to find any simple examples where this concept is implemented in multilayer perceptron training as a replacement for feed_dict.
Is anyone aware of such an example that they could point me to? Preferably as simple as possible, since I’m new to TensorFlow in general. Or is there something else I should change in my current implementation to make it more efficient? I’m pasting the code I have here:
import tensorflow as tf
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
import time

tf.reset_default_graph()

# Function for iris dataset.
def get_iris_data():
    iris = datasets.load_iris()
    data = iris["data"]
    target = iris["target"]
    # Convert to one-hot vectors
    num_labels = len(np.unique(target))
    all_Y = np.eye(num_labels)[target]
    return train_test_split(data, all_Y, test_size=0.33, random_state=89)

# Function which initializes tensorflow weights & biases for feed-forward NN.
def InitWeights(LayerSizes):
    with tf.device('/gpu:0'):
        # Make tf placeholders for network inputs and outputs.
        X = tf.placeholder(shape=(None, LayerSizes[0]),
                           dtype=tf.float32,
                           name='InputData')
        y = tf.placeholder(shape=(None, LayerSizes[-1]),
                           dtype=tf.float32,
                           name='OutputData')
        # Initialize weights and biases.
        W = {}
        b = {}
        for ii in range(len(LayerSizes) - 1):
            layername = 'layer%s' % ii
            with tf.variable_scope(layername):
                ny = LayerSizes[ii]
                nx = LayerSizes[ii + 1]
                # Weights (initialized with Xavier initialization).
                W['Weights_' + layername] = tf.get_variable(
                    name='Weights_' + layername,
                    shape=(ny, nx),
                    initializer=tf.contrib.layers.xavier_initializer(),
                    dtype=tf.float32
                )
                # Bias (initialized with Xavier initialization).
                b['Bias_' + layername] = tf.get_variable(
                    name='Bias_' + layername,
                    shape=(nx,),
                    initializer=tf.contrib.layers.xavier_initializer(),
                    dtype=tf.float32
                )
    return W, b, X, y

# Function for forward propagation of NN.
def FeedForward(X, W, b):
    with tf.device('/gpu:0'):
        # Initialize 'a' of first layer to the placeholder of the network input.
        a = X
        # Loop over all layers of the network.
        for ii in range(len(W)):
            # Use name of each layer as index.
            layername = 'layer%s' % ii
            # Weighted sum: z = input*W + b
            z = tf.add(tf.matmul(a, W['Weights_' + layername],
                                 name='WeightedSum_z_' + layername),
                       b['Bias_' + layername])
            # Pass through activation fcn: a = h(z), except on the output layer.
            if ii == len(W) - 1:
                a = z
            else:
                a = tf.nn.relu(z, name='activation_a_' + layername)
    return a

if __name__ == "__main__":
    # Import data
    train_X, test_X, train_y, test_y = get_iris_data()
    # Define network size [ninputs-by-256-by-outputs]
    LayerSizes = [4, 256, 3]
    # Initialize weights and biases.
    W, b, X, y = InitWeights(LayerSizes)
    # Define loss function to optimize.
    yhat = FeedForward(X, W, b)
    loss = tf.reduce_sum(tf.square(y - yhat), reduction_indices=[0])
    # Define optimizer to use when minimizing loss function.
    all_variables = tf.trainable_variables()
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0001)
    train_op = optimizer.minimize(loss, var_list=all_variables)
    # Start tf session and initialize variables.
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    # Train 10000 minibatches and time how long it takes.
    t0 = time.time()
    for i in range(10000):
        ObservationsToUse = np.random.choice(len(train_X), 32)
        X_minibatch = train_X[ObservationsToUse, :]
        y_minibatch = train_y[ObservationsToUse, :]
        sess.run(train_op, feed_dict={X: X_minibatch, y: y_minibatch})
    t1 = time.time()
    print('Training took %0.2f seconds' % (t1 - t0))
    sess.close()
The speed might be low because:
You are creating placeholders and filling them with numpy arrays; every batch has to be copied from the host into the graph through feed_dict before it becomes a tensor.
By using tf.data.Dataset, you can create a direct input pipeline that makes the data flow into the graph without the need for placeholders. Datasets are fast, scalable, and have a number of transformation functions to play around with.
with np.load("/var/data/training_data.npy") as data:
    features = data["features"]
    labels = data["labels"]
# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
Some useful functions:
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(32)                  # create batches
dataset = dataset.repeat(num_epochs)         # repeat the dataset 'N' times
iterator = dataset.make_one_shot_iterator()  # create an iterator to retrieve batches of data
X, Y = iterator.get_next()
Here, 32 is the batch size.
In your case:
dataset = tf.data.Dataset.from_tensor_slices((data, targets))
Hence, there is no need for placeholders. You can run the training op directly:
session.run(train_op)  # no feed_dict!
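Applied to the iris script above, a rough sketch of the change (treat it as an outline rather than a drop-in replacement; the helper functions are the ones from the question):
# Build a dataset from the in-memory numpy arrays returned by get_iris_data().
dataset = tf.data.Dataset.from_tensor_slices((train_X.astype(np.float32),
                                              train_y.astype(np.float32)))
dataset = dataset.shuffle(buffer_size=len(train_X)).repeat().batch(32)
iterator = dataset.make_one_shot_iterator()
X_batch, y_batch = iterator.get_next()
# Build the graph directly on the iterator output instead of the placeholders.
W, b, _, _ = InitWeights(LayerSizes)   # the returned placeholders are simply unused
yhat = FeedForward(X_batch, W, b)
loss = tf.reduce_sum(tf.square(y_batch - yhat), reduction_indices=[0])
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.0001).minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10000):
        sess.run(train_op)  # no feed_dict needed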

Tensorflow program gives different answers after being deployed on AWS Lambda

I have written a program with Tensorflow that identifies a number of figures in an image. The model is trained with one function and then used by another function to label the figures. The training was done on my computer and the resulting model was uploaded to AWS with the solve function.
On my computer it works well, but after creating a Lambda in AWS it behaves strangely and starts giving different answers with the same test data.
The model in the solve function is this:
# Recreate neural network from model file generated during training
# input
x = tf.placeholder(tf.float32, [None, size_of_image])
# weights
W = tf.Variable(tf.zeros([size_of_image, num_chars]))
# biases
b = tf.Variable(tf.zeros([num_chars]))
The solve function code to label the figures is this:
for testi in range(captcha_letters_num):
    # load model from file
    saver = tf.train.import_meta_graph(model_path + '.meta',
                                       clear_devices=True)
    saver.restore(sess, model_path)
    # Data to label
    test_x = np.asarray(char_imgs[testi], dtype=np.float32)
    predict_op = model(test_x, W, b)
    op = sess.run(predict_op, feed_dict={x: test_x})
    # find max probability from the probability distribution returned by softmax
    max_probability = op[0][0]
    max_probability_index = -1
    for i in range(num_chars):
        if op[0][i] > max_probability:
            max_probability = op[0][i]
            max_probability_index = i
    # append it to final output
    final_text += char_map_list[max_probability_index]
    # Reset the model so it can be used again
    tf.reset_default_graph()
With the same test data it gives different answers, and I don't know why.
Solved!
What I finally did was to keep the session outside the loop and initialize the variables there. After the loop ends, I reset the graph.
saver = tf.train.Saver()
sess = tf.Session()
# Initialize variables
sess.run(tf.global_variables_initializer())
.
.
.
# passing each of the 5 characters through the NNet
for testi in range(captcha_letters_num):
    # Data to label
    test_x = np.asarray(char_imgs[testi], dtype=np.float32)
    predict_op = model(test_x, W, b)
    op = sess.run(predict_op, feed_dict={x: test_x})
    # find max probability from the probability distribution returned by softmax
    max_probability = op[0][0]
    max_probability_index = -1
    for i in range(num_chars):
        if op[0][i] > max_probability:
            max_probability = op[0][i]
            max_probability_index = i
    # append it to final output
    final_text += char_map_list[max_probability_index]

# Reset the model so it can be used again
tf.reset_default_graph()
sess.close()

Feed a trainable variable into a placeholder in TensorFlow

Suppose I want to train a model over a number of samples (and also variables) that are known only at run time.
Case study: PCA (X W = Y)
(this is only a simplification of a much more complex model)
Take for example this simple PCA model, where only the features dimensions (Din and Dout) are known and fixed.
W = tf.Variable(tf.zeros([D_in, D_out]), name='weights', trainable=True)
X = tf.placeholder(tf.float32, [None, D_in], name='placeholder_latent')
Y_est = tf.matmul(X, W)
loss = tf.reduce_sum((Y_tf-Y_est)**2)
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)
Suppose now we generate some data
W_true = np.random.randn(D_in, D_out)
X_true = np.random.randn(N, D_in)
Y_true = np.dot(X_true, W_true)
Y_tf = tf.constant(Y_true.astype(np.float32))
As soon as I know the dimensions of my training data, I can declare the latent variable that will be fed to the placeholder X and optimised.
latent = tf.Variable(tf.zeros([N, D_in]), name='latent', trainable=True)
init_op = tf.global_variables_initializer()
After that, what I would like to do is to feed this latent variable to the placeholder X and run the optimisation.
with tf.Session() as sess:
    sess.run(init_op)
    for n in range(10000):
        sess.run(train_step, feed_dict={X: sess.run(latent)})
        if (n+1) % 1000 == 0:
            print('iter %i, %f' % (n+1, sess.run(loss, feed_dict={X: sess.run(latent)})))
The problem is that the optimiser does not optimise both W and latent, but only W. I have also tried to feed the variable directly, without evaluating it first, but I get this error:
ValueError: setting an array element with a sequence.
Have you ever encountered this kind of issue? Do you know how to overcome this problem? Is there any possible workaround to optimise on a placeholder?
By the way, I am using TensorFlow 1.1.0rc0 with Python 2.7.13
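One possible workaround (only a sketch, and it assumes you are free to drop the placeholder) is to build the model on the latent variable itself, so the optimiser can reach it through the graph: values passed via feed_dict arrive as plain numpy arrays, so gradients can never flow back into latent that way.
# Sketch: build Y_est directly on the trainable latent variable.
# Non-zero initial values so the gradients are not identically zero.
W = tf.Variable(0.01 * tf.random_normal([D_in, D_out]), name='weights')
latent = tf.Variable(0.01 * tf.random_normal([N, D_in]), name='latent')
Y_est = tf.matmul(latent, W)
loss = tf.reduce_sum((Y_tf - Y_est) ** 2)
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for n in range(10000):
        sess.run(train_step)  # no feed_dict: latent is part of the graph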

Tensorflow: No gradients provided for any variable

I am new to tensorflow and I am building a network but failing to compute/apply the gradients for it. I get the error:
ValueError: No gradients provided for any variable: ((None, tensorflow.python.ops.variables.Variable object at 0x1025436d0), ... (None, tensorflow.python.ops.variables.Variable object at 0x10800b590))
I tried using a TensorBoard graph to see if there was something that made it impossible to trace the graph and get the gradients, but I could not see anything.
Here's part of the code:
sess = tf.Session()
X = tf.placeholder(type, [batch_size,feature_size])
W = tf.Variable(tf.random_normal([feature_size, elements_size * dictionary_size]), name="W")
target_probabilties = tf.placeholder(type, [batch_size * elements_size, dictionary_size])
lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * number_of_layers)
initial_state = state = stacked_lstm.zero_state(batch_size, type)
output, state = stacked_lstm(X, state)
pred = tf.matmul(output,W)
pred = tf.reshape(pred, (batch_size * elements_size, dictionary_size))
# instead of calculating this, I will calculate the difference between the target_W and the current W
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(target_probabilties, pred)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
sess.run(optimizer, feed_dict={X:my_input, target_probabilties:target_prob})
I will appreciate any help on figuring this out.
I always use tf.nn.softmax_cross_entropy_with_logits() with the logits as the first argument and the labels as the second. Can you try this?
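A sketch of the corrected call, using the tensors from the question; passing the arguments by name (required in newer TensorFlow versions anyway) makes the mix-up impossible:
# Gradients only flow through `logits`, never through `labels`, so passing the
# network output as the labels argument leaves every variable without a gradient.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
    labels=target_probabilties,  # the targets fed via the placeholder
    logits=pred)                 # raw, unscaled network output
cost = tf.reduce_mean(cross_entropy)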

How to test tensorflow cifar10 cnn tutorial model

I am relatively new to machine learning and currently have almost no experience developing it.
So my question is: after training and evaluating on the cifar10 dataset from the tensorflow tutorial, how could one test the trained model with sample images?
I could train and evaluate the Imagenet tutorial from the caffe machine-learning framework and it was relatively easy to use the trained model on custom applications using the python API.
Any help would be very appreciated!
This isn't 100% the answer to the question, but it's a similar way of solving it, based on a MNIST NN training example suggested in the comments to the question.
Based on the TensorFlow beginner MNIST tutorial, and thanks to this tutorial, this is a way of training and using your neural network with custom data.
Please note that something similar should be done for tutorials such as CIFAR10, as @Yaroslav Bulatov mentioned in the comments.
import input_data
import datetime
import numpy as np
import tensorflow as tf
import cv2
from matplotlib import pyplot as plt
import matplotlib.image as mpimg
from random import randint
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder("float", [None, 784])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x,W) + b)
y_ = tf.placeholder("float", [None,10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
#Train our model
iter = 1000
for i in range(iter):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
#Evaluating our model:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print "Accuracy: ", sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
#1: Using our model to classify a random MNIST image from the original test set:
num = randint(0, mnist.test.images.shape[0])
img = mnist.test.images[num]
classification = sess.run(tf.argmax(y, 1), feed_dict={x: [img]})
'''
#Uncomment this part if you want to plot the classified image.
plt.imshow(img.reshape(28, 28), cmap=plt.cm.binary)
plt.show()
'''
print 'Neural Network predicted', classification[0]
print 'Real label is:', np.argmax(mnist.test.labels[num])
#2: Using our model to classify MNIST digit from a custom image:
# create an an array where we can store 1 picture
images = np.zeros((1,784))
# and the correct values
correct_vals = np.zeros((1,10))
# read the image
gray = cv2.imread("my_digit.png", 0 ) #0=cv2.CV_LOAD_IMAGE_GRAYSCALE #must be .png!
# rescale it
gray = cv2.resize(255-gray, (28, 28))
# save the processed images
cv2.imwrite("my_grayscale_digit.png", gray)
"""
all images in the training set have an range from 0-1
and not from 0-255 so we divide our flatten images
(a one dimensional vector with our 784 pixels)
to use the same 0-1 based range
"""
flatten = gray.flatten() / 255.0
"""
we need to store the flatten image and generate
the correct_vals array
correct_val for a digit (9) would be
[0,0,0,0,0,0,0,0,0,1]
"""
images[0] = flatten
my_classification = sess.run(tf.argmax(y, 1), feed_dict={x: [images[0]]})
"""
we want to run the prediction and the accuracy function
using our generated arrays (images and correct_vals)
"""
print 'Neural Network predicted', my_classification[0], "for your digit"
For further image conditioning (digits should be completely dark on a white background) and better NN training (accuracy > 91%), please check the Advanced MNIST tutorial from TensorFlow or the 2nd tutorial I've mentioned.
The example below is not for the MNIST tutorial, but a simple XOR example. Note the train() and test() methods. All that we declare and keep globally are the weights, biases, and session. In the test method we redefine the shape of the input and reuse the same weights and biases (and session) that we refined in training.
import tensorflow as tf
#parameters for the net
w1 = tf.Variable(tf.random_uniform(shape=[2,2], minval=-1, maxval=1, name='weights1'))
w2 = tf.Variable(tf.random_uniform(shape=[2,1], minval=-1, maxval=1, name='weights2'))
#biases
b1 = tf.Variable(tf.zeros([2]), name='bias1')
b2 = tf.Variable(tf.zeros([1]), name='bias2')
#tensorflow session
sess = tf.Session()
def train():
    # placeholders for the training inputs (4 inputs with 2 features each) and outputs (4 outputs which have a value of 0 or 1)
    x = tf.placeholder(tf.float32, [4, 2], name='x-inputs')
    y = tf.placeholder(tf.float32, [4, 1], name='y-inputs')

    # set up the model calculations
    temp = tf.sigmoid(tf.matmul(x, w1) + b1)
    output = tf.sigmoid(tf.matmul(temp, w2) + b2)

    # cost function is avg error over training samples
    cost = tf.reduce_mean(((y * tf.log(output)) + ((1 - y) * tf.log(1.0 - output))) * -1)

    # training step is gradient descent
    train_step = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)

    # declare training data
    training_x = [[0,1], [0,0], [1,0], [1,1]]
    training_y = [[1], [0], [1], [0]]

    # init session
    init = tf.initialize_all_variables()
    sess.run(init)

    # training
    for i in range(100000):
        sess.run(train_step, feed_dict={x: training_x, y: training_y})
        if i % 1000 == 0:
            print(i, sess.run(cost, feed_dict={x: training_x, y: training_y}))

    print('\ntraining done\n')

def test(inputs):
    # redefine the shape of the input to a single unit with 2 features
    xtest = tf.placeholder(tf.float32, [1, 2], name='x-inputs')

    # redefine the model in terms of that new input shape
    temp = tf.sigmoid(tf.matmul(xtest, w1) + b1)
    output = tf.sigmoid(tf.matmul(temp, w2) + b2)

    print(inputs, sess.run(output, feed_dict={xtest: [inputs]})[0, 0] >= 0.5)
train()
test([0,1])
test([0,0])
test([1,1])
test([1,0])
I recommend taking a look at the basic MNIST tutorial on the TensorFlow website. It looks like you define some function that generates the type of output that you want, and then run your session, passing it this evaluation function (correct_prediction below), and a dictionary containing whatever arguments you require (x and y_ below).
If you have defined and trained some network that takes an input x, generates a response y based on your inputs, and you know your expected responses for your testing set y_, you may be able to print out every response to your testing set with something like:
correct_prediction = tf.equal(y, y_)  # Check whether your prediction is correct
print(sess.run(correct_prediction, feed_dict={x: test_images, y_: test_labels}))
This is just a modification of what is done in the tutorial, where instead of trying to print each response, they determine the percentage of correct responses. Also note that the tutorial uses one-hot vectors for the prediction y and actual value y_, so in order to return the associated numeral, they have to find which index of these vectors is equal to one with tf.argmax(y, 1).
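If you want the predicted digits themselves rather than a vector of booleans, a small variation along the same lines (a sketch, assuming the same x, y, y_, sess, test_images and test_labels) would be:
# Predicted and true class indices, recovered from the one-hot vectors with argmax.
predicted = tf.argmax(y, 1)
actual = tf.argmax(y_, 1)
pred_vals, actual_vals = sess.run([predicted, actual],
                                  feed_dict={x: test_images, y_: test_labels})
print(pred_vals[:10], actual_vals[:10])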
Edit
In general, if you define something in your graph, you can output it later when you run your graph. Say you define something that determines the result of the softmax function on your output logits as:
graph = tf.Graph()
with graph.as_default():
    ...
    prediction = tf.nn.softmax(logits)
    ...
then you can output this at run time with:
with tf.Session(graph=graph) as sess:
    ...
    feed_dict = { ... }  # define your feed dictionary
    pred = sess.run([prediction], feed_dict=feed_dict)
    # do stuff with your prediction vector
