I am currently trying to get familiar with the Tensorflow library and I have a rather fundamental question that bugs me.
While building a convolutional neural network for MNIST classification I tried to use my own model_fn. In which usually the following line occurs to reshape the input features.
x = tf.reshape(x, shape=[-1, 28, 28, 1]), with the -1 referring to the input batch size.
Since I use this node as input to my convolutional layer,
x = tf.reshape(x, shape=[-1, 28, 28, 1])
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
does this mean that the size all my networks layers are dependent on the batch size?
I tried freezing and running the graph on a single test input, which will only work if I provide n=batch_size test images.
Can you give me a hint on how to make my network run on any input batchsize while predicting?
Also I guess using the tf.reshape node (see first node in cnn_layout) in the network definition is not the best input for serving.
I will append my network layer-up and the model_fn
def cnn_layout(features,reuse,is_training):
with tf.variable_scope('cnn',reuse=reuse):
# resize input to [batchsize,height,width,channel]
x = tf.reshape(features['x'], shape=[-1,30,30,1], name='input_placeholder')
# conv1, 32 filter, 5 kernel
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu, name='conv1')
# pool1, 2 stride, 2 kernel
pool1 = tf.layers.max_pooling2d(conv1, 2, 2, name='pool1')
# conv2, 64 filter, 3 kernel
conv2 = tf.layers.conv2d(pool1, 64, 3, activation=tf.nn.relu, name='conv2')
# pool2, 2 stride, 2 kernel
pool2 = tf.layers.max_pooling2d(conv2, 2, 2, name='pool2')
# flatten pool2
flatten = tf.contrib.layers.flatten(pool2)
# fc1 with 1024 neurons
fc1 = tf.layers.dense(flatten, 1024, name='fc1')
# 75% dropout
drop = tf.layers.dropout(fc1, rate=0.75, training=is_training, name='dropout')
# output logits
output = tf.layers.dense(drop, 1, name='output_logits')
return output
def model_fn(features, labels, mode):
# setup two networks one for training one for prediction while sharing weights
logits_train = cnn_layout(features=features,reuse=False,is_training=True)
logits_test = cnn_layout(features=features,reuse=True,is_training=False)
# predictions
predictions = tf.round(tf.sigmoid(logits_test),name='predictions')
if mode == tf.estimator.ModeKeys.PREDICT:
return tf.estimator.EstimatorSpec(mode, predictions=predictions)
# define loss and optimizer
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits_train,labels=labels),name='loss')
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE, name='optimizer')
train = optimizer.minimize(loss, global_step=tf.train.get_global_step(),name='train')
# accuracy for evaluation
accuracy = tf.metrics.accuracy(labels=labels,predictions=predictions,name='accuracy')
# summarys for tensorboard
tf.summary.scalar('loss',loss)
# return training and evalution spec
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
loss=loss,
train_op=train,
eval_metric_ops={'accuracy':accuracy}
)
Thanks!
In the typical scenario, the rank of features['x'] is already going to be 4, with the outer dimension being the actual batch size, so there's no need to resize it.
Let me try to explain.
You haven't shown your serving_input_receiver_fn yet and there are several ways to do that, although in the end the principle is similar across them all. If you're using TensorFlow Serving, then you probably use build_parsing_serving_input_receiver_fn. It's informative to look at the source code:
def build_parsing_serving_input_receiver_fn(feature_spec,
default_batch_size=None):
serialized_tf_example = array_ops.placeholder(
dtype=dtypes.string,
shape=[default_batch_size],
name='input_example_tensor')
receiver_tensors = {'examples': serialized_tf_example}
features = parsing_ops.parse_example(serialized_tf_example, feature_spec)
return ServingInputReceiver(features, receiver_tensors)
So in your client, you're going to prepare a request that has one or more Examples in it (let's say the length is N). The server treats the serialized examples as a list of strings which get "fed" into the input_example_tensor placeholder. The shape (which is None) dynamically gets filled in to be the size of the list (N).
Then the parse_example op parses each item in the placeholder and out pops a Tensor for each feature whose outer dimension is N. In your case, you'll have x with shape=[N, 30, 30, 1].
(Note that other serving systems, such as CloudML Engine, do not operate on Example objects, but the principles are the same).
I just want to briefly provide my found solution. Since I did not want to build a scalable production grade model, but a simple model runner in python to execute my CNN locally.
To export the model I used,
input_size = 900
def serving_input_receiver_fn():
inputs = {"x": tf.placeholder(shape=[None, input_size], dtype=tf.float32)}
return tf.estimator.export.ServingInputReceiver(inputs, inputs)
model.export_savedmodel(
export_dir_base=model_dir,
serving_input_receiver_fn=serving_input_receiver_fn)
To load and run it (without needing the model definition again) I used the tensorflow predictor class.
from tensorflow.contrib import predictor
class TFRunner:
""" runs a frozen cnn graph """
def __init__(self,model_dir):
self.predictor = predictor.from_saved_model(model_dir)
def run(self, input_list):
""" runs the input list through the graph, returns output """
if len(input_list) > 1:
inputs = np.vstack(input_list)
predictions = self.predictor({"x": inputs})
elif len(input_list) == 1:
predictions = self.predictor({"x": input_list[0]})
else:
predictions = []
return predictions
Related
Hi I need to change the first convolution of a model from rgb/resnet_v1_50/conv1/weights:0 (float32_ref 7x7x3x64) to rgb/resnet_v1_50/conv1/weights:0 (float32_ref 7x7x4x64), so basicaly augmenting the number of filter form 3 to 4 to accept 4 channels images but keeping the pretrained weight elsewhere (just the additional channel initialize ramdonly).
Do you have an idea of how to do that in Tensorflow 1.x (I'm more of a PyTorch guy...) ?
In PyTorch I do:
net = model.resnet50(num_classes=dataset_train.num_classes(),pretrained=True)
new_conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2,padding=3,bias=False)
conv1 = net.conv1
with torch.no_grad():
new_conv1.weight[:, :3, :, :]= conv1.weight
new_conv1.bias = conv1.bias
net.conv1 = new_conv1
Here is how the model is created in tensorflow:
def single_stream(self, images, modality, is_training, reuse=False):
with tf.variable_scope(modality, reuse=reuse):
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
_, end_points = resnet_v1.resnet_v1_50(
images, self.no_classes, is_training=is_training, reuse=reuse)
# last bottleneck before logits
net = end_points[modality + '/resnet_v1_50/block4']
if 'autoencoder' in self.mode:
return net
with tf.variable_scope(modality + '/resnet_v1_50', reuse=reuse):
bottleneck = slim.conv2d(net, self.hidden_repr_size, [
7, 7], padding='VALID', activation_fn=tf.nn.relu, scope='f_repr')
net = slim.conv2d(bottleneck, self.no_classes, [
1, 1], activation_fn=None, scope='_logits_')
if ('train_hallucination' in self.mode or 'test_disc' in self.mode or 'train_eccv' in self.mode):
return net, bottleneck
return net
I am able with the command in the build_model: self.images = tf.placeholder(tf.float32, [None, 224, 224, 4], modality + '_images') to effectively change the 3 to a 4: rgb/resnet_v1_50/conv1/weights:0 (float32_ref 7x7x4x64) [12544, bytes: 50176] but the problem is thus now with the checkpoint!
Thanks a lot for your help!
As you do with Pytorch, you can do the same in Keras, which is now a module of TF2 (more info).
I'm gonna show you one possible way to do so:
net_conv1 = model.layers[2] # first 2D convolutional layer, from model.layers, or model.summary()
# your new set of weights must have same dimensions of the ouput of the layer
print( 'weights shape: ', numpy.shape(net_conv1.weights) )
print( net_conv1.weights[0].shape )
print( net_conv1.weights[1].shape )
# New weights
osh_0 = net_conv1.weights[0].shape.as_list()
osh_1 = net_conv1.weights[1].shape.as_list()
print(osh_0, osh_1)
new_conv1_w_0 = numpy.random.rand( *osh_0 )
new_conv1_w_1 = numpy.random.rand( *osh_1 )
# update the weights
net_conv1.set_weights([new_conv1_w_0, new_conv1_w_1])
# check the result
net_conv1.get_weights()
# update the model
model.layers[2] = net_conv1
Check the layers section of Keras doc.
Hope it will be helpful
I would like to extract and store the dropout mask [array of 1/0s] from a dropout layer in a Sequential Keras model at each batch while training. I was wondering if there was a straight forward way way to do this within Keras or if I would need to switch over to tensorflow (How to get the dropout mask in Tensorflow).
Would appreciate any help! I'm quite new to TensorFlow and Keras.
There are a couple of functions (dropout_layer.get_output_mask(), dropout_layer.get_input_mask()) for the dropout layer that I tried using but got None after calling on the previous layer.
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(name="flat", input_shape=(28, 28, 1)))
model.add(tf.keras.layers.Dense(
512,
activation='relu',
name = 'dense_1',
kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
bias_initializer='zeros'))
dropout = tf.keras.layers.Dropout(0.2, name = 'dropout') #want this layer's mask
model.add(dropout)
x = dropout.output_mask
y = dropout.input_mask
model.add(tf.keras.layers.Dense(
10,
activation='softmax',
name='dense_2',
kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
bias_initializer='zeros'))
model.compile(...)
model.fit(...)
It's not easily exposed in Keras. It goes deep until it calls the Tensorflow dropout.
So, although you're using Keras, it's will also be a tensor in the graph that can be gotten by name (finding it's name: In Tensorflow, get the names of all the Tensors in a graph).
This option, of course will lack some keras information, you should probably have to do that inside a Lambda layer so Keras adds certain information to the tensor. And you must take extra care because the tensor will exist even when not training (where the mask is skipped)
Now, you can also use a less hacky way, that may consume a little processing:
def getMask(x):
boolMask = tf.not_equal(x, 0)
floatMask = tf.cast(boolMask, tf.float32) #or tf.float64
return floatMask
Use a Lambda(getMasc)(output_of_dropout_layer)
But instead of using a Sequential model, you will need a functional API Model.
inputs = tf.keras.layers.Input((28, 28, 1))
outputs = tf.keras.layers.Flatten(name="flat")(inputs)
outputs = tf.keras.layers.Dense(
512,
# activation='relu', #relu will be a problem here
name = 'dense_1',
kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
bias_initializer='zeros')(outputs)
outputs = tf.keras.layers.Dropout(0.2, name = 'dropout')(outputs)
mask = Lambda(getMask)(outputs)
#there isn't "input_mask"
#add the missing relu:
outputs = tf.keras.layers.Activation('relu')(outputs)
outputs = tf.keras.layers.Dense(
10,
activation='softmax',
name='dense_2',
kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
bias_initializer='zeros')(outputs)
model = Model(inputs, outputs)
model.compile(...)
model.fit(...)
Training and predicting
Since you can't train the masks (it doesn't make any sense), it should not be an output of the model for training.
Now, we could try this:
trainingModel = Model(inputs, outputs)
predictingModel = Model(inputs, [output, mask])
But masks don't exist in prediction, because dropout is only applied in training. So this doesn't bring us anything good in the end.
The only way for training is then using a dummy loss and dummy targets:
def dummyLoss(y_true, y_pred):
return y_true #but this might evoke a "None" gradient problem since it's not trainable, there is no connection to any weights, etc.
model.compile(loss=[loss_for_main_output, dummyLoss], ....)
model.fit(x_train, [y_train, np.zeros((len(y_Train),) + mask_shape), ...)
It's not guaranteed that these will work.
I found a very hacky way to do this by trivially extending the provided dropout layer. (Almost all code from TF.)
class MyDR(tf.keras.layers.Layer):
def __init__(self,rate,**kwargs):
super(MyDR, self).__init__(**kwargs)
self.noise_shape = None
self.rate = rate
def _get_noise_shape(self,x, noise_shape=None):
# If noise_shape is none return immediately.
if noise_shape is None:
return array_ops.shape(x)
try:
# Best effort to figure out the intended shape.
# If not possible, let the op to handle it.
# In eager mode exception will show up.
noise_shape_ = tensor_shape.as_shape(noise_shape)
except (TypeError, ValueError):
return noise_shape
if x.shape.dims is not None and len(x.shape.dims) == len(noise_shape_.dims):
new_dims = []
for i, dim in enumerate(x.shape.dims):
if noise_shape_.dims[i].value is None and dim.value is not None:
new_dims.append(dim.value)
else:
new_dims.append(noise_shape_.dims[i].value)
return tensor_shape.TensorShape(new_dims)
return noise_shape
def build(self, input_shape):
self.noise_shape = input_shape
print(self.noise_shape)
super(MyDR,self).build(input_shape)
#tf.function
def call(self,input):
self.noise_shape = self._get_noise_shape(input)
random_tensor = tf.random.uniform(self.noise_shape, seed=1235, dtype=input.dtype)
keep_prob = 1 - self.rate
scale = 1 / keep_prob
# NOTE: if (1.0 + rate) - 1 is equal to rate, then we want to consider that
# float to be selected, hence we use a >= comparison.
self.keep_mask = random_tensor >= self.rate
#NOTE: here is where I save the binary masks.
#the file grows quite big!
tf.print(self.keep_mask,output_stream="file://temp/droput_mask.txt")
ret = input * scale * math_ops.cast(self.keep_mask, input.dtype)
return ret
I have used Keras and TensorFlow to classify the Fashion MNIST following this tutorial .
It uses the AdamOptimizer to find the value for model parameters that minimize the loss function of the network. The input for the network is a 2-D tensor with shape [28, 28], and output is a 1-D tensor with shape [10] which is the result of a softmax function.
Once the network has been trained, I want to use the optimizer for another task: find an input that maximizes one of the elements of the output tensor. How can this be done? Is it possible to do so using Keras or one have to use a lower level API?
Since the input is not unique for a given output, it would be even better if we could impose some constraints on the values the input can take.
The trained model has the following format
model = keras.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation=tf.nn.relu),
keras.layers.Dense(10, activation=tf.nn.softmax)
])
I feel you would want to backprop with respect to the input freezing all the weights to your model. What you could do is:
Add a dense layer after the input layer with the same dimensions as input and set it as trainable
Freeze all the other layers of your model. (except the one you added)
As an input, feed an identity matrix and train your model based on whatever output you desire.
This article and this post might be able to help you if you want to backprop based on the input instead. It's a bit like what you are aiming for but you can get the intuition.
It would be very similar to the way that filters of a Convolutional Network is visualized: we would do gradient ascent optimization in input space to maximize the response of a particular filter.
Here is how to do it: after training is finished, first we need to specify the output and define a loss function that we want to maximize:
from keras import backend as K
output_class = 0 # the index of the output class we want to maximize
output = model.layers[-1].output
loss = K.mean(output[:,output_class]) # get the average activation of our desired class over the batch
Next, we need to take the gradient of the loss we have defined above with respect to the input layer:
grads = K.gradients(loss, model.input)[0] # the output of `gradients` is a list, just take the first (and only) element
grads = K.l2_normalize(grads) # normalize the gradients to help having an smooth optimization process
Next, we need to define a backend function that takes the initial input image and gives the values of loss and gradients as outputs, so that we can use it in the next step to implement the optimization process:
func = K.function([model.input], [loss, grads])
Finally, we implement the gradient ascent optimization process:
import numpy as np
input_img = np.random.random((1, 28, 28)) # define an initial random image
lr = 1. # learning rate used for gradient updates
max_iter = 50 # number of gradient updates iterations
for i in range(max_iter):
loss_val, grads_val = func([input_img])
input_img += grads_val * lr # update the image based on gradients
Note that, after this process is finished, to display the image you may need to make sure that all the values in the image are in the range [0, 255] (or [0,1]).
After the hints Saket Kumar Singh gave in his answer, I wrote the following that seems to solve the question.
I create two custom layers. Maybe Keras offers already some classes that are equivalent to them.
The first on is a trainable input:
class MyInputLayer(keras.layers.Layer):
def __init__(self, output_dim, **kwargs):
self.output_dim = output_dim
super(MyInputLayer, self).__init__(**kwargs)
def build(self, input_shape):
self.kernel = self.add_weight(name='kernel',
shape=self.output_dim,
initializer='uniform',
trainable=True)
super(MyInputLayer, self).build(input_shape)
def call(self, x):
return self.kernel
def compute_output_shape(self, input_shape):
return self.output_dim
The second one gets the probability of the label of interest:
class MySelectionLayer(keras.layers.Layer):
def __init__(self, position, **kwargs):
self.position = position
self.output_dim = 1
super(MySelectionLayer, self).__init__(**kwargs)
def build(self, input_shape):
super(MySelectionLayer, self).build(input_shape)
def call(self, x):
mask = np.array([False]*x.shape[-1])
mask[self.position] = True
return tf.boolean_mask(x, mask,axis=1)
def compute_output_shape(self, input_shape):
return self.output_dim
I used them in this way:
# Build the model
layer_flatten = keras.layers.Flatten(input_shape=(28, 28))
layerDense1 = keras.layers.Dense(128, activation=tf.nn.relu)
layerDense2 = keras.layers.Dense(10, activation=tf.nn.softmax)
model = keras.Sequential([
layer_flatten,
layerDense1,
layerDense2
])
# Compile the model
model.compile(optimizer=tf.train.AdamOptimizer(),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train the model
# ...
# Freeze the model
layerDense1.trainable = False
layerDense2.trainable = False
# Build another model
class_index = 7
layerInput = MyInputLayer((1,784))
layerSelection = MySelectionLayer(class_index)
model_extended = keras.Sequential([
layerInput,
layerDense1,
layerDense2,
layerSelection
])
# Compile it
model_extended.compile(optimizer=tf.train.AdamOptimizer(),
loss='mean_absolute_error')
# Train it
dummyInput = np.ones((1,1))
target = np.ones((1,1))
model_extended.fit(dummyInput, target,epochs=300)
# Retrieve the weights of layerInput
layerInput.get_weights()[0]
Interesting. Maybe a solution would be to feed all your data to the network and for each sample save the output_layer after softmax.
This way, for 3 classes, where you want to find the best input for class 1, you are looking for outputs where the first component is high. For example: [1 0 0]
Indeed the output means the probability, or the confidence of the network, for the sample being one of the classes.
Funny coincident I was just working on the same "problem". I'm interested in the direction of adversarial training etc. What I did was to insert a LocallyConnected2D Layer after the input and then train with data which is all one and has as targets the class of interest.
As model I use
batch_size = 64
num_classes = 10
epochs = 20
input_shape = (28, 28, 1)
inp = tf.keras.layers.Input(shape=input_shape)
conv1 = tf.keras.layers.Conv2D(32, kernel_size=(3, 3),activation='relu',kernel_initializer='he_normal')(inp)
pool1 = tf.keras.layers.MaxPool2D((2, 2))(conv1)
drop1 = tf.keras.layers.Dropout(0.20)(pool1)
flat = tf.keras.layers.Flatten()(drop1)
fc1 = tf.keras.layers.Dense(128, activation='relu')(flat)
norm1 = tf.keras.layers.BatchNormalization()(fc1)
dropfc1 = tf.keras.layers.Dropout(0.25)(norm1)
out = tf.keras.layers.Dense(num_classes, activation='softmax')(dropfc1)
model = tf.keras.models.Model(inputs = inp , outputs = out)
model.compile(loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.keras.optimizers.RMSprop(),
metrics=['accuracy'])
model.summary()
after training I insert the new layer
def insert_intermediate_layer_in_keras(model,position, before_layer_id):
layers = [l for l in model.layers]
if(before_layer_id==0) :
x = new_layer
else:
x = layers[0].output
for i in range(1, len(layers)):
if i == before_layer_id:
x = new_layer(x)
x = layers[i](x)
else:
x = layers[i](x)
new_model = tf.keras.models.Model(inputs=layers[0].input, outputs=x)
return new_model
def fix_model(model):
for l in model.layers:
l.trainable=False
fix_model(model)
new_layer = tf.keras.layers.LocallyConnected2D(1, kernel_size=(1, 1),
activation='linear',
kernel_initializer='he_normal',
use_bias=False)
new_model = insert_intermediate_layer_in_keras(model,new_layer,1)
new_model.compile(loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.keras.optimizers.RMSprop(),
metrics=['accuracy'])
and finally rerun training with my fake data.
X_fake = np.ones((60000,28,28,1))
print(Y_test.shape)
y_fake = np.ones((60000))
Y_fake = tf.keras.utils.to_categorical(y_fake, num_classes)
new_model.fit(X_fake, Y_fake, epochs=100)
weights = new_layer.get_weights()[0]
imshow(weights.reshape(28,28))
plt.show()
Results are not yet satisfying but I'm confident of the approach and guess I need to play around with the optimiser.
I have a problem with using diffrent dataset then default from tensorflow.
I have code using MNIST dataset to recognize digits. In this application there is generated graph, which is imported later by android app.
Now I would like to recognize digits and math's operators (basic one: +, -, *, /).
I found script to generate data I need. I have two .pickle files.
But even with the dataset which suits for me, still I don't know how to import this dataset to my app with tensorflow.
I would be grateful for help with this or maybe to give me other (maybe easier) solution.
EDIT
I did some changes in the code which were adviced by gabriele.
Now I have error:
(x, label) = train_pickle_reader('train.pickle')
ValueError: too many values to unpack (expected 2)
I found the description of the dataset I used:
Extracts trace groups from inkml files.
Converts extracted trace groups into images. Images are square shaped bitmaps with only black (value 0) and white (value 1) pixels. Black color denotes patterns (ROI).
Labels those images (according to inkml files).
Flattens images to one-dimensional vectors.
Converts labels to one-hot format.
Dumps training and testing sets separately into outputs folder.
Below there is code in python:
import tensorflow as tf
import pickle
def train_pickle_reader(filename):
with open(filename, 'rb') as f:
x = pickle.load(f)
# assuming x is already of the form (all_train_input, all_train_labels):
return x
def test_pickle_reader(filename):
with open(filename, 'rb') as f:
x = pickle.load(f)
# assuming x is already of the form (all_train_input, all_train_labels):
return x
# Function to create a weight neuron using a random number. Training will assign a real weight later
def weight_variable(shape, name):
initial = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial, name=name)
# Function to create a bias neuron. Bias of 0.1 will help to prevent any 1 neuron from being chosen too often
def biases_variable(shape, name):
initial = tf.constant(0.1, shape=shape)
return tf.Variable(initial, name=name)
# Function to create a convolutional neuron. Convolutes input from 4d to 2d. This helps streamline inputs
def conv_2d(x, W, name):
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME', name=name)
# Function to create a neuron to represent the max input. Helps to make the best prediction for what comes next
def max_pool(x, name):
return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)
# A way to input images (as 784 element arrays of pixel values 0 - 1)
x_input = tf.placeholder(dtype=tf.float32, shape=[None, 784], name='x_input')
# A way to input labels to show model what the correct answer is during training
y_input = tf.placeholder(dtype=tf.float32, shape=[None, 10], name='y_input')
# First convolutional layer - reshape/resize images
# A weight variable that examines batches of 5x5 pixels, returns 32 features (1 feature per bit value in 32 bit float)
W_conv1 = weight_variable([5, 5, 1, 32], 'W_conv1')
# Bias variable to add to each of the 32 features
b_conv1 = biases_variable([32], 'b_conv1')
# Reshape each input image into a 28 x 28 x 1 pixel matrix
x_image = tf.reshape(x_input, [-1, 28, 28, 1], name='x_image')
# Flattens filter (W_conv1) to [5 * 5 * 1, 32], multiplies by [None, 28, 28, 1] to associate each 5x5 batch with the
# 32 features, and adds biases
h_conv1 = tf.nn.relu(conv_2d(x_image, W_conv1, name='conv1') + b_conv1, name='h_conv1')
# Takes windows of size 2x2 and computes a reduction on the output of h_conv1 (computes max, used for better prediction)
# Images are reduced to size 14 x 14 for analysis
h_pool1 = max_pool(h_conv1, name='h_pool1')
# Second convolutional layer, reshape/resize images
# Does mostly the same as above but converts each 32 unit output tensor from layer 1 to a 64 feature tensor
W_conv2 = weight_variable([5, 5, 32, 64], 'W_conv2')
b_conv2 = biases_variable([64], 'b_conv2')
h_conv2 = tf.nn.relu(conv_2d(h_pool1, W_conv2, name='conv2') + b_conv2, name='h_conv2')
# Images at this point are reduced to size 7 x 7 for analysis
h_pool2 = max_pool(h_conv2, name='h_pool2')
# First dense layer, performing calculation based on previous layer output
# Each image is 7 x 7 at the end of the previous section and outputs 64 features, we want 32 x 32 neurons = 1024
W_dense1 = weight_variable([7 * 7 * 64, 1024], name='W_dense1')
# bias variable added to each output feature
b_dense1 = biases_variable([1024], name='b_dense1')
# Flatten each of the images into size [None, 7 x 7 x 64]
h_pool_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64], name='h_pool_flat')
# Multiply weights by the outputs of the flatten neuron and add biases
h_dense1 = tf.nn.relu(tf.matmul(h_pool_flat, W_dense1, name='matmul_dense1') + b_dense1, name='h_dense1')
# Dropout layer prevents overfitting or recognizing patterns where none exist
# Depending on what value we enter into keep_prob, it will apply or not apply dropout layer
keep_prob = tf.placeholder(dtype=tf.float32, name='keep_prob')
# Dropout layer will be applied during training but not testing or predicting
h_drop1 = tf.nn.dropout(h_dense1, keep_prob, name='h_drop1')
# Readout layer used to format output
# Weight variable takes inputs from each of the 1024 neurons from before and outputs an array of 10 elements
W_readout1 = weight_variable([1024, 10], name='W_readout1')
# Apply bias to each of the 10 outputs
b_readout1 = biases_variable([10], name='b_readout1')
# Perform final calculation by multiplying each of the neurons from dropout layer by weights and adding biases
y_readout1 = tf.add(tf.matmul(h_drop1, W_readout1, name='matmul_readout1'), b_readout1, name='y_readout1')
# Softmax cross entropy loss function compares expected answers (labels) vs actual answers (logits)
cross_entropy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_input, logits=y_readout1))
# Adam optimizer aims to minimize loss
train_step = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy_loss)
# Compare actual vs expected outputs to see if highest number is at the same index, true if they match and false if not
correct_prediction = tf.equal(tf.argmax(y_input, 1), tf.argmax(y_readout1, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Used to save the graph and weights
saver = tf.train.Saver()
# Run in with statement so session only exists within it
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
# Save the graph shape and node names to pbtxt file
tf.train.write_graph(sess.graph_def, '.', 'advanced_mnist.pbtxt', False)
(x, label) = train_pickle_reader('train.pickle')
batch_size = 64 # the batch size you want to use
num_batches = len(x)//batch_size
# Train the model, running through data 20000 times in batches of 50
# Print out step # and accuracy every 100 steps and final accuracy at the end of training
# Train by running train_step and apply dropout by setting keep_prob to 0.5
for i in range(20000):
for j in range(num_batches):
x_batch = x[j * batch_size: (j + 1) * batch_size]
label_batch = label[j * batch_size: (j + 1)*batch_size]
train_step.run(feed_dict={x_input: x_batch, y_input: label_batch, keep_prob: 0.5})
# Save the session with graph shape and node weights
saver.save(sess, 'advanced_mnist.ckpt')
# Make a prediction
(x, labels) = test_pickle_reader('test.pickle')
print(sess.run(y_readout1, feed_dict={x_input: x, keep_prob: 1.0}))
In your code, after instantiating a tf.Session(), the line batch = mnist_data.train.next_batch(50) calls a built in function which returns a tuple of the kind (input, label). In order to feed the network with your data, here you need to define some function returning i.e. a numpy array having the input data and the associated label. For example, assuming you have a pickle file containing your training data, your code should look something like:
def pikle_reader(filename):
with open(filename, 'r') as f:
x = pickle.load(f)
# assuming x is already of the form (all_train_input, all_train_labels):
return x
[...]
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
[...]
# get your data:
(x, label) = pikle_reader(filename)
batch_size = 64 # the batch size you want to use
num_batches = len(x)//batch_size
for i in range(20000): # number of epochs
for j in range(num_batches):
x_batch = x[j*batch_size: (j+1)*batch_size]
label_batch = label[j* batch_size: (j+1)batch_size]
train_step.run(feed_dict={x_input: x_batch, y_input: label_batch, keep_prob: 0.5})
Here, feed_dict feeds the placeholders x_input with the values in x_batch and the placeholder y_input with label_batch. Then in the session the code will run the train_step operation.
Instead, when you want to make a prediction the code is basically the same:
(x, label) = pikle_reader(test_data_filename)
print(sess.run(y_readout1, feed_dict={x_input: x, keep_prob: 1.0}))
Given a trained LSTM model I want to perform inference for single timesteps, i.e. seq_length = 1 in the example below. After each timestep the internal LSTM (memory and hidden) states need to be remembered for the next 'batch'. For the very beginning of the inference the internal LSTM states init_c, init_h are computed given the input. These are then stored in a LSTMStateTuple object which is passed to the LSTM. During training this state is updated every timestep. However for inference I want the state to be saved in between batches, i.e. the initial states only need to be computed at the very beginning and after that the LSTM states should be saved after each 'batch' (n=1).
I found this related StackOverflow question: Tensorflow, best way to save state in RNNs?. However this only works if state_is_tuple=False, but this behavior is soon to be deprecated by TensorFlow (see rnn_cell.py). Keras seems to have a nice wrapper to make stateful LSTMs possible but I don't know the best way to achieve this in TensorFlow. This issue on the TensorFlow GitHub is also related to my question: https://github.com/tensorflow/tensorflow/issues/2838
Anyone good suggestions for building a stateful LSTM model?
inputs = tf.placeholder(tf.float32, shape=[None, seq_length, 84, 84], name="inputs")
targets = tf.placeholder(tf.float32, shape=[None, seq_length], name="targets")
num_lstm_layers = 2
with tf.variable_scope("LSTM") as scope:
lstm_cell = tf.nn.rnn_cell.LSTMCell(512, initializer=initializer, state_is_tuple=True)
self.lstm = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * num_lstm_layers, state_is_tuple=True)
init_c = # compute initial LSTM memory state using contents in placeholder 'inputs'
init_h = # compute initial LSTM hidden state using contents in placeholder 'inputs'
self.state = [tf.nn.rnn_cell.LSTMStateTuple(init_c, init_h)] * num_lstm_layers
outputs = []
for step in range(seq_length):
if step != 0:
scope.reuse_variables()
# CNN features, as input for LSTM
x_t = # ...
# LSTM step through time
output, self.state = self.lstm(x_t, self.state)
outputs.append(output)
I found out it was easiest to save the whole state for all layers in a placeholder.
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
Then unpack it and create a tuple of LSTMStateTuples before using the native tensorflow RNN Api.
l = tf.unpack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
[tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
for idx in range(num_layers)]
)
RNN passes in the API:
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input_batch, initial_state=rnn_tuple_state)
The state - variable will then be feeded to the next batch as a placeholder.
Tensorflow, best way to save state in RNNs? was actually my original question. The code bellow is how I use the state tuples.
with tf.variable_scope('decoder') as scope:
rnn_cell = tf.nn.rnn_cell.MultiRNNCell \
([
tf.nn.rnn_cell.LSTMCell(512, num_proj = 256, state_is_tuple = True),
tf.nn.rnn_cell.LSTMCell(512, num_proj = WORD_VEC_SIZE, state_is_tuple = True)
], state_is_tuple = True)
state = [[tf.zeros((BATCH_SIZE, sz)) for sz in sz_outer] for sz_outer in rnn_cell.state_size]
for t in range(TIME_STEPS):
if t:
last = y_[t - 1] if TRAINING else y[t - 1]
else:
last = tf.zeros((BATCH_SIZE, WORD_VEC_SIZE))
y[t] = tf.concat(1, (y[t], last))
y[t], state = rnn_cell(y[t], state)
scope.reuse_variables()
Rather than using tf.nn.rnn_cell.LSTMStateTuple I just create a lists of lists which works fine. In this example I am not saving the state. However you could easily have made state out of variables and just used assign to save the values.