conv2d_transpose is dependent on batch_size when making predictions - python

I have a neural network currently implemented in TensorFlow, but I am having a problem making predictions after training, because I have conv2d_transpose operations whose output shapes depend on the batch size. I have a layer that requires output_shape as an argument:
def deconvLayer(input, filter_shape, output_shape, strides):
    W1_1 = weight_variable(filter_shape)
    output = tf.nn.conv2d_transpose(input, W1_1, output_shape, strides, padding="SAME")
    return output
That is actually used in a larger model I have constructed like the following:
conv3 = layers.convLayer(conv2['layer_output'], [3, 3, 64, 128], use_pool=False)
conv4 = layers.deconvLayer(conv3['layer_output'],
                           filter_shape=[2, 2, 64, 128],
                           output_shape=[batch_size, 32, 40, 64],
                           strides=[1, 2, 2, 1])
The problem is, if I go to make a prediction using the trained model, my test data has to have the same batch size, or else I get the following error.
tensorflow.python.framework.errors.InvalidArgumentError: Conv2DBackpropInput: input and out_backprop must have the same batch size
Is there some way that I can get a prediction for an input with variable batch size? When I look at the trained weights, nothing seems to depend on batch size, so I can't see why this would be a problem.

So I came across a solution based on this TensorFlow issue: https://github.com/tensorflow/tensorflow/issues/833.
In my code
conv4 = layers.deconvLayer(conv3['layer_output'],
                           filter_shape=[2, 2, 64, 128],
                           output_shape=[batch_size, 32, 40, 64],
                           strides=[1, 2, 2, 1])
the output shape that gets passed to deconvLayer was hard-coded with a predetermined batch size during training. By altering this to the following:
def deconvLayer(input, filter_shape, output_shape, strides):
    W1_1 = weight_variable(filter_shape)
    dyn_input_shape = tf.shape(input)
    batch_size = dyn_input_shape[0]
    output_shape = tf.pack([batch_size, output_shape[1], output_shape[2], output_shape[3]])
    output = tf.nn.conv2d_transpose(input, W1_1, output_shape, strides, padding="SAME")
    return output
This allows the shape to be dynamically inferred at run time and can handle a variable batch size.
Running the code, I no longer receive this error when passing in test data of any batch size. I believe this is necessary because shape inference for transpose ops is currently not as straightforward as it is for normal convolutional ops. So where we would usually use None for the batch_size in normal convolutional ops, conv2d_transpose requires an explicit output shape, and since this can vary with the input, we must determine it dynamically at run time.
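For reference, a minimal usage sketch of the dynamic-shape version (the placeholder name, input size, and weight_variable helper are illustrative, not from the original post; note that tf.pack was later renamed tf.stack in TensorFlow 1.0):
import numpy as np
import tensorflow as tf

def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

x = tf.placeholder(tf.float32, [None, 16, 20, 128])      # batch dimension left unspecified
out = deconvLayer(x, filter_shape=[2, 2, 64, 128],
                  output_shape=[-1, 32, 40, 64],          # first entry is ignored and replaced dynamically
                  strides=[1, 2, 2, 1])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for bs in (1, 7, 32):
        print(sess.run(out, {x: np.zeros((bs, 16, 20, 128), np.float32)}).shape)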

Related

Tensorflow: double-batch data to use triplets: (batch_size, 3, 256, 256, 1)

Here is my setup:
I have an autoencoder model which generates a new grayscale image (1, 256, 256, 1) by mixing three input images (3, 256, 256, 1). This works quite well; however, I gave up the batch size, so in every training step the gradient is calculated on one data chunk instead of a whole batch.
To train on batches, I wrote a custom data loader with tf.Sequence to get datasets of dimension (bs, 3, 256, 256, 1).
Further, I want to train the autoencoder with a discriminator, so I built one and created a "GAN-based-model", to alternately train both. Here is the code for it:
full_model = GanBasedModel(
    autoencoder.input, discriminator(autoencoder(autoencoder.input)))
Here, in my GAN-based model, I customized the train_step function like this:
@tf.function
def train_step(self, train_data):
    generated = []
    real_images = []
    for train_input in train_data:
        generated.append(autoencoder(train_input))
        # some code to get real_images
    generated_images = tf.stack(generated)
    # some more code
So I got this error InaccessibleTensorError: tf.Graph captured an external symbolic tensor. The symbolic tensor <tf.Tensor 'while/sequential/decoder/residual_block_16/StatefulPartitionedCall:0' shape=(1, 256, 256, 1) dtype=float32> is captured by FuncGraph(name=train_step, id=140089464958688), but it is defined at FuncGraph(name=while_body_12507, id=140089463711200). A tf.Graph is not allowed to capture symoblic tensors from another graph. Use return values, explicit Python locals or TensorFlow collections to access it. Please see https://www.tensorflow.org/guide/function#all_outputs_of_a_tffunction_must_be_return_values for more information.
from line generated_images = tf.stack(generated).
As far as I understand, splitting train_data in the for loop creates new tensors train_input that TensorFlow can no longer trace across graphs (a sketch of one commonly suggested workaround follows this question).
So is there a better way to write the train_step function?
Or are there even better approaches to create a Dataloader which provides batches of triples for my autoencoder?
Thanks for any help
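One direction that is often suggested for this particular error, sketched below as an assumption rather than a verified fix (it reuses autoencoder from above and assumes it accepts a single (3, 256, 256, 1) triple, as in the original loop): accumulate the per-example outputs in a tf.TensorArray instead of a Python list, so the loop-carried values stay inside the traced while-loop instead of being captured by the outer graph.
@tf.function
def train_step(self, train_data):
    n = tf.shape(train_data)[0]
    generated = tf.TensorArray(tf.float32, size=n)
    for i in tf.range(n):
        # write() returns the updated TensorArray, so reassign it on every iteration
        generated = generated.write(i, autoencoder(train_data[i]))
    generated_images = generated.stack()   # shape (bs, 1, 256, 256, 1)
    # ... rest of the original train_step ...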

Convolution Neural Net for Stock Market Prediction, Regression

I am working on a stock market prediction project using sentiment analysis. I am trying to create a CNN model where I pass in 4000 days of stock data with a batch size of 100. At the end of the dense layer, I want to add a regression layer to get the price of the stock.
def Model(train_data):
    input_layer = tf.reshape(tf.cast(train_data, tf.float32), [-1, 1, 100, 2])
    conv1 = tf.layers.conv2d(inputs=input_layer, filters=32, kernel_size=[1, 5], padding="same",
                             activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[1, 2], strides=[1, 2])
    conv2 = tf.layers.conv2d(inputs=pool1, filters=8, kernel_size=[1, 5], padding="same", activation=tf.nn.relu,
                             strides=1, kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[1, 5], strides=[1, 5])
    conv3 = tf.layers.conv2d(inputs=pool2, filters=2, kernel_size=[1, 2], padding="same", activation=tf.nn.relu,
                             strides=1, kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool3 = tf.layers.max_pooling2d(inputs=conv3, pool_size=[1, 2], strides=[1, 2])
    pool3_flat = tf.reshape(pool3, [40, 1 * 5 * 2])
    dense = tf.layers.dense(inputs=pool3_flat, units=5, activation=tf.nn.relu)
    dropout = tf.layers.dropout(
        inputs=dense, rate=0.2, training=mode == tf.estimator.ModeKeys.TRAIN)
    logits = tf.layers.dense(inputs=dropout, units=1)
I am referring to https://www.tensorflow.org/tutorials/estimators/cnn for the model, but they are doing classification there. Can anybody suggest an approach for regression? The train_data for the model has a shape of [2, 4000], where one row is the normalized stock prices and the other is the sentiment factor.
The only thing you would have to do is add a fully connected layer at the very end and select a linear activation. Intuitively, this takes the outputs of your conv layers and applies y = mx + b to them. Your fully connected output layer would have 40 nodes (one for each output). In fact, you already have a dense layer in that code; if your output is of size 40, just make it 40 units instead of 5.
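A minimal sketch of that kind of regression head, assuming dropout is the layer from the question's code and labels is a hypothetical tensor holding the target prices (use units=1 for a single predicted price, or 40 as suggested above):
predictions = tf.layers.dense(inputs=dropout, units=1, activation=None)  # linear activation
loss = tf.losses.mean_squared_error(labels=labels, predictions=predictions)
train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)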
Just a side note: traditionally, CNNs are used for image classification, and only recently have they started migrating to other applications (such as spam detection). I would advise trying a simple feed-forward neural network first, and if that does not work, perhaps try an RNN before this.

Tensorflow taking data size ( shape ) as dynamic and thus causing errors during node densing

I am training a model whose feature shape is [3751, 4], and I'd like to use the reshape and dense layer functions built into TensorFlow to make the output labels have the shape [1, 6].
The training and testing sets are very similar; the only difference is that the testing set has fewer batches than the training set.
My model has two hidden layers that do something like this:
input_layer = tf.reshape(features["x"], [1,-1])
first_hidden_layer = tf.layers.dense(input_layer, 4, activation=tf.nn.relu)
second_hidden_layer = tf.layers.dense(first_hidden_layer, 5, activation=tf.nn.relu)
output_layer = tf.layers.dense(second_hidden_layer, 6, activation=tf.nn.relu)
This network structure is a function that both the training and evaluation phases use.
Partial code for training:
nn = tf.estimator.Estimator(model_fn=model_fn, params=model_params, model_dir='/tmp/nmos_self_define')
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_features_numpy},
    y=train_labels_numpy,
    batch_size=3751,
    num_epochs=None,
    shuffle=False)
# Train
nn.train(input_fn=train_input_fn, max_steps=5000)
And the testing part:
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": test_features_numpy},
    y=test_labels_numpy,
    batch_size=3751,
    num_epochs=1,
    shuffle=False)
ev = nn.evaluate(input_fn=test_input_fn)
print("Loss: %s" % ev["loss"])
print("Root Mean Squared Error: %s" % ev["rmse"])
During training there is no problem; the function can reshape the input data and apply the dense layers. During testing, however, the output of the reshape has a shape like [1, ?], which is different from the training phase ([1, 15004]). This causes the tf.layers.dense calls to fail, because they cannot build the dense layer without knowing the actual shape of the tensor.
From my perspective, the only difference between training and testing is num_epochs, but that shouldn't affect the input shape, right? I don't understand why TensorFlow can reshape the tensor to concrete values during training while it treats the testing input as dynamic.
Please help, and thanks for taking the time to read my question.
What you are doing is flattening multiple batches into a single feature vector of size 15004. What you most probably want is to reduce your features to a 2-D tensor of shape (batches, num_features), where the batch dimension is dynamic. There are two common ways to do this. The easiest is to use the flatten layer from tf.contrib, like this:
input_layer = tf.contrib.layers.flatten(features["x"])
or you can reshape in such a way that the batch dimension is still dynamic, but then you have to calculate the shape of your input like this:
num_dimensions = features["x"].shape.as_list()[1] * features["x"].shape.as_list()[2] ...
input_layer = tf.reshape(features["x"], [-1, num_dimensions])
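For this question's shapes, a hedged sketch of what the flatten approach produces, assuming features["x"] arrives as a [None, 3751, 4] tensor:
x = features["x"]                                          # assumed shape: (?, 3751, 4)
flat = tf.contrib.layers.flatten(x)                        # shape: (?, 15004), batch dimension stays dynamic
first_hidden_layer = tf.layers.dense(flat, 4, activation=tf.nn.relu)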

tf.nn.conv2d vs tf.layers.conv2d

Is there any advantage in using tf.nn.* over tf.layers.*?
Most of the examples in the doc use tf.nn.conv2d, for instance, but it is not clear why they do so.
As GBY mentioned, they use the same implementation.
There is a slight difference in the parameters.
For tf.nn.conv2d:
filter: A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]
For tf.layers.conv2d:
filters: Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).
I would use tf.nn.conv2d when loading a pretrained model (example code: https://github.com/ry/tensorflow-vgg16), and tf.layers.conv2d for a model trained from scratch.
For convolution, they are the same. More precisely, tf.layers.conv2d (actually _Conv) uses tf.nn.convolution as the backend. You can follow the calling chain: tf.layers.conv2d > Conv2D > Conv2D.apply() > _Conv > _Conv.apply() > _Layer.apply() > _Layer.__call__() > _Conv.call() > nn.convolution()...
As others mentioned, the parameters are different, especially the "filter(s)". tf.nn.conv2d takes a tensor as the filter, which means you can specify the weight decay (or maybe other properties) like the following from the cifar10 example code. (Whether you want/need weight decay in a conv layer is another question.)
kernel = _variable_with_weight_decay('weights',
                                     shape=[5, 5, 3, 64],
                                     stddev=5e-2,
                                     wd=0.0)
conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
I'm not quite sure how to set weight decay in tf.layers.conv2d, since it only takes an integer as filters. Maybe using kernel_constraint?
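For what it is worth, a hedged sketch of how weight decay is commonly attached in the tf.layers API, via kernel_regularizer rather than a constraint (the scale value and tensor names are illustrative):
conv = tf.layers.conv2d(inputs=images, filters=64, kernel_size=[5, 5], padding='same',
                        kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=0.0005))
reg_loss = tf.losses.get_regularization_loss()  # add this term to the training loss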
On the other hand, tf.layers.conv2d handles activation and bias automatically, while you have to write additional code for these if you use tf.nn.conv2d.
All of these other replies talk about how the parameters are different, but actually, the main difference between the tf.nn and tf.layers conv2d is that for tf.nn you need to create your own filter tensor and pass it in. This filter needs to have the size [kernel_height, kernel_width, in_channels, num_filters].
Essentially, tf.nn is lower level than tf.layers. Unfortunately, this answer is not applicable anymore, as tf.layers is obsolete.
DIFFERENCES IN PARAMETERS:
Using tf.layers.* in code:
# Convolution Layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
Using tf.nn.* in code:
(Notice we need to pass weights and biases as additional parameters.)
strides = 1
# Weights tensor shaped [kernel_size(=5), kernel_size(=5), input_channels(=3), filters(=32)]
weights = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1))
# Bias vector shaped [filters(=32)]
bias = tf.Variable(tf.zeros([32]))
out = tf.nn.conv2d(input, weights, padding="SAME", strides=[1, strides, strides, 1])
out = tf.nn.bias_add(out, bias)
out = tf.nn.relu(out)
Take a look here: tensorflow > tf.layers.conv2d
and here: tensorflow > conv2d
As you can see the arguments to the layers version are:
tf.layers.conv2d(inputs, filters, kernel_size, strides=(1, 1), padding='valid', data_format='channels_last', dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer=None, bias_initializer=tf.zeros_initializer(), kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, trainable=True, name=None, reuse=None)
and the nn version:
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
I think you can choose the one with the options you want/need/like!

Finding gradient of a Caffe conv-filter with regards to input

I need to find the gradient with regards to the input layer for a single convolutional filter in a convolutional neural network (CNN) as a way to visualize the filters.
Given a trained network in the Python interface of Caffe such as the one in this example, how can I then find the gradient of a conv-filter with respect to the data in the input layer?
Edit:
Based on the answer by cesans, I added the code below. The dimensions of my input layer are [8, 8, 7, 96]. My first conv layer, conv1, has 11 filters with a size of 1x5, resulting in the dimensions [8, 11, 7, 92].
net = solver.net
diffs = net.backward(diffs=['data', 'conv1'])
print diffs.keys() # >> ['conv1', 'data']
print diffs['data'].shape # >> (8, 8, 7, 96)
print diffs['conv1'].shape # >> (8, 11, 7, 92)
As you can see from the output, the dimensions of the arrays returned by net.backward() are equal to the dimensions of my layers in Caffe. After some testing I've found that this output contains the gradients of the loss with respect to the data layer and the conv1 layer, respectively.
However, my question was how to find the gradient of a single conv-filter with respect to the data in the input layer, which is something else. How can I achieve this?
A Caffe net juggles two "streams" of numbers.
The first is the data "stream": images and labels pushed through the net. As these inputs progress through the net, they are converted into high-level representations and eventually into class probability vectors (in classification tasks).
The second "stream" holds the parameters of the different layers: the weights of the convolutions, the biases, etc. These numbers/weights are changed and learned during the training phase of the net.
Despite the fundamentally different roles these two "streams" play, Caffe nonetheless uses the same data structure, the blob, to store and manage them.
However, for each layer there are two different blob vectors, one for each stream.
Here's an example that I hope would clarify:
import caffe
solver = caffe.SGDSolver( PATH_TO_SOLVER_PROTOTXT )
net = solver.net
If you now look at
net.blobs
You will see a dictionary storing a "caffe blob" object for each layer in the net. Each blob has storage room for both data and gradients:
net.blobs['data'].data.shape # >> (32, 3, 224, 224)
net.blobs['data'].diff.shape # >> (32, 3, 224, 224)
And for a convolutional layer:
net.blobs['conv1/7x7_s2'].data.shape # >> (32, 64, 112, 112)
net.blobs['conv1/7x7_s2'].diff.shape # >> (32, 64, 112, 112)
net.blobs holds the first (data) stream; its shapes match those of the input images, all the way up to the resulting class probability vector.
On the other hand, you can see another member of net
net.layers
This is a caffe vector storing the parameters of the different layers.
Looking at the first layer ('data' layer):
len(net.layers[0].blobs) # >> 0
There are no parameters to store for an input layer.
On the other hand, for the first convolutional layer
len(net.layers[1].blobs) # >> 2
The net stores one blob for the filter weights and another for the constant bias. Here they are
net.layers[1].blobs[0].data.shape # >> (64, 3, 7, 7)
net.layers[1].blobs[1].data.shape # >> (64,)
As you can see, this layer performs 7x7 convolutions on a 3-channel input image and has 64 such filters.
Now, how do you get the gradients? Well, as you noted,
diffs = net.backward(diffs=['data','conv1/7x7_s2'])
Returns the gradients of the data stream. We can verify this by
np.all( diffs['data'] == net.blobs['data'].diff ) # >> True
np.all( diffs['conv1/7x7_s2'] == net.blobs['conv1/7x7_s2'].diff ) # >> True
(TL;DR) You want the gradients of the parameters; these are stored in net.layers alongside the parameters:
net.layers[1].blobs[0].diff.shape # >> (64, 3, 7, 7)
net.layers[1].blobs[1].diff.shape # >> (64,)
To help you map between the names of the layers and their indices in the net.layers vector, you can use net._layer_names.
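For example, a small sketch using the conv layer name from the example above:
idx = list(net._layer_names).index('conv1/7x7_s2')   # map the layer name to its index in net.layers
w_grad = net.layers[idx].blobs[0].diff               # gradient w.r.t. the filter weights, shape (64, 3, 7, 7)
b_grad = net.layers[idx].blobs[1].diff               # gradient w.r.t. the biases, shape (64,)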
Update regarding the use of gradients to visualize filter responses:
A gradient is normally defined for a scalar function. The loss is a scalar, and therefore you can speak of the gradient of the scalar loss with respect to a pixel/filter weight. This gradient is a single number per pixel/filter weight.
If you want to get the input that results in maximal activation of a specific internal hidden node, you need an "auxiliary" net whose loss is exactly a measure of the activation of the specific hidden node you want to visualize. Once you have this auxiliary net, you can start from an arbitrary input and change this input based on the gradients of the auxiliary loss with respect to the input layer:
update = prev_in + lr * net.blobs['data'].diff
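A hedged sketch of that ascent loop; it assumes the auxiliary net described above, with its activation-based loss already at the top and force_backward enabled so the gradient reaches the data blob (the learning rate and iteration count are illustrative):
import numpy as np

data = np.random.randn(*net.blobs['data'].data.shape).astype(np.float32)
lr = 1.0
for _ in range(200):
    net.blobs['data'].data[...] = data
    net.forward()
    net.backward()                        # backprop the auxiliary loss down to the data blob
    data += lr * net.blobs['data'].diff   # gradient-ascent step on the input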
You can get the gradients with respect to any layer when you run the backward() pass. Just specify the list of layers when calling the function. To show the gradients with respect to the data layer:
net.forward()
diffs = net.backward(diffs=['data', 'conv1'])
data_point = 16
plt.imshow(diffs['data'][data_point].squeeze())
In some cases you may want to force all layers to carry out the backward pass; look at the force_backward parameter of the model.
https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto
