Pytorch LSTM grad only on last output - python

I'm working with sequences of different lengths, but I only want to backpropagate based on the output computed at the end of each sequence.
The samples are ordered so that they are decreasing in length and they are zero-padded. For 5 1D samples it looks like this (omitting width dimension for visibility):
array([[5, 7, 7, 4, 5, 8, 6, 9, 7, 9],
[6, 4, 2, 2, 6, 5, 4, 2, 2, 0],
[4, 6, 2, 4, 5, 1, 3, 1, 0, 0],
[8, 8, 3, 7, 7, 7, 9, 0, 0, 0],
[3, 2, 7, 5, 7, 0, 0, 0, 0, 0]])
For the LSTM I'm using nn.utils.rnn.pack_padded_sequence with the individual sequence lengths:
x = nn.utils.rnn.pack_padded_sequence(x, [10, 9, 8, 7, 5], batch_first=True)
The initialization of LSTM in the Model constructor:
self.lstm = nn.LSTM(width, n_hidden, 2)
Then I call the LSTM and unpack the values:
x, _ = self.lstm(x)
x, _ = nn.utils.rnn.pad_packed_sequence(x, batch_first=True)
Then I'm applying a fully connected layer and a softmax
x = x.contiguous()
x = x.view(-1, n_hidden)
x = self.linear(x)
x = x.reshape(batch_size, n_labels, 10) # 10 is the sample height
return F.softmax(x, dim=1)
This gives me an output of shape batch x n_labels x height (5x12x10).
For each sample, I only want to use a single score, the one for the last output, which would give a result of shape batch x n_labels (5x12). How can I achieve this?
One idea is to apply tanh to the last hidden state returned by the model, but I'm not sure that would give the same result. Is it possible to efficiently extract the output computed at the end of each sequence, e.g. using the same lengths that were passed to pack_padded_sequence?

As Neaabfi answered, hidden[-1] is correct. To be more specific to your question, the docs state:
output, (h_n, c_n) = self.lstm(x_pack) # batch_first = True
# h_n is a tensor of shape (num_layers * num_directions, batch, hidden_size)
In your case, you have a stack of 2 LSTM layers with only the forward direction, so:
h_n shape is (num_layers, batch, hidden_size)
You probably want the hidden state h_n of the last layer, in which case here is what you should do:
output, (h_n, c_n) = self.lstm(x_pack)
h = h_n[-1] # h of shape (batch, hidden_size)
y = self.linear(h)
Here is code that wraps any recurrent layer (LSTM, RNN or GRU) in a DynamicRNN. DynamicRNN can perform recurrent computations on sequences of varied lengths, regardless of the order of the lengths.

You can access the last hidden layer as follows:
output, (hidden, cell) = self.lstm(x_pack)
y = self.linear(hidden[-1])
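Regarding the part of the question about reusing the lengths: because the input is packed, h_n already holds the state at each sequence's last valid step, so for a unidirectional LSTM hidden[-1] should match the last layer's padded output at index length - 1. The sketch below shows that index-by-length selection on a plain NumPy array standing in for the padded output; the array contents and n_hidden = 4 are made up for illustration. The same indexing pattern should carry over to tensors, e.g. out[torch.arange(batch_size), lengths - 1].

```python
import numpy as np

# Hypothetical stand-in for the padded LSTM output of shape
# (batch, max_len, n_hidden), as returned with batch_first=True.
batch_size, max_len, n_hidden = 5, 10, 4
rng = np.random.default_rng(0)
padded_out = rng.normal(size=(batch_size, max_len, n_hidden))

# Same lengths that were passed to pack_padded_sequence
lengths = np.array([10, 9, 8, 7, 5])

# Zero out the padded positions, mimicking what unpacking produces
mask = np.arange(max_len)[None, :] < lengths[:, None]
padded_out = padded_out * mask[:, :, None]

# Select timestep (length - 1) for every sample in one indexing operation
last_out = padded_out[np.arange(batch_size), lengths - 1]
print(last_out.shape)  # (5, 4)
```

Simply taking padded_out[:, -1] would not work here, since for every sequence shorter than max_len that position is padding.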

Related

Why do I have the error: Given groups=1, weight of size [8, 1024, 1, 1], expected input[8, 304, 9, 40] to have 1024 channels, but got 304 channels

I am working with YoloStereo3D for stereo 3D object detection (stereo camera only, no Velodyne) on the KITTI dataset, with EdgeNeXt as the backbone instead of ResNet.
Before I changed the backbone from ResNet to EdgeNeXt, everything worked fine on the same KITTI dataset. Afterwards, I started getting the error below:
RuntimeError: Given groups=1, weight of size [8, 1024, 1, 1], expected input[8, 304, 9, 40] to have 1024 channels, but got 304 channels instead
Here is how I changed the backbone:
class YoloStereo3DCore(nn.Module):
    """
        Inference Structure of YoloStereo3D
        Similar to YoloMono3D,
        Left and Right image are fed into the backbone in batch. So they will affect each other with BatchNorm2d.
    """
    def __init__(self, backbone_arguments):
        f = open("/home/zakaseb/Thesis/YoloStereo3D/Stereo3D/Sequence.txt", "a")
        f.write("yolosterero3dCore_init \n")
        f.close()
        super(YoloStereo3DCore, self).__init__()
        self.backbone = edgenext_small(**backbone_arguments) # Resnet, change backbone from here
        base_features = 256 #if backbone_arguments['depth'] > 34 else 64 # meaning which depth of resnet
        self.neck = StereoMerging(base_features) #stereomerging outputs features and depth output.
Here is the edgenext_small()
#BACKBONE_DICT.register_module
def edgenext_small(pretrained=False, **kwargs):
    # FPS # BS=1: 93.84 & # BS=256: 1785.92 for MobileViT_S
    model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[48, 96, 160, 304], expan_ratio=4,
                     global_block=[0, 1, 1, 1],
                     global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                     use_pos_embd_xca=[False, True, False, False],
                     kernel_sizes=[3, 5, 7, 9],
                     d2_scales=[2, 2, 3, 4],
                     classifier_dropout=0.0)
    return model
As you can see, your backbone returns a feature map with 304 channels, but the next layer expects 1024 channels. There are two possible solutions:
If your backbone is designed from scratch, you can adapt its architecture so that the dims argument of EdgeNeXt ends with 1024. Typically:
model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[48, 96, 160, 1024], expan_ratio=4,
global_block=[0, 1, 1, 1],
global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
use_pos_embd_xca=[False, True, False, False],
kernel_sizes=[3, 5, 7, 9],
d2_scales=[2, 2, 3, 4],
classifier_dropout=0.0)
If your backbone has a fixed architecture (for instance, if you are using a pre-trained model), you should adapt the architecture after the backbone. For instance, you can add a torch.nn.Conv2d(304, 1024, kernel_size) layer to learn 1024 features from the 304. Otherwise, you can change the architecture of the next layer so that it accepts 304 channels.
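To make the second option concrete, here is a rough NumPy sketch of what a 1x1 convolution from 304 to 1024 channels does to a tensor with the shape from the error message. The random weights and the helper name pointwise_conv are illustrative assumptions; in the actual model the weights would be learned, e.g. by a torch.nn.Conv2d(304, 1024, kernel_size=1) placed between the backbone and the neck.

```python
import numpy as np

rng = np.random.default_rng(0)

# Backbone output with the shape from the error: (batch, channels, height, width)
features = rng.normal(size=(8, 304, 9, 40)).astype(np.float32)

# Weights of a hypothetical 1x1 convolution mapping 304 -> 1024 channels.
# A Conv2d weight would have shape (1024, 304, 1, 1); the 1x1 axes are dropped here.
weight = (rng.normal(size=(1024, 304)) * 0.01).astype(np.float32)

def pointwise_conv(x, w):
    """Bias-free 1x1 convolution: a linear map applied over the channel axis."""
    x_last = x.transpose(0, 2, 3, 1)      # (B, H, W, C_in)
    y_last = x_last @ w.T                 # (B, H, W, C_out)
    return y_last.transpose(0, 3, 1, 2)   # (B, C_out, H, W)

adapted = pointwise_conv(features, weight)
print(adapted.shape)  # (8, 1024, 9, 40)
```

The spatial size (9x40) is untouched; only the channel count changes, which is what the layer expecting 1024 channels needs.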

I have trouble with the shape of the input into a Keras model

I have a model with 5 input nodes and 1 output node.
model = keras.models.Sequential()
model.add(keras.layers.Dense(5, input_shape=(5, ), activation='relu'))
model.add(keras.layers.Dense(5, activation='relu'))
model.add(keras.layers.Dense(1, activation='exponential'))
model.compile(optimizer="sgd", loss="mean_squared_error")
I am trying to train a batch with these inputs.
input1 = [1, 4, 2, 4, 5]
input2 = [1, 4, 3, 5, 1]
input3 = [1, 4, 3, 3, 2]
input_batch = np.array([input1, input2, input3])
output1 = 2.5
output2 = 3.9
output3 = 1.3
output_batch = np.array([output1, output2, output3])
model.train_on_batch(input_batch, output_batch)
print(model.predict(np.array([1, 5, 2, 3, 1])))
This does not seem to be working so I need some help on how to shape the numpy array so that it fits into the model.
Here's the error message:
ValueError: Error when checking input: expected dense_input to have shape (5,) but got array with shape (1,)
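The last dimension of your model's output is 1, so the labels should have a matching trailing dimension, i.e. shape (3, 1) instead of (3,). The other problem is that predict works on batches: a 1D array of 5 values is read as 5 samples with 1 feature each, which is what the error message complains about. So you have to add a batch dimension.
Change these lines: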
model.train_on_batch(input_batch, output_batch[..., tf.newaxis])
print(model.predict(np.array([[1, 5, 2, 3, 1]]))) # <= add brackets
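The shapes involved can be checked with NumPy alone, without building the model. This is only a small sketch of the reshaping; np.newaxis is used here in place of tf.newaxis, which behaves the same way for indexing.

```python
import numpy as np

single_sample = np.array([1, 5, 2, 3, 1])
print(single_sample.shape)    # (5,)   read as 5 samples with 1 feature each

# Add a leading batch axis; equivalent to np.array([[1, 5, 2, 3, 1]])
batched_sample = single_sample[np.newaxis, :]
print(batched_sample.shape)   # (1, 5) read as 1 sample with 5 features

output_batch = np.array([2.5, 3.9, 1.3])
labels = output_batch[..., np.newaxis]
print(labels.shape)           # (3, 1) one target per sample, matching Dense(1)
```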

How to perform convolution with constant filter in tensorflow/keras

At a certain stage in a ResNet, I have 6 features per image, i.e. each example has shape 1x8x8x6. I want to convolve each feature with 4 constant filters (DWT) of size 1x2x2x1 with a stride of 2, to get 24 features in the next layer, so that the image becomes 1x4x4x24. However, I am unable to use tf.nn.conv2d or tf.nn.convolution for this purpose: conv2d says the fourth dimension of the input must be equal to the third dimension of the filter. How can I do this? I tried with just the first filter, but even that doesn't work:
x_in = np.random.randn(1,8,8,6)
kernel_in = np.array([[[[1],[1]],[[1],[1]]]])
kernel_in.shape
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)
tf.nn.convolution(x, kernel, strides=[1, 1, 1, 1], padding='VALID')
Try it this way:
x_in = np.random.randn(1,8,8,6) # [batch, in_height, in_width, in_channels]
kernel_in = np.ones((2,2,6,24)) # [filter_height, filter_width, in_channels, out_channels]
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)
tf.nn.conv2d(x, kernel, strides=[1, 2, 2, 1], padding='VALID')
# <tf.Tensor: shape=(1, 4, 4, 24), dtype=float32, numpy=....>
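Note that a full (2, 2, 6, 24) kernel mixes all 6 input channels in every output channel. If the goal is to apply each of the 4 filters to each channel separately, that is a depthwise convolution. Below is a NumPy sketch of that operation; the Haar-style filter values and the helper name depthwise_2x2_stride2 are assumptions, since the question only says "DWT". In TensorFlow the closest counterpart is probably tf.nn.depthwise_conv2d with a filter of shape [2, 2, 6, 4].

```python
import numpy as np

# Four 2x2 Haar-style filters (assumed values), shape (filter, row, col)
haar = 0.5 * np.array([
    [[1,  1], [ 1,  1]],   # average of the 2x2 block
    [[1,  1], [-1, -1]],   # top row minus bottom row
    [[1, -1], [ 1, -1]],   # left column minus right column
    [[1, -1], [-1,  1]],   # diagonal difference
], dtype=np.float32)

def depthwise_2x2_stride2(x, filters):
    """Apply every 2x2 filter to every channel separately, with stride 2.

    x: (batch, height, width, channels) with even height and width.
    Returns (batch, height // 2, width // 2, channels * n_filters).
    """
    b, h, w, c = x.shape
    k = filters.shape[0]
    # Split each spatial axis into (block index, offset inside the 2x2 block)
    blocks = x.reshape(b, h // 2, 2, w // 2, 2, c)
    out = np.einsum('bhiwjc,kij->bhwck', blocks, filters)
    return out.reshape(b, h // 2, w // 2, c * k)

x_in = np.random.randn(1, 8, 8, 6).astype(np.float32)
y = depthwise_2x2_stride2(x_in, haar)
print(y.shape)  # (1, 4, 4, 24)
```

Output channel c * 4 + k holds the response of filter k on input channel c.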
A simple example of how to set predefined values as the filters of a Keras Conv2D layer in TF2:
model = models.Sequential()
# one 3x3 filter
model.add(layers.Conv2D(1, (3, 3), input_shape=(None, None, 1)))
# access to the target layer
layer = model.layers[0]
current_w, current_bias = layer.get_weights() # see the current weights
new_w = tf.constant([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]], dtype=tf.float32)
new_w = tf.reshape(new_w, current_w.shape) # fix the shape
new_bias = tf.constant([0], dtype=tf.float32)
layer.set_weights([new_w, new_bias])
model.summary()
# let's see ..
tf.print(model.layers[0].get_weights())

Predicting values using TFLearn neural networks

I am new to TFLearn and I am trying out a simple neural network to predict the output array value when an input array is given.
Actual input for this code would be either pixel values of a grayscale image or features extracted from a grayscale image. Hence the input is in a 2d array format. The output would be the predicted color for each pixel.
In the example code I have used two random arrays of size 9. I need to train the network to predict the 't_y' array when 't_x' array is given as input.
The code runs, but the prediction is very poor.
The code has been adapted from MNIST example of TFLearn found here
This is my code
from random import randint
import numpy as np
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression
#input
t_x = [3, 8, 7, 4, 0, 7, 9, 5, 1]
#output
t_y = [9, 5, 1, 4, 7, 9, 7, 3, 6]
x = []
y = []
for i in range(1000):
    x.append(t_x)
    y.append(t_y)
#array of input values
x = np.reshape(x,(-1,3,3,1))
#array of output values
y = np.reshape(y,(-1,9))
network = input_data(shape=[None, 3, 3, 1], name='input')
network = conv_2d(network, 32, 3, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = conv_2d(network, 64, 3, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = fully_connected(network, 128, activation='tanh')
network = dropout(network, 0.8)
network = fully_connected(network, 256, activation='tanh')
network = dropout(network, 0.8)
network = fully_connected(network, 9, activation='softmax')
network = regression(network, optimizer='adam', learning_rate=0.01,
loss='categorical_crossentropy', name='target')
# Training
model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit({'input': x}, {'target': y}, n_epoch=20)
pred = model.predict(np.reshape(t_x,(-1,3,3,1)))
print("Prediction :", pred[0])
I am assuming it has something to do with the parameter values specified in the 'conv_2d' and 'fully_connected' functions.
What values would I have to set to get an accurate prediction?
Format of output
The last layer of your code (fully_connected(network, 9, activation='softmax')) produces 9 neurons with a softmax function, i.e. normalised so that they sum to 1. This is what you would use (as in MNIST) to train a network that selects one of 9 possible classes: the network will output something like [0.01 0.01 0.01 0.9 0.03 0.01 0.01 0.01 0.01], "predicting" that the fourth value is the correct one, and this is matched against a one-hot target vector (e.g. [0 0 0 1 0 0 0 0 0]).
Needless to say, a softmax output can never equal [9, 5, 1, 4, 7, 9, 7, 3, 6], or even come close to it, since all the softmax outputs add up to 1. Even the layer before it cannot output such values, since tanh only produces values between -1 and 1 and can never yield 9.
If you want to predict 9 numbers in the range 1-9, you might want to end with a plain fully connected layer instead of softmax, and scale your targets so that the expected output lies in the range 0 to 1. There's more to it than that, but this would be a good start.
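A quick NumPy check illustrates both points. The softmax helper and the choice of dividing by 9 are assumptions made for this sketch, not part of the original code.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

t_y = np.array([9, 5, 1, 4, 7, 9, 7, 3, 6], dtype=float)

# Even if the pre-softmax activations were exactly the targets,
# the output sums to 1 and every entry stays below 1.
probs = softmax(t_y)
print(probs.sum())   # approximately 1.0
print(probs.max())   # below 1.0, nowhere near 9

# Assumed rescaling: divide by the largest possible value so targets lie in [0, 1]
scale = 9.0
t_y_scaled = t_y / scale

# A prediction in [0, 1] maps back to the original range by multiplying
recovered = t_y_scaled * scale
```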

Tensorflow cnn error: logits and labels must be same size:

I'm trying to create a CNN using Tensorflow that classifies images into 16 classes.
My original image size is 72x72x1, and my network is structured like this:
# Network
n_input = dim
n_output = nclass # 16
weights = {
'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32], stddev=0.1)),
'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64], stddev=0.1)),
'wd1': tf.Variable(tf.random_normal([9*9*128, 1024], stddev=0.1)),
'wd2': tf.Variable(tf.random_normal([1024, n_output], stddev=0.1))
}
biases = {
'bc1': tf.Variable(tf.random_normal([32], stddev=0.1)),
'bc2': tf.Variable(tf.random_normal([64], stddev=0.1)),
'bd1': tf.Variable(tf.random_normal([1024], stddev=0.1)),
'bd2': tf.Variable(tf.random_normal([n_output], stddev=0.1))
}
Here is my conv net function:
def conv_basic(_input, _w, _b, _keepratio):
    # Input
    _input_r = tf.reshape(_input, shape=[-1, 72, 72, 1])
    # Conv1
    _conv1 = tf.nn.relu(tf.nn.bias_add(
        tf.nn.conv2d(_input_r, _w['wc1'], strides=[1, 1, 1, 1], padding='SAME')
        , _b['bc1']))
    _pool1 = tf.nn.max_pool(_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    mean, var = tf.nn.moments(_pool1, [0, 1, 2])
    _pool1 = tf.nn.batch_norm_with_global_normalization(_pool1, mean, var, 1., 0., 1e-7, 0)
    _pool_dr1 = tf.nn.dropout(_pool1, _keepratio)
    # Conv2
    _conv2 = tf.nn.relu(tf.nn.bias_add(
        tf.nn.conv2d(_pool_dr1, _w['wc2'], strides=[1, 1, 1, 1], padding='SAME')
        , _b['bc2']))
    _pool2 = tf.nn.max_pool(_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    mean, var = tf.nn.moments(_pool2, [0, 1, 2])
    _pool2 = tf.nn.batch_norm_with_global_normalization(_pool2, mean, var, 1., 0., 1e-7, 0)
    _pool_dr2 = tf.nn.dropout(_pool2, _keepratio)
    # Vectorize
    _dense1 = tf.reshape(_pool_dr2, [-1, _w['wd1'].get_shape().as_list()[0]])
    # Fc1
    _fc1 = tf.nn.relu(tf.add(tf.matmul(_dense1, _w['wd1']), _b['bd1']))
    _fc_dr1 = tf.nn.dropout(_fc1, _keepratio)
    # Fc2
    _out = tf.add(tf.matmul(_fc_dr1, _w['wd2']), _b['bd2'])
    # Return everything
    out = {
        'input_r': _input_r,
        'conv1': _conv1,
        'pool1': _pool1,
        'pool1_dr1': _pool_dr1,
        'conv2': _conv2,
        'pool2': _pool2,
        'pool_dr2': _pool_dr2,
        'dense1': _dense1,
        'fc1': _fc1,
        'fc_dr1': _fc_dr1,
        'out': _out
    }
    return out
When I try to run this, I get an error: "tensorflow.python.framework.errors.InvalidArgumentError: logits and labels must be same size: logits_size=[6,16] labels_size=[1,16]"
on the line cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(_pred, y))
I've tried changing the shape of the wd1 weights, but apart from errors saying that the requested shape requires a multiple of xxx, this only changes the values in the brackets.
These values (especially 6) seem very arbitrary, and I don't know where they come from. It would be nice if someone could explain how the number of neurons in an FC layer is chosen, as that also seems a bit arbitrary.
Thanks
EDIT: My full code https://gist.github.com/EricZeiberg/f0b138d859b9ed00ce045dc6b341e0a7
Given your code (and guessing what is missing in it), I think you have these parameters and results (correct me if wrong):
batch_size: 1
num_classes: 16
labels y: type int, shape [batch_size, 1]
outputs _pred: type float32, should be shape [batch_size, num_classes]
In your code, you only use 2 max-pooling layers, which reduce the input feature map from [1, 72, 72, 1] to [1, 18, 18, 64].
At this step, you should write:
# Vectorize
_dense1 = tf.reshape(_pool_dr2, [1, 18*18*64])
You also should replace your matrix wd1 with:
'wd1': tf.Variable(tf.random_normal([18*18*64, 1024], stddev=0.1))
In general, in these situations you need to print each shape, step by step, and find out for yourself where the shape doesn't match what you expect.
It's hard to tell from what you provided, but it seems like you feed inputs with a batch size of 6 but only provide one label for them. Where does your data come from?
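The shape bookkeeping can be done by hand before running the graph. The sketch below assumes the layer settings shown in the question (stride-1 'SAME' convolutions, which keep the spatial size, and two stride-2 'SAME' poolings); the helper name same_pool_out is made up for illustration.

```python
import math

def same_pool_out(size, stride=2):
    """Spatial size after a 'SAME'-padded pooling layer: ceil(size / stride)."""
    return math.ceil(size / stride)

height = width = 72
for _ in range(2):                     # two 2x2 max-pool layers with stride 2
    height, width = same_pool_out(height), same_pool_out(width)

channels = 64                          # output channels of the second conv (wc2)
features_per_image = height * width * channels
print(height, width, features_per_image)   # 18 18 20736

# The question's wd1 expects 9*9*128 inputs per row instead
wd1_rows = 9 * 9 * 128                 # 10368
rows_per_image = features_per_image // wd1_rows
print(rows_per_image)                  # 2
```

Since the reshape uses -1 for the batch dimension, the mismatch is silently folded into it: each image becomes 2 rows of logits. Under that reading, logits_size=[6,16] would be consistent with 3 input images rather than 6, but that is a guess that cannot be confirmed from the code shown.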
