LSTM for continuous video - Python

Can we train an LSTM on continuous videos that contain sequences of both the positive class and the negative class, using several thousand such videos?
My overall objective is to mark particular scenes in videos in real time (e.g. if I have frames 0-100 and frames 30-60 contain yoga scenes, I need to mark them).
Right now, the approach I'm following is to split each video into positive sequences and negative sequences and train an LSTM on top of the Mobnet CNN (with the FC layer replaced by LSTM layers).
However, this gives no improvement over Mobnet alone when we run evaluation on non-split videos.
Mobnet and the LSTM are trained separately: I save the output of Mobnet (with the FC layer removed) as NumPy arrays and then read these arrays to train the LSTM.
Here is a sample of the code used for this approach:
import numpy as np
import torch
import torch.nn as nn
import torchvision

epochs = 250
batch_size = 128

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        in_size = 1024      # size of the saved Mobnet features (FC removed)
        classes_no = 2
        hidden_size = 512
        layer_no = 2
        self.lstm = nn.LSTM(in_size, hidden_size, layer_no, batch_first=True)
        self.linear = nn.Linear(hidden_size, classes_no)

    def forward(self, input_seq):
        output_seq, _ = self.lstm(input_seq)
        last_output = output_seq[:, -1]          # one prediction per sequence
        class_predictions = self.linear(last_output)
        return class_predictions

def nploader(npfile):
    # Load one saved feature sequence as float32 for the LSTM
    return np.load(npfile).astype(np.float32)

def train():
    # DatasetFolder expects one subfolder per class under ./featrs/
    npdataloader = torchvision.datasets.DatasetFolder('./featrs/', nploader, ['npy'],
                                                      transform=None, target_transform=None)
    data_loader = torch.utils.data.DataLoader(npdataloader, batch_size=batch_size, shuffle=True)
    model = Model().cuda()
    loss = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.8)
    for epoch in range(0, epochs):
        for input_seq, target in data_loader:
            output = model(input_seq.cuda())
            err = loss(output, target.cuda())
            optimizer.zero_grad()
            err.backward()
            optimizer.step()
        scheduler.step()    # step the LR schedule once per epoch
    torch.save(model.state_dict(), 'lstm.ckpt')
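One way to use continuous, unsplit videos directly is to label every frame and have the LSTM emit a prediction at every time step instead of only at the last one (a many-to-many setup), so a single video can contain both classes. The sketch below assumes the saved Mobnet features for a clip have shape (num_frames, 1024) and that a matching per-frame label array exists; the class name `FrameTagger`, the helper `frame_loss` and the random tensors are hypothetical.

```python
import torch
import torch.nn as nn

class FrameTagger(nn.Module):
    """LSTM over per-frame CNN features that outputs one class score per frame."""

    def __init__(self, in_size=1024, hidden_size=512, num_layers=2, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(in_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, num_classes)

    def forward(self, features):
        # features: (batch, num_frames, in_size)
        output_seq, _ = self.lstm(features)    # (batch, num_frames, hidden_size)
        return self.linear(output_seq)         # (batch, num_frames, num_classes)

def frame_loss(logits, frame_labels):
    # cross_entropy expects (N, C) scores and (N,) targets,
    # so merge the batch and time dimensions.
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), frame_labels.reshape(-1)
    )

torch.manual_seed(0)
model = FrameTagger()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Hypothetical batch: 2 clips of 100 frames, frames 30-59 labelled as the target scene.
features = torch.randn(2, 100, 1024)
frame_labels = torch.zeros(2, 100, dtype=torch.long)
frame_labels[:, 30:60] = 1

logits = model(features)
loss = frame_loss(logits, frame_labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# argmax over the class dimension marks each frame as scene / not scene.
predicted = logits.argmax(dim=-1)    # (batch, num_frames)
```

For real-time marking, a unidirectional LSTM like this can also be run incrementally: pass the returned (h, c) state back into `self.lstm` together with each new chunk of frames.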


What does it mean when your model can't overfit a small batch of data?

I am trying to train an RNN model to classify sentences into 4 classes, but it doesn't seem to work. I managed to overfit 4 examples (blue line), but even as few as 8 examples (red line) does not work, let alone the whole dataset.
I tried different learning rates and different values of hidden_size and embedding_size, but nothing seems to help. What am I missing? I know that if a model cannot overfit a small batch, its capacity usually needs to be increased, but here increasing capacity has no effect.
The architecture is as follows:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, embedding_size=256, hidden_size=128, num_classes=4):
        super().__init__()   # required before assigning submodules
        # vocab is the torchtext vocabulary built elsewhere; index 0 is the padding index
        self.embedding = nn.Embedding(len(vocab), embedding_size, 0)
        self.rnn = nn.RNN(embedding_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x = [batch_size, sequence_length]
        x = self.embedding(x)   # x = [batch_size, sequence_length, embedding_size]
        _, h_n = self.rnn(x)    # h_n = [1, batch_size, hidden_size]
        h_n = h_n.squeeze(0)
        out = self.fc(h_n)      # out = [batch_size, num_classes]
        return out
The input data is tokenized sentences, padded with 0 to the length of the longest sentence in the batch, so one sample might be: [2784, 9544, 1321, 120, 0, 0]. The data comes from the AG_NEWS dataset in torchtext.datasets.
The training code:
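model = RNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=LR)
for epoch in range(NUM_EPOCHS):
    epoch_losses = []
    correct_predictions = []
    num_samples = 0
    for batch_idx, (labels, texts) in enumerate(train_loader):
        labels, texts = labels.to(device), texts.to(device)
        scores = model(texts)
        loss = criterion(scores, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        correct = (scores.max(1).indices == labels).sum()
        epoch_losses.append(loss.item())
        correct_predictions.append(correct.item())
        num_samples += labels.size(0)
    epoch_avg_loss = sum(epoch_losses) / len(epoch_losses)
    # Divide by every sample seen this epoch, not just the last batch's labels
    epoch_avg_accuracy = float(sum(correct_predictions)) / num_samples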
The issue was caused by vanishing gradients.
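With right-padded batches, a plain nn.RNN also reads the trailing 0 tokens, so for short sentences the final hidden state comes after a run of padding, and gradients have to flow back through all of those steps. A common remedy (a swapped-in technique, not necessarily what the asker used) is a gated cell such as nn.LSTM plus `pack_padded_sequence`, so the final hidden state is taken at each sentence's true last token. The sketch assumes lengths can be recovered from padding index 0; `LSTMClassifier`, the vocabulary size and the random sentences are hypothetical. It ends with the same sanity check as the question: memorising 8 examples.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_size=256, hidden_size=128, num_classes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_size, padding_idx=0)
        self.lstm = nn.LSTM(embedding_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq_len) token ids, right-padded with 0
        lengths = (x != 0).sum(dim=1).cpu()
        packed = pack_padded_sequence(
            self.embedding(x), lengths, batch_first=True, enforce_sorted=False
        )
        # For packed input, h_n holds each sequence's state at its last real token,
        # returned in the original batch order.
        _, (h_n, _) = self.lstm(packed)
        return self.fc(h_n[-1])

torch.manual_seed(0)
vocab_size = 1000
texts = torch.randint(1, vocab_size, (8, 12))
texts[::2, 8:] = 0                      # pad every other sentence to length 8
labels = torch.randint(0, 4, (8,))

model = LSTMClassifier(vocab_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(texts), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```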

PyTorch GRU trained on one class to predict unlabelled data

I am creating a GRU to predict whether data derived from a device's traffic packets is safe or anomalous. I plan to do this by training a model only on safe/normal operating data and then having it classify new, unseen traffic (testing). I want to train only on the safe data (one class) because an attack could take many forms: I don't want to train the model on labeled attack data and then have it miss an attack type I didn't train it on (basically, I want to overfit on the normal operating data). So I need the model to check whether incoming unlabeled data matches the one class it was trained on (i.e. whether it matches the device's normal operation) or whether it is anomalous.
The problem is that, because the model is trained on only one class, it cannot differentiate the anomalous unseen data from normal data and considers virtually everything it sees to be normal (the class it trained on).
I would appreciate any ideas, or pointers to flaws in the way I have implemented my model.
# Imports
import pandas as pd
import numpy as np
import torch
import torchvision # torch package for vision related things
import torch.nn.functional as F # Parameterless functions, like (some) activation functions
import torchvision.datasets as datasets # Standard datasets
import torchvision.transforms as transforms # Transformations we can perform on our dataset for augmentation
from torch import optim # For optimizers like SGD, Adam, etc.
from torch import nn # All neural network modules
from torch.utils.data import Dataset, DataLoader # Gives easier dataset management by creating mini batches etc.
from tqdm import tqdm # For a nice progress bar
from sklearn.preprocessing import StandardScaler
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Hyperparameters
input_size = 24
hidden_size = 128
num_layers = 1
num_classes = 2
sequence_length = 1
learning_rate = 0.005
batch_size = 8
num_epochs = 5
# Recurrent neural network with GRU (many-to-one)
class RNN_GRU(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN_GRU, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        # Only the last time step is decoded, so the input is hidden_size wide
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Add a sequence dimension: (batch, input_size) -> (batch, 1, input_size)
        x = x.unsqueeze(1)
        # Set initial hidden state (a GRU has no cell state)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        # Forward propagate GRU
        out, _ = self.gru(x, h0)
        out = out[:, -1, :]
        # Decode the hidden state of the last time step
        out = self.fc(out)
        return out
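class MyDataset(Dataset):
    def __init__(self, file_name):
        # Body not shown in the original post: it loads file_name and
        # sets self.x_train and self.y_train.
        ...

    def __len__(self):
        return len(self.y_train)

    def __getitem__(self, idx):
        return self.x_train[idx], self.y_train[idx]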
# Initialize network
model = RNN_GRU(input_size, hidden_size, num_layers, num_classes).to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Train Network (train_loader / test_loader are built from MyDataset in code not shown)
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(tqdm(train_loader)):
        # Get data to cuda if possible
        data = data.to(device=device).float()
        targets = targets.to(device=device).long()   # CrossEntropyLoss expects class indices
        # forward
        scores = model(data)
        loss = criterion(scores, targets)
        # backward
        optimizer.zero_grad()
        loss.backward()
        # gradient descent update step/adam step
        optimizer.step()
# Check accuracy on training & test to see how good our model is
def check_accuracy(loader, model):
    num_correct = 0
    num_samples = 0
    # Set model to eval
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device).float()
            y = y.to(device=device).long()
            scores = model(x)
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum().item()
            num_samples += predictions.size(0)
    # Toggle model back to train
    model.train()
    return num_correct / num_samples

print(f"Accuracy on training set: {check_accuracy(train_loader, model)*100:.2f}%")
print(f"Accuracy on test set: {check_accuracy(test_loader, model)*100:.2f}%")
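With nn.CrossEntropyLoss and only one class present in the training labels, the quickest way to minimise the loss is to always predict that class, which matches the behaviour described. A common alternative for one-class problems (a swapped-in technique, not the asker's model) is a GRU autoencoder: train it to reconstruct normal traffic only, then flag windows whose reconstruction error exceeds a threshold taken from the normal data. Only `input_size = 24` comes from the post; the window length of 10, the 99th-percentile threshold, `GRUAutoencoder`, `reconstruction_error` and the random "normal"/"anomalous" tensors are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

class GRUAutoencoder(nn.Module):
    """Encode a window of feature vectors with a GRU, then reconstruct it."""

    def __init__(self, input_size=24, hidden_size=64):
        super().__init__()
        self.encoder = nn.GRU(input_size, hidden_size, batch_first=True)
        self.decoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.output = nn.Linear(hidden_size, input_size)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        _, h_n = self.encoder(x)                        # (1, batch, hidden_size)
        # Feed the window summary to the decoder at every time step.
        repeated = h_n[-1].unsqueeze(1).repeat(1, x.size(1), 1)
        decoded, _ = self.decoder(repeated)
        return self.output(decoded)                     # (batch, seq_len, input_size)

def reconstruction_error(model, x):
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=(1, 2))  # one score per window

torch.manual_seed(0)
normal = torch.randn(256, 10, 24) * 0.5          # hypothetical normal-traffic windows
anomalous = torch.randn(32, 10, 24) * 3 + 2      # hypothetical out-of-distribution windows

model = GRUAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(50):                          # train on normal data only
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(normal), normal)
    loss.backward()
    optimizer.step()

model.eval()
threshold = torch.quantile(reconstruction_error(model, normal), 0.99)
flags = reconstruction_error(model, anomalous) > threshold   # True = anomalous
```

The threshold is the main design choice: a higher percentile gives fewer false alarms on normal traffic at the cost of missing subtler anomalies.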

LSTM-CNN to classify sequences of images

I got an assignment and got stuck on it while going down the rabbit hole of learning PyTorch, LSTMs and CNNs.
Using the well-known MNIST dataset, I take combinations of 4 digits, and each combination falls into one of 7 labels, for example:
1111: label 1 (constant trend)
1234: label 2 (increasing trend)
4321: label 3 (decreasing trend)
7382: label 7 (decreasing, then increasing, then decreasing trend)
After loading, my tensor has shape (3, 4, 28, 28), where 28 comes from the MNIST images' width and height, 3 is the batch size, and 4 is the number of channels (the 4 images).
I'm stuck on how to pass this into a PyTorch LSTM and CNN, as basically all Google searches lead to articles where only a single image is passed in.
I was thinking of reshaping it into one long array of pixel values: all values of the first image, row by row (28 rows), followed by the second, third and fourth images in the same way. That would give 4 * 28 * 28 = 3136 values.
Is this the right way to tackle the problem, or should I rethink it? I'm rather new to all of this and looking for guidance on how to move forward. I've read loads of articles and watched YouTube videos, but they all seem to cover only the basics or variations of the same topic.
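Rather than concatenating the four images into one 3136-long vector, a common CNN-LSTM layout treats them as a sequence of length 4: a small CNN turns each 28×28 image into a feature vector, and an LSTM reads the 4 vectors and classifies from its last hidden state. This is a sketch assuming a batch shaped (3, 4, 28, 28) as in the question; `CNNLSTM` and its layer sizes are illustrative choices.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, feature_size=64, hidden_size=128, n_classes=7):
        super().__init__()
        # Per-image encoder: (1, 28, 28) -> feature_size
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 16 x 14 x 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 x 7 x 7
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, feature_size), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feature_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, seq_len=4, 28, 28)
        batch, seq_len, h, w = x.shape
        # Fold the sequence into the batch so the CNN sees one image at a time.
        feats = self.cnn(x.reshape(batch * seq_len, 1, h, w))
        feats = feats.reshape(batch, seq_len, -1)       # (batch, 4, feature_size)
        _, (h_n, _) = self.lstm(feats)
        return self.fc(h_n[-1])                         # raw logits for CrossEntropyLoss

model = CNNLSTM()
images = torch.randn(3, 4, 28, 28)     # stand-in for a batch of 3 MNIST sequences
logits = model(images)
```

Note that nn.CrossEntropyLoss expects 0-based class indices, so labels numbered 1 to 7 need to be shifted to 0 to 6.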
I have written some code but running it gives errors.
import numpy as np
import torch
import torch.nn as nn
from torch import optim, softmax
from sklearn.model_selection import train_test_split
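# dataset = sequences of 4 MNIST images each
# dataset.data_label = one of 7 labels per sequence
x_train, x_test, y_train, y_test = train_test_split(..., dataset.data_label, test_size=0.15)  # the first argument and any remaining arguments are elided in the original post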
class Mylstm(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers, n_classes):
        super(Mylstm, self).__init__()
        self.input_size = input_size
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, n_layers, batch_first=True)
        # readout layer
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # Initialize hidden and cell states with zeros
        h0 = torch.zeros(self.n_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.n_layers, x.size(0), self.hidden_size, device=x.device)
        out, (h_n, c_n) = self.lstm(x, (h0, c0))
        # Last layer's final hidden state, all hidden units: (batch, hidden_size)
        x = h_n[-1, :, :]
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself
        x = self.fc(x)
        return x
input_size = 28
hidden_size = 256
sequence_length = 28
n_layers = 2
n_classes = 7
learning_rate = 0.001
batch_size = 3
model = Mylstm(input_size, hidden_size, n_layers, n_classes)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
num_epochs = 5

# Number of batches, counting a final partial batch
n_iter = int(np.ceil(x_train.shape[0] / batch_size))
n_test_iter = int(np.ceil(x_test.shape[0] / batch_size))

bs = 0
for i in range(n_test_iter):
    sequences = x_test[bs:bs + batch_size, :]
    labels = y_test[bs:bs + batch_size]
    test_images = dataset.load_images(sequences)
    bs += batch_size

for epoch in range(num_epochs):
    bs = 0   # restart from the first batch every epoch
    for i in range(n_iter):
        sequences = x_train[bs:bs + batch_size, :]
        # class indices must be 0..6 for CrossEntropyLoss
        labels = torch.as_tensor(y_train[bs:bs + batch_size]).long()
        input_images = dataset.load_images(sequences)
        bs += batch_size
        output = model(input_images)
        # calculate Loss
        loss = criterion(output, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
The error I'm currently getting is:
RuntimeError: input.size(-1) must be equal to input_size. Expected 28, got 784
Change your input size from 28 to 784 (784 = 28 * 28).
The input_size argument is the number of features in one element of the sequence. Here each element is one MNIST image, so the number of features is its number of pixels, i.e. the image's width * height.
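Concretely, with each image flattened to 784 values, the batch enters the LSTM with shape (batch, 4, 784): a sequence of 4 steps with 784 features each. The sketch below also uses the full last-layer hidden state `h_n[-1]` and returns raw logits, since nn.CrossEntropyLoss applies log-softmax itself; `SeqLSTM` and the random tensors are illustrative stand-ins.

```python
import torch
import torch.nn as nn

class SeqLSTM(nn.Module):
    def __init__(self, input_size=784, hidden_size=256, n_layers=2, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # (batch, 4, 28, 28) -> (batch, 4, 784): one flattened image per time step
        x = x.flatten(start_dim=2)
        _, (h_n, _) = self.lstm(x)       # h_n: (n_layers, batch, hidden_size)
        return self.fc(h_n[-1])          # logits from the top layer's final state

model = SeqLSTM()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

images = torch.rand(3, 4, 28, 28)
labels = torch.tensor([0, 1, 6])         # 0-based class indices for 7 classes
optimizer.zero_grad()
output = model(images)
loss = criterion(output, labels)
loss.backward()
optimizer.step()
```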

Why is a simple Binary classification failing in a feedforward neural network?

I am new to PyTorch. I was trying to build a binary classifier on the Kepler dataset. The following is my dataset class.
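class KeplerDataset(Dataset):
    def __init__(self, test=False):
        self.dataframe_orig = pd.read_csv(koi_cumm_path)
        # df_numeric is derived from self.dataframe_orig in code not shown in the post;
        # self.data is a stand-in name for the attribute lost from the original.
        if not test:
            # Rows with a known disposition (0 or 1)
            self.data = df_numeric[(df_numeric.koi_disposition == 1) | (df_numeric.koi_disposition == 0)].values
        else:
            # All remaining rows
            self.data = df_numeric[~((df_numeric.koi_disposition == 1) | (df_numeric.koi_disposition == 0))].values
        self.X_data = torch.FloatTensor(self.data[:, 1:])
        # Shape (N, 1) so targets match the model's (N, 1) output in BCELoss
        self.y_data = torch.FloatTensor(self.data[:, 0]).unsqueeze(1)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]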
Here I created a custom classifier with two hidden layers of 32 units and a single output unit that produces the sigmoid probability of being in class 1 (planet).
import torch
import torch.nn as nn
import torch.nn.functional as F

class KOIClassifier(nn.Module):
    def __init__(self, input_dim, out_dim):
        super(KOIClassifier, self).__init__()
        self.linear1 = nn.Linear(input_dim, 32)
        self.linear2 = nn.Linear(32, 32)
        self.linear3 = nn.Linear(32, out_dim)

    def forward(self, xb):
        out = self.linear1(xb)
        out = F.relu(out)
        out = self.linear2(out)
        out = F.relu(out)
        out = self.linear3(out)
        out = torch.sigmoid(out)   # probability of class 1
        return out
I then created a train_model function to optimize the loss using SGD.
def train_model(X, y):
    criterion = nn.BCELoss()
    optim = torch.optim.SGD(model.parameters(), lr=0.001)
    n_epochs = 100
    losses = []
    for epoch in range(n_epochs):
        optim.zero_grad()
        y_pred = model.forward(X)
        loss = criterion(y_pred, y)
        loss.backward()
        optim.step()
        losses.append(loss.item())
    return losses

losses = []
for X, y in train_loader:
    losses.append(train_model(X, y))
But after running the optimization over train_loader, when I try predicting on train_loader itself, the predictions are very poor.
with torch.no_grad():
    for features, y in train_loader:
        y_pred = model(features)   # nn.Module has no predict(); call the model directly
> tensor([[4.5436e-02],
Why is my model not working properly? Is it a problem with the dataset, or am I doing something wrong in implementing the neural net? I will link my Kaggle notebook because more context might be helpful. Please help.
You are optimizing many times (100 steps) on the first batch (the first samples) and only then moving on to the next samples. This means your model overfits those few samples before going to the next batch, so training becomes very non-smooth, can diverge, and ends up far from a good optimum.
Usually, in a training loop you should:
1. go over all samples (this is one epoch);
2. shuffle your dataset so that you visit the samples in a different order (set your PyTorch training loader accordingly, e.g. shuffle=True);
3. go back to 1. until you reach the maximum number of epochs.
Also, you should not redefine your optimizer (or your criterion) on every call.
Your training loop should look like this:
criterion = nn.BCELoss()
optim = torch.optim.SGD(model.parameters(), lr=0.001)
n_epochs = 100

def train_model():
    # One pass over every batch = one epoch
    for X, y in train_loader:
        optim.zero_grad()
        y_pred = model(X)
        loss = criterion(y_pred, y)
        loss.backward()
        optim.step()

for epoch in range(n_epochs):
    train_model()
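To see that loop structure work end to end, here is a self-contained sketch with a shuffled DataLoader and the optimizer and criterion created once. The synthetic, linearly separable data with 10 features is a hypothetical stand-in for the Kepler features, and the learning rate of 0.1 (instead of 0.001) is an assumption chosen so this toy example converges in a few epochs. It keeps the question's sigmoid output and nn.BCELoss, with targets shaped (N, 1) to match the model output.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X = torch.randn(512, 10)
y = (X[:, 0] + X[:, 1] > 0).float().unsqueeze(1)     # (N, 1), same shape as the output
train_loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
criterion = nn.BCELoss()
optim = torch.optim.SGD(model.parameters(), lr=0.1)
n_epochs = 30

def train_one_epoch():
    total = 0.0
    for xb, yb in train_loader:       # every batch once, reshuffled each epoch
        optim.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optim.step()
        total += loss.item() * len(xb)
    return total / len(train_loader.dataset)

epoch_losses = [train_one_epoch() for _ in range(n_epochs)]

with torch.no_grad():
    accuracy = ((model(X) > 0.5).float() == y).float().mean().item()
```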

Regression loss functions incorrect

I'm trying a basic averaging example, but the validation accuracy and the loss don't match, and the network fails to converge if I increase the training time. I'm training a network with 2 hidden layers, each 500 units wide, on three integers from the range [0, 9], using Adam with a learning rate of 1e-1, a batch size of 1, and dropout, for 3000 iterations, validating every 100 iterations. I count a prediction as correct if the absolute difference between the label and the hypothesis is less than a threshold, which I set to 1 here. Could someone tell me whether this is an issue with the choice of loss function, something wrong with PyTorch, or something I'm doing?
val_diff = 1
acc_diff = torch.FloatTensor([val_diff]).expand(self.batch_size)
Looped 100 times during validation:
num_correct += torch.sum(torch.abs(val_h - val_y) < acc_diff)
Append after each validation phase:
validate.append(num_correct / total_val)
Here are some examples of (hypothesis, label) pairs:
[...(-0.7043088674545288, 6.0), (-0.15691305696964264, 2.6666667461395264),
(0.2827358841896057, 3.3333332538604736)]
I tried six of the loss functions in the API that are typically used for regression (they are the loss_fn options in the training code below).
Network code:
import torch.nn as nn
import torch.nn.functional as F

class Feedforward(nn.Module):
    def __init__(self, topology):
        super(Feedforward, self).__init__()
        self.input_dim = topology['features']
        self.num_hidden = topology['hidden_layers']
        self.hidden_dim = topology['hidden_dim']
        self.output_dim = topology['output_dim']
        self.input_layer = nn.Linear(self.input_dim, self.hidden_dim)
        self.hidden_layer = nn.Linear(self.hidden_dim, self.hidden_dim)
        self.output_layer = nn.Linear(self.hidden_dim, self.output_dim)
        self.dropout_layer = nn.Dropout(p=0.2)

    def forward(self, x):
        batch_size = x.size()[0]
        feat_size = x.size()[1]
        input_size = batch_size * feat_size
        self.input_layer = nn.Linear(input_size, self.hidden_dim).cuda()
        hidden = self.input_layer(x.view(1, input_size)).clamp(min=0)
        for _ in range(self.num_hidden):
            hidden = self.dropout_layer(F.relu(self.hidden_layer(hidden)))
        output_size = batch_size * self.output_dim
        self.output_layer = nn.Linear(self.hidden_dim, output_size).cuda()
        return self.output_layer(hidden).view(output_size)
Training code:
def train(self):
    if self.cuda:
        ...  # body elided in the original post
    dh = DataHandler(...)  # arguments elided in the original post
    # reduction='sum' is the current spelling of the deprecated size_average=False
    # loss_fn = nn.L1Loss(reduction='sum')
    # loss_fn = nn.L1Loss()
    # loss_fn = nn.SmoothL1Loss(reduction='sum')
    # loss_fn = nn.SmoothL1Loss()
    # loss_fn = nn.MSELoss(reduction='sum')
    loss_fn = torch.nn.MSELoss()
    losses = []
    validate = []
    hypos = []
    labels = []
    val_size = 100
    val_diff = 1
    total_val = float(val_size * self.batch_size)
    for i in range(self.iterations):
        x, y = dh.get_batch(self.batch_size)
        x = self.tensor_to_Variable(x)
        y = self.tensor_to_Variable(y)
        loss = loss_fn(..., y)  # first argument (the network's prediction) elided in the original post
It looks like you've misunderstood how layers in PyTorch work. Here are a few tips:
In your forward, each nn.Linear(...) call defines a new layer instead of using the ones you predefined in your network's __init__. The network therefore cannot learn anything, because its weights are constantly reinitialized (and the optimizer still holds the parameters of the original layers, not the new ones).
You shouldn't need to call .cuda() inside net.forward(...), since you've already copied the network to the GPU in your train by calling
Ideally, the input to net.forward(...) should already have the shape expected by the first layer, so you don't have to modify it. Here x.size() should be (batch_size, features), which is what nn.Linear expects.
Your forward should look close to this:
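def forward(self, x):
    x = F.relu(self.input_layer(x))
    # p=0.2 matches the dropout rate in __init__; training=self.training turns it off in eval mode
    x = F.dropout(F.relu(self.hidden_layer(x)), p=0.2, training=self.training)
    x = self.output_layer(x)
    return x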
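Putting those tips together, here is a minimal sketch of the network with every layer created once in __init__, inputs shaped (batch_size, features), and a short training run on the averaging task. The topology keys come from the question; unlike the question, it uses a separate layer per hidden step (the question reuses one hidden_layer). The `get_batch` generator, the batch size of 32, the step count and the Adam learning rate of 1e-3 are assumptions (1e-1 is typically very high for Adam).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Feedforward(nn.Module):
    def __init__(self, topology):
        super().__init__()
        hidden_dim = topology['hidden_dim']
        self.input_layer = nn.Linear(topology['features'], hidden_dim)
        # All layers are registered once here, so the optimizer trains them.
        self.hidden_layers = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(topology['hidden_layers'])]
        )
        self.output_layer = nn.Linear(hidden_dim, topology['output_dim'])
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        # x: (batch_size, features)
        x = F.relu(self.input_layer(x))
        for layer in self.hidden_layers:
            x = self.dropout(F.relu(layer(x)))
        return self.output_layer(x)            # (batch_size, output_dim)

def get_batch(batch_size):
    # Hypothetical data: three integers in [0, 9] and their mean as the target.
    x = torch.randint(0, 10, (batch_size, 3)).float()
    return x, x.mean(dim=1, keepdim=True)

torch.manual_seed(0)
topology = {'features': 3, 'hidden_layers': 2, 'hidden_dim': 500, 'output_dim': 1}
net = Feedforward(topology)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for i in range(1500):
    x, y = get_batch(32)
    optimizer.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    optimizer.step()

net.eval()                                     # disable dropout for validation
with torch.no_grad():
    val_x, val_y = get_batch(1000)
    accuracy = ((net(val_x) - val_y).abs() < 1).float().mean().item()
```

The accuracy line mirrors the question's criterion: a prediction counts as correct when it is within 1 of the label.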

